Friday, 16 April 2010

A little return oriented exploitation on Windows x86 (Part 2)

In part 1 of this blog post I showed a simple return oriented attack which utilized some ROP in order to bypass permanent DEP and execute arbitrary code. The rest of this post will look at an experimental compiler convention (currently dubbed saferet) that aims to mitigate such return oriented attacks.

The technique itself is built around the commonly used concept of a shadow stack, which maintains, at run time, a list of valid return addresses for each thread. The first example I know of that uses a shadow stack is Stack Shield by Vendicator, circa 2000, which mitigates stack based buffer overflows. More recent work on ROP specific mitigations includes the Transparent Runtime Shadow Stack (TRUSS) by Saravanan Sinnadurai, Qin Zhao, and Weng-Fai Wong in 2008 and ROPdefender by Lucas Davi, Ahmad-Reza Sadeghi and Marcel Winandy in 2010. Both TRUSS and ROPdefender use a shadow stack and run time binary instrumentation to monitor the control flow of a thread. saferet differs from the above, apart from being a compile time solution, by relying on the exception handling features of the Windows OS to ensure the integrity of a thread's call stack.

saferet maintains a unique shadow stack, containing both the return address and the current stack pointer, for each thread of execution. It does so by modifying the prolog of every function at compile time to perform a call to saferet_prolog(). At run time, the saferet prolog overwrites the real return address on the stack with a cookie value. We also modify at compile time how a RET instruction is performed, as a return instruction is the most common way (albeit not the only way) to create gadgets for use in a ROP attack. By inserting a HLT instruction (HLT being a privileged instruction which does not affect EFLAGS) before every RET instruction, we ensure that an exception is generated before any attempt to return to an address on the stack (remembering that, due to saferet_prolog(), the actual return address is no longer present on the stack). By providing an exception handler, called saferet_handler(), we can verify at run time that the cookie written over the return address has not been modified and that the stack pointer is as expected, before restoring the real return address recorded on the shadow stack and allowing the return to proceed.
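The bookkeeping described above can be modelled in portable C without any Windows machinery. The following is only a sketch under stated assumptions: the stack is a plain array, a "stack pointer" is an index into it, and the names sim_shadow, sim_enter and sim_leave (and the fixed SIM_DEPTH) are hypothetical and do not appear in the saferet source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_DEPTH 16              /* hypothetical fixed depth for this sketch */

typedef struct {
    uint32_t ret;                 /* real return address taken off the stack */
    size_t   slot;                /* simulated stack slot that held it       */
} sim_record;

typedef struct {
    sim_record records[SIM_DEPTH];
    size_t     index;             /* next free record  */
    uint32_t   cookie;            /* per-thread cookie */
} sim_shadow;

/* Model of the prolog: remember the return address and where it lived,
 * then overwrite the on-stack copy with the cookie. */
static void sim_enter(sim_shadow *s, uint32_t *stack, size_t slot)
{
    assert(s->index < SIM_DEPTH);
    s->records[s->index].ret  = stack[slot];
    s->records[s->index].slot = slot;
    s->index++;
    stack[slot] = s->cookie;
}

/* Model of the handler on a HLT: returns 1 and restores the real return
 * address if the cookie and the stack position match the newest record,
 * or 0 to signal a return violation. */
static int sim_leave(sim_shadow *s, uint32_t *stack, size_t slot)
{
    const sim_record *r;

    assert(s->index > 0);
    r = &s->records[s->index - 1];
    if (stack[r->slot] != s->cookie)   /* cookie was overwritten */
        return 0;
    if (r->slot != slot)               /* returning from an unexpected stack position */
        return 0;
    stack[slot] = r->ret;              /* put the real return address back */
    s->index--;
    return 1;
}
```

Calling sim_enter() on a slot and then sim_leave() on the same slot restores the original value. Overwriting the slot in between, as a stack based overflow would, makes sim_leave() report a violation instead.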

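We can see below an example of a simple function compiled without the saferet convention (first listing) and with it (second listing). The modifications are the push esp and call saferet_prolog instructions at the start of the function and the hlt instruction before the ret.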

push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, [ebp+12]
add eax, ecx
mov esp, ebp
pop ebp
ret



push esp
call saferet_prolog
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov ecx, [ebp+12]
add eax, ecx
mov esp, ebp
pop ebp
hlt
ret

The size overhead for every function compiled with the saferet convention is an extra 7 bytes. The basic operation of the saferet convention is shown in the diagram below.
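The 7 byte figure can be checked by adding up the encodings of the three inserted instructions. This is a sketch that assumes the compiler emits the one byte push esp (0x54), a near relative call (0xE8 followed by a 32 bit displacement) and the one byte hlt (0xF4). The ret_count parameter is my own generalisation for a function with more than one RET, which the listing above does not have.

```c
#include <assert.h>
#include <stddef.h>

/* x86 encodings of the instructions saferet adds to a function. */
static const unsigned char PUSH_ESP[]   = { 0x54 };                          /* push esp */
static const unsigned char CALL_REL32[] = { 0xE8, 0x00, 0x00, 0x00, 0x00 };  /* call rel32 (displacement is a placeholder) */
static const unsigned char HLT[]        = { 0xF4 };                          /* hlt */

/* Extra bytes for one function: one prolog sequence plus one HLT per RET. */
static size_t saferet_overhead(size_t ret_count)
{
    assert(ret_count >= 1);
    return sizeof PUSH_ESP + sizeof CALL_REL32 + ret_count * sizeof HLT;
}
```

With a single RET, saferet_overhead(1) gives 1 + 5 + 1 = 7 bytes, matching the figure quoted above.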

If the saferet_handler detects a return violation, the process can be terminated. If no return violation has been detected, the saferet_handler restores the expected return address to the stack and then allows execution to proceed to the intended return instruction. The Vectored Exception Handler mechanism present in the Windows OS is used to register the saferet_handler. This provides us with a process wide exception filter and has a number of advantages over stack based Structured Exception Handlers. Specifically, Microsoft's /GS protection has a weakness: during a stack based buffer overflow, an attacker can gain EIP control via an overwritten Structured Exception Handler on the stack by triggering an exception before the function returns and the (overwritten) GS stack cookie is checked. saferet avoids this, as the saferet_handler is able to detect the return violation before the SEH chain on the stack is processed, even if the attacker triggers an access violation by writing past the end of the allocated stack memory. This is possible because Vectored Exception Handlers are processed before Structured Exception Handlers. ROP attacks are mitigated by reducing the number of function tails that can be used as gadgets, although ret2libc style attacks, which call a whole function and do not use gadgets, would not be mitigated.
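The decision the handler makes for each exception can be written as a small pure function, which shows why a smashed stack is caught even when the exception is an access violation rather than a HLT. This is a portable model, not the handler itself: the sim_ names are hypothetical, and the two exception codes are the values of EXCEPTION_ACCESS_VIOLATION and EXCEPTION_PRIV_INSTRUCTION from the Windows headers, repeated here so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_ACCESS_VIOLATION 0xC0000005u  /* EXCEPTION_ACCESS_VIOLATION */
#define SIM_PRIV_INSTRUCTION 0xC0000096u  /* EXCEPTION_PRIV_INSTRUCTION */
#define SIM_HLT_OPCODE       0xF4u        /* hlt */

typedef enum {
    SIM_CONTINUE_SEARCH,  /* not ours: pass on to other handlers and SEH */
    SIM_RETURN_ALLOWED,   /* restore the return address and step over the HLT */
    SIM_VIOLATION         /* return violation: terminate the process */
} sim_verdict;

typedef struct {
    uint32_t code;           /* exception code                    */
    uint8_t  opcode_at_eip;  /* first byte at the faulting EIP    */
    uint32_t esp;            /* ESP at the time of the exception  */
} sim_exception;

static sim_verdict sim_handle(const sim_exception *e,
                              uint32_t expected_esp,
                              uint32_t value_at_expected_esp,
                              uint32_t cookie)
{
    /* The cookie is checked for every exception, so an overwritten return
     * slot is detected before any SEH handler gets a chance to run. */
    if (value_at_expected_esp != cookie)
        return SIM_VIOLATION;

    if (e->code == SIM_PRIV_INSTRUCTION && e->opcode_at_eip == SIM_HLT_OPCODE) {
        if (e->esp != expected_esp)
            return SIM_VIOLATION;
        return SIM_RETURN_ALLOWED;
    }

    /* Any other exception with an intact cookie is left for the rest of the chain. */
    return SIM_CONTINUE_SEARCH;
}
```

An access violation raised while the recorded slot still holds the cookie yields SIM_CONTINUE_SEARCH, whereas the same exception with the slot overwritten yields SIM_VIOLATION.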

Caveats
A number of caveats exist with this technique:

  • This is a compile time mitigation and as such all code for a module must be recompiled to take advantage of it, unlike run time mitigations that use binary instrumentation.
  • The return instruction itself is not prevented from use by an attacker in a ROP attack, neither is the tail end of the saferet_handler, as the saferet_handler function cannot protect itself (unlike the saferet_prolog which can).
  • Due to an exception (technically two exceptions, see the source below) being generated for every call, a major performance hit is encountered due to the transition into kernel mode in order to process the exception and give our user mode exception handler control. Some changes at an OS level could help optimize this, for example a designated handler to only process saferet exceptions in order to avoid some overhead.

Implementation
The implementation itself is relatively simple and is composed of the two functions mentioned previously. A little more work is needed on the compiler side in order to support the saferet convention. I have modified the open source TinyC Compiler (TCC), written by Fabrice Bellard and licensed under the LGPL, to support saferet as a proof of concept. The saferet_prolog() function is as follows:

/*
* The global TLS index we use to make this all thread safe.
*/
DWORD tls = 0;

/*
* The global heap we use for all SAFERET related allocations.
*/
HANDLE heap = NULL;

/*
* This is the function prolog which is used by the compiler for each function
* to be protected with SAFERET
*
* __sr_prolog__ uses the __stdcall calling convention and places a HLT instruction
* before all RET/RETN instructions, however it does not insert a call to saferet_prolog
* as this would introduce a recursive loop, instead we can mimic the results of a
* call to saferet_prolog from within saferet_prolog itself.
*/
VOID __declspec(__sr_prolog__) saferet_prolog( DWORD esp )
{
    register THREADINFO * thread;

    // for now just sanity check ESP has a value, but we should probably check this for a valid read/write address
    if( !esp )
        return;

    // if this is the first call to saferet_prolog in this process we allocate the TLS index
    if( !tls )
    {
        // get a TLS index to store each thread's THREADINFO structure
        tls = TlsAlloc();
        if( tls == TLS_OUT_OF_INDEXES )
            RaiseException( 0xDEADBEEF, EXCEPTION_NONCONTINUABLE, 0, NULL );
        // if we don't have a process wide SAFERET heap yet, create one...
        if( !heap )
        {
            heap = HeapCreate( HEAP_GENERATE_EXCEPTIONS, 0, 0 );
            // and register the process wide vectored exception handler
            AddVectoredExceptionHandler( 1, &saferet_handler );
        }
    }

    // grab this thread's THREADINFO structure and if one doesn't exist create it
    thread = TlsGetValue( tls );
    if( !thread )
    {
        LARGE_INTEGER counter;

        QueryPerformanceCounter( &counter );

        thread = (THREADINFO *)HeapAlloc( heap, HEAP_GENERATE_EXCEPTIONS, sizeof(THREADINFO)+(sizeof(RECORD)*RECORDS) );

        thread->index = 0;
        thread->total = RECORDS;

        // set a simple cookie value for this thread.
        thread->cookie = GetTickCount() ^ GetCurrentProcessId() ^ GetCurrentThreadId() ^ counter.LowPart;

        // store this thread's THREADINFO structure under our tls index
        if( !TlsSetValue( tls, thread ) )
            RaiseException( 0xDEADC0DE, EXCEPTION_NONCONTINUABLE, 0, NULL );
    }

    // push a new record into our record stack for the callee's caller
    thread->records[ thread->index ].ret = *(DWORD *)esp;
    thread->records[ thread->index++ ].esp = esp;
    // patch over the real return address with this thread's cookie
    *(DWORD *)esp = thread->cookie;

    // modify the ESP value to point to the return value for this call to saferet_prolog
    esp -= 8;

    // push a new record into our record stack for the call to saferet_prolog
    thread->records[ thread->index ].ret = *(DWORD *)esp;
    thread->records[ thread->index++ ].esp = esp;
    // patch over the real return address with this thread's cookie
    *(DWORD *)esp = thread->cookie;

    // if we need to we grow the record stack
    if( thread->index >= thread->total )
    {
        thread->total += RECORDS;
        thread = (THREADINFO *)HeapReAlloc( heap, HEAP_GENERATE_EXCEPTIONS|HEAP_ZERO_MEMORY, thread, sizeof(THREADINFO)+(sizeof(RECORD)*thread->total) );
        TlsSetValue( tls, thread );
    }
}
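Note that the prolog pushes two records and only then checks whether the record stack needs to grow. That ordering is safe as long as at least two slots are free on entry. The portable model below asserts this, using realloc in place of HeapReAlloc. It is a sketch only: the sim_ names are hypothetical, SIM_RECORDS is an assumed chunk size (the value of RECORDS is not given in the post), and a chunk size of at least 2 is required for the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SIM_RECORDS 4    /* assumed chunk size; must be at least 2 */

typedef struct {
    size_t    index;     /* next free slot  */
    size_t    total;     /* allocated slots */
    unsigned *slots;     /* stand-in for the RECORD array */
} sim_stack;

static int sim_init(sim_stack *s)
{
    s->index = 0;
    s->total = SIM_RECORDS;
    s->slots = malloc(s->total * sizeof *s->slots);
    return s->slots != NULL;
}

/* One saferet_prolog call: two pushes, then grow if the stack is full.
 * Returns 0 if the reallocation fails. */
static int sim_prolog(sim_stack *s, unsigned caller_ret, unsigned prolog_ret)
{
    assert(s->index + 2 <= s->total);      /* two free slots on entry */
    s->slots[s->index++] = caller_ret;     /* record for the callee's caller */
    s->slots[s->index++] = prolog_ret;     /* record for the prolog's own return */
    if (s->index >= s->total) {
        size_t grown = s->total + SIM_RECORDS;
        unsigned *p = realloc(s->slots, grown * sizeof *p);
        if (p == NULL)
            return 0;
        s->slots = p;
        s->total = grown;
    }
    return 1;
}

/* One HLT/RET pair handled by the exception handler: pop the newest record. */
static unsigned sim_return(sim_stack *s)
{
    assert(s->index > 0);
    return s->slots[--s->index];
}
```

Each sim_prolog() is followed by one sim_return() for the prolog's own RET, so a call nested n levels deep holds n records. The check after the pushes leaves at least one free slot, and the pop for the prolog's own return frees a second one before the next call can occur.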

And the saferet_handler() is as follows:

/*
* This is the exception handler used by SAFERET to filter all exceptions raised by a process.
*
* __sr_handler__ uses the __stdcall calling convention and is intended to be used by the
* saferet_handler function. This function has to opt out of the saferet convention itself.
*/
LONG __declspec(__sr_handler__) saferet_handler( EXCEPTION_POINTERS * Exception )
{
    register THREADINFO * thread;
    register DWORD esp;

    // Get the info structure for this thread
    thread = TlsGetValue( tls );

    // If this thread has no records (e.g. it has never run saferet_prolog) there is
    // nothing for us to verify, so pass the exception on rather than read an invalid record.
    if( !thread || !thread->index )
        return EXCEPTION_CONTINUE_SEARCH;

    // Get the esp value for the last record
    esp = thread->records[ thread->index-1 ].esp;

    // Test if the cookie we wrote to the stack via saferet_prolog() is still present.
    // We do the test here (as opposed to solely during an exception from a HLT instruction)
    // as saferet_handler may be called during an access violation and not just upon a
    // desired HLT/RET. This lets us detect a return violation before passing control to a
    // structured exception handler which, as the SEH chain is also stored on the stack, may
    // have been overwritten, allowing the attacker to gain EIP control. As our handler is a
    // vectored exception handler, we are given the access violation to process before the
    // (potentially corrupted) SEH chain gets a chance to process it.
    if( *(DWORD *)(esp) != thread->cookie )
    {
        char message[1024];
        char program[MAX_PATH];
        GetModuleFileName( NULL, program, MAX_PATH );
        snprintf( message, 1024, "The return address 0x%08X located at 0x%08X does not match the cookie 0x%08X.\n\nThis process will be terminated.\n\nEIP:0x%08X, Exception:0x%08X, Process:%d, Thread:%d\nFile:%s\n", *(DWORD *)(esp), esp, thread->cookie, Exception->ContextRecord->Eip, Exception->ExceptionRecord->ExceptionCode, GetCurrentProcessId(), GetCurrentThreadId(), program );
        MessageBoxA( NULL, message, "Return Violation!", MB_OK|MB_ICONEXCLAMATION );
        ExitProcess( -1 );
    }

    // Check if we are processing an exception generated via a privileged instruction
    if( Exception->ExceptionRecord->ExceptionCode == EXCEPTION_PRIV_INSTRUCTION )
    {
        // And only process 'HLT' instructions (We could also check here
        // if the next instruction is indeed a RET or RETN instruction).
        if( *(BYTE *)(Exception->ContextRecord->Eip) == HALT_OPCODE )
        {
            // If we are returning to an address from a different ESP we can signal a return violation.
            // We will already have checked the cookie value (above) if ESP is correct.
            if( esp != Exception->ContextRecord->Esp )
            {
                char message[1024];
                char program[MAX_PATH];
                GetModuleFileName( NULL, program, MAX_PATH );
                snprintf( message, 1024, "The stack pointer 0x%08X does not match the expected stack pointer 0x%08X.\n\nThis process will be terminated.\n\nEIP:0x%08X, Exception:0x%08X, Process:%d, Thread:%d\nFile:%s\n", Exception->ContextRecord->Esp, esp, Exception->ContextRecord->Eip, Exception->ExceptionRecord->ExceptionCode, GetCurrentProcessId(), GetCurrentThreadId(), program );
                MessageBoxA( NULL, message, "Return Violation!", MB_OK|MB_ICONEXCLAMATION );
                ExitProcess( -1 );
            }

            // Patch in the real return address and pop it off the thread's record list
            *(DWORD *)(Exception->ContextRecord->Esp) = thread->records[ --thread->index ].ret;

            // Advance execution past the 'HLT' instruction
            Exception->ContextRecord->Eip += sizeof(BYTE);

            // Continue execution and allow the return to the real return address
            return EXCEPTION_CONTINUE_EXECUTION;
        }
    }

    // If we get here we allow the remaining vectored exception handlers to process this
    // exception (if any present) before handing the exception off to be processed by the
    // thread's current SEH chain before finally ending with the unhandled exception filter.
    return EXCEPTION_CONTINUE_SEARCH;
}
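The handler's comments mention an optional extra check: confirming that the HLT is really followed by a RET or RETN. A minimal sketch of that check is shown below. It is not part of the saferet source; is_saferet_tail and the OP_ constants are hypothetical names, and the len parameter exists only so the sketch never reads past the bytes it is given (a real handler would read from the faulting EIP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_HLT  0xF4u  /* hlt */
#define OP_RET  0xC3u  /* ret (near) */
#define OP_RETN 0xC2u  /* ret imm16 (near, also pops imm16 bytes of arguments) */

/* Returns 1 if code[0..len) starts with a HLT immediately followed by a
 * complete near RET or RET imm16 instruction, otherwise 0. */
static int is_saferet_tail(const uint8_t *code, size_t len)
{
    if (code == NULL || len < 2 || code[0] != OP_HLT)
        return 0;
    if (code[1] == OP_RET)
        return 1;
    if (code[1] == OP_RETN)
        return len >= 4;   /* the opcode's 16 bit immediate must be present */
    return 0;
}
```

Rejecting a HLT that is not followed by a return would stop the handler from restoring a return address for a stray HLT byte found elsewhere in the code.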

Download
The modified version of TCC can be downloaded here, which includes several simple applications (namely 'hanoi.exe', 'overflow_test.exe' and 'overflow_seh.exe') to see saferet in action: tcc-0.9.25-saferet.zip. The examples can be compiled with TCC using saferet via the -SR switch, e.g. "tcc -SR .\win32\examples\overflow_seh.c". If you want to rebuild TCC from source you will need to use MinGW.
