RE: February 2013

For each thread quantum (here, the time a single logical thread gets on a physical processor), Windows keeps track of each time KiSwapContext is called and returns to the saved thread state (stack, registers) for that thread. Each time this happens, SwapContext increments the ContextSwitchCount member of the KTHREAD structure. We will be using the following native APIs:

NtQuerySystemInformation
NtQueryInformationThread

I recommend using two threads when probing ContextSwitchCount as an anti-debug mechanism. It is not required, but with a single thread you have to ensure the current thread is near the beginning of its cycle time; otherwise a context switch could occur at the next DPC interrupt. Probing cycle time itself as an anti-debug mechanism does require two threads.

First I will explain probing ContextSwitchCount, then the thread cycle time.

Step 1 is to create an additional thread in our application. These will be extremely simple and vague examples ;p

All this thread will do is wait on a synchronization object.

DWORD WINAPI Waiter(LPVOID param)
{
    //Block on the event; nothing in this example ever signals it
    WaitForSingleObject((HANDLE)param,INFINITE);
    return 0;
}

int main()
{
    //Auto-reset event, initially non-signaled
    HANDLE event1=CreateEvent(NULL,FALSE,FALSE,NULL);

    CreateThread(NULL,0,Waiter,(LPVOID)event1,0,NULL);

    //...

    //...
}

Step 2. We call NtQuerySystemInformation and locate our SYSTEM_PROCESS_INFORMATION structure, then navigate to the SYSTEM_THREAD_INFORMATION structure for the thread we have just created. We wait until it has entered the waiting state (0x5). Once we have established that the thread is waiting, we store its ContextSwitchCount.

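int main()
{
    //...
    //...

    SYSTEM_PROCESS_INFORMATION *pi;
    SYSTEM_THREAD_INFORMATION *ti;
    ULONG SwitchCount;

    do
    {
        //Take a fresh snapshot on every pass (information class 5,
        //named SystemProcessInformation in winternl.h).
        //heapbuffer is assumed to point to heapbuffersize bytes.
        NtQuerySystemInformation(SystemProcessandThreadInformation,heapbuffer,heapbuffersize,&len);

        //Walk the buffer: point pi at our process entry, ti at our waiting thread

    } while(ti->ThreadState!=0x5);

    SwitchCount=ti->ContextSwitches;
}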

Like I said, vague examples ;p
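Since the "walk the buffer" part is glossed over above, here is a minimal sketch of it. It does not call NtQuerySystemInformation; it only shows the traversal, using cut-down stand-in structures (MINI_PROCESS_INFO and MINI_THREAD_INFO are hypothetical names of mine, not the real definitions). What it keeps from the real layout is that process entries are chained by NextEntryOffset and each entry's thread array trails it. FindThread is likewise a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for SYSTEM_THREAD_INFORMATION and
   SYSTEM_PROCESS_INFORMATION. The real structures carry many more members. */
typedef struct {
    uintptr_t UniqueThread;     /* thread ID */
    uint32_t  ThreadState;      /* 0x5 == Waiting */
    uint32_t  ContextSwitches;
} MINI_THREAD_INFO;

typedef struct {
    uint32_t  NextEntryOffset;  /* bytes to the next process entry, 0 on the last */
    uint32_t  NumberOfThreads;  /* length of the thread array that follows */
    uintptr_t UniqueProcessId;
} MINI_PROCESS_INFO;

#define THREAD_STATE_WAITING 0x5

/* Walk the snapshot and return the thread entry for (pid, tid),
   or NULL if the process or the thread is not in the buffer. */
const MINI_THREAD_INFO *FindThread(const void *buffer, uintptr_t pid, uintptr_t tid)
{
    const uint8_t *cursor = (const uint8_t *)buffer;
    assert(buffer != NULL);

    for (;;) {
        const MINI_PROCESS_INFO *pi = (const MINI_PROCESS_INFO *)cursor;

        if (pi->UniqueProcessId == pid) {
            /* The thread array starts right after the process entry. */
            const MINI_THREAD_INFO *ti = (const MINI_THREAD_INFO *)(pi + 1);
            for (uint32_t i = 0; i < pi->NumberOfThreads; i++) {
                if (ti[i].UniqueThread == tid)
                    return &ti[i];
            }
            return NULL;
        }

        if (pi->NextEntryOffset == 0)
            return NULL;            /* last entry, process not found */
        cursor += pi->NextEntryOffset;
    }
}
```

With the real structures the loop has the same shape: re-query, re-walk, and only read ContextSwitches once ThreadState reports 0x5.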

Step 3. At this point we have established that our secondary thread is waiting on our synchronization object, and we have saved its last ContextSwitchCount. When a thread is waiting on a synchronization object, it is not added to the ready queue until either a kernel APC is queued to the thread or the sync object is signaled.

In our main thread we trigger an exception. This can be anything; for the sake of simplicity we will just use int3.

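int main()
{
    //...
    //...

    _asm
    {
        push handler        //our exception handler
        push fs:[0x0]       //previous head of the SEH chain
        mov fs:[0x0], esp   //install the new registration record
        int 3               //raise the breakpoint exception
    }
}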

I don't really know why I'm putting a code example for that one, but there it is. int3 is a trap exception, but SEH reports ExceptionAddress as the address of the int3 itself (EIP-1). So let's assume our handler advances the instruction pointer by one byte and then resumes execution.
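For what the handler's "advance one byte" step amounts to, here is a small sketch that runs without raising a real exception. MINI_CONTEXT is a hypothetical stand-in for the Eip member of the x86 CONTEXT record, and StepOverInt3 is a made-up helper name. It assumes, as described above, that the reported instruction pointer sits on the int3 byte (0xCC) itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Hypothetical stand-in for the Eip member of the x86 CONTEXT record. */
typedef struct {
    uint32_t Eip;
} MINI_CONTEXT;

/* code: byte image of the faulting code, base: address its first byte
   is mapped at, size: number of bytes in the image.
   Returns 1 and advances Eip past the breakpoint when Eip sits on an
   int3; returns 0 and leaves the context untouched otherwise. */
int StepOverInt3(MINI_CONTEXT *ctx, const uint8_t *code, uint32_t base, uint32_t size)
{
    assert(ctx != NULL && code != NULL);

    if (ctx->Eip < base || ctx->Eip - base >= size)
        return 0;                       /* Eip is outside the image */
    if (code[ctx->Eip - base] != INT3_OPCODE)
        return 0;                       /* not our breakpoint */

    ctx->Eip += 1;                      /* int3 is a single byte */
    return 1;
}
```

In a real SEH handler the same one-byte adjustment would be applied to the context record before returning a "continue execution" disposition.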

Step 4. We once again call NtQuerySystemInformation and walk through the SystemProcessandThreadInformation buffer to locate our process and our waiting thread, and probe its context switch count.

int main()
{
    //...
    //...

    //...

    //Call NtQuerySystemInformation, walk buffer to our thread data

    if(ti->ContextSwitches>SwitchCount)
    {
        //debugger detected, do something
    }
}

As you can see, we compare our waiting thread's current context switch count to the previous value we probed. If it is higher, a debugger was attached to the process when we generated our exception, and here is why:

When a thread generates an exception and a debug port is present for the process, the faulting thread calls DbgkpSuspendProcess to suspend all remaining threads in the process. It then goes on to wait on the debug object's synchronization mutex until the debugger continues the exception.

The context switch count is incremented because thread suspension is done via kernel APCs. As stated earlier, the waiting thread is entered into the ready queue in one of two cases: a kernel APC is queued to it, or the object is signaled.

The same goes for cycle time. Using the above logic, we can probe the thread's cycle time, generate an exception, and then probe it again. If it has increased, a debugger is present. To probe cycle time we use NtQueryInformationThread with an infoclass of 0x17 (ThreadCycleTime).

If no debugger is present, the faulting thread does not suspend the remaining threads while waiting for a debugger. Instead it resumes execution at KiUserExceptionDispatcher, and the thread we probed, which is still waiting on the synchronization object, has its context switch count and cycle time unchanged.
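Putting both signals side by side, the final decision is just a before/after comparison of the two counters for the waiting thread. The sketch below assumes the values have already been read (ContextSwitches from the NtQuerySystemInformation walk, cycle time from NtQueryInformationThread with infoclass 0x17). THREAD_PROBE, CompareProbes and the SIGNAL_* flags are hypothetical names of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIGNAL_CONTEXT_SWITCH 0x1   /* ContextSwitches went up */
#define SIGNAL_CYCLE_TIME     0x2   /* cycle time went up */

/* Hypothetical container for the two values probed on the waiting thread. */
typedef struct {
    uint32_t ContextSwitches;
    uint64_t CycleTime;
} THREAD_PROBE;

/* Compare the probe taken before the exception with the one taken after.
   Returns 0 when the waiting thread never ran in between, otherwise a
   mask of the signals that moved. */
int CompareProbes(const THREAD_PROBE *before, const THREAD_PROBE *after)
{
    int signals = 0;
    assert(before != NULL && after != NULL);

    if (after->ContextSwitches > before->ContextSwitches)
        signals |= SIGNAL_CONTEXT_SWITCH;
    if (after->CycleTime > before->CycleTime)
        signals |= SIGNAL_CYCLE_TIME;

    return signals;
}
```

A non-zero result means the waiting thread was scheduled between the two probes, which under the reasoning above points to the suspend APC delivered on behalf of a debugger.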

This entry was updated on March 14, 2015

A week or so ago I posted an article on CodeProject about InstrumentationCallback and how this feature facilitates code instrumentation for important transitions, and also works as an interesting anti-debug and analysis mechanism.

You can find the article here.

What I failed to mention in the article is that while 32-bit processes running in the WOW64 layer can also make use of this functionality, they do not get the KiRaiseUserExceptionDispatcher and system call transitions. This is not a major problem, because under WOW64 system calls can still be instrumented in a number of interesting ways without kernel code, one of them being the wow64log library. You can read more about that here.

Under WOW64, you can still instrument:

LdrInitializeThunk
KiUserExceptionDispatcher
KiUserApcDispatcher
KiUserCallbackDispatcher


Monday, February 25, 2013

Infer debugger presence by counting context switches and cycle time

Tuesday, February 19, 2013

RTL_USER_PROCESS_PARAMETERS anti-debug mechanism

Branch tracing and LBR's

Instrumentationcallback and advanced debugging