I recently came across a neat technique for process injection called NINA. It uses NtSetContextThread to modify registers in a thread inside another process, so that thread does your dirty work without you having to directly modify foreign memory on your own. Various sources on the internet suggest that NtSetContextThread doesn't let you set volatile registers on x64 (RAX, RCX, RDX, R8, R9, R10, R11 and XMM0-5), but the NINA article rightly points out that this is only the case when the thread is returning from a syscall. If the thread is currently scheduled and in usermode, you can modify all registers and they will be updated correctly when the thread is resumed.
Note: If you want to follow along at home, I'm doing all this on the 20H2 kernel, the latest Windows 10 x64 kernel at the time of writing.
The function we want to look at is called KiSystemCall64, and it's the function that is called when (drum roll please)... you make a syscall in x64 mode. It's a big function that basically does the following:
- Save state
- Dispatch system call
- Wait until thread is ready to run
- Restore state
- Return (sysret)
This is why syscalls mess us up: if the thread is waiting at all, it will be stuck at step 3, which guarantees that any context changes we make are subject to the results of steps 4 and 5. The way the restore state part works is the key here. It makes a call to KiRestoreSetContextState, which puts all the normal registers back in two chunks:
You might notice that RBP and RSP don't get restored here. That's because they're saved in the prologue to KiSystemCall64 and get restored just before the return, just like in any other __fastcall x64 function.
Otherwise this looks good, everything gets restored, so why is it all screwy when we leave? Let's look further down KiSystemCall64 and see what happens...
Here we blow away R10 and check whether the process has an InstrumentationCallback set; if so, we store our sysret return address in R10 so that the InstrumentationCallback can return to it. In either case, R10 is no longer what we set it to. We skip over the next couple of bits to get to the end, and that's where everything gets destroyed:
Just before the sysret we do the following:
- RAX = result from the syscall function that was called
- R8 and R9 are set to the saved RSP and RBP respectively (and those values are then loaded into RSP and RBP right before the sysret)
- EDX = 0
- Clear XMM0-5
- RCX = return address for the sysret
- R11 = flags for after the sysret
So that's what happens, and why your volatile registers are either clobbered or zeroed out. I don't know why this is done, but I suspect it's useful to enforce the nature of volatile registers.
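To make the effect of that list concrete, here is a minimal sketch in C that models it. It is not kernel code: the `SysretRegs` struct and `model_syscall_exit` function are hypothetical names invented for illustration, the struct only covers the registers mentioned above, and the value R10 ends up with (0 here) is an assumption standing in for "blown away" in the case with no InstrumentationCallback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified register file: only the registers discussed above. */
typedef struct {
    uint64_t rax, rcx, rdx, r8, r9, r10, r11;
    uint64_t rsp, rbp, rip, rflags;
    uint64_t xmm[6][2];            /* XMM0-5 as two 64-bit halves each */
} SysretRegs;

/*
 * Model of the tail end of the syscall exit path as described in the list.
 * `wanted` is the context we asked NtSetContextThread to apply,
 * `syscall_result` is whatever the dispatched syscall returned.
 * The return value is what usermode would observe after the sysret.
 */
SysretRegs model_syscall_exit(SysretRegs wanted, uint64_t syscall_result)
{
    SysretRegs out = wanted;

    out.r10 = 0;                   /* assumption: clobbered, no InstrumentationCallback */
    out.rax = syscall_result;      /* RAX = result from the syscall */
    out.r8  = wanted.rsp;          /* R8 = RSP */
    out.r9  = wanted.rbp;          /* R9 = RBP */
    out.rdx = 0;                   /* EDX = 0 (a 32-bit write zero-extends to RDX) */
    memset(out.xmm, 0, sizeof out.xmm); /* clear XMM0-5 */
    out.rcx = wanted.rip;          /* RCX = return address for the sysret */
    out.r11 = wanted.rflags;       /* R11 = flags for after the sysret */

    /* RSP, RBP and RIP survive: they are loaded from the values we set. */
    return out;
}
```

Feeding this model a context with every volatile register set to a marker value shows that only RIP, RSP and RBP (plus the copies of them that land in RCX, R8 and R9) carry anything we chose.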
Here's the good news:
- You can set RIP/RBP/RSP so you can still send a thread wherever you like, even if it's inside a syscall
- RSP and RBP get copied into R8 and R9, so if you don't care about clobbering RSP/RBP you can use that to set args 3 and 4 in a function call
This does make NtSetContextThread somewhat less useful for setting up ROP/JOP chains, with the exception of the NINA approach, which makes use of a split instruction as an infinite loop gadget (nice find).
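As a rough illustration of the second point, the sketch below works out which four register arguments a function would see if we redirect a thread that is inside a syscall. It assumes the mapping from the earlier list (R8 gets RSP, R9 gets RBP, RCX gets the return address, EDX is zeroed); `ControlRegs`, `CallArgs`, `plan_in_syscall_call` and `args_seen_by_target` are hypothetical names made up for this example, not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* The registers we can still control while the thread is inside a syscall. */
typedef struct { uint64_t rip, rsp, rbp; } ControlRegs;

/* The four register arguments of the x64 calling convention: RCX, RDX, R8, R9. */
typedef struct { uint64_t arg1, arg2, arg3, arg4; } CallArgs;

/* Pick context values so that `target` is entered with our chosen args 3 and 4. */
ControlRegs plan_in_syscall_call(uint64_t target, uint64_t arg3, uint64_t arg4)
{
    ControlRegs c = { .rip = target, .rsp = arg3, .rbp = arg4 };
    return c;
}

/* What `target` would find in its argument registers after the sysret. */
CallArgs args_seen_by_target(ControlRegs c)
{
    CallArgs a;
    a.arg1 = c.rip;   /* RCX holds the sysret return address, i.e. the target itself */
    a.arg2 = 0;       /* EDX is zeroed */
    a.arg3 = c.rsp;   /* R8 is a copy of RSP */
    a.arg4 = c.rbp;   /* R9 is a copy of RBP */
    return a;
}
```

Note the catch this makes visible: arg 3 is also the stack pointer the target runs on, so it has to be a value the target can tolerate as a stack, and args 1 and 2 are fixed to the target address and zero.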
That's all for now, happy hacking!