Guide to reversing VMProtect (old versions)
This is one I've been working on for a while, and most of the ideas here come from Rolf Rolles, so I'd encourage you to read through his article series on this and other virtualization obfuscators.
I'm not going to go into the details of how VMProtect works; have a look at xeroxz's or r0da's posts on the subject if you want more.
What we're looking at here is how to convert the bytecode into symbolic operations, and then evaluate these to figure out what the code is doing. The bulk of this is something that you'll need to do yourself by trial and error, but I'm hoping these notes will help anyone who gets stuck. Let's get started.
Convert to asm (or something close to it)
I've seen other people trying to "lift" VMProtect bytecode to a higher-level intermediate language, but I've found this has limited my options when it comes to decompilation. Rolf Rolles recommends converting each bytecode instruction to a set of asm-like instructions (fairly similar to what the VM handlers do), and this has made the later steps a lot easier. It makes the code a little bulky, but we can always fix this later with copy and constant propagation.
The big advantage I've found with this approach is that it's kept me focused on what each individual instruction is doing: What does a push mean here? What does a mov mean here? There are a couple of edge cases, but focusing on individual instructions has helped me come up with some consistent rules that make life a lot easier when decompiling/emulating.
Here are some examples:
Load constant onto stack:
mov eax, dword(0x35010FF3)
push eax
Pop into scratch register:
pop eax
mov scratch:[0x0A], eax
Add with flags
pop eax
pop edx
add eax, edx
push eax
pushfw
Nor
pop ax
pop dx
nor ax, dx
push ax
These don't exactly match how the VM handlers are implemented, but what matters is that they return the same results. Care needs to be taken with a couple of operations, though. Take the bitwise NOR: the handler does it back-to-front (NOT on both operands and then an AND). For our purposes the order doesn't really matter, but in the VM handler having the AND at the end gives you the flags for free (the NOT opcode doesn't set any flags).
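Here's a minimal sketch of why a single NOR handler is enough to cover the other bitwise operations. The function names and the 16-bit default width are assumptions made for illustration, not anything taken from the VM itself.

```python
MASK16 = 0xFFFF  # operand width assumed for this sketch


def nor(a, b, mask=MASK16):
    """NOR the back-to-front way: NOT both operands, then AND them."""
    return (~a & mask) & (~b & mask)


def not_(a, mask=MASK16):
    # x NOR x is just NOT x
    return nor(a, a, mask)


def and_(a, b, mask=MASK16):
    # NOT a NOR NOT b == a AND b
    return nor(not_(a, mask), not_(b, mask), mask)


def or_(a, b, mask=MASK16):
    # NOT (a NOR b) == a OR b
    return not_(nor(a, b, mask), mask)


a, b = 0x1234, 0x00FF
print(hex(nor(a, b)), hex(not_(a)), hex(and_(a, b)), hex(or_(a, b)))
```

This is why patterns like "X nor X" show up so often in the symbol dumps: they are plain NOTs in disguise.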
Keep track of sizes
For the most part you can manage the stack with objects rather than trying to be clever and convert everything into bytes (as a full symbolic execution engine like Triton would do). There are a few cases to consider though:
- Sometimes an argument will come in as a DWORD (in the initial stack frame), get stored in a scratch register, and then accessed as a WORD
- Sometimes a WORD will get zero-extended by pushing a 0 WORD beforehand and then popping both into a DWORD
These are all easily solved if you keep track of sizes and have your own operators to manage them (I use a DoubleWord operator for combining two words, and a LoWord operator to do the reverse).
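As a rough illustration of those two operators on concrete values, here's a small sketch. The function names and the (low, high) argument order are assumptions; the order was picked to match how DoubleWord appears in the dumps later in this post, where the second operand is the zero word.

```python
def lo_word(dword):
    """Low 16 bits of a 32-bit value (a DWORD accessed as a WORD)."""
    return dword & 0xFFFF


def hi_word(dword):
    """High 16 bits of a 32-bit value."""
    return (dword >> 16) & 0xFFFF


def double_word(lo, hi):
    """Combine two WORDs into a DWORD; the low word is the one popped first."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


# Widening a WORD by pairing it with a zero WORD
widened = double_word(0x0257, 0x0000)
print(hex(widened))
```

Splitting and recombining should always round-trip, which makes for a cheap sanity check on your own operators.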
Here's an example of flags being added to the stack pointer (branch emulation, discussed more below):
push esp
mov ax, word(0x0000)
push ax
mov ax, scratch:[0x09]
push ax
pop eax
pop edx
add eax, edx
push eax
Eliminate the stack
There are two important things to know about the VMProtect stack machine:
- It's an implementation detail
- It doesn't exist in the original x86 code
Rolles mentions this as a useful optimization step and I agree. There's an edge case with stack pointers (we'll cover that in a second) but otherwise the stack is just a pain. Something gets pushed onto the stack? Call it a variable. You pop 2 things off, add them, then put them back on? That's just c = a + b. Get this part right and the rest is very easy.
For example, say we want to add two numbers:
mov eax, dword(0x12345678)
push eax
mov eax, dword(0x4444AAAA)
push eax
pop eax
pop edx
add eax, edx
push eax
I would symbolize it like this:
stack = []
symbol0 = Symbol(Immediate(4, 0x12345678))
stack.append(symbol0)
symbol1 = Symbol(Immediate(4, 0x4444AAAA))
stack.append(symbol1)
symbol2 = Symbol(AddOperation(stack.pop(), stack.pop()))
stack.append(symbol2)
You can choose how you do this. I create a new symbol with every push (unless the value is already a symbol, e.g. coming from a scratch register), but ultimately having nested symbols isn't the end of the world. The big bonus with creating symbol objects is that you can store them in a table and use them to cache evaluated symbols to speed things up later.
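To make the idea concrete, here's a self-contained sketch of the pieces the symbolized listing relies on. Symbol, Immediate and AddOperation are named in the listing above, but their bodies here are guesses, and the push helper and the table dictionary are hypothetical additions.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Immediate:
    size: int   # size in bytes
    value: int


@dataclass(frozen=True)
class AddOperation:
    left: object
    right: object


class Symbol:
    _ids = count()  # simple global id allocator for this sketch

    def __init__(self, value):
        self.id = next(Symbol._ids)
        self.value = value

    def __repr__(self):
        return "Symbol%d" % self.id


def push(stack, table, value):
    """Wrap the value in a new Symbol unless it already is one, then push it."""
    symbol = value if isinstance(value, Symbol) else Symbol(value)
    table[symbol.id] = symbol  # the table is what makes caching possible later
    stack.append(symbol)
    return symbol


stack, table = [], {}
push(stack, table, Immediate(4, 0x12345678))
push(stack, table, Immediate(4, 0x4444AAAA))
result = push(stack, table, AddOperation(stack.pop(), stack.pop()))
print(result, result.value)
```

After running this the stack holds a single symbol whose value is an AddOperation over the two immediates, and the table remembers all three symbols.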
Symbolize from the top down, evaluate from the bottom up
If you were simply writing an emulator then you'd put real values in at the top, run the whole way through, and get your results at the bottom. We can do the same thing when symbolizing: put placeholders in at the top, run through and build up a symbol table, and we're left with our outputs at the bottom. Once we have these, it's trivial to go backwards and resolve all of the symbols that were used to get to the result. The nice thing with this approach is that you don't need to think too hard about dead store removal, as dead-end symbols never get referenced from the bottom, so they get ignored implicitly.
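Here's a small sketch of the bottom-up pass, assuming a toy symbol table where every entry is a tuple. The table layout, the evaluate signature and the cache are all hypothetical; the point is only that evaluation starts from the outputs, so symbols nothing refers to are never visited.

```python
# Hypothetical symbol table: id -> expression tuple
table = {
    0: ("arg", "arg1"),   # placeholder put in at the top
    1: ("imm", 0x10),
    2: ("add", 0, 1),     # this one ends up on the stack at the bottom
    3: ("imm", 0xDEAD),   # dead: nothing live refers to it
    4: ("add", 3, 3),     # dead: never reaches the outputs
}
outputs = [2]


def evaluate(symbol_id, table, inputs, cache):
    """Resolve one symbol recursively, caching results by symbol id."""
    if symbol_id in cache:
        return cache[symbol_id]
    expr = table[symbol_id]
    kind = expr[0]
    if kind == "imm":
        result = expr[1]
    elif kind == "arg":
        result = inputs[expr[1]]
    elif kind == "add":
        left = evaluate(expr[1], table, inputs, cache)
        right = evaluate(expr[2], table, inputs, cache)
        result = (left + right) & 0xFFFFFFFF
    else:
        raise ValueError("unknown expression kind: %r" % (kind,))
    cache[symbol_id] = result
    return result


cache = {}
results = [evaluate(s, table, {"arg1": 0x20}, cache) for s in outputs]
print([hex(r) for r in results], sorted(cache))
```

Symbols 3 and 4 never make it into the cache, which is the implicit dead store removal in action.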
With VMProtect we know it's a stack machine and that the result of any operation gets loaded back onto the stack. Because of this, we can create a new symbol every time we see a push opcode.
Apps will also need to do memory IO, so I keep a journal of memory writes (recorded when we get a pop [eax] instruction). These can be populated at emulation time.
When it comes to emulating I do the following:
- Memory writes in order
- Stack variables at the end (in any order)
Treat normal memory differently from stack memory
Because I disassemble into an x86-like language, I end up just handling push [eax] and push esp as Dereference and Pointer objects. These are fine to leave hanging around when symbolizing, but when emulating we need to resolve them differently.
As a rule, we should only see Pointer objects generated when we see push esp, so these are always going to be stack pointers. We can dereference both memory and stack pointers though, so I check for these at emulation time: if the address is an immediate value then I pull the data from the emulated memory of the app; otherwise it's a stack pointer, and the Dereference and the Pointer simply cancel each other out.
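A rough sketch of that emulation-time check might look like the following. The class shapes, the resolve_dereference name and the read_memory callback are assumptions made for this example, and the Pointer here is deliberately simpler than the one shown later in this post.

```python
from dataclasses import dataclass


@dataclass
class Immediate:
    size: int
    value: int


@dataclass
class Pointer:
    stack: list  # snapshot of the symbolic stack, top at the end


@dataclass
class Dereference:
    target: object  # an already-evaluated Immediate or Pointer


def resolve_dereference(deref, read_memory):
    """Resolve a Dereference once its target has been evaluated."""
    target = deref.target
    if isinstance(target, Pointer):
        # *(&top_of_stack) is just top_of_stack: the two cancel out
        return target.stack[-1]
    if isinstance(target, Immediate):
        # A real address: read it from the app's emulated memory
        return Immediate(4, read_memory(target.value, 4))
    raise TypeError("cannot dereference %r" % (target,))


# Stand-in for the app's memory; a real tool would read the loaded binary
image = {0x00403000: 0x11223344}


def read_memory(address, size):
    return image[address]


a, b = Immediate(4, 1), Immediate(4, 2)
from_stack = resolve_dereference(Dereference(Pointer([a, b])), read_memory)
from_memory = resolve_dereference(Dereference(Immediate(4, 0x00403000)), read_memory)
print(from_stack, from_memory)
```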
Note that VM handlers exist for accessing memory through any segment register. The stack pointer accesses appear to always use SS, but YMMV on this.
There's just one edge case to handle when dealing with stack pointers:
Be careful with branch emulation
VMProtect emulates branching by loading two function addresses onto the stack, executing the opcode that would set the flags (e.g. a CMP to compare registers is emulated as a SUB, which in practice is ADD a, -b), converting the flags into an offset (either 0 or 4), and then advancing the stack pointer by that much, reading that value, then sometimes overwriting another stack pointer (?!).
For example,
push esp
mov ax, word(0x0000)
push ax
mov ax, scratch:[0x09]
push ax
pop eax
pop edx
add eax, edx
push eax
pop eax
push ss:[eax]
This pushes the stack pointer, loads the flag-based offset from a previous calculation, and adds them to the stack pointer. We then read that pointer and put it onto the stack.
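Here's a sketch of how the flags can turn into that 0-or-4 offset. The 0xFFBF mask and the shift by 4 are taken from the CMP/JE sample dump later in this post (other builds may well use different constants), and the function names are made up for illustration.

```python
def nor16(a, b):
    """16-bit NOR."""
    return ~(a | b) & 0xFFFF


ZF_MASK = 0xFFBF  # every bit set except bit 6 (0x40), the zero flag


def branch_offset(flags):
    """0 when ZF is set, 4 when ZF is clear."""
    # NOR against the mask leaves only the inverted ZF bit (0x40 or 0),
    # and shifting right by 4 turns 0x40 into 4.
    return nor16(flags, ZF_MASK) >> 4


def pick_branch(stack, flags):
    """stack is a list with the top at the end; every entry is 4 bytes."""
    index = branch_offset(flags) // 4
    return stack[-1 - index]


# Addresses and flag values as they appear in the CMP/JE sample
branches = [0x00404D2B, 0x00404CAB]
taken = pick_branch(branches, 0x0257)      # ZF set
not_taken = pick_branch(branches, 0x0213)  # ZF clear
print(hex(taken), hex(not_taken))
```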
This part is easy enough to handle: we create a Pointer object which keeps the original stack, and when we handle addition we just advance the pointer. Here's how I did it in Python:
from copy import copy

class Pointer:
    def __init__(self, param, stack):
        self.param = param  # the symbol this pointer currently points at
        self.stack = stack  # the stack as it was when the pointer was taken

    def advance(self, immediate):
        # int(..., 0x10) expects the immediate's value as a hex string
        value = int(immediate.value, 0x10)
        if value % 4:
            raise Exception("Advancing pointer by weird number: %d" % value)
        if value == 0:
            return self
        offset = value // 4
        # >= rather than >: popping every entry would leave nothing to point at
        if offset >= len(self.stack):
            raise Exception("Advancing pointer by %d with stack size %d" % (offset, len(self.stack)))
        newstack = copy(self.stack)
        for x in range(offset):
            newstack.pop()
        last_arg = newstack[-1]
        return Pointer(last_arg, newstack)
So this will propagate pointers down until they're needed. We do end up with another problem when it comes to writing to a stack pointer:
push esp
mov eax, dword(0x00000008)
push eax
pop eax
pop edx
add eax, edx
push eax
pop eax
pop ss:[eax]
pop eax
This code does the following:
- Loads the stack pointer
- Adds 8 to it
- Writes the previous result to it
- Cleans up the stack
Here's the whole picture put together just to show how it works:
- Stack = [branch1, branch2]
- Load stack pointer, stack = [&stack[1], branch1, branch2]
- Load flags result, stack = [0 or 4, &stack[1], branch1, branch2]
- Add flags result to stack pointer, stack = [&stack[1 or 2], branch1, branch2]
- Dereference pointer, stack = [branch1 or branch2, branch1, branch2]
- Load stack pointer, stack = [&stack[1], branch1 or branch2, branch1, branch2]
- Load 8, stack = [8, &stack[1], branch1 or branch2, branch1, branch2]
- Add offset to stack, stack = [&stack[3], branch1 or branch2, branch1, branch2]
- Pop the pointer and the value, write the value (old stack[1]) to the slot the pointer refers to (old stack[3]), stack = [branch1, branch1 or branch2]
- Pop head of stack, stack = [branch1 or branch2]
This is a little convoluted, but it leaves us with a stack holding a single pointer chosen by the result of the earlier comparison.
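If the walk-through is hard to follow, it can help to replay it on a concrete stack. The sketch below uses a tiny byte-addressed stack machine; the StackMachine class, its base address and the select_branch helper are all invented for this example, and the comments map each step back to the x86-like listing.

```python
class StackMachine:
    """A downward-growing stack of 4-byte slots, addressed like real memory."""

    def __init__(self, base=0x1000):
        self.base = base
        self.esp = base
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value

    def snapshot(self):
        """Stack contents, top first, as in the walk-through."""
        return [self.mem[a] for a in range(self.esp, self.base, 4)]


def select_branch(branch1, branch2, offset):
    m = StackMachine()
    m.push(branch2)
    m.push(branch1)             # stack = [branch1, branch2]
    m.push(m.esp)               # push esp (the old esp: address of branch1)
    m.push(offset)              # the flags result, 0 or 4
    m.push(m.pop() + m.pop())   # pop eax / pop edx / add / push eax
    m.push(m.mem[m.pop()])      # pop eax / push ss:[eax]
    m.push(m.esp)               # push esp (address of the chosen branch)
    m.push(8)
    m.push(m.pop() + m.pop())   # now the address of the old branch2 slot
    address = m.pop()           # pop eax
    m.mem[address] = m.pop()    # pop ss:[eax]
    m.pop()                     # pop eax, discarding branch1
    return m.snapshot()


print(select_branch(0xAAAA, 0xBBBB, 0), select_branch(0xAAAA, 0xBBBB, 4))
```

Whichever offset the flags produce, the stack ends up one entry deep with the selected address on top.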
There's one big problem: we got rid of the stack at the start! The way I dealt with this was keeping one thing in mind - the stack depth is the only thing we really care about.
The way I handle this is that when I process a write through a dereference (pop [eax]), I wrap everything on the stack inside a new object called a ConditionalOverride, which includes the old value, the potential new value, the pointer, and the stack depth of this object. When it comes time to evaluate the result, I evaluate the pointer and test the stack size of the pointer against the stack depth of this object. If they're the same then I evaluate the overwritten value, otherwise I evaluate the original value. The code looks like this:
def process_pop(self, dest):
    if is_deref(dest):
        size = 4  # we are guessing this
        override_symbol = self.pop_size(size)
        pointer_symbol = self.registers[get_deref_value(dest)]
        self.memory.append((pointer_symbol, override_symbol))
        # need to wrap everything on the stack in case we overwrote it
        new_stack = []
        while self.stack:
            stack_size = sum(map(lambda x: x.size, self.stack))
            default_symbol = self.stack.pop()
            override = ConditionalOverride(pointer_symbol, override_symbol, stack_size, default_symbol)
            wrapped_symbol = self.symbols.put(override, size)
            new_stack.insert(0, wrapped_symbol)
        self.stack = new_stack
Then when it comes time to evaluate one of these:
def evaluate_conditional_override(self, conditional_override):
    conditional_value = self.evaluate(conditional_override.conditional_symbol)
    # we need the size of the stack
    burnable_stack = copy(conditional_value.stack)
    stack_size = 0
    while burnable_stack:
        value = burnable_stack.pop()
        stack_size += value.size
    if stack_size == conditional_override.stack_trigger_size:
        return self.evaluate(conditional_override.override_symbol)
    else:
        return self.evaluate(conditional_override.default_symbol)
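To see the two halves working together, here's a stripped-down, standalone sketch. The Value and Pointer stand-ins are hypothetical, and the evaluation skips the recursive self.evaluate calls, but the ConditionalOverride fields and the size comparison follow the code above.

```python
from dataclasses import dataclass


@dataclass
class Value:
    size: int   # size in bytes
    value: int


@dataclass
class Pointer:
    stack: list  # the entries the pointer still "sees", target at the end


@dataclass
class ConditionalOverride:
    conditional_symbol: object
    override_symbol: object
    stack_trigger_size: int
    default_symbol: object


def evaluate_conditional_override(override):
    pointer = override.conditional_symbol
    stack_size = sum(entry.size for entry in pointer.stack)
    if stack_size == override.stack_trigger_size:
        return override.override_symbol
    return override.default_symbol


a, b, c = Value(4, 0xA), Value(4, 0xB), Value(4, 0xC)
written = Value(4, 0x1234)
pointer = Pointer(stack=[a, b])  # a stack pointer that landed on b

# Wrap every stack entry, recording the stack depth (in bytes) at its slot
stack = [a, b, c]
wrapped = []
for depth in range(len(stack), 0, -1):
    trigger = sum(entry.size for entry in stack[:depth])
    wrapped.insert(0, ConditionalOverride(pointer, written, trigger, stack[depth - 1]))

resolved = [evaluate_conditional_override(o).value for o in wrapped]
print([hex(v) for v in resolved])
```

Only the entry whose depth matches the pointer's stack size picks up the written value; the others fall back to their defaults.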
The other option is creating a new entity every time something hits the stack and overwriting it just-in-time, but that would mean going back through and updating everything in order (as well as global state, ew). This way keeps a bit more state but means every terminal symbol contains all the information it needs, provided any global memory has been updated correctly.
Store a journal of global memory writes
One thing that got me stuck early on was how to resolve something like this:
pop [eax]
Every other command moves a symbol to somewhere, be it a register, a scratch register, or the stack. Here we store some information somewhere, but because we haven't resolved anything we don't actually know where it's going.
I store these in a simple list as pairs of (location, value). Depending on how complicated your code is, it's probably worth storing the height of the symbol table here in case this gets read in between writes. Then when it comes to emulation we just emulate these. If the location is a stack pointer then we discard it, otherwise we log a write at that point in time. I haven't built this part yet, but a copy-on-write setup would be really useful here.
When it comes to reading, it's super easy as we can read directly from memory (I'm using LIEF for this), but it should be simple to wrap this in a copy-on-write layer that handles our writes.
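Since that copy-on-write layer hasn't been built, here's one possible shape for it, purely as a sketch. The OverlayMemory class, the replay helper and the three-field journal entries are all hypothetical, and a plain bytes object stands in for whatever the binary loader hands back.

```python
class OverlayMemory:
    """Reads fall through to a read-only image; writes land in an overlay."""

    def __init__(self, base_address, image):
        self.base_address = base_address
        self.image = bytes(image)
        self.overlay = {}  # address -> byte value

    def _byte(self, address):
        if address in self.overlay:
            return self.overlay[address]
        offset = address - self.base_address
        if not 0 <= offset < len(self.image):
            raise IndexError("address 0x%08X is outside the image" % address)
        return self.image[offset]

    def read(self, address, size):
        data = bytes(self._byte(address + i) for i in range(size))
        return int.from_bytes(data, "little")

    def write(self, address, size, value):
        for i, byte in enumerate(value.to_bytes(size, "little")):
            self.overlay[address + i] = byte


def replay(memory, journal):
    """Apply journalled writes in program order; later writes win."""
    for location, value, size in journal:
        if location is None:  # stand-in for "this resolved to a stack pointer"
            continue
        memory.write(location, size, value)


memory = OverlayMemory(0x00404D80, bytes(16))
journal = [
    (None, 0x00000001, 4),        # discarded
    (0x00404D89, 0xDEADBEEF, 4),  # overwritten by the next entry
    (0x00404D89, 0x00404C5B, 4),
]
replay(memory, journal)
print(hex(memory.read(0x00404D89, 4)))
```

The underlying image is never modified, so the same binary can be replayed against different inputs without reloading it.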
Bonus: some samples
I've made some simple apps to test and I thought I'd share how they work out.
Here's an app that creates a MessageBox. Here are the terminal symbols (the ones that either end up on the stack or get written to memory):
(note: this ended up being so large that I ended up writing a simplifier to count multiple references and move them to the top)
Symbols with multiple references:
Symbol9 size=4 value=(
arg1
)
Symbol10 size=4 value=(
Immediate(4, 0x0x00)
)
Symbol11 size=4 value=(
Immediate(4, 0x00000000)
)
Symbol34 size=4 value=(
Symbol33 size=4 value=(
Symbol32 size=4 value=(
Symbol31 size=4 value=(
Immediate(4, 0xA166874E)
)
+
Symbol30 size=4 value=(
Immediate(4, 0xF93E78B3)
)
)
+
Symbol29 size=4 value=(
Dereference(
Symbol28 size=4 value=(
Symbol10 size=4
+
Symbol27 size=4 value=(
Symbol26 size=4 value=(
Immediate(4, 0x00202075)
)
<<
Symbol25 size=2 value=(
Immediate(2, 0x0001)
)
)
)
)
)
)
+
Symbol24 size=4 value=(
Symbol23 size=4 value=(
Symbol22 size=4 value=(
Symbol21 size=4 value=(
Symbol20 size=4 value=(
Immediate(4, 0xA7A9831E)
)
+
Symbol19 size=4 value=(
Immediate(4, 0x946D2BFA)
)
)
+
Symbol18 size=4 value=(
Dereference(
Symbol17 size=4 value=(
Symbol10 size=4
+
Symbol16 size=4 value=(
Symbol15 size=4 value=(
Immediate(4, 0x0808A980)
)
>>
Symbol14 size=2 value=(
Immediate(2, 0x0005)
)
)
)
)
)
)
+
Symbol13 size=4 value=(
Immediate(4, 0xFAAF021A)
)
)
nor
Symbol12 size=4 value=(
Immediate(4, 0xFFBFCFFF)
)
)
)
Symbol57 size=4 value=(
Symbol56 size=4 value=(
Symbol55 size=4 value=(
Symbol54 size=4 value=(
Immediate(4, 0x000D662E)
)
>>
Symbol53 size=2 value=(
Immediate(2, 0x0001)
)
)
+
Symbol52 size=4 value=(
Dereference(
Symbol51 size=4 value=(
Symbol10 size=4
+
Symbol50 size=4 value=(
Symbol49 size=4 value=(
Immediate(4, 0x96336EDA)
)
+
Symbol48 size=4 value=(
Immediate(4, 0x6A0CD7EB)
)
)
)
)
)
)
+
Symbol47 size=4 value=(
Symbol46 size=4 value=(
Symbol45 size=4 value=(
Symbol44 size=4 value=(
Symbol43 size=4 value=(
Immediate(4, 0xF3FCA000)
)
>>
Symbol42 size=2 value=(
Immediate(2, 0x000A)
)
)
+
Symbol41 size=4 value=(
Dereference(
Symbol40 size=4 value=(
Symbol10 size=4
+
Symbol39 size=4 value=(
Symbol38 size=4 value=(
Immediate(4, 0x101009C0)
)
>>
Symbol37 size=2 value=(
Immediate(2, 0x0006)
)
)
)
)
)
)
+
Symbol36 size=4 value=(
Immediate(4, 0x0080602E)
)
)
>>
Symbol35 size=2 value=(
Immediate(2, 0x0001)
)
)
)
Symbol58 size=4 value=(
Immediate(4, 0x00000000)
)
Symbol62 size=4 value=(
Symbol10 size=4
+
Symbol61 size=4 value=(
Symbol60 size=4 value=(
Immediate(4, 0x1F242027)
)
nor
Symbol59 size=4 value=(
Immediate(4, 0xFFBFB277)
)
)
)
Symbol66 size=4 value=(
Symbol10 size=4
+
Symbol65 size=4 value=(
Symbol64 size=4 value=(
Immediate(4, 0x28BD90A4)
)
nor
Symbol63 size=4 value=(
Immediate(4, 0xFFBFB3A4)
)
)
)
Symbol76 size=4 value=(
Symbol75 size=4 value=(
Symbol74 size=4 value=(
Immediate(4, 0xDBDA6419)
)
+
Symbol73 size=4 value=(
Immediate(4, 0x24259BE8)
)
)
+
Symbol72 size=4 value=(
Dereference(
Symbol71 size=4 value=(
Symbol70 size=4 value=(
Symbol69 size=4 value=(
Immediate(4, 0x00000002)
)
<<
Symbol68 size=2 value=(
Immediate(2, 0x0001)
)
)
+
Symbol67 size=4 value=(
Pointer(
Stack=[
Symbol11 size=4
Symbol34 size=4
Symbol57 size=4
Symbol58 size=4
Symbol62 size=4
Symbol66 size=4
]
)
)
)
)
)
)
Memory symbols:
location:
Symbol76 size=4
value:
Symbol66 size=4
Stack symbols:
Symbol81 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol76 size=4
)
override_symbol=(
Symbol66 size=4
)
default_symbol=(
Symbol11 size=4
)
stack_trigger_size=4
)
)
Symbol80 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol76 size=4
)
override_symbol=(
Symbol66 size=4
)
default_symbol=(
Symbol34 size=4
)
stack_trigger_size=8
)
)
Symbol79 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol76 size=4
)
override_symbol=(
Symbol66 size=4
)
default_symbol=(
Symbol57 size=4
)
stack_trigger_size=12
)
)
Symbol78 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol76 size=4
)
override_symbol=(
Symbol66 size=4
)
default_symbol=(
Symbol58 size=4
)
stack_trigger_size=16
)
)
Symbol77 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol76 size=4
)
override_symbol=(
Symbol66 size=4
)
default_symbol=(
Symbol62 size=4
)
stack_trigger_size=20
)
)
Symbol95 size=4 value=(
Symbol10 size=4
+
Symbol94 size=4 value=(
Symbol93 size=4 value=(
Symbol92 size=4 value=(
Symbol91 size=4 value=(
Symbol90 size=4 value=(
Immediate(4, 0x76AEFC80)
)
+
Symbol89 size=4 value=(
Immediate(4, 0x895107C9)
)
)
+
Symbol88 size=4 value=(
Dereference(
Symbol87 size=4 value=(
Symbol10 size=4
+
Symbol86 size=4 value=(
Symbol85 size=4 value=(
Immediate(4, 0xF20A0A20)
)
nor
Symbol84 size=4 value=(
Immediate(4, 0xFFBFBBA4)
)
)
)
)
)
)
+
Symbol83 size=4 value=(
Immediate(4, 0xAC00EADC)
)
)
nor
Symbol82 size=4 value=(
Immediate(4, 0xFFBFEFDF)
)
)
)
Symbol1 size=4 value=(
arg9
)
Symbol2 size=4 value=(
arg8
)
Symbol3 size=4 value=(
arg7
)
Symbol4 size=4 value=(
arg6
)
Symbol5 size=4 value=(
arg5
)
Symbol9 size=4
Symbol7 size=4 value=(
arg3
)
Symbol8 size=4 value=(
arg2
)
Symbol9 size=4
Symbol10 size=4
This is enormous, and you can read about why this is done elsewhere. Long story short though, here's what happens when we emulate it:
Memory writes:
[(Immediate(4, 0x00404D89), Immediate(4, 0x00404C5B))]
Stack:
[
// stack state when we hit RET
Immediate(4, 0x00000000),
Immediate(4, 0x00403000),
Immediate(4, 0x00403017),
Immediate(4, 0x00000000),
Immediate(4, 0x00404D88),
Immediate(4, 0x00401020),
// these get loaded into registers
Symbol(1 size=4, value=arg9),
Symbol(2 size=4, value=arg8),
Symbol(3 size=4, value=arg7),
Symbol(4 size=4, value=arg6),
Symbol(5 size=4, value=arg5),
Symbol(9 size=4, value=arg1),
Symbol(7 size=4, value=arg3),
Symbol(8 size=4, value=arg2),
Symbol(9 size=4, value=arg1),
Immediate(4, 0x0x00)]
So it does the following:
- Sets 0x00404D89 to 0x00404C5B (the address of the next set of VM code; this is set to junk at compile time)
- Calls 0x00401020 (this is the thunk for MessageBoxA) with the decoded parameters
- Returns to 0x00404D88 (this pushes the address above and then jumps to vmenter)
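A few of these constants are easy to check by hand against the symbol dump. The sketch below recomputes Symbol62, Symbol66 and Symbol75 from their immediates; the helper names are made up, and it assumes the Dereference inside Symbol76 resolves to Symbol62 (the stack pointer advanced by one slot).

```python
MASK32 = 0xFFFFFFFF


def nor32(a, b):
    return ~(a | b) & MASK32


def add32(a, b):
    return (a + b) & MASK32


symbol10 = 0x00  # as printed in the dump

return_address = add32(symbol10, nor32(0x1F242027, 0xFFBFB277))  # Symbol62
next_vm_code = add32(symbol10, nor32(0x28BD90A4, 0xFFBFB3A4))    # Symbol66
write_offset = add32(0xDBDA6419, 0x24259BE8)                     # Symbol75

# Symbol76 = Symbol75 + Dereference(...), assumed to resolve to Symbol62
write_location = add32(write_offset, return_address)

print(hex(return_address), hex(next_vm_code), hex(write_location))
```

The two large additions wrap around to 1, which is how the write ends up one byte past the return address.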
We have time for one more. Here's some code that does a CMP and a JE:
Symbols with multiple references:
Symbol1 size=4 value=(
arg9
)
Symbol2 size=4 value=(
arg8
)
Symbol3 size=4 value=(
arg7
)
Symbol4 size=4 value=(
arg6
)
Symbol5 size=4 value=(
arg5
)
Symbol10 size=4 value=(
Immediate(4, 0x0x00)
)
Symbol19 size=2 value=(
Symbol18 size=2 value=(
Symbol17 size=2 value=(
Immediate(2, 0xFBFF)
)
nor
Symbol16 size=2 value=(
Symbol15 size=2 value=(
LoWord(
Symbol1 size=4
)
)
nor
Symbol14 size=2 value=(
LoWord(
Symbol1 size=4
)
)
)
)
+
Symbol13 size=2 value=(
FlagResult(
Symbol12 size=4 value=(
Symbol2 size=4
+
Symbol11 size=4 value=(
Immediate(4, 0x35010FF3)
)
)
)
)
)
Symbol29 size=2 value=(
Symbol28 size=2 value=(
Symbol27 size=2 value=(
Immediate(2, 0xFBFF)
)
nor
Symbol26 size=2 value=(
Symbol19 size=2
nor
Symbol19 size=2
)
)
+
Symbol25 size=2 value=(
Symbol24 size=2 value=(
Symbol23 size=2 value=(
Immediate(2, 0x0011)
)
nor
Symbol19 size=2
)
nor
Symbol22 size=2 value=(
Symbol21 size=2 value=(
Immediate(2, 0xFFEE)
)
nor
Symbol20 size=2 value=(
Symbol19 size=2
nor
Symbol19 size=2
)
)
)
)
Symbol31 size=4 value=(
Symbol10 size=4
+
Symbol30 size=4 value=(
Immediate(4, 0x00404D2B)
)
)
Symbol33 size=4 value=(
Symbol10 size=4
+
Symbol32 size=4 value=(
Immediate(4, 0x00404CAB)
)
)
Symbol42 size=4 value=(
Dereference(
Symbol41 size=4 value=(
Symbol40 size=4 value=(
DoubleWord(
Symbol37 size=2 value=(
Symbol35 size=2 value=(
Symbol34 size=2 value=(
Immediate(2, 0xFFBF)
)
nor
Symbol29 size=2
)
>>
Symbol36 size=2 value=(
Immediate(2, 0x0004)
)
)
Symbol39 size=2 value=(
Immediate(2, 0x0000)
)
)
)
+
Symbol38 size=4 value=(
Pointer(
Stack=[
Symbol31 size=4
Symbol33 size=4
]
)
)
)
)
)
Symbol45 size=4 value=(
Symbol44 size=4 value=(
Immediate(4, 0x00000008)
)
+
Symbol43 size=4 value=(
Pointer(
Stack=[
Symbol31 size=4
Symbol33 size=4
Symbol42 size=4
]
)
)
)
Symbol47 size=4 value=(
ConditionalOverride(
conditional_symbol=(
Symbol45 size=4
)
override_symbol=(
Symbol42 size=4
)
default_symbol=(
Symbol31 size=4
)
stack_trigger_size=4
)
)
Symbol49 size=4 value=(
Symbol10 size=4
+
Symbol48 size=4 value=(
Immediate(4, 0x00404000)
)
)
Symbol51 size=4 value=(
DoubleWord(
Symbol29 size=2
Symbol50 size=2 value=(
Immediate(2, 0x0)
)
)
)
Memory symbols:
location:
Symbol45 size=4
value:
Symbol42 size=4
Stack symbols:
Symbol47 size=4
Symbol49 size=4
Symbol51 size=4
Symbol2 size=4
Symbol3 size=4
Symbol4 size=4
Symbol5 size=4
Symbol52 size=4 value=(
Pointer(
Stack=[
Symbol47 size=4
Symbol49 size=4
Symbol51 size=4
Symbol2 size=4
Symbol3 size=4
Symbol4 size=4
Symbol5 size=4
]
)
)
Symbol7 size=4 value=(
arg3
)
Symbol8 size=4 value=(
arg2
)
Symbol9 size=4 value=(
arg1
)
Symbol10 size=4
Once again, there's a lot going on here. Here are the two ways it can execute. When the JE passes:
Memory writes:
[]
Stack:
[Immediate(4, 0x00404CAB),
Immediate(4, 0x00404000),
Immediate(4, 0x00000257),
Immediate(4, 0x0xCAFEF00D),
Symbol(3 size=4, value=arg7),
Symbol(4 size=4, value=arg6),
Symbol(5 size=4, value=arg5),
Pointer[Immediate(4, 0x0)],
Symbol(7 size=4, value=arg3),
Symbol(8 size=4, value=arg2),
Symbol(9 size=4, value=arg1),
Immediate(4, 0x0x00)]
And when it fails:
Memory writes:
[]
Stack:
[Immediate(4, 0x00404D2B),
Immediate(4, 0x00404000),
Immediate(4, 0x00000213),
Immediate(4, 0x0x12345678),
Symbol(3 size=4, value=arg7),
Symbol(4 size=4, value=arg6),
Symbol(5 size=4, value=arg5),
Pointer[Immediate(4, 0x0)],
Symbol(7 size=4, value=arg3),
Symbol(8 size=4, value=arg2),
Symbol(9 size=4, value=arg1),
Immediate(4, 0x0x00)]
Finding the value of arg8 that makes the JE pass is an exercise for the reader :)
Happy hacking!