Extracting VMProtect handlers with Binary Ninja

Automatically converting Binary Ninja Low Level IL (LLIL) into python

I've started looking into the Adylkuzz malware, as mentioned by Tim Blazytko in his article on Automated Detection of Obfuscated Code. Initial analysis shows a TLS entry handler that dumps us straight into a VMProtect VMEnter() function, which looks like this in the HLIL:

005becad      int32_t var_4 = arg4
005becb0      _bswap(not.d(arg4))
005becb5      int32_t var_8 = arg2
005becb6      void* const var_c = 0xea6bdba7
005becb7      int32_t ebx  // junk
005becb7      bool s
005becb7      bool o
005becb7      ebx.b = s != o
005becbc      int32_t eax
005becbc      int32_t var_10 = eax
005becbd      eax:1.b = 0xcf  // junk
005becc2      int32_t var_14 = arg1
005becc6      int32_t edi
005becc6      int32_t var_18 = edi
005becc7      int32_t var_1c = arg3
005becc8      int32_t ebx_1  // junk
005becc8      ebx_1.w = 0x7f28
005beccf      bool c
005beccf      bool p
005beccf      bool a
005beccf      bool z
005beccf      bool d
005beccf      int32_t var_20 = (o ? 1 : 0) << 0xb | (d ? 1 : 0) << 0xa | (s ? 1 : 0) << 7 | (z ? 1 : 0) << 6 | (a ? 1 : 0) << 4 | (p ? 1 : 0) << 2 | (c ? 1 : 0) << 0
005becd5      edi.w = 0x5e8d  // junk
005becd9      int32_t var_24 = 0
005bece3      arg3.w = arg3.w & not.w(1 << modu.w(arg2.w, 0x10))  // junk
005bece8      int32_t esi_4 = neg.d(arg5 + 1)
005becea      int32_t eflags  // junk
005becea      uint16_t temp0
005becea      temp0, eflags = _bit_scan_reverse(esi_4.w)
005becf9      bool c_1 = unimplemented  {ror esi, 0x1}
005bed02      edi.w = rlc.w(edi.w, 0x2a, c_1)
005bed0e      int32_t eax_1
005bed0e      eax_1.w = 0x253c
005bed3c      int16_t eax_2
005bed3c      eax_2:1.b = ror.w(0 ^ (ror.d(esi_4 ^ 0x27e9128c, 1) + 1).w, 0x72):1.b << arg1.b
005bed54      int32_t eax_8 = rol.d(not.d((*(ror.d(esi_4 ^ 0x27e9128c, 1) - 3) ^ (ror.d(esi_4 ^ 0x27e9128c, 1) + 1)) - 0x5ca20a41) - 0x40d54c06, 2)
005bed61      int32_t var_e8 = 0x5bed29 + eax_8
005bed62      return eax_8

It's a little bit hard to follow, partially because VMProtect is well known for using a lot of junk instructions. If we clean up the ASM it looks like this:

push    esi
push    edx
push    ebx
push    eax
push    ecx
push    edi
push    ebp
pushfd
mov     eax, 0x0 // this gets relocated
push    eax
mov     esi, dword [esp+0x28] // encrypted VIP
inc     esi
neg     esi
xor     esi, 0x27e9128c
ror     esi, 0x1
inc     esi
lea     esi, [esi+eax]
mov     ebp, esp
lea     esp, [esp-0xc0]
mov     ebx, esi
mov     eax, 0x0 // this gets relocated
sub     ebx, eax
lea     edi, [0x5bed29]
lea     esi, [esi-0x4]
mov     eax, dword [esi]
xor     eax, ebx
lea     eax, [eax-0x5ca20a41]
not     eax
sub     eax, 0x40d54c06
rol     eax, 0x2
xor     ebx, eax
add     edi, eax
push    edi
retn // obfuscated jump to first VM handler

This is still a little yuck, even once we've removed the junk instructions by hand. Effectively, it pushes all the registers and flags, decrypts the VIP that is passed as the only argument on the stack, initialises the stream cipher with it, then decrypts the first instruction handler pointer and jumps to it. If we look back at the HLIL, its last three lines actually sum this up fairly well. If we change to SSA view we can make sure nothing we care about is getting clobbered:

005bed54      int32_t eax_8#1 = rol.d(not.d((*(ror.d(esi_4#1 ^ 0x27e9128c, 1) - 3) @ mem#2 ^ (ror.d(esi_4#1 ^ 0x27e9128c, 1) + 1)) - 0x5ca20a41) - 0x40d54c06, 2)
005bed61      int32_t var_e8#1 = 0x5bed29 + eax_8#1
005bed62      return eax_8#1

The value of var_e8#1 is what we're most interested in: in practice we aren't returning anything, we're jumping to that location. Because this is SSA form, we can look back up the view and be confident that the only variable it depends on is esi_4#1 (plus one memory read), and this is defined earlier:

int32_t esi_4#1 = neg.d(arg5#0 + 1) // arg5 is actually the only arg pushed on the stack

So if we want to find the address of the first handler, we just need the original encrypted VIP passed on the stack (0xd8cb8f6d in this case), and we can evaluate this equation ourselves to get the first handler at 0x5babbd. Looking at the first handler in the HLIL though, we can see we need to look a little deeper to get the information we want:
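The equation above can also be evaluated outside Binary Ninja. This is a minimal sketch rather than the plugin's code: decrypt_vip, first_handler and the read_dword callback are names invented here for illustration, the constants are copied from the cleaned-up VMEnter listing, and it assumes the image sits at its preferred base so that both relocated "mov eax, 0x0" values stay zero.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK32

def rol32(value, count):
    """Rotate a 32-bit value left by count bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32

def decrypt_vip(encrypted_vip):
    """Mirror inc/neg/xor/ror/inc from VMEnter (relocation delta assumed 0)."""
    esi = (-(encrypted_vip + 1)) & MASK32   # inc esi; neg esi
    esi ^= 0x27E9128C                       # xor esi, 0x27e9128c
    esi = ror32(esi, 1)                     # ror esi, 0x1
    return (esi + 1) & MASK32               # inc esi

def first_handler(encrypted_vip, read_dword):
    """Return (handler address, VIP after the read, updated rolling key)."""
    vip = decrypt_vip(encrypted_vip)
    key = vip                               # mov ebx, esi; sub ebx, eax (eax = 0)
    vip = (vip - 4) & MASK32                # lea esi, [esi-0x4]
    eax = read_dword(vip) ^ key             # mov eax, [esi]; xor eax, ebx
    eax = (eax - 0x5CA20A41) & MASK32       # lea eax, [eax-0x5ca20a41]
    eax = ~eax & MASK32                     # not eax
    eax = (eax - 0x40D54C06) & MASK32       # sub eax, 0x40d54c06
    eax = rol32(eax, 2)                     # rol eax, 0x2
    key ^= eax                              # xor ebx, eax
    handler = (0x5BED29 + eax) & MASK32     # add edi, eax
    return handler, vip, key
```

In the Python console, read_dword could be something like lambda addr: int.from_bytes(bv.read(addr, 4), "little"). For the VIP in this sample, decrypt_vip(0xd8cb8f6d) works out to 0x6eb110, so the encrypted handler offset is read from 0x6eb10c.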

005babc7      arg1#1.b = arg1#0.b & nullptr
005babcc      uint32_t eax
005babcc      eax#1.b = *(arg3#0 - 1) @ mem#0 ^ 0xa7
005babde      eax#2.b = not.b(eax#1.b)
005babe0      eax#3.b = eax#2.b - 0xa3
005babe2      eax#4.b = rol.b(eax#3.b, 1)
005babea      eax#5.b = eax#4.b + 1
005babee      eax#6.b = ror.b(eax#5.b, 1)
005babf0      int32_t ebx
005babf0      ebx#1.b = 0xa7 ^ eax#6.b
005babfd      *(&__return_addr + eax#6) @ mem#0 @ mem#1 = *arg2#0 @ mem#0 @ mem#0
005bac3d      jump(arg4#0 + rol.d(not.d((*(arg3#0 - 5) @ mem#1 ^ ebx#1) - 0x5ca20a41) - 0x40d54c06, 2))

Where is ebx#0? What is *(&__return_addr + eax#6)? From what I can see, a big part of the problem is that Binary Ninja assumes the functions we reverse play along nicely with the usual x86 calling conventions: for example, that esp is the stack pointer and that the stack holds a return address. I had a play with trying to extract something useful out of the HLIL and MLIL, but I had a hard time with the following:

  • If you follow the AST back to the input arguments, there's no direct way to see if they're backed by variables. You can do this by checking the function type information, but it's a little clunky
  • We really want to track the key registers that drive the VMProtect virtual machine, and the MLIL and HLIL views are a little too abstracted from the assembly to be the best tool for the job

That leaves us with the LLIL, and the short answer is, this gives us the sweet spot where we're very close to the original assembly, but also have some of the heavy lifting (SSA and lifting into IL) done for us.

VMProtect 3 has been described elsewhere (here and here among others), and the basic idea is this:

  • esi is the virtual instruction pointer, VIP
  • edi is the offset of the current VM handler (opcodes are offsets from the previous handler so we need to track this)
  • esp is the offset to the scratch registers
  • ebp is the stack pointer for the VM
  • ebx is the stream cipher that is used to decrypt the stream of opcodes

If we can see what these registers resolve to at the end of the handler, then we can find the address of the next handler and profile the current one to identify it automatically. First, though, let's look at how the interface works.

Accessing LLIL instructions from the Python Console

Click on an instruction to highlight it, and get the function that holds this (the functions we need hang off the binaryninja.function.Function class):

>>> func = bv.get_functions_containing(here)[0]
>>> func
<func: x86@0x5babbd>

It's possible that bv.get_functions_containing() could return multiple functions (or none, if we're outside a function), but let's live dangerously here and assume there's only going to be one function returned. From here, we want to get the LLIL in SSA form:
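If you'd rather not live dangerously, a small wrapper can make the "exactly one function" assumption explicit. This is just a sketch; function_at is a made-up helper name, and the only Binary Ninja call it relies on is the get_functions_containing() method already used above.

```python
def function_at(view, addr):
    """Return the single function containing addr, or raise if there isn't exactly one."""
    funcs = view.get_functions_containing(addr)
    if not funcs:
        raise LookupError("no function contains %#x" % addr)
    if len(funcs) > 1:
        raise LookupError("%d functions contain %#x" % (len(funcs), addr))
    return funcs[0]
```

In the console this would be called as func = function_at(bv, here).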

>>> llil_ssa = func.llil.ssa_form
>>> llil_ssa
<llil func: x86@0x5babbd>
>>> llil_ssa.registers
[<reg ecx>, <reg esp>, <reg ebp>, <reg edx>, <reg eax>, <reg esi>, <reg ebx>, <reg temp1>, <reg edi>, <reg temp0>]
>>> llil_ssa.ssa_registers
[<ssa ecx version 0>, <ssa ecx version 1>, <ssa ecx version 2>, <ssa ecx version 3>, <ssa ecx version 4>, <ssa ecx version 5>, <ssa ecx version 6>, <ssa ecx version 7>, <ssa esp version 0>, <ssa ebp version 0>, <ssa ebp version 1>, <ssa edx version 0>, <ssa eax version 1>, <ssa eax version 2>, <ssa eax version 3>, <ssa eax version 4>, <ssa eax version 5>, <ssa eax version 6>, <ssa eax version 7>, <ssa eax version 8>, <ssa eax version 9>, <ssa eax version 10>, <ssa eax version 11>, <ssa eax version 12>, <ssa eax version 13>, <ssa eax version 14>, <ssa eax version 15>, <ssa eax version 16>, <ssa esi version 0>, <ssa esi version 1>, <ssa esi version 2>, <ssa ebx version 0>, <ssa ebx version 1>, <ssa ebx version 2>, <ssa temp1 version 1>, <ssa edi version 0>, <ssa edi version 1>, <ssa temp0 version 1>]
>>> llil_ssa[0]
<llil: esi#1 = esi#0 - 1>
>>> llil_ssa[1]
<llil: eax#1 = zx.d([esi#1].b @ mem#0)>

From here we can access both the base registers and their SSA forms. At a glance, we can see eax gets heavily used, whereas edi and esi don't see a huge amount of action. We can also subscript the llil.ssa_form object, which returns instructions for each line in the view.

At this point it's going to be quite useful to look at the LLIL docs. There are two main types of instructions we'll care about here:

>>> llil_ssa[1]
<llil: eax#1 = zx.d([esi#1].b @ mem#0)>
>>> type(llil_ssa[1])
<class 'binaryninja.lowlevelil.LowLevelILSetRegSsa'>
>>> llil_ssa[5]
<llil: eax#2.al = eax#1.al ^ ebx#0.bl>
>>> type(llil_ssa[5])
<class 'binaryninja.lowlevelil.LowLevelILSetRegSsaPartial'>

As you can probably guess from the name, the LowLevelILSetRegSsa class represents a register being set to a new value, whereas the LowLevelILSetRegSsaPartial class represents part of the register being set, e.g. bl, which is the low byte of ebx, or si, which is the low word of esi. As far as I could tell, all the instructions subclass LowLevelILInstruction directly, rather than subclassing something more specific like a LowLevelILAssignment class, so we need to handle these directly. It's important to note that various instructions that modify flags but otherwise don't do anything get represented here too, and you often find these in the junk code, for example:

>>> llil_ssa[7]
<llil: esi#1 & 0x4de74ba7>
>>> type(llil_ssa[7])
<class 'binaryninja.lowlevelil.LowLevelILAnd'>
>>> bv.get_disassembly(llil_ssa[7].address)
'test    esi, 0x4de74ba7'

The good news is we shouldn't have to worry about this as we'll just be tracking register definitions (although this will be a pain later when we get to managing flags). If we look at these objects, we have a few parameters that we care about:

>>> llil_ssa[1]
<llil: eax#1 = zx.d([esi#1].b @ mem#0)>
>>> llil_ssa[1].dest
<ssa eax version 1>
>>> type(llil_ssa[1].dest)
<class 'binaryninja.lowlevelil.SSARegister'>
>>> llil_ssa[1].src
<llil: zx.d([esi#1].b @ mem#0)>
>>> type(llil_ssa[1].src)
<class 'binaryninja.lowlevelil.LowLevelILZx'>

For a LowLevelILSetRegSsa object we can use the dest property to get the SSARegister that is being written to. In the src property we will see the tree of LowLevelILInstruction objects, whose leaves are either registers or constants. For nearly all of these, we can use the operands property to access the nodes further down the tree:

>>> llil_ssa[40]
<llil: eax#15 = eax#14 - 0x40d54c06>
>>> type(llil_ssa[40])
<class 'binaryninja.lowlevelil.LowLevelILSetRegSsa'>
>>> llil_ssa[40].src.operands
[<llil: eax#14>, <llil: 0x40d54c06>]
>>> [type(x) for x in llil_ssa[40].src.operands]
[<class 'binaryninja.lowlevelil.LowLevelILRegSsa'>, <class 'binaryninja.lowlevelil.LowLevelILConst'>]
>>> llil_ssa[19]
<llil: ebx#1.bl = ebx#0.bl ^ eax#7.al>
>>> type(llil_ssa[19])
<class 'binaryninja.lowlevelil.LowLevelILSetRegSsaPartial'>
>>> llil_ssa[19].src.operands
[<llil: ebx#0.bl>, <llil: eax#7.al>]
>>> [type(x) for x in llil_ssa[19].src.operands]
[<class 'binaryninja.lowlevelil.LowLevelILRegSsaPartial'>, <class 'binaryninja.lowlevelil.LowLevelILRegSsaPartial'>]

Because this has been lifted directly from the assembly, we should generally see only one layer of instructions, unless it's a compound instruction like movzx, which both reads a memory location and zero-extends the result:

>>> llil_ssa[1]
<llil: eax#1 = zx.d([esi#1].b @ mem#0)>
>>> llil_ssa[1].src
<llil: zx.d([esi#1].b @ mem#0)>
>>> llil_ssa[1].src.src
<llil: [esi#1].b @ mem#0>
>>> llil_ssa[1].src.src.src
<llil: esi#1>
>>> bv.get_disassembly(llil_ssa[1].address)
'movzx   eax, byte [esi]'
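Chaining .src by hand gets old quickly, so here is a rough sketch of a generic walker that prints a whole expression tree through the operands property. dump_tree is a hypothetical helper, and it only assumes that inner nodes expose an operands list; anything without one (constants, SSA registers, the integer memory version) is treated as a leaf.

```python
def dump_tree(node, depth=0, lines=None):
    """Collect one indented line per node, parents before children."""
    lines = [] if lines is None else lines
    lines.append("%s%s: %s" % ("  " * depth, type(node).__name__, node))
    # Leaves have no operands attribute, so the loop body simply never runs.
    for operand in getattr(node, "operands", []):
        dump_tree(operand, depth + 1, lines)
    return lines
```

In the console, print("\n".join(dump_tree(llil_ssa[1].src))) should show the zero-extend at the top with the load and the register nested underneath it.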

We have all the pieces we need to build our extractor now. We could resolve the SSA registers directly and build complete ASTs, but I've chosen to just resolve each instruction one at a time and output something close to Python (you'll need to implement the zx() and read_mem() functions yourself).
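For reference, here is one way those runtime helpers might look. It's a sketch under assumptions: the (value, size-in-bytes) signatures are inferred from the sample output later in this post, MEMORY is a hypothetical dictionary standing in for the bytecode, and not_ is a made-up name for the bitwise-NOT helper.

```python
MEMORY = {}  # hypothetical byte-addressed memory: address -> byte value

def read_mem(address, size):
    """Read size bytes, little-endian, from MEMORY (missing bytes read as 0)."""
    return sum(MEMORY.get(address + i, 0) << (8 * i) for i in range(size))

def zx(value, size):
    """Zero-extend to size bytes, which in Python is just a truncating mask."""
    return value & ((1 << (8 * size)) - 1)

def not_(value, size):
    """Bitwise NOT within size bytes; a bare ~ would give a negative number."""
    return ~value & ((1 << (8 * size)) - 1)
```

One thing to watch out for: the generated lines spell the last helper as not(x, 4), and not is a Python keyword. Python parses that as the logical negation of a two-element tuple, which is always False, so those calls need renaming (to not_ here) before the output is executed.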

Building the extractor

We'll start with a simple function to get going:

def resolve_dest(dest):
  if type(dest) == SSARegister:
    return "%s_%s" % (dest.reg, dest.version)
  else:
    raise Exception("Couldn't resolve destination %s type %s" % (dest, type(dest)))

We can copy-paste this directly into the Python console and call it whenever we like. Let's try it: select an instruction (I'm going to use line 1 of the LLIL in SSA form) and execute this:

>>> bv.get_functions_containing(here)[0].get_llil_at(here)
<llil: eax = zx.d([esi].b)>
>>> bv.get_functions_containing(here)[0].get_llil_at(here).ssa_form
<llil: eax#1 = zx.d([esi#1].b @ mem#0)>
>>> bv.get_functions_containing(here)[0].get_llil_at(here).ssa_form.dest
<ssa eax version 1>
>>> resolve_dest(bv.get_functions_containing(here)[0].get_llil_at(here).ssa_form.dest)
'eax_1'

That was pretty easy. Let's try resolving the sources. I've decided to do this with a loop rather than recursion, partly because I got confused debugging the recursive version, and partly because I haven't done it this way for a while and needed the practice. We do a depth-first traversal of the tree, recording every node we visit in the todo array and pushing the operands of any non-leaf node back onto the sources array so that they get traversed too:

sources = [source]
todo = []
output = []
while sources:
  source = sources.pop()
  if type(source) in [LowLevelILSub, LowLevelILZx, LowLevelILSx, LowLevelILAnd, LowLevelILXor, LowLevelILOr, LowLevelILNot, LowLevelILLsl, LowLevelILLsr, LowLevelILRol, LowLevelILRor, LowLevelILAdd]:
    todo.append(source)
    for operand in source.operands:
      sources.append(operand)
  elif type(source) in [LowLevelILLoadSsa]:
    # operands are [src, src_memory] and src_memory is just an int ref we don't want
    todo.append(source)
    sources.append(source.src)
  elif type(source) in [LowLevelILConst, LowLevelILRegSsa, LowLevelILRegSsaPartial]:
    todo.append(source)
  else:
    raise Exception("Couldn't process instruction %s type %s" % (source, type(source)))

Now we can process the outputs. Some of the assignments will be directly setting a value, so we can handle these first:

if type(value) == LowLevelILConst:
  output.append(hex(value.constant))
elif type(value) == LowLevelILRegSsa:
  output.append("%s_%s" % (value.src.reg.name, value.src.version))
elif type(value) == LowLevelILRegSsaPartial:
  result = "(%s & %s_%s)" % (hex(masks[value.src.name]), value.full_reg.reg.name, value.full_reg.version)
  if value.src.name in shifts:
    result = "(%s %s)" % (result, shifts[value.src.name])
  output.append(result)

Feel free to ignore the LowLevelILRegSsaPartial implementation here, or skip forward to the source to see how this all hooks up. This is always going to be an implementation decision: a framework like Triton has complex register objects that manage the smaller parts, but I've chosen just to mask things, which complicates the output but keeps it easy to follow. We could just as easily resolve the registers and insert them in place if we wanted to build an AST; this is an exercise for the reader.
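The masks and shifts tables aren't shown in this post, so here is a guess at their shape, purely for illustration. The entries below are assumptions (the plugin's real tables may well differ), and render_partial_read is a made-up wrapper around the same string formatting as the snippet above.

```python
# Hypothetical lookup tables keyed by partial register name.
masks = {"al": 0xFF, "ah": 0xFF00, "ax": 0xFFFF,
         "bl": 0xFF, "bh": 0xFF00, "bx": 0xFFFF}
# Only the high-byte registers need shifting down after masking.
shifts = {"ah": ">> 8", "bh": ">> 8"}

def render_partial_read(partial_name, full_name):
    """Render a read of e.g. al as a masked read of the full SSA register."""
    result = "(%s & %s)" % (hex(masks[partial_name]), full_name)
    if partial_name in shifts:
        result = "(%s %s)" % (result, shifts[partial_name])
    return result
```

With these tables, render_partial_read("al", "eax_1") gives "(0xff & eax_1)", which is the form that appears in the output further down, and a high-byte read such as ah comes out as "((0xff00 & eax_3) >> 8)".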

Note: Python doesn't have unsigned integers, and things will behave weirdly when we have negative numbers interacting with bitwise arithmetic. I haven't implemented this very carefully and there will be bugs with this.

Disclaimers aside, all we need to do is print out a representation of the constants and registers we come across; these will be the leaf nodes.

Most of the rest look more or less the same. I've chosen to output textual representations of these, but there's no reason we couldn't output other objects that can perform the calculations themselves.

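elif type(value) == LowLevelILAdd:
  rhs = output.pop()
  lhs = output.pop()
  output.append("(%s + %s)" % (lhs, rhs))
elif type(value) == LowLevelILSub:
  # operands come off the stack in reverse order, so rhs is popped first
  rhs = output.pop()
  lhs = output.pop()
  output.append("(%s - %s)" % (lhs, rhs))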
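To see the two phases working end to end without Binary Ninja, here is a toy version of the same idea. Const, Reg and BinOp are stand-in classes invented for this sketch, not LLIL types; the point is only to show why reading the todo list backwards and popping rhs before lhs gives the right operand order.

```python
class Const:
    def __init__(self, value):
        self.value = value
        self.operands = []

class Reg:
    def __init__(self, name):
        self.name = name
        self.operands = []

class BinOp:
    def __init__(self, symbol, lhs, rhs):
        self.symbol = symbol
        self.operands = [lhs, rhs]

def render(expr):
    # Phase 1: order the nodes so that every parent precedes its children.
    sources, todo = [expr], []
    while sources:
        node = sources.pop()
        todo.append(node)
        sources.extend(node.operands)
    # Phase 2: walk that list backwards, so children are rendered first.
    output = []
    while todo:
        node = todo.pop()
        if isinstance(node, Const):
            output.append(hex(node.value))
        elif isinstance(node, Reg):
            output.append(node.name)
        else:
            rhs = output.pop()
            lhs = output.pop()
            output.append("(%s %s %s)" % (lhs, node.symbol, rhs))
    return output.pop()
```

For example, render(BinOp("-", BinOp("+", Reg("eax_12"), Const(0x10)), Const(0x4))) produces "((eax_12 + 0x10) - 0x4)".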

Finally, we resolve the assignments:

if type(assignment) == LowLevelILSetRegSsa:
  return "%s = %s" % (resolve_dest(assignment.dest), resolve_source(assignment.src))
elif type(assignment) == LowLevelILSetRegSsaPartial:
  previous_version = "%s_%s" % (assignment.full_reg.reg, assignment.full_reg.version - 1)
  output = resolve_dest(assignment.full_reg)
  original = "(%s & %s)" % (hex(inverse_masks[assignment.dest.name]), previous_version)
  change = "(%s & %s)" % (hex(masks[assignment.dest.name]), resolve_source(assignment.src))
  full_src = "%s & %s" % (original, change)
  if assignment.dest.name in shifts:
    full_src = "(%s %s)" % (full_src, shifts[assignment.dest.name])
  return "%s = %s" % (output, full_src)

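We've outsourced the source and destination resolution, so the LowLevelILSetRegSsa case is very straightforward, and the LowLevelILSetRegSsaPartial case just adds a bunch of masking and shifting to make the partial registers behave correctly. One catch: the two masks are disjoint, so joining the preserved and changed halves with & always evaluates to zero. Swap it for | if you want to execute the output rather than just read it.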
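For comparison, this is roughly what a partial register write needs to compute at runtime. It's a sketch with assumed tables (masks and shifts here hold plain integers and are not the plugin's), and write_partial is a hypothetical name: keep the untouched bits of the old value, move the new bits into position, and combine the two with a bitwise OR.

```python
MASK32 = 0xFFFFFFFF
# Hypothetical tables: which bits each partial register covers, and where they start.
masks = {"al": 0xFF, "ah": 0xFF00, "ax": 0xFFFF}
shifts = {"al": 0, "ah": 8, "ax": 0}

def write_partial(old_full, partial_name, new_value):
    """Return the full 32-bit register after writing new_value to one part of it."""
    mask = masks[partial_name]
    kept = old_full & ~mask & MASK32                       # bits the write leaves alone
    changed = (new_value << shifts[partial_name]) & mask   # new bits, moved into place
    return kept | changed
```

So writing 0xAB to al of 0x12345678 gives 0x123456AB, and writing 0xCD to ah gives 0x1234CD78.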

Looking up dependencies

The goal is to extract the operations from the handler, so let's resolve all dependent registers back to the top and output all the lines we need to calculate the outputs ourselves.

def find_all_dependent_registers(func, llil_ssa, base_assignment):
  assignments = [base_assignment]
  output_assignments = []
  while assignments:
    assignment = assignments.pop()
    log_info("Analysing assignment %s" % assignment)
    output_assignments.append(assignment)
    dependent_registers = find_dependent_registers(assignment)
    for register in dependent_registers:
      log_info("Adding dependent register %s" % register)
      assignment = llil_ssa.get_ssa_reg_definition(register)
      if assignment:
        log_info("Defined at: %s" % assignment)
        assignments.append(assignment)
      else:
        log_info("Register %s has no definition, skipping" % register)
  # convert to pythonesque
  output_python = []
  while output_assignments:
    output_python.append(resolve_assignment(output_assignments.pop()))
  return output_python

We use get_ssa_reg_definition() to find where each register is defined, and then apply the same iterative depth-first traversal as before. This leaves us with a bunch of assignments in an array. We want to start from the top, so we read this array back in reverse, and the resolve_assignment() function generates the output code we want. This will produce duplicate lines of code, and we could remove these from the output_python array if we wanted to, but SSA means every name is assigned exactly once, so each line is idempotent and it doesn't hurt to repeat it.
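If the repeats get in the way, an order-preserving de-duplication is a one-liner. A small sketch (dedupe_lines is a made-up name); it leans on the fact that Python dictionaries keep insertion order, and keeping only the first copy is safe because every later copy of an SSA definition is identical.

```python
def dedupe_lines(lines):
    """Drop repeated lines, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(lines))
```

Something like print("\n".join(dedupe_lines(output_python))) would then print each definition once, still in dependency order.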

We'll add some helper functions too, in case we want to start from a specific address, or use a register name to find the final SSA version of it and calculate for this.
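The register-name helper boils down to picking the last SSA version of a register. Here is a sketch of that step only, under the assumption that the highest version number is the final one; that holds for a straight-line handler like this but not necessarily for code with branches. latest_ssa_register is a hypothetical name, and it only uses the reg.name and version attributes seen earlier.

```python
def latest_ssa_register(ssa_registers, name):
    """Return the highest-numbered SSA version of the named register, or None."""
    candidates = [r for r in ssa_registers if r.reg.name == name]
    return max(candidates, key=lambda r: r.version, default=None)
```

In the console, latest_ssa_register(llil_ssa.ssa_registers, "esi") would give the SSA register to hand to get_ssa_reg_definition().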

So what does the output look like?

>>> print("\n".join(find_all_dependent_registers_from_register_name(func, "esi")))
esi_1 = (esi_0 - 0x1)
esi_2 = (esi_1 - 0x4)
>>> print("\n".join(find_all_dependent_registers_from_register_name(func, "edi")))
esi_1 = (esi_0 - 0x1)
eax_1 = zx(read_mem(esi_1,1), 4)
eax_2 = (0xffffff00 & eax_1) & (0xff & ((0xff & eax_1) ^ (0xff & ebx_0)))
eax_3 = (0xffffff00 & eax_2) & (0xff & not((0xff & eax_2), 1))
eax_4 = (0xffffff00 & eax_3) & (0xff & ((0xff & eax_3) - -0x5d))
eax_5 = (0xffffff00 & eax_4) & (0xff & (0xFF & (((0xff & eax_4) << 0x1) | ((0xff & eax_4) >> (8 - 0x1)))))
eax_6 = (0xffffff00 & eax_5) & (0xff & ((0xff & eax_5) + 0x1))
eax_7 = (0xffffff00 & eax_6) & (0xff & (0xFF & (((0xff & eax_6) >> 0x1) | ((0xff & eax_6) << (8 - 0x1)))))
ebx_1 = (0xffffff00 & ebx_0) & (0xff & ((0xff & ebx_0) ^ (0xff & eax_7)))
esi_1 = (esi_0 - 0x1)
esi_2 = (esi_1 - 0x4)
eax_11 = read_mem(esi_2,4)
eax_12 = (eax_11 ^ ebx_1)
eax_13 = (eax_12 + -0x5ca20a41)
eax_14 = not(eax_13, 4)
eax_15 = (eax_14 - 0x40d54c06)
eax_16 = (0xFFFFFFFF & ((eax_15 << 0x2) | (eax_15 >> (32 - 0x2))))
edi_1 = (edi_0 + eax_16)
>>> print("\n".join(find_all_dependent_registers_from_register_name(func, "ebp")))
ebp_1 = (ebp_0 + 0x4)
>>> print("\n".join(find_all_dependent_registers_from_register_name(func, "esp")))

>>> print("\n".join(find_all_dependent_registers_from_register_name(func, "ebx")))
esi_1 = (esi_0 - 0x1)
eax_1 = zx(read_mem(esi_1,1), 4)
eax_2 = (0xffffff00 & eax_1) & (0xff & ((0xff & eax_1) ^ (0xff & ebx_0)))
eax_3 = (0xffffff00 & eax_2) & (0xff & not((0xff & eax_2), 1))
eax_4 = (0xffffff00 & eax_3) & (0xff & ((0xff & eax_3) - -0x5d))
eax_5 = (0xffffff00 & eax_4) & (0xff & (0xFF & (((0xff & eax_4) << 0x1) | ((0xff & eax_4) >> (8 - 0x1)))))
eax_6 = (0xffffff00 & eax_5) & (0xff & ((0xff & eax_5) + 0x1))
eax_7 = (0xffffff00 & eax_6) & (0xff & (0xFF & (((0xff & eax_6) >> 0x1) | ((0xff & eax_6) << (8 - 0x1)))))
ebx_1 = (0xffffff00 & ebx_0) & (0xff & ((0xff & ebx_0) ^ (0xff & eax_7)))
esi_1 = (esi_0 - 0x1)
esi_2 = (esi_1 - 0x4)
eax_11 = read_mem(esi_2,4)
eax_12 = (eax_11 ^ ebx_1)
eax_13 = (eax_12 + -0x5ca20a41)
eax_14 = not(eax_13, 4)
eax_15 = (eax_14 - 0x40d54c06)
eax_16 = (0xFFFFFFFF & ((eax_15 << 0x2) | (eax_15 >> (32 - 0x2))))
esi_1 = (esi_0 - 0x1)
eax_1 = zx(read_mem(esi_1,1), 4)
eax_2 = (0xffffff00 & eax_1) & (0xff & ((0xff & eax_1) ^ (0xff & ebx_0)))
eax_3 = (0xffffff00 & eax_2) & (0xff & not((0xff & eax_2), 1))
eax_4 = (0xffffff00 & eax_3) & (0xff & ((0xff & eax_3) - -0x5d))
eax_5 = (0xffffff00 & eax_4) & (0xff & (0xFF & (((0xff & eax_4) << 0x1) | ((0xff & eax_4) >> (8 - 0x1)))))
eax_6 = (0xffffff00 & eax_5) & (0xff & ((0xff & eax_5) + 0x1))
eax_7 = (0xffffff00 & eax_6) & (0xff & (0xFF & (((0xff & eax_6) >> 0x1) | ((0xff & eax_6) << (8 - 0x1)))))
ebx_1 = (0xffffff00 & ebx_0) & (0xff & ((0xff & ebx_0) ^ (0xff & eax_7)))
ebx_2 = (ebx_1 ^ eax_16)

Lots of repetition caused by the ebx decryption, but we can also see a couple of main things:

  • our VIP register, esi, gets decremented by 5 (in this VMProtect VM, the VIP counts backwards), which means we're reading 1 byte from the bytecode, and then a final DWORD to get the address of the next handler
  • our stack pointer register, ebp, gets advanced by 4, which suggests we popped a DWORD off the virtual stack but didn't put anything back on (so we haven't done any arithmetic)

These two things alone are pretty good clues that we've loaded a DWORD from the stack and put it into a virtual register in the scratch space. We haven't handled memory writes, and the next important step would be to find all LowLevelILStoreSsa instructions and collect them somewhere too:

>>> llil_ssa[24]
<llil: [esp#0 + eax#7].d = ecx#7 @ mem#0 -> mem#1>
>>> type(llil_ssa[24])
<class 'binaryninja.lowlevelil.LowLevelILStoreSsa'>

With heuristics we would know that since esp is our scratch space base, we just need to resolve eax#7 and we'll know which number register we are writing to.
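Collecting the stores could start out as simply as this. It's a sketch, not what the plugin does: collect_stores is a made-up name, and it matches on the class name only so that it runs outside Binary Ninja.

```python
def collect_stores(instructions):
    """Return (address expression, stored value) pairs for every memory write."""
    stores = []
    for insn in instructions:
        # In the console, isinstance(insn, LowLevelILStoreSsa) is the tidier check.
        if type(insn).__name__ == "LowLevelILStoreSsa":
            stores.append((insn.dest, insn.src))
    return stores
```

Called as collect_stores(llil_ssa.instructions), this should return one pair for the store above, with dest holding the esp#0 + eax#7 expression and src holding ecx#7, ready to be fed through the same source resolution as the register assignments.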

In any case, the code is at https://github.com/samrussell/vmprotect_binja_plugin. Feel free to have a play with it and see what else you can do.

Takeaways

It took a while to find the right level to look at, but ultimately the Binary Ninja LLIL is very useful, the Python interface is fantastic for interacting with it, and it does about 90% of the heavy lifting for us. I suspect once we get to the arithmetic operations we'll run into some problems with managing where the flags originate from, and that will require us to step backwards through the instruction array rather than directly access these. The LLIL does keep track of some flags that are directly set (there are 8 versions of the carry flag in this handler, for example), but we will have to implement the flag calculation for arithmetic ourselves. Having said this, the flag usage in the handlers is fairly straightforward in earlier versions of VMProtect, and the problems only arise when handling the lifted opcodes in later analysis.

Another nice surprise was how the HLIL was really useful in finding the address of the first VM handler, and it would be nice if there was a way to customise this more. The dead code and obfuscated jump handling isn't perfect, but we do get a bunch of stuff for free from both the HLIL and the LLIL, and I feel like Binary Ninja is going to be quite a useful tool for handling a sample like this.

Anyway, I hope you got something out of this. Good luck and happy reversing.