Reversing x86 apps with Triton

Demo code for this article: github.com/samrussell/tritondemos/blob/main..

This is hopefully going to be part of a longer series of articles on decompiling and reversing apps, but for now, let's jump in with the base case.

Let's start with a really simple function:

push ebp
mov ebp, esp

mov eax, [ebp+8]
mov edx, [ebp+12]
add eax, edx

pop ebp
add esp, 8
ret

This takes two numbers, adds them together, and returns the result. If we assemble it (using a tool like defuse.ca/online-x86-assembler.htm#disassem..) we get the following:

"\x55\x89\xE5\x8B\x45\x08\x8B\x55\x0C\x01\xD0\x5D\x83\xC4\x08\xC3"

Looking good so far. Get yourself a copy of Triton, I run mine on WSL and do my coding in vscode (but this runs fine on linux/mac too if you prefer).

Let's get a basic emulator up and running and see what the output is. This is heavily based on one of the samples that comes with Triton:

#!/usr/bin/env python3

from triton import TritonContext, ARCH, Instruction, MemoryAccess, OPCODE, MODE, AST_REPRESENTATION
from pprint import pprint

# Memory mapping
BASE_EBP = 0x80000000
BASE_ESP = 0xA0000000

# Emulate the binary.
def emulate(Triton, pc):
    count = 0
    while pc:
        # Fetch opcode
        opcode = Triton.getConcreteMemoryAreaValue(pc, 16)

        # Create the Triton instruction
        instruction = Instruction()
        instruction.setOpcode(opcode)
        instruction.setAddress(pc)

        # Process
        Triton.processing(instruction)
        count += 1

        print("Emulating %s" % (instruction))

        #print instruction

        if instruction.getType() == OPCODE.X86.RET:
            break

        # Next
        pc = Triton.getConcreteRegisterValue(Triton.registers.eip)

    print('Instructions executed: %d' %(count))
    return


def setup_triton():
    Triton = TritonContext()
    Triton.setArchitecture(ARCH.X86)
    Triton.setMode(MODE.ALIGNED_MEMORY, True)

    return Triton

def run():
    Triton = setup_triton()

    # AST representation as Python syntax
    Triton.setAstRepresentationMode(AST_REPRESENTATION.PYTHON)

    entrypoint = 0x401000
    program = b"\x55\x89\xE5\x8B\x45\x08\x8B\x55\x0C\x01\xD0\x5D\x83\xC4\x08\xC3"

    Triton.setConcreteMemoryAreaValue(entrypoint, program)

    # Define a fake stack
    Triton.setConcreteRegisterValue(Triton.registers.ebp, BASE_EBP)
    Triton.setConcreteRegisterValue(Triton.registers.esp, BASE_ESP)

    # enable symbolic execution
    Triton.enableSymbolicEngine(True)

    emulate(Triton, entrypoint)

    return Triton


if __name__ == '__main__':

    Triton = run()

    print("Expression tree:")
    pprint(Triton.getSymbolicExpressions())

    print("Final state of eax:")
    print(Triton.getAstContext().unroll(Triton.getRegisterAst(Triton.registers.eax)))

    exit()

Put this into a .py file and run it, and what do we get? We can use the function Triton.getSymbolicExpressions() to get the list of all the expressions that Triton generated while emulating our code:

$ python3 tritonbasic1.py
Emulating 0x401000: push ebp
Emulating 0x401001: mov ebp, esp
Emulating 0x401003: mov eax, dword ptr [ebp + 8]
Emulating 0x401006: mov edx, dword ptr [ebp + 0xc]
Emulating 0x401009: add eax, edx
Emulating 0x40100b: pop ebp
Emulating 0x40100c: add esp, 8
Emulating 0x40100f: ret
Instructions executed: 8
Expression tree:
{0: ref_0 = ((0xa0000000 - 0x4) & 0xffffffff) # Stack alignment - 0x401000: push ebp,
 1: ref_1 = 0x80000000 # Aligned Byte reference - PUSH operation - 0x401000: push ebp,
 2: ref_2 = ((0x80000000 >> 24) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 3: ref_3 = ((0x80000000 >> 16) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 4: ref_4 = ((0x80000000 >> 8) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 5: ref_5 = (0x80000000 & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 10: ref_10 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401003: mov eax, dword ptr [ebp + 8],
 12: ref_12 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401006: mov edx, dword ptr [ebp + 0xc],
 14: ref_14 = ((ref_10 + ref_12) & 0xffffffff) # ADD operation - 0x401009: add eax, edx,
 22: ref_22 = 0x80000000 # POP operation - 0x40100b: pop ebp,
 23: ref_23 = ((ref_0 + 0x4) & 0xffffffff) # Stack alignment - 0x40100b: pop ebp,
 25: ref_25 = ((ref_23 + 0x8) & 0xffffffff) # ADD operation - 0x40100c: add esp, 8,
 26: ref_26 = (0x1 if (0x10 == (0x10 & (ref_25 ^ (ref_23 ^ 0x8)))) else 0x0) # Adjust flag - 0x40100c: add esp, 8,
 27: ref_27 = ((((ref_23 & 0x8) ^ (((ref_23 ^ 0x8) ^ ref_25) & (ref_23 ^ 0x8))) >> 31) & 0x1) # Carry flag - 0x40100c: add esp, 8,
 28: ref_28 = ((((ref_23 ^ (~(0x8) & 0xffffffff)) & (ref_23 ^ ref_25)) >> 31) & 0x1) # Overflow flag - 0x40100c: add esp, 8,
 29: ref_29 = ((((((((0x1 ^ (((ref_25 & 0xff) >> 0x0) & 0x1)) ^ (((ref_25 & 0xff) >> 0x1) & 0x1)) ^ (((ref_25 & 0xff) >> 0x2) & 0x1)) ^ (((ref_25 & 0xff) >> 0x3) & 0x1)) ^ (((ref_25 & 0xff) >> 0x4) & 0x1)) ^ (((ref_25 & 0xff) >> 0x5) & 0x1)) ^ (((ref_25 & 0xff) >> 0x6) & 0x1)) ^ (((ref_25 & 0xff) >> 0x7) & 0x1)) # Parity flag - 0x40100c: add esp, 8,
 30: ref_30 = ((ref_25 >> 31) & 0x1) # Sign flag - 0x40100c: add esp, 8,
 31: ref_31 = (0x1 if (ref_25 == 0x0) else 0x0) # Zero flag - 0x40100c: add esp, 8,
 33: ref_33 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x40100f: ret,
 34: ref_34 = ((ref_25 + 0x4) & 0xffffffff) # Stack alignment - 0x40100f: ret}
Final state of eax:
((((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) + ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0)) & 0xffffffff)

Huh... there's quite a lot going on here.

What's happened is that Triton has created a new reference for everything that happens in the system. Let's look at the first instruction, a simple push...

0: ref_0 = ((0xa0000000 - 0x4) & 0xffffffff) # Stack alignment - 0x401000: push ebp

A push instruction decrements ESP far enough to hold the new value, in this case a DWORD, 4 bytes

1: ref_1 = 0x80000000 # Aligned Byte reference - PUSH operation - 0x401000: push ebp

We then see the value of EBP being put somewhere in memory, and Triton keeps a reference

 2: ref_2 = ((0x80000000 >> 24) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 3: ref_3 = ((0x80000000 >> 16) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 4: ref_4 = ((0x80000000 >> 8) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 5: ref_5 = (0x80000000 & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp

We can access memory as bytes too, so it creates references for each byte. Yikes. One instruction in and we've already created a ton of references. Let's look at the next two:

 10: ref_10 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401003: mov eax, dword ptr [ebp + 8],
 12: ref_12 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401006: mov edx, dword ptr [ebp + 0xc],

Okay, so this is loading 0x00000000 into EAX and EDX, and naming them as ref_10 and ref_12.

 14: ref_14 = ((ref_10 + ref_12) & 0xffffffff) # ADD operation - 0x401009: add eax, edx,

This is kinda cool, it's doing eax = eax + edx, but when we overwrite a register it creates a brand new reference for us.

We can skip to the end for now, and you can see that we're loading the AST for EAX with Triton.getRegisterAst(Triton.registers.eax) and then using the AstContext object to unroll it. If we substitute ref_10 and ref_12 with the values we get this really ugly (but precise) chunk of code that follows:

((((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) + ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0)) & 0xffffffff)

Now, it's using 0x00000000 for the values we passed on the stack, but what if we want it to work for anything? Triton lets us mark parts of memory as symbolic so we can think about them as variables. Let's add a couple lines after we set up our fake stack:

Triton.symbolizeMemory(MemoryAccess(BASE_ESP+0x04, 4), "arg_04")
Triton.symbolizeMemory(MemoryAccess(BASE_ESP+0x08, 4), "arg_08")

I've used the IDA conventions here but you can use any name you like. If we run the code again we get this:

{0: ref_0 = arg_04 # aligned Byte reference,
 1: ref_1 = ((arg_04 >> 24) & 0xff) # Byte reference,
 2: ref_2 = ((arg_04 >> 16) & 0xff) # Byte reference,
 3: ref_3 = ((arg_04 >> 8) & 0xff) # Byte reference,
 4: ref_4 = (arg_04 & 0xff) # Byte reference,
 5: ref_5 = arg_08 # aligned Byte reference,
 6: ref_6 = ((arg_08 >> 24) & 0xff) # Byte reference,
 7: ref_7 = ((arg_08 >> 16) & 0xff) # Byte reference,
 8: ref_8 = ((arg_08 >> 8) & 0xff) # Byte reference,
 9: ref_9 = (arg_08 & 0xff) # Byte reference,
 10: ref_10 = ((0xa0000000 - 0x4) & 0xffffffff) # Stack alignment - 0x401000: push ebp,
 11: ref_11 = 0x80000000 # Aligned Byte reference - PUSH operation - 0x401000: push ebp,
 12: ref_12 = ((0x80000000 >> 24) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 13: ref_13 = ((0x80000000 >> 16) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 14: ref_14 = ((0x80000000 >> 8) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 15: ref_15 = (0x80000000 & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 20: ref_20 = arg_04 # MOV operation - 0x401003: mov eax, dword ptr [ebp + 8],
 22: ref_22 = arg_08 # MOV operation - 0x401006: mov edx, dword ptr [ebp + 0xc],
 24: ref_24 = ((ref_20 + ref_22) & 0xffffffff) # ADD operation - 0x401009: add eax, edx,
 32: ref_32 = 0x80000000 # POP operation - 0x40100b: pop ebp,
 33: ref_33 = ((ref_10 + 0x4) & 0xffffffff) # Stack alignment - 0x40100b: pop ebp,
 35: ref_35 = ((ref_33 + 0x8) & 0xffffffff) # ADD operation - 0x40100c: add esp, 8,
 36: ref_36 = (0x1 if (0x10 == (0x10 & (ref_35 ^ (ref_33 ^ 0x8)))) else 0x0) # Adjust flag - 0x40100c: add esp, 8,
 37: ref_37 = ((((ref_33 & 0x8) ^ (((ref_33 ^ 0x8) ^ ref_35) & (ref_33 ^ 0x8))) >> 31) & 0x1) # Carry flag - 0x40100c: add esp, 8,
 38: ref_38 = ((((ref_33 ^ (~(0x8) & 0xffffffff)) & (ref_33 ^ ref_35)) >> 31) & 0x1) # Overflow flag - 0x40100c: add esp, 8,
 39: ref_39 = ((((((((0x1 ^ (((ref_35 & 0xff) >> 0x0) & 0x1)) ^ (((ref_35 & 0xff) >> 0x1) & 0x1)) ^ (((ref_35 & 0xff) >> 0x2) & 0x1)) ^ (((ref_35 & 0xff) >> 0x3) & 0x1)) ^ (((ref_35 & 0xff) >> 0x4) & 0x1)) ^ (((ref_35 & 0xff) >> 0x5) & 0x1)) ^ (((ref_35 & 0xff) >> 0x6) & 0x1)) ^ (((ref_35 & 0xff) >> 0x7) & 0x1)) # Parity flag - 0x40100c: add esp, 8,
 40: ref_40 = ((ref_35 >> 31) & 0x1) # Sign flag - 0x40100c: add esp, 8,
 41: ref_41 = (0x1 if (ref_35 == 0x0) else 0x0) # Zero flag - 0x40100c: add esp, 8,
 43: ref_43 = arg_08 # Program Counter - 0x40100f: ret,
 44: ref_44 = ((ref_35 + 0x4) & 0xffffffff) # Stack alignment - 0x40100f: ret}
Final state of eax:
((arg_04 + arg_08) & 0xffffffff)

There's a bit more noise at the start where set up our references, but look at how clean it makes things further down:

 20: ref_20 = arg_04 # MOV operation - 0x401003: mov eax, dword ptr [ebp + 8],
 22: ref_22 = arg_08 # MOV operation - 0x401006: mov edx, dword ptr [ebp + 0xc],
 24: ref_24 = ((ref_20 + ref_22) & 0xffffffff) # ADD operation - 0x401009: add eax, edx

We can easily just sub these in to get the output we get at the end:

((arg_04 + arg_08) & 0xffffffff)

Let's try with something a bit bigger:

push ebp
mov ebp, esp

mov eax, [ebp+8]
mov ecx, [ebp+12]
xor edx, edx
div ecx

mov ecx, [ebp+16]
add eax, ecx
add eax, edx

pop ebp
add esp, 8
ret

Give this a go, remember to symbolize the new argument, and we get this output:

{0: ref_0 = arg_04 # aligned Byte reference,
 1: ref_1 = ((arg_04 >> 24) & 0xff) # Byte reference,
 2: ref_2 = ((arg_04 >> 16) & 0xff) # Byte reference,
 3: ref_3 = ((arg_04 >> 8) & 0xff) # Byte reference,
 4: ref_4 = (arg_04 & 0xff) # Byte reference,
 5: ref_5 = arg_08 # aligned Byte reference,
 6: ref_6 = ((arg_08 >> 24) & 0xff) # Byte reference,
 7: ref_7 = ((arg_08 >> 16) & 0xff) # Byte reference,
 8: ref_8 = ((arg_08 >> 8) & 0xff) # Byte reference,
 9: ref_9 = (arg_08 & 0xff) # Byte reference,
 10: ref_10 = arg_0C # aligned Byte reference,
 11: ref_11 = ((arg_0C >> 24) & 0xff) # Byte reference,
 12: ref_12 = ((arg_0C >> 16) & 0xff) # Byte reference,
 13: ref_13 = ((arg_0C >> 8) & 0xff) # Byte reference,
 14: ref_14 = (arg_0C & 0xff) # Byte reference,
 15: ref_15 = ((0xa0000000 - 0x4) & 0xffffffff) # Stack alignment - 0x401000: push ebp,
 16: ref_16 = 0x80000000 # Aligned Byte reference - PUSH operation - 0x401000: push ebp,
 17: ref_17 = ((0x80000000 >> 24) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 18: ref_18 = ((0x80000000 >> 16) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 19: ref_19 = ((0x80000000 >> 8) & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 20: ref_20 = (0x80000000 & 0xff) # Byte reference - PUSH operation - 0x401000: push ebp,
 25: ref_25 = arg_04 # MOV operation - 0x401003: mov eax, dword ptr [ebp + 8],
 27: ref_27 = arg_08 # MOV operation - 0x401006: mov ecx, dword ptr [ebp + 0xc],
 29: ref_29 = (0x0 ^ 0x0) # XOR operation - 0x401009: xor edx, edx,
 36: ref_36 = ((((ref_29) << 32 | ref_25) / ref_27) & 0xffffffff) # DIV operation - 0x40100b: div ecx,
 37: ref_37 = ((((ref_29) << 32 | ref_25) % ref_27) & 0xffffffff) # DIV operation - 0x40100b: div ecx,
 39: ref_39 = arg_0C # MOV operation - 0x40100d: mov ecx, dword ptr [ebp + 0x10],
 41: ref_41 = ((ref_36 + ref_39) & 0xffffffff) # ADD operation - 0x401010: add eax, ecx,
 49: ref_49 = ((ref_41 + ref_37) & 0xffffffff) # ADD operation - 0x401012: add eax, edx,
 57: ref_57 = 0x80000000 # POP operation - 0x401014: pop ebp,
 58: ref_58 = ((ref_15 + 0x4) & 0xffffffff) # Stack alignment - 0x401014: pop ebp,
 60: ref_60 = ((ref_58 + 0x8) & 0xffffffff) # ADD operation - 0x401015: add esp, 8,
 61: ref_61 = (0x1 if (0x10 == (0x10 & (ref_60 ^ (ref_58 ^ 0x8)))) else 0x0) # Adjust flag - 0x401015: add esp, 8,
 62: ref_62 = ((((ref_58 & 0x8) ^ (((ref_58 ^ 0x8) ^ ref_60) & (ref_58 ^ 0x8))) >> 31) & 0x1) # Carry flag - 0x401015: add esp, 8,
 63: ref_63 = ((((ref_58 ^ (~(0x8) & 0xffffffff)) & (ref_58 ^ ref_60)) >> 31) & 0x1) # Overflow flag - 0x401015: add esp, 8,
 64: ref_64 = ((((((((0x1 ^ (((ref_60 & 0xff) >> 0x0) & 0x1)) ^ (((ref_60 & 0xff) >> 0x1) & 0x1)) ^ (((ref_60 & 0xff) >> 0x2) & 0x1)) ^ (((ref_60 & 0xff) >> 0x3) & 0x1)) ^ (((ref_60 & 0xff) >> 0x4) & 0x1)) ^ (((ref_60 & 0xff) >> 0x5) & 0x1)) ^ (((ref_60 & 0xff) >> 0x6) & 0x1)) ^ (((ref_60 & 0xff) >> 0x7) & 0x1)) # Parity flag - 0x401015: add esp, 8,
 65: ref_65 = ((ref_60 >> 31) & 0x1) # Sign flag - 0x401015: add esp, 8,
 66: ref_66 = (0x1 if (ref_60 == 0x0) else 0x0) # Zero flag - 0x401015: add esp, 8,
 68: ref_68 = arg_08 # Program Counter - 0x401018: ret,
 69: ref_69 = ((ref_60 + 0x4) & 0xffffffff) # Stack alignment - 0x401018: ret}
Final state of eax:
(((((((((0x0 ^ 0x0)) << 32 | arg_04) / arg_08) & 0xffffffff) + arg_0C) & 0xffffffff) + (((((0x0 ^ 0x0)) << 32 | arg_04) % arg_08) & 0xffffffff)) & 0xffffffff)

There's some obvious stuff we can remove. The constant case to 32-bit with the 0xffffffff mask is unnecessary, the (0x0 ^ 0x0) is always going to be 0 (this xor edx,edx is the cheapest way to zero out a register), so it comes down to something more like:

(arg_04 / arg_08) + (arg_04 % arg_08) + arg_0C

We set the output to python syntax too, so everything Triton generates should be fine to just copypaste into another .py file.

I'm still figuring out how this all works, but this is what I've got so far. Some things I'd like to figure out for next time:

  • How to manage expressions being moved in and out of memory (note we push the old EBP and when we pull it out we get the concrete value)
  • How to simplify out some of the extraneous stuff that is always going to cancel itself out

That's all for now though, have fun out there!