# Adding array-based memory access to Triton

Demo code for this article: https://github.com/samrussell/tritondemos/blob/main/tritonbasic2.py

In my [previous post](https://www.lodsb.com/reversing-x86-apps-with-triton), we looked at a basic intro to binary analysis with [Triton](https://triton.quarkslab.com/). In short, it allows us to take something like this:


```
push ebp
mov ebp, esp

mov eax, [ebp+8]
mov ecx, [ebp+12]
xor edx, edx
div ecx

mov ecx, [ebp+16]
add eax, ecx
add eax, edx

pop ebp
add esp, 8
ret
``` 

and turn it into this


```
(arg_04 / arg_08) + (arg_04 % arg_08) + arg_0C
``` 

Fun stuff, huh. There is one thing that interests me that Triton wasn't designed to do, and that is treating symbols as arrays. Let's look at an example:


```
mov eax, [esp]
mov edx, [eax]
mov eax, [edx]
mov eax, [eax]
ret
``` 

We've used a bunch of registers, but it's basically a lookup through a multi-level array. What happens when we put it through Triton? We get this:


```
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx]
Emulating 0x401007: mov eax, dword ptr [eax]
Emulating 0x401009: ret
Instructions executed: 5
Expression tree:
{0: ref_0 = esp,
 3: ref_3 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401003: mov edx, dword ptr [eax],
 7: ref_7 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401007: mov eax, dword ptr [eax],
 9: ref_9 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401009: ret,
 10: ref_10 = ((ref_0 + 0x4) & 0xffffffff) # Stack alignment - 0x401009: ret}
Final state of eax:
((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0)
Final state of ebp:
0x0
Final state of esp:
((esp + 0x4) & 0xffffffff)
``` 

Huh. `eax` = 0. You can see it handles us setting a symbol name into `esp`, but it when we dereference a symbolic register it just defaults to 0. The lovely people on the Triton project tell me they've attempted to implement this in the engine, but this is still a work-in-progress. There's nothing to stop us shoehorning something in by ourselves though...

Remember our emulation loop looks like this:


```
    while pc:
        # Fetch opcode
        opcode = Triton.getConcreteMemoryAreaValue(pc, 16)

        # Create the Triton instruction
        instruction = Instruction()
        instruction.setOpcode(opcode)
        instruction.setAddress(pc)

        # Process
        Triton.processing(instruction)
        count += 1

        print("Emulating %s" % (instruction))

        #print instruction

        if instruction.getType() == OPCODE.X86.RET:
            break

        # Next
        pc = Triton.getConcreteRegisterValue(Triton.registers.eip)
``` 

Once we've called `Triton.processing()`, we're totally able to access both the details of the `Instruction` and the state of the `TritonContext` engine. Here's a quick and dirty way to handle use this to handle basic dereferences:

```
        if instruction.isMemoryRead():
            read_register, read_ast_node = instruction.getReadRegisters()[0]
            written_register, write_ast_node = instruction.getWrittenRegisters()[0]
            if read_ast_node.getType() == AST_NODE.REFERENCE:
                expression = read_ast_node.getSymbolicExpression()
                variable = expression.getAst().getSymbolicVariable()
                alias = variable.getAlias()
                newalias = "(%s)[0]" % alias
                Triton.symbolizeRegister(written_register, newalias)
```

This works fine with simple derefs like `mov eax, [esp]`, and it's really up to us how we want to express these arrays that we're creating. Here's the end result:

```
$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx]
Emulating 0x401007: mov eax, dword ptr [eax]
Emulating 0x401009: ret
Instructions executed: 5
Expression tree:
{6: ref_6 = ((esp)[0])[0],
 12: ref_12 = ((((esp)[0])[0])[0])[0],
 13: ref_13 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401009: ret,
 15: ref_15 = (esp)[0]}
Final state of eax:
((((esp)[0])[0])[0])[0]
Final state of ebp:
0x0
Final state of esp:
(esp)[0]
```

Some cool things about this:

- Triton is smart enough to figure out when we've overwritten something. You'll note there are gaps in the reference numbers, this is because we override things (e.g. reusing `eax`, incrementing `eip`)
- It only took a few lines of code but we can now handle nested dereferences without any real effort

Things to watch out for:

- This dereferencing will work with `mov` instructions, but these aren't the only instructions that access memory (consider `pop` and `lodsb` for starters). Some of these other instructions will modify the pointer register, and we need to also handle incrementing/decrementing this and make sure we have some way of handling this
- If we alter the register that we're dereferencing we need to find a way to handle this too.

Let's consider some code like this:

```
mov eax, [esp]
mov edx, [eax]
mov eax, [edx+8]
mov eax, [eax]
mov ebx, [eax+4]
mov ecx, [eax-4]
ret
```

We get this for output:

```
$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx + 8]
Emulating 0x401008: mov eax, dword ptr [eax]
Emulating 0x40100a: mov ebx, dword ptr [eax + 4]
Emulating 0x40100d: mov ecx, dword ptr [eax - 4]
Emulating 0x401010: ret
Instructions executed: 7
Expression tree:
{6: ref_6 = ((esp)[0])[0],
 12: ref_12 = ((((esp)[0])[0])[0])[0],
 15: ref_15 = (((((esp)[0])[0])[0])[0])[0],
 18: ref_18 = (((((esp)[0])[0])[0])[0])[0],
 19: ref_19 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401010: ret,
 21: ref_21 = (esp)[0]}
Final state of eax:
((((esp)[0])[0])[0])[0]
Final state of ebp:
0x0
Final state of esp:
(esp)[0]
```

Clearly this is wrong... so we need to add a little more smarts to our lookup code. The big thing is that we look at `instruction.getReadRegisters()` and this doesn't get us the offset, so we need to find a `MemoryAccess` object instead. Let's try this on for size:

```
        if instruction.isMemoryRead():
            memory_access, read__memory_ast_node = instruction.getLoadAccess()[0]
            read_register, read_register_ast_node = instruction.getReadRegisters()[0]
            written_register, write_register_ast_node = instruction.getWrittenRegisters()[0]
            if read_register.getName() != "unknown":
                expression = read_register_ast_node.getSymbolicExpression()
                expression_ast = expression.getAst()
                #import pdb
                #pdb.set_trace()
                if expression_ast.getType() == AST_NODE.VARIABLE:
                    variable = expression_ast.getSymbolicVariable()
                    alias = variable.getAlias()
                    displacement = memory_access.getDisplacement().getValue()
                    newalias = "(%s)[0x%x]" % (alias, displacement)
                    #newalias = "(%s)[0]" % alias
                    Triton.symbolizeRegister(written_register, newalias)
                elif expression_ast.getType() == AST_NODE.CONCAT:
                    import pdb
                    pdb.set_trace()
                    pass
                else:
                    import pdb
                    pdb.set_trace()
                    raise Exception("Unexpected ast node")
```

And the output:

```
$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx + 8]
Emulating 0x401008: mov eax, dword ptr [eax]
Emulating 0x40100a: mov ebx, dword ptr [eax + 4]
Emulating 0x40100d: mov ecx, dword ptr [eax - 4]
Emulating 0x401010: ret
Instructions executed: 7
Expression tree:
{6: ref_6 = ((esp)[0x0])[0x0],
 12: ref_12 = ((((esp)[0x0])[0x0])[0x8])[0x0],
 15: ref_15 = (((((esp)[0x0])[0x0])[0x8])[0x0])[0x4],
 18: ref_18 = (((((esp)[0x0])[0x0])[0x8])[0x0])[0xfffffffc],
 19: ref_19 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401010: ret,
 21: ref_21 = (esp)[0x0]}
Final state of eax:
((((esp)[0x0])[0x0])[0x8])[0x0]
Final state of ebx:
(((((esp)[0x0])[0x0])[0x8])[0x0])[0x4]
Final state of ecx:
(((((esp)[0x0])[0x0])[0x8])[0x0])[0xfffffffc]
Final state of ebp:
0x0
Final state of esp:
(esp)[0x0]
```

So this looks to handle nested memory reads, including indices. There's a little more to handle though, as I mentioned this still doesn't handle pop/lodsb style instructions, nor the more complicated `mov eax, [ebx*4+ecx+12]`, but you can see how the pattern works.

I hope this was useful, have fun hacking!