# LODSB

In my previous post, we looked at a basic intro to binary analysis with Triton. In short, it allows us to take something like this:

``````push ebp
mov ebp, esp

mov eax, [ebp+8]
mov ecx, [ebp+12]
xor edx, edx
div ecx

mov ecx, [ebp+16]

pop ebp
ret
``````

and turn it into this

``````(arg_04 / arg_08) + (arg_04 % arg_08) + arg_0C
``````

Fun stuff, huh. There is one thing that interests me that Triton wasn't designed to do, and that is treating symbols as arrays. Let's look at an example:

``````mov eax, [esp]
mov edx, [eax]
mov eax, [edx]
mov eax, [eax]
ret
``````

We've used a bunch of registers, but it's basically a lookup through a multi-level array. What happens when we put it through Triton? We get this:

``````Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx]
Emulating 0x401007: mov eax, dword ptr [eax]
Emulating 0x401009: ret
Instructions executed: 5
Expression tree:
{0: ref_0 = esp,
3: ref_3 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401003: mov edx, dword ptr [eax],
7: ref_7 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # MOV operation - 0x401007: mov eax, dword ptr [eax],
9: ref_9 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401009: ret,
10: ref_10 = ((ref_0 + 0x4) & 0xffffffff) # Stack alignment - 0x401009: ret}
Final state of eax:
((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0)
Final state of ebp:
0x0
Final state of esp:
((esp + 0x4) & 0xffffffff)
``````

Huh. `eax` = 0. You can see it handles us setting a symbol name into `esp`, but it when we dereference a symbolic register it just defaults to 0. The lovely people on the Triton project tell me they've attempted to implement this in the engine, but this is still a work-in-progress. There's nothing to stop us shoehorning something in by ourselves though...

Remember our emulation loop looks like this:

``````    while pc:
# Fetch opcode
opcode = Triton.getConcreteMemoryAreaValue(pc, 16)

# Create the Triton instruction
instruction = Instruction()
instruction.setOpcode(opcode)

# Process
Triton.processing(instruction)
count += 1

print("Emulating %s" % (instruction))

#print instruction

if instruction.getType() == OPCODE.X86.RET:
break

# Next
pc = Triton.getConcreteRegisterValue(Triton.registers.eip)
``````

Once we've called `Triton.processing()`, we're totally able to access both the details of the `Instruction` and the state of the `TritonContext` engine. Here's a quick and dirty way to handle use this to handle basic dereferences:

``````        if instruction.isMemoryRead():
written_register, write_ast_node = instruction.getWrittenRegisters()
variable = expression.getAst().getSymbolicVariable()
alias = variable.getAlias()
newalias = "(%s)" % alias
Triton.symbolizeRegister(written_register, newalias)
``````

This works fine with simple derefs like `mov eax, [esp]`, and it's really up to us how we want to express these arrays that we're creating. Here's the end result:

``````\$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx]
Emulating 0x401007: mov eax, dword ptr [eax]
Emulating 0x401009: ret
Instructions executed: 5
Expression tree:
{6: ref_6 = ((esp)),
12: ref_12 = ((((esp)))),
13: ref_13 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401009: ret,
15: ref_15 = (esp)}
Final state of eax:
((((esp))))
Final state of ebp:
0x0
Final state of esp:
(esp)
``````

• Triton is smart enough to figure out when we've overwritten something. You'll note there are gaps in the reference numbers, this is because we override things (e.g. reusing `eax`, incrementing `eip`)
• It only took a few lines of code but we can now handle nested dereferences without any real effort

Things to watch out for:

• This dereferencing will work with `mov` instructions, but these aren't the only instructions that access memory (consider `pop` and `lodsb` for starters). Some of these other instructions will modify the pointer register, and we need to also handle incrementing/decrementing this and make sure we have some way of handling this
• If we alter the register that we're dereferencing we need to find a way to handle this too.

Let's consider some code like this:

``````mov eax, [esp]
mov edx, [eax]
mov eax, [edx+8]
mov eax, [eax]
mov ebx, [eax+4]
mov ecx, [eax-4]
ret
``````

We get this for output:

``````\$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx + 8]
Emulating 0x401008: mov eax, dword ptr [eax]
Emulating 0x40100a: mov ebx, dword ptr [eax + 4]
Emulating 0x40100d: mov ecx, dword ptr [eax - 4]
Emulating 0x401010: ret
Instructions executed: 7
Expression tree:
{6: ref_6 = ((esp)),
12: ref_12 = ((((esp)))),
15: ref_15 = (((((esp))))),
18: ref_18 = (((((esp))))),
19: ref_19 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401010: ret,
21: ref_21 = (esp)}
Final state of eax:
((((esp))))
Final state of ebp:
0x0
Final state of esp:
(esp)
``````

Clearly this is wrong... so we need to add a little more smarts to our lookup code. The big thing is that we look at `instruction.getReadRegisters()` and this doesn't get us the offset, so we need to find a `MemoryAccess` object instead. Let's try this on for size:

``````        if instruction.isMemoryRead():
written_register, write_register_ast_node = instruction.getWrittenRegisters()
expression_ast = expression.getAst()
#import pdb
#pdb.set_trace()
if expression_ast.getType() == AST_NODE.VARIABLE:
variable = expression_ast.getSymbolicVariable()
alias = variable.getAlias()
displacement = memory_access.getDisplacement().getValue()
newalias = "(%s)[0x%x]" % (alias, displacement)
#newalias = "(%s)" % alias
Triton.symbolizeRegister(written_register, newalias)
elif expression_ast.getType() == AST_NODE.CONCAT:
import pdb
pdb.set_trace()
pass
else:
import pdb
pdb.set_trace()
raise Exception("Unexpected ast node")
``````

And the output:

``````\$ python3 tritonbasic2.py
Emulating 0x401000: mov eax, dword ptr [esp]
Emulating 0x401003: mov edx, dword ptr [eax]
Emulating 0x401005: mov eax, dword ptr [edx + 8]
Emulating 0x401008: mov eax, dword ptr [eax]
Emulating 0x40100a: mov ebx, dword ptr [eax + 4]
Emulating 0x40100d: mov ecx, dword ptr [eax - 4]
Emulating 0x401010: ret
Instructions executed: 7
Expression tree:
{6: ref_6 = ((esp)[0x0])[0x0],
12: ref_12 = ((((esp)[0x0])[0x0])[0x8])[0x0],
15: ref_15 = (((((esp)[0x0])[0x0])[0x8])[0x0])[0x4],
18: ref_18 = (((((esp)[0x0])[0x0])[0x8])[0x0])[0xfffffffc],
19: ref_19 = ((((0x0) << 8 | 0x0) << 8 | 0x0) << 8 | 0x0) # Program Counter - 0x401010: ret,
21: ref_21 = (esp)[0x0]}
Final state of eax:
((((esp)[0x0])[0x0])[0x8])[0x0]
Final state of ebx:
(((((esp)[0x0])[0x0])[0x8])[0x0])[0x4]
Final state of ecx:
(((((esp)[0x0])[0x0])[0x8])[0x0])[0xfffffffc]
Final state of ebp:
0x0
Final state of esp:
(esp)[0x0]
``````

So this looks to handle nested memory reads, including indices. There's a little more to handle though, as I mentioned this still doesn't handle pop/lodsb style instructions, nor the more complicated `mov eax, [ebx*4+ecx+12]`, but you can see how the pattern works.

I hope this was useful, have fun hacking!