Bulk populating encrypted import tables in Binary Ninja
Using Binary Ninja's Python API to label all functions that are dynamically loaded by hash
Hashing function names slows down reversers
It's common for packed and otherwise obfuscated binaries to effectively use their own shellcode to populate the imports that they plan to use. This does two things:
- It hides imports from the reverser that might otherwise stick out in the import table
- It makes the packer/obfuscator code more portable: it can just add itself onto an existing binary
For those of you who aren't familiar with how you can populate your own import table, it looks something like this:
- Load the PEB (fs:[0x18][0x30] in x86, gs:[0x30][0x60] in x64)
- Load the head of the module list at PEB->Ldr->InLoadOrderModuleList
- Iterate through this list until you get back to the start (sad face) or find your module name
(By the way, Binary Ninja 3.2 now supports offset pointers, which tidies things up a lot when traversing win32 structs that are tied together using LIST_ENTRY pointers.)
For packers that want to obfuscate things further, they can hash the names of the DLLs and imported functions so that these don't show up when searching for strings. All we need to do when iterating is to hash the name of every DLL we get to and compare it against the hash we're looking for.
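The walk-and-compare loop can be modelled in plain Python. This is only a sketch: `Module`, `link`, `find_module_base`, and the use of `zlib.crc32` as the hash are stand-ins of mine, not the packer's real structures or hash, and the real list head lives in the loader data rather than in a module entry.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    """Hypothetical stand-in for one entry in the loader's module list."""
    name: str
    base: int
    flink: Optional["Module"] = None  # next entry in the circular list


def link(modules):
    """Chain the modules into a circular list and return its head."""
    for current, following in zip(modules, modules[1:] + modules[:1]):
        current.flink = following
    return modules[0]


def placeholder_hash(name):
    # Placeholder only: the real hash is specific to the sample.
    return zlib.crc32(name.lower().encode("utf-16-le"))


def find_module_base(head, wanted, hash_name):
    """Walk the list once, comparing each hashed name against `wanted`."""
    entry = head
    while True:
        if hash_name(entry.name) == wanted:
            return entry.base
        entry = entry.flink
        if entry is head:  # back at the start: the module is not loaded
            return None


head = link([
    Module("sample.exe", 0x400000),
    Module("ntdll.dll", 0x77000000),
    Module("KERNEL32.DLL", 0x76000000),
])
base = find_module_base(head, placeholder_hash("kernel32.dll"), placeholder_hash)
print(hex(base))
```

The "back at the start" check is the sad-face case from the list above: the walk has gone all the way round without a match.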
Once we have a module base address, we parse the PE header, get to the export table, and traverse both the name table and offset table in lock-step (these don't point to each other but rather the names and addresses are stored in the same order, so entry X in the name table matches entry X in the offset table). When we find the (hashed) name we're looking for, we return the corresponding function address.
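Here is a rough model of that export lookup. Strictly, the PE export directory pairs the name table index-for-index with an ordinal table, and each ordinal is the index into the address table, so the sketch keeps all three arrays. The export data, `get_proc_address`, and the `zlib.crc32` placeholder hash are invented for illustration.

```python
import zlib


def placeholder_hash(name: str) -> int:
    # Placeholder only: substitute the hash recovered from the sample.
    return zlib.crc32(name.encode("ascii"))


def get_proc_address(module_base, names, ordinals, function_rvas, wanted):
    """Return the address of the export whose hashed name equals `wanted`."""
    # names[i] and ordinals[i] describe the same export; the ordinal is the
    # index into function_rvas.
    for name, ordinal in zip(names, ordinals):
        if placeholder_hash(name) == wanted:
            return module_base + function_rvas[ordinal]
    return None  # no export hashes to the wanted value


# Made-up export data for illustration.
names = ["CloseHandle", "CreateFileW", "VirtualAlloc"]
ordinals = [2, 0, 1]
function_rvas = [0x1500, 0x2A00, 0x1000]

address = get_proc_address(0x76000000, names, ordinals, function_rvas,
                           placeholder_hash("CreateFileW"))
print(hex(address))
```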
What this means for us as reversers is we end up finding a function that looks like this:
Note: I've already reversed the GetModuleBase_ and GetProcAddress_ functions; these are an exercise for the reader :)
Once we've extracted the hashing function, we can run it over a few standard DLL names that we know will be loaded (kernel32, ntdll), confirm in the code, then update our labels:
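When matching candidate names against the constants in the binary, it helps to hash several spellings at once, because the loader stores module names as UTF-16 strings and the casing and the ".dll" suffix can vary. The sketch below is one way to do that; `candidate_hashes` is a hypothetical helper and `zlib.crc32` is only a placeholder for the sample's hash.

```python
import zlib
from itertools import product


def placeholder_hash(data: bytes) -> int:
    # Placeholder only: substitute the hash recovered from the sample.
    return zlib.crc32(data)


def candidate_hashes(dll_names):
    """Map hash -> description for common spellings and encodings of each name."""
    table = {}
    recasings = (str.lower, str.upper)
    encodings = ("ascii", "utf-16-le")
    for name, recase, encoding in product(dll_names, recasings, encodings):
        for spelling in (name, name + ".dll"):
            text = recase(spelling)
            table[placeholder_hash(text.encode(encoding))] = f"{text} ({encoding})"
    return table


table = candidate_hashes(["kernel32", "ntdll"])
for value, label in sorted(table.items()):
    print(f"{value:#010x}  {label}")
```

Any constant from the binary that appears as a key in `table` tells you both the DLL and the spelling the sample hashes.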
Now we can see that we're loading a bunch of functions from kernel32 and ntdll. Let's see if we can speed up this labelling a little.
Getting all the references
The Binary Ninja API is extensive, but it takes a bit of playing to find what you need. I recommend a combination of searching the API docs and experimenting in the Python console to piece things together.
For starters, we'll go to our function by double clicking on it, and make sure the signature is right.
The first argument is where we store the function address, the second argument is the address of our module, the third is the hash we expect, and the fourth we don't care about here. We'll update them to make sure the types are correct (press Y to change a function type) and here we are:
If you look in the bottom-left-hand corner you'll see a list of references to this function. We want to get this list programmatically.
We can do this with the bv.get_code_refs() method. Protip: if we highlight the top of the function, we'll get its address in the here variable, so we can just call bv.get_code_refs(here) and it'll do the same thing as manually typing in the address:
>>> [hex(x.address) for x in bv.get_code_refs(here)]
['0x12423a1', '0x1242c6d', '0x1242c7f', '0x1242c91', '0x1242ca3', '0x1242cb5', '0x1242ce3', '0x1242cf5', '0x1242d07', '0x1242d19', '0x1242d2b', '0x1242d3d', '0x1242d4f', '0x1242d61', '0x1242d73', '0x1242d85', '0x1242d97', '0x1242da9', '0x1242dbb', '0x1242dcd', '0x1242ddf', '0x1242df1', '0x1242e03', '0x1242e15', '0x1242e27', '0x1242e39', '0x1242e4b', '0x1242e5d', '0x1242e6f', '0x1242e81', '0x1242e93', '0x1242ea5', '0x1242eb7', '0x1242ec9', '0x1242edb', '0x1242eed', '0x1242eff', '0x1242f11', '0x1242f23', '0x1242f35', '0x1242f47', '0x1242f59', '0x1242f6b', '0x1242f7d', '0x1242f8f', '0x1242fa1', '0x1242fb3', '0x1243079', '0x124308b', '0x124309d', '0x12430af', '0x12430c1', '0x12430d3', '0x12430e5']
We want to get the HLIL instruction at each of those addresses. I couldn't find a direct way to get this, so for a given reference (ref below is one item from bv.get_code_refs(here)) we'll grab the function from the reference, look at its list of HLIL instructions, and extract the one that matches our address:
>>> list(filter(lambda x: x.address == ref.address, ref.function.hlil.instructions))[0]
<HLIL_CALL: GetProcAddress_(&data_12471ed, pKernel32, 0x4dd0a472, 0)>
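Indexing the filtered list with [0] raises IndexError if no HLIL instruction sits at the reference address. A slightly more defensive variant, sketched here against stand-in objects rather than real Binary Ninja ones, returns None instead. `hlil_call_at` is a hypothetical helper name.

```python
from types import SimpleNamespace


def hlil_call_at(instructions, address):
    """Return the first instruction at `address`, or None when nothing matches."""
    return next((inst for inst in instructions if inst.address == address), None)


# Stand-in objects so the helper can be tried outside Binary Ninja; inside the
# console you would pass ref.function.hlil.instructions and ref.address instead.
fake_instructions = [
    SimpleNamespace(address=0x12423A1, text="call A"),
    SimpleNamespace(address=0x1242C6D, text="call B"),
]
print(hlil_call_at(fake_instructions, 0x1242C6D).text)
print(hlil_call_at(fake_instructions, 0xDEADBEEF))
```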
Once we've got this instruction (stored as inst in the snippets below), we can extract the things we care about: the name of the DLL base variable we pass (pKernel32), the hash we're looking up (0x4dd0a472), and the variable where we plan to store the result (data_12471ed). We can extract these from the instruction_operands property as follows:
DLL variable name:
>>> inst.instruction_operands[2]
<HLIL_VAR: pKernel32>
>>> type(inst.instruction_operands[2])
<class 'binaryninja.highlevelil.HighLevelILVar'>
>>> inst.instruction_operands[2].var.name
'pKernel32'
Hash value:
>>> inst.instruction_operands[3]
<HLIL_CONST: 0x4dd0a472>
>>> inst.instruction_operands[3].constant
1305519218
>>> hex(inst.instruction_operands[3].constant)
'0x4dd0a472'
Output variable address:
>>> inst.instruction_operands[1]
<HLIL_CONST_PTR: &data_12471ed>
>>> inst.instruction_operands[1].constant
19165677
>>> hex(inst.instruction_operands[1].constant)
'0x12471ed'
Finally, when we want to set a variable name we use bv.define_data_var():
bv.define_data_var(hlil_inst.instruction_operands[1].constant, "void*", "p%s" % functionname)
Let's put it all together!
Scripting the bulk change
I've left the hash function out of this article as it is implementation-specific, but once we have a function that will generate hashes we can use lief to parse the export table of our DLL of choice and create a lookup table:
# hash_function_name is the hash routine lifted from the sample (not shown here)
from hashfunctionname import hash_function_name
import argparse
import lief

parser = argparse.ArgumentParser()
parser.add_argument("dllpath")
args = parser.parse_args()

# Parse the DLL and map hash -> export name for every exported function
binary = lief.PE.parse(args.dllpath)
lookup = {}
for function in binary.exported_functions:
    lookup[hash_function_name(function.name)] = function.name
print(lookup)
We've named our DLL reference variables pKernel32 and pNtDllLocal (note the HLIL uses the local var rather than the global one for ntdll), so we can make a lookup table for multiple DLLs if we structure it like this:
procaddress_by_hash = {
    'pNtDllLocal': {
        2777868780: 'A_SHAFinal',
        ...
    },
    'pKernel32': {
        584423213: 'AcquireSRWLockExclusive',
        ...
    }
}
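One way to assemble that nested table from several export lists is sketched below. `build_tables` is a hypothetical helper, the export lists are shortened for illustration (in practice they would come from lief), and `zlib.crc32` again stands in for the sample's hash. It also records hash collisions, since two exports sharing a hash would otherwise silently overwrite each other.

```python
import zlib


def placeholder_hash(name: str) -> int:
    # Placeholder only: substitute the hash recovered from the sample.
    return zlib.crc32(name.encode("ascii"))


def build_tables(exports_by_variable, hash_name=placeholder_hash):
    """Return {variable name: {hash: export name}} plus any colliding names."""
    tables, collisions = {}, []
    for variable, export_names in exports_by_variable.items():
        table = tables.setdefault(variable, {})
        for export in export_names:
            value = hash_name(export)
            if value in table and table[value] != export:
                collisions.append((variable, table[value], export))
                continue  # keep the first name seen for this hash
            table[value] = export
    return tables, collisions


# Shortened export lists, keyed by the variable names used in the HLIL.
procaddress_by_hash, collisions = build_tables({
    "pNtDllLocal": ["A_SHAFinal", "NtClose"],
    "pKernel32": ["AcquireSRWLockExclusive", "CloseHandle"],
})
print(procaddress_by_hash["pKernel32"][placeholder_hash("CloseHandle")])
print(collisions)
```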
Finally, here's the script that does all the heavy lifting:
# Run in the Binary Ninja Python console with GetProcAddress_ selected, so that
# `here` holds its address and bv, log_info and HighLevelILVar are in scope.
for ref in bv.get_code_refs(here):
    # Find the HLIL instruction that sits at the referencing address.
    hlil_inst = next((x for x in ref.function.hlil.instructions if x.address == ref.address), None)
    if hlil_inst is None:
        log_info("no HLIL instruction at %08X" % ref.address)
        continue
    operands = hlil_inst.instruction_operands
    if not isinstance(operands[2], HighLevelILVar):
        log_info("second arg isn't a var so can't check it")
        continue
    dll_name = operands[2].var.name
    if dll_name not in procaddress_by_hash:
        log_info("Couldn't find %s in lookup table" % dll_name)
        continue
    lookup_table = procaddress_by_hash[dll_name]
    name_hash = operands[3].constant
    if name_hash not in lookup_table:
        log_info("couldn't find hash %08X in lookup table" % name_hash)
        continue  # without this the lookup below raises KeyError
    functionname = lookup_table[name_hash]
    var_address = operands[1].constant
    log_info("setting var at %08X to %s" % (var_address, functionname))
    bv.define_data_var(var_address, "void*", "p%s" % functionname)
If we switch to the log tab we can see how it went, or just switch back to one of our references to see the results:
Now that these are all labelled, we'll be able to identify when they're being called later in the code.
Future work
There's one reference that I couldn't handle automatically because the DLL base reference isn't stored in an intermediate variable that I could rename. We can handle this case by checking the name of the function that is called here and looking up the DLL name by hash if it's calling GetModuleBase_(), but I'll leave this as an exercise for the reader.
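As a starting point for that exercise, here is a rough sketch of the decision logic, written against stand-in objects so it runs outside Binary Ninja. Everything in it is an assumption to check against your own sample: the `dll_variable_by_hash` table and its hash values are invented, `table_key_for` is a hypothetical helper, and it assumes the DLL-name hash is the first, constant argument to GetModuleBase_.

```python
from types import SimpleNamespace

# Hypothetical table mapping DLL-name hashes to the keys of procaddress_by_hash.
# The hash values here are invented for the example.
dll_variable_by_hash = {0x11111111: "pKernel32", 0x22222222: "pNtDllLocal"}


def table_key_for(operand, callee_name_of):
    """Return the procaddress_by_hash key for the module argument, or None."""
    if hasattr(operand, "var"):
        # Plain variable such as pKernel32: its name is the key.
        return operand.var.name
    params = getattr(operand, "params", None)
    if params and callee_name_of(operand) == "GetModuleBase_":
        # Assumes the DLL-name hash is the first argument and is a constant.
        return dll_variable_by_hash.get(getattr(params[0], "constant", None))
    return None


# Stand-in operands so the logic can be tried outside Binary Ninja.
as_variable = SimpleNamespace(var=SimpleNamespace(name="pNtDllLocal"))
as_call = SimpleNamespace(callee="GetModuleBase_",
                          params=[SimpleNamespace(constant=0x11111111)])

print(table_key_for(as_variable, lambda op: op.callee))
print(table_key_for(as_call, lambda op: op.callee))
```

Inside Binary Ninja you would keep the isinstance check from the script above rather than hasattr, and `callee_name_of` would need to resolve the nested call's destination to a function name. How best to do that depends on the API version, so check it in the console first.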
I hope you find this helpful. Happy hacking, everyone.