16 April 2018

Binary Fingerprint

In a recent update to my tiny Binja plugin, I've added something called 'Binary Fingerprint'. It looks roughly like this:

To be honest, it's nothing groundbreaking, but let me explain everything from the start.

In x86/x64 assembly language, we can group instructions according to their primary function. Those groups are:

  • Data operations (mov*, pop, push, lea)
  • Floating point operations
  • Arithmetic operations (add, xor, shr ...)
  • Dataflow operations, i.e. control flow (call, jmp ...)
  • Other operations
We assign a different color to each instruction group. Now let's map all assembly instructions in a function into two-dimensional space. What we get is an image showing the layout of the different instruction groups. What does this tell us about the function?
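Here is a minimal Python sketch of the grouping and coloring step, assuming each instruction is available as a plain mnemonic string. The mnemonic tables and RGB values below are hypothetical and far from complete; they are an illustration, not the plugin's actual lists.

```python
# Hypothetical mnemonic tables; a real classifier would need much fuller lists.
DATA = {"pop", "push", "lea", "xchg"}            # plus anything starting with "mov"
ARITH = {"add", "sub", "xor", "and", "or", "not", "neg", "shl", "shr",
         "sar", "rol", "ror", "mul", "imul", "div", "idiv", "inc", "dec"}
FLOW = {"call", "ret", "jmp", "loop"}            # plus every "j*" conditional jump
SSE_FP = {"addss", "addsd", "subss", "subsd", "mulss", "mulsd",
          "divss", "divsd", "sqrtss", "sqrtsd"}

# Hypothetical RGB color for each group.
COLORS = {
    "data": (66, 133, 244),
    "float": (255, 193, 7),
    "arith": (219, 68, 55),
    "flow": (15, 157, 88),
    "other": (158, 158, 158),
}

def classify(mnemonic: str) -> str:
    """Map an x86/x64 mnemonic to one of the five instruction groups."""
    m = mnemonic.lower()
    if m.startswith("mov") or m in DATA:
        return "data"
    if m.startswith("f") or m in SSE_FP:   # x87 mnemonics all start with "f"
        return "float"
    if m in ARITH:
        return "arith"
    if m in FLOW or m.startswith("j"):     # jmp, je, jne, jg, ...
        return "flow"
    return "other"

def colorize(mnemonics):
    """Turn a function's instruction stream into a list of RGB colors."""
    return [COLORS[classify(m)] for m in mnemonics]
```

For example, `[classify(m) for m in ["mov", "xor", "call", "fld", "nop"]]` gives `['data', 'arith', 'flow', 'float', 'other']`. Checking data moves first means SSE moves like `movss` land in the data group, which matches the "mov*" rule in the list above.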

For experiment's sake, we will analyze /bin/tar using this technique to spot interesting functions. The first type of function that catches my eye is a simple one like this:

Let's see: a lot of data operations and some calls. What might this function do?
OK, it's just freeing some data, so we can probably mark it as not very interesting for further analysis. The same principle applies to every fingerprint showing mostly one group of instructions.
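If you want to turn "mostly one group of instructions" into a number rather than eyeballing it, one option is the share of the most common group. This sketch assumes every instruction has already been labeled with a group name (e.g. "data", "flow"); the function name and the 0.8 threshold are hypothetical choices, not something the plugin defines.

```python
from collections import Counter

def dominant_group(groups, threshold=0.8):
    """Return (group, share) if a single group makes up at least `threshold`
    of the instructions, otherwise None. The default threshold is arbitrary."""
    if not groups:
        return None
    group, count = Counter(groups).most_common(1)[0]
    share = count / len(groups)
    return (group, share) if share >= threshold else None
```

A function with nine data operations and one call yields `("data", 0.9)` and can be deprioritized; a mix like `["data", "flow", "arith"]` yields `None`.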

Let's try to find something more useful. How about this one?

Lots of arithmetic operations. If we are looking for cryptographic, key generation or (un)packing functions, we might want to check all functions exhibiting a similar characteristic.
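Checking every function with a similar characteristic can be as simple as ranking them by their share of arithmetic instructions. In this sketch, `functions` is a hypothetical dict mapping function names to lists of group labels; collecting those labels from Binary Ninja is not shown. The `min_len` cutoff is also an assumption, meant to skip tiny functions whose ratios are mostly noise.

```python
def rank_by_group_share(functions, group="arith", min_len=16):
    """Sort functions by the fraction of their instructions in `group`,
    highest first. Functions shorter than `min_len` are skipped."""
    ranked = []
    for name, groups in functions.items():
        if len(groups) < min_len:
            continue
        share = groups.count(group) / len(groups)
        ranked.append((name, share))
    # sorted() is stable, so ties keep their original order
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With `{"sub_a": ["arith"] * 15 + ["flow"] * 5, "sub_b": ["data"] * 18 + ["arith"] * 2, "tiny": ["arith"] * 3}`, the result is `[("sub_a", 0.75), ("sub_b", 0.1)]`; `tiny` is dropped for being too short.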

Last one:

Wow, that looks messy: lots of dataflow mixed with arithmetic operations. That might indicate what I call a 'routing' function: lots of decisions about where execution goes next, but not many real data operations.
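The "messy" look can be approximated numerically too, for example by how often consecutive instructions switch groups, combined with the shares of dataflow and data instructions. This metric and its thresholds are a hypothetical illustration, not part of the plugin; the input is again a list of group labels.

```python
def mixing_profile(groups):
    """Return (switch_rate, flow_share) for a sequence of group labels.

    switch_rate: fraction of adjacent instruction pairs whose groups differ
                 (high values give the fragmented, 'messy' picture).
    flow_share:  fraction of instructions in the dataflow (call/jmp) group.
    """
    if not groups:
        return 0.0, 0.0
    switches = sum(1 for a, b in zip(groups, groups[1:]) if a != b)
    switch_rate = switches / max(1, len(groups) - 1)
    flow_share = groups.count("flow") / len(groups)
    return switch_rate, flow_share

def looks_like_routing(groups, min_switch=0.5, min_flow=0.3, max_data=0.2):
    """Hypothetical thresholds: frequent switching, plenty of dataflow,
    little data movement."""
    switch_rate, flow_share = mixing_profile(groups)
    data_share = groups.count("data") / len(groups) if groups else 0.0
    return (switch_rate >= min_switch and flow_share >= min_flow
            and data_share <= max_data)
```

An alternating `["arith", "flow", "arith", "flow"]` stream gives a profile of `(1.0, 0.5)` and is flagged, while a long run of `"data"` labels is not.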

You might have noticed that some of the images look a bit strange, as if they were incomplete. The reason is my choice of space-filling curve, namely the Hilbert curve. Because this curve covers a 2^n × 2^n grid, a function's instructions rarely fill the whole square exactly, so I stop drawing after the last instruction and leave the rest of the space blank. Why this curve, you ask? First and foremost, it gives a mapping between 1D and 2D space that preserves locality fairly well: a long stream of instructions from the same group forms a cluster instead of a line (which is what I would get if I filled the space row by row).
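The mapping itself can be sketched with the classic iterative Hilbert "distance to (x, y)" conversion, followed by picking the smallest 2^n × 2^n square that fits the instruction stream. This is a generic sketch of the technique, not the plugin's code; the function names are hypothetical.

```python
def hilbert_d2xy(side, d):
    """Map distance d along a Hilbert curve to (x, y) in a side x side grid.
    `side` must be a power of two (classic iterative algorithm)."""
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def layout(colors, blank=None):
    """Place a 1D list of colors on the smallest 2^n x 2^n grid that fits
    them, following the Hilbert curve; leftover cells stay `blank`."""
    side = 1
    while side * side < len(colors):
        side *= 2
    grid = [[blank] * side for _ in range(side)]
    for d, color in enumerate(colors):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = color
    return grid
```

With five instructions the smallest square is 4 × 4, so eleven of the sixteen cells stay blank, which is exactly the "incomplete" look. Because consecutive distances always land on neighboring cells, a run of same-group instructions stays clustered.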

To be quite honest, I don't think this fingerprint will revolutionize reverse engineering. Still, it won't hurt to take a look at it when facing a binary with many functions: at the very least, it might give you an idea of where to look (or not to look) first.