In a recent update to my tiny Binja plugin I've added something named 'Binary Fingerprint'. It looks roughly like this:
In x86/x64 assembly language we can group instructions according to their primary function. The groups are:
- █ Data operations (mov*, pop, push, lea)
- █ Floating point operations
- █ Arithmetic operations (add, xor, shr ...)
- █ Control-flow operations (call, jmp ...)
- █ Other operations
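To make the grouping concrete, here is a minimal sketch of how a mnemonic could be mapped to one of the five groups. It is not the plugin's actual code. The function names `classify` and `histogram`, the group labels, and every mnemonic beyond the ones listed above (mov*, pop, push, lea, add, xor, shr, call, jmp) are my assumptions for illustration, and the floating point check only covers x87-style mnemonics.

```python
from collections import Counter

# Mnemonics beyond add/xor/shr are assumed members of the arithmetic group.
ARITHMETIC = {"add", "sub", "xor", "and", "or", "not", "neg",
              "shl", "shr", "sar", "inc", "dec", "mul", "imul", "div", "idiv"}

# "ret" is an assumed addition to the call/jmp group.
FLOW = {"call", "jmp", "ret"}


def classify(mnemonic: str) -> str:
    """Return the (hypothetical) group label for a single x86/x64 mnemonic."""
    m = mnemonic.lower()
    # Data operations: the whole mov* family plus pop, push, lea.
    if m.startswith("mov") or m in {"pop", "push", "lea"}:
        return "data"
    # Assumption: treat x87-style mnemonics (fld, fadd, ...) as floating point.
    if m.startswith("f"):
        return "float"
    if m in ARITHMETIC:
        return "arithmetic"
    # Assumption: conditional jumps (je, jne, ...) belong with call/jmp.
    if m in FLOW or m.startswith("j"):
        return "flow"
    return "other"


def histogram(mnemonics):
    """Count how many instructions of a function fall into each group."""
    return Counter(classify(m) for m in mnemonics)
```

Feeding the mnemonics of a function into `histogram` gives a rough idea of its character before anything is drawn: a function dominated by "arithmetic" is a different beast than one that is mostly "data".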
For experiment's sake, we will analyze /bin/tar using my technique to spot interesting functions. The first type of function that catches my eye is a simple function like this:
Let's try to find something more useful. How about this one?
You might have noticed that some of the images look a bit strange, as if they were incomplete. The reason is my choice of space-filling curve, namely the Hilbert curve. Because this curve covers a 2^n × 2^n square, I cannot always fill the whole space: when a function has fewer instructions than the square has cells, I stop drawing and leave the rest blank. Why this curve, you ask? First and foremost, it gives a mapping between 1D and 2D space that preserves locality fairly well - a long stream of instructions from the same group forms a cluster instead of a line (which is what I would get if I filled the space line by line).
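For the curious, the mapping can be sketched in a few lines. This uses the well-known iterative index-to-coordinate conversion for the Hilbert curve and is not necessarily how the plugin does it. The names `d2xy`, `grid_side` and `layout` are hypothetical, and picking the smallest power-of-two square that fits is my assumption.

```python
def d2xy(side: int, d: int):
    """Map position d along the Hilbert curve to (x, y) in a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def grid_side(count: int) -> int:
    """Smallest power-of-two side whose square holds `count` cells."""
    side = 1
    while side * side < count:
        side *= 2
    return side


def layout(groups):
    """Place one group label per instruction along the curve.

    Cells past the last instruction stay None, i.e. blank in the image.
    """
    side = grid_side(len(groups))
    grid = [[None] * side for _ in range(side)]
    for d, group in enumerate(groups):
        x, y = d2xy(side, d)
        grid[y][x] = group
    return grid
```

A function with 5 instructions already needs a 4 × 4 grid, so 11 of its 16 cells stay empty, which is exactly the "incomplete" look. Consecutive positions on the curve are always neighbouring cells, and that is where the clustering comes from.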
To be quite honest, I don't think this fingerprint will revolutionize reverse engineering. Still, it won't hurt to take a look at it when facing a binary with many functions - at the very least it might give you some idea of where to look, or where not to look, at the beginning.