16 April 2018

Binary Fingerprint

In recent update to my tiny Binja plugin I've added something named 'Binary Fingerprint'. It looks roughly like this:

It does not represent anything groundbreaking to be honest, but let me explain everything from the start.

In x86/x64 assembly language we can group instructions according to their primary function. Those are:

  • Data operations (mov*, pop, push, lea)
  • Floating point operations
  • Arthmetic operations (add, xor, shr ...)
  • Dataflow operations (call, jmp ...)
  • Other operations
To every instruction group we assign different color. Now lets map all assembly instructions in a function into two dimensional space. What we get is an image representing layout of different groups of instruction. What this tells us about function?

For experiments sake we will analyze /bin/tar using my technique to spot interesting functions. First type of function that attracts my eye are simple functions like this:

Let's see - a lot of data operations and some calls. What this function might do?
Ok, just freeing some data - we can probably mark it not so interesting for further analysis. Same principle applies to every fingerprint showing mostly one group of instruction.

Let's try to find something more useful. How about this one?

Lots of arithmetic operations - if we are looking for cryptographic/key generation/(un)packing type of function might we want to check all exhibiting similar characteristic.

Last one:

Wow, that looks messy - lots of dataflow mixed with arithmetic operations. That might indicate something I call 'routing' function - lots of decisions about future flow but not much of real data operations.

You might have noticed that some of the images look bit strange - like incomplete. Reason for that is choice of space filling curve - namely Hilbert Curve. Because this curve spans over 2n * 2n squares I cannot always fill whole space and need to stop drawing leaving rest of the space blank. Why this curve you ask? First and foremost reason is that it give a mapping between 1D and 2D space that fairly well preserves locality - long stream of instructions from the same group will form a cluster instead of a line (if I would have decided to fill space line by line).

To be quite honest, I don't think this fingerprint revolutionize reverse engineering. Still, it won't hurt to take a look at it while facing some binary with many functions - at least it might give you some idea where to look or not to look at the beginning.

5 March 2018

Plugin: 10k view on functions

I'm always writing code for all the wrong reasons. Last weekend I had an idea to do large scale testing of one of my plugins for Binary Ninja - Annotator. There is however a little use of testing your plugin on code you wrote and compiled yourself, so at some point you have to hit the proverbial road and start testing things in the wild. Before I even had some kind of structured plan I've opened some binaries on the system. I think I've picked up chmod. Now, I was interested in in some not terribly complicated functions that call some libc functions. I've opened one at random - 10 instructions and no calls. Next one - 30 basic blocks, 130 instructions (or something). I think I gave up with manual hit-and-miss after third one.

So, I've started writing another plugin to help me with development of my another plugin. I've decided to simply count instruction, basic blocks, functions calls and code xrefs (last one inspired by one person on binary ninja slack). After quickly putting together some code I've decided to put more structure around it so I can extend it later on with additional reports. I've named it Keyhole.

My biggest problem right now is that HTML widget offered by QT has very weak support for styles, therefore the report looks like, well, let's just say far from my ascetics expectations. I guess I will have to, instead of relying on built-in browser save it on disk and launch separate browser instance to pick it up from there. This time with proper styling.

And here question for my lovely audience - what other characteristics you are looking at before you even begin reversing a binary. Is there anything you might consider worth adding? My idea is to add some sort of entropy heat map (like here) and some other things. Well, that will have to wait of course, at least until I wrap up changes to Annotator.

24 January 2018

Binary Ninja Recipes #2

This time I want to explore two problems I saw before while writing plugins for Binary Ninja. First problem, steaming from development of Annotator plugin (and I need to implement what I've learned here) and second is influenced by paper on Static Analysis and theory of lattices (if of course I've understood it correctly).

Problem 1: walk the graph

Let's say we want to track the state of a given variable in a program and all standard methods (get_rag_value_at(), SSA) don't apply. Good example is my plugin where I want to track instructions that influenced given variable (in that case function argument). OK, I'm cheating a bit here in a sense that I haven't yet tried SSA approach - more about that in next time. For now, let's get back to our problem at hand.

In BinaryNinja blocks of a given function are available through an array. Let's take a look at the example bellow

So now we access those blocks programmatically: After running this code we get following result:
1 -> 0x804840bL
2 -> 0x8048450L
3 -> 0x8048435L
4 -> 0x8048455L
I guess you can see the problem right away. When we iterate over blocks we get them sequentially,but not exactly in the order that actual code might execute. Here, we will be processing blocks 2 and 3 after each other, while actually they will never be executed in the same code run (I'm assuming you are reading this in bright future where speculative execution bug was addressed once and for all). Truth be told, all functions parameters should be placed on the stack/in registers in the same block the function is being called, but there is absolutely no guarantee about that. I wasn't sure about that so I've asked Gynvael, to which he responded - "well, for sure it will happen in the same function ...". Thanks buddy.

Fortunately it shouldn't be that difficult to fix that. Well, for certain definition of fix. As you can see there is a simple recursive descent function tracking visited blocks. We are also taking advantage of nifty API feature where every block has incoming and outgoing edges that actually point to other blocks.

So, does this work? Of course it does. Does it solve all problems? No and here is why. This code will work well for all functions with linear flow (with just conditional statements). Things get bit hairy when we introduce blocks with, what BinaryNinja calls back edges. In simple terms - loop statements.

Problem 2: find all paths

So it happened - we hit the loop condition. Like one here
Checking blocks again...
1 -> 0x804840bL
2 -> 0x8048444L
3 -> 0x8048425L
4 -> 0x804844aL
If we use our recursive descent we get two paths: 1->2->3 and 1->2->4. We clearly see this is incorrect and reason for that is condition preventing from revisiting a block that we have already visited. We should be getting 1->2->4 and 1->[2->3->2]*->4 (simplified to 1->2->3->2->4). So, now we know, that blocks can be revisited. What shouldn't be revisited? Of course, edges. Take a look a the code. I think code explains itself pretty well, so there is no point linger too long around it. You might wonder why I'm making local copy of visited edges list. It is fairly simple - in the example bellow you can see that we branch in block 2 and one call stack of walk() is using edge 2->4 and later on, another call stack needs to take this edge again. If I keep single list of visited edges my search does terminate on block 2 missing last step of a path. Fun func: as I was told yesterday, and I blame my poor knowledge of CS I've just reinvented a variant ofrecursive DFS algorithm.

I have tested this code of relatively simple samples so if you have something more complex and it breaks horribly please let me know. Right now I'm just hoping someone will find it useful and I haven't spent my evening doing poor's man implementation of SSA. Well, only one way to find out. See you soon :)