8 January 2019

Grumpy Unicorn

Intro

The last few evenings I've spent playing with the Unicorn Engine. Well, "last few" is a figure of speech here, as I had been planning to write this text for a long time. Anyhow, I finally sat down to learn it a bit more, as I never really had a chance to develop a better understanding of and flow with this engine.
In the end I've solved three optimization challenges and one obfuscation problem. I've also read some code written by other people (hi Gynvael). That does not make me an expert. Far from it. I have, however, learned something about it and I'm ready to complain.

What is wrong with the engine

Ultimately there is nothing wrong - the engine works well (minus some weird quirks in interpreting certain instructions, but if I remember correctly QEMU is the one to blame).
People I've talked to complained about the speed of execution, but in my cases it wasn't an important factor. What is more, I haven't done any proper benchmarks, so I don't really have an informed opinion. Well, to be honest, I do see one place where execution slows down, but this is just an assumption. More on that later.

So why does this section even mention wrongness in the first place? Simply because there is a lot of wrong (in my opinion) in what the API offers, how it works and how some things are structured. I'm not even going to start ranting about the documentation, because at least there are some code samples covering a wide array of functionality that one can read. Still, proper docs like those provided by, for example, Binary Ninja would be nice.

Superfluous const naming

I'm not a big fan of the typical import style where you pollute the main namespace with all possible functions and classes, like from unicorn import *.
The style I'm accustomed to the most comes from our Python style guide, therefore if there is no proper module nesting I can always do import unicorn, and when creating classes I know where everything comes from.
Now, what I like about Unicorn is that constants have their own namespace. Even better, every architecture has its own namespace. And while they do, why oh why are they named with an architecture prefix? Let me explain with this tiny code sample:

import unicorn as un  
import unicorn.x86_const as const  
...  
engine.reg_write(const.UC_X86_REG_RAX, ret_rax)  
...  

So, I've imported x86_const - I already know that the RAX register constant comes from Unicorn and from x86_const. I don't want to write UC_X86_REG_RAX every time I want to access it.
I know this is just a tiny inconvenience and pretty much any reasonable editor will complete it for you after you've typed it once, but still, this can be improved.
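
As a workaround, nothing stops us from building a shorter alias namespace ourselves. A minimal sketch of my own making (not part of the Unicorn API) that strips the prefix once at import time:

import unicorn.x86_const as x86_const

class Reg:
    pass

# copy every UC_X86_REG_* constant onto Reg without the prefix
for name in dir(x86_const):
    if name.startswith("UC_X86_REG_"):
        setattr(Reg, name[len("UC_X86_REG_"):], getattr(x86_const, name))

# now engine.reg_write(Reg.RAX, ret_rax) instead of const.UC_X86_REG_RAX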

Setup phase

You start a new emulation project by writing pretty much exactly the same code every time:

engine = un.Uc(un.UC_ARCH_X86, un.UC_MODE_32)  
  
# Setup Code section  
engine.mem_map(BASE, SIZE)  
  
# Setup stack  
engine.mem_map(STACK_ADDR, SIZE)  
engine.reg_write(const.UC_X86_REG_ESP, STACK_ADDR + (SIZE // 2))
engine.reg_write(const.UC_X86_REG_EBP, STACK_ADDR + (SIZE // 2))
  
# Copy code  
engine.mem_write(BASE, read_prog(sys.argv[1]))  
  
# start  
engine.emu_start(START, STOP)  

All the values like BASE, SIZE, START and STOP you have to retrieve manually, by reading the header either via readelf or by throwing the binary into your reverse engineering platform of choice.
This is tedious and I would really love some nice helper functions. They could be either high level, like load_elf(), or some medium-level shortcut methods on the Uc engine, like setup_stack(bottom, top).
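
To show what I mean, here is a minimal sketch of such a medium-level helper; setup_stack() is a hypothetical name of mine, not something the Uc class provides:

import unicorn.x86_const as const

def setup_stack(engine, addr, size):
    # addr and size must be page-aligned, as mem_map requires
    engine.mem_map(addr, size)
    middle = addr + size // 2
    engine.reg_write(const.UC_X86_REG_ESP, middle)
    engine.reg_write(const.UC_X86_REG_EBP, middle)
    return middle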

Another thing that really annoys me is how sometimes we need to skip certain instructions, either because they make a call into a shared library (which we obviously have not loaded) or because they perform some I/O operations. Typical code doing such a task looks like this:

skip_list = [  
0x40058A, # call _printf  
]  
  
if address in skip_list:  
 engine.reg_write(const.UC_X86_REG_EIP, address+size)  

Not only do you have to maintain a list of instructions to skip, you also have to manually adjust the instruction pointer. The first problem is hard to solve automatically, because the engine can't know which exact instructions we want to skip, but the manual register adjustment is just ugly. I would love to have this as core functionality.
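
Until something like that exists in the core, the best we can do is wrap it once ourselves. A small sketch under the current API (install_skip_hook() is my own name, the callback signature is the standard UC_HOOK_CODE one):

import unicorn as un
import unicorn.x86_const as const

def install_skip_hook(engine, skip_list, ip_reg=const.UC_X86_REG_EIP):
    # jump over any instruction whose address is on the skip list
    def skip(uc, address, size, user_data):
        if address in skip_list:
            uc.reg_write(ip_reg, address + size)
    engine.hook_add(un.UC_HOOK_CODE, skip)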

Hooks

The worst thing, in my personal opinion, is how we are forced to use hooks. For every type of hook you define one global callback function. One.
Now, let's say you want to do three different operations at three distinct addresses - of course we all know how this is going to look in the code - a tree of ifs.
A typical example of this situation can be observed in the Unicorn tutorial by Eternal Red:

if address in skip_list:
    engine.reg_write(const.UC_X86_REG_RIP, address + size)
elif address == 0x400560:
    c = engine.reg_read(const.UC_X86_REG_RDI)
    key.append(chr(c))
    engine.reg_write(const.UC_X86_REG_RIP, address + size)
elif address == FIB_START:
    arg0 = engine.reg_read(const.UC_X86_REG_RDI)
    rsi = engine.reg_read(const.UC_X86_REG_RSI)
    arg1 = u32(engine.mem_read(rsi, 4))

    if (arg0, arg1) in know_vals:
        ret_rax, ret_ref = know_vals[(arg0, arg1)]
        engine.reg_write(const.UC_X86_REG_RAX, ret_rax)
        engine.mem_write(rsi, p32(ret_ref))
        engine.reg_write(const.UC_X86_REG_RIP, 0x4006F1)
    else:
        stack.append((arg0, arg1, rsi))

The same of course goes for UC_HOOK_MEM_* and other types of hooks. It also means that your Python function gets called for every single instruction you execute - I can only imagine what impact that has on performance. This mess begs for per-address hooks (but truth be told, I don't know QEMU internals well enough to say whether this is even possible).
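
In the meantime, the tree of ifs can at least be hidden behind a small dispatcher. A sketch of my own construction (it cleans up the code, but the callback still fires on every instruction, so it does nothing for the performance concern):

import unicorn as un

class AddressDispatcher(object):
    def __init__(self, engine):
        self.handlers = {}
        engine.hook_add(un.UC_HOOK_CODE, self.dispatch)

    def at(self, address, handler):
        # handler(uc, address, size) runs only when execution hits this address
        self.handlers[address] = handler

    def dispatch(self, uc, address, size, user_data):
        handler = self.handlers.get(address)
        if handler is not None:
            handler(uc, address, size)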

Misc

There are two more problems that you need to solve during a typical emulation session, and you have to do it manually.

First, reading a string from memory - there is no shortcut for that. Basically you need to read byte by byte in a loop until you hit a NULL byte.
Second, there is no shortcut for reading the value a register points to, like [rsi] - it takes two calls:

rsi = engine.reg_read(const.UC_X86_REG_RSI)  
val = u32(engine.mem_read(rsi, 4))  
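
Both are trivial to write, which is exactly why I would like them shipped with the engine. Hedged sketches of the helpers I keep rewriting (the names and the chunk size are mine):

import struct
import unicorn.x86_const as const

def read_c_string(engine, address, chunk=64):
    # read until the first NULL byte, pulling memory in small chunks
    out = b""
    while True:
        data = bytes(engine.mem_read(address + len(out), chunk))
        nul = data.find(b"\x00")
        if nul != -1:
            return out + data[:nul]
        out += data

def deref_u32(engine, reg=const.UC_X86_REG_RSI):
    # read the 32-bit value the register points to, e.g. [rsi]
    ptr = engine.reg_read(reg)
    return struct.unpack("<I", bytes(engine.mem_read(ptr, 4)))[0]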

Summary

In the end - Unicorn seems to be a nice emulation engine. Fairly approachable and easy to use. Don't let this old man's ranting put you off.
All I wish for is a better API so I don't have to write the same snippets of code again and again. It looks like I will eventually have to write those shortcut functions myself.

16 April 2018

Binary Fingerprint

In a recent update to my tiny Binja plugin I've added something named 'Binary Fingerprint'. It looks roughly like this:

It does not represent anything groundbreaking to be honest, but let me explain everything from the start.

In x86/x64 assembly language we can group instructions according to their primary function. Those are:

  • Data operations (mov*, pop, push, lea)
  • Floating point operations
  • Arithmetic operations (add, xor, shr ...)
  • Dataflow operations (call, jmp ...)
  • Other operations
To every instruction group we assign a different color. Now let's map all the assembly instructions in a function into two-dimensional space. What we get is an image representing the layout of the different groups of instructions. What does this tell us about a function?
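
Very roughly, the classification step boils down to a lookup keyed on the mnemonic. A sketch with illustrative group lists (not the exact sets the plugin uses):

DATA  = {"mov", "movzx", "movsx", "push", "pop", "lea"}
ARITH = {"add", "sub", "xor", "and", "or", "shl", "shr", "imul", "div"}
FLOW  = {"call", "jmp", "je", "jne", "jz", "jnz", "ret"}

def classify(mnemonic):
    if mnemonic in DATA:
        return "data"
    if mnemonic in ARITH:
        return "arithmetic"
    if mnemonic in FLOW:
        return "dataflow"
    if mnemonic.startswith("f"):
        return "floating point"   # very rough x87 heuristic
    return "other"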

For experiment's sake we will analyze /bin/tar using this technique to spot interesting functions. The first type of function that attracts my eye are simple functions like this:

Let's see - a lot of data operations and some calls. What might this function do?
Ok, just freeing some data - we can probably mark it as not so interesting for further analysis. The same principle applies to every fingerprint showing mostly one group of instructions.

Let's try to find something more useful. How about this one?

Lots of arithmetic operations - if we are looking for a cryptographic, key generation or (un)packing type of function, we might want to check everything exhibiting a similar characteristic.

Last one:

Wow, that looks messy - lots of dataflow mixed with arithmetic operations. That might indicate something I call a 'routing' function - lots of decisions about future flow but not much actual work on data.

You might have noticed that some of the images look a bit strange - like they are incomplete. The reason for that is the choice of space-filling curve - namely the Hilbert curve. Because this curve spans a 2^n x 2^n square I cannot always fill the whole space and have to stop drawing, leaving the rest of the space blank. Why this curve, you ask? The first and foremost reason is that it gives a mapping between 1D and 2D space that preserves locality fairly well - a long stream of instructions from the same group will form a cluster instead of a line (as it would if I had decided to fill the space line by line).
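
For the curious, the standard iterative conversion from a 1D index to 2D Hilbert coordinates looks roughly like this (the plugin's actual implementation may differ in the details):

def d2xy(n, d):
    # map index d along the curve to (x, y) on an n x n grid, n a power of two
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:          # rotate/flip the quadrant
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y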

To be quite honest, I don't think this fingerprint revolutionizes reverse engineering. Still, it won't hurt to take a look at it when facing some binary with many functions - at least it might give you some idea where to look (or not to look) at the beginning.

5 March 2018

Plugin: 10k view on functions

I'm always writing code for all the wrong reasons. Last weekend I had an idea to do large scale testing of one of my plugins for Binary Ninja - Annotator. There is, however, little use in testing your plugin on code you wrote and compiled yourself, so at some point you have to hit the proverbial road and start testing things in the wild. Before I even had any kind of structured plan I opened some binaries on the system. I think I picked chmod. Now, I was interested in some not terribly complicated functions that call some libc functions. I opened one at random - 10 instructions and no calls. Next one - 30 basic blocks, 130 instructions (or something). I think I gave up on the manual hit-and-miss after the third one.

So, I started writing another plugin to help me with the development of my other plugin. I decided to simply count instructions, basic blocks, function calls and code xrefs (the last one inspired by one person on the Binary Ninja Slack). After quickly putting together some code I decided to put more structure around it so I can extend it later on with additional reports. I've named it Keyhole.
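
The counting part is genuinely simple. A rough sketch of the idea (attribute names follow the Binary Ninja Python API as I understand it, so treat them as assumptions rather than gospel):

def function_stats(bv):
    stats = []
    for func in bv.functions:
        blocks = func.basic_blocks
        instructions = sum(block.instruction_count for block in blocks)
        xrefs = len(bv.get_code_refs(func.start))
        stats.append((func.name, len(blocks), instructions, xrefs))
    # sort by instruction count so tiny and huge functions stand out
    return sorted(stats, key=lambda entry: entry[2])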

My biggest problem right now is that the HTML widget offered by Qt has very weak support for styles, therefore the report looks, well, let's just say far from my aesthetic expectations. I guess that instead of relying on the built-in browser I will have to save the report to disk and launch a separate browser instance to pick it up from there. This time with proper styling.

And here is a question for my lovely audience - what other characteristics do you look at before you even begin reversing a binary? Is there anything you might consider worth adding? My idea is to add some sort of entropy heat map (like here) and some other things. Well, that will have to wait of course, at least until I wrap up the changes to Annotator.

24 January 2018

Binary Ninja Recipes #2

This time I want to explore two problems I've run into before while writing plugins for Binary Ninja. The first problem stems from the development of the Annotator plugin (and I still need to implement what I've learned here), and the second is influenced by a paper on static analysis and the theory of lattices (assuming, of course, that I've understood it correctly).

Problem 1: walk the graph

Let's say we want to track the state of a given variable in a program and all the standard methods (get_reg_value_at(), SSA) don't apply. A good example is my plugin, where I want to track the instructions that influenced a given variable (in that case a function argument). OK, I'm cheating a bit here in the sense that I haven't yet tried the SSA approach - more about that next time. For now, let's get back to our problem at hand.

In Binary Ninja the blocks of a given function are available through an array. Let's take a look at the example below.

So now we access those blocks programmatically. After running this code we get the following result:
1 -> 0x804840bL
2 -> 0x8048450L
3 -> 0x8048435L
4 -> 0x8048455L
I guess you can see the problem right away. When we iterate over the blocks we get them sequentially, but not exactly in the order the actual code might execute. Here, we will be processing blocks 2 and 3 one after another, while in reality they will never be executed in the same run (I'm assuming you are reading this in the bright future where the speculative execution bug was addressed once and for all). Truth be told, all function parameters should be placed on the stack/in registers in the same block where the function is being called, but there is absolutely no guarantee about that. I wasn't sure about it so I asked Gynvael, to which he responded - "well, for sure it will happen in the same function ...". Thanks buddy.

Fortunately it shouldn't be that difficult to fix. Well, for a certain definition of fix. As you can see, there is a simple recursive descent function tracking visited blocks. We are also taking advantage of a nifty API feature where every block has incoming and outgoing edges that actually point to other blocks.
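
The snippet was originally embedded as a gist; here is a hedged reconstruction of the idea - plain recursive descent that refuses to visit the same block twice:

def walk(block, visited=None):
    if visited is None:
        visited = set()
    if block.start in visited:
        return
    visited.add(block.start)
    print(hex(block.start))              # or: inspect the block's instructions
    for edge in block.outgoing_edges:
        walk(edge.target, visited)

# walk(current_function.basic_blocks[0])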

So, does this work? Of course it does. Does it solve all problems? No, and here is why. This code will work well for all functions with linear flow (with just conditional statements). Things get a bit hairy when we introduce blocks with what Binary Ninja calls back edges. In simple terms - loop statements.

Problem 2: find all paths

So it happened - we hit the loop condition. Like the one here.
Checking blocks again...
1 -> 0x804840bL
2 -> 0x8048444L
3 -> 0x8048425L
4 -> 0x804844aL
If we use our recursive descent we get two paths: 1->2->3 and 1->2->4. We can clearly see this is incorrect, and the reason is the condition preventing us from revisiting a block we have already visited. We should be getting 1->2->4 and 1->[2->3->2]*->4 (simplified to 1->2->3->2->4). So, now we know that blocks can be revisited. What shouldn't be revisited? Edges, of course. Take a look at the code. I think the code explains itself pretty well, so there is no point lingering too long on it. You might wonder why I'm making a local copy of the visited edges list. It is fairly simple - in the example below you can see that we branch in block 2, and one call stack of walk() uses edge 2->4 while, later on, another call stack needs to take this edge again. If I kept a single list of visited edges my search would terminate on block 2, missing the last step of a path. Fun fact: as I was told yesterday, and I blame my poor knowledge of CS, I've just reinvented a variant of the recursive DFS algorithm.
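
Again the original code lived in a gist, so here is a hedged reconstruction of the edge-based variant, with every recursive call getting its own copy of the visited-edge set:

def walk(block, path, visited_edges):
    path = path + [block.start]
    moved = False
    for edge in block.outgoing_edges:
        key = (block.start, edge.target.start)
        if key in visited_edges:
            continue
        moved = True
        # the union creates a fresh copy, so parallel paths may reuse an edge
        walk(edge.target, path, visited_edges | {key})
    if not moved:
        print(" -> ".join(hex(addr) for addr in path))

# walk(current_function.basic_blocks[0], [], set())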

I have tested this code on relatively simple samples, so if you have something more complex and it breaks horribly please let me know. Right now I'm just hoping someone will find it useful and that I haven't spent my evening doing a poor man's implementation of SSA. Well, there's only one way to find out. See you soon :)

29 November 2017

Binary Ninja Recipes

What is the value of a blog if you don't post something from time to time? But what to publish when you only recognize two kinds of knowledge: something you know, and therefore it is trivial, and something you don't know, and therefore you shouldn't be writing about it? Well, today is the time for some trivial knowledge - Binary Ninja recipes.

Problem 1: how to develop plugins

I had been trying to find an optimal way to structure my plugin development environment for some time. First - for Binary Ninja to discover and run a plugin it must be located in the ~/.binaryninja/plugins/ directory (I'm skipping standalone plugins that you can just run from anywhere). The obvious solution is to edit it directly there, but somehow I found this solution inelegant. At first, I was editing files in my project directory and copying them manually, but after a few times it became tedious. So, as the next step I developed a universal shell script that took the plugin files and deployed them to the relevant directory in the Binary Ninja tree. That however had one tiny flaw - I had to remember to execute the deployment. Multiple times in my flow I restarted Binary Ninja, opened a binary file and executed the plugin only to realize I was still running an old version of the code.
My next try was with the Binary Ninja internal plugin system - it can fetch code from a remote git repository and make it run locally. But still, it was too complicated for the simple problem I was facing. I asked the good people on the Binary Ninja Slack channel and adjusted my workflow based on a few of their suggestions.

I primarily use git during my development, so I can later push things to github.com. I keep two main branches - stable and dev. In addition to that, I simply symlink my project directory under the Binary Ninja plugin directory. When I want to develop a new feature I switch to the dev branch and get instant deployment for free, and when I just want to use the plugin I check out the stable version. (I told you this was going to be trivial.)

Problem 2: Binary Reader

Now, something more technical. Let's say you want, for some reason, to read/scan the whole binary you've loaded into Binary Ninja; to, for example, find some pattern. My initial idea was to do it like this:
# bv stands for BinaryView
for addr in range(bv.start, bv.end):
  b = bv.read(addr, 1)
This approach has a few flaws. First of all, the return type is a string, so if for example you want to read 4 bytes and compare them against a value like 0x41414141 you need to unpack them into the correct type. The second is that you can't move forward and backward with ease. I decided it would be better to use BinaryReader, so I wrote this:
br = bn.BinaryReader(bv)

while not br.eof:
  f_byte = br.read8()
In theory that should scan every byte of the binary, mapped or not. Every read8() call moves the internal read offset by one byte, and the return value corresponds to the function being called. There was one small problem with that code - it ended up in an infinite loop. It took me a while to understand what was going on. Basically, if a read steps out of a mapped segment and returns a null value, it stops moving the internal offset, hence the infinite loop. The improved version of the code now looks roughly like this:
br = bn.BinaryReader(bv)

while not br.eof:
  if bv.is_valid_offset(br.offset):
    f_byte = br.read8()
  else:
    br.seek_relative(1)
Now it works smoothly.
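
As a small closing example in the same spirit, the original motivation - scanning for a pattern - can be built directly on top of that loop (the marker value and function name are just illustrative):

import binaryninja as bn

def find_marker(bv, marker=0x41414141):
    hits = []
    br = bn.BinaryReader(bv)
    while not br.eof:
        if bv.is_valid_offset(br.offset):
            here = br.offset
            if br.read32() == marker:  # endianness follows the view's default
                hits.append(here)
            br.seek(here + 1)          # step one byte so matches may overlap
        else:
            br.seek_relative(1)
    return hits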

From now on I will try to write short pieces in the 'how I do things' style, especially about Binary Ninja. I've even started drafting something I refuse to call a book, but if I gather enough material related to writing Binary Ninja plugins, who knows. Let me know what you think about all of this! Next time I will try to write some more about Binary Ninja plugin repository management.

15 March 2017

Nobody expected 64 bits

Apparently if you are not mortally embarrassed by the quality of your code you are releasing it too late [(tm) Silicon Valley]. But to use another only-too-often-used quote - "Release early, release often". I've made the mistake of hoarding my tools and code for too long, not releasing them because they weren't perfect. This was obviously a road to nowhere, because if I don't release, nobody uses it. And if nobody uses it, I have no motivation to develop it anymore. So, to break this circle I present to you a new version of the function Annotator for Binary Ninja.

The first thing worth mentioning is a new database of function prototypes. To be exact, we now have 4728 prototypes. Big thanks to Zach Riggle for his functions project - this update would not be here if not for him.

The next thing is a virtual stack for the x64 platform - from now on you can also annotate 64-bit applications for Intel/AMD processors.

One small thing that I still need to properly implement is full support for functions operating on floating point types (float, double and long double). Right now they are not properly annotated and there are two important reasons for that:

  • On the 32-bit platform, floating point arguments are pushed onto the stack using instructions like fstp or fst. Sadly, Binary Ninja currently does not have corresponding Low Level IL instructions for those - they just show up as unimplemented(). The moment Binary Ninja starts supporting them I will simply add some more parsing code and everything will be fine.
  • The 64-bit platform is slightly more complicated. First of all, arguments are passed to functions via registers. Integers, pointers and such are passed through 6 registers - RDI, RSI, RDX, RCX, R8 and R9 - and the order matters. Floating point arguments are passed via the XMM0-7 registers. Now, let's imagine that we have two functions, f1(int, float) and f2(float, int). What will the compiler do? Well, on Linux, in the case of f1() the first argument will end up in RDI and the second in XMM0, but in f2() the first argument will end up in XMM0 and the second one in RDI.
"Wait a minute" - you will say - "but this is exactly the same". I'm glad you are seeing the same problem. Just having state of registers won't tell us what the first and the second argument is unless you know types in the first place. Virtual Stack does not know types, so until I refactor my code FP types won't be supported.

New updates are planned so stay tuned! And of course, please let me know what you think about it and report all bugs.

21 February 2017

Annotate all the things

I don't do reverse engineering for a living, but I still like to peek under the hood of binaries from time to time. Either because of testing, looking for bugs or just for fun. The problem is that IDA Pro, the de facto standard tool for any reverse engineer, is prohibitively expensive for most people. On top of that, its licensing policy is very annoying and illogical. But enough about IDA Pro - let's talk about the new contender in this field - Binary Ninja.

I'm not going to repeat all the praise this tool is receiving. Instead, you may for example read how you can use it to automatically reverse 2000 binaries or how the underlying Low Level Intermediate Language works. All in all, the platform looks very promising and I couldn't wait to try it after seeing it for the first time. A couple of months ago I was playing with the beta and I pretty much bought it the first day it was released.

There is one tiny problem with Binary Ninja however - IDA Pro has been around for years, therefore it is both feature-rich and the ecosystem around it is pretty robust. Binja still has a long way to go in this department - there are not that many useful plugins and some features are missing. One thing I've noticed, for example, is that while reversing, basic libc functions and system calls are not annotated in any way. There are no prototypes for them and arguments are not marked. So instead of complaining I decided to utilize the available API and just fix that.

Let's start by defining the problem. For example, we have a listing like this:

Not terribly descriptive, right? Well, at least for strcpy() we roughly remember the prototype, so we can quickly find where the arguments are being pushed onto the stack. But what about fchmodat() or sigaction()? Yeah, you need to go back to the man page. How cool would it be to open a binary and get this:

This is exactly what the Annotator plugin does - it iterates through all the instructions in the code, building a virtual stack as it goes; but instead of values it tracks the instructions that pushed a given variable onto the stack. Upon encountering a call to a known function it uses this virtual stack to annotate it with the proper argument prototype.
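
To make the idea concrete, here is a heavily simplified sketch of the mechanism (all names and the toy prototype table are illustrative, not the plugin's actual internals):

known_prototypes = {
    "strcpy": ["char *dest", "const char *src"],
}

def annotate(address, text):
    # the real plugin sets an instruction comment through the Binja API here
    print("0x%x  ; %s" % (address, text))

def process(instructions):
    # instructions: iterable of (address, mnemonic, operand) tuples
    virtual_stack = []
    for address, mnemonic, operand in instructions:
        if mnemonic == "push":
            virtual_stack.append(address)          # remember who pushed, not what
        elif mnemonic == "call" and operand in known_prototypes:
            # cdecl: the value pushed last is the first argument
            for arg, push_address in zip(known_prototypes[operand], reversed(virtual_stack)):
                annotate(push_address, arg)
            virtual_stack = []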

This is the very first release, so it is probably riddled with bugs. Not to mention some features are missing. Right now not all glibc function prototypes are present, because I haven't found a good and reliable way to extract them from headers - instead I'm using a combination of grep, regex and cut with some manual cleanup effort. That unfortunately takes time. The same goes for system calls, but I should be able to add all the Linux 32-bit ones today. Ah, and you have to run the plugin manually in every function you view - right now there is no way to automatically apply it to all functions. I'm contemplating writing a method allowing the user to apply it to the whole underlying call graph, but we will see about that.

Another thing is the quite naive virtual stack implementation - it certainly requires more work to track stack growth more accurately and, for example, to track the number of arguments for functions with va_arg style arguments. Right now I'm also scanning blocks of code in a linear manner, but in a future version I will probably switch to recursive mode with stack isolation for each path (well, so far I haven't encountered a situation where a function's arguments are set up in a different code block than the call itself, but better safe than sorry). The last thing to improve is the number of virtual stacks - first for x64 platforms and later for the ARM architecture.

Please, let me know what you think about the extension and report all the bugs.