29 November 2017

Binary Ninja Recipes

What is the value of a blog if you don't post something from time to time. But what to publish when you only recognize two kinds of knowledge: something you know, therefore it is trivial and something you don't know, therefore you shouldn't be writing about that? Well, today is the time for some trivial knowledge - Binary Ninja recipes.

Problem 1: how to develop plugins

I was trying to find an optimal way to structure my development environment for plugins for some time. First - for Binary Ninja to discover and run one it must be located in ~/.binaryninja/plugins/ directory (I'm skipping standalone plugins that you can just run from anywhere). Obvious solution is to edit it directly there, but somehow I was seeing this solution as inelegant. At first, I was editing files in my project directory and copying it manually, but after few times it became tedious. So, in the next step I've developed universal shell script that was taking plugin files and deploying it to relevant directory in binary ninja tree. That however had one tiny flaw - I had to remember to execute the deployment. Multiple times in my flow I was restarting Binary Ninja, opening binary file and executing plugin only to realize I'm still running old version of the code.
My next try was with Binary Ninja internal plugin system - it can fetch code from remote git repository and just make it run locally. But still, it was too complicated for a simple problem I was facing. I've asked good people on Binary Ninja Slack channel and I've adjusted my workflow basing it on few suggestions.

I primarily use git during my development, so I can later push things to github.com. I keep two main branches - stable and dev. Now, in addition to that I basically soft link my project directory under binary ninja plugin directory. When I want to develop new feature I switch to dev branch and I get instant deployment for free and when I just want to use it I checkout stable version. (I told you this is going to be trivial).

Problem 2: Binary Reader

Now, something more technical. Let's say you want, for some reason, to read/scan whole binary you've loaded into binary ninja; to, for example, find some pattern. My initial idea was to do it like this:

# bv stands for BinaryView
for addr in range(bv.start, bv.end):
  b = bv.read(addr, 1)

This approach has few flaws. First of all, return type is string, so if for example you want to read 4 bytes and compare it against value like 0x41414141 you need to unpack it into correct type. Second one is you can't move forward and backward with ease. I've decided that it would be better to use Binary Reader, so I wrote this:

br = bn.BinaryReader(bv)

while not br.eof:
  f_byte = br.read8()

In theory that should scan every byte of a binary, mapped or not. Every read8() call move internal read offset by one byte and return value correspond to relevant function being called. There was on small problem with that code - if ended up with infinite loop. It took me while to understand, what is going on. So, basically, if a read step out of mapped segment and returns null value it stops moving internal offset, hence the infinite loop. Improved version of the code now looks roughly like this:

br = bn.BinaryReader(bv)

while not br.eof:
  if bv.is_valid_offset(br.offset):
    f_byte = br.read8()
  else:
    br.seek_relative(1)

Now it works smoothly.

From now on I will try to write short pieces of How I do things style posts, especially about Binary Ninja. I've even started drafting something I refuse to call book, but if I have enough material related to writing Binary Ninja plugins, who knows. Let me know what do you think about all of this! Next time I will try to write some more about Binary Ninja plugin repository management.

15 March 2017

Nobody expected 64 bits

Apparently if you are not mortally embarrassed by the quality of your code you are releasing it too late [(tm) Silicon Valley]. But to use another only-too-often-used-quote - "Release early, release often". I've made mistakes of hoarding my tools and code for too long, not releasing them because they weren't perfect. This was obviously road to nowhere because if I don't release, nobody uses it. And if nobody uses it I have no motivation to develop it anymore. So, to break this circle I present you a new version of function Annotator for Binary Ninja.

First thing worth mentioning is a new database of functions prototypes. To be exact we now have 4728 prototypes. From this place big thanks to Zach Riggle for his functions project - this update would not be here if not for him.

Next thing is virtual stack for x64 platform - from now on you can also annotate 64-bit applications for Intel/AMD processors.

One small thing that I still need to properly implement is full support for functions operating on floating point types (float, double and long double). Right now they are not properly annotated and there are two important reasons for that:
For 32 bit platform floating point arguments are pushed on the stack using instructions like fstp or fst. Sadly, Binary Ninja right now does not have a corresponding Low Level Instructions for those. They are just showing as unimplemented(). The moment Binary Ninja starts supporting them I just add some more parsing code and everything will be fine.
64 bit platform is slightly more complicated. First of all, arguments to functions are passed via registers. Integers, pointers and such are passed through 6 registers - RDI, RSI, RDX, RCX, R8 and R9 and order matters. Floating point arguments are passed via XMM0-7 registers. Now, let's imagine that we have two functions f1(int, float) and f2(float, int). What will compiler do? Well, on Linux, in case of f1() first argument will end up in RDI and second in XMM0, but in f2 first argument will end up in XMM0 and second one in RDI.
"Wait a minute" - you will say - "but this is exactly the same". I'm glad you are seeing the same problem. Just having state of registers won't tell us what the first and the second argument is unless you know types in the first place. Virtual Stack does not know types, so until I refactor my code FP types won't be supported.

New updates are planned so stay tuned! And of course, please let me know what you think about it and report all bugs.

21 February 2017

Annotate all the things

I don't do reverse engineering for a living but I still like to peek under the hood of binaries from time to time. Either because of testing, looking for bugs or just for fun. Problem is, that IDA Pro, de-facto standard tool for any Reverse Engineer is prohibitively expensive for most of the people. On top of that, licensing policy is very annoying and illogical. But enough about IDA Pro - let's talk about new contender on this field - Binary Ninja.

I'm not going to repeat all the praises that this tool is receiving. Instead, you may for example read how you can use it to automatically reverse 2000 binaries or maybe how the underlying Low Level Instrumentation Language works. All in all platform looks very promising and I couldn't wait to try it after seeing it for the first time. Couple of months ago I was playing with the Beta and pretty much bought it first day it was released.

There is one tiny problem with Binary Ninja however - IDA Pro was here for years, therefore it is both feature rich and ecosystem around it is pretty robust. Binja still has a long way to go in this department - there are not that many useful plugins and some features are missing. One thing I've noticed for example is that while reversing basic libc functions and system calls are not annotated in any way. There is no prototype of them and arguments are not marked in any way. So instead of complaining I've decided to utilize available API and just fix that.

Let's start by defining a problem. For example we have a listing like this:

Not terribly descriptive, right? Well, at least for strcpy() we roughly remember the prototype so we can quickly find where arguments are being pushed on the stack. But what about fchmodat() or sigaction(). Yeah, you need to get back to man page. How cool would be to open a binary and get this:

This is exactly what Annotator plugin does - it iterates through all instruction in the code building a virtual stack as it goes, but instead of variables it tracks instructions that pushed a given variable on to the stack. Upon encountering a call of known function it uses this virtual stack to annotate it with a proper argument prototype.

This is a very first release so it is probably riddled with bugs. Not to mention some features are missing. Right now not all glibc function prototypes are present because I haven't found a good and reliable way to extract them from headers - instead I'm using a combination of grep, regex and cut with some manual cleanup effort. That unfortunately takes time. Same goes for system calls, but I should be able to put all Linux 32bit ones today. Ah, and you have to run plugin manually in every function you view - right now there is no way to automatically apply it to all the functions - I'm contemplating to write one method allowing user to apply it to whole underlying call graph, but we will see about that.

Another thing is quite naive virtual stack implementation - for sure it requires more work to track stack growth more accurately and for example track number of arguments for functions with va_arg type of arguments. Right now I'm also scanning blocks of code in linear manner, but for future version I will probably switch to recursive mode with stack isolation for each path (well, right now I haven't encountered situation where functions arguments are done in different code block than the call itself, but better safe than sorry). Last thing to improve is number of virtual stacks - first for x64 platforms and later for ARM architecture.

Please, let me know what do you think about the extension and report all the bugs.