This time I want to explore two problems I saw before while writing plugins for Binary Ninja. First problem, steaming from development of Annotator plugin (and I need to implement what I've learned here) and second is influenced by paper on Static Analysis and theory of lattices (if of course I've understood it correctly).
Problem 1: walk the graph
Let's say we want to track the state of a given variable in a program and all standard methods (get_rag_value_at(), SSA) don't apply. Good example is my plugin where I want to track instructions that influenced given variable (in that case function argument). OK, I'm cheating a bit here in a sense that I haven't yet tried SSA approach - more about that in next time. For now, let's get back to our problem at hand.In BinaryNinja blocks of a given function are available through an array. Let's take a look at the example bellow
So now we access those blocks programmatically:for block in current_function.basic_blocks: | |
print "%d -> %s"%(block.index+1, hex(block.start)) |
1 -> 0x804840bL 2 -> 0x8048450L 3 -> 0x8048435L 4 -> 0x8048455LI guess you can see the problem right away. When we iterate over blocks we get them sequentially,but not exactly in the order that actual code might execute. Here, we will be processing blocks 2 and 3 after each other, while actually they will never be executed in the same code run (I'm assuming you are reading this in bright future where speculative execution bug was addressed once and for all). Truth be told, all functions parameters should be placed on the stack/in registers in the same block the function is being called, but there is absolutely no guarantee about that. I wasn't sure about that so I've asked Gynvael, to which he responded - "well, for sure it will happen in the same function ...". Thanks buddy.
Fortunately it shouldn't be that difficult to fix that. Well, for certain definition of fix.
def walk_graph(bv, function): | |
start = function.basic_blocks[0] | |
visited = [] | |
walk(start, visited) | |
def walk(block, visited): | |
if block not in visited: | |
visited.append(block) | |
# Action here | |
for edge in block.outgoing_edges: | |
walk(edge.target, visited) |
So, does this work? Of course it does. Does it solve all problems? No and here is why. This code will work well for all functions with linear flow (with just conditional statements). Things get bit hairy when we introduce blocks with, what BinaryNinja calls back edges. In simple terms - loop statements.
Problem 2: find all paths
So it happened - we hit the loop condition. Like one here Checking blocks again...1 -> 0x804840bL 2 -> 0x8048444L 3 -> 0x8048425L 4 -> 0x804844aLIf we use our recursive descent we get two paths: 1->2->3 and 1->2->4. We clearly see this is incorrect and reason for that is condition preventing from revisiting a block that we have already visited. We should be getting 1->2->4 and 1->[2->3->2]*->4 (simplified to 1->2->3->2->4). So, now we know, that blocks can be revisited. What shouldn't be revisited? Of course, edges. Take a look a the code.
def all_paths(bv, function): | |
paths = [] | |
first_block = function.basic_blocks[0] | |
for edge in first_block.outgoing_edges: | |
walk(edge, [], paths) | |
for path in paths: | |
print "---" | |
print "> %s "%path[0].source | |
for edge in path: | |
print "-> %s"%edge.target | |
def walk(edge, edges, paths): | |
if edge not in edges: | |
p_edges = copy.copy(edges) | |
p_edges.append(edge) | |
if len(edge.target.outgoing_edges): | |
for e in edge.target.outgoing_edges: | |
walk(e, p_edges, paths) | |
else: | |
paths.append(p_edges) |
I have tested this code of relatively simple samples so if you have something more complex and it breaks horribly please let me know. Right now I'm just hoping someone will find it useful and I haven't spent my evening doing poor's man implementation of SSA. Well, only one way to find out. See you soon :)