128nops - life of a pentester

Grumpy Unicorn

2019-01-08T01:48:00.002+01:00

Intro

I’ve spent the last few evenings playing with Unicorn Engine. Well, "last few" is a figure of speech here, as I was planning to write this text a long time ago. Anyhow, I just sat down to learn it a bit more, as I never really had a chance to develop a good understanding of, and flow with, this engine.
In the end I’ve solved three optimization challenges and one obfuscation problem. I’ve also read some code written by other people (hi Gynvael). That does not make me an expert. Far from it. I have, however, learned something about it, and I’m ready to complain.

What is wrong with the engine

Ultimately there is nothing wrong - the engine works well (minus some weird quirks in how it interprets certain instructions, but if I remember correctly, QEMU is the one to blame for those).
People I’ve talked to complained about execution speed, but in my cases it wasn’t an important factor. What is more, I haven’t done any proper benchmarks, so I really don’t have an informed opinion. Well, to be honest, I see one place where execution slows down, but this is just an assumption. More on that later.

So why does this section mention wrongness in the first place? Simply because there is a lot wrong (in my opinion) with what the API offers, how it works and how some things are structured. I’m not even going to start ranting about documentation, because at least there are code samples covering a wide array of functionality that one can read. Still, proper docs, like the ones Binary Ninja provides, would be nice.

Superfluous const naming

I’m not a big fan of the typical import style where you pollute the main namespace with every possible function and class, like from unicorn import *.
The style I’m most accustomed to comes from our Python style guide: as long as the package is not deeply nested, I just do import unicorn, and when creating objects I know where everything comes from.
Now, what I like about Unicorn is that constants have their own namespace. Even better, every architecture has its own namespace. But if they do, why oh why are the constants still named with an architecture prefix? Let me explain with this tiny code sample:

import unicorn as un  
import unicorn.x86_const as const  
...  
engine.reg_write(const.UC_X86_REG_RAX, ret_rax)  
...

So, I’ve imported x86_const - I already know that the RAX register constant comes from Unicorn and from x86_const. I don’t want to write UC_X86_REG_RAX every time I access it; const.RAX would say exactly the same thing.
I know this is just a tiny inconvenience, and after you type it once any reasonable editor will complete it for you, but still, this can be improved.
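Until that changes, the prefix can be stripped on the caller's side. A minimal sketch, assuming only that the constants are plain module attributes (as they are in unicorn.x86_const); strip_prefix is a hypothetical helper of mine, not part of Unicorn:

```python
import types


def strip_prefix(module, prefix):
    """Hypothetical helper: expose a module's prefixed constants under short names.

    With prefix "UC_X86_REG_", UC_X86_REG_RAX becomes RAX.
    Attributes without the prefix are left out.
    """
    return types.SimpleNamespace(**{
        name[len(prefix):]: value
        for name, value in vars(module).items()
        if name.startswith(prefix)
    })
```

With it, reg = strip_prefix(unicorn.x86_const, "UC_X86_REG_") lets you write engine.reg_write(reg.RAX, ret_rax).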

Setup phase

You start every new emulation project by writing pretty much exactly the same code:

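import sys
import unicorn as un
import unicorn.x86_const as const

engine = un.Uc(un.UC_ARCH_X86, un.UC_MODE_32)

# Map memory for the code section
engine.mem_map(BASE, SIZE)

# Map the stack and point ESP/EBP at its middle (// keeps the address an int)
engine.mem_map(STACK_ADDR, SIZE)
engine.reg_write(const.UC_X86_REG_ESP, STACK_ADDR + SIZE // 2)
engine.reg_write(const.UC_X86_REG_EBP, STACK_ADDR + SIZE // 2)

# Copy the code (read_prog is a helper returning the file's bytes)
engine.mem_write(BASE, read_prog(sys.argv[1]))

# Start emulation
engine.emu_start(START, STOP)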

You have to retrieve values like BASE, SIZE, START and STOP manually, either by reading the ELF headers with readelf or by throwing the binary into your reverse engineering platform of choice.
This is tedious, and I would really love some nice helper functions: either high-level ones like load_elf() or medium-level shortcut methods on the Uc engine like setup_stack(bottom, top).
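Most of those values can be read from the ELF header itself. A minimal sketch of the parsing half of a load_elf(), assuming a well-formed ELF file and handling only PT_LOAD program headers; parse_elf, Segment and page_span are hypothetical names of mine, not Unicorn API:

```python
import struct
from dataclasses import dataclass

PT_LOAD = 1
PAGE_SIZE = 0x1000  # Unicorn's mem_map wants a 4 KiB-aligned address and size


@dataclass
class Segment:  # hypothetical container for one PT_LOAD program header
    vaddr: int   # where the segment lives in memory
    memsz: int   # size in memory (may exceed filesz, e.g. for .bss)
    offset: int  # where its bytes start in the file
    filesz: int  # how many bytes to copy from the file


def parse_elf(data):
    """Return (entry_point, [Segment, ...]) for the PT_LOAD segments of an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little endian
    if is_64:
        entry, phoff = struct.unpack_from(endian + "QQ", data, 0x18)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        entry, phoff = struct.unpack_from(endian + "II", data, 0x18)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    segments = []
    for i in range(phnum):
        off = phoff + i * phentsize
        if is_64:
            (p_type, _flags, p_offset, p_vaddr,
             _paddr, p_filesz, p_memsz) = struct.unpack_from(endian + "IIQQQQQ", data, off)
        else:
            (p_type, p_offset, p_vaddr,
             _paddr, p_filesz, p_memsz) = struct.unpack_from(endian + "IIIIII", data, off)
        if p_type == PT_LOAD:
            segments.append(Segment(p_vaddr, p_memsz, p_offset, p_filesz))
    return entry, segments


def page_span(segment):
    """Page-aligned (start, size) covering the segment, suitable for mem_map."""
    start = segment.vaddr & ~(PAGE_SIZE - 1)
    end = (segment.vaddr + segment.memsz + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end - start
```

Loading would then roughly be: for each segment, engine.mem_map(*page_span(seg)) followed by engine.mem_write(seg.vaddr, data[seg.offset:seg.offset + seg.filesz]), then emu_start from the entry point. Two segments sharing a page would make the second mem_map fail, so a real loader has to merge overlapping spans, and STOP still has to come from your own analysis.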

Another thing that really annoys me is that we sometimes need to skip certain instructions, either because they call into a shared library (which we obviously have not loaded) or because they perform I/O. Typical code doing that looks like this:

skip_list = [
    0x40058A,  # call _printf
]

# inside the UC_HOOK_CODE callback:
if address in skip_list:
    engine.reg_write(const.UC_X86_REG_EIP, address + size)

Not only do you have to maintain a list of instructions to skip, you also have to adjust the instruction pointer manually. The first problem is hard to solve automatically, because the engine cannot know which exact instructions we want to skip, but adjusting the register by hand is just ugly. I would love to have this as core functionality.
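Until something like that exists, the pattern can at least be packaged once. A minimal sketch, assuming the usual UC_HOOK_CODE callback signature (uc, address, size, user_data); make_skip_hook is a hypothetical helper, and the caller still supplies the list and the right program counter constant for the mode (EIP or RIP):

```python
def make_skip_hook(skip_addresses, pc_register):
    """Hypothetical helper: build a UC_HOOK_CODE callback that steps over instructions.

    When execution reaches one of skip_addresses, the program counter is set
    to the next instruction (address + size), so the instruction never runs.
    """
    skip = frozenset(skip_addresses)

    def hook(uc, address, size, user_data):
        if address in skip:
            uc.reg_write(pc_register, address + size)

    return hook
```

Usage would be engine.hook_add(un.UC_HOOK_CODE, make_skip_hook([0x40058A], const.UC_X86_REG_EIP)).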

Hooks

In my personal opinion, the worst thing is how we are forced to use hooks. For every type of hook you define one global callback function. One.
Now, let’s say you want to do three different operations at three distinct addresses - of course, we all know how this is going to look in the code: a tree of ifs.
A typical example of this situation can be found in the Unicorn tutorial by Eternal Red:

from pwn import p32, u32  # pwntools packing helpers

def hook_code(engine, address, size, user_data):
    if address in skip_list:
        engine.reg_write(const.UC_X86_REG_RIP, address + size)
    elif address == 0x400560:
        # grab the character passed in RDI, then skip the instruction
        c = engine.reg_read(const.UC_X86_REG_RDI)
        key.append(chr(c))
        engine.reg_write(const.UC_X86_REG_RIP, address + size)
    elif address == FIB_START:
        arg0 = engine.reg_read(const.UC_X86_REG_RDI)
        rsi = engine.reg_read(const.UC_X86_REG_RSI)
        arg1 = u32(engine.mem_read(rsi, 4))

        if (arg0, arg1) in know_vals:
            # already computed: write the cached results and jump ahead to 0x4006F1
            ret_rax, ret_ref = know_vals[(arg0, arg1)]
            engine.reg_write(const.UC_X86_REG_RAX, ret_rax)
            engine.mem_write(rsi, p32(ret_ref))
            engine.reg_write(const.UC_X86_REG_RIP, 0x4006F1)
        else:
            # remember the arguments so the result can be cached later
            stack.append((arg0, arg1, rsi))

The same, of course, goes for UC_HOOK_MEM_* and other hook types. It also means that your Python function gets called for every instruction executed - I can only imagine the impact on performance. This mess begs for per-address hooks (but truth be told, I don’t know QEMU internals well enough to say whether that is even possible).
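One way to at least flatten the if-tree is to register a single dispatcher and keep per-address handlers in a dictionary. This is a sketch of my own (AddressHooks is hypothetical, not a Unicorn class), and it keeps the one-Python-call-per-instruction cost. For what it is worth, if I read the Python binding correctly, hook_add also accepts begin and end arguments that restrict a hook to an address range, so passing the same address for both might be the closest thing to a per-address hook available today.

```python
from collections import defaultdict


class AddressHooks:
    """Hypothetical dispatcher: one UC_HOOK_CODE callback, handlers per address."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, address):
        """Decorator: run the decorated function whenever address is reached."""
        def register(func):
            self._handlers[address].append(func)
            return func
        return register

    def __call__(self, uc, address, size, user_data):
        # .get() avoids creating empty entries for every executed address
        for handler in self._handlers.get(address, ()):
            handler(uc, address, size)
```

Each branch of the tree becomes a function decorated with @hooks.on(0x400560) and so on, and the whole object is registered once with engine.hook_add(un.UC_HOOK_CODE, hooks).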

Misc

There are two more problems that you need to solve during a typical emulation process, and you have to solve them manually.

First, reading a string from memory - there is no shortcut for that. You basically need to read byte by byte in a loop until you hit a NUL byte.
Second, reading the value a register points to, like [eax], requires two calls (u32 is the pwntools helper):

rsi = engine.reg_read(const.UC_X86_REG_RSI)  
val = u32(engine.mem_read(rsi, 4))
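Both are easy to wrap once. A minimal sketch, assuming only that mem_read(address, size) and reg_read(register) behave like Uc.mem_read and Uc.reg_read; read_cstring and deref_u32 are hypothetical helper names, and the 32-bit read assumes little endian:

```python
import struct


def read_cstring(mem_read, address, max_len=4096, chunk=16):
    """Hypothetical helper: read a NUL-terminated string starting at address.

    Reading in chunks saves calls, but a chunk running past the end of a
    mapped region would make Unicorn raise; chunk=1 avoids that.
    """
    out = bytearray()
    while len(out) < max_len:
        data = bytes(mem_read(address + len(out), chunk))
        if not data:
            break
        nul = data.find(b"\x00")
        if nul != -1:
            out += data[:nul]
            break
        out += data
    return bytes(out[:max_len])


def deref_u32(reg_read, mem_read, register):
    """Hypothetical helper, equivalent of [register]: little-endian uint32 it points to."""
    return struct.unpack("<I", bytes(mem_read(reg_read(register), 4)))[0]
```

In a hook that becomes read_cstring(engine.mem_read, engine.reg_read(const.UC_X86_REG_RDI)) and deref_u32(engine.reg_read, engine.mem_read, const.UC_X86_REG_RSI).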

Summary

In the end, Unicorn seems to be a nice emulation engine: fairly approachable and easy to use. Don’t let an old man’s ranting put you off.
All I wish for is a better API, so I don’t have to write the same snippets of code again and again. It looks like I will eventually have to write those shortcut functions myself.

Binary Fingerprint

2018-04-16T22:19:00.001+02:00

In a recent update to my tiny Binja plugin I've added something called 'Binary Fingerprint'. It looks roughly like this:

To be honest, it is nothing groundbreaking, but let me explain everything from the start.

In x86/x64 assembly we can group instructions according to their primary function:

█ Data operations (mov*, pop, push, lea)
█ Floating point operations
█ Arithmetic operations (add, xor, shr ...)
█ Dataflow operations (call, jmp ...)
█ Other operations
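The grouping itself is just a lookup on the mnemonic. A rough sketch, assuming you already have each instruction's mnemonic; the mnemonic sets below are my own illustrative picks, not the plugin's actual tables, and SSE moves such as movss end up under data. The profile helper computes the per-group share that the fingerprint shows visually, which is handy for ranking functions, for example by how arithmetic-heavy they are:

```python
from collections import Counter

GROUPS = ("data", "float", "arithmetic", "dataflow", "other")

# Illustrative (hypothetical) mnemonic tables, far from exhaustive
DATA = {"push", "pop", "lea", "xchg"}
ARITHMETIC = {"add", "sub", "adc", "sbb", "inc", "dec", "neg", "mul", "imul",
              "div", "idiv", "and", "or", "xor", "not", "shl", "shr", "sar",
              "rol", "ror", "cmp", "test"}
DATAFLOW = {"call", "ret", "loop"}


def classify(mnemonic):
    """Assign an x86 mnemonic to one of GROUPS."""
    m = mnemonic.lower()
    if m.startswith("f"):                    # x87: fld, fstp, fadd, ...
        return "float"
    if m.startswith("mov") or m in DATA:     # mov, movzx, movsx, ...
        return "data"
    if m in ARITHMETIC:
        return "arithmetic"
    if m.startswith("j") or m in DATAFLOW:   # jmp, je, jne, ...
        return "dataflow"
    return "other"


def profile(mnemonics):
    """Share of each group among a function's instructions."""
    counts = Counter(classify(m) for m in mnemonics)
    total = sum(counts.values()) or 1
    return {group: counts[group] / total for group in GROUPS}
```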

We assign a different color to every instruction group. Now let's map all assembly instructions of a function into two-dimensional space. What we get is an image showing the layout of the different instruction groups. What does this tell us about the function?

For experiment's sake, let's analyze /bin/tar with this technique to spot interesting functions. The first type of function that catches my eye is a simple one like this:

Let's see - a lot of data operations and some calls. What might this function do?

OK, it just frees some data - we can probably mark it as not very interesting for further analysis. The same principle applies to every fingerprint dominated by a single instruction group.

Let's try to find something more useful. How about this one?

Lots of arithmetic operations - if we are looking for cryptographic, key generation or (un)packing functions, we might want to check every function exhibiting a similar characteristic.

Last one:

Wow, that looks messy - lots of dataflow mixed with arithmetic operations. That might indicate something I call a 'routing' function - lots of decisions about where the flow goes next, but not many real data operations.

You might have noticed that some of the images look a bit strange, as if they were incomplete. The reason is the choice of space-filling curve, namely the Hilbert curve. Because this curve covers a 2ⁿ × 2ⁿ grid, a function rarely has exactly enough instructions to fill the whole space, so drawing stops early and the rest of the space stays blank. Why this curve, you ask? First and foremost, it gives a mapping between 1D and 2D space that preserves locality fairly well - a long stream of instructions from the same group forms a cluster instead of a line (as it would if I filled the space row by row).
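The 1D-to-2D mapping is the standard Hilbert "distance to coordinates" conversion. A sketch of how such a layout could be produced, assuming a list of per-instruction group labels as input (this is the textbook algorithm, not necessarily the plugin's code). The grid side is the smallest power of two whose square fits all instructions, which is exactly why the leftover cells stay blank:

```python
def hilbert_d2xy(side, d):
    """Map distance d along a Hilbert curve to (x, y) in a side x side grid.

    side must be a power of two. At each scale two bits of d pick the
    quadrant, and the partial result is rotated/flipped so sub-curves join up.
    """
    x = y = 0
    s = 1
    t = d
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def fingerprint_grid(groups):
    """Lay out instruction group labels along a Hilbert curve; unused cells stay None."""
    side = 1
    while side * side < len(groups):
        side *= 2
    grid = [[None] * side for _ in range(side)]
    for d, group in enumerate(groups):
        x, y = hilbert_d2xy(side, d)
        grid[y][x] = group
    return grid
```

Consecutive instructions always land in neighbouring cells, so runs of one group turn into compact blobs of color.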

To be quite honest, I don't think this fingerprint will revolutionize reverse engineering. Still, it won't hurt to take a look at it when facing a binary with many functions - at least it might give you some idea of where to look, or not to look, at the beginning.

Plugin: 10k view on functions

2018-03-05T23:53:00.001+01:00

I'm always writing code for all the wrong reasons. Last weekend I had an idea to do large-scale testing of one of my Binary Ninja plugins - Annotator. There is, however, little use in testing your plugin only on code you wrote and compiled yourself, so at some point you have to hit the proverbial road and start testing things in the wild. Before I even had any kind of structured plan, I opened some binaries on the system. I think I picked chmod. I was interested in not-terribly-complicated functions that call some libc functions. I opened one at random - 10 instructions and no calls. The next one - 30 basic blocks, 130 instructions (or something like that). I think I gave up on manual hit-and-miss after the third one.

So, I started writing another plugin to help me with the development of my other plugin. I decided to simply count instructions, basic blocks, function calls and code xrefs (the last one inspired by someone on the Binary Ninja Slack). After quickly putting some code together, I decided to put more structure around it so I can extend it later with additional reports. I've named it Keyhole.
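Once the numbers are collected, finding "not too trivial, not too big, calls something" functions is a simple filter. A sketch over a plain data model; FunctionStats, pick_candidates and the default thresholds are all hypothetical. Inside Binary Ninja the counts could come from things like len(func.basic_blocks) and bv.get_code_refs(func.start), but that part is left out here:

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:  # hypothetical per-function record
    name: str
    instructions: int
    basic_blocks: int
    calls: int
    xrefs: int


def pick_candidates(stats, max_instructions=60, max_blocks=8, min_calls=1):
    """Small-but-not-trivial functions that call something, simplest first."""
    picked = [
        s for s in stats
        if s.instructions <= max_instructions
        and s.basic_blocks <= max_blocks
        and s.calls >= min_calls
    ]
    return sorted(picked, key=lambda s: (s.basic_blocks, s.instructions))
```

With these defaults, both the 10-instruction function without calls and the 130-instruction monster from the anecdote above would be filtered out.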

My biggest problem right now is that the HTML widget offered by Qt has very weak support for styles, so the report looks, well, let's just say far from my aesthetic expectations. I guess that instead of relying on the built-in browser, I will have to save the report to disk and open it in a separate browser instance, this time with proper styling.

And here is a question for my lovely audience: what other characteristics do you look at before you even begin reversing a binary? Is there anything you would consider worth adding? My idea is to add some sort of entropy heat map (like here) and a few other things. Well, that will have to wait, of course, at least until I wrap up the changes to Annotator.

Binary Ninja Recipes #2

2018-01-24T01:36:00.000+01:00

This time I want to explore two problems I ran into while writing plugins for Binary Ninja. The first one stems from the development of the Annotator plugin (and I still need to implement what I've learned here), and the second is influenced by a paper on static analysis and lattice theory (if, of course, I've understood it correctly).

Problem 1: walk the graph

Let's say we want to track the state of a given variable in a program, and none of the standard methods (get_reg_value_at(), SSA) apply. A good example is my plugin, where I want to track the instructions that influenced a given variable (in that case a function argument). OK, I'm cheating a bit here, in the sense that I haven't tried the SSA approach yet - more about that next time. For now, let's get back to the problem at hand.

In Binary Ninja the blocks of a given function are available as an array. Let's take a look at the example below:

So now let's access those blocks programmatically. After running this code we get the following result:

1 -> 0x804840bL
2 -> 0x8048450L
3 -> 0x8048435L
4 -> 0x8048455L

I guess you can see the problem right away. When we iterate over the blocks we get them sequentially, but not necessarily in the order in which the code might actually execute. Here, we will process blocks 2 and 3 one after the other, even though they will never both execute in the same run (I'm assuming you are reading this in a bright future where the speculative execution bugs have been addressed once and for all). Truth be told, all function parameters should be placed on the stack or in registers in the same block in which the function is called, but there is absolutely no guarantee of that. I wasn't sure, so I asked Gynvael, to which he responded - "well, for sure it will happen in the same function ...". Thanks buddy.

Fortunately it shouldn't be that difficult to fix. Well, for a certain definition of fix. As you can see, there is a simple recursive descent function that tracks visited blocks. We also take advantage of a nifty API feature: every block has incoming and outgoing edges that point to other blocks.
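A sketch of that kind of recursive descent, written against only the attributes it needs: each block's start address and its outgoing_edges, whose target is the next block (Binary Ninja's BasicBlock and BasicBlockEdge expose these). The walk function is my own illustration, not a library call:

```python
def walk(block, visited=None, order=None):
    """Recursive descent over a CFG, visiting every reachable block once.

    Returns block start addresses in the order they were first reached,
    following edges instead of the order of the basic block array.
    """
    if visited is None:
        visited, order = set(), []
    if block.start in visited:
        return order
    visited.add(block.start)
    order.append(block.start)
    for edge in block.outgoing_edges:
        walk(edge.target, visited, order)
    return order
```

In a plugin it could be started from the entry block, e.g. walk(func.get_basic_block_at(func.start)).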

So, does this work? Of course it does. Does it solve all problems? No, and here is why. This code works well for all functions without loops (just conditional statements). Things get a bit hairy when we introduce what Binary Ninja calls back edges - in simple terms, loops.

Problem 2: find all paths

So it happened - we hit a loop, like the one here:

Checking blocks again...

1 -> 0x804840bL
2 -> 0x8048444L
3 -> 0x8048425L
4 -> 0x804844aL

If we use our recursive descent we get two paths: 1->2->3 and 1->2->4. This is clearly incorrect, and the reason is the condition that prevents revisiting a block we have already visited. We should be getting 1->2->4 and 1->[2->3->2]*->4 (simplified to 1->2->3->2->4). So now we know that blocks can be revisited. What shouldn't be revisited? Edges, of course. Take a look at the code. I think it explains itself pretty well, so there is no point lingering on it. You might wonder why I'm making a local copy of the visited edges list. It is fairly simple - in the example below you can see that we branch in block 2: one call stack of walk() uses edge 2->4, and later on another call stack needs to take this edge again. If I kept a single list of visited edges, my search would terminate at block 2, missing the last step of the path. Fun fact: as I was told yesterday (and I blame my poor knowledge of CS), I've just reinvented a variant of the recursive DFS algorithm.
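A sketch of the edge-based variant, again assuming only start, outgoing_edges and edge.target on the block objects (all_paths is my own name). Blocks may repeat but edges may not, and every call extends its own copy of the used-edge set, which is the "local copy" trick from the paragraph above. Keep in mind the number of paths can grow exponentially on larger functions:

```python
def all_paths(block, used_edges=frozenset(), path=()):
    """Yield every path from block to an exit block (one with no successors).

    Edges are identified by (source start, target start). frozenset union
    builds a new set, so sibling branches never see each other's edges.
    """
    path = path + (block.start,)
    if not block.outgoing_edges:
        yield path
        return
    for edge in block.outgoing_edges:
        key = (block.start, edge.target.start)
        if key not in used_edges:
            yield from all_paths(edge.target, used_edges | {key}, path)
```

On the loop example this yields exactly the two expected paths, 1->2->4 and 1->2->3->2->4.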

I have tested this code on relatively simple samples, so if you have something more complex and it breaks horribly, please let me know. Right now I'm just hoping someone will find it useful and that I haven't spent my evening writing a poor man's implementation of SSA. Well, only one way to find out. See you soon :)

Binary Ninja Recipes

2017-11-29T12:58:00.001+01:00

What is the value of a blog if you don't post something from time to time? But what to publish when you only recognize two kinds of knowledge: something you know, therefore it is trivial, and something you don't know, therefore you shouldn't be writing about it? Well, today is the time for some trivial knowledge - Binary Ninja recipes.

Problem 1: how to develop plugins

For some time I was trying to find an optimal way to structure my development environment for plugins. First, for Binary Ninja to discover and run a plugin, it must be located in the ~/.binaryninja/plugins/ directory (I'm skipping standalone plugins that you can just run from anywhere). The obvious solution is to edit it directly there, but somehow that seemed inelegant to me. At first I was editing files in my project directory and copying them over manually, but after a few times it became tedious. So, as a next step, I wrote a universal shell script that took the plugin files and deployed them to the relevant directory in the Binary Ninja tree. That, however, had one tiny flaw - I had to remember to run the deployment. Multiple times I restarted Binary Ninja, opened a binary and ran the plugin, only to realize I was still running the old version of the code.
My next try was Binary Ninja's internal plugin system - it can fetch code from a remote git repository and just run it locally. But still, it was too complicated for the simple problem I was facing. I asked the good people on the Binary Ninja Slack channel and adjusted my workflow based on a few suggestions.

I primarily use git during development, so I can later push things to github.com. I keep two main branches - stable and dev. In addition, I simply symlink my project directory into the Binary Ninja plugin directory. When I want to develop a new feature I switch to the dev branch and get instant deployment for free, and when I just want to use the plugin I check out the stable version. (I told you this was going to be trivial.)
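The whole trick is a single symlink (ln -s does the same from a shell). A small Python equivalent, assuming the Linux plugin location ~/.binaryninja/plugins mentioned above; link_plugin is just an illustrative name:

```python
from pathlib import Path


def link_plugin(project_dir, plugins_dir="~/.binaryninja/plugins"):
    """Illustrative helper: symlink a plugin checkout into Binary Ninja's plugin dir.

    Binary Ninja then loads whatever branch is checked out in project_dir,
    so switching between stable and dev is just a git checkout plus a restart.
    """
    project = Path(project_dir).expanduser().resolve()
    plugins = Path(plugins_dir).expanduser()
    plugins.mkdir(parents=True, exist_ok=True)
    link = plugins / project.name
    if link.is_symlink() and link.resolve() == project:
        return link  # already linked, nothing to do
    # raises FileExistsError if something else already uses that name
    link.symlink_to(project, target_is_directory=True)
    return link
```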

Problem 2: Binary Reader

Now, something more technical. Let's say you want, for some reason, to read or scan the whole binary you've loaded into Binary Ninja, for example to find some pattern. My initial idea was to do it like this:

# bv stands for BinaryView
for addr in range(bv.start, bv.end):
  b = bv.read(addr, 1)

This approach has a few flaws. First of all, the return type is a string, so if, for example, you want to read 4 bytes and compare them against a value like 0x41414141, you need to unpack them into the correct type. Second, you can't easily move forward and backward. I decided it would be better to use BinaryReader, so I wrote this:

import binaryninja as bn

br = bn.BinaryReader(bv)

while not br.eof:
  f_byte = br.read8()

In theory that should scan every byte of the binary, mapped or not. Every read8() call moves the internal read offset forward by one byte, and the width of the returned value corresponds to the function being called (read8, read16, read32 and so on). There was one small problem with that code - it ended up in an infinite loop. It took me a while to understand what was going on. Basically, if a read steps outside a mapped segment, it returns a null value (None) and stops advancing the internal offset, hence the infinite loop. The improved version of the code looks roughly like this:

br = bn.BinaryReader(bv)

while not br.eof:
  if bv.is_valid_offset(br.offset):
    f_byte = br.read8()
  else:
    # unmapped offset: read8() would not advance, so step over it manually
    br.seek_relative(1)

Now it works smoothly.
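As for the unpacking problem from the first attempt: instead of walking single bytes, one can read a whole region and let struct do the conversion. A sketch, assuming the region was read with bv.read() (which, as far as I know, returns bytes in current Python 3 builds of Binary Ninja); find_u32 is my own helper name:

```python
import struct


def find_u32(data, value, base=0, endian="<"):
    """Return the addresses at which a 32-bit value occurs in data.

    data is a buffer read starting at address base; endian is a struct
    prefix ("<" little, ">" big).
    """
    needle = struct.pack(endian + "I", value)
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(base + pos)
        pos = data.find(needle, pos + 1)
    return hits
```

Per segment this would look like find_u32(bv.read(seg.start, seg.end - seg.start), 0x41414141, base=seg.start) for each seg in bv.segments, and a single value decodes with struct.unpack("<I", bv.read(addr, 4))[0].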

From now on I will try to write short "how I do things" style posts, especially about Binary Ninja. I've even started drafting something I refuse to call a book, but if I gather enough material on writing Binary Ninja plugins, who knows. Let me know what you think about all of this! Next time I will try to write more about Binary Ninja plugin repository management.

Nobody expected 64 bits

2017-03-15T23:35:00.002+01:00

Apparently, if you are not mortally embarrassed by the quality of your code, you are releasing it too late [(tm) Silicon Valley]. Or, to use another only-too-often-used quote - "Release early, release often". I've made the mistake of hoarding my tools and code for too long, not releasing them because they weren't perfect. This was obviously a road to nowhere, because if I don't release, nobody uses it. And if nobody uses it, I have no motivation to develop it any further. So, to break this cycle, I present a new version of Annotator, my function annotation plugin for Binary Ninja.

The first thing worth mentioning is a new database of function prototypes - 4728 of them, to be exact. Big thanks to Zach Riggle for his functions project - this update would not be here if not for him.

Next is a virtual stack for the x64 platform - from now on you can also annotate 64-bit applications for Intel/AMD processors.

One small thing that I still need to implement properly is full support for functions operating on floating point types (float, double and long double). Right now they are not annotated correctly, and there are two important reasons for that:
On the 32-bit platform, floating point arguments are placed on the stack using instructions like fstp or fst. Sadly, Binary Ninja currently has no corresponding Low Level IL instructions for those; they just show up as unimplemented(). The moment Binary Ninja starts supporting them, I'll add some more parsing code and everything will be fine.
The 64-bit platform is slightly more complicated. First of all, function arguments are passed via registers. Integers, pointers and such are passed through 6 registers - RDI, RSI, RDX, RCX, R8 and R9 - and the order matters. Floating point arguments are passed via the XMM0-7 registers. Now, let's imagine that we have two functions, f1(int, float) and f2(float, int). What will the compiler do? Well, on Linux, for f1() the first argument ends up in RDI and the second in XMM0, but for f2() the first argument ends up in XMM0 and the second in RDI.
"Wait a minute," you will say, "but this is exactly the same." I'm glad you see the same problem. Just having the state of the registers won't tell us which is the first argument and which is the second unless we know the types in the first place. The virtual stack does not know types, so until I refactor my code, FP types won't be supported.

New updates are planned, so stay tuned! And of course, please let me know what you think about it and report all bugs.
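The f1/f2 ambiguity from the 64-bit discussion is easy to see once the register assignment is written down: each register class is filled independently, so the same register state fits several prototypes. A simplified sketch of the System V x86-64 rules for scalar arguments (my own illustration; it ignores structs, varargs and other corner cases):

```python
INT_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
SSE_REGS = tuple(f"xmm{i}" for i in range(8))


def sysv_arg_locations(param_types):
    """Where each scalar argument goes under the System V x86-64 ABI (simplified).

    Integers and pointers take the next free general-purpose register,
    float/double the next XMM register; once a class runs out, or for
    long double (passed in memory), the argument goes on the stack.
    """
    ints, sses = iter(INT_REGS), iter(SSE_REGS)
    locations = []
    for ptype in param_types:
        if ptype == "long double":
            locations.append("stack")
        elif ptype in ("float", "double"):
            locations.append(next(sses, "stack"))
        else:
            locations.append(next(ints, "stack"))
    return locations
```

Both f1(int, float) and f2(float, int) use exactly RDI and XMM0; only the prototype says which of them is the first argument.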

Annotate all the things

2017-02-21T13:30:00.000+01:00

I don't do reverse engineering for a living, but I still like to peek under the hood of binaries from time to time, either for testing, looking for bugs or just for fun. The problem is that IDA Pro, the de facto standard tool for any reverse engineer, is prohibitively expensive for most people. On top of that, its licensing policy is very annoying and illogical. But enough about IDA Pro - let's talk about a new contender in this field: Binary Ninja.

I'm not going to repeat all the praise this tool is receiving. Instead, you may, for example, read how you can use it to automatically reverse 2000 binaries, or how the underlying Low Level Intermediate Language works. All in all, the platform looks very promising, and I couldn't wait to try it after seeing it for the first time. A couple of months ago I was playing with the beta, and I pretty much bought it the first day it was released.

There is one tiny problem with Binary Ninja, however - IDA Pro has been around for years, so it is feature rich and the ecosystem around it is pretty robust. Binja still has a long way to go in this department - there are not that many useful plugins, and some features are missing. One thing I've noticed, for example, is that while reversing, basic libc functions and system calls are not annotated in any way: there are no prototypes, and the arguments are not marked. So instead of complaining, I've decided to use the available API and just fix that.

Let's start by defining the problem. Say we have a listing like this:

Not terribly descriptive, right? Well, at least for strcpy() we roughly remember the prototype, so we can quickly find where the arguments are being pushed onto the stack. But what about fchmodat() or sigaction()? Yeah, you need to go back to the man page. How cool would it be to open a binary and get this:

This is exactly what the Annotator plugin does - it iterates through all instructions in the code, building a virtual stack as it goes, but instead of values it tracks the instructions that pushed a given value onto the stack. Upon encountering a call to a known function, it uses this virtual stack to annotate those instructions with the matching arguments from the function's prototype.
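The core idea fits in a few lines. A toy model, assuming a flat list of (address, mnemonic, operand) tuples and a single hard-coded prototype (both hypothetical; the real plugin works on Binary Ninja's IL and a large prototype database), handling only push-based cdecl calls:

```python
# Hypothetical prototype table; the plugin ships a much larger database
PROTOTYPES = {
    "strcpy": ["char *dest", "const char *src"],
}


def annotate_calls(instructions, prototypes=PROTOTYPES):
    """Map each argument-pushing instruction to the parameter it provides.

    The virtual stack holds the addresses of push instructions rather than
    values. With cdecl the first argument is pushed last, so popping the
    stack yields arguments in prototype order.
    """
    stack = []
    notes = {}
    for address, mnemonic, operand in instructions:
        if mnemonic == "push":
            stack.append(address)
        elif mnemonic == "call" and operand in prototypes:
            for param in prototypes[operand]:
                if not stack:
                    break
                notes[stack.pop()] = param
    return notes
```

The result is a map from instruction address to the parameter name to show as a comment next to it.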

This is the very first release, so it is probably riddled with bugs, not to mention some missing features. Right now not all glibc function prototypes are present, because I haven't found a good and reliable way to extract them from the headers - instead I'm using a combination of grep, regex and cut with some manual cleanup. That unfortunately takes time. The same goes for system calls, but I should be able to add all the 32-bit Linux ones today. Ah, and you have to run the plugin manually in every function you view - right now there is no way to apply it to all functions automatically. I'm contemplating a method that would let the user apply it to the whole underlying call graph, but we will see about that.

Another thing is the quite naive virtual stack implementation - it definitely needs more work to track stack growth more accurately and, for example, to track the number of arguments for variadic (va_arg style) functions. Right now I'm also scanning blocks of code linearly, but in a future version I will probably switch to a recursive mode with stack isolation for each path (so far I haven't encountered a situation where function arguments are set up in a different code block than the call itself, but better safe than sorry). The last thing to improve is the number of virtual stacks - first for x64 platforms and later for the ARM architecture.

Please let me know what you think about the extension, and report all the bugs.

In search of golden fleece

2015-08-28T14:13:00.000+02:00

A key activity when looking for reflected XSS is checking which request parameters are echoed back in the response. Doing that manually is tedious, and the time can be spent more productively. For example, you can write a Burp extension that does it for you. So, I present Argonaut.

The extension works in a very simple way - it parses a captured request to extract all parameters (cookies included) and then searches the response body to see if the value in question has been echoed back. If so, a short snippet around the match is presented to the user.

Currently parameter parsing is done in quite a dumb way - it works well with standard GET and POST parameters, but it is, for example, unable to extract parameter values from JSON or XML, and instead looks for an exact match of the whole payload. That is not very effective, but it is on my TODO list. One more thing to remember - parameter values shorter than 3 characters are ignored (you don't want 300 matches of '1' in the results table).

Hey, but what about escaping, you ask? No worries, I've got this covered. Let's say you are testing a web application written on top of Django. It may well use the Jinja2 template engine, which applies escaping. Argonaut will search the response body for the plain parameter value (let's say test">), but will also apply various defined transformations/escaping to see if, for example, the application returned 'test&#34;&gt;'.
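The matching logic itself is small. A plain-Python sketch of the idea (not Argonaut's code, which runs inside Burp; all function names here are mine): parameters come from the query string, form body and Cookie header, short values are skipped, and each value is searched for both verbatim and after a MarkupSafe-style escape, MarkupSafe being the library Jinja2 uses for autoescaping:

```python
from urllib.parse import parse_qsl, urlsplit


def markupsafe_style_escape(s):
    """HTML escaping in the style of MarkupSafe (& first, so it is not double-escaped)."""
    return (s.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")
             .replace("'", "&#39;").replace('"', "&#34;"))


TRANSFORMS = {"plain": lambda s: s, "jinja2": markupsafe_style_escape}


def extract_params(url, body="", cookie_header=""):
    """Collect (name, value) pairs from the query string, a form body and cookies."""
    params = parse_qsl(urlsplit(url).query) + parse_qsl(body)
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            params.append((name, value))
    return params


def find_reflections(params, response_body, min_len=3, context=10):
    """Return (name, transform, snippet) for every value echoed in the response."""
    hits = []
    for name, value in params:
        if len(value) < min_len:  # skip short values such as "1"
            continue
        tried = set()
        for transform_name, transform in TRANSFORMS.items():
            needle = transform(value)
            if needle in tried:   # identical needle already searched
                continue
            tried.add(needle)
            pos = response_body.find(needle)
            if pos != -1:
                start = max(0, pos - context)
                snippet = response_body[start:pos + len(needle) + context]
                hits.append((name, transform_name, snippet))
    return hits
```

New transformations would simply be new entries in TRANSFORMS.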

I've chosen the Jinja2 example for a reason - truth be told, Jinja2 is the only transformation implemented so far, but the mechanism is in place and I'm planning to add new ones very soon.

There is still work to be done. Some simple tasks will be completed soon - for example new transformations and some UI work. Others are harder - like support for contextual autoescaping libraries and type-dependent parameter extraction - and will have to wait a bit. Anyway, stay tuned and let me know what you think.

Migrating repository

2015-07-27T13:49:00.000+02:00

Because code.google.com will finally be shut down very soon, I've moved all my projects to GitHub. That includes JSONDecoder.

MutProxy

2013-08-14T20:58:00.000+02:00

Recently I've had very little time to write anything meaningful. New posts are coming, slowly but steadily. In the meantime I stumbled upon a short piece of code on Gynvael's page. It reminded me of a project I wrote some years ago for an assessment.
When I finally found it, the code wasn't in a state I'd like to show anyone. I've spent the past few days cleaning it up and expanding it a bit. Today I pushed the code to GitHub. Here, take a look.

So, what does MutProxy do? (Yep, I know the name is neither very original nor brilliant, but come on, I'm not a Junior Creative Director at D'Arcy, I'm just a plain pentester.) It's just a simple proxy/tunnel with the ability to attach functions that alter or log traffic in different ways. A README does not exist at the moment, so you will have to read the code to figure out the functionality. There is some documentation in the code comments :).
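The core of such a tool is small. A minimal sketch of the same idea with asyncio (not MutProxy's actual code; start_tunnel and the mutator convention, a function taking and returning bytes, are my own choices). Note that mutators see arbitrary TCP chunks, so a pattern split across two reads will be missed:

```python
import asyncio


async def _pipe(reader, writer, mutators):
    """Copy data from reader to writer, passing each chunk through the mutators."""
    try:
        while True:
            data = await reader.read(4096)
            if not data:
                break
            for mutate in mutators:
                data = mutate(data)
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def start_tunnel(listen_port, target_host, target_port,
                       upstream=(), downstream=(), listen_host="127.0.0.1"):
    """TCP tunnel; upstream mutators see client->server data, downstream the reverse."""
    async def handle(client_reader, client_writer):
        try:
            server_reader, server_writer = await asyncio.open_connection(
                target_host, target_port)
        except OSError:
            client_writer.close()
            return
        await asyncio.gather(
            _pipe(client_reader, server_writer, upstream),
            _pipe(server_reader, client_writer, downstream),
        )

    return await asyncio.start_server(handle, listen_host, listen_port)
```

A logger is just a mutator that records the chunk and returns it unchanged.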

A lot of work still remains - the mutators are very basic and act more as examples than the real deal, the logger is very plain, and documentation does not exist. I'm waiting for more free time. I was also planning to write more about how to force applications to go through your proxy.

Small update

2013-06-18T13:14:00.001+02:00

This is going to be a very short post (let's call it a warmup).
Just wanted to let you know that I've made a small update to JSONDecoder. The changes are mostly cosmetic:

The Content-Type check is now case-insensitive
The decoder now removes garbage from the JSON payload (like }]);)
One more Content-Type is checked: text/javascript (Twitter uses it)
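Roughly, those changes boil down to the following. A plain Python 3 sketch (the extension itself runs on Jython inside Burp, and these function names are mine): the content type is compared case-insensitively against a small allow-list, and json.JSONDecoder.raw_decode parses the first complete JSON value starting at the first { or [, so trailing garbage such as }]); is simply ignored. Leading garbage that itself contains { or [ would still trip it up:

```python
import json

JSON_CONTENT_TYPES = {"application/json", "text/javascript"}


def is_json_response(content_type):
    """Case-insensitive Content-Type check that ignores parameters like charset."""
    return content_type.split(";", 1)[0].strip().lower() in JSON_CONTENT_TYPES


def extract_json(payload):
    """Parse the first JSON object or array in payload, ignoring surrounding garbage."""
    starts = [i for i in (payload.find("{"), payload.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found")
    value, _end = json.JSONDecoder().raw_decode(payload, min(starts))
    return value


def pretty(payload):
    """Pretty-printed JSON text for display."""
    return json.dumps(extract_json(payload), indent=2, sort_keys=True)
```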

More stuff soon.

Jar full of cookies

2013-02-11T14:02:00.001+01:00

A few posts back I gave tips on how to organize web fuzzing - you remember that part: color highlights, marking stuff for later. But one person (I think that was my only semi-active reader) asked me: "But those requests are gonna expire, the session will die." That is true - very often you can no longer reuse that request, unless of course you are planning to copy and paste all the cookies from a more recent one. There is, however, a faster method.

Set things up

Burp Suite has this nifty feature called the cookie jar - basically, Burp can parse every Set-Cookie header and store the cookies in a database. The good thing is that other tools can use the same jar. When issuing a request, Burp will replace the value of every matching cookie with the most recent one obtained from the jar.
In the Options/Sessions tab you can set which tools' traffic should be monitored to update the jar. To configure which tools should use the cookie jar, you have to edit the default session handling rule - take a look at its Scope tab. Now, before you start fuzzing (or just playing with some stored requests), you only have to log in to the application through the proxy and the newest cookies will be placed in the jar.
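To make the mechanism concrete, here is a toy model of what the jar does (my own sketch in plain Python, not Burp's implementation; the function names are hypothetical). Only cookies already present in the request are updated, which matches the "matching cookie" behaviour described above:

```python
from http.cookies import SimpleCookie


def update_jar(jar, set_cookie_headers):
    """Store the latest value of every cookie seen in Set-Cookie headers."""
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


def refresh_cookie_header(cookie_header, jar):
    """Replace values of cookies already in the request with the jar's newest ones."""
    refreshed = []
    for part in cookie_header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, _old = part.partition("=")
        refreshed.append(f"{name}={jar[name]}" if sep and name in jar else part)
    return "; ".join(refreshed)
```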

How about a magic trick

This is just the beginning - the cookie jar/session management options are even richer. In the Options/Sessions tab you can set up a lot of possible actions. First - macros. You can set up automatic sequences of requests, retrieve parameters like an anti-CSRF token, or simply log in to the application automatically. In session handling rules you can configure behaviours that make use of previously set up macros (among other things). For example, in Intruder, before every request you may want to issue a different request to obtain a valid anti-CSRF token, and then use it while issuing the one with tampered parameters. Of course, details will differ between the applications you are testing, but I encourage you to try it yourself. Remember - what sometimes seems overly complicated can in fact save you a lot of manual and mindless copy-and-paste work.

As always, some additional information can be found on the Burp Suite blog.

JSON Decoder

2013-02-06T14:03:00.000+01:00

Long time no see. People usually start such notes with an oh-so-cliché quote from Mark Twain, but I've already done that on numerous occasions, so no. Anyway, despite the hidden motto of this blog ("no promises, it will be released when it's done"), I wrote something. Yesterday I finally overcame my pathological laziness and finished version 1 of a very small Burp extension - JSON Decoder. The code itself is not very impressive, nor is the functionality, but it's a start - now that I know the basics, I can move on to more impressive stuff.

The Extension

Since version 1.5.01, Burp Suite Pro comes with a new API for writing extensions. You no longer need to write them in Java, bundle them into a JAR and do some mojo magic to make them run. The new API also gives you access to much more of Burp's internals. I'm not going to give you a tutorial on how to write them, but I encourage you to read some of the official tutorials on the PortSwigger blog. If I count correctly, there are eleven tutorials covering quite a wide selection of topics.

So, what does my extension do? Not that much (at least in this version) - it's just an additional tab with the pretty-printed JSON payload. I have other plans for it, but I need to find the time (and I've started flying BMS 4.32 again, so no rest for the wicked). I have some other extensions in progress, but they are not in a ready-to-show state.

Debugging

Debugging a Burp extension is a bit like a "Why? Because Fuck You, that's why" experience. You made a typo, mixed up an expected type or declared too many parameters in a function definition? All you get is a JavaRuntimeException. You think you won't make those mistakes? Let me show you what kind of mistakes I made while coding this extension.

Typos - I spent 30 minutes failing to spot the difference between CreateTextEditor() and createTextEditor(). While writing an extension, make sure that every API call follows Java's camelCase convention with a lowercase first letter (this can be tricky, because Python names are usually all lowercase). For example, you can convert a byte[] variable data to a string in two ways - helpers.bytesToString(data) (the camelCase Burp API, with helpers obtained from callbacks.getHelpers()) or data.tostring() (the all-lowercase Python array method).

Difference between java.lang.String and byte[] - some functions accept byte[], some a String. Always check which type a function expects and what it returns; it will save you the time spent inserting countless println() lines.
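To make the byte[] side less abstract: Java bytes are signed, so under Jython a byte[] shows up as an array with typecode 'b' holding values from -128 to 127, and any non-ASCII character turns into negative numbers. The sketch below mimics that in plain CPython 3 with the array module; the helper names are hypothetical, and inside Burp you could simply use helpers.bytesToString() and helpers.stringToBytes() instead.

```python
from array import array

def to_java_bytes(text, charset="utf-8"):
    """Encode text and store it the way Jython exposes a Java byte[]: array('b')."""
    return array("b", text.encode(charset))

def from_java_bytes(data, charset="utf-8"):
    """Turn a signed byte array back into text.
    (Jython 2 spells the first step data.tostring(); CPython 3 uses tobytes().)"""
    return data.tobytes().decode(charset)

data = to_java_bytes("café")
print(list(data))              # [99, 97, 102, -61, -87]
print(from_java_bytes(data))   # café
```

The point is that a byte array and a string are different things: getting from one to the other always involves a charset, which is exactly where type confusion tends to bite.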

Given the low complexity of my code I could get away with the oldest debugging technique there is - print everything - but if you are writing something more complex, please read this blog entry.

Bit more about Burp stuff

If you are new to Burp, I can recommend a book written by my friend - grab it here. You can read it yourself or give it to the new junior pentester who just joined.

Small and vulnerable webapp

2012-11-04T05:11:00.000+01:00

The problem

You are on a plane, 11,000 meters above sea level, traveling at 900 km/h. And suddenly (usually after a bottle of wine) you have a brilliant idea about a bug in the browser, a new way to filter some data, or really anything that requires writing a webapp. But you are on a plane, with a company machine running OS_of_not_so_much_your_choice. You don't always have an Apache server with PHP running on your laptop (well, you really should not), and VMplayer/VMFusion/anyVM is probably even less common. So maybe you can use Django or RoR (or J2EE+Tomcat+JBoss - and if you say yes to that, this is not the blog you are looking for). Anyway, you still want to code something.

So, here comes Bottle. Quoting its web page: "Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library."

Those two sentences neatly summarize everything I like about it - just one file, not much setup needed, and development effort reduced to a minimum. Thanks to that you can spend time solving real problems, not struggling with a weird Apache vhost config file and wondering why the hell mod_php is not working.

Crash course

I know - talk is cheap, so show me the code. Instead of just pasting some code from the tutorial, let's try solving a semi-real problem. Let's get back to my previous post, Using Burp in a smart way, where we were trying to figure out how to fuzz for XSS vulnerabilities. To see how Burp behaves in different situations, we need a vulnerable script.

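#!/usr/bin/env python

import re
import sys
import traceback

try:
    from bottle import run, request, template, route
except ImportError:
    # Tell the user what is missing, then give up
    print(traceback.format_exc().splitlines()[-1])
    sys.exit(1)

head = "<html><title>Simple search interface</title><body>"
footer = "</body></html>"
# Filter 0 strips "!" and everything after it,
# filter 1 strips every non-alphanumeric character
filters = [r'!.*', r'[^a-zA-Z0-9]*']

@route('/show')
def show_patterns():
    # List all available filter patterns
    t = ""
    for p in filters:
        t += "<p>" + p + "</p>"

    return head + t + footer

@route('/search/:id')
def search(id):
    q = request.query.q
    # {{!query}} is rendered unescaped on purpose - this is the vulnerability
    t = "<p>You have searched for {{!query}} "
    t += "and I've applied following filter - {{filter}}</p>"
    t += "<p>Sadly, nothing was returned</p>"

    try:
        f = filters[int(id)]
    except (ValueError, IndexError):
        # Non-numeric or out-of-range id: fall back to the first filter
        f = filters[0]

    return template(t, filter=f, query=re.sub(f, '', q))

run(host='localhost', port=8000)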

The idea: an application that echoes back the user's search query, plus a simple switch that either shows the regexp patterns used to filter the query or applies one of them.

Let me explain it in more detail.

try:
    from bottle import run, request, template, route
except ImportError:
    print(traceback.format_exc().splitlines()[-1])
    sys.exit(1)

First, import a couple of functions from the bottle framework - for this small program you only need functions to route requests, get parameters from the query string, render a template and run a test web server. Bear in mind that this is a very simple web server and it is not suited to being exposed to the world.
You can safely ignore the try..except block - it only tells you that the bottle library is missing.

Let's handle our first request:

@route('/show')
def show_patterns():
    t = ""
    for p in filters:
        t += "<p>" + p + "</p>"

    return head + t + footer

The most important thing here is the @route function decorator. I hope you are familiar with Python decorators - if not, a decorator is simply a function that wraps around another function. (Over)simplifying, you can treat it as a condition under which the inner function will run (for all the CS-degree people - I don't care about the formal definition of a decorator).
So, if we make a request to the /show URL, the show_patterns() function will run. Inside it we only enumerate the filters defined in our script, glue them together with the header and footer, and send the result out with return.
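A toy illustration of the routing idea, not Bottle's actual implementation: route('/show') returns a decorator that records the function in a dictionary, and a separate dispatch() looks the path up when a request comes in. The names routes and dispatch are hypothetical. Note that this decorator doesn't need to wrap anything: it just registers the function and hands it back unchanged.

```python
# Hypothetical toy router: maps a path to the function registered for it
routes = {}

def route(path):
    """Decorator factory: register the decorated function under `path`."""
    def decorator(func):
        routes[path] = func
        return func  # the function itself is returned unchanged
    return decorator

@route('/show')
def show_patterns():
    return "<p>pattern list</p>"

def dispatch(path):
    """Call the handler registered for `path`, or report 404."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()

print(dispatch('/show'))     # <p>pattern list</p>
print(dispatch('/nothing'))  # 404 Not Found
```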

That was really easy. Now time to analyze next function.

 
@route('/search/:id')
def search(id):
    q = request.query.q

Again, the important thing is our @route decorator. Take a look at the :id thing - it is just an element that will be matched dynamically: whatever you put after /search/ (up to the next slash) gets passed as an argument to your function. Newer Bottle versions spell the same wildcard as <id>. Of course we also need to grab an argument from the query string (be careful, it's a bit tricky - GET parameters are in request.query, but POST parameters are in request.forms), hence the q assignment.
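Roughly, a dynamic rule can be thought of as a regular expression with a named group per wildcard. The sketch below is a simplified assumption of how that might work, not Bottle's router (which supports more syntax, e.g. filters like <id:int>): each :name matches one path segment, and the captured value is always a string, which is why the application calls int(id). rule_to_regex and match are hypothetical names.

```python
import re

# ':name' wildcard, in the old Bottle rule style used by the script
WILDCARD = re.compile(r':([a-zA-Z_][a-zA-Z_0-9]*)')

def rule_to_regex(rule):
    """Compile a rule like '/search/:id' into an anchored regex.
    Static text is escaped; each wildcard matches one path segment."""
    parts, pos = [], 0
    for m in WILDCARD.finditer(rule):
        parts.append(re.escape(rule[pos:m.start()]))
        parts.append('(?P<%s>[^/]+)' % m.group(1))
        pos = m.end()
    parts.append(re.escape(rule[pos:]))
    return re.compile('^' + ''.join(parts) + '$')

def match(rule, path):
    """Return the wildcard values as a dict of strings, or None."""
    m = rule_to_regex(rule).match(path)
    return m.groupdict() if m else None

print(match('/search/:id', '/search/1'))    # {'id': '1'}
print(match('/search/:id', '/search/1/2'))  # None
```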

Now let's construct the main body of the template:

 
    t = "<p>You have searched for {{!query}} "
    t += "and I've applied following filter - {{filter}}</p>"
    t += "<p>Sadly, nothing was returned</p>"
    # [..]
    return template(t, filter=f, query=re.sub(f, '', q))

Bottle can use multiple template engines, but by default it uses its own SimpleTemplate Engine. Two important things here. First, take a look at {{filter}} - it marks the place where data will be inserted when the template is rendered by the template() function. Second, the template engine escapes dangerous HTML characters by default - here we don't want that (the whole point is a vulnerable page), so we precede query with the ! character to disable escaping.
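To see the difference between {{query}} and {{!query}} without running Bottle, here is a toy stand-in for a template engine. It is only an illustration, not SimpleTemplate: it uses the standard library's html.escape, and Bottle's own escaping function may differ in details. The render name is hypothetical.

```python
import html
import re

def render(tpl, **values):
    """Toy template engine: {{name}} is HTML-escaped, {{!name}} is inserted verbatim."""
    def substitute(m):
        raw, name = m.group(1), m.group(2)
        value = str(values[name])
        return value if raw else html.escape(value)
    return re.sub(r'\{\{(!?)(\w+)\}\}', substitute, tpl)

payload = '<script>alert(1)</script>'
print(render("{{q}}", q=payload))   # &lt;script&gt;alert(1)&lt;/script&gt;
print(render("{{!q}}", q=payload))  # <script>alert(1)</script>
```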

Being practical

I was using this program (well, actually I was not - I wrote it during a 15-minute break at a conference) while writing the previous article about Burp fuzzing. You can use it as a testbed both for testing filters (still not perfect, but I can make a better version later) and for learning how to fuzz with Burp. Currently I use Bottle both for writing small vulnerable things when I need to test a concept or an attack and for some more serious projects - but let's save that for the next entry.
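A quick way to see what each filter does to typical payloads, without starting the server, is to call re.sub directly with the same two patterns. The payloads are arbitrary examples. The last two lines show why the character range matters: a class written as A-z (an easy typo for A-Z) also covers [ \ ] ^ _ and the backtick.

```python
import re

filters = [r'!.*', r'[^a-zA-Z0-9]*']
payloads = ['<script>alert(1)</script>', 'abc!<img src=x onerror=alert(1)>']

for f in filters:
    for p in payloads:
        print(repr(f), '->', repr(re.sub(f, '', p)))

# A-z spans ASCII 65..122, which includes [ \ ] ^ _ and `
print(re.sub(r'[^a-zA-z0-9]*', '', 'a`b^c'))  # a`b^c
print(re.sub(r'[^a-zA-Z0-9]*', '', 'a`b^c'))  # abc
```

Filter 0 leaves a plain script tag untouched (there is no "!" to trigger it), which is exactly the kind of gap the fuzzing in the previous post should reveal.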

Summary

I know that this topic is not groundbreaking. Why write about it, then? Well, maybe because I want to show people that there is an alternative to LAMP - you don't have to set up a whole Apache + MySQL stack to create a script with 5 lines of code just to test a simple case of a browser's anti-XSS mechanism.

Using Burp in a smart way

2012-10-17T13:24:00.000+02:00

If I got a penny for every BurpSuite tutorial I've seen on the internet, I would be rich. No, not really, I would just have 3.5 pennies. Well, let's face it - I suck at constructing metaphors. Back to BurpSuite then. As I said before, I've seen BurpSuite tutorials - they are good at explaining how to use certain tools in terms of 'this button does this and you can click here to do that'. Very often those tutorials do not explain why you would use a given function and what the effective way of doing certain tasks is. I'm hoping to fill that gap in the following post.

Setup

The most important advice I can give you at the beginning is to set up your workspace and tools correctly to avoid problems at later stages.

First - lay out your windows to get a better overview. In my case (two 23-inch monitors) the left monitor is used for the Burp window and the right one for the browser and a Firebug/terminal window (two panels, each occupying half of the screen - courtesy of the Unity WM). It's quite important to be able to move your attention between windows and to see more than one window at any given moment - that way you won't waste time on context switching.

Default settings are quite reasonable, but there are some things you can tweak. First - it's a Java app, so give it at least 1GB of RAM (2GB would be optimal) via the -Xmx option of the java command (e.g. -Xmx2g).
For evidence retention you might want to configure Automatic Backup (Options/Misc) - it will save a copy of Burp's state periodically and on exit. BurpSuite has never crashed on me, but better safe than sorry (and you might click that Install updates and reboot your computer button at 3am and waste a whole evening of work).

Another important task is to configure your SSL certificates. Because Burp acts as an intercepting proxy, you are not really connecting to a site but to Burp, and Burp then makes the connection to the chosen site. As a result your browser warns you that the SSL certificate presented by the page is not trusted. It is not, because every Burp instance generates its own CA certificate and signs the per-site certificates with it. To avoid annoying SSL alerts you need to install the Burp CA certificate in your browser - instructions are available here. As a friend suggested, in Firefox you can create a separate profile for pentesting with all the security options disabled and the Burp CA certificate installed (this works because Firefox has a separate certificate store per profile).

Another thing you might want to do in terms of setup (again, thanks go to Daniel F.) is to move the folder where Burp stores requests/responses. By default it is in the /tmp directory and is world-readable - which means that by default your credentials would be visible to anyone with access to your computer.
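Whichever directory you end up pointing Burp at, it is worth checking that it isn't world-readable. A small sketch, assuming a POSIX system (on Windows the permission bits mean something else); the helper names are hypothetical and the demo runs on a throwaway temporary directory rather than Burp's real one.

```python
import os
import stat
import tempfile

def is_world_readable(path):
    """True if users other than the owner and group may read `path`."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

def lock_down(path):
    """Equivalent of chmod 700: owner gets rwx, everyone else nothing."""
    os.chmod(path, stat.S_IRWXU)

demo = tempfile.mkdtemp()
os.chmod(demo, 0o755)            # simulate a world-readable directory
print(is_world_readable(demo))   # True
lock_down(demo)
print(is_world_readable(demo))   # False
os.rmdir(demo)
```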

First look

You would very rarely want to look at all the traffic your browser sends to the pages in the dozens of tabs you have open (and we call people who do paranoid). Well, actually, if you are using the same browser for pentesting and for casual browsing, please stop - for pentesting it's better to use a browser with all the security features disabled, and of course you don't want to browse the internet with it.
So, as I was saying - most likely you want to focus your attention on one particular domain (maybe with some additional subdomains or related domains), and in Burp you do that by setting a scope. That way you declutter the history and target views by removing unnecessary entries.
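Burp's scope is configured in its UI, so the snippet below is not how Burp does it; it only illustrates the host-matching idea behind "one domain plus its subdomains" and the classic pitfall of a naive suffix match. in_scope is a hypothetical helper.

```python
from urllib.parse import urlsplit

def in_scope(url, domains):
    """True if the URL's host is one of `domains` or a subdomain of one.
    Matching on '.' + domain keeps 'evil-example.com' out of scope."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)

history = [
    "https://example.com/login",
    "https://api.example.com/v1/items",
    "https://evil-example.com/",
    "https://www.google-analytics.com/collect",
]
print([u for u in history if in_scope(u, ["example.com"])])
```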

Now it's time for your first run through the application. It should be clean - behave like a model citizen. Don't look for vulnerabilities yet; you will have plenty of time for that later. I call the first run a 'pattern' on which you will build in the next stages. It's important to hit the most important and most frequently used functions of the application. Any experience with UAT scenarios might come in handy. Do this for every role in the application.

Now you have to go through the history you've just accumulated. Personally I mark every candidate for data input validation testing (parameters being passed) with a green highlight, vertical and horizontal authorization bypass candidates with blue, and other suspicious requests and responses with yellow. Also, if the site you are testing has a complex authentication mechanism, I add comments like auth stage 1 etc.

Personally I don't use the active scanner, but the passive scanner is quite capable of spotting some obvious vulnerabilities like missing cookie flags, mixed content or clear-text password submission.
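To give an idea of what such a passive check looks at, here is a simplified version of a missing-cookie-flags check run on raw Set-Cookie values. It is a sketch, not Burp's scanner logic; the https parameter is a hypothetical knob, since demanding Secure only makes sense for cookies set over HTTPS.

```python
def missing_cookie_flags(set_cookie, https=True):
    """Return the security flags missing from one Set-Cookie header value."""
    # Everything after the first ';' is an attribute; keep just attribute names
    attrs = {part.strip().split("=", 1)[0].lower()
             for part in set_cookie.split(";")[1:]}
    missing = []
    if https and "secure" not in attrs:
        missing.append("Secure")
    if "httponly" not in attrs:
        missing.append("HttpOnly")
    return missing

print(missing_cookie_flags("SESSIONID=abc123; Path=/"))
print(missing_cookie_flags("SESSIONID=abc123; Path=/; Secure; HttpOnly"))
```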

Dirbusting

Now it's time to discover some hidden content, aka dirbusting. You can employ Intruder for this task, but it has a limitation - it cannot scan recursively on its own. After every directory it finds, you have to reconfigure Intruder to follow it deeper.
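What DirBuster-style tools add over a single Intruder run is the recursion: every directory that turns up becomes a new base to probe. A minimal breadth-first sketch with the HTTP part deliberately left out: exists is a hypothetical callback you would implement with your HTTP client of choice (carefully, since not every application reports missing pages with a clean 404).

```python
from collections import deque

def dirbust(base, wordlist, exists, max_depth=3):
    """Breadth-first recursive content discovery.
    Every directory found is queued and probed again with the whole
    wordlist, up to max_depth levels below `base`."""
    found = []
    queue = deque([(base.rstrip("/") + "/", 0)])
    while queue:
        prefix, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for word in wordlist:
            url = prefix + word + "/"
            if exists(url):
                found.append(url)
                queue.append((url, depth + 1))
    return found

# Fake site instead of real HTTP requests
site = {"http://target/admin/", "http://target/admin/backup/", "http://target/img/"}
print(dirbust("http://target", ["admin", "img", "backup"], lambda u: u in site))
```

The max_depth cap is what keeps a run from going on for 13 hours on a site that happily answers every path.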

A better option is to use skipfish or DirBuster for this task, until Dafydd decides to build such a tool into the Suite.

On the other hand, maybe you just need a quick look at the directory structure (I had to kill my last DirBuster run after 13 hours).

Mashing the inputs

Remember the tedious task of highlighting requests in the history? Now it's time to look for some vulnerabilities. Grab the first green request and send it to Intruder - we will do some fuzzing (and repeat it for every highlighted request).

So begins a short guide to fuzzing with Intruder. It's really easy - first you set up the payload positions and the attack type (they are well explained in the help), then you choose a payload. You can of course pick a preset payload list like fuzzing-quick or even (remain calm) fuzzing-full, but this does not bring you even close to proper coverage. Don't try to create your own fuzz list - save yourself the hassle and use fuzzdb.

This is what I usually do for every field at the beginning: I pick the list named URIhex.fuzz.txt, set up an Add suffix: xxx payload processing rule and run it against every field. This gives you a rough understanding of which characters are allowed in which field and which ones are filtered or encoded.
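Assuming URIhex.fuzz.txt is simply the 256 byte values URL-encoded from %00 to %FF (an assumption; check the file's actual contents, case and ordering), the same payload set with the Add suffix rule applied can be generated like this. The constant suffix presumably serves as a marker that makes each reflected character easy to find in the response. urihex_payloads is a hypothetical helper.

```python
def urihex_payloads(suffix="xxx"):
    """All 256 byte values URL-encoded (%00 .. %FF), each followed by a marker."""
    return ["%{:02X}{}".format(b, suffix) for b in range(256)]

payloads = urihex_payloads()
print(len(payloads), payloads[0], payloads[60], payloads[-1])
# 256 %00xxx %3Cxxx %FFxxx
```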

Of course this is just scratching the surface - you never know what kind of filtering mechanism sits behind the data input routine you are fuzzing. Maybe some characters are allowed but certain combinations are not? There might be a strip_tags function or some really weird regexp. In that case fuzzdb is your friend - just pick the right list and off you go.

There is one difficult thing in fuzzing - choosing the right payload/payload generation method. There is also one tedious thing in fuzzing - browsing fuzzing results. You can however save some time by setting up Intruder properly.

Let's get back to our first example - checking which characters are allowed. After doing this you end up with 256 results for every input field. Browsing them by hand? No, thank you. So what to do? Fortunately Intruder has some tools to help you extract meaningful information from server responses.

We'll start by looking for SQL injection. Let's assume you're testing a simple search function - one field only. The payload position is set, as a payload you've picked the preset list called Fuzzing - SQL Injection, no special payload processing is needed, and you are ready to hit the big red button. Not so fast - before running the attack you need to make sure that your baseline request is legitimate and guaranteed to return valid results. Remember those green patterns we established a couple of paragraphs ago? You should be using them now.
Shortly after starting the attack you should have a nice set of 134 (+ baseline) request/response pairs. Now, a couple of important tips. First, look at the response length - any significant deviation (especially a decrease) can indicate that something went wrong. Also look for responses with a status code other than 200.

Intruder options might also come in handy - set a grep-match to look for any keyword that might indicate SQL server problems: mysql, ORA, error, ODBC and so on. The search function will probably print the number of retrieved results - you can pull that out with grep-extract and show it in the attack results table. This way you have all the important information summarised in one place.
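The same triage logic can be written down in a few lines, e.g. to post-process results you exported from Intruder. This is a sketch under several assumptions: results are (payload, status, body) tuples, the 10% length tolerance is arbitrary, the "N results" pattern stands in for whatever your grep-extract would target, and "ORA-" narrows the text's "ORA" to Oracle's error prefix to avoid false hits. All names are hypothetical.

```python
import re

ERROR_KEYWORDS = ["mysql", "ORA-", "error", "ODBC"]   # grep-match terms
RESULT_COUNT = re.compile(r"(\d+) results")           # hypothetical grep-extract pattern

def triage(baseline_len, results, tolerance=0.1):
    """Return (payload, reasons, result_count) for every result that
    deserves a manual look."""
    flagged = []
    for payload, status, body in results:
        reasons = []
        if status != 200:
            reasons.append("status %d" % status)
        if abs(len(body) - baseline_len) > tolerance * baseline_len:
            reasons.append("length %d" % len(body))
        hits = [k for k in ERROR_KEYWORDS if k.lower() in body.lower()]
        if hits:
            reasons.append("grep-match: " + ", ".join(hits))
        if reasons:
            count = RESULT_COUNT.search(body)
            flagged.append((payload, reasons, count.group(1) if count else None))
    return flagged

baseline = "<p>3 results for shoes</p>"
results = [
    ("'", 500, "<p>You have an error in your SQL syntax; check the MySQL manual</p>"),
    ("shoes' OR '1'='1", 200, "<p>42 results for shoes</p>" + "<p>item</p>" * 40),
    ("shoes", 200, "<p>3 results for shoes</p>"),
]
for row in triage(len(baseline), results):
    print(row)
```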

Now let's hunt for some XSS. It's somewhat more complicated than looking for SQLi - after fuzzing with, let's say, xss-rsnake.txt you will end up with 74 results. Status code and response length won't let you distinguish between successful and unsuccessful attacks. However, we can help ourselves with two Intruder options.

The first is grep-extract. If you have a baseline request, you can see where your input gets echoed back. Set a proper pattern and you will see all the outputs in the attack table. It still forces us, however, to review hundreds of results (if we combine a couple of fuzz lists) looking for stripped characters or differences in character encoding. This is a good method for up to 50-70 results, but surely we can do better than that.

That brings us to grep-payload - a very nifty tool for reviewing fuzzing results. Its most important option is Search responses for payload string - it flags every request whose payload is echoed back verbatim in the response, which is a strong indication of a potential XSS vulnerability.
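A rough sketch of the idea behind Search responses for payload string, not Burp's implementation: a verbatim echo is the interesting case, while a payload that only comes back HTML-encoded is usually harmless. The reflected_encoded helper is an extra illustration, and both function names are hypothetical.

```python
import html

def reflected_verbatim(payload, body):
    """grep-payload style check: is the payload echoed back exactly?"""
    return payload in body

def reflected_encoded(payload, body):
    """True if the payload only appears in HTML-encoded form."""
    return payload not in body and html.escape(payload) in body

payload = '"><script>alert(1)</script>'
vulnerable = '<p>You have searched for "><script>alert(1)</script></p>'
safe = '<p>You have searched for ' + html.escape(payload) + '</p>'
print(reflected_verbatim(payload, vulnerable), reflected_verbatim(payload, safe))  # True False
print(reflected_encoded(payload, safe))  # True
```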

Closing word

We don't want to turn this post into a long list of vulnerabilities and how to look for them - you are smart, so you can figure out the rest on your own. There are of course more complicated attacks and obstacles you can hit during fuzzing (like CSRF-protected forms), but I hope to cover them in the future.

I had been thinking about writing such a guide for some time, hoping to be first. In one way I succeeded, but in another I lost the race - this guy is writing a whole book about Burp. Maybe I can get a draft?

Welcome home

2012-10-17T13:23:00.001+02:00

So, we meet again. It's great to see you here. You probably know me from my previous blog, so there's no need to introduce myself. If you don't, please read the short summary in the right sidebar.

This blog is (and I hope to keep it that way) about security, and I promise to update it as infrequently as possible but, as always, with good content. I did not want to use the word 'content' in vain, so the next post is ready and I hope you will enjoy it.