I should start by telling you that this post does not contain anything fundamentally new. Hence, if you already know the tools mentioned in the title, this post is probably not for you. However, if you are not too familiar with these tools and want to understand a little bit more about how they work together, you should keep on reading.

First, let us get a high-level overview of the different tools. We begin with QEMU. QEMU is a piece of software that emulates hardware such as processors. Imagine, for example, that you are running an operating system such as Linux or Windows on an x86-64 machine and that you would like to analyze a binary that has been compiled for an ARM or MIPS processor. Of course, you can use static analysis on the binary, but if you want to find out more about its runtime behavior, it would be good to have a corresponding runtime environment.

If you now think “What about virtual machines? Can they create such a virtual environment?”, the answer is typically “No, they can’t.” A hypervisor usually relies on the hardware of the underlying machine to execute, for example, the instructions embedded in a binary. That means the hypervisor runs the instructions directly on the x86-64 processor and, hence, you cannot run your ARM or MIPS binary.

And this is where QEMU comes into play. As QEMU is a hardware emulator, it can emulate processors (e.g. ARM via qemu-system-arm or MIPS via qemu-system-mips) such that the emulated processor can be used to execute the corresponding instruction set. It should be noted, though, that QEMU can also be used as part of a virtualization environment. For example, the kernel-based virtual machine (KVM) uses QEMU if it cannot handle certain instructions and has to resort to emulation.

So far, we have learned that QEMU is great for setting up a runtime environment for binaries compiled for an architecture different from the one of your machine. When a binary gets executed in this emulated environment, QEMU potentially has a lot of information on the instructions that get executed, e.g. which instructions are executed, which registers are accessed, which memory regions are accessed and so on. Hence, it would be great to somehow get access to this information and maybe also modify corresponding entries, for example, registers or memory regions. Unfortunately, no simple API exists for this task. That is why Unicorn was born.

The Unicorn developers ripped out the core of QEMU (basically its CPU emulation) and built a powerful API around it (with Python bindings, so it is simple to use). However, Unicorn is based on QEMU version 2.1.2, because this was the current version at the time Unicorn was developed. Although Unicorn is actively maintained, the QEMU core has not really been updated, as doing so requires quite a lot of effort. Hence, a lot of features that have been developed for QEMU (the current version number at the time of writing is 5.0.0) could not be implemented in Unicorn (such as, for example, support for the AVX instruction set). That is why the maintainers of Unicorn recently opened an issue within their GitHub repository to ask for support in re-working Unicorn based on an up-to-date QEMU version. The corresponding issue can be found here.

You can even combine American Fuzzy Lop (AFL) with Unicorn (called afl-unicorn) to fuzz binaries in the emulated environment (it should be noted, though, that there is also a QEMU mode for AFL used to get code coverage for arbitrary binaries). This makes it possible to fuzz targets based on a memory snapshot of a process. The memory snapshot is obtained via a helper script for GDB, loaded via Unicorn, and is then used to fuzz the target binary based on the context that was set up at the time the memory snapshot was taken. This approach is explained in great detail here and here. However, as the fuzzing is performed in an emulated environment, it will be much slower than fuzzing the binary natively (approximately 10 times slower). But if it is not easy to set up a fuzzing environment on the target machine (where the binary was obtained from), this can be a viable option.

Based on afl-unicorn, another tool called Unicorefuzz has been developed to fuzz kernel components. It has been presented at WOOT’19 and the paper can be found here. In contrast to the original approach of afl-unicorn, Unicorefuzz does not take a memory snapshot but rather uses a GDB stub to connect to the process in its native environment to fetch memory contents when needed.

Coming back to afl-unicorn, preparing the fuzzing setup can often be a bit cumbersome, as system calls and certain library calls (depending on whether the library function is included in the memory snapshot or is just a wrapper around a system call) have to be emulated. For example, when a call to malloc is encountered in the execution path, this call needs to be hooked and replaced by its Unicorn equivalent, i.e. a function that provides a memory region controlled by Unicorn. The hook for this would be something like the following on an x86 system:

import struct
from unicorn.x86_const import UC_X86_REG_EAX, UC_X86_REG_EIP, UC_X86_REG_ESP

def unicorn_hook_instruction(uc, address, size, user_data):

    if address == ADDR_TO_MALLOC:                                                                          # [1]
        size = struct.unpack("<I", uc.mem_read(uc.reg_read(UC_X86_REG_ESP) + 4, 4))[0]                     # [2]
        retval = unicorn_heap.malloc(size)                                                                 # [3]
        uc.reg_write(UC_X86_REG_EAX, retval)                                                               # [4]
        uc.reg_write(UC_X86_REG_EIP, struct.unpack("<I", uc.mem_read(uc.reg_read(UC_X86_REG_ESP), 4))[0])  # [5]
        uc.reg_write(UC_X86_REG_ESP, uc.reg_read(UC_X86_REG_ESP) + 4)                                      # [6]

This Python code will emulate the behavior of a malloc call. The address of the malloc function (e.g. within a memory dump) has to be provided here via ADDR_TO_MALLOC. The if statement in line [1] checks whether this address is hit and, if it is, executes the code that follows. In line [2], the size argument for malloc is read from the stack. This size is used in line [3] to provide a Unicorn-controlled heap region, allocated with the malloc function provided by Unicorn. As the usual malloc stores its return value in EAX, the corresponding return value from the Unicorn malloc function is stored in this register in line [4]. Afterwards, the return of malloc is emulated by fetching the return address from the stack and putting it into EIP in line [5] and by increasing ESP by four (the size of the return address on the stack) in line [6].

Moreover, handling of certain segment registers such as fs and gs cannot be done directly in Unicorn; instead, a small Unicorn stub that uses register and memory reads/writes to model these registers has to be used. The Unicorefuzz project provides such stubs, which can be found here.

Finally, I would like to discuss a tool named Zelos (Zeropoint Emulated Lightweight Operating System) that was recently published here. As mentioned above, syscalls can be an issue when running a binary via Unicorn. Now, Zelos is based on Unicorn and emulates all of the Linux syscalls for x86/x86-64, ARM and MIPS binaries. It also provides detailed information on them. Hence, if you have a binary, for example, for a MIPS system and you would like to quickly evaluate what the binary is doing, you could try to execute

zelos your_binary

and would get a list of all the executed syscalls.

I hope this blog post helped some of you to get a better understanding of some of the tools out there that can be used to dynamically analyze binaries for different platforms.

Cheers
Oliver