“Unstripping” binaries: Restoring debugging information in GDB with Pwndbg

By Jason An

GDB loses significant functionality when debugging binaries that lack debugging symbols (also known as “stripped binaries”). Function and variable names become meaningless addresses; setting breakpoints requires tracking down relevant function addresses from an external source; and printing out structured values involves staring at a memory dump trying to manually discern field boundaries.

That’s why this summer at Trail of Bits, I extended Pwndbg—a plugin for GDB maintained by my mentor, Dominik Czarnota—with two new features to bring the stripped debugging experience closer to what you’d expect from a debugger in an IDE. Pwndbg now integrates Binary Ninja for enhanced GDB+Pwndbg intelligence and enables dumping Go structures for improved Go binary debugging.

Binary Ninja integration

To improve GDB+Pwndbg intelligence during debugging, I integrated Pwndbg with Binary Ninja, a popular decompiler with a versatile scripting API. The integration installs an XML-RPC server inside Binary Ninja and queries it from Pwndbg. This gives Pwndbg access to Binary Ninja’s analysis database, which it uses to sync symbols, function signatures, stack variable offsets, and more, recovering much of the debugging experience.

Figure 1: Pwndbg showing symbols and argument names synced from Binary Ninja in a stripped binary
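
To make the XML-RPC arrangement concrete, here is a minimal sketch using only Python’s standard library. It is not the integration’s actual code: the `SYMBOLS` table, the `get_symbol` method, and the `lookup` helper are hypothetical stand-ins for the analysis database and the queries Pwndbg makes.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the decompiler's analysis database:
# a map from function start address to recovered name.
SYMBOLS = {0x401000: "main", 0x401136: "parse_args"}


def get_symbol(addr_hex):
    """Return the name at the given address, or "" if there is none.

    XML-RPC integers are limited to 32 bits, so addresses travel as hex strings.
    """
    return SYMBOLS.get(int(addr_hex, 16), "")


def start_server():
    """Decompiler side: serve get_symbol on an ephemeral localhost port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_symbol)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def lookup(port, addr):
    """Debugger side: ask the server for the symbol at `addr`."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.get_symbol(hex(addr))
```

Calling `start_server()` and then `lookup(server.server_address[1], 0x401000)` round-trips one query. Sending addresses as strings is a design choice of this sketch, made so that 64-bit addresses do not overflow the XML-RPC integer type.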

For the decompilation, I pulled the tokens from Binary Ninja instead of serializing them to text first. This allows for fully syntax-highlighted decompilation, configurable to use any of Binary Ninja’s three IL levels. The decompilation is shown directly in the Pwndbg context, with the current line highlighted, just like in the assembly view.

Figure 2: Decompilation pulled from Binary Ninja and displayed in Pwndbg

I also implemented two features that reduce the amount of switching between Binary Ninja and Pwndbg: the current program counter (PC) is displayed as an arrow inside Binary Ninja, and breakpoints can be set from within Binary Ninja.

Figure 3: Binary Ninja displaying icons for the current PC and breakpoints

The most involved component of the integration is syncing stack variable names. Anywhere a stack address appears in Pwndbg, like in the register view, stack view, or function argument previews, the integration will check if it’s a named stack variable in Binary Ninja. If it is, it will show the proper label. It will even check parent stack frames so that variables from the caller will still be labeled properly.

Figure 4: A demonstration of how stack variable labeling is displayed
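
The lookup described above can be sketched roughly as follows, assuming the frame base of each frame is already known. The `frames` structure and the `function.variable` label format are hypothetical, chosen only for illustration.

```python
def label_stack_address(addr, frames):
    """Return a label for `addr`, or None if it is not a named stack variable.

    `frames` lists (function_name, frame_base, variables) tuples, innermost
    frame first, so the current function is checked before its callers.
    `variables` maps an offset from the frame base to (name, size_in_bytes).
    """
    for func, base, variables in frames:
        for offset, (name, size) in variables.items():
            start = base + offset
            if start <= addr < start + size:
                # Addresses inside a variable get a "+delta" suffix.
                delta = addr - start
                suffix = f"+{delta:#x}" if delta else ""
                return f"{func}.{name}{suffix}"
    return None
```

With a caller frame `("main", 0x7ffe1000, {-0x20: ("buf", 16)})`, the address `0x7ffe0fe4` falls four bytes into `buf` and is labeled `main.buf+0x4`, even when the current function is a callee of `main`.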

The main difficulty in implementing this feature came from the fact that Binary Ninja only provides stack variables as an offset from the stack frame base, so the frame base needs to be deduced in order to compute absolute addresses. Most architectures, including x86, have a frame pointer register that conventionally points to the frame base, but they don’t actually require it, so compilers are free to use it like any other register.

Fortunately, Binary Ninja has constant value propagation, so it can tell whether a register sits at a predictable offset from the frame base. My implementation first checks whether the frame pointer is actually the frame base. If it’s not, it checks whether the stack pointer has advanced by a predictable amount (which is usually true with modern compilers). Otherwise, it checks every other general-purpose register to try to find one with a consistent offset. This approach is not guaranteed to find the frame base, but in practice, it should almost never fail.
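
That fallback order can be sketched as a small function. This is a simplified model under stated assumptions, not the integration’s code: `known_offsets` is a hypothetical summary of what constant propagation can prove at the current instruction, and the default register names assume x86-64.

```python
def deduce_frame_base(regs, known_offsets, frame_pointer="rbp", stack_pointer="rsp"):
    """Return the frame base address, or None if no register pins it down.

    `regs` maps register names to their current values. `known_offsets` maps
    a register name to its proven offset from the frame base, and omits
    registers whose offset the analysis cannot determine.
    """
    # 1. Use the frame pointer only if it points exactly at the frame base.
    if known_offsets.get(frame_pointer) == 0:
        return regs[frame_pointer]
    # 2. Prefer the stack pointer, 3. then any other register with a known offset.
    others = sorted(r for r in known_offsets if r != stack_pointer)
    for reg in [stack_pointer] + others:
        if reg in known_offsets:
            return regs[reg] - known_offsets[reg]
    return None
```

For example, if `rsp` holds `0x7ffe0fc0` and is known to be `-0x40` from the frame base, the function returns `0x7ffe1000`. Returning `None` models the rare case where no register has a predictable offset.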

Go debugging

A common pain point when debugging executables compiled from non-C programming languages (and sometimes even C) is that they tend to have complex memory layouts that make it hard to dump values. A mild example is dumping a slice in Go, which requires one command to dump the pointer and length, and another to examine the slice contents. Dumping a map, on the other hand, can require over ten commands for a small map, and hundreds for larger ones, which is completely impractical for a human.
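
To illustrate the two steps for a slice, the sketch below decodes a Go `[]int64` from raw memory. It assumes a 64-bit little-endian target, where a slice header is three words (data pointer, length, capacity). The `read(addr, size)` callback and the `make_reader` helper are hypothetical stand-ins for the debugger’s memory access.

```python
import struct


def make_reader(memory, base):
    """Return read(addr, size) over a bytes snapshot that starts at `base`."""
    def read(addr, size):
        start = addr - base
        return memory[start:start + size]
    return read


def dump_int64_slice(read, slice_addr):
    """Decode the []int64 whose header is stored at `slice_addr`."""
    # Step 1: read the header to learn where the elements live and how many.
    ptr, length, _capacity = struct.unpack("<Qqq", read(slice_addr, 24))
    # Step 2: read `length` 8-byte signed integers from the backing array.
    return list(struct.unpack(f"<{length}q", read(ptr, 8 * length)))
```

Even this simple case needs two dependent reads, because the second address is only known after the first has been decoded. Maps chain many more such reads together.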

That’s why I created the go-dump command. Using the Go compiler’s source code as a reference, I implemented dumping for all of Go’s built-in types, including integers, strings, complex numbers, pointers, slices, arrays, and maps. The built-in types are notated just like they are in Go, so you don’t need to learn any new syntax to use the command properly.

Figure 5: Dumping a simple map type using the go-dump command

The go-dump command is also capable of parsing and dumping arbitrarily nested types so that every type can be dumped with just one command.

Figure 6: Dumping a more complex slice of map types using the go-dump command

Parsing Go’s runtime types

While Go-specific dumping is much nicer than manual memory dumping, it still poses many usability concerns. You need to know the full type of the value you’re dumping, which can be hard to determine and usually involves a lot of guesswork, especially when dealing with structs that have many fields or nested structs. Even if you have deduced the full type, some things are still unknowable because they have no effect on compilation, like struct field names and type names for user-defined types.

Conveniently, the Go compiler emits a runtime type object for every type used in the program (to be used with the reflect package), which contains struct layouts for arbitrarily nested structs, type names, size, alignment, and more. These type objects can also be matched up to values of that type, as interface values store a pointer to the type object along with a pointer to the data, and heap-allocated values have their type object passed into their allocation function (usually runtime.newobject).
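
As a rough illustration of how an interface value leads to its type object, the sketch below decodes the two-word interface layout on a 64-bit little-endian target. It reflects my understanding of the Go runtime, in which an empty interface stores the type pointer directly while a non-empty interface stores a pointer to a method table whose second word is the type pointer. These layouts are internal to the runtime and may differ between Go versions. The `read(addr, size)` callback is hypothetical.

```python
import struct

WORD = struct.Struct("<Q")  # one pointer on a 64-bit little-endian target


def read_word(read, addr):
    """Read one pointer-sized value using the read(addr, size) callback."""
    return WORD.unpack(read(addr, WORD.size))[0]


def decode_interface(read, addr, empty):
    """Return (type_ptr, data_ptr) for the interface value stored at `addr`."""
    first = read_word(read, addr)
    data = read_word(read, addr + WORD.size)
    if empty or first == 0:
        # Empty interface: the first word is the type pointer itself.
        # A nil non-empty interface has no method table, so it also lands here.
        return first, data
    # Non-empty interface: the first word points at the method table, and the
    # type pointer is assumed to be that table's second word.
    return read_word(read, first + WORD.size), data
```

The returned type pointer is the address of a runtime type object, and the data pointer is the value to decode with it.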

I wrote a parser capable of recursively extracting this information in order to process type information for arbitrarily nested types. This parser is exposed via the go-type command, which displays information about a runtime type given its address. For structs, this information includes the type, name, and offset of every field.

Figure 7: Examining a struct type that consists of an int and a string
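
The recursive part of such a parser can be sketched with a small model of already-parsed types. The `GoType` class and the output format below are hypothetical and are not Pwndbg’s own. The sketch only shows how nested struct fields resolve to absolute offsets.

```python
from dataclasses import dataclass, field


@dataclass
class GoType:
    """Hypothetical parsed form of a runtime type object."""
    name: str
    size: int
    # Struct fields as (field_name, offset_within_struct, GoType) tuples.
    fields: list = field(default_factory=list)


def describe(ty, base=0, indent=0):
    """Return one line per field, recursing into nested struct types."""
    lines = []
    pad = "    " * indent
    for fname, offset, ftype in ty.fields:
        lines.append(f"{pad}{base + offset:#06x} {fname} {ftype.name}")
        # Nested fields are reported relative to the outermost struct.
        lines.extend(describe(ftype, base + offset, indent + 1))
    return lines
```

For a struct with a `string` field at offset 0 and a nested two-`int` struct at offset 16, `describe` lists the nested fields at offsets 0x10 and 0x18, one indentation level deeper.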

This can be used to dump values in two ways. The first, easier way only works for interface values, since the type pointer is stored along with the data pointer, making it easy to automatically retrieve. These can be dumped using Go’s any type for empty interfaces (ones with no methods), and the interface type for non-empty interfaces. When dumping, the command will automatically retrieve and parse the type, leading to a seamless dump without having to enter any type information.

Figure 8: Dumping an interface value without specifying any type information

The second way works for all values but requires you to find and specify the pointer to the type for the value. In many cases, it is as easy as looking for the pointer passed into the function that allocated the value, but for global variables or variables whose allocation may be hard to find, some guesswork may be involved in finding the type. However, this method is generally still easier than trying to manually deduce the type layout and is capable of dumping even the most complex types. I tested it on a few large struct types in a stripped build of the Go compiler, which is one of the largest and most complex open-source Go codebases, and it was able to dump all of them with no problem.

Figure 9: Dumping a complex structure in the Go compiler only specifying a type address, using the -p flag for pretty printing

Recap and looking forward

This summer, I enhanced Pwndbg so it can be integrated with Binary Ninja to access its rich debugging information. I also added the go-dump command for dumping Go values. All of this is available on the Pwndbg dev branch and its latest release.

Moving forward, there’s even more that can be done to improve the debugging experience. I developed my Binary Ninja integration with a modular design so that it would be easy to add support for more decompilers in the future. I think it would be amazing to fully support Ghidra (the current integration only syncs decompilation), as Ghidra is a free and open-source decompiler, making it accessible to everyone who wants to use the functionality.

In terms of Go debugging, work can be done to add better support for displaying and working with goroutines. Goroutine support is currently one of the major advantages of the Delve debugger (a debugger specialized for debugging Go) over GDB/Pwndbg. For example, Delve can list every goroutine along with the instruction that created it, and it has a command to switch between goroutines.

Acknowledgments

Working at Trail of Bits this summer has been an absolutely amazing experience, and I would like to thank them for giving me the opportunity to work on Pwndbg. In particular, I would like to thank my manager, Dominik Czarnota, for being incredibly responsive about reviewing my code and giving me feedback and ideas about my work, and the Pwndbg community, as they have been incredibly helpful with answering any questions I had during the development process.