Taming the dragon: reverse engineering firmware with Ghidra

Adam Bromiley

12 Mar 2026 14 Min Read

Introduction

I stumbled into infosec the same year the NSA graced us with Ghidra. It has since become by far the most used tool in my arsenal for reverse engineering and vulnerability research. It’s free, extensible, and supports some of the quirkier architectures we come across.

But its learning curve is steep.

This blog post is the culmination of my learnings from spending what may be too many hours in front of Ghidra’s glaring and dated UI. It focuses on firmware analysis, so malware researchers and OS bug hunters may be disappointed, but there should be tips for everyone. All examples are real research projects we’ve undertaken.

It’s worth noting that reverse engineering is not a quick process. It can take weeks to fully understand how a system works. The snail summits Mount Fuji, eventually…

Contents

Identify the load address

Find the interrupt vector table

Define a memory map

Annotate as much as possible

Identify functions with strings

Don’t miss stack strings

Hunt for magic values

RTOS scheduling

BSim to identify library functions

Source code importing

Bookmarks

Delve into the assembly

Make the most of scripts and plugins

Large language models

Identify the load address

Firmware is rarely loaded into memory at address zero. In many architectures the first addresses are reserved for the initial stack pointer and reset vector. The CPU jumps to the reset vector to continue execution. This allows different firmware (e.g., main application code vs. a recovery bootloader) to be loaded.

Without setting the load address, the disassembler will create invalid memory references (to functions, strings, etc.) and lead you to failure.

Base addresses can be determined in several ways:

The chip’s datasheet—e.g., it’s well documented that STM32s load firmware at 0x0800 0000.

The IVT (see next section)—if interrupt handlers are at 0x20061ae2, 0x2004200c, etc., the base address is probably 0x20000000.

String references—get the offset of strings and look for potential pointers to them in the code. For a string at offset 0x5eea4, search for references to 0x5eea4 (endianness considered). Repeat with another string, then another. If all the strings have potential references with the same significant bytes (e.g., 0x1005eea4, 0x10014a2, …), you’ve identified the base address. You’ll get false positives performing this manually, but tools can be used to sample hundreds of strings at once and check what base address creates the most valid references.

Other known firmware structures—binbloom identifies UDS databases in automotive firmware that contain pointers to handler functions. Like the IVT method, it determines the base address from the MSB of each function pointer. Your firmware may also have debug tables that map function names to addresses.
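The string-reference method can be sketched as a small standalone script. This is a minimal illustration rather than a replacement for dedicated tools: it assumes 32-bit pointers stored on four-byte boundaries and a base address aligned to 0x1000, and the function names are my own. Every word in the image votes for the base address that would make it point at the start of a string.

```python
import re
import struct
from collections import Counter, defaultdict


def find_string_offsets(image: bytes, min_len: int = 6) -> list[int]:
    """Return file offsets of printable ASCII runs that end in a NUL byte."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return [match.start() for match in pattern.finditer(image)]


def guess_base_address(image: bytes, little_endian: bool = True,
                       alignment: int = 0x1000) -> list[tuple[int, int]]:
    """Rank candidate base addresses by how many strings they make referenced."""
    fmt = "<I" if little_endian else ">I"
    # Treat every aligned 32-bit word in the image as a potential pointer.
    words = {struct.unpack_from(fmt, image, off)[0]
             for off in range(0, len(image) - 3, 4)}

    # Index strings by their low bits: if the base is aligned, a pointer and
    # the string it targets must share those bits.
    by_low_bits = defaultdict(list)
    for offset in find_string_offsets(image):
        by_low_bits[offset % alignment].append(offset)

    votes = Counter()
    for word in words:
        for offset in by_low_bits.get(word % alignment, ()):
            if word >= offset:
                votes[word - offset] += 1
    # Each result is (candidate base address, number of strings it explains).
    return votes.most_common(5)
```

On real firmware the top candidate should stand well clear of the rest. If it doesn't, try the other endianness or a smaller alignment.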

Figure 1: The load address of main flash memory for the STM32L0x3 family (RM0360 Reference Manual)

Figure 2: Setting the firmware load address in Ghidra on import

Find the interrupt vector table

Often starting at offset zero, the IVT is a table of addresses for interrupt handlers.

Set each entry’s datatype to a pointer (key binding “p”) and you can double-click to jump to the handler’s code. Kick off disassembly with key binding “d”.

The table’s first record is typically the reset handler (or the second, if the first is an initial stack pointer). This is the firmware’s entry point. Disassembling the reset handler is a helpful precursor to running Auto Analysis, since it will run through most functions in the correct order of execution.

The purpose and location of each IVT entry is documented in your chip’s datasheet.
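Before opening Ghidra, it can help to dump the table with a few lines of Python. The sketch below assumes an ARM Cortex-M layout (little-endian, initial stack pointer first, reset handler second, Thumb bit set on handler addresses); other architectures will differ. The base-address guess is only a heuristic of my own: it assumes the image is loaded at an address aligned to its size rounded up to a power of two.

```python
import struct


def parse_cortex_m_vectors(image: bytes, count: int = 16) -> dict:
    """Read the first `count` 32-bit words of the image as a vector table."""
    words = struct.unpack_from("<%dI" % count, image, 0)
    initial_sp, handlers = words[0], words[1:]
    return {
        "initial_sp": initial_sp,
        # Bit 0 only marks Thumb mode; clear it to get the code address.
        "reset_handler": handlers[0] & ~1,
        # Unused (reserved) vectors are typically zero, so skip them.
        "handlers": [h & ~1 for h in handlers if h != 0],
    }


def guess_base_from_handlers(handlers: list[int], image_size: int) -> int:
    """Heuristic: round the lowest handler down to a size-aligned boundary."""
    alignment = 1 << (image_size - 1).bit_length()
    return min(handlers) & ~(alignment - 1)
```

With the handler addresses from the earlier example (0x20061ae2 and 0x2004200c) and an image under 512 KiB, the heuristic lands on 0x20000000.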

Figure 3: The IVT of an STM32F405 as per the reference manual (RM0090 Reference Manual)

Figure 4: Disassembled and annotated IVT in the firmware of an STM32F100, prior to Auto Analysis

Define a memory map

Flash memory will already be listed in the Memory Map window at the base address you set previously. What Ghidra doesn’t do is mark it as read-only (firmware usually does not, or cannot, write to the loaded flash image).

Strings and other read-only data structures will have odd references:

Figure 5: Why isn’t the decompiler displaying a string?

Set to read-only and they’ll appear as expected:

Figure 6: Read-only memory allows Ghidra to display strings

It’s also worth defining read-write SRAM (using addresses in the chip’s reference manual), otherwise references to those addresses are shown in red and you cannot define their datatype or initialise them with data:

Figure 7: It’s not possible to retype SRAM variables and other references outside of the existing memory map

Figure 8: Defining the SRAM region in the memory map allows you to retype variables created in memory

The final task is adding the locations of peripheral registers so you know which peripheral a subroutine is interacting with; otherwise, you just see references to unknown addresses.

CMSIS System View Description (SVD) files list these locations for ARM microcontrollers and can be loaded into Ghidra with the SVD-Loader script (https://github.com/leveldown-security/SVD-Loader-Ghidra). The script also sets the peripheral registers’ datatypes to well-defined structs:
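If you want to check what an SVD file contains before loading it, the format is plain XML and easy to read with the standard library. The following is a rough sketch, not how SVD-Loader itself works: it only reads each peripheral's name, baseAddress, and the size of its first addressBlock, and it reports a size of zero for peripherals that omit the block (such as those declared with derivedFrom).

```python
import xml.etree.ElementTree as ET


def svd_peripherals(svd_xml: str) -> list[tuple[str, int, int]]:
    """Return (name, base address, size) for each peripheral, sorted by address."""
    root = ET.fromstring(svd_xml)
    regions = []
    for peripheral in root.iter("peripheral"):
        name = peripheral.findtext("name")
        # int(text, 0) accepts both decimal and 0x-prefixed values.
        base = int(peripheral.findtext("baseAddress"), 0)
        size = int(peripheral.findtext("addressBlock/size", default="0"), 0)
        regions.append((name, base, size))
    return sorted(regions, key=lambda region: region[1])


if __name__ == "__main__":
    # Tiny hand-written sample; real SVD files come from the chip vendor.
    sample = """
    <device><peripherals>
      <peripheral><name>USART1</name><baseAddress>0x40013800</baseAddress>
        <addressBlock><offset>0</offset><size>0x400</size></addressBlock>
      </peripheral>
      <peripheral><name>GPIOA</name><baseAddress>0x40010800</baseAddress>
        <addressBlock><offset>0</offset><size>0x400</size></addressBlock>
      </peripheral>
    </peripherals></device>
    """
    for name, base, size in svd_peripherals(sample):
        print(f"{name:8} 0x{base:08x} 0x{size:x}")
```

The resulting list is a quick cross-check against the Memory Map after the script has run.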

Figure 9: Memory Map and Data Type Manager after running SVD-Loader

M68000-family processors have peripheral locations that depend on runtime-configured base address registers (VBR, RAMBAR, and MBAR). Look in the firmware’s initialisation sequence for how these are set:

Figure 10: Some architectures use variable-offset registers

Annotate as much as possible

Even if a function or data address seems irrelevant, if you have an inkling of its purpose, label it with key binding “l”. It helps down the line when you see it referenced elsewhere. For binaries with symbols, prefix your labels to distinguish them from autogenerated ones.

Add comments (key binding “;”).

Labelling variables and correcting their datatypes (“Ctrl-l”) helps you understand the decompilation, especially when returning to the project after a long weekend. Eventually you’ll reach the point where you can reconstruct subroutines into compilable C snippets.

Identify functions with strings

This is an obvious one. Since firmware is often stripped of symbols, strings are the only form of documentation the vendor provides you.

Debug logs, assertions, and exception messages often provide a full function prototype that allows you to rename the function and set correct parameter datatypes:

Figure 11: An assertion that identifies the strncmp function, and param_1 and param_2 as a struct used by ESP32’s virtual filesystem

Address tables sometimes have titles or include function names alongside their pointers. Find these tables through Ghidra’s automatically generated cross-references and bookmarks, or by manually scrolling through the disassembly:

Figure 12: Address table for a CLI containing entries for the command name, function pointer, and description
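Once a table like this is found, its entries can be walked outside Ghidra to produce a list of names and handler addresses. The sketch below assumes each entry is three little-endian 32-bit pointers in the order shown in the figure (command name, function pointer, description). The real layout varies between firmware images, and the helper names are hypothetical.

```python
import struct


def read_cstring(image: bytes, base: int, address: int) -> str:
    """Read a NUL-terminated ASCII string at a loaded address."""
    start = address - base
    end = image.index(b"\x00", start)
    return image[start:end].decode("ascii", errors="replace")


def parse_command_table(image: bytes, base: int, table_address: int,
                        count: int) -> list[tuple[str, int, str]]:
    """Return (command name, handler address, description) per table entry."""
    entries = []
    for index in range(count):
        # Assumed entry layout: name pointer, function pointer, description pointer.
        offset = table_address - base + 12 * index
        name_ptr, func_ptr, desc_ptr = struct.unpack_from("<3I", image, offset)
        entries.append((read_cstring(image, base, name_ptr),
                        func_ptr,
                        read_cstring(image, base, desc_ptr)))
    return entries
```

Each tuple gives you a ready-made label for the handler function and a comment describing what it does.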

Don’t miss stack strings

Strings can be constructed on the stack instead of being stored at compile time in static data sections. Why? Optimisation (e.g., four-byte strings can be handled in 32-bit registers), obfuscation, or intentional design choices concerning memory allocations, lifetimes, and real-time constraints.

Figure 13: Ghidra will not display string assignments to stack-based variables as expected; the 16-byte key is decompiled as four four-byte integers

Figure 14: Annotating with correct datatypes will help display stack-based strings
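When the decompiler shows a string as a series of integer assignments, you can recover the bytes by hand. A minimal sketch, assuming 32-bit stores written to consecutive stack slots in ascending address order (the constants in the usage line are made up for illustration):

```python
import struct


def decode_stack_string(values: list[int], little_endian: bool = True) -> bytes:
    """Reassemble the bytes written by consecutive 32-bit stores."""
    fmt = ("<" if little_endian else ">") + "I" * len(values)
    return struct.pack(fmt, *values)


if __name__ == "__main__":
    # Two hypothetical constants as they might appear in the decompiler.
    print(decode_stack_string([0x64636261, 0x68676665]))  # b'abcdefgh'
```

If the output looks reversed within each group of four bytes, the target is probably the other endianness.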

Hunt for magic values

A less obvious method of function identification. Use the memory and scalar searches for known constants like the DHCP magic cookie (0x63825363), ARP’s EtherType (0x0806), or the AES S-Box.
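The same idea works on the raw image before it is even loaded into Ghidra. This sketch simply searches for a constant in both byte orders. It will miss constants that the compiler splits across instructions (for example, a 32-bit value built from two 16-bit immediates), which is where Ghidra's scalar search is the better tool.

```python
def find_constant(image: bytes, value: int, width: int) -> dict[str, list[int]]:
    """Return the file offsets where `value` appears, per byte order."""
    hits = {}
    for order in ("big", "little"):
        needle = value.to_bytes(width, order)
        offsets = []
        position = image.find(needle)
        while position != -1:
            offsets.append(position)
            position = image.find(needle, position + 1)
        hits[order] = offsets
    return hits
```

For example, find_constant(image, 0x63825363, 4) looks for the DHCP magic cookie. Network protocol constants usually appear in big-endian (network) byte order when stored as data.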

Figure 15: The ARP EtherType identified the Treck TCP/IP stack’s tfEtherRecv function

Figure 16: Searching for the first few entries of the CRC-16-ANSI lookup table led to functions that processed Modbus packets and asserted firmware integrity
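Lookup tables can be generated rather than copied from a reference. The sketch below builds the CRC-16-ANSI table with the reflected polynomial 0xA001 (the variant Modbus uses) and turns the first few entries into a byte pattern for Ghidra's memory search. It assumes the firmware stores the table as 16-bit entries. Some implementations split it into separate high-byte and low-byte tables, in which case the pattern needs adjusting.

```python
def crc16_ansi_table() -> list[int]:
    """Build the 256-entry table for the reflected polynomial 0xA001."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
        table.append(crc)
    return table


def table_needle(entries: int = 4, little_endian: bool = True) -> bytes:
    """Return the first table entries as bytes, ready for a memory search."""
    order = "little" if little_endian else "big"
    return b"".join(value.to_bytes(2, order)
                    for value in crc16_ansi_table()[:entries])


if __name__ == "__main__":
    print(table_needle().hex(" "))  # 00 00 c1 c0 81 c1 40 01
```

Searching for four entries is usually enough to avoid false positives while tolerating tables that are truncated or interleaved later on.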

RTOS scheduling

Real-time operating systems add a layer of complexity to reverse engineering. Global resources are shared between subroutines, it’s more common to see heap allocations and objects passed by reference, and it’s challenging to separate kernel code from application code.

It’s helpful to identify how tasks are invoked. The task creation routine will take a pointer to the function that implements the given task. By searching for calls to this routine, you’ll get a list of all function pointers to application code.

Task creations also often take task names as arguments for debug purposes. So in addition to getting useful function pointers, you can label them with descriptive names. FreeRTOS defines xTaskCreate with the following:

BaseType_t xTaskCreate( 
    TaskFunction_t pvTaskCode,    // Pointer to the task's entry function 
    const char * const pcName,    // The task's name 
    const configSTACK_DEPTH_TYPE uxStackDepth, // Words to allocate for the stack 
    void *pvParameters,           // Arguments for the task 
    UBaseType_t uxPriority,       // Scheduling priority 
    TaskHandle_t *pxCreatedTask   // Handle to the created task 
);

There’s often a dedicated task per network service so this knowledge is critical if you want to, say, focus your efforts on reverse engineering a custom protocol.
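Once you have collected the first two arguments of each xTaskCreate call (by hand or with a script), turning them into labels is mechanical. A small sketch, assuming you already have a list of (pvTaskCode, pcName) pointer pairs and the image's base address; the function name and the "task_" prefix are my own choices:

```python
import re


def task_labels(image: bytes, base: int, calls: list[tuple[int, int]],
                thumb: bool = False) -> dict[int, str]:
    """Map each task entry address to a label derived from its task name."""
    labels = {}
    for entry_ptr, name_ptr in calls:
        start = name_ptr - base
        end = image.index(b"\x00", start)
        name = image[start:end].decode("ascii", errors="replace")
        # On ARM Thumb targets, bit 0 of a function pointer is a mode flag.
        address = entry_ptr & ~1 if thumb else entry_ptr
        # Replace anything that is not a letter, digit, or underscore.
        labels[address] = "task_" + re.sub(r"\W", "_", name)
    return labels
```

The resulting dictionary can be applied as labels in Ghidra, giving each task's entry function a descriptive name.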

Figure 17: Using task creations to identify thirteen tasks on a FreeRTOS-based device

BSim to identify library functions

Ghidra BSim is a plugin for fuzzy-matching code with a database of known functions.

If you know which libraries the firmware uses (e.g., the RTOS or TCP/IP stack), compile them yourself with debug symbols, or use public precompiled releases. Feed them into Ghidra and it creates a database containing signatures of all the library functions.

You then compare functions in your firmware with the BSim database. Unlike Function ID, it performs fuzzy matching. It can match a large proportion of functions even if the library is a different version or was built with a different compiler for a different architecture. With two clicks you can copy the function name and any custom datatypes.

We published a full tutorial here: https://www.pentestpartners.com/security-blog/fuzzy-matching-with-ghidra-bsim-a-guide/

Figure 18: Identifying functions from the ESP32 Wi-Fi stack with BSim fuzzy matching

Source code importing

Try importing header files for libraries you know the firmware uses. Ghidra will populate the Data Type Manager with all typedefs, structs, and function prototypes. Preprocessor constants can annotate the decompilation with Set Equate… (key binding “e”).

This can be finicky to get working. Ghidra’s source code parser isn’t intelligent and the order of imports matters. You also need to set the correct preprocessor constants to pass #ifdef conditionals and add C standard library headers to the include path if relied on by the library (a complete C environment is not necessary; minimal sets of standard header files exist).

Figure 19: After importing header files for a known library, we can use its datatype definitions in annotating variables

Bookmarks

Ghidra populates the bookmark table during auto-analysis.

Here you can find address tables (useful for function identification), “undefined” functions (found from known operand sequences instead of references), embedded media, and places where disassembly failed.

Bookmarks can also be manually set or set by plugins.

Figure 20: Bookmarks showing a function table (likely handlers for interacting with Winbond memory), disassembly failures that must be manually fixed, and

Delve into the assembly

Unless you’re narrowing down on a vulnerability or exploring an archaic system that uses handwritten assembly or eccentric compilation techniques, the bulk of reverse engineering work will be in the decompilation window.

It makes logic flows easier to follow and allows you to annotate code with datatypes and descriptive names.

But the translation to pseudo-C isn’t always perfect. Ghidra often incorrectly assumes the number of function parameters or the size of variables. Sometimes, it can’t decompile the code at all.

Warning signs can include the existence of “in_”, “in_stack_”, “unaff_”, and “extraout_” variables in the decompiler view.

Reading the assembly is crucial in these scenarios. Research the calling convention used by your architecture: whether parameters are passed on the stack or in registers, what order they’re read in, and which registers can be clobbered by functions.

Without debug symbols, Ghidra fails to identify variadic and thiscall functions, e.g., printf(char *fmt, …) and object::method(). Set these manually in the Edit Function Signature window.

Figure 21: This is weird, why is the one-byte char variable placed 15 bytes into the stack and a whole four bytes beyond the second parameter?

Figure 22: Motorola 68000’s calling convention passes parameters on the stack, pushed right-to-left. Using this and the 0x8 offset against SP in the disassembly, we declare custom storage for the function with the char at offset 0xc instead of 0xf

Make the most of scripts and plugins

I’ve already mentioned SVD-Loader but there are plenty of other scripts to automate processes or annotate interesting functionality.

As examples: I’ve used the VxWorks Symbol Table Finder with success, Leaf Blower to find format strings and string.h functions, FindCrypt, and Stack String Explorer.

Large language models

I often feed Ghidra disassembly and decompilations into LLMs. They’re great for quickly identifying common algorithms such as memory allocators, checksums, and cryptographic routines. At its simplest, copy-pasting Ghidra output into a chatbot like ChatGPT with a simple prompt will provide good insights.

We’ve also found success with dedicated AI extensions.

GhidrAssist (https://github.com/jtang613/GhidrAssist/) supports a variety of LLM APIs and provides a chat window with a solid default prompt template. You can provide supporting context with handwritten text documents or reference manuals, and it can be instructed to autonomously perform investigation loops for analysing more complex logic with self-reflection.

It also supports Model Context Protocol (MCP) servers to enhance the output with automatic decompilation and symbol annotation. The same author made GhidrAssistMCP for this purpose, but we’ve also used LaurieWired’s GhidraMCP with success (https://github.com/LaurieWired/GhidraMCP).

As is always the guidance for AI: it hallucinates so verify output, and use a local instance or one with enterprise data protection for commercially sensitive code.

Figure 23: We’ve found that publicly available LLMs do well at analysing both disassembly and decompilation snippets

Figure 24: LLMs are useful at roughly identifying functions

Conclusion

Like any tool, the more you use Ghidra, the better you get. Now go reverse engineer some firmware!