Ghidra Decompiler - CLI guide
#############################

Ghidra has a decompiler that, unlike the rest of the program (written in Java), is written in C++. This caught my attention, so I started to hack on it. Unfortunately, not much has been written about using the decompiler standalone, in the terminal, without the Ghidra GUI. This article tries to fill that void.

Building The Decompiler
***********************

Fetch and unzip the Ghidra package from their GitHub release page.

.. code::

    $ unzip ghidra_11.1.2_PUBLIC_20240709.zip

`cd` into the decompiler directory and build it:

.. code::

    $ cd ghidra_11.1.2_PUBLIC/Ghidra/Features/Decompiler/src/decompile/cpp
    $ make decomp_opt -j $(nproc --all)

You should end up with an executable called `decomp_opt`.

Running the Decompiler
**********************

While inside the directory, export the `SLEIGHHOME` environment variable, pointing it at the root of the unpacked Ghidra package so that the decompiler can find it, then run the executable.

.. code::

    $ export SLEIGHHOME=/home/shreeyash/ghidra_11.1.2_PUBLIC
    $ ./decomp_opt
    [decomp]>

The decompiler is now running and waiting for commands.

.. note::

    Remember to always export the environment variable before running `decomp_opt`. Consider putting the two commands into a script to make life easier.

Decompile and view an ELF executable
************************************

Let's start with a trivial C++ program with some control flow, compile it into an executable (ELF), and decompile it. Here's the program; save and compile it:

.. code::

    $ cat a.cpp
    #include <iostream>

    #define THRESHOLD 20

    int foo() {
        return 10;
    }

    int main() {
        int b = foo();
        std::cout << "The threshold is " << THRESHOLD << '\n';
        std::cout << "You returned " << b << '\n';
        if (b < THRESHOLD) {
            std::cout << "get in\n";
        } else {
            std::cout << "get out!\n";
        }
    }
    $ g++ -no-pie a.cpp -o a
    $ ./a
    The threshold is 20
    You returned 10
    get in

The executable is ready; what's left is decompilation. Let's start the decompiler and load our file:

.. code::

    $ ./decomp_opt
    [decomp]> load file a
    a successfully loaded: Intel/AMD 64-bit x86

We've loaded our executable into the decompiler.

C++ is a high-level language with constructs that mean nothing to a CPU: functions, structs, and loops, among others. To implement these, the compiler translates the abstractions into concrete control flow instructions such as compare, branch, and jump.

If we peep into an executable, we'll notice that what we called functions are now 'addresses', i.e. numbers that represent locations in memory. A function is run by jumping (i.e. setting the program counter) to its address. So, if we wish to decompile a function we had in the source, we first have to find the address at which it resides.

`a.cpp` has two functions: `main` and `foo`. To find the address where a function resides in the executable, we can use `objdump`.

.. code::

    $ objdump -C -D a
    ...
    00000000004011c5 <main>:
      4011c5: f3 0f 1e fa             endbr64
      4011c9: 55                      push   %rbp
      4011ca: 48 89 e5                mov    %rsp,%rbp
      4011cd: 48 83 ec 10             sub    $0x10,%rsp
      4011d1: e8 e0 ff ff ff          call   4011b6 <_Z3foov>
      4011d6: 89 45 fc                mov    %eax,-0x4(%rbp)
      4011d9: 48 8d 05 24 0e 00 00    lea    0xe24(%rip),%rax        # 402004 <_IO_stdin_used+0x4>
      4011e0: 48 89 c6                mov    %rax,%rsi
      4011e3: 48 8d 05 96 2e 00 00    lea    0x2e96(%rip),%rax        # 404080 <_ZSt4cout@GLIBCXX_3.4>
      4011ea: 48 89 c7                mov    %rax,%rdi
      4011ed: e8 9e fe ff ff          call   401090 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
      4011f2: 48 89 c2                mov    %rax,%rdx
      4011f5: 8b 45 fc                mov    -0x4(%rbp),%eax
    ...

Searching for 'main' reveals its label, which resides at address `0x4011c5`.

.. code::

    [decomp]> load addr 0x4011c5 main
    Function main: 0x004011c5

`load addr` takes an address and an optional label. The label is simply a name that we assign to that address. In this case it was 'main', but it could have been anything.

.. code::

    [decomp]> decompile
    Decompiling main
    Decompilation complete
    [decomp]> print C
    xunknown8 main(void)

    {
      int4 iVar1;
      xunknown8 xVar2;

      iVar1 = func_0x004011b6();
      xVar2 = func_0x00401090(0x404080,0x402004);
      xVar2 = func_0x004010c0(xVar2,0x14);
      func_0x004010a0(xVar2,10);
      xVar2 = func_0x00401090(0x404080,0x402016);
      xVar2 = func_0x004010c0(xVar2,iVar1);
      func_0x004010a0(xVar2,10);
      if (iVar1 < 0x14) {
        func_0x00401090(0x404080,0x402024);
      }
      else {
        func_0x00401090(0x404080,0x40202c);
      }
      return 0;
    }
    [decomp]>

Just like that, we've decompiled our program. Notice how the names are garbled. This is because names (of variables and functions) are not really necessary to execute a program.

Let's analyze the decompiled output. The latter part of each function name is its address, which means we can look it up in the `objdump` output. Moreover, if the set of commands that got us `main`'s decompilation were repeated for all the functions present in the output, the resulting decompilation of `main` would replace all the addresses with the labels we assign to them.
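
The repetition described above can be partly automated. What follows is a minimal sketch, not something that ships with Ghidra: a hypothetical shell function, `labels_to_load_addr`, that reads `objdump -d` output on standard input and prints one `load addr` command per function label. It assumes `objdump` is run without `-C`, so that labels are plain identifiers (possibly mangled), and it skips every other line.

.. code:: shell

    # Hypothetical helper: turn objdump function labels into `load addr` commands.
    # Usage (assumes the executable is called `a`):
    #   objdump -d a | labels_to_load_addr
    labels_to_load_addr() {
        awk '/^[0-9a-f]+ <[A-Za-z_][A-Za-z0-9_]*>:$/ {
            addr = $1
            sub(/^0+/, "", addr)                     # drop the zero padding
            name = substr($2, 2, length($2) - 3)     # strip the "<" and ">:" around the label
            print "load addr 0x" addr " " name
        }'
    }

The printed lines can then be pasted at the `[decomp]>` prompt before decompiling `main`.
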
Looking up `func_0x004011b6` in the `objdump` output, we find it to be `foo`:

.. code::

    ...
    00000000004011b6 <foo()>:
      4011b6: f3 0f 1e fa             endbr64
      4011ba: 55                      push   %rbp
      4011bb: 48 89 e5                mov    %rsp,%rbp
      4011be: b8 0a 00 00 00          mov    $0xa,%eax
    ...

`func_0x00401090` is not defined in the executable; however, calls to it appear in the `objdump` output like this:

.. code::

    4011ed: e8 9e fe ff ff          call   401090 <std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)@plt>

It's quite obvious from the hint that `func_0x00401090` is the operator `<<` overloaded to accept a `std::basic_ostream` object and a `const char *`. The `@plt` at the end indicates that this function can be found in the `.plt` section of the executable. `.plt`, which stands for Procedure Linkage Table, is a redirection table for external functions that live in shared objects. So, `func_0x00401090` is the `operator<<` found in `libstdc++.so`, which the program is linked to.

It takes two arguments, both addresses of objects. A search reveals that the first argument is the object `std::cout`, whose definition resides in an external library (`libstdc++.so`), and the second is a string literal that can be found in the `.rodata` section of the executable.

.. code::

    $ objdump -s -j .rodata a
    Contents of section .rodata:
     402000 01000200 54686520 74687265 73686f6c  ....The threshol
     402010 64206973 2000596f 75207265 7475726e  d is .You return
     402020 65642000 67657420 696e0a00 67657420  ed .get in..get
     402030 6f757421 0a00                        out!..

Indeed, the string `"The threshold is "` is present at address `0x402004`. Likewise, all the following functions up to `func_0x004010a0` are overloads of `operator<<` that handle different types of data.

What remains is the control flow. It checks whether `iVar1` (`b` in the original source) is less than `0x14` (`THRESHOLD`) and calls the familiar `func_0x00401090` (`operator<<`) with the matching string.

Conclusion
**********

Our work was made much easier by the fact that the executable was not 'stripped'.
Stripping is a process that removes all the symbols that are not absolutely necessary for execution, which greatly reduces the size of the executable. In the real world, especially when dealing with proprietary software, executables might be stripped. Unstripped executables let us move faster by simply searching for symbols, as we did to find `main`. Stripped executables require us to trace, find, and deduce what we need. In a later article, I may demo decompilation of stripped executables.