Software Reverse Engineering [Everything is Open-source]
Software reverse engineering is the process of taking an already forward-engineered program apart to see how it works.
Anything can be reverse engineered: software, physical machines, military technology, and even biological functions.
And oh boy, everything becomes open source if you can reverse engineer “everything”.
It can be useful in virtually every field, whether for security analysis, vulnerability research, or software modification.
However, there are some laws that pertain to reverse engineering, including patent law, copyright and fair use law, and the Electronic Communications Privacy Act, among others.
The ethics behind reverse engineering can be complicated and are only compounded by the proliferation of information technology in our everyday life.
But that’s a topic for another post.
What is Reverse-Engineering and How is it Useful?
For reverse engineering to happen in the first place, something must already have been forward engineered.
Reverse engineering is primarily used to replicate a product more inexpensively or because the original product is no longer available.
It can also help improve the quality of a piece of software or find bugs in it.
And this process does require a set of tools.
There are generally three steps when reverse engineering:
- information extraction: when the object or design is studied and the important information is extracted.
- modeling: abstracting the collected information into a model, and
- review: testing the model to see if it has been successfully reverse engineered.
Some common examples of what hackers and companies can reverse engineer include:
- software: like reconstructing the source code, which usually requires tools such as a disassembler and a decompiler
- computer parts: for instance, if a processor manufacturer wants to make its own processor based on a competitor’s design, and
- network security assessments: where someone or a company will reverse engineer network attacks to prevent them in the future.
How Does Reverse Engineering Work?
To reverse engineer a program effectively, it helps to understand how it was forward engineered, so the process usually starts there.
Forward engineering follows the software development life cycle: it starts at the specifications phase, where you gather requirements, moves through the analysis phase, where you analyze the domain requirements, then the design phase, and finally implementation and testing.
That is the order in which a program comes alive: requirements gathering, then analysis, then design. After design, implementation kicks off, followed by testing and deployment.
Reworking this software, that is, reverse engineering it, starts with analysing a copy of the existing software with a view to understanding its design, specifications, and functions.
From there, you build a program database which is useful for generating necessary information.
Why Software Reverse Engineering?
- Working through a piece of software successfully prepares you to cope with the complexity of larger systems.
- The process can be useful for recovering lost information.
- Reverse engineering software can help detect bugs and side effects in a program.
- It can help detect reusable artifacts and components in dead, proprietary, or legacy code bases.
And when you are hacking a piece of software, you are ultimately looking out for three core things:
Understanding the process
You are looking to spot every process and function that performs a task in the program; probably not all, but at least the ones you care about.
Understanding the data
Demystifying the data and data structures as you sniff around will give you an understanding of the kind of input the program takes and the output it produces.
Understanding user interface
Most importantly, the user interface is where most commands and inputs come from, and reconstructing it can make your work a lot easier.
Reverse Engineering in Action – Hacking a Crackme program
How about we actually hack something? Let’s do it.
The following are the tools we will be using.
1. Ghidra:
Ghidra is a popular open-source, all-in-one static analysis tool, disassembler, and decompiler developed by the United States National Security Agency (NSA). Its key features include:
- an analyzer for compiled binaries that performs control flow analysis, data flow analysis, and function identification to provide a comprehensive view of the binary’s structure,
- a disassembler and decompiler that turn binaries into assembly code and then into a higher-level representation, which helps in understanding the logic of the software, and
- a debugger.
A team of people can also collaborate on the same Ghidra project simultaneously.
2. IDA (Interactive Disassembler):
While Ghidra is free and open source, IDA is a commercial disassembler and debugger developed by Hex-Rays. And who says you can’t hack it?
Some of its features are a UI for carrying out actions, an interactive analyzer for annotating and commenting on disassembled code, and plug-in support to extend its capabilities however you want.
3. x64dbg:
x64dbg is another open-source tool, a 32- and 64-bit debugger for the Windows operating system, which we will be using for debugging and scripting. It also has plugin support.
These tools collectively provide an environment for us to dissect and understand the functionality of the compiled binaries we will be working on.
And we will not be hacking anything complicated. Before you decide to join the list of NASA hackers, you should start with simple crackme programs, which are designed specifically for you to analyze and reverse in the hopes of cracking them.
It is a perfect way to test your reverse engineering skills and get your hands dirty for the first time.
Once you have a crackme, first execute it to see what it does before attempting to dissect it.
Here, I have one that asks for an input, and when I enter a random one, it simply fails.
I’ll first drop it into Ghidra and proceed with the analysis.
Now, we already have a high-level idea of how the crackme behaves.
It asks for an input and prints a failure when it is incorrect. So let’s search the string reference “input key” using the string window of Ghidra to find where the program asks for input.
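To make the string search less of a black box, here is a minimal Python sketch of the same idea: scan the raw bytes for runs of printable ASCII and look for the prompt among them. This is not how Ghidra implements its string window (which also handles other encodings such as UTF-16); the function names and the sample bytes are made up for illustration.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

def contains_string(data: bytes, needle: str) -> bool:
    """Case-insensitive check for needle among the extracted strings."""
    return any(needle.lower() in s.lower() for s in extract_strings(data))

# Made-up bytes standing in for a binary image (not the real crackme).
SAMPLE = b"\x00\x01Enter input key:\x00\xff\x02ok\x00Wrong!\x00"

print(extract_strings(SAMPLE))
print(contains_string(SAMPLE, "input key"))
```

On a real file you would read the bytes from disk first, for example with pathlib's Path.read_bytes(). If the prompt you saw on screen is missing from the result, the strings are probably compressed or encrypted inside the file.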
Since nothing comes up, the program is most likely packed. So, let’s take a look inside with Detect It Easy (DiE).
DiE shows that it is indeed packed, and even though it doesn’t really tell us what was used to compress it, we’ll need to unpack it.
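One common heuristic behind this kind of verdict is byte entropy: compressed or encrypted data looks close to random, so its entropy approaches 8 bits per byte. The sketch below is only an illustration of that heuristic, not DiE's actual detection logic, and the 7.0 threshold is a rule-of-thumb assumption rather than a fixed standard.

```python
import math
from collections import Counter

# Rule-of-thumb cutoff (an assumption for this sketch, not a standard value).
PACKED_THRESHOLD = 7.0

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    probabilities = (count / total for count in Counter(data).values())
    return sum(-p * math.log2(p) for p in probabilities)

def looks_packed(data: bytes) -> bool:
    """Flag data whose entropy is above the assumed threshold."""
    return shannon_entropy(data) > PACKED_THRESHOLD

# A single repeated byte has no variety; all 256 values equally often is the maximum.
print(shannon_entropy(b"A" * 1024))
print(shannon_entropy(bytes(range(256)) * 4))
```

High entropy is a hint, not proof: plain executables that embed compressed resources can also score high, which is why the result still has to be confirmed by actually unpacking the program.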
Now, the unpacking shouldn’t be too difficult because we will just attempt dumping the image at runtime using a debugger like x64dbg.
After starting the program in x64dbg, I press the Run button and a prompt shows up in the console.
The debugging seems to stop, but the program still runs, probably because once the packer has unpacked the program, it executes it in a separate process.
All we have to do then is to attach x64dbg to the created process and dump its image.
To do so, simply click into Plugins, Scylla. In the new window, in the misc options, disable “use PE from disk”.
Then simply click “IAT Autosearch”, “Get Imports”, and “Dump”, and choose a location to save the dumped image.
Now, if we load this image into Ghidra and check for the string “input key,” we find it.
If we check its reference and follow it, we end up in a function, which must be where the checks happen.
So now, all we have to do is figure out where the key is, and we will do that right after we clean up the code.
While the end result might look confusing, it’s just assembly code.
And now we can see the key, fragmented across several variables. If we read their hex values in little-endian order (little-endianness is the dominant byte ordering for processor architectures such as x86, most ARM implementations, and base RISC-V implementations, and for their associated memory) and convert the individual bytes to text, we get a string: the key we are looking for.
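The little-endian reading is easy to get wrong by hand, so here is a small Python sketch of the conversion. The hex values and the resulting key are invented for illustration (they are not the ones from this crackme), and the sketch assumes the fragments are 4-byte integers holding ASCII text.

```python
def decode_fragments(fragments: list[tuple[int, int]]) -> str:
    """Join (value, size_in_bytes) fragments as little-endian ASCII text."""
    raw = b"".join(value.to_bytes(size, "little") for value, size in fragments)
    # Drop the trailing NUL padding that terminates a C string.
    return raw.rstrip(b"\x00").decode("ascii")

# Hypothetical values as a decompiler might show them, in assignment order.
FRAGMENTS = [
    (0x6E65706F, 4),
    (0x7365735F, 4),
    (0x00656D61, 4),
]

print(decode_fragments(FRAGMENTS))

# Reading the first value in big-endian order gives the bytes reversed.
print((0x6E65706F).to_bytes(4, "big").decode("ascii"))
```

The first print gives open_sesame, while the big-endian reading of the first fragment gives nepo. That reversed look is usually the first sign that you are reading the bytes in the wrong order.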
We can now input it into the crackme, and we get success.
…
The crackme I have just described step by step should give you an idea of the process of reversing binaries.
Keep in mind though, that this is just an example, and this approach is just one in a million, of course.
Some programs are written in languages like Python, which is interpreted, or Java, which compiles to bytecode that runs on a virtual machine, or even .NET, which is just-in-time compiled, and this could change the process entirely.
Also, compilers for different languages generate very different machine code for the same logic: a Rust binary looks very different from a C++ or a Go binary.
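As a small illustration of how much the target changes the work, here is a sketch of "reversing" a Python function with nothing but the standard library. The check function and its key are hypothetical stand-ins for a crackme written in Python; the point is only that the bytecode and its constants are directly inspectable, with no unpacking or disassembler needed.

```python
import dis

def check(key: str) -> bool:
    # Hypothetical key check, standing in for a Python crackme.
    return key == "hypothetical-key"

def string_constants(func) -> list[str]:
    """Return the string constants stored in the function's code object."""
    return [const for const in func.__code__.co_consts if isinstance(const, str)]

# Print the bytecode; the exact instructions vary between Python versions.
dis.dis(check)

print(string_constants(check))
```

A native binary gives you none of this for free, which is why the same key check takes far more work to recover from compiled C++ or Rust.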
That’s why your aim should be to develop the intuition required to reverse anything, no matter the target.
If you followed along, you just might have hacked your first piece of software. How does it feel?
Conclusion
While forward engineering starts from requirements gathering and ends with testing, reverse engineering effectively flips this process on its head.
And there are usually three (3) phases.
First, implementation has already taken place. Then, by studying the implementation, we try to recover the design of the product. Once we have the design, analysis begins.
You might have big goals, but start hacking away at crackmes first.
Get hacking!