Malware Analysis
Reverse Engineering
The specialist volume. Assumes Book 1 or equivalent experience. Read disassembled malware, defeat anti-analysis protections, unpack custom packers, reconstruct C2 protocols from raw bytes, and reverse engineer ransomware, rootkits, and RATs at the code level.
Malware analysis is one of the broadest technical disciplines in security. This series is split into two books to cover the subject with the depth it deserves. Book 1 covers practitioner skills — triage, static and dynamic analysis, memory forensics, YARA, and detection engineering — and requires no assembly or reverse engineering knowledge. Book 2 (this volume) covers the specialist skills that begin where Book 1 ends.
x86/x64 Assembly for Analysts
Registers and the stack, calling conventions, essential instruction set, reading function prologues and epilogues, identifying loops and conditionals in disassembly
Assembly language is the lowest level at which disassemblers show you code. You do not need to write assembly to be a malware analyst — you need to read it. The goal of this chapter is to build the mental model that lets you look at a disassembly listing and understand what the code is doing: what data it is operating on, where it came from, where it is going, and which logical structure it represents.
A disassembler reads the raw bytes of a binary and reconstructs the corresponding assembly instructions. Each line shows an address (where in memory this instruction lives), bytes (the raw encoding), a mnemonic (the human-readable operation name like MOV or CALL), and operands (what the operation acts on). The decompiler goes one step further and reconstructs a C-like pseudocode representation.
The Register Architecture
The Stack and Calling Conventions
The stack is a region of memory that grows downward (toward lower addresses). PUSH decrements the stack pointer and writes a value; POP reads a value and increments the stack pointer. The stack is how functions receive arguments (on x86), save return addresses, and preserve registers they modify.
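The PUSH/POP semantics described above can be sketched as a toy model. The `Stack` class and its starting address are illustrative assumptions, not part of any analysis tool; the model only tracks 8-byte slots the way a 64-bit process would.

```python
class Stack:
    """Toy model of an x64 stack: memory is a dict of address -> 8-byte value."""

    def __init__(self, rsp=0x7FFF0000):  # starting address is arbitrary
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        # PUSH: decrement the stack pointer, then write at the new top.
        self.rsp -= 8
        self.mem[self.rsp] = value

    def pop(self):
        # POP: read from the top of the stack, then increment the pointer.
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

    def call(self, return_address):
        # CALL pushes the address of the instruction after the call.
        self.push(return_address)

    def ret(self):
        # RET pops that address so execution can resume there.
        return self.pop()


stack = Stack()
top_before = stack.rsp
stack.push(0x1111)       # a saved register value
stack.call(0x401005)     # the return address lands below it
print(hex(top_before - stack.rsp))  # bytes the stack grew downward
```

Reading a listing with this model in mind makes it easier to see why a lower address always holds the more recently pushed value.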
The Function Prologue and Epilogue
Every function begins with a prologue that sets up its stack frame and ends with an epilogue that tears it down. Recognising these patterns lets you immediately identify function boundaries in a disassembly listing.
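As a rough illustration of prologue recognition, the sketch below scans raw code bytes for a few well-known prologue encodings. The pattern list is an assumed, deliberately small subset; compilers emit many variants, and a byte match inside data or mid-instruction is a false positive, so treat hits as candidates only.

```python
# Assumed subset of common prologue encodings; not exhaustive.
PROLOGUES = {
    bytes.fromhex("558bec"): "x86: push ebp; mov ebp, esp",
    bytes.fromhex("5589e5"): "x86: push ebp; mov ebp, esp (alternate encoding)",
    bytes.fromhex("554889e5"): "x64: push rbp; mov rbp, rsp",
    bytes.fromhex("4883ec"): "x64: sub rsp, imm8",
}


def find_prologues(code, base=0):
    """Return sorted (address, description) pairs for every pattern match."""
    hits = []
    for pattern, label in PROLOGUES.items():
        start = 0
        while (pos := code.find(pattern, start)) != -1:
            hits.append((base + pos, label))
            start = pos + 1
    return sorted(hits)


# Hypothetical code bytes: nop, an x86 prologue, ret, then an x64 prologue.
sample = bytes.fromhex("90558bec83ec10c3554889e54883ec20")
for address, label in find_prologues(sample, base=0x401000):
    print(f"0x{address:X}  {label}")
```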
Identifying Loops and Conditionals
Essential Instructions — Analyst Reference
| Instruction | Operation | Flags Set | Common Use |
|---|---|---|---|
| MOV dst, src | Copy src into dst | None | Move data between registers/memory |
| LEA dst, [expr] | Load Effective Address: computes address, doesn't dereference | None | Pointer arithmetic, fast multiply |
| PUSH / POP | Write to/read from top of stack, adjust RSP | None | Save registers, pass arguments (x86) |
| CALL target | Push return address, jump to target | None | Function call |
| RET | Pop return address from stack, jump to it | None | Function return |
| ADD / SUB | Add/subtract; result in destination | CF, ZF, SF, OF | Arithmetic, pointer adjustment |
| XOR dst, src | Bitwise exclusive OR | ZF, SF, CF=0 | Decryption, XOR reg, reg = zero register |
| AND / OR | Bitwise AND/OR | ZF, SF, CF=0 | Masking flags, bit manipulation |
| TEST dst, src | Bitwise AND without storing result; sets flags only | ZF, SF, CF=0 | TEST eax, eax checks if eax is zero |
| CMP dst, src | Subtraction without storing result; sets flags only | CF, ZF, SF, OF | Comparison before conditional jump |
| JMP / Jcc | Unconditional/conditional jump based on flags | None | Loops, branches, if/else |
| SHL / SHR | Shift left/right (multiply/divide by powers of 2) | CF, ZF, SF | Bit manipulation, fast multiply |
| INC / DEC | Increment/decrement by 1 | ZF, SF, OF | Loop counters |
| REP MOVS / STOS | Repeated move/store using RCX as counter | None | memcpy/memset equivalents; common in shellcode |
- In x64 Windows, function arguments go RCX → RDX → R8 → R9 → stack; recognising this pattern immediately tells you what each argument to an API call is without reading documentation
- TEST eax, eax followed by JZ/JNZ is the most common null/zero check pattern; it appears after most function calls that return a handle or status code
- An XOR loop with a single-byte immediate value is usually a decryption routine; the immediate value is the key, so write a Python script to decrypt the data before spending time on further analysis
- Writing to EAX automatically zero-extends to RAX in x64; this is why you frequently see MOV eax, ... in 64-bit code even when the value is used in a 64-bit context
- The function prologue pattern (SUB RSP, N or register saves followed by local variable setup) immediately identifies function entry points when auto-analysis misses them
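The x64 Windows argument order noted above can be turned into a small lookup helper. This is a minimal sketch that assumes integer or pointer arguments only (floating-point arguments use XMM0 to XMM3 in the same positions); the function name `locate_args` is made up for illustration.

```python
X64_WINDOWS_ARG_REGS = ["RCX", "RDX", "R8", "R9"]


def locate_args(param_names):
    """Map parameter names to where the Microsoft x64 convention places them."""
    locations = {}
    for index, name in enumerate(param_names):
        if index < 4:
            locations[name] = X64_WINDOWS_ARG_REGS[index]
        else:
            # At the CALL instruction the 5th argument sits at [RSP+0x20],
            # just above the 32-byte shadow space reserved for the callee.
            locations[name] = f"[RSP+0x{0x20 + 8 * (index - 4):X}]"
    return locations


# Parameter names taken from the documented VirtualAllocEx prototype.
params = ["hProcess", "lpAddress", "dwSize", "flAllocationType", "flProtect"]
for name, where in locate_args(params).items():
    print(f"{where:12} {name}")
```

With this mapping in hand, the value loaded into R9 just before a call to VirtualAllocEx is the allocation type, and the stack slot at [RSP+0x20] is the page protection.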
Ghidra from Zero
Project setup and auto-analysis, CodeBrowser navigation, disassembly vs decompiler view, renaming and retyping, cross-references, and the iterative RE workflow
Ghidra is the NSA-developed, open-source reverse engineering framework that has become the primary free tool for malware analysts. It combines disassembly, decompilation, a scripting engine, and a collaborative analysis platform in a single application. This chapter covers the complete workflow from opening a binary for the first time to arriving at a readable, annotated decompilation of its key functions.
Project Setup and Initial Analysis
Reading the Decompiler View
The Iterative RE Workflow — Renaming and Retyping
Raw Ghidra output is unreadable. FUN_00401000, param_1, uVar2 — these names carry no information. The core RE workflow is iterative annotation: identify what a function or variable is, rename it to something meaningful, and watch the decompilation of its callers become clearer as a result.
After Annotation — The Readable Function
- The Defined Strings window is the fastest entry point — find the C2 URL or a persistence path string, cross-reference it, and you land directly in the function that matters most
- Rename aggressively and early — every renamed function and variable improves the readability of its callers; the analysis compounds as you work outward from known functions
- Retyping variables from undefined8 to the correct Windows type dramatically improves decompiler output: LPWSTR vs undefined8* tells Ghidra how to display the data
- Cross-references (X in Listing, right-click → References in Decompiler) show you every call site; use them to build the call graph from interesting functions outward to their callers
- The iterative workflow (identify a known API call, rename the function that calls it based on what it does, follow cross-references to that function's callers) is the core RE loop that eventually covers the whole binary
IDA Pro and x64dbg
Where IDA and Ghidra differ, graph view, FLIRT signatures for library recognition, x64dbg for dynamic debugging — breakpoints, stepping, memory inspection, runtime patching
IDA Pro is the industry standard disassembler in professional malware analysis and vulnerability research. Ghidra is the free alternative that has closed most of the gap. This chapter covers the IDA-specific capabilities that remain useful alongside Ghidra, and introduces x64dbg — the dynamic debugger that lets you execute malware instruction by instruction and inspect its state at any point.
Where IDA Differs from Ghidra
| Capability | IDA Pro | Ghidra |
|---|---|---|
| Graph view | Excellent — the original; highly readable CFG with colour-coded edges | Good — available but less mature than IDA's |
| FLIRT signatures | Extensive library — identifies statically-linked code from hundreds of compilers/libraries | FunctionID (limited); third-party FLIRT imports possible |
| Decompiler | Hex-Rays (paid add-on) — generally produces cleaner output, especially for x64 | Built-in free decompiler — excellent for a free tool; occasionally produces odd output |
| Scripting | IDAPython — mature, large ecosystem of community scripts | Java and Python — equally capable; growing ecosystem |
| Cost | $3,000–$15,000+ per seat | Free and open source |
| Collaborative analysis | IDA Teams (paid) | Built-in shared project support |
IDA Graph View — Reading Control Flow
x64dbg — Dynamic Debugging
x64dbg executes the malware inside a controlled environment and lets you pause execution at any point, inspect registers and memory, modify values, and step instruction by instruction. This is essential when static analysis can't determine what a function does — just run it and watch.
Useful x64dbg Keyboard Reference
Combining Static and Dynamic Analysis
The most effective RE workflow alternates between Ghidra and x64dbg. Ghidra gives you the big picture — all functions, all strings, the call graph. x64dbg fills in what static analysis can't determine — the runtime value of computed expressions, the result of API calls, the contents of dynamically allocated buffers. When a Ghidra function is confusing, set a breakpoint in x64dbg at its entry and step through it while watching the registers.
- IDA's FLIRT signatures identify statically-linked library code — functions matched by FLIRT are not malware logic and can be skipped; focus analysis on unmatched functions
- Graph view makes control flow immediately visible — the diamond shape is if/else, the back-edge is a loop; these patterns are identifiable in seconds without reading every instruction
- Hardware breakpoints are preferable to software breakpoints when analysing samples that check for debuggers: they use CPU debug registers and do not insert INT3 bytes that code-scanning or checksum checks can detect (IsDebuggerPresent, by contrast, only reads the PEB BeingDebugged flag)
- Runtime patching in x64dbg (modifying a register value or NOP-ing a conditional jump) is the fastest way to bypass anti-analysis checks during dynamic analysis; changes are in-memory and don't affect the file on disk
- The static-dynamic loop: analyse in Ghidra to understand structure → set x64dbg breakpoints at interesting functions → observe runtime behaviour → update Ghidra annotations with observed values → repeat
Unpacking in Depth
The OEP concept, manual unpacking workflow in x64dbg, PE reconstruction with Scylla, unpacking common packers, and automated unpacking frameworks
Book 1 showed how to detect packing and unpack UPX with a single command. This chapter covers what to do when that command fails — when the sample uses a custom packer, a renamed UPX stub, or a commercial protector. The technique is universal: let the packer do its job inside the debugger, identify the moment it hands control to the original code, and dump the process memory at that moment to recover the unpacked binary.
A packed binary has two entry points. The packer entry point is where execution begins — it runs the decompression or decryption stub. The Original Entry Point (OEP) is where the real malware code begins, after the packer has finished its work. The goal of manual unpacking is to pause execution exactly at the OEP, at which point the unpacked binary exists in memory and can be dumped.
Finding the OEP — Three Techniques
Dumping and Fixing the Unpacked Binary
Verifying the Dump — Before and After
After Scylla produces the fixed dump, a quick hex inspection confirms the PE header is intact and the IAT has been rebuilt correctly. The MZ/PE signatures should be present at their expected offsets and the import directory should point to valid RVAs.
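The signature part of that check is easy to script. The sketch below, with the hypothetical helper name `check_pe_signatures`, verifies only the MZ signature, the e_lfanew field at offset 0x3C, and the PE signature it points to; validating the import directory RVAs is not covered here.

```python
import struct


def check_pe_signatures(data):
    """Return a list of problems found in the DOS/PE headers (empty if none)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return ["missing MZ signature at offset 0"]
    problems = []
    # e_lfanew: 4-byte little-endian file offset of the PE header, at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 4 > len(data):
        problems.append(f"e_lfanew 0x{e_lfanew:X} points past end of file")
    elif data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        problems.append(f"no PE signature at e_lfanew 0x{e_lfanew:X}")
    return problems


# Synthetic header built for illustration; a real check would read the dump file.
header = bytearray(0x100)
header[:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"
print(check_pe_signatures(bytes(header)) or "headers look intact")
```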
Automated Unpacking
For volume analysis, manual unpacking of every packed sample is impractical. Several automated approaches exist for common packers:
- unpac.me — cloud-based automated unpacking service; supports hundreds of packer families; returns the unpacked binary with the detected packer family name
- Qiling Framework — Python-based binary emulation; scripts can emulate the packer stub to OEP without needing a real Windows machine or debugger
- CAPE Sandbox — Cuckoo fork specifically designed for malware unpacking; automatically captures the unpacked payload from memory and saves it as a separate artefact
- Dynamic analysis approach — even without unpacking, dynamic analysis in a monitored VM captures the real behaviour; pair with a memory dump at execution time for Volatility analysis on the unpacked in-memory image
- The ESP trick (a hardware access breakpoint on the stack location where the packer stub saved its registers, i.e. the ESP/RSP value right after the initial register saves) works reliably for most single-layer packers; it fires when the packer restores the saved stack context just before transferring control to the OEP
- The tail jump pattern (a series of register pops followed by JMP EAX or PUSH addr; RET) is the most common OEP transfer mechanism; recognising it visually saves time over technique-based approaches
- Scylla's IAT reconstruction is essential after dumping: the dump alone has an invalid import table; Fix Dump produces the correctly rebased and import-patched binary that Ghidra can analyse
- If IAT autosearch fails, examine the OEP region manually for the GetProcAddress calls the unpacked code uses to self-resolve its imports, then point Scylla's IAT start address manually
- For known packers at scale, unpac.me is faster than manual unpacking — reserve manual unpacking for custom packers and novel protectors that automated services can't handle
Deobfuscation Techniques
XOR decryption loops, base64 variants, control flow obfuscation — opaque predicates and dispatcher patterns, and Ghidra scripting for automated deobfuscation
Obfuscation is the set of techniques used to make code harder to understand without changing what it does. Packing hides the code entirely; obfuscation makes the visible code confusing. The two are often combined: a packed binary that, once unpacked, contains heavily obfuscated code. This chapter covers the most common obfuscation techniques encountered in malware and the analytical approaches to defeat each.
XOR Decryption — Identifying and Scripting
XOR is the most common obfuscation technique in malware. A key byte (or sequence of bytes) is XOR'd with each byte of the obfuscated data. In Ghidra's disassembly, a XOR decryption routine looks like a loop that reads one byte, XORs it with a constant, and writes it back. The constant is the key.
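Once the key constant has been read from the loop, the decryption itself is a one-liner. The sketch below is a minimal example; the key 0x42 and the sample string are made up, and the ciphertext is produced by the same function because XOR is its own inverse.

```python
def xor_decrypt(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


# Hypothetical values: in practice `encrypted` is copied out of the binary
# and `key` is the immediate operand of the XOR instruction in the loop.
key = 0x42
encrypted = xor_decrypt(b"http://example.test/gate", key)
print(encrypted.hex())
print(xor_decrypt(encrypted, key))
```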
Multi-byte XOR and Rolling Key
Control Flow Obfuscation
Control flow obfuscation makes the logical structure of code difficult to follow without changing its behaviour. Three patterns appear frequently in protected malware:
Ghidra Scripting for Automated Deobfuscation
- XOR decryption loops are identifiable by their structure — a loop that reads a byte, XORs with a constant, and writes it back; the constant is the key; extract the data and key and decrypt with a three-line Python script
- Rolling keys advance the key value with each byte — mirror the key-advancement arithmetic exactly in your decryption script; even a tiny difference (wrong modulus, wrong increment) produces garbage output
- Opaque predicates are always-true or always-false branches that exist only to confuse analysis tools — in x64dbg, run to the branch and observe whether it is always taken or always not-taken, then treat the dead branch as dead code
- The dispatcher / flattened control flow pattern is the most time-consuming obfuscation to defeat statically — use x64dbg to record the actual execution sequence by observing the state variable, then annotate Ghidra with the real block order
- Ghidra's Python scripting API gives full access to the binary's bytes — automate decryption directly in Ghidra rather than extracting data to a separate script, so decrypted strings appear in the analysis context
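The rolling-key and multi-byte cases from the list above can be sketched in a few lines. The additive key schedule below (key plus a fixed step, modulo 256) is only one assumed possibility; a real sample's advancement arithmetic has to be copied from its disassembly exactly.

```python
def rolling_xor_decrypt(data, key, step, modulus=256):
    """Decrypt data where the key advances by `step` after every byte."""
    out = bytearray()
    for b in data:
        out.append(b ^ key)
        key = (key + step) % modulus  # must mirror the sample's arithmetic
    return bytes(out)


def repeating_xor_decrypt(data, key_bytes):
    """Decrypt data XOR'd with a repeating multi-byte key."""
    return bytes(b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data))


# Hypothetical round trip: both routines are their own inverse.
secret = b"cmd.exe /c whoami"
blob = rolling_xor_decrypt(secret, key=0x10, step=3)
print(rolling_xor_decrypt(blob, key=0x10, step=3))
print(repeating_xor_decrypt(repeating_xor_decrypt(secret, b"K3y"), b"K3y"))
```

Changing the step or modulus by one in the second call is a quick way to see how a small mismatch turns the output into garbage.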
API Hashing and Import Reconstruction
How API hashing works, common hash algorithms (ROR13, djb2), writing a hash resolver script, Scylla import reconstruction, and automated tools for known hashing schemes
API hashing is the technique used when even dynamic import resolution via GetProcAddress leaves too many strings in the binary. Instead of looking up "CreateRemoteThread" by name, the malware computes a hash of every export name in kernel32.dll and compares it to a hardcoded hash value. No API name string ever appears in the binary. From a static analysis perspective, the import table is empty and no strings reveal capability. This chapter covers how to defeat it.
How API Hashing Works
Computing ROR13 Hashes — Building a Lookup Table
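A minimal sketch of a ROR13 lookup table follows, assuming the plain form of the algorithm: rotate the 32-bit accumulator right by 13, then add the next byte of the ASCII export name. Real implementations vary (some also hash the terminating null byte or mix in a hash of the DLL name), so confirm the details against the sample's hashing loop. The three-name export list is a stand-in for a full dump of system DLL exports.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Plain ROR13: rotate the accumulator right 13 bits, then add each byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def build_lookup(export_names):
    """Map hash -> API name so constants found in the binary can be resolved."""
    return {ror13_hash(name): name for name in export_names}


# Hypothetical short list; a real table covers every export of the DLLs in scope.
table = build_lookup(["LoadLibraryA", "GetProcAddress", "VirtualAlloc"])
for h, name in sorted(table.items()):
    print(f"0x{h:08X}  {name}")
```

Resolving a constant from the binary is then a dictionary lookup, with a miss suggesting either a different hash variant or an API outside the table.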
Finding Hash Values in the Binary
- API hashing leaves no API name strings in the binary — the only evidence is the hash constants and the PEB walk routine; identifying the PEB walk is the entry point to finding which APIs the malware uses
- ROR13 is the most common API hashing scheme (used by Metasploit, Cobalt Strike shellcode, and many custom implants) — a precomputed lookup table of all Windows API names resolves all hashes in seconds
- Once hashes are resolved, annotate Ghidra immediately with the correct API names — the downstream decompilation of every function that calls the resolved function table becomes readable
- Different hashing schemes (djb2, FNV, custom) require identifying the hash function from its disassembly; the accumulate step in the inner loop is the giveaway (rotate-and-add for ROR13, multiply-by-33-and-add for djb2, XOR-and-multiply for FNV); adjust your Python implementation to match
- Community tools like HashDB (OALabs) maintain databases of known API hashes for common malware families — check HashDB before writing a custom resolver
Anti-Debugging Techniques
PEB flag detection, API-based checks, timing attacks, exception-based detection, heap flags, and defeating each technique with ScyllaHide and runtime patching
Anti-debugging is the set of techniques malware uses to detect that it is running under a debugger and alter its behaviour accordingly — typically by exiting, sleeping, or executing decoy code. Every technique described in this chapter has a countermeasure. The analyst's goal is not to eliminate anti-debugging from the sample but to neutralise it efficiently so analysis can proceed.
PEB-Based Detection
The Process Environment Block (PEB) contains two fields that Windows sets to indicate a debugger is attached. These are the most commonly used and easiest to detect checks.
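To make the two fields concrete, the sketch below inspects a raw PEB snapshot (for example, bytes copied from a debugger's memory view). It assumes the two fields are BeingDebugged and NtGlobalFlag at their commonly cited offsets: 0x02 for BeingDebugged, and 0x68 (32-bit) or 0xBC (64-bit) for NtGlobalFlag. The helper name is hypothetical.

```python
import struct

PEB_BEING_DEBUGGED = 0x02
PEB_NT_GLOBAL_FLAG = {32: 0x68, 64: 0xBC}
# FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK | FLG_HEAP_VALIDATE_PARAMETERS
DEBUG_HEAP_FLAGS = 0x70


def debugger_indicators(peb, bits=64):
    """Report which fields in a raw PEB snapshot indicate a debugger."""
    findings = []
    if peb[PEB_BEING_DEBUGGED] != 0:
        findings.append("BeingDebugged")
    (nt_global_flag,) = struct.unpack_from("<I", peb, PEB_NT_GLOBAL_FLAG[bits])
    if (nt_global_flag & DEBUG_HEAP_FLAGS) == DEBUG_HEAP_FLAGS:
        findings.append("NtGlobalFlag")
    return findings


# Synthetic 64-bit PEB snapshot with both indicators set, for illustration.
snapshot = bytearray(0x100)
snapshot[PEB_BEING_DEBUGGED] = 1
struct.pack_into("<I", snapshot, PEB_NT_GLOBAL_FLAG[64], 0x70)
print(debugger_indicators(bytes(snapshot)))
```

Zeroing the same two locations in the debugger's memory view is the manual equivalent of what an anti-anti-debug plugin does for these checks.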
API-Based Detection
Timing-Based Detection
ScyllaHide — Automated Anti-Debug Bypass
ScyllaHide is an x64dbg plugin that automatically handles the most common anti-debugging techniques. It intercepts relevant API calls, patches PEB fields, and emulates timing functions to present a clean environment to the malware. For most commodity malware, enabling ScyllaHide before analysis makes anti-debug bypassing essentially automatic.
- PEB.BeingDebugged and NtGlobalFlag are the most common checks — ScyllaHide patches both automatically; for manual bypass, patch the conditional jump after the check or zero the result register before the branch
- NtQueryInformationProcess with ProcessDebugPort (class 7) is harder to defeat than IsDebuggerPresent because it queries the kernel rather than reading a user-writable PEB field; ScyllaHide hooks the call in user mode and rewrites the returned value
- RDTSC timing checks fire because execution is much slower under a debugger — ScyllaHide's RDTSC emulation eliminates this; manual bypass requires modifying the delta register to a value below the threshold
- Hardware breakpoints don't insert INT3 bytes and are therefore invisible to byte-scanning anti-debug checks — prefer hardware BPs over software BPs when analysing protected samples
- Exception-based detection works by generating an exception and checking whether a debugger caught it before the program's exception handler — ScyllaHide's exception handling hooks neutralise this class of check
Anti-VM and Anti-Sandbox Techniques
CPUID hypervisor detection, VM artefact enumeration, timing differentials, user interaction and locale checks, and defeating each at the analysis environment level
Anti-VM and anti-sandbox techniques are structurally similar to anti-debugging: the malware checks for indicators of a controlled analysis environment and changes behaviour if it detects one. The key difference is scale — anti-debugging is defeated once per debugging session; anti-VM must be defeated at the analysis environment configuration level, before samples are even run, because reconfiguring a VM per-sample is impractical at volume.
CPUID-Based Hypervisor Detection
VM Artefact Enumeration
User Interaction and Environment Checks
Reversing the Anti-VM Code Itself
When a sophisticated sample evades your environment despite these countermeasures, the solution is to reverse engineer the anti-VM routine itself and understand exactly what it checks. Set breakpoints before the check, single-step through it, observe what API it calls or what registry key it reads, and patch the comparison. The analytical approach always wins — every check is a comparison that can be patched.
- Configuring the analysis VM baseline correctly eliminates most anti-VM checks before they become a per-sample problem — realistic hostname, username, MAC address, disk size, screen resolution, and populated document history should all be set before taking the clean baseline snapshot
- CPUID leaf 0x40000000 returns the hypervisor vendor string — VMware and VirtualBox both have characteristic strings that malware compares against; this string can be overridden in VMware .vmx configuration files
- The CPUID hypervisor present bit (ECX bit 31 from leaf 1) is the fastest VM detection check and the hardest to defeat without hardware virtualisation configuration changes — some analysts use bare-metal machines for samples that heavily gate on this
- Locale-based exclusions (CIS country codes) are common in crimeware — an analysis VM with en-US locale is transparent to this check without any modification
- Every anti-VM check reduces to a comparison that can be patched in x64dbg — if the environment-level countermeasures don't work, reverse the check and patch the branch; analysis always proceeds
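The two CPUID checks in the list above reduce to a bit test and a string compare. The sketch below decodes register values as they would appear in a debugger after the CPUID instruction; it does not execute CPUID itself. The vendor table is an assumed, non-exhaustive set of well-known strings.

```python
import struct

# Assumed subset of well-known hypervisor vendor strings.
KNOWN_HYPERVISORS = {
    "VMwareVMware": "VMware",
    "VBoxVBoxVBox": "VirtualBox",
    "KVMKVMKVM": "KVM",
    "Microsoft Hv": "Hyper-V",
}


def hypervisor_present(leaf1_ecx):
    """CPUID leaf 1: ECX bit 31 is the hypervisor-present bit."""
    return bool((leaf1_ecx >> 31) & 1)


def hypervisor_vendor(ebx, ecx, edx):
    """CPUID leaf 0x40000000: the vendor string is the bytes of EBX, ECX, EDX."""
    raw = struct.pack("<III", ebx, ecx, edx)
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


# Register values built from a known string for illustration.
ebx, ecx, edx = struct.unpack("<III", b"VMwareVMware")
vendor = hypervisor_vendor(ebx, ecx, edx)
print(vendor, "->", KNOWN_HYPERVISORS.get(vendor, "unknown"))
```

Seeing a 12-byte compare against one of these strings right after a CPUID instruction is a strong hint that the surrounding function is an anti-VM check.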
Shellcode Analysis
Position-independent code, the GetPC technique, PEB walk to find kernel32, running shellcode safely with scdbg and Speakeasy, and analysing shellcode in Ghidra
Shellcode is position-independent executable code — a raw byte sequence that can run at any memory address without the Windows loader's assistance. It contains no import table, no PE header, no relocation table. It finds everything it needs at runtime: its own location in memory, the base address of kernel32, the addresses of the API functions it requires. Understanding how it does this is what makes shellcode analysis tractable.
A PE file delegates import resolution to the Windows loader. Shellcode has no loader — it must bootstrap itself. The techniques covered in this chapter — GetPC and the PEB walk — are the universal shellcode bootstrap mechanism. Every piece of shellcode you encounter will use some variant of these, making them the first patterns to look for in any shellcode analysis.
The GetPC Technique
Position-independent code cannot use hardcoded addresses, because it doesn't know where it will be loaded in memory. To reference its own data (embedded strings, encrypted payloads, function tables), shellcode first needs to determine its own current address. The classic technique is the CALL/POP pattern.
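In its simplest form the CALL/POP pattern is a CALL to the very next instruction (E8 00 00 00 00) followed by a POP into a register. The sketch below scans a raw blob for that exact form only; other GetPC variants exist and will not match. The function name is hypothetical.

```python
X86_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def find_call_pop(code):
    """Find CALL $+5 (E8 00 00 00 00) immediately followed by POP r32 (58-5F)."""
    hits = []
    start = 0
    while (pos := code.find(b"\xE8\x00\x00\x00\x00", start)) != -1:
        nxt = pos + 5
        if nxt < len(code) and 0x58 <= code[nxt] <= 0x5F:
            # After the POP, the register holds the address of the POP itself.
            hits.append((pos, X86_REGS[code[nxt] - 0x58]))
        start = pos + 1
    return hits


# Hypothetical blob: nop, call $+5, pop ebx, nop.
blob = bytes.fromhex("90e8000000005b90")
for offset, reg in find_call_pop(blob):
    print(f"GetPC at offset 0x{offset:X}, address lands in {reg}")
```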
The PEB Walk — Finding kernel32 from Scratch
Shellcode in Raw Bytes — What It Looks Like on Disk
Shellcode arrives as a raw blob — no headers, no section table, just executable bytes starting at offset zero. Recognising the CALL/POP GetPC pattern and the PEB walk from the raw hex is the first step before loading it into Ghidra or scdbg.
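A quick way to spot the start of a PEB walk in a raw x86 blob is to look for reads of fs:[0x30]. The sketch below matches two common encodings only, which is an assumption about what the sample uses; 64-bit shellcode reads gs:[0x60] instead and is not covered.

```python
def find_peb_access(code):
    """Flag x86 instructions that read fs:[0x30], the PEB pointer in the TEB."""
    hits = []
    for i in range(len(code) - 3):
        if code[i] != 0x64:  # FS segment override prefix
            continue
        # mov eax, fs:[0x30]  ->  64 A1 30 00 00 00
        if code[i + 1:i + 6] == b"\xA1\x30\x00\x00\x00":
            hits.append(i)
        # mov r32, fs:[r32+0x30]  ->  64 8B /r with mod=01, no SIB, disp8=0x30
        elif (code[i + 1] == 0x8B
              and (code[i + 2] >> 6) == 1
              and (code[i + 2] & 7) != 4
              and code[i + 3] == 0x30):
            hits.append(i)
    return hits


# Hypothetical blob: xor edx, edx; mov edx, fs:[edx+0x30].
blob = bytes.fromhex("31d2648b5230")
print([hex(offset) for offset in find_peb_access(blob)])
```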
Running Shellcode Safely for Analysis
Analysing Shellcode in Ghidra
- The CALL/POP GetPC pattern is the first thing to look for in any shellcode — it appears in the first dozen bytes and establishes the base address that all subsequent data references are relative to
- The PEB walk is universal in x86 shellcode: fs:[0x30] → PEB → LDR → InMemoryOrderModuleList → third entry = kernel32; recognise this 6-instruction sequence on sight
- scdbg provides fast behavioural analysis of shellcode without executing it on real hardware; it emulates API calls and reports all network connections, file operations, and execution transfers in seconds
- Ghidra can analyse shellcode as a raw binary — the key difference from PE analysis is manually defining the entry point at offset 0 and manually annotating API calls resolved via the PEB walk
- After identifying the PEB walk and API hashing scheme, apply the Chapter 6 hash resolver script to annotate all API call sites — the shellcode becomes fully readable in Ghidra's decompiler
Rootkit and Kernel Malware Analysis
User-mode vs kernel-mode rootkits, DKOM process and file hiding, driver loading and DriverEntry, WinDbg kernel debugging setup, and Volatility kernel analysis plugins
Rootkits operate at or below the level of the operating system's own visibility mechanisms. A user-mode rootkit hides from user-mode tools by hooking the API functions they call to list processes and files. A kernel-mode rootkit modifies the kernel data structures themselves. Understanding both requires a mental model of how Windows manages processes and handles system calls.
User-Mode vs Kernel-Mode Rootkits
| Type | Mechanism | Persistence | Detection Approach |
|---|---|---|---|
| User-mode hook | Modifies function pointers in user-space DLLs (ntdll.dll, kernel32.dll) to intercept API calls | Until process restart; injects into system processes for longevity | Compare in-memory function bytes to on-disk DLL bytes — hooks show as modified bytes |
| DKOM | Directly modifies kernel objects (process list, file records); removes entries rather than hiding them from queries | Until reboot; survives even kernel queries if the object is removed from all lists | Cross-view analysis: compare results from multiple enumeration methods; Volatility pslist vs psscan discrepancy |
| Kernel driver | Loads a malicious .sys driver that runs in Ring 0 with full kernel privileges | Via a driver service registry key whose ImagePath points to the .sys file | Enumerate loaded drivers; check driver signing; Volatility modules + driverscan |
| SSDT hook | Overwrites entries in the System Service Descriptor Table to redirect syscalls to malicious handlers | In memory; requires kernel driver to write SSDT | Volatility ssdt plugin — compares SSDT entries to expected ntoskrnl values |
DKOM — Direct Kernel Object Manipulation
Kernel Driver Analysis in Ghidra
- DKOM hides processes by unlinking their EPROCESS entry from ActiveProcessLinks; they become invisible to pslist but stay visible to psscan (pool tag scanning), which is why you should always run both
- DriverEntry is the kernel driver entry point — the IRP dispatch table assignments it makes reveal the driver's capability: a driver that sets up an IOCTL handler communicates with user-land; one that modifies SSDT entries hooks system calls
- SSDT hooks redirect syscalls to the rootkit's handlers — Volatility's ssdt plugin identifies hooks by finding SSDT entries pointing outside ntoskrnl.exe's address range
- A kernel driver requires either a valid code signing certificate (enforced on 64-bit Windows since Vista) or a technique to bypass driver signature enforcement; unsigned drivers are an immediate red flag in memory forensics
- Cross-view analysis — comparing results from two enumeration methods — is the foundational detection technique for rootkits; any discrepancy between two enumerations of the same resource indicates hiding behaviour
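Cross-view analysis is, at its core, a set difference. The sketch below compares two PID lists as they might be extracted from pslist and psscan output; the PIDs are invented for illustration, and parsing the actual plugin output is left out.

```python
def hidden_pids(pslist_pids, psscan_pids):
    """PIDs found by pool scanning but missing from the linked-list walk."""
    return sorted(set(psscan_pids) - set(pslist_pids))


# Hypothetical PIDs: 6660 appears only in the pool scan.
from_pslist = [4, 412, 588, 1032, 2240]
from_psscan = [4, 412, 588, 1032, 2240, 6660]
print(hidden_pids(from_pslist, from_psscan))
```

A PID that shows up only in the pool scan is not automatically malicious, since scanning can also recover processes that have already exited, so each hit still needs a look at its exit time and parent.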
Ransomware Reverse Engineering
Key generation and the asymmetric wrapping scheme, reversing the encryption loop to identify the algorithm, file enumeration logic, weak RNG exploitation for key recovery, and what makes ransomware decryptable
Book 1 showed what ransomware looks like from the outside — the ProcMon events, the shadow copy deletion, the ransom note write. This chapter reverses a ransomware sample from the inside — reading the key generation code, identifying the encryption algorithm from its implementation, and understanding what separates recoverable ransomware (weak RNG, escrow key visible in memory, protocol vulnerability) from truly unrecoverable encryption.
The Standard Ransomware Cryptographic Architecture
Professional ransomware uses a hybrid encryption scheme: fast symmetric encryption (AES) for file content, and asymmetric encryption (RSA or elliptic curve) to protect the symmetric keys. Recovering files requires the attacker's private key, which never touches the victim's machine.
Identifying the Encryption Algorithm from Code
When a sample uses the Windows CryptoAPI, the algorithm is identified by the ALG_ID constant passed to functions such as CryptCreateHash, CryptGenKey, or CryptDeriveKey, or stored in the header of the key blob given to CryptImportKey. When it implements encryption from scratch, you identify the algorithm by its mathematical constants.
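Searching for such constants is straightforward to script. The sketch below looks for two of them in a byte blob: the first 16 bytes of the AES S-box and the "expand 32-byte k" string used by the Salsa20/ChaCha20 family. The table is an assumed starting point, and implementations that compute the S-box at runtime or obfuscate their constants will not match.

```python
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")

CRYPTO_CONSTANTS = {
    "AES S-box (first 16 bytes)": AES_SBOX_PREFIX,
    "ChaCha20/Salsa20 sigma": b"expand 32-byte k",
}


def find_crypto_constants(data):
    """Return sorted (offset, label) pairs for the first match of each constant."""
    hits = []
    for label, pattern in CRYPTO_CONSTANTS.items():
        offset = data.find(pattern)
        if offset != -1:
            hits.append((offset, label))
    return sorted(hits)


# Synthetic blob for illustration; in practice read the unpacked sample's bytes.
blob = b"\x00" * 8 + AES_SBOX_PREFIX + b"xx" + b"expand 32-byte k"
for offset, label in find_crypto_constants(blob):
    print(f"0x{offset:X}  {label}")
```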
What Makes Ransomware Decryptable — The Weak RNG Case
- The standard ransomware architecture uses AES for file encryption and RSA to protect the AES key — recovery requires the attacker's RSA private key unless a vulnerability exists in the key generation
- Ransomware using rand() or time-based seeds instead of CryptGenRandom is potentially recoverable; the seed space is brute-forceable if the infection timestamp is known
- Identifying the encryption algorithm by its mathematical constants (AES S-Box, ChaCha20 "expand 32-byte k") is faster than tracing the full key schedule; search for known constant byte sequences in Ghidra
- If the C2 connection fails during key exchange, many ransomware families write the encrypted session key to the ransom note or a local file — recovery may be possible if you intercept the C2 traffic or find this file before it is deleted
- The encryption loop structure (file enumeration → per-file AES key → encrypt file → rename with extension) is consistent across most ransomware families; identifying it in Ghidra proceeds rapidly once you recognise the patterns from Chapter 1's loop recognition
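The weak-RNG case can be sketched as a seed search. The example assumes a sample that seeds the classic MSVC rand() with the current Unix time and builds a 16-byte key from the low byte of successive outputs; that key derivation is hypothetical, and a real sample's derivation must be read from its code. The `is_correct_key` callback stands in for whatever check is available, such as decrypting a file with a known header.

```python
def msvc_rand_stream(seed):
    """Yield the classic MSVC rand() sequence for a given srand() seed."""
    state = seed & 0xFFFFFFFF
    while True:
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        yield (state >> 16) & 0x7FFF


def derive_key(seed, length=16):
    """Hypothetical key derivation: low byte of successive rand() outputs."""
    stream = msvc_rand_stream(seed)
    return bytes(next(stream) & 0xFF for _ in range(length))


def brute_force_seed(is_correct_key, approx_time, window=3600):
    """Try every time-based seed within +/- window seconds of approx_time."""
    for seed in range(approx_time - window, approx_time + window + 1):
        key = derive_key(seed)
        if is_correct_key(key):
            return seed, key
    return None


# Illustration only: pretend the infection happened at this Unix timestamp.
true_key = derive_key(1700000123)
print(brute_force_seed(lambda k: k == true_key, approx_time=1700000000, window=600))
```

A one-hour window is only 7,201 candidate seeds, which is why a known infection time makes this kind of scheme recoverable.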
C2 Protocol Reverse Engineering and Detection
Finding and reversing the communication function, the command parsing loop, response encoding, emulating the protocol in Python, writing Suricata rules, and the complete RE workflow end-to-end
Every implant communicates with its operator via a C2 protocol. Understanding that protocol at the code level — not just observing its traffic from the outside — is what makes it possible to write detection rules that survive C2 infrastructure rotation, build decoders that can parse encrypted command streams, and understand the full command set the operator has available. This final chapter walks the complete reverse engineering of a C2 communication function.
Finding the C2 Communication Function
Reversing the Command Parsing Loop
Extracting the Session Key and Emulating the Protocol
Documenting the Protocol Structure
Writing a Suricata Rule from the Reversed Protocol
The Complete RE Workflow — End to End
This chapter has demonstrated every stage of the full reverse engineering pipeline. The steps below bring it together as a summary of what Book 2 has covered:
- Unpack (Ch. 4) — identify the packer, find the OEP with the ESP trick, dump with Scylla
- Deobfuscate (Ch. 5) — identify and script the XOR decryption routine; defeat control flow obfuscation
- Resolve imports (Ch. 6) — identify API hashing scheme, build lookup table, annotate Ghidra call sites
- Bypass anti-analysis (Ch. 7–8) — configure ScyllaHide, fix the VM baseline, patch remaining checks
- Disassemble and annotate (Ch. 2–3) — iterative renaming from known API calls outward; IDA graph for complex functions; x64dbg to fill in dynamic values
- Characterise the payload (Ch. 9–11) — shellcode bootstrap if applicable; identify cryptographic constants; understand the key architecture
- Reverse the protocol (this chapter) — find communication function, parse the format, extract keys, emulate in Python
- Produce detection artefacts — YARA rule from static indicators, Sigma rule from behavioural indicators, Suricata rule from protocol structure
- Report — ATT&CK technique map, capability summary, IOC list, detection coverage assessment
- Working backward from network API imports (HttpSendRequestA, InternetOpenA) to their call sites in Ghidra leads directly to the C2 communication function; when the imports are visible, this is usually the fastest path to the most important function
- The command dispatch switch statement reveals the implant's full command set: every case is a capability; naming each handler function turns the switch into a complete capability inventory
- Hardcoded AES session keys in the binary can be extracted and used to decrypt traffic captures; even without the C2 server, captured traffic between the implant and server becomes readable
- Protocol-level Suricata rules detect the C2 regardless of infrastructure rotation — a magic byte match at a fixed offset in the TCP payload is a stronger long-term indicator than a domain or IP that will be rotated in days
- Emulating the C2 protocol in Python confirms your understanding is correct — if your emulated packet generates a valid server response, the protocol reverse engineering is accurate; this is the verification step that transforms analysis into confirmed knowledge
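A protocol emulator usually starts as a pair of build and parse functions. Everything about the wire format below is invented for illustration (the magic value, the field layout, and the command names), so it shows only the shape such code tends to take once the real structure has been recovered from the sample.

```python
import struct

# Entirely hypothetical wire format:
#   magic (4 bytes) | command id (uint16 LE) | payload length (uint32 LE) | payload
MAGIC = b"\xDE\xAD\xC0\xDE"
HEADER = struct.Struct("<4sHI")
COMMANDS = {1: "shell_exec", 2: "file_upload", 3: "screenshot"}  # made-up names


def build_packet(command_id, payload):
    """Serialise a command the way the hypothetical implant expects it."""
    return HEADER.pack(MAGIC, command_id, len(payload)) + payload


def parse_packet(packet):
    """Validate the magic and length, then return (command name, payload)."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    magic, command_id, length = HEADER.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("bad magic")
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return COMMANDS.get(command_id, f"unknown_{command_id}"), payload


print(parse_packet(build_packet(1, b"whoami")))
```

The fixed magic at offset 0 in this sketch is the kind of field a protocol-level network signature would key on.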
Malware Analysis Book 1: Foundations & Behavioural Analysis
Book 2 assumed familiarity with triage, static and dynamic analysis, memory forensics with Volatility, YARA rule writing, IOC extraction, and ATT&CK mapping. If any of those areas felt thin during this volume, Book 1 covers all of them in depth — no assembly required.