Malware Analysis
Foundations & Behavioural
Everything a SOC analyst, incident responder, or detection engineer needs to triage samples, extract IOCs, perform static and dynamic analysis, hunt in memory with Volatility, write YARA rules, and map findings to ATT&CK — without requiring assembly or reverse engineering skills.
Malware analysis is one of the broadest technical disciplines in security — spanning file format internals, behavioural monitoring, memory forensics, assembly language, reverse engineering, anti-analysis techniques, and family-specific deep dives. Covering the subject with the depth it deserves in a single volume would require sacrificing either breadth or rigour. This series is therefore split into two books, each with a distinct audience and purpose.
This is Book 1. It covers the practitioner skills — triage, static analysis, dynamic behavioural analysis, memory forensics, YARA, detection engineering, and IOC extraction. Book 2 covers the specialist skills — x86/x64 assembly, Ghidra, IDA Pro, unpacking, deobfuscation, anti-analysis, and full reverse engineering of ransomware, RATs, and rootkits.
What Malware Is
Taxonomy, the infection chain, the malware ecosystem, how samples reach analysts, and the mental model that drives analysis decisions
Every analysis decision — what to look for, which tool to use next, how to prioritise findings — flows from understanding what kind of malware you are looking at. A dropper behaves differently from an implant. A ransomware payload behaves differently from a stealer. The taxonomy is not academic classification for its own sake; it is the map that guides the analysis.
Malware (malicious software) is any software intentionally designed to disrupt, damage, or gain unauthorised access to a computer system. The key word is intentional — a buggy program that crashes your system is not malware. A program designed to encrypt your files and demand payment is.
The Malware Taxonomy
| Type | Primary Behaviour | Key Indicators |
|---|---|---|
| Virus | Infects and modifies existing legitimate files to propagate | Modified file timestamps, appended code sections, checksum mismatches |
| Worm | Self-replicates across networks without user interaction | High outbound connection volume, scanning behaviour, dropped copies in network shares |
| Trojan | Masquerades as legitimate software while delivering malicious functionality | Legitimate-looking icon/metadata, suspicious imports, unexpected network connections |
| Dropper | Delivers and installs another malware payload; may not persist itself | Short-lived execution, file writes to temp directories, spawns child processes |
| Loader | Downloads and executes a next-stage payload from a remote source | Network requests to C2, memory allocation + execute pattern, no persistent file write |
| Stager | Minimal first-stage designed only to download a full implant | Very small size, few imports, single network request, shellcode execution pattern |
| Implant / RAT | Persistent remote access — keylogging, screenshot, file access, shell | Persistence mechanisms, C2 beaconing, broad capability imports |
| Ransomware | Encrypts victim files and demands payment for decryption key | Mass file operations, CryptEncrypt imports, vssadmin delete shadows, ransom note writes |
| Stealer | Exfiltrates credentials, cookies, browser data, crypto wallets | Browser database access, DPAPI calls, clipboard monitoring, data exfiltration network traffic |
| Rootkit | Hides presence of malware from OS and analysis tools | Hook patterns in kernel/userland, DKOM artefacts, cross-view discrepancies |
| Bootkit | Persists in the boot process — survives OS reinstall | MBR/VBR modifications, UEFI firmware writes, pre-OS network activity |
| Fileless | Executes entirely in memory — no persistent file on disk | PowerShell/WMI/registry execution, injected memory regions, no file artefacts |
| Wiper | Destroys data permanently — no recovery mechanism | Mass file deletion/overwrite, MBR destruction, no ransom demand |
The Infection Chain
Real-world malware rarely operates as a single binary. Modern attacks use a staged delivery chain where each stage is purpose-built, small, and limited in what it does. Understanding the chain tells you where in it the sample you're analysing sits — and what to look for next.
The Malware Ecosystem
Understanding who writes malware and why shapes how you approach analysis. A sample from a sophisticated APT group will have anti-analysis features, custom encryption, and living-off-the-land techniques that a commodity crimeware tool won't bother with. A ransomware-as-a-service affiliate using a purchased builder produces something structurally different from a custom implant developed over months by a nation-state team.
- Crimeware / eCrime — financially motivated. Commodity tooling, RaaS (Ransomware-as-a-Service), MaaS (Malware-as-a-Service). Broad targeting, high volume, moderate sophistication. Examples: LockBit, Emotet, RedLine Stealer.
- APT (Advanced Persistent Threat) — nation-state or state-sponsored. Custom tooling, targeted victims, high sophistication, long-term persistence. Examples: Cobalt Strike customised variants, SUNBURST (SolarWinds), WannaCry (attributed to Lazarus Group).
- Hacktivism — ideologically motivated. Often simpler tooling, DDoS-heavy, opportunistic targeting. Wipers sometimes used for destruction rather than financial gain.
- Commodity vs Custom — commodity malware is purchased or rented; custom malware is purpose-built. Commodity samples have existing detection signatures and threat intel; custom samples require full analysis.
How Samples Reach Analysts
- Incident response — extracted from a compromised endpoint during an active engagement. Highest context (you know the victim, the timeline, the infection vector).
- VirusTotal / MalwareBazaar — community-submitted samples. Good for finding commodity malware; limited context about original deployment.
- Threat intelligence feeds — commercial (Recorded Future, CrowdStrike Intelligence) and open (MISP communities, abuse.ch). Often include context about campaign and attribution.
- Sandboxes — any.run, Hybrid Analysis, Joe Sandbox — sometimes expose unpacked or deobfuscated variants that don't appear in other feeds.
- Honeypots — passive capture of samples actively scanning or exploiting internet-facing systems. Useful for seeing what is actively weaponised in the wild.
- Malware taxonomy is not classification for its own sake — dropper vs loader vs implant tells you what to look for, what stage of the chain you're in, and what the next stage will be
- Modern attacks are staged — a single binary is rarely the whole story; identifying where a sample sits in the chain determines what to analyse next
- ATT&CK technique IDs appear throughout this module because they are the common language between analysis findings and detection engineering
- Commodity vs APT tooling determines how much anti-analysis effort to expect — a RaaS affiliate uses a purchased builder; a nation-state team builds custom implants with months of development time
Building the Analysis Environment
VM isolation and snapshot workflow, FlareVM and REMnux setup, network isolation options, safe file handling, INetSim, and anti-analysis awareness at setup time
Malware analysis on a production machine is not a calculated risk — it is an error. The consequences range from personal data theft to network-wide compromise. The analysis environment must be isolated, reversible, and configured to surface behaviour rather than hide it. Getting this right before touching a single sample is what makes everything else in this module safe to do.
Why VMs and Why Snapshots
A virtual machine provides three essential properties for malware analysis. First, isolation — the guest OS is separated from the host OS and from the network. Second, reversibility — a snapshot taken before analysis can be restored in seconds, returning the environment to a clean state regardless of what the malware did. Third, observability — VMs can be paused, their memory dumped, and their network traffic captured at the hypervisor layer without the guest OS knowing.
FlareVM — The Windows Analysis Toolkit
FlareVM (FLARE = FireEye Labs Advanced Reverse Engineering, the team now part of Mandiant) is an automated PowerShell installer that turns a clean Windows installation into a fully equipped malware analysis workstation. It installs and configures tools across every analysis category.
Network Isolation Options
Network configuration is the most consequential setup decision. Different analysis scenarios require different network postures:
| Mode | Configuration | What Malware Sees | Use When |
|---|---|---|---|
| No network | No network adapter or disabled | No connectivity — network-dependent malware may terminate or behave differently | Samples that should behave without C2 (ransomware file encryption stage) |
| Host-only | Internal VM network only; no internet routing | Can reach host and other VMs on the same host-only network; no internet | Default for most analysis — captures traffic without exposing real internet |
| INetSim | REMnux VM on host-only network, running INetSim or FakeNet-NG | All DNS resolves to the REMnux IP; HTTP/HTTPS/SMTP/FTP all served fake responses | When malware must receive network responses to proceed (C2 check-in, download trigger) |
| Isolated VLAN | Physical network segment with no route to production | Full internet simulation or real internet via isolated egress | Heavily VM-aware samples that require physical hardware or real internet responses |
Safe File Handling
Several conventions prevent accidental execution of samples outside the analysis environment:
- Password-protected archives: the near-universal convention is the password "infected". Samples shared in the community are almost always archived with this password. Windows will not auto-execute a file inside a password-protected ZIP.
- The .mal extension: renaming sample.exe to sample.exe.mal prevents accidental double-click execution on Windows. Rename it back inside the analysis VM before executing.
- Disable autorun: in the analysis VM, disable Windows AutoPlay/AutoRun for all drive types. A USB drive or mounted ISO with a malicious autorun.inf file should not execute automatically.
- Shared folders with caution — the shared folder between host and guest VM is a potential escape path. Keep it read-only for the guest, transfer samples in one direction only (host to guest), and disable it when not actively transferring files.
Anti-Analysis at Setup Time
Many malware families detect that they are running in a virtual machine and alter their behaviour — running benign code, sleeping for extended periods, or terminating. Before taking the clean baseline snapshot, configure the VM to be less fingerprint-able:
- Set a realistic VM hostname (not "DESKTOP-FLAREVM" or "SANDBOX") — common sandbox hostnames are blocklisted by malware
- Create a realistic user account name and populate recent documents/browser history to simulate a real user environment
- Change the screen resolution to a realistic size (1920×1080) — some malware checks for very small sandbox resolutions
- Ensure the VM has at least 2 CPU cores and 4 GB of RAM, since low resource counts are a common sandbox detection check
- Install VMware Tools or VirtualBox Guest Additions but be aware that their presence can be detected — some analysts deliberately remove or rename these components
- The snapshot cycle — restore clean baseline, analyse, revert — is the fundamental safety mechanism; it must be followed for every sample without exception
- FlareVM automates the installation of the complete Windows analysis toolkit; take the clean baseline snapshot after FlareVM completes and before any sample touches the VM
- INetSim on a REMnux VM provides fake internet services that satisfy C2 check-ins and trigger continued malware execution — essential for analysing loaders that require a network response to proceed
- Anti-analysis at setup time — realistic hostname, username, screen resolution, and resources — reduces the rate at which samples detect the environment and alter their behaviour
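The fingerprinting points in this chapter can be turned into a quick pre-snapshot check. The sketch below is only an illustration: the function name, the hostname set (taken from the two examples above), and the numeric thresholds are assumptions, and real samples check many more artefacts than these.

```python
# Hostnames taken from the examples in this chapter; real blocklists are longer.
SUSPICIOUS_HOSTNAMES = {"DESKTOP-FLAREVM", "SANDBOX"}


def fingerprint_warnings(hostname, cpu_cores, ram_gb, screen_width, screen_height):
    """Return a list of reasons the VM may look like a sandbox.

    All thresholds are assumed values for illustration only.
    """
    warnings = []
    if hostname.upper() in SUSPICIOUS_HOSTNAMES:
        warnings.append("hostname looks like a sandbox")
    if cpu_cores < 2:
        warnings.append("fewer than 2 CPU cores")
    if ram_gb < 4:
        warnings.append("less than 4 GB RAM")
    if screen_width < 1024 or screen_height < 768:
        warnings.append("unusually small screen resolution")
    return warnings
```

An empty list means none of these particular checks fired; it does not mean the VM is undetectable.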
First Triage: Identifying What You Have
File type identification, hashing and VirusTotal, sandbox submission and critical reading of reports, MITRE ATT&CK mapping, and the triage decision tree
The first sixty seconds with a new sample determines how you spend the next sixty minutes. Triage is not shallow analysis — it is structured, efficient analysis aimed at answering one question as quickly as possible: what kind of problem is this, and what analysis approach does it require? Commodity malware with existing detection may need only IOC extraction; a novel sample needs full behavioural and static analysis.
File Type Identification
Never trust the file extension. Malware is routinely distributed with misleading extensions — a PDF icon and a .pdf.exe filename, a .doc file that is actually a PE binary, a .jpg that is shellcode. File type is determined by the magic bytes at the start of the file, not the extension.
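A minimal sketch of magic-byte identification follows. The signature table is a small illustrative subset chosen for this example, and the function names are hypothetical; dedicated tools recognise far more formats.

```python
# A few well-known magic-byte prefixes; illustrative subset only.
MAGIC_SIGNATURES = [
    (b"MZ", "PE/DOS executable"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also DOCX/XLSX/JAR)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy DOC/XLS)"),
    (b"\x7fELF", "ELF executable"),
    (b"\xff\xd8\xff", "JPEG image"),
]


def identify_by_magic(header: bytes) -> str:
    """Classify a file from its leading bytes, ignoring any extension."""
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first bytes of a file; the sample is never executed."""
    with open(path, "rb") as fh:
        return identify_by_magic(fh.read(16))
```

A file named invoice.pdf.exe and a file named report.doc would both come back as "PE/DOS executable" if their first two bytes are MZ.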
Hashing and VirusTotal Workflow
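As a starting point, the cryptographic hashes used for lookups can be computed with the Python standard library. This is a sketch with hypothetical function names; it only computes the hashes and makes no call to VirusTotal or any other service.

```python
import hashlib


def hash_bytes(data: bytes) -> dict:
    """Return the three hashes most commonly quoted in reports."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def hash_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hash a file in chunks so large samples are not loaded into memory at once."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Searching a public service by SHA-256 rather than uploading the file avoids disclosing a sample that may be unique to the victim.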
Reading Sandbox Reports Critically
Automated sandbox reports are extremely useful for rapid triage — but they require critical reading. The sandbox can only report what it observed during its analysis window (typically 60–120 seconds), which may not represent the malware's full behaviour. Several factors cause sandboxes to produce incomplete reports:
- Sleep bombs — the malware calls
Sleep(600000)(10 minutes) before doing anything. The sandbox times out. The report shows nothing. The malware is not benign. - VM detection — the malware detects it is running in a sandbox and executes benign code only. The report shows nothing suspicious. The malware is not benign.
- User interaction requirements — the malware checks for mouse movement, window focus, or specific user actions before proceeding. The sandbox environment does not simulate these. The report is incomplete.
- Network dependency — the malware requires a response from its C2 before proceeding. If the sandbox cannot simulate the response, the execution chain stops early.
The Triage Decision Tree
- File type is determined by magic bytes, not extension. Always identify file type before analysis; a file with a PDF icon whose first two bytes are MZ (4D 5A) is a PE binary
- A VirusTotal score of 0/72 does not mean the file is clean. It means no engine currently flags it, often because signatures do not yet exist; this is when full analysis is most needed
- ssdeep fuzzy hashing finds structurally similar samples: a similarity score of 90 against a known Emotet variant is a strong triage signal even at a different SHA-256
- Sandbox reports require critical reading — sleep bombs, VM detection, and user interaction requirements all produce incomplete reports; absence of sandbox findings is not evidence of safety
- The triage decision (commodity / variant / novel / packed) determines the entire subsequent analysis approach — spend time getting this right before committing to a full analysis pipeline
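The four triage outcomes named above can be sketched as a small decision function. Everything specific here is an assumption made for illustration: the function name, the input signals, the order of the checks, and the thresholds (apart from the 7.2 entropy figure, which this book uses as its packing indicator).

```python
def triage(vt_detections: int, best_fuzzy_score: int, max_section_entropy: float) -> str:
    """Return one of: packed, commodity, variant, novel.

    Thresholds are illustrative assumptions, not calibrated values.
    """
    # A packed sample hides its real content, so handle that case first.
    if max_section_entropy > 7.2:
        return "packed"
    # Many engines already detect it: likely known commodity malware.
    if vt_detections >= 10:
        return "commodity"
    # Few detections, but structurally close to a known sample.
    if best_fuzzy_score >= 80:
        return "variant"
    return "novel"
```

In practice the decision also draws on context the function cannot see, such as where the sample came from and what the sandbox did or did not observe.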
The PE File Format
DOS header, PE signature, COFF and Optional headers, section table, Import Address Table, resources, and reading PE structure with pestudio and CFF Explorer
Almost every piece of Windows malware is a PE (Portable Executable) file — an EXE, DLL, SYS, or one of several other variants. Every static analysis technique in the following chapters operates on the PE structure. Understanding what each field means and where it lives in the file is what makes pestudio's output intelligible rather than a list of numbers.
The Portable Executable format is the binary container that Windows uses for executable code. It encodes everything the Windows loader needs to map the binary into memory and execute it: where the code is, which DLLs it needs, where to find the entry point, what permissions each section requires, and where the resources live.
PE Structure Overview
The Section Table — What Each Section Tells You
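To make the section table concrete, the sketch below walks the PE headers with the standard struct module and reports, per section, the name, sizes, and whether the section is simultaneously readable, writable, and executable. The function name and the returned dictionary layout are assumptions for this example; the offsets and flag values follow the published PE format.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000
RWX = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE


def parse_sections(data: bytes) -> list:
    """Return basic facts about each section of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, symbol fields,
    # SizeOfOptionalHeader, Characteristics (20 bytes in total).
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes.
        name, vsize, _, raw_size, _, _, _, _, _, flags = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i
        )
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": vsize,
            "raw_size": raw_size,
            "rwx": flags & RWX == RWX,
        })
    return sections
```

A section with a tiny raw size, a much larger virtual size, and the rwx flag set is the classic shape of a packer stub's destination section.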
The Import Address Table — Capability from DLL Imports
The Import Address Table (IAT) lists every external function the binary calls, organised by DLL. This single data structure is often the most informative piece of static analysis — the imports are a capability manifest. Before running a single instruction, you can make an educated assessment of what the malware can do based purely on what it imports.
| API Function | DLL | Implied Capability | Signal |
|---|---|---|---|
| CreateRemoteThread | kernel32.dll | Process injection — creates a thread in another process | 🔴 High |
| VirtualAllocEx | kernel32.dll | Allocates memory in a remote process — injection prerequisite | 🔴 High |
| WriteProcessMemory | kernel32.dll | Writes data into another process's memory — injection | 🔴 High |
| CryptEncrypt / CryptGenKey | advapi32.dll | Cryptographic operations — ransomware, data obfuscation | 🔴 High |
| IsDebuggerPresent | kernel32.dll | Anti-debugging detection | 🔴 High |
| GetProcAddress / LoadLibraryA | kernel32.dll | Dynamic API resolution — hides true imports from static analysis | 🔴 High |
| RegSetValueEx / RegCreateKey | advapi32.dll | Registry writes — persistence, configuration storage | 🟡 Medium |
| CreateService / OpenSCManager | advapi32.dll | Service installation — persistence as Windows service | 🟡 Medium |
| WNetAddConnection | mpr.dll | Network share connection — lateral movement, worm propagation | 🟡 Medium |
| URLDownloadToFile | urlmon.dll | Downloads a file from URL — dropper/loader behaviour | 🟡 Medium |
| ShellExecute / CreateProcess | shell32/kernel32 | Executes another process — payload launch, privilege escalation | 🟡 Medium |
| GetClipboardData | user32.dll | Clipboard monitoring — stealer behaviour | 🟡 Medium |
| InternetOpen / InternetConnect | wininet.dll | HTTP/internet communication — C2, download | 🟢 Context |
| FindFirstFile / FindNextFile | kernel32.dll | File system enumeration — ransomware file traversal, stealer | 🟢 Context |
A packed or obfuscated binary may show very few imports — sometimes only GetProcAddress and LoadLibraryA. This is intentional: the packer stub uses these two functions to resolve everything else dynamically at runtime, hiding the real capability from static import analysis. When you see this pattern, import table analysis is limited — defer to dynamic analysis where the resolved APIs will be visible in process monitoring output.
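The import-to-capability reasoning above can be sketched as a lookup. The mapping is a subset of the table in this chapter, and the function name and result keys are hypothetical; the input is assumed to be a list of imported function names already extracted by a PE tool.

```python
# Subset of the API-to-capability table above; extend as needed.
CAPABILITY_BY_API = {
    "CreateRemoteThread": "process injection",
    "VirtualAllocEx": "process injection",
    "WriteProcessMemory": "process injection",
    "CryptEncrypt": "cryptography",
    "IsDebuggerPresent": "anti-debugging",
    "RegSetValueEx": "registry persistence",
    "CreateService": "service persistence",
    "URLDownloadToFile": "download",
    "GetClipboardData": "clipboard access",
}
INJECTION_TRIAD = {"CreateRemoteThread", "VirtualAllocEx", "WriteProcessMemory"}
RESOLVERS = {"GetProcAddress", "LoadLibraryA"}


def assess_imports(imports) -> dict:
    """Summarise what a list of imported function names implies."""
    names = set(imports)
    capabilities = sorted({CAPABILITY_BY_API[n] for n in names if n in CAPABILITY_BY_API})
    return {
        "capabilities": capabilities,
        # All three injection APIs present in the same binary.
        "injection_triad": INJECTION_TRIAD <= names,
        # Nothing imported except the two resolver functions.
        "likely_dynamic_resolution": bool(names) and names <= RESOLVERS,
    }
```

Real import tables usually carry ANSI or Unicode suffixes (for example RegSetValueExA), so a practical version would normalise names before the lookup.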
- The PE section table reveals packing (UPX section names, high entropy, RWX permissions, VirtualSize significantly larger than RawSize) before a single byte of code is executed
- The Import Address Table is a capability manifest — CreateRemoteThread + VirtualAllocEx + WriteProcessMemory in the same binary is a near-certain injection indicator
- A minimal IAT containing only GetProcAddress and LoadLibraryA signals dynamic import resolution — the packer is hiding the real capability; defer capability assessment to dynamic analysis
- The compile timestamp in the PE header can indicate when malware was built — but it is trivially faked; treat it as a soft indicator, not ground truth
- RWX section permissions are unusual and suspicious — legitimate software rarely needs sections that are simultaneously readable, writable, and executable
Strings and Import Analysis
strings vs FLOSS, what strings reveal, stack strings and obfuscated strings, import capability mapping, and correlating indicators into a capability hypothesis
Strings extraction is the highest-value-per-effort technique in static malware analysis. A single run of the right tool against an unpacked sample can surface C2 domains, file paths, registry keys, mutex names, PDB paths, encryption keys, and error messages that define what the malware does before a single instruction executes. The challenge is that modern malware increasingly obfuscates its strings — and the standard strings tool misses the obfuscated ones.
strings vs FLOSS
The standard strings utility extracts sequences of printable ASCII or Unicode characters of a minimum length. This finds plaintext strings embedded directly in the binary. It misses strings that are built on the stack character by character (stack strings), strings that are XOR-encoded, strings that are base64-encoded, and strings that are concatenated in small pieces. FLOSS (FLARE Obfuscated String Solver) addresses all of these.
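What the standard strings utility does can be approximated in a few lines, which also makes its blind spots obvious. This is a simplified sketch with a hypothetical function name: it finds runs of printable ASCII and of UTF-16LE-encoded ASCII, and nothing else.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs, then UTF-16LE runs, of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide strings: each printable byte is followed by a zero byte.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

Anything that is not stored as a contiguous run of printable bytes, such as a stack string or an XOR-encoded domain, never matches either pattern.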
What Strings Reveal — The Intelligence Categories
| String Type | Examples | Intelligence Value |
|---|---|---|
| C2 Infrastructure | update-cdn.net/api/v2/check, 185.220.101.45 | Network IOCs — block at firewall, hunt in DNS/proxy logs |
| File system artefacts | C:\ProgramData\svchost32.exe, %TEMP%\~tmp.bat | File IOCs — hunt on endpoints, forensic artefact locations |
| Registry keys | HKCU\Software\Configs\uuid, HKLM\SYSTEM\CurrentControlSet\Services\SvcName | Persistence mechanism, configuration storage location |
| Mutex names | Global\MicrosoftUpdate_v2, LocalXPMutex | Unique sample/family identifier: hunt for it on live systems; the malware creates it to avoid re-infecting the same host |
| PDB paths | C:\Users\developer\Documents\project\Release\payload.pdb | Development artefact — username, directory structure, project name of author |
| Error messages | Failed to connect: %s, Injection error code: %d | Reveals internal operation names and functionality |
| Commands | vssadmin delete shadows /all /quiet, net stop "Windows Defender" | Direct ATT&CK technique identification |
| Encoded data | Long base64 strings, hex-encoded payloads | May contain embedded next-stage payload — extract and analyse separately |
Reading Obfuscated Strings — What FLOSS Reconstructs
Stack strings appear as isolated single characters in the raw binary. They are invisible to strings, but FLOSS executes the relevant code fragments in an emulator and captures the fully formed string after the stack-building code completes. In the raw bytes, a stack-built API name shows up as a run of MOV instructions that each load one character, whereas the same string stored in plaintext is a single contiguous run of printable bytes.
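A rough illustration of why stack strings evade plain string extraction, and how they can still be spotted, is given below. It is not how FLOSS works (FLOSS emulates code); it is a naive byte-pattern match for one specific 32-bit x86 encoding, C6 45 disp8 imm8, which is mov byte ptr [ebp+disp8], imm8. The function name is hypothetical, and the sketch assumes characters are written in string order.

```python
import re

# Four or more consecutive `mov byte ptr [ebp+disp8], imm8` instructions
# whose immediate byte is a printable ASCII character.
_MOV_BYTE_RUN = re.compile(rb"(?:\xc6\x45.[\x20-\x7e]){4,}", re.DOTALL)


def recover_stack_strings(code: bytes) -> list:
    """Recover strings built one byte at a time on the stack (naive heuristic)."""
    results = []
    for run in _MOV_BYTE_RUN.finditer(code):
        blob = run.group()
        # Each instruction is 4 bytes; the character is the last byte of each.
        results.append(blob[3::4].decode("ascii"))
    return results
```

Compilers and obfuscators emit many other encodings (different registers, DWORD-sized writes, out-of-order stores), which is why emulation is the more robust approach.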
Building the Capability Hypothesis
The output of strings analysis and import analysis should be combined into a capability hypothesis before moving to dynamic analysis. This hypothesis directs what to watch for in ProcMon and Wireshark — if your static analysis suggests persistence via Run key and C2 via HTTP, you know exactly which ProcMon filters and Wireshark display filters to configure.
- FLOSS finds obfuscated strings that strings misses (stack strings, XOR-encoded strings, and strings decoded in tight loops); always run both
- Mutex names are among the most distinctive per-sample indicators. They can be used to detect presence of the malware on live systems and to cluster related samples in threat intel
- PDB paths are development artefacts that frequently survive into released malware — they can reveal developer usernames, directory structures, and sometimes project names
- Combine strings and import findings into a capability hypothesis before running dynamic analysis — this directs tool configuration and makes dynamic analysis significantly more efficient
- Long base64 strings in static analysis may be embedded next-stage payloads — decode them and treat the decoded output as a separate sample for analysis
Entropy, Packing, and Obfuscation Detection
Shannon entropy, what high-entropy sections reveal, packer detection with Detect-It-Easy, manual UPX unpacking, dynamic API resolution and API hashing
Most malware distributed in the wild is packed or obfuscated. Packing serves two purposes: it reduces file size (the original purpose, now irrelevant) and it defeats signature-based detection and static analysis (the current purpose). Understanding what packing looks like — in entropy measurements, in section characteristics, in the import table — tells you when your static analysis is working on a stub rather than the real code, and what to do about it.
Shannon Entropy
Shannon entropy measures the information density (randomness) of data on a scale from 0 to 8 bits per byte. Plaintext English text has entropy around 3.5–4.5. Compiled code has entropy around 5.5–6.5. Compressed or encrypted data has entropy approaching 7.5–8.0 — because compression and encryption both produce output that is nearly indistinguishable from random bytes.
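For reference, the quantity being measured is H = -Σ p(b) · log2 p(b), summed over the byte values b that occur in the data, where p(b) is the fraction of bytes equal to b. A minimal sketch of that calculation (the function name is an assumption) is:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )
```

Computed per section rather than over the whole file, this is the figure that entropy columns in PE tools report.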
Packer Detection with Detect-It-Easy
Manual UPX Unpacking
UPX is by far the most common packer encountered in commodity malware. It is trivial to unpack — the UPX tool itself can unpack a UPX-packed binary in a single command. Custom and commercial packers require the dynamic unpacking techniques covered in Book 2.
Dynamic API Resolution — Hiding Imports
Even without packing, malware can hide its true capabilities by resolving API functions at runtime rather than declaring them in the import table. Two common techniques: using GetProcAddress to look up functions by name at runtime, and API hashing — looking up functions by a computed hash of their name so even the string "CreateRemoteThread" doesn't appear in the binary.
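To show what API hashing looks like from the analyst's side, the sketch below implements one widely seen scheme, a rotate-right-by-13 additive hash over the function name, and builds a reverse lookup table from a list of candidate export names. Families differ in the exact algorithm (rotation amount, case folding, whether the DLL name is mixed in), so treat this as an illustration of the idea rather than a universal decoder; the function names are hypothetical.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Rotate-right-13 additive hash over the ASCII bytes of an API name."""
    h = 0
    for ch in name.encode("ascii"):
        h = ror32(h, 13)
        h = (h + ch) & 0xFFFFFFFF
    return h


def build_lookup(export_names) -> dict:
    """Map hash value to API name so constants found in a binary can be resolved."""
    return {ror13_hash(name): name for name in export_names}
```

With a lookup built from the exports of kernel32.dll and similar libraries, a 32-bit constant seen in the binary can be matched back to the API it stands for, provided the sample uses this exact scheme.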
- Entropy above 7.2 on any section strongly suggests packing or encryption — static analysis on such a section analyses the wrapper, not the actual malware code
- Detect-It-Easy identifies the specific packer or compiler, which determines the unpacking approach: UPX is trivially unpacked with upx -d; commercial protectors and custom packers require dynamic techniques
- After UPX unpacking, the real import table becomes visible, often revealing injection, cryptographic, and network capabilities that were hidden by the two-function stub
- Dynamic API resolution via GetProcAddress still leaves the API name string in the binary (findable by FLOSS); API hashing leaves nothing readable — deferred to Book 2 reverse engineering
- The presence of GetProcAddress and LoadLibraryA as the only imports is itself an indicator — it means the binary resolves everything else dynamically, and the real import table is hidden
YARA Rule Writing
YARA syntax, writing rules from static indicators, the PE module, entropy and math modules, community rule sets, testing and false positive management
YARA is the universal pattern-matching language for malware analysis. A YARA rule encodes what you know about a malware sample — its strings, byte patterns, structural characteristics — into a portable, shareable, deployable detection artefact. Writing YARA rules is the bridge between analysis and detection: the findings from static analysis become the rules that hunt for the same malware across an entire environment.
A YARA rule has three sections: meta (descriptive information — name, author, date, description), strings (the patterns to search for — text, hex, or regex), and condition (the logical expression that determines a match — must all strings be present? any of them? a specific combination?).
Basic Rule Syntax
The PE Module — Structure-Aware Rules
Testing and False Positive Management
Community Rule Sets
- YARA-Forge — automatically compiled community rules from multiple sources, deduplicated and tested. Best starting point for operational deployment.
- Neo23x0 (Florian Roth) — extensive, actively maintained rule sets covering hundreds of malware families. High quality and widely used.
- Mandiant/FLARE rules — from the same team that wrote FLOSS. Technically rigorous rules focused on advanced threats.
- abuse.ch / YARAhub — community-contributed rules, often tied to current threat intelligence.
- YARA conditions should combine multiple indicators — a rule that fires on a single common string will have high false positives; combining C2 URI + mutex + structural indicator dramatically reduces them
- The PE module enables structure-aware rules — import-based detection that fires on the capability signature rather than string content is harder to evade by recompilation
- Always test rules against clean system files before deployment: a rule that matches notepad.exe will generate thousands of false positives in production
- The uint16(0) == 0x5A4D condition (MZ header check) should be in almost every Windows malware rule: it restricts the rule to PE files and eliminates a large class of false positives
- YARA-Forge and Neo23x0 provide production-quality community rules: extend and adapt them for variants rather than writing from scratch for every known family
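The logic behind two of these takeaways, the MZ header check and the requirement for several indicators at once, can be mimicked in plain Python to show what such a condition evaluates. This is not YARA and does not use the YARA engine; the function name is hypothetical, and the indicator values are the illustrative examples from the strings chapter.

```python
import struct


def rule_matches(data: bytes, indicators: dict, minimum: int = 2) -> bool:
    """Mimic a condition of the form: uint16(0) == 0x5A4D and <minimum> of the strings."""
    # uint16(0) reads a little-endian 16-bit value at offset 0: bytes 4D 5A ("MZ").
    if len(data) < 2 or struct.unpack_from("<H", data, 0)[0] != 0x5A4D:
        return False
    hits = [name for name, pattern in indicators.items() if pattern in data]
    return len(hits) >= minimum


# Illustrative indicators only; not taken from a real family.
EXAMPLE_INDICATORS = {
    "c2_uri": b"update-cdn.net/api/v2/check",
    "mutex": b"MicrosoftUpdate_v2",
    "shadow_delete": b"vssadmin delete shadows",
}
```

A file containing only one of the three strings, or containing all three but lacking the MZ header, does not match, which is the false-positive reduction the takeaways describe.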
Behavioural Analysis: Process and System Monitoring
ProcMon filter workflow, what to look for in the event stream, persistence registry locations, Process Explorer for injection detection, Autoruns, and ATT&CK correlation
Dynamic analysis begins the moment you execute the sample. From that point, three things are happening simultaneously: the malware is doing what it was designed to do, your monitoring tools are recording every action, and the capability hypothesis you built from static analysis is being confirmed or revised. The skill is not in running the tools — it is in filtering the noise and recognising the signal.
Process Monitor — The Event Stream
Process Monitor (ProcMon) captures every file system, registry, process, and network event on the system in real time. A single minute of execution on a Windows machine produces thousands of events — the vast majority from legitimate system processes. The workflow is filter first, analyse second.
Key Persistence Registry Locations
The following registry locations are the most commonly used by malware for persistence. Every one of these should appear in your ProcMon filters and your Autoruns review:
| Registry Path | Scope | ATT&CK | Notes |
|---|---|---|---|
| HKCU\...\CurrentVersion\Run | Current user | T1547.001 | Most common persistence location; executes on user login |
| HKLM\...\CurrentVersion\Run | All users | T1547.001 | Requires admin; persists for all user logins |
| HKLM\SYSTEM\CurrentControlSet\Services\ | System | T1543.003 | Service installation; persists across all users, starts before login |
| HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon | System | T1547.004 | Winlogon helper DLL; executes at logon |
| HKLM\SOFTWARE\Microsoft\Windows NT\...\Image File Execution Options\ | System | T1546.012 | Debugger hijacking; intercepts legitimate process execution |
| HKCU\SOFTWARE\Classes\*\shellex\ContextMenuHandlers\ | Current user | T1546.015 | COM hijacking via context menu handler |
| HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce | All users | T1547.001 | Executes once on next login then deletes itself; often used by droppers |
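When reviewing an exported event log, the table above can be applied mechanically. The sketch below tags a registry path with the ATT&CK technique from the table using case-insensitive substring matching; the function name and the choice of path fragments are assumptions made for this example, and the fragments deliberately avoid the parts of the paths that the table abbreviates.

```python
# Path fragments (lower case) paired with the technique IDs from the table above.
PERSISTENCE_FRAGMENTS = [
    (r"\currentversion\run", "T1547.001"),  # also covers RunOnce
    (r"\currentcontrolset\services", "T1543.003"),
    (r"\currentversion\winlogon", "T1547.004"),
    (r"\image file execution options", "T1546.012"),
    (r"\shellex\contextmenuhandlers", "T1546.015"),
]


def classify_registry_path(path: str):
    """Return the ATT&CK technique ID for a known persistence path, else None."""
    lowered = path.lower()
    for fragment, technique in PERSISTENCE_FRAGMENTS:
        if fragment in lowered:
            return technique
    return None
```

A match only says the path is a known persistence location; whether the write is malicious still depends on which process made it and what value it set.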
Process Explorer — Live Injection Detection
Autoruns — What Persists Across Reboots
Autoruns from Sysinternals enumerates every autostart location on the system, that is, every place where something is configured to run automatically. After executing malware, run Autoruns and look for new entries added since the clean baseline. Entries highlighted in yellow indicate the backing file is missing (the malware may have already deleted its dropper). Entries highlighted in pink indicate that no publisher information was found or, with signature verification enabled, that the backing file is unsigned or its signature is invalid.
- ProcMon filter configuration is the most important setup step — filtering to the malware's process name and likely child processes before execution eliminates noise and makes the event stream readable
- The Run and RunOnce registry keys under both HKCU and HKLM are the most common persistence mechanisms — filter for them in ProcMon and check them first in Autoruns
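- Process Explorer's image path verification immediately surfaces masquerading: a binary named svchost.exe running from anywhere other than the Windows system directories (System32, or SysWOW64 for the 32-bit copy) is almost certainly malicious
- Threads with no backing module in Process Explorer are a strong indicator of injected shellcode. Legitimate threads almost always have a module name associated with their start address, with JIT-compiled code being the main exception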
- Autoruns run after malware execution immediately shows what has persisted — yellow entries (missing file) indicate a dropper that cleaned itself up after installing the implant
Behavioural Analysis: Network Traffic
INetSim and FakeNet-NG setup, C2 communication patterns, protocol identification, DNS analysis, extracting network IOCs, and writing Suricata rules
Network traffic analysis during dynamic analysis captures the malware's communication behaviour — what it connects to, what protocol it uses, what it sends, what it expects to receive. This produces the network IOCs that defenders can deploy immediately: C2 domains and IPs for firewall block lists, HTTP patterns for proxy rules, JA3 hashes for IDS signatures. It also reveals the C2 protocol structure, which becomes the foundation for more targeted detection.
Setting Up Network Simulation
INetSim (Internet Services Simulator) or FakeNet-NG runs on the REMnux VM and simulates internet services — DNS, HTTP, HTTPS, SMTP, FTP, IRC. When the analysis VM's DNS is pointed at REMnux, all DNS queries resolve to the REMnux IP, and all connections to simulated services receive plausible fake responses. This allows loaders that require a C2 response to continue executing past the initial check-in.
Reading the C2 Check-In in Wireshark
Inspecting the Raw Beacon Packet
When a malware beacon uses a custom binary protocol over raw TCP rather than HTTP, Wireshark shows the raw bytes. Understanding the packet structure from a hex view is the first step toward writing a protocol parser or detection rule.
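As a sketch of what a protocol parser looks like once the hex view has been understood, the example below unpacks a beacon with a fixed header. The layout (magic, version, message type, payload length, bot ID) is entirely hypothetical and chosen for illustration; a real family's fields have to be recovered from the capture itself.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, for illustration only (not taken from any real family):
#   4-byte magic | 1-byte version | 1-byte message type | 2-byte payload length | 4-byte bot id
HEADER = struct.Struct("!4sBBHI")  # "!" = network byte order, no padding (12 bytes)


@dataclass
class Beacon:
    magic: bytes
    version: int
    msg_type: int
    bot_id: int
    payload: bytes


def parse_beacon(data: bytes) -> Beacon:
    """Split a raw TCP payload into header fields and body, validating the length field."""
    if len(data) < HEADER.size:
        raise ValueError("packet shorter than header")
    magic, version, msg_type, length, bot_id = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("declared length exceeds captured bytes")
    return Beacon(magic, version, msg_type, bot_id, payload)
```

Fixed fields such as the magic value are the natural candidates for a byte-level detection rule, because they tend to stay constant across beacons.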
Identifying C2 Protocols
| Protocol Pattern | Indicators in Traffic | Family Examples |
|---|---|---|
| HTTP/S C2 | Regular GET/POST to same host, consistent interval, custom headers, encoded body | Emotet, Cobalt Strike default, most commodity RATs |
| DNS C2 | High-frequency queries, long subdomain labels, TXT/NULL record responses, high entropy subdomains | SUNBURST, DNSMessenger, dnscat2 |
| Raw TCP/UDP | Non-standard port, binary protocol (not HTTP), fixed-size packets, encrypted with no TLS handshake | Custom implants, older RATs |
| Legitimate service abuse | Connections to Pastebin, Discord, Telegram, GitHub — benign domains, unusual content/timing | Various commodity malware, living-off-the-land C2 |
| ICMP C2 | ICMP echo with non-zero data payload, large payload, encoded content in ICMP body | Rare but present in APT tooling |
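The "long subdomain labels, high entropy" indicator in the DNS C2 row can be approximated with Shannon entropy. This is a rough sketch; the length and entropy thresholds are illustrative assumptions that would need tuning against real traffic, since CDN and telemetry hostnames can also score high.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character; 0.0 for empty input."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def suspicious_dns_name(qname: str, min_label_len: int = 30, min_entropy: float = 3.5) -> bool:
    """Flag a query whose longest label is both long and high-entropy.

    Both thresholds are assumptions for illustration, not established cut-offs.
    """
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= min_label_len and shannon_entropy(longest) >= min_entropy
```

Running this over every query name in a capture gives a short list to inspect by hand rather than a verdict.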
Writing a Suricata Rule from Traffic Analysis
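One way to turn an observed custom header into a rule is to template it. The sketch below builds a Suricata HTTP-header rule string for the X-Session header discussed in this chapter; the helper name, message text, and SID are assumptions, and the generated rule should be validated against your Suricata version before deployment.

```python
def suricata_header_rule(header_name: str, sid: int, msg: str) -> str:
    """Build an outbound HTTP rule that matches on a header name.

    |3a| is the hex escape for ':' inside a Suricata content match.
    """
    for value in (header_name, msg):
        if '"' in value or ";" in value:
            raise ValueError("value contains characters that would need escaping")
    return (
        "alert http $HOME_NET any -> $EXTERNAL_NET any ("
        f'msg:"{msg}"; flow:established,to_server; '
        f'http.header; content:"{header_name}|3a|"; nocase; '
        f"sid:{sid}; rev:1;)"
    )


if __name__ == "__main__":
    # SID chosen from the range conventionally used for local rules (assumption).
    print(suricata_header_rule("X-Session", 1000001, "Possible C2 beacon with X-Session header"))
```

Matching on the header rather than the destination is what lets the rule survive a change of C2 domain or IP.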
- INetSim/FakeNet-NG is essential for analysing loaders and stagers — without fake internet responses, network-dependent malware terminates early and the full execution chain is never observed
- Custom HTTP headers are among the strongest network IOCs: the X-Session header pattern in this chapter is far more specific than a domain or IP that will change
- DNS queries are a complete picture of C2 infrastructure including fallback domains: capture all DNS queries, not just successful connections
- Suricata rules derived from dynamic analysis target the actual communication protocol — they detect the behaviour regardless of which IP or domain the C2 migrates to
- Connections to legitimate services (Discord, Telegram, GitHub) as C2 require content-level analysis — the destination domain is not suspicious but the usage pattern is
Memory Analysis with Volatility
Why memory analysis matters, acquiring memory, Volatility 3 core plugins, malfind for injection detection, carving executables, and correlating with ProcMon
Memory analysis is the technique that finds what disk analysis misses. Fileless malware executes entirely in memory with no persistent file on disk. Injected code lives in the memory of a legitimate process — it appears nowhere in the file system. Encryption keys, decrypted configuration, and plaintext C2 credentials exist in memory during execution and nowhere else. Volatility provides a framework for interrogating a memory image to surface all of these.
Memory Acquisition in the Analysis VM
Core Volatility 3 Plugins
malfind — The Injection Detection Plugin
The malfind plugin finds memory regions that are: executable, not backed by a file on disk, and contain what appears to be executable code (PE header or shellcode). This is the primary signature of process injection — the injected code lives in allocated memory with no corresponding file.
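Once a flagged region has been dumped to a file, a quick check distinguishes a full PE image from something that needs disassembly. This is a minimal sketch of that check using only the MZ and PE signatures; the function name and return labels are illustrative, and a loader that wipes its headers would land in the "unknown" bucket despite being a PE.

```python
import struct


def classify_region(data: bytes) -> str:
    """Rough triage of a dumped memory region: 'pe', 'mz-only', or 'unknown'."""
    if data[:2] != b"MZ":
        return "unknown"  # may still be shellcode or a header-stripped PE
    if len(data) < 0x40:
        return "mz-only"
    # e_lfanew at offset 0x3C holds the offset of the PE signature
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
        return "pe"
    return "mz-only"
```

A result of "pe" means the dump can be carried straight into the static analysis workflow as a separate sample.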
Finding Encryption Keys and Strings in Memory
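A dumped memory region can be searched for strings without external tools. The sketch below pulls both ASCII and UTF-16LE strings, since Windows processes commonly hold text in both encodings; the minimum length of six is an arbitrary default.

```python
import re


def extract_strings(data: bytes, min_len: int = 6):
    """Return printable ASCII strings, then UTF-16LE strings, of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

Reading the dump with `open(path, "rb").read()` and passing the bytes in is enough for regions of modest size; URLs, mutex names, and configuration keys are the usual items to look for in the output.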
- Fileless malware and process injection leave no file artefacts — memory analysis is the only technique that finds them reliably
- malfind finds injected code by looking for executable, file-unbacked memory regions containing PE headers or shellcode — it is the first plugin to run when injection is suspected
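- PAGE_EXECUTE_READWRITE (RWX) on a private memory region is the canonical injection signature: legitimately loaded DLLs are file-backed image mappings, not private RWX allocations, although JIT engines (.NET, browsers) can produce benign RWX regions that need to be ruled out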
- Volatility's yarascan applies YARA rules to process memory — the same rules written from static analysis can find injected copies in memory without a separate file artefact
- Encryption keys, decrypted C2 configuration, and plaintext strings that are obfuscated on disk often appear in plaintext in process memory during execution — always run strings against the suspicious process memory region
Process Injection and Memory Forensics
Classic DLL injection, reflective DLL injection, process hollowing, thread hijacking, parent process spoofing — what each technique produces in Volatility output
Process injection is the technique of running code within the address space of another, legitimate process. It is used to hide malware behind trusted process names, bypass application whitelisting, inherit the privileges and network access of the host process, and evade process-based detection. Each injection technique leaves a characteristic footprint — understanding that footprint is what lets you identify which technique was used from a memory image alone.
Classic DLL Injection
The most common and well-understood injection technique. The attacker allocates memory in the target process, writes the path to their malicious DLL, and creates a remote thread that calls LoadLibrary with that path. Windows loads the DLL into the target process as if it were legitimate.
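Because a classically injected DLL is loaded from a path on disk, its path is the giveaway. The sketch below filters a list of loaded DLL paths (for example, copied from a module listing) down to those outside a set of trusted directories. The trusted roots are assumptions for illustration and will need extending for a real environment.

```python
import ntpath

# Assumed trusted roots for illustration; extend for your environment.
TRUSTED_ROOTS = (
    r"c:\windows\system32",
    r"c:\windows\syswow64",
    r"c:\windows\winsxs",
    r"c:\program files",
    r"c:\program files (x86)",
)


def is_trusted_path(dll_path: str) -> bool:
    """True if the normalised path sits under one of the trusted roots."""
    norm = ntpath.normpath(dll_path).lower()  # collapses '..' tricks before comparing
    return any(norm == root or norm.startswith(root + "\\") for root in TRUSTED_ROOTS)


def suspicious_dlls(paths):
    """Keep only the DLL paths that fall outside every trusted root."""
    return [p for p in paths if not is_trusted_path(p)]
```

A DLL in a trusted directory is not automatically benign, so this narrows the list for review rather than clearing anything.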
Reflective DLL Injection
Reflective DLL injection improves on classic injection by eliminating the need to write a file to disk. The malicious DLL contains a custom loader (the reflective loader) that maps the DLL into memory from a byte array, resolves its own imports, and calls its entry point — all without calling LoadLibrary. The DLL never appears on disk and doesn't appear in the standard module list.
Process Hollowing
Process hollowing creates a legitimate process in a suspended state, unmaps its original executable image from memory, and maps the malicious image in its place. The process name, PID, and path all appear legitimate — the content is entirely replaced.
- Classic DLL injection appears in dlllist: look for DLLs outside System32/Program Files directories; reflective injection does not appear in dlllist but does appear in malfind
- Process hollowing is detected by the VadS (private) tag at the expected image base address: a file-mapped legitimate process would show Vad (file-mapped) at that address
- When malfind fires, always dump the region and analyse it as a separate sample — the injected PE may be a completely different malware family with its own IOCs
- Parent process spoofing (PPID spoofing) makes malware appear to have been spawned by a legitimate parent: Volatility's pstree shows the parent PID recorded in the process structure, which is the spoofed value, so cross-reference with event logs that may show the real creator process
- The combination of malfind + dlllist + vadinfo gives a complete picture of a process's memory: use all three before concluding whether injection is present and which technique was used
IOC Extraction and Threat Intelligence
IOC taxonomy, extraction workflow, IOC quality and decay, MISP and STIX/TAXII sharing, VirusTotal pivoting, and infrastructure analysis
Analysis findings become intelligence when they are structured, contextualised, and shared. An IP address in a Wireshark capture is a data point. That same IP address formatted as a STIX indicator, linked to a malware family, a campaign, and a threat actor, with a confidence score and a decay date, and shared via TAXII to every SIEM in your sector — that is threat intelligence. This chapter covers the transformation.
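To make the transformation concrete, here is a minimal sketch of an IP address wrapped as a STIX 2.1 indicator object, with the confidence score and decay date mapped to the `confidence` and `valid_until` properties. It builds the JSON by hand with the standard library; the function name, family name, and values are illustrative, and linking to malware, campaign, and actor objects would need separate relationship objects not shown here.

```python
import ipaddress
import json
import uuid
from datetime import datetime, timedelta, timezone


def _ts(moment: datetime) -> str:
    """Format a UTC datetime as a STIX timestamp."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def stix_ipv4_indicator(ip: str, family: str, confidence: int, valid_days: int) -> dict:
    """Wrap an IPv4 C2 address as a STIX 2.1 indicator with confidence and expiry."""
    ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    now = datetime.now(timezone.utc)
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": _ts(now),
        "modified": _ts(now),
        "name": f"{family} C2 address",
        "indicator_types": ["malicious-activity"],
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": _ts(now),
        "valid_until": _ts(now + timedelta(days=valid_days)),  # the decay date
        "confidence": confidence,  # 0-100
    }


if __name__ == "__main__":
    # 198.51.100.7 is a documentation address; "ExampleFamily" is a placeholder.
    print(json.dumps(stix_ipv4_indicator("198.51.100.7", "ExampleFamily", 70, 30), indent=2))
```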
The IOC Taxonomy
| Type | Examples | Strength | Decay Rate |
|---|---|---|---|
| Atomic | IP address, domain, URL, email address, file hash | Easy to deploy; widely supported | Fast — IPs and domains rotate in days to weeks; hashes evaded by recompile |
| Computed | imphash, ssdeep, TLSH, JA3 hash | Survives recompilation; clusters variants | Medium — survives until attacker changes import structure or TLS config |
| Behavioural | Registry key paths, mutex names, file path patterns, C2 URI structure | Hardest to evade — requires significant code change | Slow — behavioural indicators can be valid for months to years |
| Network pattern | HTTP header structure, beacon interval, custom protocol fields, Suricata rules | Protocol-level — survives C2 infrastructure rotation | Slow — valid until attacker changes the communication protocol |
IOC Extraction Workflow
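The atomic part of extraction is the easiest to automate. The sketch below pulls IPv4 addresses, domains, and hashes out of free-text analysis notes, refanging common defanged forms first. The regular expressions are deliberately simple assumptions: the domain pattern in particular will also match file names such as `payload.dll`, so results need review before they are shared.

```python
import re

IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)


def refang(text: str) -> str:
    """Undo the most common defanging conventions."""
    return text.replace("[.]", ".").replace("hxxp", "http")


def extract_iocs(text: str) -> dict:
    """Return de-duplicated atomic IOCs found in the text, grouped by type."""
    text = refang(text)
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "domain": sorted(set(d.lower() for d in DOMAIN_RE.findall(text))),
        "sha256": sorted(set(h.lower() for h in SHA256_RE.findall(text))),
        "md5": sorted(set(h.lower() for h in MD5_RE.findall(text))),
    }
```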
Infrastructure Pivoting
A single C2 domain or IP is a starting point, not an endpoint. Infrastructure analysis — finding related domains, certificates, and IPs operated by the same threat actor — dramatically expands the IOC set and provides early warning of future campaigns.
- Behavioural IOCs (mutex names, registry key paths, URI structure, custom HTTP headers) age better than atomic IOCs (IPs, domains, hashes) — prioritise them for detection rules
- Infrastructure pivoting via passive DNS, certificate transparency, and WHOIS turns one C2 domain into a cluster of related infrastructure — dramatically expanding early warning capability
- imphash clusters samples that share the same import structure even when recompiled: a new binary with the same imphash as a known malware family is a strong lead that it belongs to that family, though packers and common runtimes can give unrelated binaries identical imphashes
- Sharing IOCs via MISP/STIX-TAXII multiplies their value — an indicator that protects one organisation protects every organisation in the sharing community simultaneously
- Registration date close to compile date, privacy-protected WHOIS, and recently issued certificates are soft indicators of malicious infrastructure — no single one is conclusive, but the combination is telling
ATT&CK Mapping and Detection Engineering
Mapping analysis to techniques, building a Navigator layer, writing Sigma rules from behavioural findings, detection coverage assessment, and layered detection design
MITRE ATT&CK is the common language between malware analysis and detection engineering. When an analyst finds that a sample modifies the Run registry key, they can tag that finding as T1547.001. When a detection engineer asks which techniques are covered by current rules, they reference the same taxonomy. ATT&CK makes the conversation between analysts, detection engineers, and defenders possible without ambiguity about what behaviour is being described.
Mapping Sample Findings to ATT&CK
Writing Sigma Rules from Behavioural Findings
Sigma rules translate behavioural observations (what ProcMon saw, what event IDs fired, what Autoruns found) into portable SIEM detection rules. A rule is written once in Sigma's vendor-neutral YAML format and then converted to Splunk SPL, Sentinel KQL, Elastic EQL, or QRadar AQL by the Sigma converter.
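To show how a behavioural finding maps onto a rule's selection logic, the toy evaluator below applies a Sigma-style selection (field names with `contains`, `startswith`, and `endswith` modifiers) to a single event. It is an illustration of the matching semantics only, not the Sigma toolchain, and the event field names assume Sysmon-style registry events.

```python
def field_matches(event_value: str, modifier: str, pattern: str) -> bool:
    """Apply one modifier; string matching is case-insensitive, as in Sigma by default."""
    ev, pat = event_value.lower(), pattern.lower()
    if modifier == "contains":
        return pat in ev
    if modifier == "endswith":
        return ev.endswith(pat)
    if modifier == "startswith":
        return ev.startswith(pat)
    return ev == pat


def selection_matches(selection: dict, event: dict) -> bool:
    """All fields must match (AND); a list of values for one field is an OR."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        values = expected if isinstance(expected, list) else [expected]
        actual = str(event.get(field, ""))
        if not any(field_matches(actual, modifier, str(v)) for v in values):
            return False
    return True


# Hypothetical selection for a Run key write observed during dynamic analysis.
RUN_KEY_WRITE = {
    "EventID": 13,
    "TargetObject|contains": "\\Software\\Microsoft\\Windows\\CurrentVersion\\Run\\",
    "Image|endswith": ["\\AppData\\Roaming\\updater.exe", "\\Temp\\svchost.exe"],
}
```

The selection dictionary mirrors what would sit under `detection:` in the YAML rule, which is why the same finding carries over to any backend the converter supports.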
Detection Coverage Assessment
Once a sample's ATT&CK techniques are mapped, you can assess which of those techniques your current detection stack covers. This answers the question: "If this malware ran in our environment today, which stages would we detect, and which would be blind spots?"
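At its simplest, the assessment is a set comparison between the techniques observed in the sample and the techniques your rules claim to detect. A minimal sketch follows; the technique lists in the usage example are hypothetical.

```python
def coverage_report(sample_techniques, detected_techniques):
    """Compare a sample's ATT&CK techniques with those the detection stack covers."""
    sample = set(sample_techniques)
    covered = sample & set(detected_techniques)
    gaps = sample - covered
    pct = 100.0 * len(covered) / len(sample) if sample else 0.0
    return {
        "covered": sorted(covered),
        "gaps": sorted(gaps),
        "coverage_pct": round(pct, 1),
    }


if __name__ == "__main__":
    # Hypothetical inputs: techniques mapped from one sample vs. techniques with live rules.
    sample = ["T1547.001", "T1055", "T1071.001", "T1546.012"]
    rules = ["T1547.001", "T1071.001", "T1059.001"]
    print(coverage_report(sample, rules))
```

The `gaps` list is the input to the next round of rule writing.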
- ATT&CK mapping is the translation layer between analysis findings and detection engineering — every behavioural observation should be tagged with its technique ID before the analysis report is written
- Sigma rules are portable: write once, then convert to Splunk SPL, Sentinel KQL, or Elastic EQL with a Sigma converter (sigma-cli, built on pySigma, has replaced the legacy sigmac tool); the same behavioural finding generates detection for every SIEM simultaneously
- Detection coverage assessment reveals which phases of an attack chain are visible and which are blind — prioritise filling gaps in execution and persistence techniques before anti-analysis techniques
- A single malware sample's ATT&CK profile can drive an entire sprint of detection engineering work — map first, then systematically build rules for each uncovered technique
- Layered detection is more resilient than single-indicator detection — a malware family that changes its C2 domain should still fire on its Run key write, its injection behaviour, and its network protocol pattern
Sandbox Analysis and Automated Tools
Sandbox architectures, reading reports critically, evasion techniques that defeat sandboxes, improving sandbox results, malware clustering with imphash and ssdeep, automated pipelines
At scale, manual analysis of every sample is impossible. A mid-sized organisation's email gateway may block hundreds of suspicious attachments per day. Sandboxes provide automated dynamic analysis — executing samples and reporting behaviour without an analyst manually operating the tools. Understanding how sandboxes work, where they fail, and how to make them more effective is the skill that makes automated analysis operationally useful rather than a false confidence generator.
Sandbox Architectures
| Type | Mechanism | Examples | Strengths / Weaknesses |
|---|---|---|---|
| Agent-based | Software agent in guest OS monitors API calls, file/registry events | Cuckoo Sandbox, CAPE | Rich data; agent itself is detectable by sophisticated malware |
| Hook-based | Hooks Windows API functions at user or kernel level to intercept calls | Any.run, Joe Sandbox | Very detailed API logging; hooks create detectable timing differences |
| Hypervisor/hardware | Monitoring at the hypervisor level; invisible to guest OS | DRAKVUF, VMRay | Near-undetectable; hardware-level observation; expensive to deploy |
| Emulation | Executes code in a software emulator rather than real hardware | Speakeasy, Qiling | No risk to host; detectable via emulation artefacts; useful for shellcode |
Sandbox Evasion — Why Reports Are Incomplete
Malware Clustering — Finding Variants at Scale
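Grouping by imphash is a dictionary operation once the hashes have been computed by whatever PE tooling you use. The sketch below takes (sha256, imphash) pairs as input, clusters them, and picks one representative per cluster for deep analysis; the input format and function names are assumptions for illustration.

```python
from collections import defaultdict


def cluster_by_imphash(samples):
    """samples: iterable of (sha256, imphash) pairs. Returns clusters, largest first."""
    clusters = defaultdict(list)
    for sha256, imphash in samples:
        if imphash:  # samples without an import table have no imphash
            clusters[imphash].append(sha256)
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def representatives(clusters):
    """Pick the first member of each cluster as the sample to analyse in depth."""
    return {imphash: members[0] for imphash, members in clusters}
```

Fuzzy hashing within each cluster is the natural next step for separating configuration variants from code changes.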
- A clean sandbox report does not mean a sample is benign — sleep bombs, VM detection, user interaction requirements, and process list checks all produce clean reports from malicious samples
- Improving sandbox results requires customising the analysis environment — realistic hostname, MAC address, resource allocation, user simulation, and extended timeouts defeat the most common evasion classes
- imphash clustering groups samples by import structure — a cluster of 187 samples with the same imphash likely came from the same builder and need only one deep analysis
- ssdeep fuzzy hashing within an imphash cluster identifies configuration variants — samples at 95%+ similarity are the same binary with different embedded config; 60–80% similarity suggests same family with code changes
- Automated triage pipelines (hash → VT lookup → sandbox → cluster → YARA) let analysts focus manual time on genuinely novel samples rather than re-analysing known commodity malware
Malware Families, IR Integration, and Career
Applying Book 1 techniques to ransomware, stealers, loaders, and RATs — the analysis-to-containment loop, analyst career path, certifications, and practice resources
The preceding fourteen chapters covered individual techniques in isolation. This final chapter applies them together — showing what each family type looks like when you run the full Book 1 analysis pipeline, and how those findings flow directly into incident response actions. It also closes the loop on the career path: where malware analysis fits in security operations, and how to build the skill through structured practice.
Ransomware Triage — What Book 1 Finds
Ransomware's distinguishing characteristic is mass file system activity — the encryption loop. Even without knowing the cryptographic implementation, Book 1 techniques surface every observable indicator needed for immediate incident response.
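The encryption loop shows up in a ProcMon capture as a burst of rename operations from one process. The sketch below counts `SetRenameInformationFile` events per process in a sliding time window; the input tuple format, threshold, and window size are assumptions to be tuned, since installers and backup tools also rename files in bulk.

```python
from collections import defaultdict, deque


def mass_rename_processes(events, threshold=100, window_seconds=10.0):
    """events: time-ordered iterable of (timestamp_seconds, process_name, operation).

    Returns the set of processes that issued at least `threshold` renames
    within any `window_seconds` span.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, proc, op in events:
        if op != "SetRenameInformationFile":
            continue
        window = recent[proc]
        window.append(ts)
        # Drop renames that have fallen out of the sliding window
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            flagged.add(proc)
    return flagged
```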
Stealer — What Book 1 Finds
The Analysis-to-Containment Loop
The value of malware analysis in an IR context is measured by the speed of the loop from finding to containment action. Analysis findings drive five categories of IR action:
- Network isolation: C2 domains and IPs are blocked at the firewall and proxy immediately; DNS sinkholing covers domains the C2 might rotate to
- Endpoint hunting — YARA scans across all endpoints for the sample's static indicators; Sigma rules deployed to SIEM for behavioural indicators; Event ID hunting for specific actions the malware performs
- Containment — isolate affected endpoints identified by hunting; disable persistence mechanisms found in analysis (Run keys, services, scheduled tasks)
- Recovery — remove malware components at the paths identified in analysis; verify removal with YARA scan; restore from backup where data was destroyed
- Lessons learned — detection gaps identified in the ATT&CK coverage assessment become the next sprint's detection engineering work
Malware Analyst Career Path
| Role | Focus | Key Skills | Certification |
|---|---|---|---|
| Malware Analyst (Tier 1) | Triage, IOC extraction, sandbox analysis, YARA rules | Book 1 content — static, dynamic, Volatility, YARA | BTL1, GCFE |
| Malware Analyst (Tier 2) | Reverse engineering, unpacking, anti-analysis, family deep dives | Book 1 + Book 2: assembly, Ghidra, IDA, debuggers | GREM (SANS FOR610 is the corresponding course) |
| Detection Engineer | Turning analysis into Sigma/YARA/Suricata rules, coverage assessment | ATT&CK, Sigma, SIEM query languages, threat intel integration | GCDA, vendor certs |
| Threat Intelligence Analyst | Clustering, attribution, campaign tracking, IOC sharing | MISP, STIX, pivoting, geopolitical context, reporting | GCTI, CTI certifications |
Practice Resources
- MalwareBazaar (abuse.ch) — free, searchable database of malware samples. Filter by tag, file type, or family. The primary free source for real samples.
- any.run (free tier) — interactive sandbox — you control the mouse and keyboard inside the analysis environment. Excellent for samples requiring user interaction.
- theZoo (GitHub) — curated repository of live malware samples for educational use. Handle with care — these are real, functional malware binaries.
- Malware Traffic Analysis (malware-traffic-analysis.net) — exercises in the form of PCAP files and questions. Excellent for network analysis and IOC extraction practice.
- Flare-On challenges: Mandiant's annual reverse engineering challenge, with the challenge binaries made publicly available after each competition.
- Blue Team Labs Online / CyberDefenders — structured labs that include malware analysis scenarios with guided questions and scoring.
- Ransomware's Book 1 signature is mass file rename events + vssadmin shadow copy deletion + a single large network POST on first execution — all visible without assembly knowledge
- Stealer behaviour is visible in ProcMon as file reads to browser profile directories: the Chrome Login Data SQLite file and Firefox logins.json accessed by an unknown process is an immediate containment trigger
- The analysis-to-containment loop converts findings directly into IR actions: C2 domains to firewall blocks, persistence paths to registry cleanup, ATT&CK coverage gaps to detection rules
- GREM (GIAC Reverse Engineering Malware) is the senior certification for the discipline — FOR610 is the corresponding SANS course; both assume Book 1 skills and teach Book 2 content
- MalwareBazaar provides free real samples for practice — always analyse in an isolated VM, never on a production machine or connected to a production network
Malware Analysis Book 2: Reverse Engineering & Advanced Techniques
Book 1 covered the practitioner skills — triage, behavioural analysis, memory forensics, and detection engineering. Book 2 goes to the code level: reading x86/x64 assembly, reverse engineering with Ghidra and IDA Pro, unpacking custom packers, defeating anti-analysis techniques, and performing full reverse engineering of ransomware, RATs, and rootkits.
Assembly fundamentals · Ghidra workflow · IDA Pro · x64dbg dynamic debugging
Custom unpacking · Deobfuscation · API hashing · Anti-debugging · Anti-VM
Shellcode analysis · Rootkit internals · Ransomware RE · C2 protocol reversal