Master the full OSINT methodology — from passive DNS enumeration and Google dorking through breach data correlation, Shodan infrastructure mapping, job posting analysis, and the defensive hardening strategies that limit what attackers can discover before touching a single system.
OSINT — Open Source Intelligence
Open Source Intelligence is the collection and analysis of information from publicly available sources. In penetration testing, OSINT is the first phase of every engagement — gathering maximum information about the target without making a single connection to their systems. The goal is to build a complete picture of the attack surface before active scanning begins, so that scanning and exploitation can be directed with precision rather than noise.
Effective OSINT can reveal employee names and emails (phishing targets), technology stack (exploitable software versions), infrastructure (IP ranges and hosting providers), vendor relationships (supply chain pivot paths), and sometimes credentials from past breaches — all without the target knowing you are looking. For the CEH exam, OSINT sits squarely within Module 02: Footprinting and Reconnaissance, and distinguishing between passive and active reconnaissance is a direct exam objective.
Why OSINT Comes First — Every Time
The reconnaissance phase determines the quality of everything that follows. An attacker who skips OSINT and jumps straight to scanning will generate more noise, miss more attack vectors, and look less credible in a professional engagement than one who arrives knowing the organisation's internal tool names, their key personnel, and which of their subdomains is still running a three-year-old CMS.
OSINT-first is not just a methodology preference — it is operationally significant. Network intrusion detection systems log active scans. Firewalls record connection attempts. But no alarm fires when someone reads a company's LinkedIn page, searches their GitHub repositories, or queries a third-party breach database. The asymmetry is striking: an attacker can spend days building a detailed picture of a target's infrastructure while the target has no awareness whatsoever that reconnaissance is underway.
A professional burglar casing a target building does not walk up and rattle the front door handle. They observe from a distance. They read the building's planning permission records at the council office. They check the company website for office hours. They watch which delivery services come and go, noting the entry points used. They read reviews left by employees on job sites to understand the internal layout and security culture. None of this requires them to set foot on the property, and none of it triggers an alarm. OSINT reconnaissance is exactly this systematic pre-approach observation, conducted entirely through publicly accessible records that the target cannot watch anyone reading.
The OSINT Methodology
Effective OSINT follows a structured expansion: start with the single seed entity you know (usually a domain name or company name) and systematically branch outward through every data source that accepts that entity as input. Each discovery becomes a new seed for the next query. The graph of connections grows until you reach a point of diminishing returns — typically when new queries produce only data you've already seen from other paths.
- Domain → DNS records, WHOIS, subdomains, cert logs, ASN/IP ranges
- Employees → LinkedIn, job postings, email pattern, GitHub profiles
- Tech Stack → Job ads, Wappalyzer, Shodan banners, cert SAN entries
- Breaches → HaveIBeenPwned, DeHashed, paste sites, dark web monitors
- Code → GitHub/GitLab public repos, secrets in commits, internal hostnames
- Shodan → Internet-facing infrastructure, open ports, software versions
- Documents → Google-indexed PDFs, XLSX, DOCX with metadata
- Social → Twitter/X, Instagram, press releases, conference talks
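The seed-expansion idea described above can be sketched as a breadth-first traversal. This is only an illustration: the `SOURCES` table is a hypothetical, hard-coded stand-in for real data sources, and the entities in it (including the documentation ASN `AS64500`) are invented for the example.

```python
from collections import deque

# Hypothetical lookup table standing in for real data sources.
# Each key is an entity; each value lists what a source would return for it.
SOURCES = {
    "example-corp.com": ["mail.example-corp.com", "vpn.example-corp.com", "AS64500"],
    "mail.example-corp.com": ["203.0.113.45"],
    "vpn.example-corp.com": ["203.0.113.45"],
    "AS64500": ["203.0.113.0/24"],
}


def expand(seed, lookup, max_depth=3):
    """Breadth-first expansion: every new discovery becomes a seed itself."""
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for found in lookup.get(entity, []):
            if found in seen:
                continue  # already reached via another path (diminishing returns)
            seen.add(found)
            order.append(found)
            queue.append((found, depth + 1))
    return order


if __name__ == "__main__":
    for entity in expand("example-corp.com", SOURCES):
        print(entity)
```

The `seen` set is what models the stopping condition: once queries only return entities already in the graph, the queue drains and the expansion ends.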
The CEH Footprinting Categories
The CEH organises footprinting techniques into specific categories. Each maps to a distinct set of tools and data sources, and the exam tests which category a given technique belongs to:
- Website footprinting: Crawling public web content, analysing page source for comments and metadata, checking robots.txt and sitemap.xml for hidden paths, extracting document metadata with ExifTool.
- DNS footprinting: MX, NS, A, AAAA, TXT, and CNAME record enumeration; zone transfer testing (AXFR); reverse DNS lookup of IP ranges; passive DNS history via SecurityTrails or DNSDB.
- Network footprinting: WHOIS for registrant data; BGP/ASN lookup for IP ranges; traceroute for network path mapping; geolocation of IP addresses.
- Email footprinting: MX record analysis; SPF/DMARC policy review; email header analysis from sample messages; email harvesting from public sources.
- Competitive intelligence: Job postings revealing technology choices and team structure; press releases announcing acquisitions and partnerships; financial filings disclosing infrastructure and vendors.
OSINT Techniques in Practice
Google advanced operators narrow searches to find sensitive files, login pages, and exposed data indexed by search engines. These queries work because Google has already crawled and indexed the content — the attacker is simply filtering its results with precision.
    # Find login pages
    site:example-corp.com inurl:login

    # Find exposed documents
    site:example-corp.com filetype:pdf OR filetype:xlsx

    # Find config files accidentally indexed
    site:example-corp.com ext:env OR ext:config OR ext:sql

    # Find employee info
    site:linkedin.com "example corp" "security engineer"
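When the same queries have to be repeated across many domains in an authorised engagement, it can help to compose them programmatically. The helper below is a small sketch; `build_dork` and its parameters are hypothetical names, and it only assembles query strings without contacting any search engine.

```python
def build_dork(site, *, inurl=None, filetypes=(), phrases=()):
    """Compose a search query string from advanced-operator parts."""
    parts = [f"site:{site}"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    if filetypes:
        # Multiple file types are alternatives, so join them with OR.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    # Exact phrases are wrapped in double quotes.
    parts.extend(f'"{phrase}"' for phrase in phrases)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dork("example-corp.com", inurl="login"))
    print(build_dork("example-corp.com", filetypes=("pdf", "xlsx")))
    print(build_dork("linkedin.com", phrases=("example corp", "security engineer")))
```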
Subdomains reveal internal systems, staging environments, APIs, and admin panels that may be less secured than the main site. Combining multiple passive sources produces more complete results than any single tool.
    subfinder -d example-corp.com -silent
    mail.example-corp.com
    staging.example-corp.com
    api.example-corp.com
    jira.example-corp.com
    vpn.example-corp.com

    amass enum -passive -d example-corp.com
    dev-internal.example-corp.com   ← internal system exposed
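Combining sources mostly comes down to normalising and de-duplicating hostnames. A minimal sketch follows, assuming each tool's output has been saved as plain text with one hostname per line; the sample strings and the `INTERESTING` keyword list are hypothetical.

```python
# Sample outputs, one hostname per line (hypothetical data).
SUBFINDER_OUT = """\
mail.example-corp.com
staging.example-corp.com
api.example-corp.com
jira.example-corp.com
vpn.example-corp.com
"""
AMASS_OUT = """\
VPN.example-corp.com
dev-internal.example-corp.com
api.example-corp.com.
"""

# Assumed keywords that often mark non-production or sensitive systems.
INTERESTING = ("dev", "staging", "internal", "vpn", "jira", "admin")


def merge_hosts(*outputs):
    """Normalise case and trailing dots, then de-duplicate across sources."""
    hosts = set()
    for output in outputs:
        for line in output.splitlines():
            host = line.strip().rstrip(".").lower()
            if host:
                hosts.add(host)
    return sorted(hosts)


def flag_interesting(hosts, keywords=INTERESTING):
    """Return hosts whose leftmost label contains one of the keywords."""
    return [h for h in hosts if any(k in h.split(".")[0] for k in keywords)]


if __name__ == "__main__":
    merged = merge_hosts(SUBFINDER_OUT, AMASS_OUT)
    print(len(merged), "unique hosts")
    for host in flag_interesting(merged):
        print("review:", host)
```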
Developers often accidentally commit API keys, passwords, and private keys to public repositories. Even if the secret is deleted later, it remains in git history — permanently accessible to anyone who clones the repo.
    trufflehog github --org=example-corp
    # Illustrative output; the exact format varies by tool version
    Reason: High Entropy String
    Path: config/database.yml
    Branch: main (commit: a3f92b1)
    password: "Pr0ductionDB!2023"

    # Search for GitHub-hosted content manually:
    site:github.com "example-corp" "API_KEY"
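The "high entropy" signal in findings like the one above is usually some form of Shannon entropy over the characters of a string. The sketch below shows the idea; the threshold and minimum length are assumptions chosen for illustration, not the values any particular scanner uses.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value, threshold=3.5, min_length=12):
    """Heuristic only: long strings with varied characters are worth a manual look."""
    return len(value) >= min_length and shannon_entropy(value) >= threshold


if __name__ == "__main__":
    for candidate in ("aaaaaaaaaaaaaaaa", "password", "Pr0ductionDB!2023"):
        flagged = looks_like_secret(candidate)
        print(f"{candidate!r}: {shannon_entropy(candidate):.2f} bits/char, flagged={flagged}")
```

Entropy alone produces false positives (hashes, UUIDs) and false negatives (short or dictionary-based passwords), which is why scanners typically pair it with pattern rules.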
If employees use corporate email for external services that were breached, those credentials may still work — especially if passwords are reused across personal and corporate accounts.
    haveibeenpwned.com → check domain example-corp.com
    Found in 3 breaches:
    - LinkedIn 2012: 47 accounts
    - Adobe 2013: 12 accounts
    - Collection #1 2019: 8 accounts

    # Credential stuffing risk:
    # If these employees reuse passwords → VPN / O365 at risk
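From the defender's side, the same breach data is most useful when grouped per account, so the most-exposed users can be prioritised for password resets. A minimal sketch, assuming the breach records are already available as `(email, breach)` pairs; the addresses in `RECORDS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical breach records: (email, breach name).
RECORDS = [
    ("a.user@example-corp.com", "LinkedIn 2012"),
    ("a.user@example-corp.com", "Adobe 2013"),
    ("b.user@example-corp.com", "Collection #1 2019"),
    ("A.User@example-corp.com", "Collection #1 2019"),
    ("c.user@example-corp.com", "Adobe 2013"),
]


def exposure_by_account(records):
    """Group breaches per email address, most-exposed accounts first."""
    breaches = defaultdict(set)
    for email, breach in records:
        breaches[email.lower()].add(breach)
    grouped = [(email, sorted(names)) for email, names in breaches.items()]
    # Sort by breach count (descending), then by email for stable output.
    return sorted(grouped, key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    for email, names in exposure_by_account(RECORDS):
        print(f"{email}: {len(names)} breach(es): {', '.join(names)}")
```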
What You Need to Know
The Full OSINT Source Landscape
A professional OSINT investigation draws from dozens of specialised data sources simultaneously. Understanding what each source reveals — and crucially what it does not — prevents both gaps in coverage and wasted time querying sources that won't yield useful data for the current target type.
Shodan — The Search Engine for the Internet of Things
Shodan deserves special attention because it is the most operationally significant OSINT source for infrastructure mapping. Unlike Google, which indexes web page content, Shodan indexes the responses that internet-connected devices give to direct connection probes: port banners, TLS certificates, and protocol handshakes. Any device that answers on one of the ports Shodan probes gets catalogued.
From a defender's perspective, what Shodan reveals about your organisation is exactly what an attacker sees before they have done anything active. A Shodan search for your company's ASN or IP ranges produces a recent (crawl-based rather than live) inventory of the publicly reachable services Shodan has observed on your devices, including devices your IT team may not know are public-facing.
Query Shodan for an organisation's infrastructure using their ASN, IP range, or organisation name. The results reveal open ports, software versions, and devices the organisation may not realise are internet-facing.
    # Search by organisation name:
    org:"Example Corp Technologies"
    Results: 47 hosts
    Ports found: 22, 80, 443, 3389, 8080, 8443, 9200
    9200/tcp open: Elasticsearch (no auth required!)

    # Search for specific vulnerable software across an IP range:
    net:104.21.44.0/22 product:"Apache httpd" version:"2.4.49"
    # Apache 2.4.49 is vulnerable to CVE-2021-41773 (path traversal / RCE)
    # Shodan finds every host in the range running this exact version

    # Find exposed RDP (common misconfiguration):
    org:"Example Corp" port:3389
    3 hosts with RDP exposed directly to internet (high risk)
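Once results like these have been exported, triage is a matter of matching each service against a policy of ports that should not face the internet. The sketch below works on hypothetical records (`HOSTS`) shaped loosely like search results; the `RISKY_PORTS` policy is an assumption for illustration, and no Shodan API call is made.

```python
# Hypothetical exported results; field names are assumptions for this sketch.
HOSTS = [
    {"ip": "203.0.113.10", "port": 443, "product": "nginx"},
    {"ip": "203.0.113.11", "port": 3389, "product": "Remote Desktop"},
    {"ip": "203.0.113.12", "port": 9200, "product": "Elasticsearch"},
    {"ip": "203.0.113.13", "port": 22, "product": "OpenSSH"},
]

# Assumed policy: services that should normally sit behind a VPN or firewall.
RISKY_PORTS = {
    3389: "RDP exposed directly to the internet",
    9200: "Elasticsearch reachable; verify authentication is enforced",
}


def flag_exposures(hosts, risky=RISKY_PORTS):
    """Return (ip, port, reason) for every service that violates the policy."""
    return [
        (host["ip"], host["port"], risky[host["port"]])
        for host in hosts
        if host["port"] in risky
    ]


if __name__ == "__main__":
    for ip, port, reason in flag_exposures(HOSTS):
        print(f"{ip}:{port} -> {reason}")
```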
Job Posting Intelligence — Technology Stack Disclosure
Job postings are one of the most underutilised OSINT sources, yet they reliably reveal more about an organisation's internal technology stack than almost any other public source. When a company posts a role for a "Senior DevOps Engineer — experience with Terraform, AWS EKS, Datadog, and HashiCorp Vault required," they have just disclosed their cloud provider, container orchestration platform, monitoring tool, and secrets management system to every attacker who searches for it.
This disclosure is not accidental: organisations need to attract qualified candidates. But it hands attackers a detailed inventory of the attack surface that can be checked product by product against CVE databases. An attacker who identifies the specific versions in use (often disclosed in more detailed job requirements or in conference talks by the organisation's engineers) can pre-research relevant exploits before touching the target at all.
Systematically mine job postings for technology disclosures. A single senior engineering role posting can reveal the entire infrastructure stack.
    # Search for target's engineering job postings:
    site:linkedin.com OR site:indeed.com "example corp" "senior engineer"

    # Extracted technology intelligence from a single posting:
    Cloud:      AWS (EC2, RDS, S3, CloudFront)
    Container:  Docker, Kubernetes (EKS)
    IaC:        Terraform, Ansible
    Monitoring: Datadog, PagerDuty
    Auth:       Okta SSO, HashiCorp Vault
    DB:         PostgreSQL 14, Redis 7
    WAF:        AWS WAF + Cloudflare

    # Cross-reference with CVE databases:
    # AWS WAF bypass techniques, known Cloudflare misconfigs,
    # PostgreSQL 14.x vulnerabilities, Redis auth bypass patterns
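The extraction step is simple keyword matching, which also makes it easy for a security team to audit its own postings before publication. A minimal sketch; the `TECH_CATEGORIES` map is a hypothetical, deliberately short keyword list, and the sample posting paraphrases the role quoted earlier.

```python
import re

# Hypothetical keyword map; a real review list would be much longer.
TECH_CATEGORIES = {
    "Cloud": ["AWS", "Azure", "GCP"],
    "Container": ["Docker", "Kubernetes", "EKS"],
    "IaC": ["Terraform", "Ansible"],
    "Monitoring": ["Datadog", "PagerDuty"],
    "Auth": ["Okta", "HashiCorp Vault"],
}

POSTING = (
    "Senior DevOps Engineer: experience with Terraform, AWS EKS, "
    "Datadog, and HashiCorp Vault required"
)


def extract_stack(posting, categories=TECH_CATEGORIES):
    """Return {category: [keywords found]} using whole-word, case-insensitive matches."""
    found = {}
    for category, keywords in categories.items():
        hits = [
            keyword
            for keyword in keywords
            if re.search(rf"(?<!\w){re.escape(keyword)}(?!\w)", posting, re.IGNORECASE)
        ]
        if hits:
            found[category] = hits
    return found


if __name__ == "__main__":
    for category, hits in extract_stack(POSTING).items():
        print(f"{category}: {', '.join(hits)}")
```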
DNS and WHOIS — The Infrastructure Map
DNS is the telephone directory of the internet — it translates human-readable names into IP addresses. Because DNS is a public lookup system by design, it exposes a significant amount of an organisation's infrastructure to anyone willing to query it. The CEH tests knowledge of specific DNS record types and what each reveals about a target.
- A / AAAA records: Map hostnames to IPv4/IPv6 addresses. The primary address mapping — reveals hosting IP, which can be correlated with Shodan for port data.
- MX records: Mail exchange servers. Reveals the email provider (Google Workspace, Microsoft 365, or self-hosted), which informs phishing infrastructure choices.
- NS records: Name servers authoritative for the domain. Reveals the DNS provider — sometimes also the hosting provider. Multiple NS records from the same provider can link related domains.
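- TXT records: Arbitrary text, commonly used for the SPF policy and domain-verification tokens. The DMARC policy is a TXT record published at _dmarc.<domain>, and DKIM public keys are TXT records at <selector>._domainkey.<domain>. A missing or permissive SPF record makes the domain easier to spoof for phishing.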
- CNAME records: Aliases — one hostname points to another. Dangling CNAMEs (pointing to deprovisioned third-party services) are targets for subdomain takeover attacks.
- SOA records: Start of Authority — contains the primary nameserver and admin email. Often reveals internal naming conventions and technical contact details.
Systematic DNS enumeration extracts every record type from a target domain, building a complete picture of their mail infrastructure, CDN usage, and any dangling records that might be takeover candidates.
    # Enumerate common record types (many servers refuse ANY queries, so ask for each type):
    dig example-corp.com A +noall +answer
    example-corp.com. A 104.21.45.67 (behind Cloudflare)
    dig example-corp.com MX +noall +answer
    example-corp.com. MX mail.example-corp.com (self-hosted SMTP)
    dig example-corp.com TXT +noall +answer
    example-corp.com. TXT "v=spf1 include:_spf.google.com ~all"

    # The DMARC policy is published at the _dmarc subdomain:
    dig _dmarc.example-corp.com TXT +noall +answer
    _dmarc.example-corp.com. TXT "v=DMARC1; p=none; rua=..."
    ↑ p=none = no enforcement, domain can be spoofed!

    # Check for zone transfer (a misconfiguration that reveals ALL records):
    dig axfr example-corp.com @ns1.example-corp.com
    ; Transfer failed. (Correctly configured: zone transfers disabled)

    # Passive DNS history: find origin IP behind CDN (the API requires an account key, omitted here):
    curl "https://api.securitytrails.com/v1/history/example-corp.com/dns/a"
    Historical A record: 203.0.113.45 (pre-Cloudflare, direct server IP)
    # Direct IP bypasses CDN-level WAF: direct attack surface now known
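The SPF and DMARC TXT values in the output above are simple enough to classify offline. The sketch below parses record strings that have already been retrieved; the function names and the wording of the result labels are illustrative choices, and it performs no DNS lookups.

```python
def parse_tags(record):
    """Split a 'tag=value; tag=value' record (the DMARC format) into a dict."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_enforcement(record):
    """Describe what the p= tag tells receivers to do with failing mail."""
    tags = parse_tags(record)
    if tags.get("v", "").upper() != "DMARC1":
        return "no valid DMARC record"
    labels = {
        "none": "monitoring only",
        "quarantine": "failing mail sent to spam",
        "reject": "failing mail rejected",
    }
    return labels.get(tags.get("p", "").lower(), "unknown policy")


# Qualifier on the SPF "all" mechanism; a bare "all" defaults to "+" (pass).
SPF_ALL = {"-all": "hardfail", "~all": "softfail", "?all": "neutral", "+all": "pass", "all": "pass"}


def spf_all_result(record):
    """Return the result applied to senders not matched by earlier mechanisms."""
    terms = record.lower().split()
    if not terms or terms[0] != "v=spf1":
        return None
    for term in terms[1:]:
        if term in SPF_ALL:
            return SPF_ALL[term]
    return None


if __name__ == "__main__":
    print(dmarc_enforcement("v=DMARC1; p=none"))
    print(spf_all_result("v=spf1 include:_spf.google.com ~all"))
```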
p=none means the domain has monitoring but no enforcement: phishing emails spoofing the domain will not be blocked. p=quarantine routes them to spam. p=reject blocks them entirely. Any organisation with p=none or no DMARC record at all is trivially spoofable for phishing attacks targeting their customers and employees.
The Full OSINT-to-Access Chain
Individual OSINT findings have limited value in isolation. The real power of OSINT is in combining discoveries across sources to construct a chain — where each piece of information enables the next, and the chain ends in actionable access or a credible attack path.
Step 1, Domain enumeration: Starting from the company name, WHOIS reveals the registered domain and the registrant email. Subdomain enumeration via certificate transparency logs surfaces vpn.example-corp.com.
Step 2 — Technology identification: The VPN subdomain's TLS certificate and Shodan banner identify the VPN product as Cisco AnyConnect 4.9.x. A job posting confirms "Cisco AnyConnect administration experience required."
Step 3 — Employee enumeration: LinkedIn enumeration produces 47 employees at the company. Email pattern is confirmed as [email protected] from a press release signatory. A full employee list is constructed from LinkedIn names.
Step 4 — Breach correlation: HaveIBeenPwned shows the domain appeared in the 2016 LinkedIn breach. DeHashed returns 12 matching email/hash pairs. Three hashes crack to weak passwords — Welcome2016!, Summer16!, Corp2016!.
Step 5 — Access: The cracked passwords are tested against the VPN portal. One account — a senior network engineer still employed — reused their 2016 LinkedIn password. VPN access granted. No exploit was used. No scan was performed. The entire chain was built from public data.
Document Metadata — The Overlooked Intelligence Source
When organisations publish documents — PDFs, Word files, spreadsheets, presentations — those files carry embedded metadata that was generated automatically by the software that created them. This metadata is invisible to a casual reader but trivially extractable with free tools. It frequently reveals internal usernames, file paths exposing server naming conventions, software versions, and GPS coordinates from photos embedded in documents.
Every document an organisation publishes publicly is a potential intelligence source. ExifTool extracts all embedded metadata in seconds, revealing information the document authors never intended to share.
    # Download a publicly indexed PDF from the target and extract metadata:
    exiftool annual_report_2024.pdf
    Creator       : Microsoft Word 2019
    Author        : j.harrington
    Last Modified : 2024-03-14 09:22:11
    Company       : Example Corp Technologies Inc.
    Template      : \\FILESERVER01\templates\corp_template.dotx
                    ↑ Internal file server hostname revealed!
    Software      : Adobe Acrobat 23.6.20320.6

    # From a single PDF we now know:
    # - An employee username (j.harrington) → likely email: [email protected]
    # - An internal file server hostname (FILESERVER01)
    # - The internal UNC path format used for templates
    # - Exact software versions for CVE lookup
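The same extraction can be turned into a pre-publication check. The sketch below parses text in the "Field : value" layout shown above and flags usernames and UNC hostnames; it assumes the metadata has already been dumped to text, and the function names are hypothetical.

```python
import re

# Sample metadata in "Field : value" layout (taken from the example above).
EXIFTOOL_OUTPUT = r"""Creator       : Microsoft Word 2019
Author        : j.harrington
Company       : Example Corp Technologies Inc.
Template      : \\FILESERVER01\templates\corp_template.dotx
"""

# Matches the host part of a UNC path such as \\HOST\share.
UNC_HOST = re.compile(r"\\\\([A-Za-z0-9_.-]+)\\")


def parse_metadata(output):
    """Turn 'Field : value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def review_findings(fields):
    """List metadata values that leak internal usernames or hostnames."""
    findings = []
    if "Author" in fields:
        findings.append(("username", fields["Author"]))
    for value in fields.values():
        match = UNC_HOST.search(value)
        if match:
            findings.append(("internal host", match.group(1)))
    return findings


if __name__ == "__main__":
    for kind, value in review_findings(parse_metadata(EXIFTOOL_OUTPUT)):
        print(f"{kind}: {value}")
```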
Reducing Your OSINT Footprint
The most important defensive insight from OSINT methodology is that the attacker's information gathering happens entirely outside your visibility. You cannot detect it with a firewall, an IDS, or endpoint monitoring. The only defence is reducing the volume and quality of information that is publicly available in the first place — and accepting that some exposure is inevitable, then building resilience to compensate.
WHOIS privacy: Enable domain privacy protection to prevent registrant details appearing in public WHOIS records.
Document metadata: Strip metadata before publishing any document. Adobe Acrobat, LibreOffice, and dedicated tools (ExifTool, mat2) can remove all embedded metadata before release.
DMARC enforcement: Move from p=none to p=reject to prevent domain spoofing in phishing attacks targeting your customers and staff.
GitHub hygiene: Enforce pre-commit hooks scanning for secrets. Audit all public repos. Remove sensitive data from git history using git-filter-repo, and rotate any secret that was ever pushed, since copies may already exist in clones and forks.
Your own CT logs: Subscribe to certificate transparency monitoring (Certspotter, Facebook CT Monitor) to be alerted when new certs are issued for your domains — including ones you didn't create.
Breach databases: Monitor HaveIBeenPwned and similar services for your domain. Alert immediately when employee emails appear in new breach data.
Shodan alerts: Set up Shodan monitors for your IP ranges. Receive alerts when new ports open or software versions change — often the first sign of a misconfiguration.
Paste sites: Monitor Pastebin and similar sites for your domain name, IP ranges, and employee usernames appearing in credential dumps.
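The "alert when new ports open" idea behind exposure monitoring reduces to comparing two snapshots of what is reachable. A minimal sketch, assuming each snapshot is a mapping from IP address to a set of open ports; the `YESTERDAY` and `TODAY` data are hypothetical and no monitoring service is queried.

```python
# Hypothetical snapshots: {ip: set of open ports}.
YESTERDAY = {"203.0.113.10": {80, 443}, "203.0.113.11": {22}}
TODAY = {"203.0.113.10": {80, 443, 9200}, "203.0.113.12": {3389}}


def diff_exposure(previous, current):
    """Return (opened, closed) dicts mapping each IP to its changed ports."""
    opened, closed = {}, {}
    for ip in sorted(set(previous) | set(current)):
        before = previous.get(ip, set())
        after = current.get(ip, set())
        if after - before:
            opened[ip] = sorted(after - before)
        if before - after:
            closed[ip] = sorted(before - after)
    return opened, closed


if __name__ == "__main__":
    opened, closed = diff_exposure(YESTERDAY, TODAY)
    for ip, ports in opened.items():
        print(f"ALERT {ip}: newly open ports {ports}")
    for ip, ports in closed.items():
        print(f"info  {ip}: ports no longer open {ports}")
```

Newly opened ports are the actionable half of the diff; closed ports are reported separately because they usually indicate routine change rather than risk.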