Note: AWS, Microsoft Azure, and Google Cloud Platform are trademarks of their respective owners. This module is independent educational content and is not affiliated with, endorsed by, or sponsored by any cloud provider. All product names are used for educational identification purposes only.
Lucent Grid Learning  ·  Cloud Security

Cloud Security

A complete cloud security practitioner's guide — from shared responsibility and IAM fundamentals through platform-specific deep dives for AWS, Azure, and GCP, container security, DevSecOps, cloud IR, and enterprise governance at scale. Fifteen chapters. Vendor-neutral foundations with platform-specific application.

15 chapters
~4 hrs reading
AWS · Azure · GCP covered
Vendor-neutral foundations
Chapter 01 · ~12 min · Foundations

Cloud Security Fundamentals

How cloud security differs from on-premises, service models and their security implications, deployment models, and why most failures are misconfigurations

Vendor-Neutral

Cloud computing has fundamentally changed the security landscape, not by making it harder (and certainly not by making it easier) but by making it different. The mental models, threat vectors, and defensive techniques that work in a traditional data centre translate imperfectly to cloud environments, and practitioners who apply on-premises thinking to cloud deployments reliably produce insecure architectures. This chapter establishes the foundational concepts that make cloud security distinct; the rest of this module builds technical depth on top of them.

Core Shift

On-premises security is perimeter-centric — the network boundary separates trusted from untrusted, and controls concentrate at that boundary. Cloud security is identity and data-centric — the perimeter dissolves when resources exist in a provider's data centre, accessed over the internet, by users from anywhere. Identity becomes the new perimeter; data becomes the new boundary.

Cloud Service Models and Their Security Implications

The three cloud service models — IaaS, PaaS, and SaaS — each carry a distinct security responsibility allocation that practitioners must understand before they can make correct security decisions.

Model | Provider manages | Customer manages | Security focus
IaaS (Infrastructure as a Service) | Physical hardware, network fabric, hypervisor, storage infrastructure | OS, runtime, middleware, application, data, identity, network controls (security groups) | Everything at and above the OS: patching, hardening, IAM, network segmentation, encryption
PaaS (Platform as a Service) | Physical hardware through the runtime/middleware layer | Application code, data, identity, access configuration | Application security: secure coding, secrets management, access control, data classification
SaaS (Software as a Service) | Physical hardware through the application layer | User access management, data entered into the application, configuration of the application's security settings | Identity, access governance, data classification, DLP, SSO integration, configuration review

The practical implication: an organisation running virtual machines in AWS (IaaS) is responsible for patching the OS on those VMs — AWS does not do this. An organisation using AWS RDS (PaaS — managed database) does not patch the database engine — AWS does. An organisation using Salesforce (SaaS) is responsible for configuring Salesforce's security settings and managing which users have access to which records, but not for patching the Salesforce application. Understanding the service model for every cloud resource determines who is responsible for what security control.

Deployment Models

  • Public cloud — resources run on infrastructure shared with other customers (though logically isolated). AWS, Azure, and GCP are public clouds. Shared infrastructure is a common concern, but the isolation mechanisms (hypervisor isolation, virtualised networking) are robust: breaches of cloud provider infrastructure are rare compared to breaches caused by customer misconfiguration.
  • Private cloud — infrastructure dedicated to a single organisation, either on-premises or hosted exclusively for them. Higher control, higher cost, lower scalability. Common in highly regulated sectors (defence, certain financial services).
  • Hybrid cloud — a mix of on-premises infrastructure and public cloud, connected through private connectivity (VPN, dedicated circuits like AWS Direct Connect or Azure ExpressRoute). Most enterprise organisations operate hybrid environments during and after cloud migration.
  • Multi-cloud — using two or more public cloud providers simultaneously. Motivated by resilience, vendor risk diversification, or best-of-breed service selection. Creates additional complexity: separate IAM systems, separate logging pipelines, separate compliance verification.

Why Cloud Security Failures Are Almost Always Misconfigurations

Major cloud providers invest enormous resources in their own security. AWS has thousands of security engineers. Azure's physical data centres have multi-layer physical security. The hypervisors, network fabric, and physical infrastructure of major providers are genuinely well-secured. When cloud security incidents occur, the root cause is almost never "the cloud provider was breached." It is almost always a customer-side misconfiguration:

  • An S3 bucket with public access enabled containing sensitive data
  • An EC2 instance with SSH exposed to 0.0.0.0/0 with a weak password
  • An IAM role with AdministratorAccess attached to a Lambda function that processes web requests
  • A storage account in Azure with access keys embedded in application source code committed to a public GitHub repository
  • A service account in GCP with owner-level permissions used by an application

Verizon's Data Breach Investigations Report, IBM's Cost of a Data Breach, and the cloud providers' own security incident analyses consistently show misconfiguration as the dominant cloud breach vector. This framing is critical: cloud security is primarily a configuration and governance problem, not a technology problem. The tools to prevent all of the above misconfigurations exist in every major cloud platform. The challenge is deploying them consistently, across all accounts and resources, in an environment that changes continuously.

The Economics of Cloud Security

Cloud security has a counterintuitive economic dimension. Many security controls that are prohibitively expensive on-premises are commodity services in the cloud. Hardware security modules (HSMs) for key management cost tens of thousands of dollars to deploy on-premises; AWS KMS, Azure Key Vault, and GCP Cloud KMS provide HSM-backed key management for a few dollars or less per key per month, plus small per-request charges. DDoS mitigation appliances cost hundreds of thousands of dollars; AWS Shield Standard is included at no charge for all customers. This economic shift means that small organisations can access security capabilities previously available only to enterprises, but only if they configure and use them.

Key Takeaways — Chapter 1
  • Cloud security is identity and data-centric, not perimeter-centric — the mental model shift is the most important foundational concept
  • The service model (IaaS/PaaS/SaaS) determines the security responsibility allocation — understanding this for every resource is a prerequisite to correct security decisions
  • Cloud security failures are almost always customer-side misconfigurations, not provider breaches — configuration governance is the primary security challenge
  • Multi-cloud adds operational complexity (separate IAM, logging, compliance) that must be factored into architecture decisions
  • Cloud provides access to enterprise-grade security controls (KMS, DDoS protection, WAF) at commodity prices — the barrier is adoption, not cost
Chapter 02 · ~13 min · Foundations

The Shared Responsibility Model

Security of the cloud vs in the cloud, where organisations consistently get it wrong, how responsibility shifts by service type, and customer-managed keys

Vendor-Neutral · AWS · Azure · GCP

The shared responsibility model is the contractual and conceptual framework that defines what security responsibilities the cloud provider accepts and what remains with the customer. Every major cloud provider publishes a version of this model. The principle is simple — the implementation is where practitioners consistently make mistakes that lead to real breaches.

The Core Framing

AWS phrases it as: the provider is responsible for security of the cloud (the physical infrastructure, the hardware, the virtualisation layer, the managed service software) and the customer is responsible for security in the cloud (everything they build, configure, and deploy on top of the provider's infrastructure). Azure and GCP use equivalent framings.

The Model by Service Type

Shared Responsibility — IaaS Example (EC2 / Azure VM / Compute Engine)
  • Application: Customer
  • Data: Customer
  • Runtime: Customer
  • OS: Customer
  • Virtualisation: Provider
  • Physical Network: Provider
  • Hardware: Provider
  • Data Centre: Provider

For IaaS, the customer owns everything from the operating system upward; the provider owns the virtualisation layer and everything beneath it. For PaaS (such as RDS, Azure App Service, and Cloud SQL), the provider also takes over the OS, runtime, and middleware layers. For SaaS, the provider manages everything except the data and access configuration.

Where Organisations Consistently Get It Wrong

Network Security Groups Are Not Automatic

The cloud provider manages the physical network fabric: routing, hardware, and the global backbone. The customer manages the logical network controls: security groups, NACLs, VPC configurations, firewall rules. A newly launched EC2 instance with a public IP address and a security group that allows inbound traffic from 0.0.0.0/0 is exposed to the entire internet. Configuring these controls correctly is the customer's responsibility, not the provider's.

Encryption Is Not Automatic

Cloud storage encryption-at-rest defaults have improved in recent years: S3 now encrypts new objects by default, and Azure Storage and GCP storage encrypt by default. However, EBS volumes (disks attached to EC2 instances) are not encrypted unless you enable EBS encryption by default, an account-level setting that must be configured separately in each region. RDS instances are only encrypted if storage encryption is selected when the instance is created, and an existing unencrypted instance cannot be encrypted in place. Many other services still require explicit configuration. Assume nothing is encrypted until you verify it.

S3 Buckets and Storage Containers

AWS did not enable S3 Block Public Access by default until April 2023. Before that, any new S3 bucket could be made public through a misconfigured bucket policy or ACL, and numerous large data exposures have been traced to public S3 buckets containing sensitive data. AWS now enables Block Public Access (and disables ACLs) on every new bucket by default, but buckets created before this change, or buckets and accounts where the setting was later turned off, may still be public. Verify explicitly, and enable Block Public Access at the account level as well.
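To make the bucket-policy failure mode concrete, here is a minimal Python sketch that flags Allow statements granting access to any principal with no Condition attached. It assumes the policy document has already been retrieved as JSON text (for example with the S3 GetBucketPolicy API). It deliberately ignores ACLs, Block Public Access settings, and what any conditions actually restrict, so it is a triage aid rather than a full public-access evaluation. The function names are hypothetical.

```python
import json


def _principal_is_public(principal) -> bool:
    # "*" or {"AWS": "*"} (or a list containing "*") means any principal
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        if aws == "*" or (isinstance(aws, list) and "*" in aws):
            return True
    return False


def public_allow_statements(policy_json: str) -> list:
    """Return Allow statements that grant access to anyone without a Condition."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if not _principal_is_public(stmt.get("Principal")):
            continue
        if stmt.get("Condition"):
            # Conditions (e.g. aws:SourceVpce) may restrict access: review manually
            continue
        flagged.append(stmt)
    return flagged


if __name__ == "__main__":
    example = json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
             "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
            {"Sid": "VpceOnly", "Effect": "Allow", "Principal": "*",
             "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*",
             "Condition": {"StringEquals": {"aws:SourceVpce": "vpce-1234"}}},
        ],
    })
    for stmt in public_allow_statements(example):
        print("Public statement:", stmt.get("Sid"))
```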

IAM Defaults Are Permissive for Usability

Cloud IAM makes broad permissions easy to grant. A new AWS IAM user starts with no permissions, but the AdministratorAccess managed policy can be attached with a single click and grants full access to every service in the account. Some defaults are permissive outright: in GCP, the Compute Engine default service account has historically been granted the project-wide Editor role automatically. New developers are often given admin access "temporarily" for convenience, and it is never removed. Service roles are granted broader permissions than needed because least privilege is harder to configure. The provider supplies the tools to implement least privilege; applying them is entirely the customer's responsibility.

Customer-Managed Keys (CMK) — Taking Back Control

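With provider-managed encryption (the default), the provider controls the encryption keys and the policies governing their use; you rely entirely on the provider's internal controls. Customer-managed keys change this: you create the key in KMS/Key Vault/Cloud KMS, decide through key policy or IAM who may use it, and every use of it is logged. The provider's storage layer holds only ciphertext and can decrypt it only by calling the key service under your policy. The key material itself still resides in the provider's HSMs, however; keeping keys entirely outside the provider's control requires an external key manager, such as AWS KMS External Key Store or Google Cloud External Key Manager.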

CMKs also enable key revocation — deleting or disabling your CMK renders all data encrypted with it inaccessible. This is a powerful control for data retention compliance: when a cloud account is closed or data must be destroyed, CMK deletion provides cryptographic assurance of data destruction without requiring every byte to be overwritten.

CMK Caution

Customer-managed keys transfer the risk of key management to the customer. A CMK that is accidentally deleted renders the data it encrypted permanently inaccessible. AWS KMS enforces a mandatory 7–30 day waiting period before deletion, during which the deletion can be cancelled (and key policies can restrict who may call kms:ScheduleKeyDeletion); Azure Key Vault has soft-delete and purge protection; GCP Cloud KMS schedules key version destruction with a minimum 24-hour grace period. Enable all available protections before relying on CMKs for production data.

Key Takeaways — Chapter 2
  • The provider secures the infrastructure; the customer secures everything they deploy on it — this boundary is non-negotiable and cannot be contracted away
  • Network controls, encryption configuration, IAM policies, and storage access settings are all customer responsibilities in IaaS environments
  • S3 Block Public Access, EBS encryption defaults, and security group egress rules are all configurations that must be explicitly set — they are not provider defaults in all accounts
  • Customer-managed keys provide cryptographic control over data access and enable provable data destruction — at the cost of taking on key lifecycle management
  • The shared responsibility model means the provider's security certifications (ISO 27001, FedRAMP, SOC 2) certify the provider's controls, not the customer's
Chapter 03 · ~16 min · Identity

Identity and Access Management in the Cloud

IAM as the cloud perimeter, least privilege at scale, roles and policies, service-to-service auth, privilege escalation, permission boundaries, and credential lifecycle

Vendor-Neutral · AWS · Azure · GCP

Identity is the cloud perimeter. In a traditional on-premises environment, a firewall controls what can reach a resource — you need network access before you need credentials. In cloud environments, resources are often accessible from anywhere on the internet; the only control between an attacker and your S3 bucket, your Azure VM, or your GCP database is whether they have valid credentials and the right permissions. Getting IAM right is not a secondary concern — it is the primary security control.

IAM Concepts Across Providers

Concept | AWS | Azure | GCP
Human identity | IAM User | Azure AD / Entra ID User | Google Account / Cloud Identity User
Group of identities | IAM Group | Azure AD Group | Google Group
Assumable permission set | IAM Role | Azure RBAC Role | IAM Role
Machine identity | IAM Role (instance profile) | Managed Identity | Service Account
Permission document | IAM Policy (JSON) | Role Definition (JSON) | IAM Policy (YAML/JSON)
Federated identity | OIDC / SAML provider | Azure AD B2B/B2C, SAML | Workload Identity Federation
Cross-account access | Cross-account IAM Role | Cross-tenant B2B | Cross-project service accounts

Least Privilege at Cloud Scale

Least privilege — granting only the permissions required to perform a defined task — is harder to implement in cloud environments than the principle suggests. Cloud services have hundreds of individual API actions. A developer who needs to deploy a Lambda function needs permissions across Lambda, IAM (to pass roles), S3 (for deployment packages), CloudWatch Logs, and potentially VPC. Writing a minimal policy for this requires enumerating the specific actions needed, which requires understanding the service deeply.

Practical approaches to least privilege at scale:

  • Start with AWS managed policies or provider built-in roles — they are not perfectly minimal but are vetted and maintained. A custom minimal policy that is wrong is more dangerous than a slightly over-permissive managed policy.
  • Use IAM Access Analyzer (AWS) / Azure AD Access Reviews / GCP IAM Recommender — these tools analyse actual usage and suggest permission reductions. IAM Recommender in GCP will tell you "this service account has X permissions but has only used Y in the last 90 days."
  • Permission boundaries (AWS) — set a maximum permission ceiling that no policy can exceed for an identity, regardless of what other policies are attached. Useful for delegating IAM administration to teams without allowing them to create identities with more permissions than they themselves have.
  • Conditions in policies — restrict permissions by source IP, time of day, MFA requirement, resource tag, or region. A policy that only allows S3 access from within a VPC endpoint adds meaningful least privilege without reducing functional permissions.
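The interaction between identity policies, permission boundaries, and conditions can be sketched as a small evaluator. This is a deliberately simplified model, not AWS's full evaluation logic: it assumes a request succeeds only if an identity-policy statement allows it, the permission boundary (if any) also allows it, and no matching statement in either explicitly denies it. Only the StringEquals condition operator is modelled, and SCPs, resource policies, and session policies are ignored. All function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _action_matches(patterns, action):
    # Action names are case-insensitive; * and ? act as wildcards
    return any(fnmatchcase(action.lower(), p.lower()) for p in _as_list(patterns))


def _resource_matches(patterns, resource):
    # Resource ARNs are compared case-sensitively
    return any(fnmatchcase(resource, p) for p in _as_list(patterns))


def _conditions_met(condition, context):
    # Only StringEquals is modelled; any other operator is treated as not met
    for operator, tests in (condition or {}).items():
        if operator != "StringEquals":
            return False
        for key, expected in tests.items():
            if context.get(key) not in _as_list(expected):
                return False
    return True


def _decision(statements, action, resource, context):
    """Return 'Deny', 'Allow', or None (implicit deny) for one set of statements."""
    allowed = False
    for stmt in statements:
        if not _action_matches(stmt["Action"], action):
            continue
        if not _resource_matches(stmt["Resource"], resource):
            continue
        if not _conditions_met(stmt.get("Condition"), context):
            continue
        if stmt["Effect"] == "Deny":
            return "Deny"  # an explicit deny always wins
        allowed = True
    return "Allow" if allowed else None


def is_allowed(identity_stmts, boundary_stmts, action, resource, context=None):
    """Allowed only if the identity policy AND the boundary (when set) both allow."""
    context = context or {}
    if _decision(identity_stmts, action, resource, context) != "Allow":
        return False
    if boundary_stmts is not None:
        return _decision(boundary_stmts, action, resource, context) == "Allow"
    return True


if __name__ == "__main__":
    identity = [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}]
    boundary = [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::reports/*",
        "Condition": {"StringEquals": {"aws:SourceVpce": "vpce-1a2b3c4d"}},
    }]
    inside = {"aws:SourceVpce": "vpce-1a2b3c4d"}
    print(is_allowed(identity, boundary, "s3:GetObject", "arn:aws:s3:::reports/q1.csv", inside))
    print(is_allowed(identity, boundary, "s3:PutObject", "arn:aws:s3:::reports/q1.csv", inside))
```

The boundary caps what the broad s3:* identity policy can actually do: only GetObject on the reports prefix, and only through the named VPC endpoint.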

Service-to-Service Authentication

Application code running in the cloud must authenticate to other cloud services (databases, object storage, secrets managers, other APIs). The wrong way to do this is to create a long-lived IAM user, generate access keys, and store those keys as environment variables or in application configuration files. Long-lived static credentials are a persistent security liability: they do not expire, they are frequently committed to source control, and they provide persistent access if compromised.

The right way is to use service identities — cloud-native mechanisms that provide temporary credentials automatically and transparently:

Service identity type | AWS | Azure | GCP
Compute instance identity | EC2 Instance Profile (IAM Role attached to EC2) | Managed Identity (system- or user-assigned) | Service Account attached to Compute Engine VM
Serverless identity | Lambda Execution Role | Managed Identity on Azure Functions | Service Account on Cloud Functions
Container identity | EKS Pod Identity / IRSA | AKS Workload Identity | GKE Workload Identity
How credentials are accessed | Instance Metadata Service (IMDS) on EC2; environment variables injected into Lambda; SDK handles automatically | IMDS endpoint on VMs; local identity endpoint on App Service / Functions; SDK handles automatically | Metadata server; SDK handles automatically
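The SDK hides the credential exchange, but seeing it clarifies both how instance credentials work and why the iam:PassRole escalation path below is possible. This is a minimal Python sketch of the IMDSv2 flow on EC2 using only the standard library: a PUT to /latest/api/token returns a session token, which is then presented in the X-aws-ec2-metadata-token header to read the role's temporary credentials from /latest/meta-data/iam/security-credentials/. The helper names are hypothetical, and real applications should let the SDK do this.

```python
import urllib.request

IMDS = "http://169.254.169.254"
CREDENTIALS_PATH = "/latest/meta-data/iam/security-credentials/"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # IMDSv2 step 1: obtain a session token with a PUT (21600 s = 6 h is the maximum TTL)
    return urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def credentials_request(token: str, role_name: str = "") -> urllib.request.Request:
    # IMDSv2 step 2: every metadata read presents the token.
    # With no role name, this path returns the name of the instance profile's role.
    return urllib.request.Request(
        IMDS + CREDENTIALS_PATH + role_name,
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(timeout: float = 2.0) -> str:
    """Return the role's temporary credentials as JSON text (only works on EC2)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(credentials_request(token), timeout=timeout) as resp:
        role_name = resp.read().decode().strip()
    with urllib.request.urlopen(credentials_request(token, role_name), timeout=timeout) as resp:
        # JSON fields include AccessKeyId, SecretAccessKey, Token and Expiration
        return resp.read().decode()


if __name__ == "__main__":
    # Build the request without sending it, so this runs anywhere
    req = token_request()
    print(req.get_method(), req.full_url)
```

fetch_role_credentials only succeeds on an EC2 instance with an instance profile attached; elsewhere it fails with a connection error or timeout.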

IAM Privilege Escalation

IAM privilege escalation is one of the most common high-impact cloud attack patterns. An attacker with limited IAM permissions exploits policy misconfigurations to gain broader access. Common escalation paths:

  • iam:CreatePolicyVersion — a user with this permission can create a new version of a customer-managed policy attached to their own identity and set it as the default version, granting themselves any permissions they want
  • iam:PassRole + ec2:RunInstances — create a new EC2 instance with a highly privileged IAM role, then access the instance to retrieve the role's credentials via IMDS
  • iam:CreateAccessKey — create access keys for another IAM user, gaining persistent access as that user
  • sts:AssumeRole — assume a role the attacker's current identity is not supposed to be able to assume, if the role's trust policy is misconfigured

Detection: CloudTrail events for CreatePolicyVersion, RunInstances with unexpected IAM roles, CreateAccessKey for users other than the calling user, and AssumeRole events should all be monitored and alerted on for any identity that has not performed those actions before.
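A minimal Python sketch of this "first time this identity has done this" detection, assuming CloudTrail records have already been parsed into dictionaries with the standard eventName, userIdentity, and requestParameters fields. The baseline of previously seen (identity ARN, event name) pairs is a hypothetical input you would build from historical logs, and the function names are illustrative; a production detection would normally run in a SIEM or log query engine rather than a Python loop.

```python
from typing import Iterable

# High-risk IAM/STS API calls discussed above
WATCHED = {"CreatePolicyVersion", "RunInstances", "CreateAccessKey", "AssumeRole"}


def _actor(event: dict) -> str:
    return (event.get("userIdentity") or {}).get("arn", "unknown")


def is_risky(event: dict) -> bool:
    """Narrow each watched event to the risky variant described in the text."""
    name = event.get("eventName")
    params = event.get("requestParameters") or {}
    identity = event.get("userIdentity") or {}
    if name == "CreateAccessKey":
        # No userName in the request means the caller created a key for itself
        target = params.get("userName")
        return target is not None and target != identity.get("userName")
    if name == "RunInstances":
        # Only launches that attach an instance profile can pass a role
        return bool(params.get("iamInstanceProfile"))
    return name in WATCHED


def first_time_alerts(events: Iterable[dict], baseline: set) -> list:
    """Return (actor ARN, eventName) pairs not seen before; updates baseline in place."""
    alerts = []
    for event in events:
        if not is_risky(event):
            continue
        key = (_actor(event), event["eventName"])
        if key not in baseline:
            alerts.append(key)
            baseline.add(key)  # alert only once per new pair
    return alerts
```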

Credential Lifecycle Management

Long-lived IAM user access keys are one of the most reliably exploited cloud security weaknesses. Key hygiene requirements:

  • Rotate access keys every 90 days maximum — ideally eliminate long-lived keys entirely in favour of federated access and role assumption
  • Detect and alert on access keys not used in 90+ days — these are almost certainly forgotten and should be deactivated
  • Never create IAM users for applications — use instance profiles, managed identities, or service accounts instead
  • Enable MFA for all IAM users, mandatory for privileged access
  • Use AWS SSO / Azure AD SSO / GCP Cloud Identity as the identity foundation — federated access means no long-lived cloud-specific credentials
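The 90-day checks above can be automated against the IAM credential report, a CSV file AWS generates per account. This is a minimal Python sketch that assumes the report's access_key_N_active, access_key_N_last_rotated, and access_key_N_last_used_date columns with ISO 8601 timestamps; a key that has never been used is judged by its last-rotated (creation) date. The thresholds and function names are illustrative assumptions.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)  # rotation threshold
MAX_IDLE = timedelta(days=90)     # unused-key threshold


def _parse(value):
    # The report uses "N/A" where a value does not apply (e.g. a key never used)
    if value in (None, "", "N/A", "no_information"):
        return None
    return datetime.fromisoformat(value)


def key_findings(report_csv: str, now: datetime) -> list:
    """Return (user, key slot, finding) tuples for active keys breaching a threshold."""
    findings = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):
            if row.get(f"access_key_{slot}_active") != "true":
                continue
            rotated = _parse(row.get(f"access_key_{slot}_last_rotated"))
            used = _parse(row.get(f"access_key_{slot}_last_used_date"))
            if rotated and now - rotated > MAX_KEY_AGE:
                findings.append((row["user"], slot, "not rotated in 90+ days"))
            last_activity = used or rotated  # never-used keys fall back to creation date
            if last_activity and now - last_activity > MAX_IDLE:
                findings.append((row["user"], slot, "unused for 90+ days"))
    return findings


if __name__ == "__main__":
    sample = (
        "user,access_key_1_active,access_key_1_last_rotated,access_key_1_last_used_date\n"
        "ci-legacy,true,2024-01-01T00:00:00+00:00,N/A\n"
    )
    for finding in key_findings(sample, datetime(2024, 6, 15, tzinfo=timezone.utc)):
        print(finding)
```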
Key Takeaways — Chapter 3
  • Identity is the cloud perimeter — IAM misconfiguration is the most common path to cloud resource compromise
  • Use service identities (instance profiles, managed identities, service accounts) for application authentication — never long-lived static access keys
  • IAM privilege escalation via iam:PassRole, iam:CreatePolicyVersion, and iam:CreateAccessKey are high-priority detection targets in CloudTrail
  • Permission boundaries and policy conditions enable least privilege at scale without writing per-task custom policies for everything
  • IAM Access Analyzer, Azure AD Access Reviews, and GCP IAM Recommender automate unused-permission detection — use them regularly
Chapter 04 · ~14 min · Network

Network Security in the Cloud

VPCs and virtual networking, Security Groups vs NACLs, private subnets, bastion hosts vs Session Manager, WAF, DDoS, and microsegmentation

Vendor-Neutral · AWS · Azure

Cloud networking is not the same as on-premises networking — and this gap trips up security practitioners with data centre backgrounds more than almost any other concept. In a data centre, network security relies on physical infrastructure: firewalls, switches with VLAN configurations, dedicated security appliances. In the cloud, all of this is software-defined — and the mental model shift from hardware-enforced to policy-enforced networking is essential before cloud network security becomes intuitive.

VPCs and Virtual Networking

A Virtual Private Cloud (VPC in AWS; Virtual Network / VNet in Azure; VPC in GCP) is a logically isolated section of the cloud provider's network in which you launch resources. It provides the illusion of a private network within the shared infrastructure of the public cloud.

Key VPC components:

  • Subnets — subdivisions of the VPC's IP range; in AWS, each subnet lives in a single Availability Zone. Public subnets have a route to an Internet Gateway, so resources in them that have public IP addresses can be reached from the internet. Private subnets do not; resources there are only reachable from within the VPC or through explicit private connectivity.
  • Route tables — control where traffic from a subnet is sent. A route table with a default route (0.0.0.0/0) pointing to an Internet Gateway makes a subnet public. A route table with a default route pointing to a NAT Gateway allows private subnet resources to initiate outbound internet traffic without being reachable inbound.
  • Internet Gateway — the component that enables internet communication for public subnets. Attaching an Internet Gateway and adding a route to it is what makes a subnet public.
  • NAT Gateway — allows resources in private subnets to initiate outbound internet connections (for software updates, API calls) without exposing them to inbound internet traffic.
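Because a subnet's exposure is determined by its route table, it can be classified mechanically. This is a minimal Python sketch that assumes route entries shaped like the Routes items returned by the EC2 DescribeRouteTables API (DestinationCidrBlock, GatewayId, NatGatewayId). It considers only IPv4 default routes, so IPv6 routes through an egress-only internet gateway are out of scope. The function name and the labels it returns are hypothetical.

```python
def classify_subnet(routes: list) -> str:
    """Classify a subnet from its route table's IPv4 default route.

    `routes` entries are shaped like DescribeRouteTables Routes items.
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"               # default route to an Internet Gateway
        if route.get("NatGatewayId", "").startswith("nat-"):
            return "private-with-egress"  # outbound only, via NAT Gateway
    return "private-isolated"             # no default route: no internet path


if __name__ == "__main__":
    app_tier = [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0123456789abcdef0"},
    ]
    print(classify_subnet(app_tier))
```

The three labels line up with the web, application, and database tiers of a conventional three-tier VPC.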
Secure Three-Tier VPC Architecture
Internet → WAF + Load Balancer (public subnet) → security group boundary → Application Tier (private subnet; outbound-only internet access via NAT Gateway) → security group boundary → Database Tier (private subnet, no internet route)

Security Groups vs Network ACLs

AWS has two network filtering mechanisms that are often confused:

Feature | Security Groups | Network ACLs (NACLs)
Applies to | Individual resources (EC2 instances, RDS instances, load balancers) | Entire subnets
Stateful/Stateless | Stateful: return traffic automatically allowed | Stateless: inbound and outbound rules evaluated independently
Default | New groups deny all inbound and allow all outbound (the default SG also allows traffic between its own members) | The default NACL allows all inbound and outbound; custom NACLs deny all until rules are added
Rule evaluation | Allow rules only; all rules evaluated, traffic permitted if any rule matches | Numbered allow and deny rules evaluated in ascending order; first match wins
Primary use | Primary filtering mechanism: define exactly what can reach each resource | Defence in depth: coarse subnet-level filtering

In practice: security groups do the heavy lifting. NACLs provide a subnet-level backstop but are less granular and harder to manage at scale. The most common NACL use case is blocking a specific IP range at the subnet level as an emergency response action.
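The difference in rule evaluation is easiest to see side by side. This is a minimal Python sketch using hypothetical, simplified rule dictionaries (not the EC2 API shapes): security group rules are allow-only and any match permits traffic, while NACL entries are processed in ascending rule-number order and the first match decides. Statefulness is not modelled; with a real NACL you would also need outbound rules for return traffic on ephemeral ports.

```python
import ipaddress


def sg_allows(rules, src_ip, port):
    """Security group: allow-only rules; traffic passes if ANY rule matches."""
    ip = ipaddress.ip_address(src_ip)
    return any(
        ip in ipaddress.ip_network(r["cidr"]) and r["from_port"] <= port <= r["to_port"]
        for r in rules
    )


def nacl_allows(entries, src_ip, port):
    """NACL: numbered allow/deny entries, lowest number first; first match decides."""
    ip = ipaddress.ip_address(src_ip)
    for entry in sorted(entries, key=lambda e: e["rule_number"]):
        if ip in ipaddress.ip_network(entry["cidr"]) and entry["from_port"] <= port <= entry["to_port"]:
            return entry["action"] == "allow"
    return False  # the final catch-all rule denies anything unmatched


if __name__ == "__main__":
    nacl = [
        {"rule_number": 100, "cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443, "action": "allow"},
        # Emergency block: lower number, so it is evaluated before rule 100
        {"rule_number": 50, "cidr": "203.0.113.0/24", "from_port": 0, "to_port": 65535, "action": "deny"},
    ]
    print(nacl_allows(nacl, "198.51.100.7", 443))  # True (rule 100)
    print(nacl_allows(nacl, "203.0.113.9", 443))   # False (rule 50)
```

The deny entry numbered 50 is the emergency-block pattern described above: it takes effect without modifying the existing allow rule because it is evaluated first.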

Secure Administrative Access — Bastion Hosts vs Session Manager

Traditional practice for accessing private EC2 instances was a bastion host (jump server) in the public subnet — an EC2 instance with port 22 (SSH) open to a narrow IP range, used as an intermediary to reach private instances. This pattern has significant drawbacks: the bastion must be patched, monitored, and managed; SSH keys must be distributed; and the open SSH port is a persistent attack surface.

AWS Systems Manager Session Manager eliminates the bastion entirely. It provides a browser-based or CLI shell to EC2 instances through the SSM Agent and the AWS API: no inbound ports, no SSH keys, and no public IP required on the instance (the agent does need outbound connectivity to the Systems Manager endpoints, via a NAT Gateway or VPC endpoints). Session start and termination are recorded in CloudTrail, and full session activity can optionally be logged to S3 or CloudWatch Logs. Azure Bastion provides a similar browser-based RDP/SSH service without requiring a public IP on the target VM. GCP's Identity-Aware Proxy (IAP) provides equivalent functionality through TCP tunnelling.

Best Practice

Default to Session Manager / Azure Bastion / IAP for all administrative access. If legacy tooling requires SSH, tunnel it over Session Manager (aws ssm start-session --target i-xxxx --document-name AWS-StartSSHSession) rather than opening port 22 to any IP range. Port 22 open to 0.0.0.0/0 in a security group is among the most commonly exploited initial access vectors in cloud environments.
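Finding world-open SSH is a simple audit over security group data. This is a minimal Python sketch that assumes input shaped like the SecurityGroups list returned by the EC2 DescribeSecurityGroups API (GroupId, IpPermissions, IpRanges, Ipv6Ranges). It treats protocol "-1" as all traffic and only checks for the exact 0.0.0.0/0 and ::/0 ranges, so broad but narrower ranges still need review. The function name is hypothetical.

```python
def world_open_ssh(security_groups: list) -> list:
    """Return IDs of groups whose ingress exposes TCP 22 to 0.0.0.0/0 or ::/0."""
    exposed = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            if proto == "-1":
                covers_22 = True  # "-1" means all protocols and all ports
            elif proto in ("tcp", "6"):
                covers_22 = perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
            else:
                covers_22 = False
            if not covers_22:
                continue
            v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
            v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
            if v4 or v6:
                exposed.append(sg["GroupId"])
                break  # one offending rule is enough to flag the group
    return exposed


if __name__ == "__main__":
    groups = [{
        "GroupId": "sg-0123",
        "IpPermissions": [{"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                           "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}],
    }]
    print(world_open_ssh(groups))
```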

WAF and DDoS Protection

Cloud WAF services (AWS WAF, Azure WAF on Application Gateway, Google Cloud Armor) provide application-layer filtering for HTTP/HTTPS traffic. Core capabilities: OWASP Top 10 rule sets (SQLi, XSS, path traversal), rate limiting, geographic blocking, IP reputation lists, and custom rules for application-specific logic. WAF is typically deployed in front of load balancers or CDN distributions.
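Rate limiting of this kind typically counts requests per client over a trailing time window and blocks clients that exceed a threshold. The minimal Python sketch below illustrates that idea with a per-IP sliding window. It is not how any provider's WAF is implemented, and the class name, limit, and window values are illustrative assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Block a client IP once it exceeds `limit` requests within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget requests that have left the window
        if len(recent) >= self.limit:
            return False  # over the limit: block (blocked requests are not counted)
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=3, window=60.0)
    print([limiter.allow("203.0.113.5", t) for t in (0, 1, 2, 3)])
```

Counting per source IP keeps one noisy client from affecting others, which is also why rate limits are usually paired with IP reputation lists rather than used alone.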

Basic DDoS protection at the network layer is included at no charge by all major providers (AWS Shield Standard, Azure's built-in DDoS infrastructure protection, and Google Cloud Armor's always-on protection for load-balanced services). Paid tiers (AWS Shield Advanced, Azure DDoS Network Protection or DDoS IP Protection) add enhanced detection and mitigation tuned to your resources and, depending on the tier, cost protection during attacks and access to the provider's DDoS response team.

Key Takeaways — Chapter 4
  • Public subnets route to an Internet Gateway; private subnets do not — the fundamental segmentation between internet-facing and internal tiers
  • Security Groups are stateful and resource-level (primary control); NACLs are stateless and subnet-level (defence in depth)
  • Session Manager / Azure Bastion / IAP eliminates the bastion host pattern and all associated open ports — default to these for all administrative access
  • Port 22 open to 0.0.0.0/0 is among the most commonly exploited cloud initial access vectors — eliminate it from all security groups
  • WAF provides OWASP rule sets and rate limiting at the application layer; cloud-native DDoS protection covers network-layer volumetric attacks at no charge
Chapter 05 · ~14 min · Data Security

Data Security and Encryption

Encryption at rest and in transit, KMS key hierarchy, customer-managed keys, secrets management, DLP, and data residency

Vendor-Neutral · AWS · Azure · GCP

Data security in the cloud encompasses protecting data at every stage of its lifecycle — in transit, at rest, in use, and in the pipeline between services. Cloud environments simultaneously simplify some aspects of data security (encryption key management services, native DLP, built-in TLS) and complicate others (data spread across dozens of services and regions, unclear data flows, multi-tenant storage). This chapter covers the technical and governance controls that together constitute a cloud data security programme.

Encryption at Rest

Encryption at rest protects data stored in cloud services from unauthorised access to the underlying physical storage. All three major providers encrypt stored data by default using AES-256. The question is not whether data is encrypted but who controls the keys.

  • Provider-managed keys (SSE-S3 / Microsoft-managed keys / Google-managed keys) — provider generates and manages the encryption keys. No customer action required. Keys are managed transparently. Customer has no control over key material or key rotation beyond the provider's schedule.
  • KMS-managed keys (SSE-KMS / Customer-managed keys in Azure Key Vault / Cloud KMS keys) — customer creates and manages keys in the provider's key management service. Customer controls key policy (who can use the key), key rotation schedule, and can revoke access by disabling or deleting the key. Audit trail in CloudTrail/Key Vault logs shows every key use.
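  • Customer-provided keys (SSE-C) — customer supplies the key material with every request, and the provider does not store it. Highest customer control; highest operational complexity. Suitable for extreme compliance requirements. BYOK (importing your own key material into KMS/Key Vault/Cloud KMS) is a related option, but there the provider does hold the imported key.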

Key Management Services

Feature | AWS KMS | Azure Key Vault | Cloud KMS
Key types | Symmetric (AES-256), Asymmetric (RSA, ECC), HMAC | Asymmetric (RSA, EC); symmetric keys in Managed HSM; also stores secrets and certificates | Symmetric, Asymmetric (RSA, EC), MAC
HSM backing | CloudHSM (dedicated); KMS (multi-tenant HSM) | Standard (software); Premium (HSM-backed) | Software-protected; HSM via Cloud HSM
Key policy | Key policy (resource-based) + IAM policy | Access policies or Azure RBAC | IAM policy on key resource
Envelope encryption | GenerateDataKey returns a plaintext data key and a copy wrapped by the KMS key | SDK generates data key; Key Vault wraps it | Tink library handles envelope encryption
Automatic rotation | Annual by default (symmetric keys); configurable | Configurable rotation policy | Configurable rotation period
Deletion protection | 7–30 day pending deletion window | Soft-delete + purge protection | Minimum 24-hour scheduled-for-destruction period

Secrets Management

Secrets — API keys, database passwords, OAuth tokens, TLS private keys — are the credentials that applications use to authenticate to other services. The wrong way to manage secrets is storing them in environment variables, configuration files, or source code. All three approaches have led to significant breaches.

The GitHub Secret Scanning Problem

GitHub scans public repository commits for known secret patterns (AWS access keys, Azure storage connection strings, GCP service account keys) and notifies the issuing provider when one is found. AWS responds by attaching a quarantine policy that restricts the exposed key and notifying the account owner; the key is not deleted, so it must still be rotated. Despite this, secrets in public repositories remain one of the most common initial access vectors for cloud account compromise. Even a secret that is committed and immediately removed remains in git history, and may persist in forks and clones, unless the history is purged; treat any exposed secret as compromised and rotate it.
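Pre-commit and CI secret scanners apply the same principle before content is ever pushed: pattern-match it for known credential formats. This minimal Python sketch finds candidate AWS access key IDs, which begin with AKIA (long-term keys) or ASIA (temporary credentials) followed by 16 uppercase letters or digits. Real scanners cover many providers' formats and verify candidates; the function name here is hypothetical, and AKIAIOSFODNN7EXAMPLE is the placeholder key ID AWS uses in its documentation.

```python
import re

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase alphanumerics
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list:
    """Return (line_number, key_id) pairs for candidate AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    config = 'region = "eu-west-2"\naws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    for lineno, key_id in find_aws_key_ids(config):
        print(f"line {lineno}: possible AWS access key ID {key_id}")
```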

Correct secrets management uses a dedicated secrets store with audit logging, automatic rotation, and access control:

AWS | Azure | GCP | Third-party
AWS Secrets Manager (automatic rotation, cross-account) | Azure Key Vault Secrets | Secret Manager | HashiCorp Vault (self-hosted or HCP), Doppler
AWS SSM Parameter Store (simpler, lower cost, no automatic rotation) | App Configuration + Key Vault references | Secret Manager rotation schedules (Pub/Sub notifications; rotation logic is customer-implemented) | Infisical, Akeyless

Data Residency and Sovereignty

Cloud data residency (where data physically resides) is a compliance requirement for many organisations. GDPR restricts personal data transfers outside the EEA without adequate safeguards. Some national regulations require data to remain in-country (for example, Russia's Federal Law 242-FZ, and China's Cybersecurity Law and Personal Information Protection Law for certain categories of data). Defence and government organisations may have data sovereignty requirements mandating in-country storage.

Cloud providers offer region selection: you choose the AWS region (e.g., eu-west-2 for London), Azure region (uksouth), or GCP region (europe-west2) where data is stored. However, data residency is not automatically enforced: replication settings, CDN configurations, support access, and service metadata may move data outside the selected region unless explicitly configured. AWS provides region restrictions through Service Control Policies using the aws:RequestedRegion condition key (and Control Tower data residency controls); Azure provides the built-in Allowed locations policy in Azure Policy alongside per-region data residency commitments; GCP provides an organisation policy constraint for resource location restrictions. These must be explicitly configured; they are not defaults.
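Because replication can move data out of an approved region, residency checks should cover replica locations as well as primary regions. This is a minimal Python sketch assuming a hypothetical inventory format (resource id, primary region, optional replica regions) and an example allow-list of the London regions named above; in practice the inventory would come from a CSPM tool or the providers' asset inventory services.

```python
# Example allow-list: the London regions named above (an assumption for illustration)
ALLOWED_REGIONS = frozenset({"eu-west-2", "uksouth", "europe-west2"})


def residency_violations(inventory, allowed=ALLOWED_REGIONS):
    """Return (resource id, region) pairs where data lives outside the allow-list.

    Each inventory item is a dict with 'id', 'region' and optional 'replica_regions'.
    """
    violations = []
    for resource in inventory:
        for region in [resource["region"], *resource.get("replica_regions", [])]:
            if region not in allowed:
                violations.append((resource["id"], region))
    return violations


if __name__ == "__main__":
    inventory = [
        {"id": "customer-exports", "region": "eu-west-2", "replica_regions": ["us-east-1"]},
        {"id": "orders-db", "region": "uksouth"},
    ]
    print(residency_violations(inventory))
```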

Key Takeaways — Chapter 5
  • All major providers encrypt data at rest by default — the question is who controls the keys: provider-managed, KMS-managed, or customer-provided
  • KMS-managed keys provide audit trails (every key use logged), access revocation (disable the key), and configurable rotation — use them for sensitive data
  • Enable deletion protection on all production KMS keys — accidental deletion renders encrypted data permanently inaccessible
  • Secrets belong in a dedicated secrets manager (Secrets Manager, Key Vault, Secret Manager) — never in environment variables, config files, or source code
  • Data residency requires explicit configuration — provider region selection is necessary but not sufficient; replication, CDN, and support access settings must also be controlled
Chapter 06 · ~13 min · CSPM

Cloud Security Posture Management

Configuration drift as the primary failure mode, native CSPM tools, third-party platforms, policy as code, CIS Benchmarks, and drift detection

Vendor-Neutral · AWS · Azure · GCP

Cloud Security Posture Management (CSPM) is the practice of continuously assessing cloud resource configurations against security baselines and alerting when resources drift into non-compliant states. It is the primary operational response to the fact that cloud misconfigurations — not provider vulnerabilities — are the dominant cloud security failure mode. CSPM answers the question: "Of all the resources in our cloud environment right now, which ones are configured in ways that create security risk?"

Why Configuration Drift Happens

Cloud environments change continuously. Developers launch new resources, infrastructure-as-code is updated, and teams experiment and forget to clean up. A security group that was correctly configured on Monday may have a new rule allowing 0.0.0.0/0 on Thursday because a developer needed to debug something and planned to remove it later. CSPM tools detect such changes within minutes to hours, depending on whether the tool is event-driven or scans on a schedule, ideally before they are exploited.
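At its core, drift detection compares an approved baseline with the configuration observed now. The minimal sketch below assumes security group ingress rules have already been fetched into simple records. IngressRule is a hypothetical type; a real tool would read the rules from the provider's API or from a configuration recorder such as AWS Config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    protocol: str
    port: int
    cidr: str


def detect_drift(baseline: set[IngressRule], current: set[IngressRule]):
    """Compare current rules with the approved baseline."""
    added = current - baseline
    removed = baseline - current
    # Treat any newly added rule that is open to the whole internet as high risk
    high_risk = {r for r in added if r.cidr in ("0.0.0.0/0", "::/0")}
    return added, removed, high_risk


baseline = {IngressRule("tcp", 443, "0.0.0.0/0"), IngressRule("tcp", 22, "10.0.0.0/8")}
current = baseline | {IngressRule("tcp", 22, "0.0.0.0/0")}  # Thursday's debug rule
added, removed, high_risk = detect_drift(baseline, current)
print(high_risk)  # {IngressRule(protocol='tcp', port=22, cidr='0.0.0.0/0')}
```

The public HTTPS rule is not flagged because it is part of the approved baseline. Only the change is reported, which keeps alert volume proportional to drift rather than to the size of the estate.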

Native CSPM Tools

AWS — Security Hub + Config

AWS Config continuously records the configuration of the AWS resource types you select for recording. When a resource's configuration changes, Config records the before and after state. Config Rules evaluate configurations against security policies. A rule can be AWS-managed (hundreds are available) or custom, written as a Lambda function or in CloudFormation Guard. Conformance Packs bundle related rules into a single deployable unit mapped to a framework, such as the Operational Best Practices packs for the CIS AWS Foundations Benchmark.

AWS Security Hub aggregates findings from Config, GuardDuty, Inspector, Macie, IAM Access Analyzer, and third-party integrations. It runs automated control checks against the security standards you enable and calculates a score for each standard, shown in a consolidated compliance dashboard. The AWS Foundational Security Best Practices standard is a good starting point for most organisations.

Azure — Defender for Cloud

Microsoft Defender for Cloud (formerly Azure Security Center and Azure Defender) provides security posture management across Azure, AWS (via connector), and GCP (via connector). It evaluates resources against the Microsoft Cloud Security Benchmark (MCSB), CIS Benchmarks, and regulatory compliance frameworks. Secure Score expresses posture as a single percentage. Each security control carries a maximum score, so you can prioritise remediation by the score improvement it delivers.

GCP — Security Command Center

Security Command Center (SCC) provides asset inventory, misconfiguration and vulnerability findings (Security Health Analytics), threat detection (Event Threat Detection), and compliance posture for GCP resources. SCC Premium includes continuous compliance monitoring against the CIS Benchmark and PCI DSS. Findings can be exported to Pub/Sub for SIEM integration.

Third-Party CSPM Platforms

Where native tools provide per-cloud posture management, third-party CSPM platforms provide unified visibility across all three major clouds in a single interface — essential for multi-cloud organisations:

  • Wiz — agentless; scans cloud environments through API access; builds a cloud security graph connecting identities, resources, data, and vulnerabilities; identifies attack paths that combine multiple risk factors. Market-leading platform as of 2024.
  • Orca Security — agentless SideScanning technology that reads workloads' block storage out-of-band (from snapshots) instead of running agents; provides full-stack context including OS vulnerabilities, misconfigurations, and sensitive data exposure.
  • Prisma Cloud (Palo Alto) — comprehensive CNAPP (Cloud Native Application Protection Platform) covering CSPM, CWPP (workload protection), CIEM (entitlements), and code security in a single platform.
  • Lacework — behaviour-based cloud security; uses ML to baseline normal cloud activity and detect anomalies; strong in cloud workload protection and compliance.

Policy as Code

Policy as code defines security requirements in machine-readable form that can be automatically evaluated against cloud resources — in CI/CD pipelines before deployment, and in production continuously. Key tools:

  • AWS Config Rules — Lambda-based or managed rules that evaluate resource configurations
  • Azure Policy — JSON policy definitions that evaluate ARM resources; can deny deployments that violate policy at deploy time
  • GCP Organisation Policies — constraints applied at org/folder/project level that restrict what can be deployed
  • OPA (Open Policy Agent) / Rego — a vendor-neutral policy engine widely used to evaluate Terraform plans, Kubernetes admission, and API requests against security policies
  • Checkov / tfsec / Terrascan — static analysis tools that scan IaC (Terraform, CloudFormation, Bicep) for misconfigurations before deployment
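A policy-as-code check can be as simple as walking the JSON form of a Terraform plan. This sketch assumes a plan exported with terraform show -json tfplan, whose resource_changes array lists each resource's planned actions and after state. check_plan and its two rules are hypothetical and much narrower than a Checkov or OPA policy set. The first rule requires all four S3 public access block flags to be true; the second forbids security group ingress from 0.0.0.0/0.

```python
import json

PAB_FLAGS = ("block_public_acls", "block_public_policy",
             "ignore_public_acls", "restrict_public_buckets")


def load_plan(path: str) -> dict:
    # Expects the output of: terraform show -json tfplan > plan.json
    with open(path) as f:
        return json.load(f)


def check_plan(plan: dict) -> list[str]:
    """Return violation messages for resources the plan creates or updates."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        after = rc["change"].get("after") or {}
        if "delete" in actions and "create" not in actions:
            continue  # pure deletions cannot introduce new exposure
        if rc["type"] == "aws_s3_bucket_public_access_block":
            off = [f for f in PAB_FLAGS if after.get(f) is not True]
            if off:
                violations.append(
                    f"{rc['address']}: public access block disabled: {', '.join(off)}")
        elif rc["type"] == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    violations.append(
                        f"{rc['address']}: ingress port {rule.get('from_port')} open to 0.0.0.0/0")
    return violations


sample_plan = {"resource_changes": [{
    "address": "aws_s3_bucket_public_access_block.logs",
    "type": "aws_s3_bucket_public_access_block",
    "change": {"actions": ["create"], "after": {
        "block_public_acls": True, "block_public_policy": False,
        "ignore_public_acls": True, "restrict_public_buckets": True}},
}]}
print(check_plan(sample_plan))
```

In CI, running this after terraform plan and failing the job on a non-empty result blocks the change before anything is applied.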

The shift-left principle applies directly to CSPM: detecting a public S3 bucket in a Terraform plan during code review costs seconds to fix; detecting it in production after data has been exposed costs vastly more.

Key Takeaways — Chapter 6
  • Configuration drift is continuous in cloud environments — CSPM provides the continuous evaluation that point-in-time audits cannot
  • AWS Security Hub + Config, Defender for Cloud, and GCP SCC provide native CSPM. Defender for Cloud's foundational CSPM and SCC's Standard tier are free, while Security Hub and Config are billed by usage. Enable them in every account
  • Third-party CSPM (Wiz, Orca, Prisma Cloud) adds multi-cloud unified visibility and attack path analysis not available in native tools
  • Policy as code (Azure Policy, OPA, Checkov) catches misconfigurations before deployment — shift-left prevents production exposure
  • CIS Benchmarks for AWS, Azure, and GCP provide specific, measurable security baselines — start with these before building custom policy sets
Chapter 07 · ~19 min · AWS

AWS Security Deep Dive

IAM policy evaluation, VPC security patterns, S3 security, CloudTrail, GuardDuty, Security Hub, Config, Secrets Manager, WAF, and Inspector

AWS

AWS is the world's largest cloud provider by market share, and its security service ecosystem is the most mature and comprehensive of the three major providers. This chapter covers the security services and architectural patterns that every AWS security practitioner needs to know — not as a comprehensive AWS reference (AWS publishes excellent documentation) but as the operational knowledge that separates a practitioner who can secure AWS environments from one who merely uses them.

IAM Policy Evaluation Logic

AWS evaluates IAM policies in a specific order that practitioners must understand to debug permission issues and avoid privilege escalation paths. For a request within a single account:

  1. Explicit deny — if any applicable policy (identity-based, resource-based, SCP, permissions boundary, or session policy) contains an explicit Deny for the requested action, the request is denied. Explicit denies always win, regardless of what any Allow says.
  2. Service Control Policies (SCPs) — if the account is in an AWS Organization, the SCPs that apply to it must allow the action. Otherwise it is implicitly denied, even if IAM allows it. SCPs grant nothing themselves: they cap the maximum permissions anyone in a member account can have, and they do not apply to the management account.
  3. Resource-based policies — if a resource policy (S3 bucket policy, KMS key policy, SQS queue policy) allows the requesting principal, the request can be allowed without any identity policy. This shortcut applies only to same-account requests. Cross-account access requires an Allow in both the resource policy and the caller's identity policy.
  4. Identity-based policies — otherwise, an explicit Allow in an identity-based policy is required. The default is implicit deny.
  5. Permissions boundaries and session policies — if the identity has a permissions boundary, or the request uses a session policy, those must also allow the action. They limit permissions and never grant them.
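Modelling the evaluation order as a small function is a useful way to reason about why a request was allowed or denied. This is a simplified sketch. It matches actions by wildcard only and ignores principals, resources, conditions, session policies, resource control policies, and cross-account access. Its resource-policy shortcut models the case where the bucket or key policy names an IAM user directly; for role sessions, permissions boundaries still apply. Statement and evaluate are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Statement:
    effect: str          # "Allow" or "Deny"
    actions: list[str]   # e.g. ["s3:GetObject"] or ["s3:*"]


def _matches(statements, action, effect):
    """True if any statement with the given effect covers the action (case-insensitive)."""
    return any(
        s.effect == effect and any(fnmatchcase(action.lower(), a.lower()) for a in s.actions)
        for s in statements
    )


def evaluate(action, identity, resource=None, scps=None, boundary=None):
    """Simplified same-account evaluation: 'Allow', 'ExplicitDeny' or 'ImplicitDeny'."""
    resource = resource or []
    everything = identity + resource + (scps or []) + (boundary or [])
    # 1. An explicit Deny in any policy wins
    if _matches(everything, action, "Deny"):
        return "ExplicitDeny"
    # 2. If SCPs govern the account, they must allow the action
    if scps is not None and not _matches(scps, action, "Allow"):
        return "ImplicitDeny"
    # 3. A same-account resource-based Allow is sufficient on its own
    if _matches(resource, action, "Allow"):
        return "Allow"
    # 4. Otherwise an identity-based Allow is required...
    if not _matches(identity, action, "Allow"):
        return "ImplicitDeny"
    # 5. ...and a permissions boundary, if present, must allow it too
    if boundary is not None and not _matches(boundary, action, "Allow"):
        return "ImplicitDeny"
    return "Allow"


identity = [Statement("Allow", ["s3:*"])]
scps = [Statement("Allow", ["*"]), Statement("Deny", ["s3:DeleteBucket"])]
print(evaluate("s3:GetObject", identity, scps=scps))      # Allow
print(evaluate("s3:DeleteBucket", identity, scps=scps))   # ExplicitDeny
print(evaluate("ec2:RunInstances", identity, scps=scps))  # ImplicitDeny
```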
Common IAM Debugging Pattern

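When an IAM permission check fails unexpectedly:
(1) Read the AccessDenied message itself. For many services it now names the type of policy that caused the denial, such as a service control policy or a permissions boundary.
(2) Check for explicit denies in all attached policies and SCPs, since these override everything.
(3) If the error involves a KMS action, check the KMS key policy. Key policies are resource-based, and the caller needs access granted either directly in the key policy or through IAM policies, which only take effect if the key policy allows the account principal.
(4) Check permissions boundaries if the identity was created by an IAM administrator with delegation rights.
(5) Use the IAM policy simulator to see which policy statement produces the deny.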

S3 Security — The Full Picture

S3 has more security misconfigurations documented in public breach reports than any other AWS service. A complete S3 security posture requires:

  • Block Public Access — enable all four Block Public Access settings at the account level and on every bucket. This prevents bucket policies and ACLs from making objects publicly readable regardless of what they say.
  • Bucket policies over ACLs — S3 ACLs (Access Control Lists) are a legacy mechanism that AWS recommends disabling. Use bucket policies for access control. Disable ACLs with the BucketOwnerEnforced object ownership setting.
  • S3 server-side encryption — since January 2023, S3 encrypts all new objects with SSE-S3 by default. For sensitive data, set the bucket default to SSE-KMS with a customer-managed key, which records key usage in CloudTrail. With S3 Bucket Keys enabled (which cuts KMS request costs), those KMS events are logged per bucket-level key rather than per object.
  • S3 Object Lock — enables WORM (Write Once, Read Many) storage in Compliance or Governance mode. Commonly used to meet record-retention requirements in financial services, healthcare, and other regulated sectors. Also effective as a ransomware protection mechanism for critical backups.
  • S3 Access Logging — records requests made to the bucket, including requester IP, action, and response, delivered on a best-effort basis. Not enabled by default. Essential for investigation when a data exposure occurs. For sensitive buckets, CloudTrail data events are the more complete and timely record.
  • VPC Endpoint policies — S3 Gateway Endpoints allow EC2 instances to access S3 without traversing the internet. Endpoint policies can restrict which buckets are accessible through the endpoint, preventing data exfiltration to attacker-controlled buckets.
AWS CLI — Enforce S3 Security Baseline
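# Enable Block Public Access at account level (all four settings).
# The value must stay on one line: a backslash continuation followed by
# indentation would split it into two separate arguments.
aws s3control put-public-access-block \
  --account-id 123456789012 \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Enable default SSE-KMS on a bucket (BucketKeyEnabled reduces KMS request volume and cost)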
aws s3api put-bucket-encryption \
  --bucket my-sensitive-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:eu-west-2:123456789012:key/key-id"
      },
      "BucketKeyEnabled": true
    }]
  }'

CloudTrail — The Audit Foundation

CloudTrail logs every API call made to AWS — who called what service, from where, with what parameters, and what the response was. It is the single most important evidence source for AWS security investigations and the foundation of all compliance monitoring. Critical configuration requirements:

  • Multi-region trail — trails created through the CLI or API cover only the region they are created in unless you specify otherwise. Create an organisation-wide, multi-region trail so that all regions are logged, including regions you don't think you use: attackers spin up resources in unused regions to avoid detection.
  • Log integrity validation — CloudTrail delivers hourly digest files containing a SHA-256 hash of each log file, signed using SHA-256 with RSA, so you can detect whether logs have been modified or deleted. Enable this on every trail.
  • S3 data events — by default, CloudTrail logs management events (API calls that create, modify, or delete resources) but not data events (GetObject, PutObject on S3 objects). Enable data event logging for sensitive S3 buckets — this is what reveals which objects were accessed during a breach.
  • CloudWatch Logs integration — forward CloudTrail to CloudWatch Logs to enable near-real-time alerting on specific API patterns (IAM policy changes, security group modifications, root account usage).
Stealth Finding — CloudTrail Disabled

GuardDuty generates a Stealth:IAMUser/CloudTrailLoggingDisabled finding when CloudTrail is disabled. Treat it as critical regardless of the severity GuardDuty assigns. An attacker who has gained sufficient IAM access will often disable CloudTrail early to eliminate the audit trail, so this finding must be handled as a potential active compromise. GuardDuty's detection can lag by several minutes, so also create an EventBridge rule that alerts immediately on the StopLogging API call (and on DeleteTrail and UpdateTrail). An organisation trail cannot be stopped or deleted from a member account, which takes this option away from an attacker who controls only a member account.
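An EventBridge rule for CloudTrail tampering uses an event pattern that matches "AWS API Call via CloudTrail" events for the relevant API names. The sketch below holds such a pattern as a Python dict and pairs it with a deliberately simplified matcher. The matcher handles only exact-value lists and nested objects; EventBridge's real matcher also supports prefix, numeric, and other operators. The sample event is abbreviated and hypothetical. The pattern's JSON can then be passed to aws events put-rule --event-pattern. EventBridge rules are regional, so the rule is needed in every region you monitor.

```python
import json

# Pattern for CloudTrail tampering calls delivered to EventBridge
TAMPER_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": ["StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: lists are exact-value alternatives, dicts nest."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value not in expected:
            return False
    return True


# Abbreviated, hypothetical event for local testing
event = {
    "source": "aws.cloudtrail",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "cloudtrail.amazonaws.com", "eventName": "StopLogging",
               "userIdentity": {"arn": "arn:aws:iam::123456789012:user/example"}},
}
print(matches(TAMPER_PATTERN, event))  # True
print(json.dumps(TAMPER_PATTERN))      # value for --event-pattern
```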

GuardDuty

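GuardDuty is a managed threat detection service that analyses CloudTrail management events, VPC Flow Logs, DNS query logs, S3 data events, EKS audit logs, and RDS login activity for threat indicators. The S3, EKS, and RDS sources are configurable protection plans. GuardDuty reads flow and DNS logs from its own independent streams, so you do not need to enable or store those logs yourself. It requires no infrastructure deployment: enable it with a single API call and it begins generating findings within minutes.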

GuardDuty finding types are named by threat purpose (for example, Backdoor, Stealth, Exfiltration), a rough analogue of ATT&CK tactics, and every finding carries a severity score. Key finding types to prioritise:

  • UnauthorizedAccess:IAMUser/ConsoleLoginSuccess.B — multiple successful console logins for the same identity from geographically distant locations around the same time. Immediate investigation required.
  • PrivilegeEscalation:IAMUser/AnomalousBehavior — an API commonly used to gain high-level permissions was invoked in an anomalous way. Critical.
  • Backdoor:EC2/C&CActivity.B — EC2 instance communicating with known C2 infrastructure. Active compromise indicator.
  • Stealth:IAMUser/CloudTrailLoggingDisabled — attacker covering tracks. Critical.
  • Exfiltration:S3/AnomalousBehavior — an identity invoked S3 APIs in an anomalous way consistent with data exfiltration. Investigate bucket contents and data classification.
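GuardDuty finding type names follow a fixed ThreatPurpose:ResourceTypeAffected/ThreatFamilyName.DetectionMechanism!Artifact format, so they are easy to parse when routing alerts. This sketch parses that format and orders findings with a hypothetical local rule: the Stealth, PrivilegeEscalation, and Backdoor purposes are escalated ahead of raw severity. The Type and Severity keys mirror fields of GuardDuty's finding JSON. The severities in the sample are illustrative, not the values GuardDuty assigns.

```python
import re

FINDING_TYPE = re.compile(
    r"^(?P<purpose>[^:]+):(?P<resource>[^/]+)/(?P<family>[^.!]+)"
    r"(?:\.(?P<mechanism>[^!]+))?(?:!(?P<artifact>.+))?$"
)

# Hypothetical local policy: purposes escalated ahead of raw severity
ESCALATE = {"Stealth", "PrivilegeEscalation", "Backdoor"}


def parse_type(finding_type: str) -> dict:
    """Split a finding type name into its named components (None where absent)."""
    m = FINDING_TYPE.match(finding_type)
    if m is None:
        raise ValueError(f"unrecognised finding type: {finding_type}")
    return m.groupdict()


def triage(findings: list[dict]) -> list[dict]:
    """Order findings: escalated purposes first, then by descending severity."""
    return sorted(
        findings,
        key=lambda f: (parse_type(f["Type"])["purpose"] in ESCALATE, f["Severity"]),
        reverse=True,
    )


findings = [
    {"Type": "Exfiltration:S3/AnomalousBehavior", "Severity": 8.0},
    {"Type": "Stealth:IAMUser/CloudTrailLoggingDisabled", "Severity": 2.0},
    {"Type": "Backdoor:EC2/C&CActivity.B!DNS", "Severity": 8.0},
]
for f in triage(findings):
    print(f["Type"])
```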

Enable GuardDuty in every AWS account and every region. Use AWS Organizations to enable it centrally from a delegated administrator account. Member accounts managed this way cannot suspend GuardDuty or disassociate themselves.

Other Key AWS Security Services

  • Amazon Inspector — continuous vulnerability assessment for EC2 instances (OS and application vulnerabilities), container images in Amazon ECR, and Lambda functions (package vulnerabilities). Integrates with Security Hub.
  • Amazon Macie — uses ML to discover and classify sensitive data (PII, PHI, credentials) in S3 buckets. Essential for data classification and GDPR compliance.
  • IAM Access Analyzer — identifies resources that are accessible from outside your account or AWS Organization (external access), and unused permissions in roles and users (unused access).
  • Amazon Detective — investigative service that automatically correlates CloudTrail, VPC Flow Logs, and GuardDuty findings into interactive graphs for security investigation. Reduces time-to-investigate for GuardDuty findings.
  • AWS Secrets Manager — managed secrets storage with automatic rotation. Supports native rotation for RDS, Redshift, and DocumentDB, and custom rotation via Lambda for other services.
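A Secrets Manager rotation Lambda is invoked for four steps: createSecret, setSecret, testSecret, and finishSecret. Versions are tracked with the AWSPENDING, AWSCURRENT, and AWSPREVIOUS staging labels. The sketch below simulates that flow in memory to show why a failed test leaves the current secret untouched. SecretStore, rotate, and the fake database are hypothetical stand-ins, not the Secrets Manager API.

```python
import secrets
import uuid


class SecretStore:
    """In-memory stand-in for one secret's versions and staging labels."""

    def __init__(self, initial_value: str):
        vid = str(uuid.uuid4())
        self.versions = {vid: initial_value}
        self.labels = {"AWSCURRENT": vid}

    def value(self, label: str) -> str:
        return self.versions[self.labels[label]]


def rotate(store: SecretStore, set_on_target, test_on_target) -> str:
    # 1. createSecret: generate a new value and stage it as AWSPENDING
    new_id = str(uuid.uuid4())
    store.versions[new_id] = secrets.token_urlsafe(32)
    store.labels["AWSPENDING"] = new_id
    # 2. setSecret: apply the pending value to the database or service
    set_on_target(store.value("AWSPENDING"))
    # 3. testSecret: confirm the pending value actually works
    if not test_on_target(store.value("AWSPENDING")):
        raise RuntimeError("pending secret failed validation; AWSCURRENT unchanged")
    # 4. finishSecret: promote pending to current, keep the old version as AWSPREVIOUS
    store.labels["AWSPREVIOUS"] = store.labels["AWSCURRENT"]
    store.labels["AWSCURRENT"] = new_id
    del store.labels["AWSPENDING"]
    return new_id


db = {"password": "old-password"}  # hypothetical database credential
store = SecretStore("old-password")
rotate(store,
       set_on_target=lambda pw: db.update(password=pw),
       test_on_target=lambda pw: db["password"] == pw)
print(store.value("AWSCURRENT") == db["password"])  # True
print(store.value("AWSPREVIOUS"))                    # old-password
```

Applications always read the AWSCURRENT version, so they never see a value until the service has accepted it and the test step has passed.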
Key Takeaways — Chapter 7
  • IAM policy evaluation: explicit deny wins → SCPs must allow → a same-account resource policy allow suffices → otherwise an identity policy must allow → permissions boundaries and session policies must also allow. Understanding this order is essential for debugging and privilege escalation prevention
  • S3 security requires Block Public Access + SSE-KMS + disabled ACLs + access logging + Object Lock for regulated data. Block Public Access and disabled ACLs are defaults only for buckets created since April 2023; the rest must always be configured
  • CloudTrail must be multi-region, organisation-wide, with log integrity validation and CloudWatch Logs integration; the default configuration is insufficient
  • GuardDuty should be enabled in every account and region via AWS Organizations, so that member accounts cannot disable it
  • A CloudTrail StopLogging event requires immediate response: it is a common early action of an attacker who has achieved sufficient IAM access
Chapter 08 · ~18 min · Azure

Azure Security Deep Dive

Entra ID / Azure AD, Conditional Access, PIM, RBAC, Key Vault, Defender for Cloud, Sentinel, Azure Policy, managed identities, and storage security

Azure

Microsoft Azure's security model is deeply integrated with Microsoft Entra ID (formerly Azure Active Directory) — the identity platform that underpins not just Azure resource access but Microsoft 365, Dynamics, and thousands of third-party SaaS applications. This integration means that Azure security and identity security are inseparable in a way that is more explicit than on AWS or GCP. Understanding Entra ID is the prerequisite for understanding Azure security.

Microsoft Entra ID (formerly Azure Active Directory)

Entra ID is Microsoft's cloud identity platform — a fully managed identity-as-a-service that provides authentication and authorisation for Azure resources, Microsoft 365, and any application that supports SAML, OpenID Connect, or OAuth 2.0. It is not simply "Active Directory in the cloud" — it has a fundamentally different architecture and a much broader capability set.

Security-critical Entra ID features:

  • Conditional Access — policy engine that evaluates every authentication request against conditions (user location, device compliance status, risk level, application sensitivity) and enforces controls (MFA requirement, block access, require compliant device). The most impactful single security control in an Entra ID deployment. A policy requiring MFA for all users from all locations closes the password-only compromise path. Excluding "trusted" IP ranges reopens that path for any attacker who can reach those networks.
  • Privileged Identity Management (PIM) — enables just-in-time privileged access. Instead of permanently assigning Global Administrator or Subscription Owner roles, PIM requires users to activate elevated roles on demand, with justification, for a time-limited window. Activation is logged and can require MFA and approval from designated approvers.
  • Identity Protection — ML-based risk scoring for sign-ins and users. Risk signals include leaked credentials, impossible travel, anonymous IP addresses, malware-linked IP addresses, and unfamiliar sign-in properties. Risky sign-ins can be blocked automatically or required to complete MFA, and high user risk can require a secure password change.
  • Entra ID Access Reviews — periodic automated reviews that prompt resource owners to confirm whether users still need their access. Integrates with PIM for privileged role reviews.
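Conditional Access applies every policy whose assignments match a sign-in. A block control in any matching policy wins; otherwise, the grant controls of all matching policies must be satisfied. The minimal Python model below uses hypothetical Policy and SignIn types. It ignores most real conditions (device platform, risk, client app) and treats each policy's grant controls as "require all". The usage shows what happens when a stolen password is used from an excluded trusted location.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    applies_to_apps: set[str]                         # {"All"} or specific app names
    excluded_locations: set[str] = field(default_factory=set)
    block: bool = False
    grant_controls: set[str] = field(default_factory=set)  # e.g. {"mfa"}


@dataclass
class SignIn:
    app: str
    location: str
    satisfied: set[str]                               # controls already satisfied


def evaluate_ca(sign_in: SignIn, policies: list[Policy]) -> str:
    applicable = [
        p for p in policies
        if ("All" in p.applies_to_apps or sign_in.app in p.applies_to_apps)
        and sign_in.location not in p.excluded_locations
    ]
    if any(p.block for p in applicable):
        return "block"                                # a block in any matching policy wins
    required = set().union(*(p.grant_controls for p in applicable))
    missing = required - sign_in.satisfied
    return "grant" if not missing else "challenge: " + ", ".join(sorted(missing))


mfa_everywhere = Policy("Require MFA", {"All"}, grant_controls={"mfa"})
mfa_except_office = Policy("Require MFA off-site", {"All"},
                           excluded_locations={"OfficeIPs"}, grant_controls={"mfa"})

stolen_password_from_office = SignIn(app="Exchange Online", location="OfficeIPs", satisfied=set())
print(evaluate_ca(stolen_password_from_office, [mfa_everywhere]))     # challenge: mfa
print(evaluate_ca(stolen_password_from_office, [mfa_except_office]))  # grant
```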

Azure RBAC

Azure uses Role-Based Access Control (RBAC) for authorisation to Azure resources. The key concepts:

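  • Role definition — a set of permitted operations (Actions and DataActions), from which any NotActions are subtracted. NotActions are exclusions, not denies: another role assignment can still grant the excluded operation. Built-in roles (Owner, Contributor, Reader, and hundreds of service-specific roles) cover most use cases. Custom roles allow granular permission sets.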
  • Security principal — who is being assigned the role: a user, group, service principal (application identity), or managed identity.
  • Scope — where the assignment applies: management group, subscription, resource group, or individual resource. Assignments at higher scopes inherit to lower scopes.
  • Role assignment — the combination of role definition + security principal + scope.

Least privilege in Azure RBAC: use the most specific built-in role available before creating custom roles. Assign at the resource group scope rather than subscription scope unless the role genuinely needs subscription-wide access. Use groups rather than assigning roles to individual users — access reviews are easier to maintain when group membership is the control point.
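Scope inheritance and NotActions subtraction fit in a few lines of code. This sketch assumes scopes are Azure resource ID prefixes at the subscription, resource group, and resource levels. It omits management-group inheritance, deny assignments, DataActions, and conditions. The role shown is a simplified Contributor-like definition rather than the exact built-in, and the subscription and resource names are placeholders.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class RoleDefinition:
    name: str
    actions: list[str]
    not_actions: list[str] = field(default_factory=list)

    def permits(self, operation: str) -> bool:
        op = operation.lower()
        granted = any(fnmatchcase(op, a.lower()) for a in self.actions)
        excluded = any(fnmatchcase(op, n.lower()) for n in self.not_actions)
        return granted and not excluded  # NotActions subtract; they are not denies


@dataclass
class Assignment:
    principal: str
    role: RoleDefinition
    scope: str  # e.g. "/subscriptions/<id>/resourceGroups/<rg>"


def is_authorised(principal, operation, resource_id, assignments) -> bool:
    """True if any assignment at this scope or a parent scope permits the operation."""
    target = resource_id.lower().rstrip("/")
    for a in assignments:
        scope = a.scope.lower().rstrip("/")
        inherited = target == scope or target.startswith(scope + "/")
        if a.principal == principal and inherited and a.role.permits(operation):
            return True
    return False


contributor_like = RoleDefinition(
    "Contributor (simplified)", actions=["*"],
    not_actions=["Microsoft.Authorization/*/Write", "Microsoft.Authorization/*/Delete"])
rg_scope = "/subscriptions/0000/resourceGroups/app-rg"
assignments = [Assignment("group:app-team", contributor_like, rg_scope)]
vm = rg_scope + "/providers/Microsoft.Compute/virtualMachines/web01"

print(is_authorised("group:app-team", "Microsoft.Compute/virtualMachines/restart/action", vm, assignments))  # True
print(is_authorised("group:app-team", "Microsoft.Authorization/roleAssignments/write", vm, assignments))     # False
```

The group can manage the VM it inherits access to, but it cannot create role assignments, which is why a Contributor-level assignment does not by itself allow privilege escalation through RBAC.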

Azure Key Vault

Azure Key Vault stores and manages three types of objects: secrets (connection strings, API keys, passwords), keys (cryptographic keys for encryption/decryption operations), and certificates (TLS/SSL certificates with automatic renewal support). Access is controlled by either Key Vault access policies (legacy) or Azure RBAC (recommended). Operations are logged to Azure Monitor only once diagnostic settings are configured; they are not logged by default.

Critical Key Vault configuration requirements:

  • Enable soft-delete (on by default since 2020) and purge protection. These prevent accidental or malicious key/secret deletion from immediately destroying data. With purge protection, a deleted vault or object cannot be permanently deleted during the retention period (7–90 days), even by an administrator, and purge protection cannot be turned off once enabled.
  • Enable diagnostic settings to forward audit logs to a Log Analytics workspace or Event Hub — Key Vault access is security-critical and must be monitored.
  • Use private endpoints to restrict Key Vault access to your VNet — eliminating public internet exposure of the key management service.
  • Use managed identities for application access to Key Vault — no service principal secrets to manage, no rotation required.

Microsoft Defender for Cloud

Defender for Cloud provides two capabilities: Cloud Security Posture Management (CSPM) and Cloud Workload Protection Platform (CWPP). The CSPM component continuously assesses Azure resources against the Microsoft Cloud Security Benchmark, CIS Controls, and regulatory standards, producing a Secure Score. The CWPP component provides threat detection for specific workload types when Defender plans are enabled:

  • Defender for Servers — integrates Microsoft Defender for Endpoint (MDE) with Azure VMs and Arc-connected servers; vulnerability assessment; just-in-time VM access (blocking inbound management ports except during approved windows)
  • Defender for Storage — detects malware uploads, suspicious access patterns, unusual geo-location access to storage accounts
  • Defender for SQL — SQL injection detection, anomalous database activity, data exfiltration alerts
  • Defender for Containers — AKS cluster security, container image vulnerability scanning, runtime threat detection
  • Defender for Key Vault — detects unusual access patterns to Key Vault (access from Tor, access from unfamiliar applications, high volume of access)

Microsoft Sentinel

Sentinel is Azure's cloud-native SIEM and SOAR platform. It natively ingests Azure and Microsoft log sources (Activity Log, Entra ID sign-in logs, Defender for Cloud alerts, the Microsoft 365 Unified Audit Log, Defender XDR) with minimal configuration. Analytics rules are written in KQL (Kusto Query Language). They run either on a schedule or as near-real-time (NRT) rules that execute every minute. Separately, the built-in Fusion rule uses machine learning to correlate low-fidelity alerts into multistage incidents. Automation rules and playbooks (built on Azure Logic Apps) provide SOAR capability, automatically enriching, containing, or notifying on confirmed incidents.

For Azure-centric environments, Sentinel provides the most deeply integrated SIEM experience available — data connectors for Azure sources require no log forwarding configuration, and the Microsoft Defender integration provides cross-domain XDR correlation out of the box.

Managed Identities

Managed identities are the Azure equivalent of AWS instance profiles — application identities that are managed by Azure and provide temporary credentials automatically. Two types: system-assigned (tied to a specific resource; deleted when the resource is deleted) and user-assigned (independent lifecycle; can be assigned to multiple resources). Best practice: prefer user-assigned managed identities for production workloads where the identity lifecycle should be managed independently of specific compute resources.

Using Managed Identity — Application Example

An Azure App Service application needs to read secrets from Key Vault. The correct approach:

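1. Create a user-assigned managed identity and attach it to the App Service.

2. Assign the Key Vault Secrets User RBAC role to the managed identity on the Key Vault.

3. In application code, use DefaultAzureCredential (from the Azure SDK). It automatically uses the managed identity when running in Azure, with no credentials in code. Because the identity is user-assigned, tell the SDK which identity to use: set the AZURE_CLIENT_ID app setting to the identity's client ID, or pass managed_identity_client_id in code.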

No client secrets, no certificates, no rotation required. The identity is automatically managed by Azure.
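DefaultAzureCredential works as a credential chain. It tries a sequence of sources, roughly environment variables, workload identity, managed identity, then developer tools such as the Azure CLI, and uses the first that succeeds. That is why the same code runs unchanged on a laptop and in App Service. The sketch below models only this chaining behaviour in plain Python. The source functions and token strings are hypothetical placeholders, and the real SDK's exact order and options vary by language and version.

```python
from typing import Callable


class CredentialUnavailable(Exception):
    """Raised by a source that cannot supply a token in this environment."""


def chained_token(sources: list[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each credential source in order; the first one that returns a token wins."""
    errors = []
    for name, get_token in sources:
        try:
            return name, get_token()
        except CredentialUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise CredentialUnavailable("no credential source succeeded; " + "; ".join(errors))


def env_vars() -> str:
    raise CredentialUnavailable("AZURE_CLIENT_SECRET not set")


def managed_identity() -> str:
    return "token-from-managed-identity-endpoint"  # placeholder token


def azure_cli() -> str:
    return "token-from-developer-cli-login"        # placeholder token


source, token = chained_token([("environment", env_vars),
                               ("managed_identity", managed_identity),
                               ("azure_cli", azure_cli)])
print(source)  # managed_identity
```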

Key Takeaways — Chapter 8
  • Conditional Access is the highest-impact single security control in an Entra ID deployment; requiring MFA for all users in all locations closes the password-only compromise path
  • PIM enables just-in-time privileged access — permanent Global Administrator assignments should not exist in mature Azure environments
  • Key Vault purge protection prevents accidental or malicious key destruction during the retention period — enable it on every production vault
  • Defender for Cloud Secure Score provides a single prioritised remediation queue — work through it systematically starting with highest-impact, lowest-effort items
  • Managed identities eliminate service principal secrets for Azure-to-Azure authentication; wherever a service supports them, there is little reason to use service principal credentials for workloads running in Azure
Chapter 09 · ~16 min · GCP

GCP Security Deep Dive

Resource hierarchy, IAM inheritance, VPC firewall rules, Cloud Audit Logs, Security Command Center, Cloud Armor, Binary Authorization, and BeyondCorp

GCP

Google Cloud Platform has a security architecture that reflects Google's own internal security practices — including BeyondCorp (Zero Trust) principles, strong default encryption, and a resource hierarchy that enables governance at organisational scale. GCP is the third-largest cloud provider but has distinctive security capabilities that are architecturally ahead of the market in some areas (Confidential Computing, Binary Authorization, BeyondCorp Enterprise). This chapter covers the GCP-specific security concepts and services that practitioners need.

Resource Hierarchy and IAM Inheritance

GCP's resource hierarchy is fundamental to understanding how IAM works at scale. The hierarchy has four levels:

  • Organisation — the root node, corresponding to a company's Google Workspace or Cloud Identity domain. IAM policies at the organisation level apply to all resources beneath it. Organisation Policies (constraints) are applied here.
  • Folders — optional groupings below the organisation. Used to mirror organisational structure (business units, teams, environments). IAM policies and Organisation Policies applied to a folder affect all projects within it.
  • Projects — the primary grouping of GCP resources. APIs are enabled, billing is attached, and most resources are created at the project level. Projects isolate resources from each other: by default, a VPC in one project cannot reach a VPC in another.
  • Resources — individual GCP resources (VM, bucket, database, function). IAM policies at the resource level affect only that resource.

IAM allow policies are additive and inherit downward. A role granted at the organisation level is automatically granted at all folders, projects, and resources below it, and a grant made higher up cannot be revoked lower down. Allow policies have no equivalent of AWS's explicit Deny statement. GCP now also supports separate IAM deny policies, which attach to an organisation, folder, or project and override allow grants. Even so, the main preventive control over what can be configured remains the Organisation Policy constraint. This is an important architectural difference: in GCP, the way to prevent a project owner from creating public storage buckets is an Organisation Policy constraint (such as storage.publicAccessPrevention), not an IAM deny.
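Because allow policies are additive, a principal's effective access on a resource is the union of the bindings on that resource and on every ancestor. Below is a toy model with a hypothetical three-level hierarchy (organisation, folder, project). It ignores IAM conditions and deny policies, and the member names are placeholders.

```python
# Each node: its parent and its allow-policy bindings {role: set(members)}
HIERARCHY = {
    "organizations/123": {"parent": None,
                          "bindings": {"roles/viewer": {"group:auditors@example.com"}}},
    "folders/prod": {"parent": "organizations/123",
                     "bindings": {"roles/compute.admin": {"group:sre@example.com"}}},
    "projects/payments": {"parent": "folders/prod",
                          "bindings": {"roles/storage.objectViewer":
                                       {"serviceAccount:app@payments.iam.gserviceaccount.com"}}},
}


def effective_bindings(node: str) -> dict[str, set[str]]:
    """Union of bindings on the node and every ancestor; nothing is ever subtracted."""
    effective: dict[str, set[str]] = {}
    current = node
    while current is not None:
        for role, members in HIERARCHY[current]["bindings"].items():
            effective.setdefault(role, set()).update(members)
        current = HIERARCHY[current]["parent"]
    return effective


# The project inherits roles/viewer from the organisation and compute.admin from the folder
for role, members in sorted(effective_bindings("projects/payments").items()):
    print(role, sorted(members))
```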

GCP IAM — Roles and Service Accounts

GCP IAM assigns roles (collections of permissions) to principals (identities). Three role types:

  • Basic roles (primitive roles) — Owner, Editor, Viewer. Apply to all GCP services. Extremely broad permissions. Should never be used in production environments except for the initial project owner bootstrap.
  • Predefined roles — granular, service-specific roles defined by Google (e.g., roles/storage.objectViewer, roles/compute.instanceAdmin.v1). Use predefined roles by default.
  • Custom roles — created by the customer with specific permissions. Used when predefined roles are too broad; require maintenance as GCP adds new permissions.

Service accounts are GCP's machine identities — used to authenticate applications and VMs. Common misuse patterns:

  • Attaching a service account with Owner or Editor role to a compute instance — gives any attacker who compromises the instance full control of the project
  • Downloading service account key files and storing them in application configuration — creates long-lived static credentials equivalent to AWS access keys with all their associated risks
  • Granting broad roles to service accounts because scoping them precisely is more work

Best practice: use Workload Identity Federation for external workloads (GitHub Actions, AWS, on-premises) to authenticate to GCP without service account keys. For GCP workloads, attach a dedicated service account with minimal permissions to each compute resource. Avoid the default Compute Engine service account, which in older projects is automatically granted the Editor role.
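Workload Identity Federation replaces a key file with a token exchange. The external workload presents a short-lived OIDC token from its own identity provider to Google's Security Token Service (https://sts.googleapis.com/v1/token). In return it receives a short-lived federated access token, which can be used directly or to impersonate a service account. The sketch below only assembles the OAuth 2.0 token-exchange (RFC 8693) form fields for that request, assuming an OIDC provider configured in a workload identity pool. The project number, pool ID, and provider ID are placeholders, and nothing is sent over the network. In practice, the google-auth library or your CI platform's integration performs this exchange for you.

```python
from urllib.parse import urlencode

STS_URL = "https://sts.googleapis.com/v1/token"


def sts_exchange_body(project_number: str, pool_id: str, provider_id: str,
                      external_jwt: str) -> dict[str, str]:
    """Form fields for exchanging an external OIDC token for a federated Google token."""
    audience = (f"//iam.googleapis.com/projects/{project_number}/locations/global/"
                f"workloadIdentityPools/{pool_id}/providers/{provider_id}")
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": external_jwt,  # short-lived token issued by the external IdP
    }


body = sts_exchange_body("123456789012", "github-pool", "github-provider", "<oidc-jwt>")
print(STS_URL)
print(urlencode(body))
```

No long-lived secret exists anywhere in this flow. The only credential is the external token, which the workload's own platform issues per job and which expires quickly.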

VPC Firewall Rules

GCP VPC firewall rules differ architecturally from AWS security groups in important ways. GCP firewall rules are:

  • Global — VPC networks are global, so firewall rules are not tied to subnets or zones. A firewall rule applies to any resource in the VPC that matches its target: all instances, a network tag, or a service account.
  • Priority-based — rules are evaluated in priority order (lower number = higher priority), and the highest-priority matching rule wins. At equal priority, deny takes precedence over allow. Every VPC also has two implied rules at the lowest priority (65535): deny all ingress and allow all egress.
  • Tag or service-account targeted — rules commonly apply to instances carrying specific network tags rather than to specific instances or subnets. An instance with tag web-server gets rules targeting web-server; adding or removing the tag changes which rules apply.
  • Separate ingress and egress rules — like AWS security groups, GCP firewall rules are stateful, so return traffic for an allowed connection is permitted automatically. Ingress and egress are configured as separate rules. Because the implied egress rule allows all outbound traffic, you must create higher-priority egress deny rules to restrict it.
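Ingress evaluation for a single instance works in two steps. First, collect the rules whose target, source range, and port match. Then take the one with the lowest priority number, with deny winning ties. The FirewallRule type and example rules are hypothetical. The sketch covers IPv4 ingress for new connections only and ignores protocols, egress, and hierarchical firewall policies. 35.235.240.0/20 is the source range used by IAP TCP forwarding.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class FirewallRule:
    name: str
    priority: int                # 0 (highest) to 65535 (lowest)
    action: str                  # "allow" or "deny"
    source_ranges: list[str]
    ports: Optional[set[int]]    # None means all ports
    target_tags: set[str]        # empty set means all instances in the VPC


# Implied ingress rule that every VPC network has at the lowest priority
IMPLIED_DENY_INGRESS = FirewallRule("implied-deny-ingress", 65535, "deny",
                                    ["0.0.0.0/0"], None, set())


def evaluate_ingress(rules: list[FirewallRule], src_ip: str, port: int,
                     instance_tags: set[str]) -> tuple[str, str]:
    """Return (action, rule_name) for a new inbound IPv4 connection to one instance."""
    candidates = [
        r for r in rules + [IMPLIED_DENY_INGRESS]
        if (not r.target_tags or r.target_tags & instance_tags)
        and any(ip_address(src_ip) in ip_network(c) for c in r.source_ranges)
        and (r.ports is None or port in r.ports)
    ]
    # Lowest priority number wins; at equal priority, deny beats allow
    best = min(candidates, key=lambda r: (r.priority, r.action != "deny"))
    return best.action, best.name


rules = [
    FirewallRule("allow-iap-ssh", 1000, "allow", ["35.235.240.0/20"], {22}, {"web-server"}),
    FirewallRule("allow-https", 1000, "allow", ["0.0.0.0/0"], {443}, {"web-server"}),
    FirewallRule("deny-blocked-range", 900, "deny", ["203.0.113.0/24"], None, set()),
]
print(evaluate_ingress(rules, "198.51.100.7", 22, {"web-server"}))   # ('deny', 'implied-deny-ingress')
print(evaluate_ingress(rules, "35.235.241.10", 22, {"web-server"}))  # ('allow', 'allow-iap-ssh')
print(evaluate_ingress(rules, "203.0.113.9", 443, {"web-server"}))   # ('deny', 'deny-blocked-range')
```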
GCP Firewall — Default Rules to Change

The default network that GCP creates in new projects comes with prepopulated rules:
  • default-allow-internal — all protocols from 10.128.0.0/9
  • default-allow-ssh and default-allow-rdp — TCP 22 and 3389 from 0.0.0.0/0
  • default-allow-icmp — ICMP from any source

Outbound traffic is allowed by the implied egress rule. Delete the SSH and RDP rules immediately in any production network, or avoid the default network altogether. Replace them with rules that target specific source ranges, or use IAP TCP forwarding (source range 35.235.240.0/20) for administrative access.

Cloud Audit Logs

GCP Cloud Audit Logs are the CloudTrail equivalent — records of who did what in your GCP environment. Four log types:

  • Admin Activity — configuration and metadata changes (VM created, IAM policy changed). Always enabled; cannot be disabled.
  • Data Access — API calls that read resource configuration or metadata, or read or write user-provided data (for example, reading or writing Cloud Storage objects). Disabled by default for most services, so they must be explicitly enabled; BigQuery is a notable exception, where they are always on. Without Data Access logs, you cannot determine which objects were accessed during a breach.
  • System Events — GCP system actions (live migration of a VM). Always enabled; cannot be disabled.
  • Policy Denied — requests denied because of a security policy violation, such as a VPC Service Controls perimeter. Generated by default.

Enable Data Access audit logs for all services handling sensitive data. This is the most commonly missing GCP log source in breach investigations.
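Data Access logs are switched on through the auditConfigs section of a project's (or folder's, or organisation's) IAM policy. The sketch below assumes the policy JSON was exported with gcloud projects get-iam-policy PROJECT_ID --format=json and will be written back with gcloud projects set-iam-policy PROJECT_ID policy.json. The hypothetical helper adds the three Data Access log types without disturbing existing bindings or the etag, which set-iam-policy uses to reject concurrent modifications. The sample policy and etag value are placeholders.

```python
import json

DATA_ACCESS_TYPES = ("ADMIN_READ", "DATA_READ", "DATA_WRITE")


def enable_data_access_logs(policy: dict, service: str = "allServices") -> dict:
    """Return a copy of an IAM policy with Data Access logging enabled for `service`."""
    updated = json.loads(json.dumps(policy))  # deep copy; the input is left untouched
    configs = updated.setdefault("auditConfigs", [])
    entry = next((c for c in configs if c.get("service") == service), None)
    if entry is None:
        entry = {"service": service, "auditLogConfigs": []}
        configs.append(entry)
    present = {c["logType"] for c in entry.setdefault("auditLogConfigs", [])}
    for log_type in DATA_ACCESS_TYPES:
        if log_type not in present:
            entry["auditLogConfigs"].append({"logType": log_type})
    return updated


policy = {"bindings": [{"role": "roles/viewer", "members": ["group:auditors@example.com"]}],
          "etag": "BwXhqDd0sVA="}  # placeholder etag
print(json.dumps(enable_data_access_logs(policy)["auditConfigs"], indent=2))
```

Enabling these logs for allServices can generate substantial log volume and cost. Scoping the service field to the specific services that hold sensitive data is a common compromise.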

Security Command Center (SCC)

SCC provides three capabilities for GCP security: Security Health Analytics (CSPM — evaluating resource configurations), Event Threat Detection (threat detection from Cloud Audit Logs and other sources), and Container Threat Detection (runtime detection for GKE). SCC Premium adds compliance posture monitoring against CIS Benchmark, PCI-DSS, HIPAA, and other standards.

Key SCC threat detection findings to prioritise: brute force SSH, domain phishing, anomalous IAM grants, service account key exposure in public repositories, data exfiltration via Storage.

Binary Authorization

Binary Authorization enforces a policy that only signed, trusted container images can be deployed to GKE (and Cloud Run). The policy requires that images be attested (signed with a cryptographic key) before deployment. This prevents the deployment of unverified images — including images that have not been vulnerability-scanned, images from untrusted registries, and images that have been tampered with. Binary Authorization implements the supply chain integrity principle in the deployment pipeline.
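At deploy time, the policy checks an image, identified by its digest, for attestations from the attestors the policy requires, with optional exemption patterns. That decision can be sketched as below. The function, attestor names, and registry paths are hypothetical. Real attestations are cryptographic signatures verified against attestor public keys, not entries in a dictionary.

```python
from fnmatch import fnmatch


def admit(image: str, attestations: dict[str, set[str]], required_attestors: set[str],
          exempt_patterns: list[str]) -> tuple[bool, str]:
    """Simplified Binary Authorization-style admission decision for one image."""
    if any(fnmatch(image, p) for p in exempt_patterns):
        return True, "exempt by allowlist pattern"
    if "@sha256:" not in image:
        # Tags are mutable, so attestations are bound to immutable digests
        return False, "image must be referenced by digest, not by tag"
    digest = image.split("@", 1)[1]
    signed_by = attestations.get(digest, set())
    missing = required_attestors - signed_by
    if missing:
        return False, "missing attestations from: " + ", ".join(sorted(missing))
    return True, "all required attestations present"


digest = "sha256:" + "a" * 64
attestations = {digest: {"built-by-ci", "vuln-scan-passed"}}
required = {"built-by-ci", "vuln-scan-passed"}
exempt = ["registry.example.com/base/*"]
print(admit("europe-west2-docker.pkg.dev/proj/app/web@" + digest, attestations, required, exempt))
print(admit("europe-west2-docker.pkg.dev/proj/app/web:latest", attestations, required, exempt))
```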

BeyondCorp Enterprise

BeyondCorp Enterprise (renamed Chrome Enterprise Premium in 2024) is Google's Zero Trust access product. It evolved from Google's internal BeyondCorp programme, which eliminated the corporate VPN in favour of continuous per-request authorisation based on identity and device context. BeyondCorp Enterprise provides:
  • application-level access control: access to specific web applications based on user identity and device compliance, without a network-level VPN
  • context-aware access policies
  • integration with Chrome Browser Cloud Management for device posture verification

Key Takeaways — Chapter 9
  • GCP IAM policies are additive and inherit downward — Organisation Policies (constraints) provide the preventive controls that Service Control Policies provide in AWS
  • Service accounts with Owner/Editor roles attached to compute instances are a critical vulnerability — scope service account permissions to the minimum required
  • Data Access audit logs are disabled by default in GCP — enable them for all services handling sensitive data before you need them for an investigation
  • GCP VPC firewall default rules (allow-ssh, allow-rdp from 0.0.0.0/0) must be deleted in production networks and replaced with IAP-based access
  • Binary Authorization enforces container image provenance at deployment time — a supply chain integrity control with no directly equivalent managed service on the other major providers
Chapter 10 · ~16 min · Containers

Container & Kubernetes Security

Image security, runtime attack surface, Kubernetes RBAC, network policies, pod security, secrets in K8s, service mesh, and Falco runtime detection

Vendor-Neutral · EKS · AKS · GKE

Containers and Kubernetes have become the dominant deployment model for cloud-native applications — and they introduce a security model that is meaningfully different from both traditional VMs and serverless functions. Containers share the host kernel (unlike VMs, which have separate kernels), creating a different trust boundary. Kubernetes adds an orchestration layer with its own RBAC system, network model, and attack surface that must be secured independently of the underlying cloud infrastructure.

Container Image Security

A container image's security is established before it ever runs — the base image, installed packages, and embedded secrets determine the image's vulnerability surface from the moment it is built.

  • Base image selection — prefer minimal base images (distroless, Alpine, scratch) that contain only what the application needs. A full Debian base image includes hundreds of packages the application never uses, each of which adds attack surface. Google's distroless images contain only the application runtime and its dependencies — no shell, no package manager, nothing for a post-exploitation attacker to use.
  • Vulnerability scanning in the pipeline — scan every image for known CVEs before it reaches production. Tools: Trivy (open-source, fast, excellent coverage), Snyk Container, AWS ECR scanning (powered by Inspector), Azure Container Registry scanning (powered by Defender), and Artifact Registry vulnerability scanning (Artifact Analysis) on GCP, which replaces scanning in the deprecated Container Registry (GCR). Block deployments of images with critical vulnerabilities — this is the shift-left principle applied to container security.
  • Image signing — Cosign (from the Sigstore project) provides keyless container image signing using OIDC identity. Signed images can be verified at deployment time by Binary Authorization (GCP) or OPA Gatekeeper (any cloud) — proving the image was built by your CI/CD pipeline and has not been tampered with.
  • No secrets in images — never bake API keys, database passwords, or TLS certificates into container images. Build-time secrets committed to an image layer persist in the image history and in any registry the image is pushed to, even if later layers appear to delete them.
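  • Least-privilege containers — set a non-root USER in your Dockerfile; drop Linux capabilities at runtime (docker run --cap-drop ALL, or securityContext.capabilities.drop: ["ALL"] in Kubernetes), because capabilities cannot be dropped from within a Dockerfile; use read-only root filesystems where possible.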
Secure Dockerfile Pattern
FROM gcr.io/distroless/nodejs20-debian12

# Copy only production artifacts
COPY --chown=nonroot:nonroot dist/ /app/

# Run as non-root
USER nonroot

EXPOSE 3000
CMD ["/app/server.js"]

No shell, no package manager, non-root user. Combined with dropping all capabilities at runtime, this leaves an attacker who achieves code execution in the container with a severely limited post-exploitation environment.
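The "block deployments of images with critical vulnerabilities" gate from the image security list can be expressed directly with Trivy (`trivy image --severity CRITICAL --exit-code 1 IMAGE`). When the pipeline needs custom logic, a gate can instead read Trivy's JSON report (`--format json`). The sketch below assumes that report layout (a top-level Results list whose entries carry Target and an optional Vulnerabilities list with VulnerabilityID, PkgName, and Severity); the blocking threshold is a policy choice.

```python
import json
import sys

BLOCKING = {"CRITICAL"}  # widen to {"CRITICAL", "HIGH"} for a stricter gate


def blocking_findings(report: dict, blocking=BLOCKING) -> list:
    """Collect 'CVE (package) in target' strings for vulnerabilities at blocking severities."""
    findings = []
    for result in report.get("Results") or []:
        # Trivy omits or nulls "Vulnerabilities" for targets with no findings
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(f'{vuln["VulnerabilityID"]} ({vuln.get("PkgName", "?")}) in {result.get("Target", "?")}')
    return findings


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        findings = blocking_findings(json.load(fh))
    for line in findings:
        print(f"BLOCKED: {line}")
    return 1 if findings else 0  # a non-zero exit code fails the CI job


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

In CI this would run as `python trivy_gate.py report.json` after the scan step; the file name is illustrative.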

Kubernetes RBAC

Kubernetes has its own RBAC system independent of the cloud provider's IAM. Cloud IAM controls who can manage the Kubernetes cluster itself (via the cloud API); Kubernetes RBAC controls what authenticated users can do within the cluster (create pods, read secrets, exec into containers).

  • ClusterRole / Role — ClusterRoles apply cluster-wide; Roles apply within a specific namespace. Prefer namespace-scoped Roles over ClusterRoles for workload access.
  • ClusterRoleBinding / RoleBinding — binds a role to a user, group, or service account. A RoleBinding grants the subject the role's permissions within the binding's namespace.
  • Service Accounts in Kubernetes — pods authenticate to the Kubernetes API using a service account token automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. This token can be used by a compromised container to interact with the cluster API. Disable auto-mounting of service account tokens for pods that don't need cluster API access: automountServiceAccountToken: false.
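Kubernetes RBAC is purely additive: there are no deny rules, so a request is allowed if any binding grants it and denied otherwise. The sketch below models that evaluation, including the namespace scoping of RoleBindings versus the cluster-wide reach of ClusterRoleBindings. It is a simplification under stated assumptions: API groups, resourceNames, and non-resource URLs are omitted, and the subject names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    verbs: frozenset
    resources: frozenset


@dataclass(frozen=True)
class Binding:
    subject: str              # e.g. "system:serviceaccount:shop:api"
    rules: tuple              # Rule objects from the referenced Role or ClusterRole
    namespace: Optional[str]  # None models a ClusterRoleBinding (applies cluster-wide)


def allowed(bindings, subject: str, verb: str, resource: str, namespace: str) -> bool:
    for b in bindings:
        if b.subject != subject:
            continue
        # A RoleBinding grants only within its own namespace
        if b.namespace is not None and b.namespace != namespace:
            continue
        for rule in b.rules:
            verb_ok = verb in rule.verbs or "*" in rule.verbs
            resource_ok = resource in rule.resources or "*" in rule.resources
            if verb_ok and resource_ok:
                return True
    return False  # no matching grant: denied by default
```

Adding a single wildcard ClusterRoleBinding to this model makes every check pass, which is exactly why cluster-admin bindings deserve regular review.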
Kubernetes RBAC Anti-Pattern

Assigning the cluster-admin ClusterRole to a service account or user gives unrestricted access to every Kubernetes resource — create, read, update, delete, exec. This is the Kubernetes equivalent of AWS AdministratorAccess. A compromised pod running with a cluster-admin bound service account can read all secrets, exec into other pods, create privileged pods, and escalate to host-level access. Audit cluster-admin bindings regularly: kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name=="cluster-admin")'
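The jq one-liner above lists whole binding objects. The sketch below extends the same audit to print each subject bound to cluster-admin and, as a related check, every pod that does not opt out of service account token automounting. It assumes the JSON layout produced by `kubectl get clusterrolebindings -o json` and `kubectl get pods --all-namespaces -o json`; the pod check is deliberately conservative, since a pod without the field may still be covered by its ServiceAccount's setting.

```python
import json
import subprocess


def kubectl_json(*args: str) -> dict:
    """Run kubectl with -o json and parse the result (requires kubectl and cluster access)."""
    out = subprocess.run(["kubectl", *args, "-o", "json"], check=True, capture_output=True, text=True)
    return json.loads(out.stdout)


def cluster_admin_subjects(crbs: dict) -> list:
    found = []
    for item in crbs.get("items", []):
        if item.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        for s in item.get("subjects") or []:
            ns = f'{s["namespace"]}/' if s.get("namespace") else ""
            found.append(f'{item["metadata"]["name"]}: {s["kind"]} {ns}{s["name"]}')
    return found


def pods_with_token_mounted(pods: dict) -> list:
    # Report pods that do not set automountServiceAccountToken: false themselves;
    # the token is mounted unless the pod or its ServiceAccount disables it
    return [
        f'{p["metadata"]["namespace"]}/{p["metadata"]["name"]}'
        for p in pods.get("items", [])
        if p.get("spec", {}).get("automountServiceAccountToken") is not False
    ]
```

Typical use is `cluster_admin_subjects(kubectl_json("get", "clusterrolebindings"))` and `pods_with_token_mounted(kubectl_json("get", "pods", "--all-namespaces"))`, reviewing any entry other than the expected built-in system:masters group.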

Kubernetes Network Policies

By default, every pod in a Kubernetes cluster can communicate with every other pod — there is no network segmentation between namespaces or workloads. Network Policies implement firewall rules at the pod level using label selectors, restricting which pods can communicate with which other pods on which ports. They are only enforced when the cluster's network plugin (CNI) supports them, for example Calico or Cilium; with a plugin that lacks support, the API server accepts the policies but they have no effect. Without Network Policies, a compromised pod can reach every database, every API, and every other service in the cluster.

Default-deny network policy — apply to every namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}      # selects all pods in namespace
  policyTypes:
  - Ingress
  - Egress             # deny all ingress and egress by default

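Apply a default-deny policy to every namespace, then add specific allow policies for required communication paths, including egress to the cluster DNS service (CoreDNS/kube-dns on port 53), which a default-deny egress policy otherwise blocks. This is the micro-segmentation equivalent of "default deny, explicit allow" in traditional firewall policy.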
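The "default deny, explicit allow" semantics can be modelled in a few lines: a pod that no ingress policy selects accepts all traffic, while a selected pod accepts only traffic that at least one applicable policy allows. This is a simplified sketch covering only matchLabels selectors on same-namespace pods and numeric ports; namespaceSelector, ipBlock, named ports, and egress are left out.

```python
from dataclasses import dataclass, field


def selects(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value must be present; {} selects every pod."""
    return all(labels.get(k) == v for k, v in selector.items())


@dataclass
class Policy:
    pod_selector: dict
    policy_types: tuple
    # Simplified ingress rules: (peer matchLabels, set of allowed ports)
    ingress: list = field(default_factory=list)


def ingress_allowed(policies, src_labels: dict, dst_labels: dict, port: int) -> bool:
    applicable = [p for p in policies if "Ingress" in p.policy_types and selects(p.pod_selector, dst_labels)]
    if not applicable:
        return True  # not isolated for ingress: the Kubernetes default is allow-all
    # Once isolated, traffic needs a matching rule in some applicable policy (rules are additive)
    return any(
        selects(peer, src_labels) and port in ports
        for p in applicable
        for peer, ports in p.ingress
    )
```

With only the default-deny policy from the manifest above, every check for pods in that namespace returns False; each additional allow policy opens exactly one labelled path.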

Pod Security

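Kubernetes Pod Security Admission (PSA), the replacement for PodSecurityPolicy (deprecated in Kubernetes 1.21 and removed in 1.25), enforces the Pod Security Standards on pods at admission time. Levels are applied per namespace with labels such as pod-security.kubernetes.io/enforce: restricted. Three built-in policy levels: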

  • Privileged — no restrictions. Allows all pod configurations including privileged containers, host network access, host path mounts. Do not use in production.
  • Baseline — blocks known privilege escalation vectors: privileged containers, host PID/network/IPC, most dangerous capabilities. A reasonable minimum for most workloads.
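  • Restricted — heavily restricted: everything in Baseline plus running as non-root required, privilege escalation disallowed (allowPrivilegeEscalation: false), all capabilities dropped (only NET_BIND_SERVICE may be added back), and a RuntimeDefault or Localhost seccomp profile required. A read-only root filesystem is not part of the standard but remains good practice. Follows pod hardening best practices.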
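To see what the Restricted level checks, the sketch below evaluates a pod manifest (as a parsed dictionary) against a subset of the Restricted rules listed above. It is not the admission controller itself and omits several controls, such as the volume type restrictions; container-level settings override pod-level ones, as in Kubernetes.

```python
def restricted_violations(pod: dict) -> list:
    """Check a pod manifest against a subset of the Pod Security Standards 'restricted' level."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    problems = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        if sc.get("privileged"):
            problems.append(f"{name}: privileged container")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # Container-level settings take precedence over the pod-level securityContext
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        caps = sc.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: must drop ALL capabilities")
        extra = set(caps.get("add", [])) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: may only add NET_BIND_SERVICE, found {sorted(extra)}")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems
```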

Secrets in Kubernetes

Kubernetes Secrets store sensitive data (passwords, API keys, TLS certificates) that pods can access via environment variables or volume mounts. The fundamental problem: Kubernetes Secrets are base64-encoded by default, not encrypted. Anyone with read access to the etcd datastore can recover every secret in plaintext, and so can anyone granted get or list on Secrets through RBAC. Mitigation approaches:

  • Encryption at rest — enable etcd encryption with a KMS provider (AWS KMS, Azure Key Vault, GCP Cloud KMS) so secrets are encrypted in etcd using a provider-managed key. Supported natively in EKS, AKS, and GKE.
  • External secrets operators — store secrets in the cloud provider's secrets manager (Secrets Manager, Key Vault, Secret Manager) and use an External Secrets Operator or Secrets Store CSI Driver to inject them into pods at runtime. The secret never lives persistently in etcd.
  • Vault agent injection — HashiCorp Vault injects secrets into pods as a sidecar, with short-lived dynamic credentials that rotate automatically. Eliminates static secrets entirely for supported databases and services.
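The point that base64 is an encoding rather than encryption is easy to demonstrate: reversing it needs no key. The value below is a made-up example; the same decode is what `kubectl get secret NAME -o jsonpath='{.data.password}' | base64 -d` performs.

```python
import base64

# What a Secret's .data field holds: the plaintext value, base64-encoded
encoded = base64.b64encode(b"s3cr3t-db-password").decode("ascii")

# Anyone who can read the Secret object (or unencrypted etcd) reverses it without any key
decoded = base64.b64decode(encoded).decode("utf-8")
print(encoded, "->", decoded)
```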

Runtime Security — Falco

Falco is a CNCF open-source runtime security tool that detects anomalous behaviour in containers and Kubernetes workloads by monitoring system calls using eBPF or kernel modules. Where image scanning detects known vulnerabilities and RBAC controls authorisation, Falco detects behavioural anomalies at runtime — things that should not happen in a well-behaved container:

  • A shell process spawned inside a container that has no business running a shell
  • A container writing to /etc/passwd, /etc/shadow, or other sensitive system files
  • A container making an unexpected network connection to an external IP
  • A container accessing the Kubernetes service account token file
  • A privileged container starting
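Falco rules are written in YAML with condition expressions over syscall event fields, and production deployments normally start from Falco's default ruleset. The Python sketch below only mirrors the logic of three of the behaviours listed above, using a hypothetical, simplified event record; it follows Falco's convention that host processes carry the container ID "host".

```python
from dataclasses import dataclass

SHELLS = {"bash", "sh", "zsh", "dash", "ash"}
SENSITIVE_FILES = {"/etc/passwd", "/etc/shadow"}
SA_TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"


@dataclass(frozen=True)
class Event:
    # Hypothetical simplified syscall event (Falco's own fields include proc.name, fd.name, container.id)
    syscall: str       # e.g. "execve", "openat"
    container_id: str  # "host" for processes outside any container
    proc_name: str
    path: str = ""
    write: bool = False


def alerts(ev: Event) -> list:
    if ev.container_id == "host":
        return []  # these rules target container workloads only
    found = []
    if ev.syscall == "execve" and ev.proc_name in SHELLS:
        found.append("shell spawned in container")
    if ev.syscall == "openat" and ev.write and ev.path in SENSITIVE_FILES:
        found.append(f"write to sensitive file {ev.path}")
    if ev.syscall == "openat" and ev.path == SA_TOKEN:
        found.append("service account token read")
    return found
```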

Falco generates alerts that can be forwarded to a SIEM, Slack, or a response playbook. Cloud provider equivalents: Amazon GuardDuty EKS runtime monitoring, Microsoft Defender for Containers runtime detection, GCP Container Threat Detection in SCC.

Key Takeaways — Chapter 10
  • Container image security is established at build time — minimal base images, vulnerability scanning in CI, image signing, and no embedded secrets are the foundation
  • Disable automountServiceAccountToken for pods that don't need cluster API access — the default-mounted token is a high-value target in a compromised container
  • Apply a default-deny NetworkPolicy to every namespace then add explicit allow rules — without it, all pods can reach all other pods by default
  • Kubernetes Secrets are base64-encoded in etcd, not encrypted — enable etcd encryption with a KMS provider or use external secrets operators for production secrets
  • Falco runtime detection catches anomalous syscall behaviour that image scanning and RBAC cannot — it is the behavioural IDS for container workloads
Chapter 11 · ~14 min · Serverless & IaC

Serverless & Infrastructure as Code Security

Serverless attack surface, function IAM, event injection, IaC static analysis, shift-left misconfiguration detection, and IaC drift

Vendor-Neutral · Lambda · Functions · Cloud Functions

Serverless computing — Functions as a Service — eliminates server management but does not eliminate security responsibility. The security model shifts: you are no longer responsible for OS patching or container hardening, but you are responsible for the function's IAM permissions, its dependencies, its event sources, and the data it processes. Infrastructure as Code transforms cloud infrastructure from a manually managed system into a software artefact — and brings both the benefits of software engineering practices and the risks of code vulnerabilities to cloud configuration.

Serverless Security Model

The provider manages: the runtime environment (Node.js, Python, Java runtimes), OS patching, network infrastructure, and scaling. The customer is responsible for: the function's code and its dependencies, the execution role's IAM permissions, the trigger/event sources, secrets handling within the function, and VPC configuration if required.

The serverless attack surface differs from VM or container attacks:

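  • No persistent filesystem — only /tmp is writable, and it is not guaranteed to persist between invocations (it can survive across warm invocations of the same execution environment, but not beyond that environment). Malware persistence mechanisms that rely on long-term filesystem modification largely do not apply.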
  • Short execution windows — Lambda functions have a 15-minute maximum execution time; most run in seconds. Some attack techniques that require sustained presence are impractical.
  • Over-privileged execution roles — the most common serverless security failure. A Lambda function with AdministratorAccess because "it needed to access several services" becomes a powerful pivot point for any attacker who achieves code execution in it.
  • Event injection — serverless functions process events from sources (S3 put, API Gateway request, SQS message, DynamoDB stream). If an attacker can control the event source data, they may be able to inject malicious payloads that the function processes — analogous to SQL injection but at the cloud event layer.
  • Dependency supply chain — serverless functions typically include third-party npm/pip/Maven packages. A malicious or vulnerable package in the function's dependency tree (for example, a compromised npm release, or the Log4Shell vulnerability in Log4j) can give the attacker code execution inside the function with its IAM permissions.
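Defending against event injection means treating every event field as untrusted input and validating it against a strict schema before use. The handler below is a minimal sketch for an API Gateway proxy event, where the request body arrives as a JSON string in event["body"]; the order payload, its field names, and the limits are hypothetical.

```python
import json
import re

ORDER_ID = re.compile(r"[A-Z0-9]{8,20}")  # hypothetical allowed format
MAX_QUANTITY = 100                         # hypothetical business limit


def parse_order(event: dict) -> dict:
    """Validate the event body before any downstream use; raise ValueError on bad input."""
    try:
        body = json.loads(event.get("body") or "")
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(body, dict) or set(body) != {"order_id", "quantity"}:
        raise ValueError("unexpected fields")
    order_id, quantity = body["order_id"], body["quantity"]
    if not isinstance(order_id, str) or not ORDER_ID.fullmatch(order_id):
        raise ValueError("invalid order_id")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(quantity, int) or isinstance(quantity, bool) or not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError("invalid quantity")
    return {"order_id": order_id, "quantity": quantity}


def handler(event, context):
    try:
        order = parse_order(event)
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    # Downstream code (queries, file paths, shell calls) only ever sees the validated dict
    return {"statusCode": 200, "body": json.dumps(order)}
```

An allowlist pattern such as ORDER_ID rejects injection payloads without having to anticipate each one, which is why it is preferred over blocklisting dangerous characters.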
Lambda Least Privilege — Pattern

Instead of attaching AmazonS3FullAccess to a Lambda function that only reads from one bucket, create a minimal inline policy:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-specific-bucket/*"
  }]
}

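Additionally restrict by condition, for example "Condition": {"StringEquals": {"aws:SourceVpc": "vpc-xxxxxxxx"}}, if the function runs in a VPC and should only access S3 from within it. aws:SourceVpc is only present on requests that travel through a VPC endpoint, so this requires an S3 VPC endpoint; without one, the condition never matches and the Allow stops working.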
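Why the minimal policy above blocks everything except object reads from one bucket becomes clear from how IAM matches Action and Resource patterns. The sketch below is a deliberately simplified evaluator for a single identity policy: explicit Deny wins, otherwise any matching Allow grants, otherwise the request is implicitly denied. It does not model Condition blocks, NotAction/NotResource, resource policies, SCPs, or permission boundaries.

```python
from fnmatch import fnmatchcase

POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::my-specific-bucket/*",
    }],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    allowed = False
    for stmt in _as_list(policy["Statement"]):
        if "Condition" in stmt:
            raise ValueError("Condition blocks are not modelled in this sketch")
        # Action names are case-insensitive in IAM; resource ARNs are matched case-sensitively
        action_hit = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_hit = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_hit and resource_hit:
            if stmt["Effect"] == "Deny":
                return False  # an explicit Deny always wins
            allowed = True
    return allowed  # no matching Allow means implicit deny
```

Note that s3:ListBucket targets the bucket ARN itself (no trailing /*), so the function in this example cannot even enumerate keys, which is usually the intent for a single-object reader.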

Infrastructure as Code Security

Infrastructure as Code (IaC) — Terraform, AWS CloudFormation, Azure Bicep, GCP Deployment Manager, Pulumi — represents cloud infrastructure as version-controlled code. This enables peer review, automated testing, and consistent deployment. It also means that a security misconfiguration in IaC will be deployed consistently and repeatedly across environments, amplifying rather than isolating mistakes.

Static Analysis of IaC

Static analysis tools scan IaC files for security misconfigurations before they are deployed — the shift-left prevention mechanism for cloud infrastructure:

  • Checkov (Bridgecrew/Prisma Cloud) — open-source; supports Terraform, CloudFormation, Kubernetes YAML, Dockerfile, Helm; over 1,000 built-in policies; extensible with custom checks.
  • tfsec — Terraform-specific static analysis; fast; integrates well in CI pipelines; produces findings with severity and remediation guidance. Aqua Security has since folded tfsec's checks into Trivy's misconfiguration scanning.
  • Terrascan — multi-platform IaC scanner; supports Terraform, CloudFormation, Helm; built-in policies for AWS, Azure, GCP, Kubernetes.
  • KICS (Keeping Infrastructure as Code Secure) — Checkmarx open-source scanner; broad format support including Docker Compose and Ansible.
Checkov in a CI Pipeline (GitHub Actions)
- name: Run Checkov IaC scan
  uses: bridgecrewio/checkov-action@master   # pin to a full commit SHA in production (see Chapter 13)
  with:
    directory: terraform/
    framework: terraform
    soft_fail: false          # fail the pipeline on findings
    skip_check: CKV_AWS_18   # skip specific checks if justified
    output_format: sarif      # SARIF output; publish it with github/codeql-action/upload-sarif to populate the GitHub Security tab

Secrets in IaC

IaC files frequently contain secrets — database passwords hardcoded in CloudFormation parameters, API keys in Terraform variables with default values, private keys in Ansible playbooks. These secrets end up in version control and in the IaC deployment tool's state files.

  • Never hardcode secrets in IaC — use parameter store references, secrets manager ARNs, or Vault dynamic secrets
  • Terraform state files record resource attributes in plaintext, including secrets such as generated database passwords and sensitive outputs — encrypt state (S3 with SSE-KMS, Azure Storage with CMK) and restrict access
  • Use sensitive = true in Terraform to mark outputs as sensitive — this redacts the values from plan/apply output in CI logs but does not encrypt state
  • Deploy git-secrets or detect-secrets as pre-commit hooks to prevent secret commits before they reach version control
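Pre-commit secret scanners work by matching staged content against known credential formats. The sketch below shows that mechanism with three illustrative patterns: the AWS access key ID prefixes AKIA (long-term) and ASIA (temporary), PEM private key headers, and a crude hardcoded-password heuristic. Real tools such as git-secrets, detect-secrets, and Gitleaks ship much larger, tuned rule sets plus entropy checks, so this is a teaching sketch rather than a replacement.

```python
import re
import sys

PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Heuristic: a quoted literal assigned to something named *password*
    "hardcoded password": re.compile(r"""(?i)password\s*[:=]\s*["'][^"']{4,}["']"""),
}


def scan_text(text: str) -> list:
    """Return (line number, pattern label) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def main(paths: list) -> int:
    status = 0
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, label in scan_text(fh.read()):
                print(f"{path}:{lineno}: possible {label}")
                status = 1  # a non-zero exit makes the pre-commit hook reject the commit
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```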

IaC Drift

IaC drift occurs when the deployed infrastructure diverges from the IaC definition — because a developer made a manual change in the console "just to test something" or an automated process modified a resource outside the IaC lifecycle. Drift creates security risk: a security group rule added manually during an incident may remain in place long after the incident is resolved. terraform plan shows drift (terraform plan -refresh-only reports changes made outside Terraform without proposing to revert them); tools like driftctl or Snyk Cloud (formerly Fugue) provide continuous drift detection.
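At its core, drift detection is a set comparison between what the code declares and what the provider reports. The sketch below applies it to security group ingress rules; the rule shape is hypothetical and fetching the live rules (for example from the cloud API or from refreshed Terraform state) is left out. In a pipeline, `terraform plan -detailed-exitcode` gives the same signal coarsely, exiting with 2 when the plan contains changes.

```python
from typing import NamedTuple


class IngressRule(NamedTuple):
    protocol: str
    port: int
    cidr: str


def drift(declared: set, live: set) -> dict:
    """Rules only in the cloud were added out-of-band; rules only in code were removed out-of-band."""
    return {
        "added_outside_iac": live - declared,
        "missing_from_live": declared - live,
    }
```

The "added_outside_iac" bucket is the security-relevant one: it is where a forgotten incident-time rule such as SSH open to 0.0.0.0/0 shows up.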

Key Takeaways — Chapter 11
  • Over-privileged execution roles are the most common serverless security failure — every function should have a purpose-built minimal IAM role
  • Event injection treats function input as untrusted — validate and sanitise all event data before processing, regardless of source
  • Dependency supply chain risk applies to serverless — scan function packages for known vulnerabilities as part of the build process
  • IaC static analysis (Checkov, tfsec) catches misconfigurations before deployment — integrate as a CI/CD pipeline gate that blocks on critical findings
  • Terraform state contains sensitive resource configuration — encrypt it with CMK and restrict access to the state backend
Chapter 12 · ~17 min · Cloud IR

Cloud Detection & Incident Response

Cloud IR vs on-premises IR, cloud forensics workflow, evidence preservation, cloud-specific attack patterns, the cloud IR playbook, and multi-cloud incidents

AWS · Azure · GCP

Cloud incident response is fundamentally different from on-premises IR. There is no physical access to hardware. Evidence exists in logs and API-accessible storage rather than on physical disks. Infrastructure can be terminated and re-provisioned in minutes, destroying forensic evidence. Compute is ephemeral, and the attacker knows this too — some cloud attacks are designed to complete and clean up within a single Lambda invocation or a short-lived container. This chapter addresses how to detect, investigate, and respond to cloud security incidents with the constraints and capabilities the cloud environment provides.

How Cloud IR Differs from On-Premises IR

Aspect | On-Premises IR | Cloud IR
Physical evidence | Disk imaging, hardware seizure possible | No physical access; evidence is in logs and API-accessible snapshots
Memory forensics | RAM acquisition from live or hibernated system | Possible on IaaS VMs via SSM/live response; not possible for serverless or terminated instances
Network captures | TAP or SPAN at physical switch layer | VPC Flow Logs (metadata only); PCAP possible via traffic mirroring (AWS) or VNet TAP (Azure) if pre-configured
Evidence volatility | Systems persist until powered off or re-imaged | Spot instances, containers, Lambda functions can self-terminate; logs expire on provider-defined schedules
Provider involvement | Full control over own infrastructure | Provider controls physical infrastructure; will not assist without legal process
Scope determination | Network-based lateral movement visible in logs | IAM-based lateral movement via role assumption requires CloudTrail/audit log analysis

Cloud Forensics Workflow

AWS EC2 Instance Forensics

  1. Preserve evidence before containment — take an EBS snapshot of all volumes attached to the instance. Create the snapshot before isolation so it captures current state.
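  2. Isolate the instance — use EDR isolation if Defender or CrowdStrike is deployed. Otherwise, replace the instance's security groups with a dedicated isolation security group that has no inbound or outbound rules. Security group changes do not necessarily interrupt connections that are already tracked, so add a network ACL deny (which affects the whole subnet) if active attacker sessions must be cut immediately. Do not terminate the instance — termination destroys volatile memory and ephemeral data.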
  3. Capture volatile state — use SSM Session Manager to run live response commands (process list, network connections, open files) and capture output to S3.
  4. Create forensic volume — from the EBS snapshot, create a new EBS volume in the same region. Attach it to a dedicated forensic analysis instance as a secondary volume (do not boot from it).
  5. Analyse offline — mount the forensic volume read-only on the analysis instance and run forensic tools (Volatility for memory if a memory dump was captured, The Sleuth Kit (TSK) for filesystem analysis, log parsing).
AWS — Snapshot and Forensic Volume Creation
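# Record one capture timestamp and reuse it in the description and tags
CAPTURE_TIME=$(date -u +%Y%m%dT%H%M%SZ)

# Create snapshot of compromised instance's root volume and keep its ID
SNAPSHOT_ID=$(aws ec2 create-snapshot \
  --volume-id vol-xxxxxxxxxxxxxxxxx \
  --description "Forensic snapshot - incident INC-2024-001 - ${CAPTURE_TIME}" \
  --query SnapshotId --output text)

# Tag snapshot for chain of custody
aws ec2 create-tags \
  --resources "${SNAPSHOT_ID}" \
  --tags Key=IncidentID,Value=INC-2024-001 \
         Key=ExaminerID,Value=examiner@example.com \
         Key=CaptureTime,Value="${CAPTURE_TIME}"

# Wait for the snapshot to complete before restoring from it
aws ec2 wait snapshot-completed --snapshot-ids "${SNAPSHOT_ID}"

# Create forensic volume in the forensic analysis instance's AZ
# (a volume can only be attached to instances in its own AZ)
aws ec2 create-volume \
  --snapshot-id "${SNAPSHOT_ID}" \
  --availability-zone eu-west-2b \
  --volume-type gp3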

Azure VM Forensics

Capture a managed disk snapshot via the portal or Azure CLI. Generate a time-limited SAS URL for the snapshot with az snapshot grant-access, then copy the VHD to Azure Blob Storage (for example with AzCopy) and download or mount it for offline analysis. Run Command (the Azure counterpart to SSM Session Manager for this purpose) provides live response without open RDP/SSH ports, while Just-In-Time VM access (Defender for Servers) opens those ports only on approved request and for a limited window.

Cloud Log Preservation

Cloud logs expire. CloudTrail's console Event history covers only the last 90 days of management events; logs that a trail delivers to S3 persist according to your S3 lifecycle policy, and many organisations keep only 90 days. Azure activity logs are retained for 90 days by default. GCP audit logs have variable retention (Admin Activity: 400 days; Data Access: 30 days by default). When an incident is declared, immediately:

  • Issue a legal hold / preservation notice on affected log storage buckets (S3 Object Lock, Azure immutability policy)
  • Export relevant CloudTrail events for the investigation period to a separate analysis bucket
  • Extend retention on any log source whose expiry falls within the investigation window
  • Identify and preserve any VPC Flow Logs, S3 access logs, and application logs that may be on shorter retention schedules
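The urgency of preservation is simple date arithmetic: logs from the first suspicious day disappear once that day plus the retention window has passed. The sketch below uses the default retention periods quoted above as assumptions; substitute your own configured values.

```python
from datetime import date, timedelta

# Default retention windows quoted in this chapter; your configuration may differ
RETENTION_DAYS = {
    "CloudTrail Event history (console)": 90,
    "Azure activity log": 90,
    "GCP Admin Activity audit logs": 400,
    "GCP Data Access audit logs": 30,
}


def preservation_deadlines(first_suspicious: date, today: date, retention=RETENTION_DAYS) -> dict:
    """Days left before logs from the first suspicious day age out of each source (negative: already gone)."""
    return {
        source: (first_suspicious + timedelta(days=days) - today).days
        for source, days in retention.items()
    }
```

For activity first seen on 1 January and an incident declared on 15 February, the GCP Data Access entry is already 15 days past expiry, which is exactly the gap that preserving logs at declaration time is meant to avoid.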

Cloud-Specific Attack Patterns

SSRF to Metadata Service Credential Theft

Server-Side Request Forgery (SSRF) in a cloud-hosted application allows an attacker to make the server issue HTTP requests on their behalf. The EC2 instance metadata service (IMDS) is reachable at 169.254.169.254 from any process running on the instance — including one exploiting an SSRF vulnerability. The metadata service returns temporary IAM credentials for the instance's attached role.

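Mitigation: require IMDSv2 (token-based IMDS), which simple SSRF cannot reach. With IMDSv2, a caller must first obtain a session token with a PUT request carrying the X-aws-ec2-metadata-token-ttl-seconds header, then present that token in the X-aws-ec2-metadata-token header on every metadata request; most SSRF vulnerabilities only let the attacker choose the URL of a GET request, not the method or the headers. Enforce IMDSv2 per instance through the instance metadata options, or as the account-level default, which applies to new launches in each region.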

Enforce IMDSv2 — Account-Level Setting
aws ec2 modify-instance-metadata-defaults \
  --http-tokens required \
  --region eu-west-2
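The two-step IMDSv2 exchange described above can be written with the standard library. The sketch builds both requests so their method and headers are visible; `fetch` only works when run on an EC2 instance, and the default path in `metadata_request` is the role-credentials listing that SSRF attacks target.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # Step 1: a PUT with a TTL header; an SSRF that only controls a GET URL cannot send this
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token: str, path: str = "latest/meta-data/iam/security-credentials/") -> urllib.request.Request:
    # Step 2: every metadata GET must carry the session token
    return urllib.request.Request(f"{IMDS}/{path}", headers={"X-aws-ec2-metadata-token": token})


def fetch(path: str = "latest/meta-data/instance-id", timeout: float = 2.0) -> str:
    """Retrieve one metadata value using IMDSv2 (works only from inside an EC2 instance)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(token, path), timeout=timeout) as resp:
        return resp.read().decode()
```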

IAM Privilege Escalation

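Detected in CloudTrail by monitoring for iam:CreatePolicyVersion, iam:AttachUserPolicy, iam:AttachRolePolicy, and iam:PutRolePolicy, and for ec2:RunInstances or lambda:CreateFunction calls that attach a role. iam:PassRole is a permission check rather than an API call, so it never appears as its own CloudTrail event; role passing shows up in the request parameters of the calls that consume the role. Any IAM modification by an identity that has not previously performed IAM actions should alert immediately.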
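The detection logic above can be sketched over parsed CloudTrail records (the entries of a log file's Records list). Assumptions to check against your own logs: the requestParameters keys iamInstanceProfile (RunInstances) and role (CreateFunction), and the versioned Lambda event names such as CreateFunction20150331, which is why the sketch matches on a prefix. The set of known IAM actors would come from your own historical baseline.

```python
IAM_ESCALATION = {"CreatePolicyVersion", "AttachUserPolicy", "AttachRolePolicy", "PutRolePolicy"}


def escalation_alerts(records: list, known_iam_actors: set) -> list:
    alerts = []
    for r in records:
        source, name = r.get("eventSource"), r.get("eventName", "")
        actor = r.get("userIdentity", {}).get("arn", "unknown")
        params = r.get("requestParameters") or {}
        if source == "iam.amazonaws.com" and name in IAM_ESCALATION:
            # First-time IAM activity by this identity is the highest-signal case
            prefix = "FIRST-TIME " if actor not in known_iam_actors else ""
            alerts.append(f"{prefix}IAM change {name} by {actor}")
        elif source == "ec2.amazonaws.com" and name == "RunInstances" and "iamInstanceProfile" in params:
            alerts.append(f"instance launched with a role by {actor} (implies iam:PassRole)")
        elif source == "lambda.amazonaws.com" and name.startswith("CreateFunction") and "role" in params:
            alerts.append(f"function created with role {params['role']} by {actor} (implies iam:PassRole)")
    return alerts
```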

Data Exfiltration via Object Storage

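Indicators include bulk S3 GetObject events in S3 server access logs or CloudTrail data events; replication configuration changes that create cross-account or cross-region replication to attacker-controlled buckets; and large volumes of GetObject requests authenticated with presigned URLs (presigned URLs are generated client-side, so only their use is logged). GuardDuty S3 Protection findings such as Exfiltration:S3/AnomalousBehavior cover this pattern, and Amazon Macie identifies which buckets hold sensitive data so they can be prioritised.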
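"Bulk" needs a definition before it can alert. One simple approach is a per-principal sliding-window count of GetObject data events; the sketch below assumes CloudTrail's eventTime format (for example 2024-05-01T12:00:00Z), and the threshold and window are hypothetical values to tune against each principal's normal baseline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def _parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def bulk_readers(records: list, threshold: int = 1000, window: timedelta = timedelta(hours=1)) -> dict:
    """Return principals whose peak GetObject count within any sliding `window` reaches `threshold`."""
    times = defaultdict(list)
    for r in records:
        if r.get("eventSource") == "s3.amazonaws.com" and r.get("eventName") == "GetObject":
            times[r.get("userIdentity", {}).get("arn", "unknown")].append(_parse(r["eventTime"]))
    flagged = {}
    for principal, stamps in times.items():
        stamps.sort()
        start, peak = 0, 0
        for end, ts in enumerate(stamps):
            # Advance the left edge until the window spans at most `window`
            while ts - stamps[start] > window:
                start += 1
            peak = max(peak, end - start + 1)
        if peak >= threshold:
            flagged[principal] = peak
    return flagged
```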

The Cloud IR Playbook — High Level

  1. Detect — GuardDuty/SCC/Defender alert, SIEM correlation, or threat hunt finding triggers investigation
  2. Preserve — snapshot affected volumes, preserve relevant logs to immutable storage, extend retention
  3. Scope — identify all affected accounts, roles, and resources via CloudTrail/audit log analysis. Map all actions taken by compromised identities.
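  4. Contain — revoke compromised IAM credentials: deactivate or delete access keys, and revoke active role sessions by attaching an inline deny policy conditioned on aws:TokenIssueTime (the mechanism behind the IAM console's "Revoke active sessions" action). Isolate affected compute. Block C2 IPs with network ACL deny rules (security groups support only allow rules).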
  5. Eradicate — remove attacker persistence (new IAM users/keys/backdoor policies created by attacker). Rotate all credentials that were potentially accessible. Patch the initial access vector.
  6. Recover — restore from clean snapshots or redeploy via IaC. Verify via CSPM scan that environment matches expected baseline.
  7. Lessons learned — update detection rules, rotate credentials as standard, close policy gaps identified during investigation
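The session revocation in the Contain step works because a Deny on all actions for tokens issued before a cutoff time invalidates every temporary credential already handed out for the role, while new sessions keep working. The sketch generates that policy document; when the IAM console does this it names the inline policy AWSRevokeOlderSessions. Attaching it (for example with boto3's `iam.put_role_policy`) is left to the caller, and long-term user access keys still need to be deactivated separately.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def revoke_older_sessions_policy(cutoff: Optional[datetime] = None) -> dict:
    """Deny every action for role sessions whose credentials were issued before `cutoff` (UTC, default now)."""
    cutoff = cutoff or datetime.now(timezone.utc)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": ["*"],
            "Resource": ["*"],
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}},
        }],
    }


print(json.dumps(revoke_older_sessions_policy(), indent=2))
```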
Key Takeaways — Chapter 12
  • Cloud forensics relies on logs and snapshots rather than physical media — evidence preservation must happen immediately on incident declaration before logs expire
  • EBS snapshots (AWS), disk snapshots (Azure/GCP) provide point-in-time disk images accessible for forensic analysis without affecting the running instance
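  • IMDSv2 prevents SSRF-based metadata service credential theft — set it as the account-level default for new launches in every region, and retrofit existing instances (aws ec2 modify-instance-metadata-options --http-tokens required)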
  • CloudTrail data events for S3 are disabled by default — without them (or S3 server access logs), you cannot determine which objects were accessed during a breach
  • IAM-based lateral movement (role assumption chains) is the most common cloud lateral movement vector — scope by analysing AssumeRole chains in CloudTrail
Chapter 13 · ~15 min · DevSecOps

DevSecOps & Secure CI/CD

Secure SDLC, pipeline attack surface, secrets in pipelines, SAST, SCA, DAST, container scanning, policy gates, and DevSecOps culture

Vendor-Neutral

DevSecOps is the integration of security practices into the DevOps software delivery lifecycle — moving security from a gate at the end of development to a continuous practice embedded throughout. For cloud security specifically, DevSecOps means the security of cloud infrastructure and applications is validated at every stage of the pipeline, from the developer's local environment to production deployment. The alternative — security reviewed only by a separate team after development completes — cannot scale to the velocity of modern cloud-native development.

The Secure SDLC for Cloud Applications

  1. Requirements — security requirements defined alongside functional requirements. Data classification determines which services can store what data. Regulatory obligations (GDPR, PCI-DSS) constrain architecture choices. Threat modelling identifies the highest-risk components before a line of code is written.
  2. Design — architecture review with security lens: principle of least privilege for service accounts, encryption key management design, network segmentation plan, secrets management approach. The cheapest time to fix a security design flaw is before implementation begins.
  3. Development — SAST (Static Application Security Testing) runs in the IDE and on every commit. Dependency scanning (SCA) flags vulnerable packages. Pre-commit hooks prevent secrets from reaching version control. Security unit tests verify security controls in application code.
  4. Build — container image scanning for vulnerabilities. IaC static analysis (Checkov, tfsec). SAST on the final build artifact. Image signing. Software Bill of Materials (SBOM) generation.
  5. Test — DAST (Dynamic Application Security Testing) against a deployed test environment. Penetration testing for significant releases. Compliance scan against security baseline.
  6. Deploy — policy gates block deployment of non-compliant artifacts. IaC deployed via pipeline (not manual console changes). Binary Authorization or OPA Gatekeeper enforces signed image requirement.
  7. Operate — CSPM continuous monitoring, SIEM alerting, runtime security (Falco, Defender for Containers), vulnerability management for running workloads.

CI/CD Pipeline Attack Surface

The CI/CD pipeline is a high-value attack target because it has privileged access to deploy infrastructure and code to production. A compromised pipeline can modify application code, exfiltrate secrets, and deploy backdoored infrastructure — affecting every downstream environment simultaneously.

  • Pipeline injection — if user-supplied data (pull request title, branch name, issue content) is interpolated into pipeline scripts without sanitisation, an attacker can inject arbitrary pipeline commands. GitHub Actions is particularly susceptible via ${{ github.event.pull_request.title }} expressions used in run steps.
  • Compromised third-party actions/plugins — GitHub Actions workflows that use community-provided actions are dependent on the security of those action repositories. An attacker who compromises a popular GitHub Action can inject malicious code into every pipeline that uses it. Pin actions to full-length (40-character) commit SHAs rather than tags (uses: actions/checkout@ followed by the full SHA) to prevent tag re-pointing attacks.
  • Overprivileged pipeline credentials — pipelines that deploy to AWS/Azure/GCP using long-lived access keys with broad permissions. If the key leaks (via log output, compromised pipeline configuration), the attacker gains production-level cloud access.
  • Lateral movement from pipeline to production — a compromised pipeline runner has network access to the VPC where it deploys. Without strict egress controls on the runner environment, an attacker can use the runner as a pivot point into the production environment.
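Two of the risks above, unpinned actions and github.event expressions interpolated into run steps, can be spotted mechanically. The sketch below is a toy, line-based linter (not a YAML parser) that flags uses: references without a full 40-character commit SHA and ${{ github.event.* }} expressions inside run blocks; dedicated linters such as actionlint are more thorough. The safe pattern it nudges towards is passing untrusted values through env: and referencing them as quoted shell variables.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")
USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
RUN_KEY = re.compile(r"^(\s*)(-\s+)?run:")
UNTRUSTED = re.compile(r"\$\{\{\s*github\.event\.")


def lint_workflow(text: str) -> list:
    findings = []
    run_indent = None  # indentation of the current run: key, or None outside a run block
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = len(line) - len(line.lstrip())
        if run_indent is not None and line.strip() and indent <= run_indent:
            run_indent = None  # a dedent ends the run block
        key = RUN_KEY.match(line)
        if key:
            run_indent = len(key.group(1)) + len(key.group(2) or "")
        if run_indent is not None and UNTRUSTED.search(line):
            findings.append(f"line {lineno}: github.event expression inside run: pass it via env instead")
        uses = USES.match(line)
        if uses and not uses.group(1).startswith("./"):  # local actions are part of the repo
            ref = uses.group(1).partition("@")[2]
            if not FULL_SHA.fullmatch(ref):
                findings.append(f"line {lineno}: {uses.group(1)} is not pinned to a full commit SHA")
    return findings
```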

Secrets in Pipelines — The Right Way

The correct approach for cloud credentials in CI/CD is OIDC federation — eliminating long-lived access keys entirely. GitHub Actions, GitLab CI, and other modern CI/CD platforms support OIDC token exchange: the pipeline presents a short-lived OIDC token signed by the CI provider, and AWS/Azure/GCP exchange it for temporary cloud credentials scoped to a specific role or identity and expiring after a short lifetime (one hour by default for AWS role sessions).

GitHub Actions — OIDC to AWS (no access keys)
permissions:
  id-token: write    # required for OIDC token request
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
      aws-region: eu-west-2
      # No aws-access-key-id or aws-secret-access-key needed
      # Pipeline receives temporary STS credentials valid for ~1 hour
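The workflow side is only half of the setup: the AWS role needs a trust policy that accepts GitHub's OIDC provider and, critically, restricts which repository and branch may assume it via the token's sub claim. The sketch builds that trust policy and simulates its two conditions against decoded token claims; the account ID and repository name are hypothetical, and signature and issuer validation are performed by AWS, not modelled here.

```python
from fnmatch import fnmatchcase

PROVIDER = "token.actions.githubusercontent.com"


def github_trust_policy(account_id: str, repo: str, ref: str = "refs/heads/main") -> dict:
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{PROVIDER}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {f"{PROVIDER}:aud": "sts.amazonaws.com"},
                "StringLike": {f"{PROVIDER}:sub": f"repo:{repo}:ref:{ref}"},
            },
        }],
    }


def token_matches(policy: dict, claims: dict) -> bool:
    """Simulate the aud and sub conditions against a decoded token's claims."""
    cond = policy["Statement"][0]["Condition"]
    aud_ok = all(claims.get(k.split(":", 1)[1]) == v for k, v in cond["StringEquals"].items())
    sub_ok = all(fnmatchcase(claims.get(k.split(":", 1)[1], ""), v) for k, v in cond["StringLike"].items())
    return aud_ok and sub_ok
```

Omitting the sub condition, or widening it to something like repo:*, would let workflows in unrelated repositories assume the deployment role.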

Security Testing Tools

Category | Tools | What It Finds
SAST | Semgrep, CodeQL (GitHub), SonarQube | Hard-coded secrets, insecure functions, SQL injection patterns, insecure deserialization in source code
SCA (Dependencies) | Dependabot, Snyk, OWASP Dependency-Check | Known CVEs in third-party packages and transitive dependencies
Secret scanning | Gitleaks, detect-secrets, GitHub Secret Scanning | API keys, tokens, private keys committed to version control
Container scanning | Trivy, Grype, Snyk Container | OS and application package vulnerabilities in container images
IaC scanning | Checkov, tfsec, Terrascan | Security misconfigurations in Terraform, CloudFormation, Bicep, Helm
DAST | OWASP ZAP, Burp Suite Enterprise, StackHawk | Runtime application vulnerabilities: XSS, SQLi, auth bypass, IDOR

The DevSecOps Culture Challenge

The technology of DevSecOps is relatively straightforward to implement. The cultural change is harder. Security teams accustomed to reviewing deliverables at release gates must become service providers who help development teams build securely throughout the cycle. Developers accustomed to security being "someone else's job" must accept responsibility for the security of the code and infrastructure they write.

The most effective DevSecOps programmes share a common characteristic: security teams treat developers as customers and optimise for developer experience. A SAST tool that produces 400 findings on a new project, most of them false positives, trains developers to ignore security findings. A well-tuned SAST tool that produces 8 high-confidence findings on a new project, all actionable and with clear remediation guidance, trains developers to take security findings seriously. The investment in reducing noise pays compounding returns in developer trust and security engagement.

Key Takeaways — Chapter 13
  • Security shifts left by being embedded at every SDLC stage — design (threat model), develop (SAST, secrets scan), build (container scan, IaC scan), deploy (policy gates), operate (CSPM, runtime)
  • OIDC federation eliminates long-lived CI/CD cloud credentials — every major cloud provider supports it with GitHub Actions, GitLab CI, and other platforms
  • Pin third-party GitHub Actions to commit SHAs, not tags — tag re-pointing attacks have compromised multiple supply chains
  • Well-tuned security tooling with low false positive rates builds developer trust; high-noise tools train developers to ignore findings
  • Pipeline injection via unsanitised event data is a critical CI/CD vulnerability — never interpolate external input directly into pipeline run commands
Chapter 14 · ~14 min · Governance

Cloud Compliance & Governance at Scale

Multi-account structure, guardrails vs guidelines, landing zones, FedRAMP and HIPAA in the cloud, CIS Benchmarks, FinOps security, and multi-cloud governance

Vendor-Neutral · AWS · Azure · GCP

As cloud environments grow — from a handful of accounts to hundreds, from a single team to dozens of engineering groups — governance and compliance cannot scale through manual process. Every new account needs the same security baseline. Every new resource needs to comply with the same policies. Achieving this at scale requires architectural patterns that make compliance the path of least resistance, and guardrails that make non-compliance technically difficult rather than simply policy-violating.

Multi-Account Structure for Governance

The foundational cloud governance pattern is organising workloads into separate accounts (AWS), subscriptions (Azure), or projects (GCP) with a management hierarchy that enables centralised policy enforcement:

Concept | AWS | Azure | GCP
Top-level grouping | AWS Organisation | Management Group | Organisation
Mid-level grouping | Organisational Units (OUs) | Management Group hierarchy | Folders
Workload isolation | AWS Account | Subscription | Project
Preventive policy | Service Control Policies (SCPs) | Azure Policy (deny effect) | Organisation Policy constraints
Central logging | CloudTrail organisation trail → central S3 | Log Analytics workspace with diagnostic settings | Organisation-level audit log sink → Cloud Storage
Central security | GuardDuty delegated admin → Security Hub aggregation | Defender for Cloud + Sentinel aggregation workspace | SCC organisation-level view
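Centralised enforcement works because preventive policies inherit down the hierarchy. For AWS SCPs, an action is only available in an account if an Allow applies at every level from the organisation root down to the account itself, and an explicit Deny at any level wins. The sketch below models that rule with plain Python sets and shell-style wildcards. The `PolicyNode` type, the OU and account names, and the use of `fnmatch` to approximate IAM action matching are illustrative assumptions; the sketch ignores conditions and `NotAction`, and it does not model the fact that SCPs never grant permissions themselves (IAM policies in the account must still allow the action).

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class PolicyNode:
    """One level of a hypothetical org hierarchy (root, OU, or account)."""
    name: str
    allow: set[str] = field(default_factory=set)  # action patterns, e.g. "s3:*"
    deny: set[str] = field(default_factory=set)


def matches(action: str, patterns: set[str]) -> bool:
    # Shell-style wildcards approximate IAM action matching ("s3:*").
    return any(fnmatchcase(action, p) for p in patterns)


def scp_permits(action: str, path: list[PolicyNode]) -> bool:
    """True if SCPs along root -> ... -> account leave the action available."""
    # An explicit Deny anywhere on the path overrides everything else.
    if any(matches(action, node.deny) for node in path):
        return False
    # Otherwise every level must carry an Allow that covers the action.
    return all(matches(action, node.allow) for node in path)


root = PolicyNode("root", allow={"*"})
prod_ou = PolicyNode("Production OU", allow={"*"},
                     deny={"cloudtrail:StopLogging", "cloudtrail:DeleteTrail"})
account = PolicyNode("payments-prod", allow={"s3:*", "cloudtrail:*", "ec2:*"})

path = [root, prod_ou, account]
print(scp_permits("s3:GetObject", path))            # True
print(scp_permits("cloudtrail:StopLogging", path))  # False: denied at the OU
print(scp_permits("iam:CreateUser", path))          # False: no Allow at account level
```

Because the Deny sits on the Production OU, no administrator inside `payments-prod` can re-enable `cloudtrail:StopLogging`, even though the account-level policy allows `cloudtrail:*`.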

Guardrails vs Guidelines

The most important governance design decision is which security requirements to implement as guardrails (technical preventive controls that account-level administrators cannot bypass) and which to implement as guidelines (policy and process controls that can be violated but should not be).

Guardrails — SCPs, Azure Policy deny effects, GCP Org Policy constraints — enforce that certain configurations are impossible, regardless of what individual account administrators do. Examples of things that should be guardrails:

  • Preventing the creation of public S3 buckets or Azure Blob containers in production accounts
  • Preventing the disabling of CloudTrail, GuardDuty, or Security Hub
  • Restricting resource creation to approved regions only
  • Requiring encryption on all storage resources
  • Preventing IAM root user access key creation

Everything else — tagging standards, naming conventions, cost allocation — is guideline territory, enforced through process and detective controls rather than hard technical prevention.
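As an illustration of the guardrail examples listed above, the sketch below assembles an AWS SCP as a Python dict and serialises it to JSON. The action names and the `aws:RequestedRegion` and `aws:PrincipalArn` condition keys are standard IAM vocabulary, but the approved region list, the exempted global services, and the statement IDs are assumptions to adapt. It covers three of the five example guardrails (security-service protection, region restriction, root access keys), not all of them. Test any SCP on a non-production OU first, because a region deny without the right exemptions can break global services.

```python
import json

# Assumed approved regions; adjust to your organisation's policy.
APPROVED_REGIONS = ["eu-west-1", "eu-central-1"]

guardrail_scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Stop anyone in member accounts from switching off security tooling.
            "Sid": "ProtectSecurityServices",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "guardduty:DeleteDetector",
                "securityhub:DisableSecurityHub",
            ],
            "Resource": "*",
        },
        {
            # Deny requests outside approved regions, exempting some global
            # services (this exemption list is illustrative, not exhaustive).
            "Sid": "RestrictRegions",
            "Effect": "Deny",
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:RequestedRegion": APPROVED_REGIONS}},
        },
        {
            # Block creation of access keys for the root user of member accounts.
            "Sid": "DenyRootAccessKeys",
            "Effect": "Deny",
            "Action": "iam:CreateAccessKey",
            "Resource": "*",
            "Condition": {"ArnLike": {"aws:PrincipalArn": "arn:aws:iam::*:root"}},
        },
    ],
}

policy_json = json.dumps(guardrail_scp, indent=2)
print(policy_json)
```

Every statement is a Deny, which is the usual shape for guardrails: the organisation keeps a broad Allow (such as the default full-access SCP) and layers targeted Deny statements on top.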

Landing Zones

A landing zone is a pre-configured, governed cloud environment baseline that new accounts are provisioned into automatically. It provides every new account with the security guardrails, logging configuration, network architecture, and identity integration required by organisational standards — without requiring the new team to configure it from scratch.

  • AWS Control Tower — AWS's managed landing zone service. Provisions a multi-account structure with mandatory guardrails (SCPs preventing disabling of logging and security services), a centralised log archive account, an audit account, and optional guardrails from a library. Extending it with Account Factory for Terraform (AFT) enables fully automated account vending.
  • Azure Landing Zone — a reference architecture for enterprise Azure deployments. Microsoft publishes Bicep/Terraform templates implementing the recommended management group hierarchy, policy assignments, and connectivity hub. Deployed via the Azure portal accelerator, Bicep, or Terraform.
  • GCP Landing Zone — Google publishes the Cloud Foundation Toolkit (CFT), which provides Terraform modules for organisation setup, folder structure, IAM constraints, VPC Service Controls, and centralised logging.
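The common idea behind all three offerings is that a team's account request is merged with a mandatory baseline the team cannot override. The sketch below expresses that idea in plain Python. `MANDATORY_BASELINE`, `OPTIONAL_DEFAULTS`, `vend_account`, and every setting name are hypothetical and do not correspond to the Control Tower, AFT, Azure Landing Zone, or CFT interfaces, which implement the same pattern in their own Terraform and Bicep modules.

```python
# Hypothetical baseline every vended account receives; keys are illustrative.
MANDATORY_BASELINE = {
    "cloudtrail_org_trail": True,
    "guardduty_enabled": True,
    "log_archive_account": "log-archive",
    "allowed_regions": ("eu-west-1", "eu-central-1"),
}

# Settings a requesting team may choose freely (also illustrative).
OPTIONAL_DEFAULTS = {"vpc_cidr": "10.0.0.0/16", "budget_alert_usd": 1000}


def vend_account(name: str, ou: str, requested: dict) -> dict:
    """Return the configuration for a new account, refusing baseline overrides."""
    conflicts = sorted(k for k in requested if k in MANDATORY_BASELINE)
    if conflicts:
        raise ValueError(f"request tries to override mandatory settings: {conflicts}")
    unknown = sorted(k for k in requested if k not in OPTIONAL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {unknown}")
    # The baseline is merged last so it always wins, even if the checks change.
    return {"name": name, "ou": ou, **OPTIONAL_DEFAULTS, **requested, **MANDATORY_BASELINE}


config = vend_account("team-analytics-dev", "Development", {"vpc_cidr": "10.20.0.0/16"})
print(config["guardduty_enabled"], config["vpc_cidr"])  # True 10.20.0.0/16
```

A request such as `{"guardduty_enabled": False}` raises `ValueError` instead of producing a weaker account, which is the "compliant by default" property the landing zone is meant to guarantee.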

Compliance Frameworks in the Cloud Context

FedRAMP

FedRAMP (Federal Risk and Authorization Management Program) is the US government's programme for assessing and authorising cloud services used by federal agencies. Cloud service providers pursuing FedRAMP authorisation undergo a rigorous assessment against NIST SP 800-53 controls. AWS GovCloud (US), Azure Government, and Google Cloud (for in-scope services in US regions) hold FedRAMP authorisations, although the authorisation level and the set of covered services differ by provider. Using a FedRAMP-authorised provider does not make a customer's deployment FedRAMP-compliant: the customer must still implement the controls it is responsible for and obtain an ATO (Authority to Operate) for its specific application.

HIPAA in the Cloud

All three major cloud providers sign Business Associate Agreements (BAAs) for services that handle Protected Health Information (PHI). The BAA establishes the provider's responsibilities under HIPAA for those services. Not all cloud services are covered under the BAA — customers must ensure that PHI is only stored and processed in BAA-covered services. AWS publishes a list of HIPAA-eligible services; Azure and GCP publish equivalent lists. Selecting a non-covered service for PHI processing voids the BAA protection for that use.
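A simple way to operationalise this rule is to compare the services a PHI workload uses against the provider's published BAA-covered list. In the sketch below, `BAA_COVERED` is a hypothetical, abbreviated stand-in and `experimental-ml-preview` is an invented service name; the real lists change over time and must be taken from each provider's current documentation.

```python
# Hypothetical, abbreviated stand-in for a provider's BAA-covered service list.
BAA_COVERED = {"s3", "rds", "dynamodb", "kms", "lambda", "cloudtrail"}


def non_covered_services(workload_services: set[str], handles_phi: bool) -> set[str]:
    """Return services that would process PHI outside the BAA's coverage."""
    if not handles_phi:
        return set()  # the BAA restriction only matters for PHI workloads
    return workload_services - BAA_COVERED


claims_pipeline = {"s3", "lambda", "experimental-ml-preview"}
print(sorted(non_covered_services(claims_pipeline, handles_phi=True)))
# ['experimental-ml-preview']
```

Running a check like this in CI, against the workload's declared services, catches a non-covered service before any PHI reaches it rather than during an audit.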

FinOps Meets Security

Cloud cost management (FinOps) and cloud security intersect in a surprising but important way: abandoned and unmanaged cloud resources are both a cost waste and a security risk. An EC2 instance launched for a proof of concept three years ago, forgotten, and never terminated represents a running bill and an unpatched, unmonitored attack surface. An S3 bucket created for a temporary data migration that was never deleted may still contain sensitive data and lack current security controls.

Regular cloud resource reviews — identifying resources that are unused, untagged, or unowned — serve both cost and security objectives. Enforcing tagging standards (owner, environment, data classification) makes these reviews automatable: any resource without required tags is flagged for review and potential remediation.
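The tagging check described above reduces to a few lines once resources are exported from an inventory source (for example, AWS Config, Azure Resource Graph, or Google Cloud Asset Inventory). The sketch assumes a hypothetical, simplified inventory format of dicts with `id` and `tags` keys; the required tag names follow the owner, environment, and data-classification example in the text, and the case-insensitive key matching is a design choice, not a provider rule.

```python
REQUIRED_TAGS = ("owner", "environment", "data-classification")


def untagged_resources(inventory: list[dict]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to the required tags it is missing."""
    findings = {}
    for resource in inventory:
        # Compare tag keys case-insensitively ("Owner" counts as "owner").
        tags = {k.lower(): v for k, v in resource.get("tags", {}).items()}
        # Treat empty values as missing: an "owner" tag of "" identifies nobody.
        missing = [t for t in REQUIRED_TAGS if not tags.get(t)]
        if missing:
            findings[resource["id"]] = missing
    return findings


inventory = [
    {"id": "i-0abc", "tags": {"Owner": "payments", "Environment": "prod",
                              "Data-Classification": "confidential"}},
    {"id": "i-0poc", "tags": {"Environment": "dev"}},
    {"id": "bucket-migration-tmp", "tags": {}},
]
for resource_id, missing in untagged_resources(inventory).items():
    print(resource_id, "missing:", ", ".join(missing))
```

With this sample inventory, the forgotten proof-of-concept instance `i-0poc` and the leftover `bucket-migration-tmp` are flagged, which are exactly the resources the cost and security reviews both want to find.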

Key Takeaways — Chapter 14
  • Multi-account structure (AWS Org, Azure Management Groups, GCP Resource Hierarchy) enables centrally enforced security policies that individual account owners cannot bypass
  • Guardrails (SCPs, Azure Policy deny, GCP Org constraints) technically prevent non-compliance; guidelines rely on process and are frequently violated — default to guardrails for security-critical requirements
  • Landing zones give every new account a compliant baseline automatically — without them, compliance is a manual effort that slows growth and creates inconsistency
  • HIPAA BAAs with cloud providers cover specific services only — using non-covered services for PHI is a HIPAA violation regardless of the BAA
  • Untagged and unowned cloud resources are both a cost risk and a security risk — tagging enforcement and regular resource reviews serve both programmes
Chapter 15 · ~14 min · Architecture

Cloud Security Architecture Patterns

Well-Architected security pillar, Zero Trust in the cloud, hub-and-spoke, data perimeter, immutable infrastructure, defence in depth, and migration decision framework

Vendor-Neutral · AWS · Azure · GCP

Security architecture is the practice of designing cloud environments so that security is a structural property of the system rather than a layer bolted on top. An architecture where a misconfigured security group exposes an entire production database to the internet has a security design flaw, regardless of how many other security controls are deployed around it. This final chapter covers the architectural patterns that make cloud environments structurally secure — and the decision framework for migrating existing security controls into cloud-native models.

The Well-Architected Security Pillar

AWS's Well-Architected Framework defines six pillars for cloud architecture excellence. The Security Pillar covers seven design principles that are broadly applicable across all three major providers:

  1. Implement a strong identity foundation — least privilege, MFA everywhere, service identities over long-lived credentials, centralised identity management
  2. Enable traceability — log all actions and changes in real time; integrate with SIEM and alerting
  3. Apply security at all layers — defence in depth: network (WAF, security groups), compute (OS hardening, EDR), application (input validation, auth), data (encryption, DLP)
  4. Automate security best practices — IaC for infrastructure, CSPM for continuous compliance, SOAR for automated response
  5. Protect data in transit and at rest — TLS everywhere, encryption at rest with CMKs for sensitive data
  6. Keep people away from data — service-to-service access over human access to data; break-glass procedures for exceptional direct access
  7. Prepare for security events — IR playbooks, forensic readiness (pre-positioned snapshots, log preservation), regular exercises

Zero Trust in the Cloud

Zero Trust architecture — never trust, always verify; least privilege access; assume breach — is well-aligned with cloud architecture principles. In a cloud context, Zero Trust means:

  • Identity as the perimeter — every request is authenticated and authorised regardless of network origin. Even requests from within the VPC require valid credentials and explicit authorisation.
  • Microsegmentation — security groups, NACLs, and network policies limit blast radius. A compromised workload can only reach resources explicitly permitted by policy.
  • Continuous verification — short-lived credentials (STS tokens, managed identity tokens) that expire and are re-issued force continuous re-authentication. Conditional Access (Azure) and resource-based policy conditions (AWS) enforce contextual verification.
  • Assume breach posture — CSPM, SIEM, and runtime detection operate on the assumption that the perimeter has been or will be breached. The goal is to detect and contain, not merely prevent.
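Identity as the perimeter and continuous verification can be made concrete with an authorisation function that never looks at where a request came from, only at who made it, whether their short-lived credential is still valid, and whether policy explicitly permits the action. The sketch below is hypothetical: `Request`, the `POLICY` table, the principal names, and the 15-minute token lifetime are illustrative assumptions, not any provider's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Request:
    principal: str
    action: str
    resource: str
    source_ip: str  # recorded for logging, deliberately ignored for trust
    token_issued_at: datetime


TOKEN_LIFETIME = timedelta(minutes=15)  # assumed short-lived credential lifetime

# Explicit allow list of (principal, action, resource); anything absent is denied.
POLICY = {
    ("svc-orders", "read", "db/orders"),
    ("svc-reports", "read", "bucket/reports"),
}


def authorise(req: Request, now: datetime) -> bool:
    """Default-deny decision that never trusts network location."""
    if now - req.token_issued_at > TOKEN_LIFETIME:
        return False  # expired credential: caller must re-authenticate
    return (req.principal, req.action, req.resource) in POLICY


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = now - timedelta(minutes=5)
inside_vpc = Request("svc-orders", "read", "bucket/reports", "10.0.1.15", fresh)
external = Request("svc-reports", "read", "bucket/reports", "203.0.113.7", fresh)
print(authorise(inside_vpc, now))  # False: internal origin grants nothing
print(authorise(external, now))    # True: explicitly permitted, fresh token
```

The request from a private VPC address is refused because no policy entry covers it, while the request from a public address succeeds because it carries a valid identity with an explicit grant: trust follows identity and policy, not network location.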

Hub-and-Spoke Network Architecture

Hub-and-spoke is the enterprise cloud networking pattern for centralising shared security services while maintaining workload isolation:

Figure: Hub-and-Spoke, Enterprise Cloud Network. A central Hub / Transit VPC (Network Firewall · WAF · DNS · VPN) links the Production, Dev, and Data VPCs (each in its own spoke account), a Security VPC (SIEM · forensics · IR tooling), and on-premises networks (Direct Connect / VPN).

The hub account hosts shared security services — AWS Network Firewall, centralised DNS resolution (with DNS Firewall), NAT Gateways, VPN endpoints, and Direct Connect connections. Spoke accounts contain workloads and route all internet-bound and cross-VPC traffic through the hub. This architecture ensures that all traffic is inspected by centralised security controls regardless of which spoke account it originates in.
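One way to verify that spokes really send traffic through the hub is to audit each spoke route table: apart from local intra-VPC routes, every route should point at the hub's transit attachment, never directly at an internet gateway, NAT gateway, or peering connection. The sketch uses a hypothetical, simplified route-table format; `tgw-hub` and the other target IDs are invented values that only mimic AWS naming prefixes.

```python
import ipaddress

HUB_TARGET = "tgw-hub"  # hypothetical ID of the hub's transit gateway


def bypass_routes(route_tables: dict[str, list[tuple[str, str]]]) -> list[str]:
    """List spoke routes that would send traffic around the hub's inspection."""
    problems = []
    for spoke, routes in route_tables.items():
        for destination, target in routes:
            ipaddress.ip_network(destination)  # raises ValueError on a malformed CIDR
            if target == "local":
                continue  # intra-VPC traffic never leaves the spoke
            if target != HUB_TARGET:
                problems.append(f"{spoke}: {destination} -> {target}")
    return problems


spokes = {
    "prod-vpc": [("10.1.0.0/16", "local"), ("0.0.0.0/0", "tgw-hub")],
    "dev-vpc": [("10.2.0.0/16", "local"), ("0.0.0.0/0", "igw-0dev")],
    "data-vpc": [("10.3.0.0/16", "local"), ("10.1.0.0/16", "pcx-prod-data")],
}
for problem in bypass_routes(spokes):
    print(problem)
```

With this sample data, the check flags the Dev spoke's direct internet route and the Data spoke's peering connection to Production; both send traffic around the hub's firewall, which is exactly what the hub-and-spoke design is meant to prevent.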

Data Perimeter

A data perimeter ensures that your data can only be accessed by trusted identities, from trusted networks, via trusted services. The three controls:

  • Identity perimeter — only identities in your AWS Organisation (or Entra ID tenant) can access your data. Enforced via resource-based policy conditions: "aws:PrincipalOrgID": "o-xxxxxxxxxx" on S3 bucket policies, KMS key policies, and SQS queue policies.
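  • Network perimeter — access to cloud storage only from within your network, enforced with VPC endpoints and their policies. A bucket policy that denies requests not arriving through your VPC endpoint (aws:SourceVpce) stops stolen credentials from being used outside your network, while an S3 gateway endpoint policy that only allows buckets owned by your organisation (aws:ResourceOrgID) stops workloads inside the VPC from exfiltrating data to attacker-controlled S3 buckets.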
  • Service perimeter — VPC Service Controls (GCP) and AWS resource control policies restrict which services and APIs can interact with your data, preventing data from flowing to untrusted services even when the request comes from an authorised identity.
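The identity and network controls above usually meet in a resource policy. The sketch below builds an S3 bucket policy in Python with two Deny statements, one conditioned on `aws:PrincipalOrgID` and one on `aws:SourceVpce`. The organisation ID keeps the text's placeholder; the bucket name and endpoint ID are invented placeholders. AWS's published data perimeter examples add further conditions (for instance, exemptions for AWS service principals via `aws:PrincipalIsAWSService`), so treat this as a starting point rather than a complete policy.

```python
import json

ORG_ID = "o-xxxxxxxxxx"                     # placeholder organisation ID
VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"  # placeholder gateway endpoint ID
BUCKET = "example-sensitive-data"           # placeholder bucket name

bucket_arns = [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"]

data_perimeter_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Identity perimeter: callers must belong to our AWS Organization.
            "Sid": "DenyPrincipalsOutsideOrg",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": bucket_arns,
            "Condition": {"StringNotEquals": {"aws:PrincipalOrgID": ORG_ID}},
        },
        {
            # Network perimeter: requests must arrive through our VPC endpoint,
            # so stolen credentials used from the internet are refused.
            "Sid": "DenyAccessOutsideVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": bucket_arns,
            "Condition": {"StringNotEquals": {"aws:SourceVpce": VPC_ENDPOINT_ID}},
        },
    ],
}

print(json.dumps(data_perimeter_policy, indent=2))
```

As written, the second statement also blocks console and administrative access that does not traverse the endpoint, so production policies usually carve out narrowly scoped exceptions (for example, a break-glass role) before deployment.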

Immutable Infrastructure

Immutable infrastructure is the practice of never modifying running resources — instead, building a new resource from a known-good definition and replacing the old one. Every deployment produces a fresh instance from a signed, scanned base image. No configuration drift. No accumulated configuration changes applied over the instance's lifetime. No running malware that survived a partial remediation.

The security benefits are significant: immutable infrastructure eliminates the class of persistence attacks that modify running systems, ensures every running resource has a known provenance, and makes "re-image from clean baseline" the default recovery action rather than an exception.
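The replace-not-modify workflow can be sketched as a small state transition. A deployment only ever creates instances from an approved image, and any instance not built from the current approved image is treated as drift to be replaced rather than patched. `Instance`, `launch`, `redeploy`, `APPROVED_IMAGE`, and the image identifiers are hypothetical; in practice this logic lives in an autoscaling group, deployment pipeline, or orchestrator. The frozen dataclass mirrors the principle: an instance record cannot be changed after creation.

```python
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # simple source of unique, never-reused instance numbers


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


APPROVED_IMAGE = "img-2024-06-signed"  # hypothetical signed, scanned base image


def launch(image_id: str) -> Instance:
    """Create a fresh instance from an image; never reuse an existing one."""
    return Instance(f"i-{next(_ids):04d}", image_id)


def redeploy(fleet: list[Instance], approved_image: str) -> list[Instance]:
    """Replace every instance not built from the approved image.

    Nothing is modified in place: out-of-date or suspect instances are
    dropped from the fleet (terminated) and replaced with new launches.
    """
    keep = [inst for inst in fleet if inst.image_id == approved_image]
    replaced = [launch(approved_image) for inst in fleet if inst.image_id != approved_image]
    return keep + replaced


fleet = [launch("img-2024-01-old"), launch(APPROVED_IMAGE), launch("img-2024-01-old")]
fleet = redeploy(fleet, APPROVED_IMAGE)
print([inst.image_id for inst in fleet])
# ['img-2024-06-signed', 'img-2024-06-signed', 'img-2024-06-signed']
```

Recovery from a suspected compromise uses the same call: mark the affected instances as untrusted (or retire their image) and redeploy, so "re-image from clean baseline" is simply the normal deployment path.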

Security Architecture Migration Decision Framework

When migrating existing on-premises security controls to the cloud, each control needs a deliberate decision:

On-Premises Control | Cloud Decision | Rationale
Hardware firewall / NGFW | Replace with cloud-native controls (Security Groups + AWS Network Firewall / Azure Firewall) | Cloud-native controls scale automatically; a hardware firewall becomes a bottleneck and a single point of failure
On-premises SIEM (Splunk, QRadar) | Lift-and-shift initially; migrate to a cloud-native SIEM or security data lake (Microsoft Sentinel, Amazon Security Lake) over time | Cloud log volumes and egress costs make sending everything off-cloud expensive; cloud-native SIEMs have native integrations
Hardware HSM for key management | Replace with KMS / Key Vault / Cloud KMS (with CloudHSM or a Premium tier where regulation requires it) | Managed KMS provides HSM-backed key storage at a fraction of the cost and operational overhead of hardware HSMs
Vulnerability scanner (Nessus) | Replace with cloud-native scanners (Amazon Inspector, Defender CSPM) for cloud resources; retain for on-premises | Cloud-native scanners integrate natively with the cloud asset inventory; external scanners miss ephemeral resources
VPN for remote access | Replace with Zero Trust access (BeyondCorp, Microsoft Entra application proxy (formerly Azure AD Application Proxy), AWS Verified Access) | VPN grants network-level access; Zero Trust grants application-level access with continuous verification
DLP appliance | Replace with cloud-native DLP (Amazon Macie, Microsoft Purview DLP, GCP DLP API) for cloud data | Cloud DLP has native visibility into cloud storage without hairpinning traffic through an appliance

Key Takeaways — Chapter 15
  • The Well-Architected Security Pillar's seven principles — from strong identity through security event preparation — form a practical cloud security design checklist
  • Zero Trust in the cloud means identity as the perimeter, microsegmentation, short-lived credentials, and assume-breach detection posture — not a product purchase
  • Hub-and-spoke centralises shared security services (firewall, DNS, NAT) while maintaining workload isolation in spoke accounts/subscriptions
  • Data perimeter (identity + network + service controls) ensures data only leaves through approved paths — even if credentials are compromised
  • Immutable infrastructure eliminates persistence-based attacks and configuration drift — replace, don't patch, running instances wherever possible