IaC RetailPlatform Infrastructure Narrative
Infrastructure Narrative · CloudFormation 7-Stack IaC

RetailPlatform
Infrastructure Narrative

A fully serverless AWS e-commerce platform from browser DNS resolution to analytics warehouse, with zero server management and cost that scales precisely with demand.

Author Rus Teston
Date June 8, 2026
Project RetailPlatform CloudFormation
Reference rus-teston.com/projects/03-public-papers/cloudformation/
8 Request Flow Steps
7 CF Stacks
$369 Est. Monthly Cost
100% Serverless Compute
The problem

The Business Problem

A mid-sized retail company needs a cloud platform that can host its e-commerce storefront, process orders and inventory in real time, protect customer data, and scale automatically during peak events like Black Friday. The platform must also give the business analytics visibility into what is happening across all of it, all without managing servers.

Request flow - step by step
1. The Customer's Browser
Route 53 · Global Accelerator · CloudFront Entry Point

A shopper types store.example.com into their browser. DNS is managed by Amazon Route 53, which resolves the domain to an AWS Global Accelerator anycast IP and returns a CloudFront distribution alias. The browser connects to the nearest CloudFront edge location, which may be hundreds of miles closer to the shopper than the origin Region.

Route 53 · Global Accelerator · CloudFront
2. The Security Perimeter
AWS WAF · 5 Managed Rule Sets · Edge Enforcement

Before the request reaches any application code, AWS WAF evaluates it against five managed rule sets running at the CloudFront edge. Malicious traffic is dropped here. It never reaches the application.

  • IP Reputation List: Blocks known malicious actors, Tor exit nodes, and botnets before they consume any compute.
  • OWASP Common Rule Set: Catches SQL injection, cross-site scripting, and path traversal attempts.
  • Known Bad Inputs: Blocks Log4Shell exploit strings and SSRF probes.
  • SQLi Rule Set: Adds extra protection specifically for product search and order endpoints.
  • Rate Limiter: Blocks any single IP sending more than 2,000 requests per 5-minute window, protecting against scraping and credential stuffing during sales events.
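The rate limiter rule can be pictured as a per-IP sliding window. The sketch below is only an illustration of the 2,000-requests-per-5-minutes behavior described above; the class name and the timestamp-based approach are assumptions, and it does not claim to mirror how AWS WAF counts requests internally.

```python
from collections import defaultdict, deque

LIMIT = 2000          # requests allowed per window (from the text)
WINDOW_SECONDS = 300  # 5-minute window (from the text)


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter used only to illustrate the rule."""

    def __init__(self, limit=LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        """Return True if the request is allowed, False if it is blocked."""
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

With the defaults, the 2,001st request from one IP inside any 5-minute span is rejected, while other IPs are unaffected.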
AWS WAF · IP Reputation · OWASP Rules · SQLi Protection · Rate Limiting
3. CloudFront CDN
Cache Behaviors · Origin Verification · 80% Load Reduction

Clean requests are evaluated by CloudFront's cache behavior rules. CloudFront injects an X-Origin-Verify secret header on every request it forwards to the origin. This is the handshake that proves the request came through CloudFront and not directly from the internet.

  • /static/*: CSS, JavaScript bundles, and web fonts are served from the edge cache with one-year cache headers. Once the first visitor warms an edge location's cache, that location serves the asset without contacting the origin.
  • /images/*: Product photos and category banners are cached at the edge, cutting origin load by roughly 80% during flash sales.
  • /api/*: Product search, cart operations, and checkout bypass the cache entirely, forwarded with all cookies, headers, and query parameters intact for personalized responses.
  • All other requests: Storefront HTML and account pages are forwarded uncached so ECS containers can render personalized content.
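The four behaviors above amount to an ordered list of path patterns with a default at the end. A minimal sketch of that selection logic follows; the behavior labels are hypothetical names chosen for illustration, and `fnmatchcase` stands in for CloudFront's own pattern matching.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, behavior label) pairs; labels are illustrative only.
BEHAVIORS = [
    ("/static/*", "edge-cache-one-year"),
    ("/images/*", "edge-cache"),
    ("/api/*", "no-cache-forward-all"),
]
DEFAULT_BEHAVIOR = "no-cache-default"


def select_behavior(path):
    """Return the label of the first behavior whose pattern matches the path."""
    for pattern, behavior in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return behavior
    return DEFAULT_BEHAVIOR
```

A request for `/static/css/site.css` resolves to the long-lived cache behavior, while `/account` falls through to the default and is forwarded uncached.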
CloudFront · Cache Behaviors · X-Origin-Verify · Edge Cache · ACM / TLS
4. The Application Load Balancer
Header Validation · TLS Termination · HTTP/2 · ECS Target Group

Dynamic requests arrive at the Application Load Balancer sitting in the public subnets of the VPC. The ALB's HTTPS listener checks every incoming request for the X-Origin-Verify header. Any request that lacks the header or carries the wrong value, meaning someone is trying to bypass CloudFront and hit the ALB directly, receives an immediate HTTP 403. As long as the header value stays secret, this closes the ALB to direct internet traffic.
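The header check is conceptually a shared-secret comparison. The sketch below shows the decision in plain Python; the function name and return labels are hypothetical, and in the actual platform this logic would live in ALB listener rules rather than application code.

```python
import hmac


def evaluate_request(headers, expected_secret):
    """Return "forward" for requests carrying the right secret, else "403".

    Header names are compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-origin-verify", "")
    # compare_digest avoids leaking how many leading characters matched.
    if hmac.compare_digest(supplied.encode("utf-8"), expected_secret.encode("utf-8")):
        return "forward"
    return "403"
```

A request that arrives without the header, or with a guessed value, never reaches the ECS target group.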

Valid requests from CloudFront are forwarded to the ECS Fargate target group. The ALB terminates TLS using an ACM-managed certificate so application containers never handle encryption directly. HTTP/2 multiplexing is enabled for efficient connection handling from CloudFront's persistent origin connections.

Application Load Balancer · ACM Certificate · HTTP/2 · HTTP 403 Guard · Public Subnets
5. ECS Fargate Application Tier
Private Subnets · 2–20 Task Auto Scaling · Secrets Manager · Bedrock AI

The request lands on one of the running ECS Fargate tasks in the private application subnets. These subnets have no public IP addresses and no direct internet path. Inbound traffic comes only from the ALB and outbound traffic routes through the NAT Gateways.

The ECS service maintains a minimum of two tasks spread across two Availability Zones. If CPU utilization climbs above 70% or ALB requests per task exceed the configured threshold, Application Auto Scaling launches additional tasks automatically, scaling from 2 to 20 in minutes. When traffic subsides, it scales back down.
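A rough way to reason about the scaling behavior is a proportional calculation clamped to the 2-to-20 task range. The sketch below is a simplification under stated assumptions: it ignores cooldowns and alarm evaluation periods, and `request_target` is a hypothetical stand-in for the "configured threshold" mentioned above.

```python
import math

MIN_TASKS = 2      # from the text
MAX_TASKS = 20     # from the text
CPU_TARGET = 70.0  # percent, from the text


def desired_tasks(current_tasks, cpu_percent, requests_per_task, request_target):
    """Estimate the task count needed to bring both metrics back to target."""
    cpu_ratio = cpu_percent / CPU_TARGET
    request_ratio = requests_per_task / request_target
    # Whichever metric is further over its target drives the scale-out.
    ratio = max(cpu_ratio, request_ratio)
    wanted = math.ceil(current_tasks * ratio)
    return max(MIN_TASKS, min(MAX_TASKS, wanted))
```

For example, four tasks running at 140% of the CPU target would be doubled to eight, while a quiet period never drops the service below two tasks.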

The application container's Aurora database credentials are retrieved from AWS Secrets Manager at task startup and injected as environment variables by the ECS agent, so the application code never handles credential management. The task role grants precisely the permissions it needs and nothing more: write to the data lake, publish to EventBridge, query DynamoDB, and call Bedrock for AI-powered product recommendations.
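From the application's point of view, the injected secret is just an environment variable to parse. The sketch below assumes a variable named `DB_SECRET` holding a JSON document with `host`, `port`, `username`, and `password` keys; both the variable name and the key layout are assumptions, not details taken from the templates.

```python
import json
import os


def load_db_credentials(env=os.environ, var_name="DB_SECRET"):
    """Parse database credentials from an injected environment variable.

    DB_SECRET and the JSON key names are hypothetical placeholders.
    """
    raw = env.get(var_name)
    if raw is None:
        raise RuntimeError(f"{var_name} is not set; was the secret injected?")
    secret = json.loads(raw)
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),  # PostgreSQL default port
        "username": secret["username"],
        "password": secret["password"],
    }
```

The application never calls Secrets Manager itself here; it only reads what the platform placed in its environment.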

ECS Fargate · Auto Scaling · Secrets Manager · Multi-AZ · Bedrock AI · NAT Gateway · Cloud Map
6. The Data Tier
ElastiCache Redis · Aurora Serverless v2 · 0.5–16 ACUs · KMS Encryption

For read-heavy operations such as product listings, search results, and user session data, the application checks ElastiCache Serverless Redis first. ElastiCache sits in the isolated private data subnets alongside Aurora, accessible only from the application and Lambda security groups. A cache hit returns data in under a millisecond and never touches Aurora at all.

On a cache miss, the application queries the Aurora Serverless v2 PostgreSQL cluster. Read operations, including product catalog browsing and inventory availability checks, go to the Aurora reader endpoint, which routes connections to the reader instance. Write operations, including order creation, payment processing, and inventory updates, go to the writer endpoint, ensuring strong consistency.
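The read path described above is the cache-aside pattern combined with a reader/writer split. The sketch below uses a plain dict in place of Redis and two callables in place of the Aurora endpoints; the class, method, and key names are hypothetical, and invalidating on write is one reasonable policy rather than a confirmed detail of the application.

```python
class CatalogRepository:
    """Cache-aside reads plus writer-endpoint updates (illustrative only)."""

    def __init__(self, cache, read_fn, write_fn):
        self.cache = cache        # stand-in for ElastiCache (dict-like)
        self.read_fn = read_fn    # stand-in for a query on the reader endpoint
        self.write_fn = write_fn  # stand-in for a statement on the writer endpoint

    def get_product(self, product_id):
        key = f"product:{product_id}"
        if key in self.cache:
            return self.cache[key]          # cache hit: the database is not touched
        row = self.read_fn(product_id)      # cache miss: read from the reader
        if row is not None:
            self.cache[key] = row           # populate the cache for later readers
        return row

    def update_stock(self, product_id, stock):
        self.write_fn(product_id, stock)    # writes always go to the writer
        self.cache.pop(f"product:{product_id}", None)  # drop the stale entry
```

Repeated reads of the same product hit the database once; a stock update forces the next read to fetch fresh data.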

Aurora Serverless v2 scales compute automatically between 0.5 and 16 ACUs based on actual load. During a flash sale it scales up in fine-grained increments, as small as 0.5 ACU. At 3am it scales down to minimum capacity. Database cost tracks what the workload requires at any given moment.

All data at rest including Aurora, ElastiCache, S3, SQS, CloudWatch Logs, and Secrets Manager is encrypted with a single KMS Customer Managed Key that rotates annually. The KMS key is the cryptographic root of trust for the entire platform.

ElastiCache Serverless · Aurora Serverless v2 · PostgreSQL · KMS CMK · S3 · AWS Backup
7. The Event-Driven Tier
EventBridge · SQS + DLQ · Lambda · SNS Fanout · 90-Day Archive

When a customer places an order, the ECS application publishes an OrderPlaced event to the RetailEventBus, a custom Amazon EventBridge event bus dedicated to retail domain events. This is the architectural moment where the synchronous customer-facing request ends and asynchronous processing begins. The customer gets an immediate confirmation while the platform processes the order in the background.

EventBridge evaluates the event against three routing rules simultaneously:

  • Order Processing Rule: Matches OrderPlaced and PaymentProcessed events. The order processor Lambda picks up the message, writes the order to Aurora, and archives the event payload to the S3 data lake.
  • Inventory Update Rule: Matches InventoryReserved events. The inventory handler Lambda decrements available stock in Aurora and invalidates the affected product's cache entry in ElastiCache so the next shopper sees accurate availability.
  • Customer Notification Rule: Matches OrderPlaced and OrderShipped events. The notification Lambda publishes to the SNS order notifications topic, which fans out to email confirmations, SMS shipping alerts, or push notifications.
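The three rules above can be read as a lookup from event type to matching rules, where one event may match more than one rule. The sketch below assumes the rules match on the event's `detail-type` field and uses hypothetical rule identifiers; the real rule patterns live in the events stack.

```python
# Rule name -> event types it matches (types from the text, names hypothetical).
RULES = {
    "OrderProcessingRule": {"OrderPlaced", "PaymentProcessed"},
    "InventoryUpdateRule": {"InventoryReserved"},
    "CustomerNotificationRule": {"OrderPlaced", "OrderShipped"},
}


def matching_rules(event):
    """Return every rule whose event types include this event's detail-type."""
    detail_type = event.get("detail-type")
    return [name for name, types in RULES.items() if detail_type in types]
```

An OrderPlaced event matches both the order processing and customer notification rules, which is why the order is persisted and the confirmation is sent independently of each other.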

Each SQS queue has a dead-letter queue. If a Lambda fails to process a message after three retries, the message moves to the DLQ and a CloudWatch alarm fires immediately, giving the team 14 days to investigate and replay. The EventBridge archive retains all retail domain events for 90 days, enabling full state reconstruction after any outage or discovered bug.
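The dead-letter behavior can be simulated in a few lines. This sketch treats "three retries" as a maximum of three delivery attempts per message, which is an assumption about the queue's redrive setting; the function name is hypothetical and no AWS APIs are called.

```python
MAX_RECEIVE_COUNT = 3  # assumed reading of "three retries" in the text


def process_with_dlq(messages, handler, max_receive_count=MAX_RECEIVE_COUNT):
    """Try each message up to max_receive_count times, then dead-letter it."""
    processed, dead_letters = [], []
    for message in messages:
        for _attempt in range(max_receive_count):
            try:
                handler(message)
            except Exception:
                continue  # simulate the message becoming visible again
            processed.append(message)
            break
        else:
            # Every attempt failed: park the message for investigation and replay.
            dead_letters.append(message)
    return processed, dead_letters
```

A handler that fails transiently still succeeds on a later attempt, while a poison message ends up in the dead-letter list instead of blocking the queue.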

EventBridge · SQS · Dead-Letter Queue · Lambda · SNS · Event Archive · CloudWatch Alarms
8. The Analytics Tier
S3 Data Lake · Glue ETL · Athena · Redshift Serverless · CloudWatch Dashboard

In parallel with all customer-facing activity, event payloads are continuously written to the S3 data lake under prefixes organized by domain: raw/orders/, raw/inventory/, raw/products/. The S3 bucket has EventBridge notifications enabled so every new object PUT triggers an event on the default bus.

The crawler trigger Lambda starts the appropriate AWS Glue crawler for the affected prefix. The crawler inspects new JSON files, infers the schema, and updates table definitions in the Glue Data Catalog automatically, without any manual schema management.
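The crawler trigger reduces to mapping an object key to the crawler that owns its prefix. In the sketch below the crawler names are hypothetical, the event shape assumes the S3 "Object Created" format delivered through EventBridge, and the actual Glue call is left out so the logic stays testable on its own.

```python
# Prefixes come from the text; the crawler names are hypothetical.
CRAWLERS = {
    "raw/orders/": "orders-crawler",
    "raw/inventory/": "inventory-crawler",
    "raw/products/": "products-crawler",
}


def crawler_for_key(object_key):
    """Return the crawler responsible for this key, or None if out of scope."""
    for prefix, crawler in CRAWLERS.items():
        if object_key.startswith(prefix):
            return crawler
    return None


def handle_s3_event(event):
    """Pick the crawler for an S3 object-created event.

    A real handler would go on to start the selected crawler through the
    Glue API; that step is omitted here.
    """
    key = event["detail"]["object"]["key"]
    return crawler_for_key(key)
```

Objects written outside the three raw prefixes, such as curated output, return None and trigger nothing.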

A daily Glue ETL job reads the raw JSON files, applies type casting and field normalization, and writes compressed Parquet files to the curated/ prefix partitioned by date. These Parquet files feed two distinct query paths:

  • Amazon Athena: Queries curated Parquet directly from S3 using the Glue Data Catalog as its schema source. Analysts write standard SQL and pay only for the data scanned, typically fractions of a cent per query.
  • Redshift Serverless: Provides a full SQL data warehouse for complex aggregations, multi-table joins, and BI tool connections. Enhanced VPC routing ensures all data transfer flows through the S3 Gateway VPC endpoint, never touching the public internet, and never incurring NAT Gateway charges.
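The type casting, field normalization, and date partitioning performed by the daily ETL job can be illustrated on a single record. Everything specific in this sketch is an assumption: the raw field names (`orderId`, `total`, `placedAt`), the cents conversion, and the Hive-style `date=` partition layout under curated/. The real job runs in Glue and writes Parquet, which this example does not attempt.

```python
from datetime import datetime, timezone
from decimal import Decimal


def normalize_order(raw):
    """Cast and rename fields of one raw order record (hypothetical schema)."""
    placed_at = datetime.fromisoformat(raw["placedAt"]).astimezone(timezone.utc)
    return {
        "order_id": str(raw["orderId"]).strip(),
        # Decimal avoids float rounding when converting currency to cents.
        "total_cents": int(Decimal(str(raw["total"])) * 100),
        "placed_at": placed_at.isoformat(),
        "order_date": placed_at.date().isoformat(),
    }


def curated_key(record, filename):
    """Build a date-partitioned object key under the curated/ prefix."""
    return f"curated/orders/date={record['order_date']}/{filename}"
```

Normalizing timestamps to UTC before deriving the partition keeps an order placed late in the evening in one time zone from landing in two different date partitions depending on who wrote it.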

The CloudWatch operational dashboard brings all visibility into a single view covering ECS task health, ALB request rates, Aurora connections, ElastiCache hit rates, SQS queue depths, DLQ message counts, and Lambda error rates.

S3 Data Lake · AWS Glue · Glue Data Catalog · Amazon Athena · Redshift Serverless · CloudWatch Dashboard · Parquet
Architecture benefits

What This Architecture Delivers

⚙️
Zero Server Management
ECS Fargate, Aurora Serverless, ElastiCache Serverless, Lambda, and Redshift Serverless all scale without patching, sizing, or capacity planning.
🔒
Defense in Depth
WAF at the edge, security group isolation between every tier, KMS encryption on every data store, and no public IPs on application or data resources.
🛡️
Resilience by Default
Multi-AZ across every tier, Application Auto Scaling on compute, automated failover on Aurora, and dead-letter queues on every event stream.
📉
Cost That Matches Demand
Serverless tiers scale down to their configured minimums when idle. CloudFront absorbs static asset load so origin compute stays small. Data lake storage automatically tiers to cheaper storage classes over time.
📋
Full Auditability
VPC Flow Logs, CloudTrail, WAF logs, EventBridge archive, and DLQ replay give the operations team complete visibility into everything that happened, with the ability to reconstruct any state.
Financial

Monthly Cost Estimate

Estimated for light-to-moderate traffic. Serverless tiers scale with actual usage. Costs rise during peak events and fall during off-hours.

Stack | Primary Services | Monthly Cost
Networking | NAT Gateways, VPC Flow Logs | ~$70
Security | KMS CMK, Secrets Manager | ~$2
Ingress | WAF, CloudFront, ALB | ~$42
Application | ECS Fargate, ECR, CloudWatch | ~$34
Data | Aurora, ElastiCache, S3, Backup | ~$145
Events | EventBridge, SQS, Lambda, SNS | ~$3
Analytics | Glue, Redshift Serverless, CloudWatch | ~$73
Total | | ~$369 / month
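As a quick check of the arithmetic, the per-stack estimates can be summed and the largest contributor identified. The figures below are copied from the table; the helper name is hypothetical.

```python
# Estimated monthly cost per stack in USD, as listed in the table.
MONTHLY_COST_USD = {
    "Networking": 70,
    "Security": 2,
    "Ingress": 42,
    "Application": 34,
    "Data": 145,
    "Events": 3,
    "Analytics": 73,
}

TOTAL_USD = sum(MONTHLY_COST_USD.values())


def largest_stack(costs=MONTHLY_COST_USD):
    """Return the stack with the highest estimated monthly cost."""
    return max(costs, key=costs.get)
```

The seven estimates add up to the stated total of 369, with the Data stack as the largest single line item.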
Deployment

CloudFormation Stack Reference

File | Stack | Key Resources
01-networking.yaml | Networking | VPC, Subnets, NAT Gateways, VPC Endpoints, Flow Logs
02-security.yaml | Security | KMS CMK, Security Groups, Secrets Manager
03-ingress.yaml | Ingress | WAF, CloudFront, ALB, ACM, Route 53
04-application.yaml | Application | ECS Fargate, ECR, Auto Scaling, Cloud Map
05-data.yaml | Data | Aurora Serverless v2, ElastiCache, S3, AWS Backup
06-events.yaml | Events | EventBridge, SQS, Lambda, SNS
07-analytics.yaml | Analytics | Glue, Redshift Serverless, CloudWatch Dashboard

Deploy order: Stacks must be deployed in sequence. Each stack imports outputs from its predecessors via CloudFormation cross-stack references.

1 → 2 → 3 → 4 → 5 → 6 → 7

Required capability: --capabilities CAPABILITY_NAMED_IAM on all stacks.

Stack 3 note: Two-phase deployment. Deploy the WAF resources to us-east-1 first, since web ACLs attached to CloudFront must be created in that region, then deploy the CDN resources to the primary region.
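The deploy order and required capability can be captured in a small script that builds one `aws cloudformation deploy` command per template. This is a sketch under assumptions: the stack-name prefix `retail` is hypothetical, the commands are only constructed and not executed, and the two-phase handling of stack 3 is not modeled.

```python
# Template files and stack labels in the required deploy order (from the table).
STACKS = [
    ("01-networking.yaml", "networking"),
    ("02-security.yaml", "security"),
    ("03-ingress.yaml", "ingress"),
    ("04-application.yaml", "application"),
    ("05-data.yaml", "data"),
    ("06-events.yaml", "events"),
    ("07-analytics.yaml", "analytics"),
]


def deploy_commands(prefix="retail"):
    """Build the CLI commands in order; the stack-name prefix is hypothetical."""
    return [
        [
            "aws", "cloudformation", "deploy",
            "--template-file", template,
            "--stack-name", f"{prefix}-{label}",
            "--capabilities", "CAPABILITY_NAMED_IAM",
        ]
        for template, label in STACKS
    ]
```

Each command could be passed to `subprocess.run(..., check=True)` in sequence so that a failed stack stops the rollout before dependent stacks try to import its outputs.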