
Common Optimisation Patterns

The following patterns emerge frequently when running AI Recommendations against Lambda-based API architectures. They are presented in the order they are typically applied.


Pattern 1 — Add CloudFront for API Caching

Trigger: High API Gateway RPS with repetitive request patterns, or latency budget pressure on read-heavy endpoints.

What to do: Accept the Add CloudFront recommendation. Configure the CloudFront distribution in the Node Configuration panel with appropriate cache TTLs per endpoint type. Re-simulate — you should see API Gateway RPS drop and estimated cost reduce.

Why it works: CloudFront caches responses at edge locations globally. For a recommendation API where many users request the same popular items, a high percentage of requests can be served from the edge without ever reaching API Gateway or Lambda. This reduces both latency (edge is geographically closer to the user) and cost (fewer Lambda invocations).


Pattern 2 — Enable Lambda Provisioned Concurrency

Trigger: Lambda latency spikes visible under Spike traffic pattern; cold start behaviour under burst load; Lambda cold start error rate above 1–2% at target RPS.

What to do: Accept the Enable Lambda Provisioned Concurrency recommendation. In the Lambda Node Configuration, set provisioned concurrency to a value that covers your expected burst floor — typically 20–30% of your peak concurrency. Re-simulate with a Spike pattern and confirm that p99 latency stabilises.

Why it works: Provisioned concurrency pre-initialises execution environments, keeping them warm and ready to respond immediately. Without it, Lambda must provision a new environment for each concurrent request that exceeds the warm pool — this provisioning time is the cold start. At 10,000 RPS with provisioned concurrency of 1,000, the first 1,000 concurrent requests are handled instantly; requests beyond that trigger cold starts.
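The sizing above can be sketched with Little's law (concurrency ≈ arrival rate × average duration). The 100 ms average duration below is an illustrative assumption; substitute your own simulated latency:

```python
import math

def peak_concurrency(rps: float, avg_duration_s: float) -> int:
    """Approximate peak concurrent executions via Little's law: L = lambda * W."""
    return math.ceil(rps * avg_duration_s)

def provisioned_floor(peak: int, fraction: float = 0.25) -> int:
    """Provisioned concurrency covering the burst floor (~20-30% of peak)."""
    return math.ceil(peak * fraction)

# Illustrative: 10,000 RPS at an assumed 100 ms average function duration
peak = peak_concurrency(10_000, 0.1)
print(peak, provisioned_floor(peak))  # → 1000 250
```

Note the longer your function's average duration, the more concurrency the same RPS consumes — which is why slow downstream calls (Pattern 3) amplify cold start pressure.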

Restart required after concurrency changes

After changing Lambda concurrency settings, stop and restart the simulation rather than resuming. Concurrency changes are applied at simulation initialisation — resuming a paused run will not reflect the updated warm pool size.

Account concurrency limit

The default Lambda concurrency limit per region per AWS account is 1,000 concurrent executions. Up to 900 can be reserved across all functions (100 is held as unreserved headroom). If your architecture requires more than 1,000 concurrent executions, you must request a quota increase from AWS — this cannot be simulated away.


Pattern 3 — Introduce Circuit Breaker Pattern

Trigger: Lambda calling downstream services (RDS, DynamoDB, third-party APIs) without failure isolation; cascading failure risk when a downstream service degrades.

What to do: Accept the Introduce Circuit Breaker recommendation. This adds protective logic that stops cascading failures when a downstream service degrades. Verify the circuit breaker thresholds in the Lambda configuration match your downstream SLAs.

Why it works: Without a circuit breaker, if DynamoDB is throttling or a downstream API is slow, Lambda functions will queue up waiting for responses, consuming concurrency and eventually failing in bulk. A circuit breaker detects the downstream degradation and fails fast, freeing concurrency for requests that can succeed.
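The fail-fast behaviour can be sketched as a minimal state machine. This is a simplified illustration of the pattern, not the tool's actual implementation — thresholds and cooldowns are placeholder values:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED until a failure threshold is hit,
    then OPEN (fail fast) until a cooldown allows a HALF_OPEN probe."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is CLOSED

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cooldown, allow a probe request (HALF_OPEN)
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(failure_threshold=3, cooldown_s=10.0)
for _ in range(3):
    breaker.record_failure()
print(breaker.allow_request())  # → False (open: fail fast, free concurrency)
```

When `allow_request()` returns False, the Lambda returns an error (or a cached fallback) immediately instead of holding a connection to the degraded downstream service.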


Pattern 4 — Implement Asynchronous Processing via SQS

Trigger: Synchronous Lambda invocations on write paths; Lambda timeout risk when calling slow downstream services; Lambda throttling at target RPS that cannot be resolved by increasing concurrency.

What to do: Accept the Implement Asynchronous Processing recommendation. SQS is added between API Gateway and Lambda, decoupling the write path. Adjust the SQS visibility timeout in Node Configuration to exceed your Lambda timeout with margin.

Why it works: SQS absorbs incoming requests into a queue, smoothing the request rate that Lambda sees. API Gateway can immediately acknowledge the incoming request (returning a 202 Accepted), while Lambda consumes from the queue at its own pace. This eliminates the direct concurrency pressure on Lambda and prevents request drops during traffic spikes.
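The smoothing effect can be sketched with a toy backlog simulation. The arrival and drain rates are illustrative assumptions:

```python
def simulate_queue(arrivals_per_s: list, drain_per_s: int) -> list:
    """Queue backlog at each second: bursts accumulate in the queue while
    Lambda drains at a fixed rate, so downstream load stays flat."""
    backlog, depths = 0, []
    for arrivals in arrivals_per_s:
        backlog = max(0, backlog + arrivals - drain_per_s)
        depths.append(backlog)
    return depths

# Illustrative spike: 5 s at 2,000 msg/s into a consumer draining 1,000 msg/s
print(simulate_queue([2000] * 5 + [0] * 5, 1000))
# → [1000, 2000, 3000, 4000, 5000, 4000, 3000, 2000, 1000, 0]
```

The backlog grows during the spike and drains afterwards — Lambda never sees more than 1,000 msg/s, so no requests are dropped.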

Visibility timeout rule

The SQS visibility timeout controls how long a message is hidden from other consumers while being processed. Set it to at least 1.5× your Lambda function's timeout. If Lambda takes up to 30 seconds to process a message, set visibility timeout to at least 45 seconds to prevent the same message being delivered twice.
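The 1.5× rule is simple enough to encode directly — a small sketch of the calculation described above:

```python
import math

def visibility_timeout_s(lambda_timeout_s: int, factor: float = 1.5) -> int:
    """Minimum SQS visibility timeout as a multiple of the Lambda timeout."""
    return math.ceil(lambda_timeout_s * factor)

print(visibility_timeout_s(30))  # → 45
```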


Pattern 5 — Configure Lambda Auto Scaling

Trigger: Concurrency utilisation approaching 100% under Ramp or Spike tests; risk of throttling at burst; architecture needs to handle variable load without manual concurrency management.

What to do: Accept the Configure Auto Scaling recommendation. Set concurrency scaling targets in Lambda Node Configuration to keep utilisation within a safe operating range — typically 60–70% of reserved concurrency — at your expected peak RPS.


Pattern 6 — Address API Gateway as a Throttle Bottleneck

Trigger: API Gateway throttling that persists even after Lambda concurrency has been addressed; target RPS approaching or exceeding API Gateway's per-account limits; API Gateway identified as the sole remaining bottleneck at very high RPS (100K+).

What to do: This pattern requires combining multiple changes:

  1. Add CloudFront before API Gateway (if not already present). CloudFront absorbs cacheable requests before they reach API Gateway, reducing the effective RPS that API Gateway must handle.
  2. Switch the write path to async processing. Move from synchronous Lambda invocation (API Gateway → Lambda) to async buffering (API Gateway → SQS → Lambda). API Gateway's role becomes accepting requests into the queue rather than waiting for Lambda to respond — this dramatically reduces API Gateway's concurrency exposure.
  3. Increase the API Gateway throttle limit in the Stage settings (Rate limit per second). This is a soft quota that can be increased via AWS Support.
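The concurrency-exposure argument in step 2 can be sketched with Little's law: the number of in-flight requests API Gateway holds equals RPS × integration latency. The latencies below are illustrative assumptions — 200 ms waiting on a synchronous Lambda versus roughly 10 ms to enqueue onto SQS and return 202 Accepted:

```python
def inflight_requests(rps: float, integration_latency_s: float) -> float:
    """Concurrent in-flight API Gateway requests (Little's law: L = lambda * W)."""
    return rps * integration_latency_s

print(inflight_requests(50_000, 0.200))  # → 10000.0 (synchronous Lambda)
print(inflight_requests(50_000, 0.010))  # → 500.0 (async enqueue to SQS)
```

Same RPS, roughly twentieth the concurrency exposure — which is why moving the write path to async buffering matters more than raising the throttle limit.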
Why API Gateway is the hardest bottleneck to fix

Unlike Lambda concurrency (which can be increased by provisioning) or DynamoDB (which scales automatically in on-demand mode), API Gateway's request limits are account-level quotas with fixed defaults. At very high RPS — 100K+ per second — even a well-tuned API Gateway will throttle, and the correct architectural response is to reduce the synchronous load it carries, not simply to increase its limit.


Pattern 7 — SNS Fan-Out for Multi-Region Async Processing

Trigger: Architecture needs to process events in multiple AWS regions simultaneously; a single SQS queue is becoming a throughput bottleneck; active-active multi-region processing is required.

What to do: Place an SNS topic between your API Gateway (or other event source) and your SQS queues. Configure each regional SQS queue as a subscription to the SNS topic. SNS will fan out each incoming message to all subscribed queues simultaneously.

Example topology:

API Gateway → SNS Topic
    ↳ SQS Queue (Region 1) → Lambda (Region 1) → DynamoDB (Region 1)
    ↳ SQS Queue (Region 2) → Lambda (Region 2) → DynamoDB (Region 2)

Configuration considerations:

  • Use standard SNS topics for recommendation and analytics workloads where strict message ordering is not required. Use FIFO topics only when strict ordering must be preserved — FIFO adds cost and reduces maximum throughput.
  • Configure SNS subscription filter policies on each SQS subscription if your downstream Lambda needs to filter messages (e.g. only processing events relevant to a specific region or segment). Filtering at SNS is cheaper and cleaner than filtering inside Lambda.
  • The SQS visibility timeout must exceed your Lambda function's timeout on both regional queues.
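The fan-out and filter-policy behaviour can be sketched with a toy in-memory model. This is a simplified illustration of SNS semantics, not a client for the real service — queue names and the `region` attribute are hypothetical:

```python
class ToyTopic:
    """Toy SNS fan-out: publish delivers a copy of each message to every
    subscribed queue whose filter policy matches the message attributes."""

    def __init__(self):
        self.subscriptions = []  # (queue, filter_policy) pairs

    def subscribe(self, queue, filter_policy=None):
        self.subscriptions.append((queue, filter_policy or {}))

    def publish(self, message, attributes):
        for queue, policy in self.subscriptions:
            # SNS-style exact-match filter: every policy key must match
            # one of its allowed values in the message attributes
            if all(attributes.get(k) in allowed for k, allowed in policy.items()):
                queue.append(message)

topic = ToyTopic()
eu_queue, us_queue = [], []
topic.subscribe(eu_queue, {"region": ["eu-west-1"]})
topic.subscribe(us_queue, {"region": ["us-east-1"]})
topic.publish("order-123", {"region": "eu-west-1"})
print(eu_queue, us_queue)  # → ['order-123'] []
```

With no filter policy a subscription receives every message, which is the plain fan-out case in the topology above; the filter policies model the per-region routing described in the second bullet.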