Serverless Architecture
Serverless computing lets you run code without provisioning or managing servers. You upload a function, define a trigger, and the cloud provider handles scaling, patching, and availability. You pay only for the compute time you actually use — down to the millisecond. But "serverless" does not mean there are no servers. It means you do not think about them. The tradeoff is control: you give up control over the execution environment in exchange for zero operational overhead.
The Execution Model
How Lambda Actually Works
The Lambda Lifecycle
What Happens During Init
// Everything OUTSIDE the handler runs during Init (cold start)
import { DynamoDB } from '@aws-sdk/client-dynamodb';
import { SSMClient, GetParameterCommand } from '@aws-sdk/client-ssm';
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// Shape of the stored config; the actual fields depend on your application
type DatabaseConfig = Record<string, unknown>;

// These run once per cold start: expensive, but reused by warm invocations
const dynamodb = new DynamoDB({ region: 'us-east-1' });
const ssm = new SSMClient({ region: 'us-east-1' });

// Pre-fetch config during init
let dbConfig: DatabaseConfig;
async function initConfig(): Promise<void> {
  const param = await ssm.send(new GetParameterCommand({
    Name: '/myapp/db-config',
    WithDecryption: true,
  }));
  dbConfig = JSON.parse(param.Parameter!.Value!);
}
const configPromise = initConfig();

// The handler runs on every invocation, so keep it fast
export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  await configPromise; // Ensure init is complete
  const result = await dynamodb.getItem({
    TableName: 'orders',
    Key: { id: { S: event.pathParameters!.id! } },
  });
  // getItem returns no Item when the key does not exist
  if (!result.Item) {
    return { statusCode: 404, body: JSON.stringify({ message: 'Order not found' }) };
  }
  return {
    statusCode: 200,
    body: JSON.stringify(result.Item),
  };
}
Cold Starts: Real Numbers
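Cold starts are the most debated serverless topic. The ranges below are typical figures rather than guarantees: actual values depend on package size, memory setting, and region, so measure your own functions before relying on them.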
Cold Start Duration by Runtime
| Runtime | Cold Start (p50) | Cold Start (p99) | Notes |
|---|---|---|---|
| Node.js 20 | 150-300ms | 500-800ms | Fastest for most workloads |
| Python 3.12 | 200-400ms | 600-1000ms | Slightly slower than Node |
| Go | 80-150ms | 200-400ms | Compiled binary, fastest overall |
| Java 21 (plain) | 3-8s | 8-12s | JVM startup is brutal |
| Java 21 (SnapStart) | 200-400ms | 500-800ms | Snapshots JVM state pre-init |
| Rust | 50-100ms | 150-300ms | Compiled, no runtime overhead |
| .NET 8 | 400-800ms | 1-2s | AOT compilation helps |
| .NET 8 (AOT) | 150-300ms | 400-600ms | Ahead-of-time compilation |
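To collect numbers like these for your own functions, a handler can tag each invocation as cold or warm. The sketch below assumes only that module scope runs once per execution environment; `isColdStart`, `markInvocation`, and the log shape are illustrative names, not part of any Lambda API.

```typescript
// Module scope runs once per execution environment, so this flag is true
// only for the first invocation that the environment serves.
let isColdStart = true;
const initFinishedAt = Date.now();

interface InvocationInfo {
  coldStart: boolean;
  msSinceInit: number;
}

// Hypothetical helper: call it at the top of the handler.
function markInvocation(now: number = Date.now()): InvocationInfo {
  const info = { coldStart: isColdStart, msSinceInit: now - initFinishedAt };
  isColdStart = false;
  return info;
}

async function handler(): Promise<{ statusCode: number; body: string }> {
  const info = markInvocation();
  // One structured log line per invocation makes cold starts easy to count.
  console.log(JSON.stringify({ metric: 'invocation', ...info }));
  return { statusCode: 200, body: JSON.stringify(info) };
}
```

This only separates cold invocations from warm ones. The init time itself appears as `Init Duration` in the `REPORT` log line that Lambda writes for cold invocations.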
What Affects Cold Start Duration
| Factor | Impact | Mitigation |
|---|---|---|
| Package size | +100ms per 10MB | Tree-shake, use layers, exclude dev deps |
| Dependencies loaded | +50-200ms per heavy dependency | Lazy-load, use lighter alternatives |
| VPC attachment | +200-500ms (used to be 10s+) | Now uses Hyperplane ENI, much faster |
| Runtime | 50ms (Rust) to 8s (Java) | Choose runtime wisely for latency-sensitive functions |
| Memory allocated | More memory = proportionally more CPU = faster init | 512MB-1024MB is sweet spot for most |
| Init code complexity | DB connections, SDK init, config fetching | Move to init phase, cache in global scope |
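The lazy-load mitigation in the table can be sketched as a memoized loader: the expensive work runs on first use rather than during Init, and warm invocations reuse the result. `lazy` and `loadPdfRenderer` are hypothetical names, and the renderer is a stub. In a real function the loader would typically wrap a dynamic `import()` of the heavy dependency.

```typescript
// Wrap an expensive loader so it runs at most once per execution environment.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached !== undefined) return cached;
    const pending = load().catch((err: unknown) => {
      cached = undefined; // forget a failed load so a later call can retry
      throw err;
    });
    cached = pending;
    return pending;
  };
}

// Hypothetical heavy dependency, stubbed so the sketch is self-contained.
let loadCount = 0;
const loadPdfRenderer = lazy(async () => {
  loadCount += 1;
  return { render: (text: string) => `<pdf>${text}</pdf>` };
});

async function handler(event: { path: string; text?: string }) {
  if (event.path !== '/export') {
    return { statusCode: 200, body: 'ok' }; // this path never pays the load cost
  }
  const renderer = await loadPdfRenderer();
  return { statusCode: 200, body: renderer.render(event.text ?? '') };
}
```

The tradeoff is that the first request to hit the lazy path absorbs the load time, so this suits dependencies that only some requests need.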
Provisioned Concurrency
For latency-sensitive workloads, provisioned concurrency keeps instances warm.
# serverless.yml — provisioned concurrency config
functions:
  api:
    handler: src/handler.main
    runtime: nodejs20.x
    memorySize: 1024
    timeout: 30
    provisionedConcurrency: 10 # Always 10 warm instances
    events:
      - http:
          path: /orders/{id}
          method: get
# Auto-scaling provisioned concurrency
resources:
  Resources:
    ApiProvisionedConcurrencyTarget:
      Type: AWS::ApplicationAutoScaling::ScalableTarget
      Properties:
        MinCapacity: 5
        MaxCapacity: 50
        ResourceId: !Sub "function:${self:service}-${sls:stage}-api:current"
        ScalableDimension: lambda:function:ProvisionedConcurrency
        ServiceNamespace: lambda
    ApiProvisionedConcurrencyScaling:
      Type: AWS::ApplicationAutoScaling::ScalingPolicy
      Properties:
        PolicyName: utilization
        PolicyType: TargetTrackingScaling
        ScalingTargetId: !Ref ApiProvisionedConcurrencyTarget
        TargetTrackingScalingPolicyConfiguration:
          TargetValue: 0.7 # Scale when 70% of provisioned is in use
          PredefinedMetricSpecification:
            PredefinedMetricType: LambdaProvisionedConcurrencyUtilization
Cost of provisioned concurrency:
| Setting | Monthly Cost (us-east-1) | Explanation |
|---|---|---|
| 10 provisioned, 128MB | ~$14/month | Baseline for always-warm (provisioned charge only, at $0.0000041667 per GB-second) |
| 10 provisioned, 1024MB | ~$110/month | Higher memory = higher cost |
| 50 provisioned, 512MB | ~$274/month | Approaching EC2 cost territory |
Event Sources
Lambda is triggered by events from dozens of AWS services. The event source determines the invocation pattern.
| Pattern | Retry Behavior | Error Handling |
|---|---|---|
| Synchronous | Client retries | Return error to caller |
| Asynchronous | 2 automatic retries, then DLQ | Configure DLQ or on-failure destination |
| Stream/Queue | Retries until record expires or succeeds | Configure bisect on batch failure, max retries |
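The "bisect on batch failure" option can be hard to picture. The sketch below simulates the idea locally: when a batch fails, it is split in half and each half is retried, which narrows a failure down to the offending record. This is a simplified model for illustration, not Lambda's implementation. `processWithBisect` and the handler callback are hypothetical names, and real event source mappings also apply retry and record-age limits.

```typescript
// A batch handler either succeeds for the whole batch or throws.
type BatchHandler<T> = (batch: T[]) => void;

// Returns the records that still fail when processed on their own.
function processWithBisect<T>(batch: T[], handler: BatchHandler<T>): T[] {
  if (batch.length === 0) return [];
  try {
    handler(batch);
    return [];
  } catch {
    if (batch.length === 1) return batch; // isolated the poison record
    const mid = Math.floor(batch.length / 2);
    return [
      ...processWithBisect(batch.slice(0, mid), handler),
      ...processWithBisect(batch.slice(mid), handler),
    ];
  }
}

const processed: number[] = [];
const poison = processWithBisect([1, 2, 3, -4, 5, 6], (batch) => {
  // Fail the whole batch if any record is invalid, before doing any work.
  if (batch.some((n) => n < 0)) throw new Error('bad record');
  processed.push(...batch);
});
console.log(poison);    // [ -4 ]
console.log(processed); // [ 1, 2, 3, 5, 6 ]
```

Without bisecting, the single bad record would keep failing the entire batch until it expired, blocking the five healthy records behind it.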
Event Source Examples
import type {
  APIGatewayProxyEventV2,
  DynamoDBStreamEvent,
  S3Event,
  SQSBatchResponse,
  SQSEvent,
} from 'aws-lambda';
// getOrder, processFile, processMessage, indexInElasticsearch, and
// updateElasticsearch are application helpers defined elsewhere.

// API Gateway trigger: synchronous
export async function httpHandler(event: APIGatewayProxyEventV2) {
  const orderId = event.pathParameters?.id;
  if (!orderId) {
    return { statusCode: 400, body: JSON.stringify({ message: 'Missing order id' }) };
  }
  const order = await getOrder(orderId);
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(order),
  };
}

// S3 trigger: asynchronous
export async function s3Handler(event: S3Event) {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Object keys arrive URL-encoded, with spaces encoded as '+'
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    // Process uploaded file (resize image, parse CSV, etc.)
    await processFile(bucket, key);
  }
}

// SQS trigger: queue polling
export async function sqsHandler(event: SQSEvent): Promise<SQSBatchResponse> {
  const failedIds: string[] = [];
  for (const record of event.Records) {
    try {
      const body = JSON.parse(record.body);
      await processMessage(body);
    } catch (error) {
      failedIds.push(record.messageId);
    }
  }
  // Partial batch failure reporting: only the listed messages are retried.
  // Requires ReportBatchItemFailures on the event source mapping.
  return {
    batchItemFailures: failedIds.map(id => ({
      itemIdentifier: id,
    })),
  };
}

// DynamoDB Streams trigger: change data capture
export async function dynamoStreamHandler(event: DynamoDBStreamEvent) {
  for (const record of event.Records) {
    if (record.eventName === 'INSERT') {
      const newItem = record.dynamodb?.NewImage;
      await indexInElasticsearch(newItem);
    } else if (record.eventName === 'MODIFY') {
      const oldItem = record.dynamodb?.OldImage;
      const newItem = record.dynamodb?.NewImage;
      await updateElasticsearch(oldItem, newItem);
    }
  }
}
Cost Model
Lambda Pricing (us-east-1, 2026)
| Component | Price | Notes |
|---|---|---|
| Invocations | $0.20 per 1M requests | First 1M free/month |
| Duration | $0.0000166667 per GB-second | Per 1ms granularity |
| Provisioned concurrency | $0.0000041667 per GB-second | For always-warm instances |
| Free tier | 1M invocations + 400,000 GB-seconds/month | Enough for small apps |
Cost Comparison: Lambda vs EC2 vs Fargate
For a REST API handling 10M requests/month, average 200ms execution at 512MB:
Lambda:
Invocations: 10M × $0.20/1M = $2.00
Duration: 10M × 0.2s × 0.5GB × $0.0000166667 = $16.67
Total: $18.67/month
EC2 (t3.medium, on-demand):
Instance: $0.0416/hr × 730hrs = $30.37/month
Always running, even at 3am with zero traffic
Total: $30.37/month
Fargate (0.25 vCPU, 0.5GB, 2 tasks):
CPU: 2 × 0.25 × $0.04048/hr × 730hrs = $14.78
Memory: 2 × 0.5 × $0.004445/hr × 730hrs = $3.24
Total: $18.02/month
The Crossover Point
Lambda is cheapest when:
- Traffic is spiky or unpredictable
- There are long periods of zero traffic
- Functions execute in under 1 second
- You value zero ops over cost optimization
Containers are cheapest when:
- Traffic is steady and predictable
- Functions run for minutes (batch processing)
- You have ops capacity to manage infrastructure
- Traffic exceeds ~100M requests/month
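To make the crossover concrete, the sketch below reuses the Lambda rates from the pricing table and solves for the monthly request volume at which Lambda's bill matches a fixed monthly container or instance cost. It ignores the free tier, data transfer, API Gateway charges, and the operational cost of running containers, so treat the result as a rough bound. The function and field names are illustrative.

```typescript
// Rates from the Lambda pricing table (us-east-1).
const PRICE_PER_REQUEST = 0.2 / 1_000_000;
const PRICE_PER_GB_SECOND = 0.0000166667;

interface FunctionProfile {
  avgDurationMs: number;
  memoryMb: number;
}

// Request charge plus duration charge for one invocation, in USD.
function costPerRequest(p: FunctionProfile): number {
  const gbSeconds = (p.avgDurationMs / 1000) * (p.memoryMb / 1024);
  return PRICE_PER_REQUEST + gbSeconds * PRICE_PER_GB_SECOND;
}

function lambdaMonthlyCost(requests: number, p: FunctionProfile): number {
  return requests * costPerRequest(p);
}

// Requests per month at which Lambda costs as much as a fixed monthly bill.
function breakEvenRequests(fixedMonthlyCost: number, p: FunctionProfile): number {
  return fixedMonthlyCost / costPerRequest(p);
}

const profile: FunctionProfile = { avgDurationMs: 200, memoryMb: 512 };
console.log(lambdaMonthlyCost(10_000_000, profile).toFixed(2)); // 18.67
console.log(Math.round(breakEvenRequests(30.37, profile) / 1_000_000)); // 16 (million requests/month)
```

The break-even moves with the fixed cost you compare against: a redundant multi-instance fleet behind a load balancer has a higher fixed cost than a single instance, which pushes the break-even volume up. Shorter or smaller functions push it up as well.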
Limitations You Must Know
| Limitation | Value | Impact |
|---|---|---|
| Execution timeout | 15 minutes max | Cannot run long-running processes |
| Memory | 128MB - 10,240MB | CPU scales proportionally with memory |
| Package size | 50MB zipped / 250MB unzipped | Large ML models need container images |
| Container image | 10GB max | Enough for most workloads |
| Ephemeral storage | 512MB - 10GB /tmp | Not persistent between invocations |
| Concurrent executions | 1000 default (can increase) | Burst limit varies by region |
| Payload size | 6MB sync / 256KB async | Large payloads need S3 |
| Environment variables | 4KB total | Use SSM Parameter Store for more |
| Connections | Held per execution environment; no pool shared across concurrent instances | DB connection pooling via RDS Proxy |
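One way to live with the execution timeout is to stop work before the limit and hand the remainder to a follow-up invocation. The sketch below assumes only the `getRemainingTimeInMillis()` method that the Lambda context object provides. `processWithinBudget`, the `safetyMarginMs` default, and how the leftover items get re-queued are illustrative choices.

```typescript
// Subset of the Lambda context object that this sketch relies on.
interface TimeBudget {
  getRemainingTimeInMillis(): number;
}

interface BudgetResult<T> {
  done: T[];
  remaining: T[]; // re-queue these, for example via SQS or a Step Functions loop
}

async function processWithinBudget<T>(
  items: T[],
  context: TimeBudget,
  work: (item: T) => Promise<void>,
  safetyMarginMs = 10_000,
): Promise<BudgetResult<T>> {
  const done: T[] = [];
  for (let i = 0; i < items.length; i++) {
    // Stop while there is still time to persist progress and return cleanly.
    if (context.getRemainingTimeInMillis() <= safetyMarginMs) {
      return { done, remaining: items.slice(i) };
    }
    await work(items[i]);
    done.push(items[i]);
  }
  return { done, remaining: [] };
}
```

The margin should exceed the slowest single item plus whatever time the handler needs to re-queue the remainder. A single item that can outlast the timeout on its own still needs a different home, such as a container task.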
Connection Pooling Problem and Solution
// Two ways to avoid one database connection per Lambda instance:
// Option 1: RDS Data API (HTTP-based, no connection management), shown here
// Option 2: RDS Proxy, which pools connections in front of the database
import { RDSDataClient, ExecuteStatementCommand } from '@aws-sdk/client-rds-data';

// Created once per execution environment and reused by warm invocations
const rdsData = new RDSDataClient({ region: 'us-east-1' });

export async function handler(event: { orderId: string }) {
  const result = await rdsData.send(new ExecuteStatementCommand({
    resourceArn: process.env.DB_CLUSTER_ARN!,
    secretArn: process.env.DB_SECRET_ARN!,
    database: 'mydb',
    sql: 'SELECT * FROM orders WHERE id = :id',
    parameters: [{ name: 'id', value: { stringValue: event.orderId } }],
  }));
  return result.records;
}
Step Functions: Orchestrating Serverless Workflows
For workflows that exceed a single Lambda's 15-minute limit or require complex branching, AWS Step Functions orchestrate multiple Lambdas.
{
"Comment": "Order processing workflow",
"StartAt": "ValidateOrder",
"States": {
"ValidateOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:validate-order",
"Next": "CheckInventory",
"Catch": [{
"ErrorEquals": ["ValidationError"],
"Next": "RejectOrder"
}]
},
"CheckInventory": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:check-inventory",
"Next": "ProcessPayment",
"Catch": [{
"ErrorEquals": ["OutOfStockError"],
"Next": "BackorderNotify"
}]
},
"ProcessPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:process-payment",
"Retry": [{
"ErrorEquals": ["PaymentRetryableError"],
"IntervalSeconds": 5,
"MaxAttempts": 3,
"BackoffRate": 2.0
}],
"Catch": [{
"ErrorEquals": ["States.ALL"],
"Next": "CancelOrder"
}],
"Next": "FulfillOrder"
},
"FulfillOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:fulfill-order",
"Next": "SendConfirmation"
},
"SendConfirmation": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:send-confirmation",
"End": true
},
"RejectOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:reject-order",
"End": true
},
"BackorderNotify": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:backorder-notify",
"End": true
},
"CancelOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:cancel-order",
"Next": "RefundPayment"
},
"RefundPayment": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456:function:refund-payment",
"End": true
}
}
}
When Serverless Beats Containers
| Scenario | Serverless | Containers | Winner |
|---|---|---|---|
| Spiky traffic (0 to 10K req/s) | Auto-scales instantly, $0 at idle | Need min capacity running | Serverless |
| Steady high traffic (100K req/s) | Expensive at scale | Fixed cost, efficient | Containers |
| Event processing (S3, SQS) | Native integration | Need polling infrastructure | Serverless |
| Long-running jobs (> 15 min) | Not possible | No time limit | Containers |
| Startup/MVP | Zero ops, fast iteration | Need K8s/ECS expertise | Serverless |
| WebSocket connections | Limited support | Full control | Containers |
| ML inference (GPU) | No GPU support | GPU instances available | Containers |
| Cron jobs (run 5 min/day) | Pay for 5 min/day | Pay for 24 hours/day | Serverless |
Vendor Lock-In Mitigation
The Lock-In Spectrum
| Component | Lock-In Level | Mitigation |
|---|---|---|
| Function handler interface | Low | Adapter pattern per cloud |
| Event sources (S3, SQS) | High | Abstract behind interfaces |
| Step Functions | Very High | Use Temporal/Conductor instead |
| DynamoDB in function | High | Use abstractions, consider portable DBs |
| IAM/permissions model | Very High | Accept it — security is always cloud-specific |
Portable Function Pattern
// Core business logic: zero cloud dependencies
export class OrderProcessor {
  constructor(
    private orderRepo: OrderRepository,
    private paymentGateway: PaymentGateway,
    private notifier: Notifier,
  ) {}

  async processOrder(input: ProcessOrderInput): Promise<ProcessOrderResult> {
    const order = await this.orderRepo.findById(input.orderId);
    const payment = await this.paymentGateway.charge(order.total);
    await this.notifier.sendConfirmation(order.userId, order.id);
    return { orderId: order.id, paymentId: payment.id };
  }
}

// AWS Lambda adapter (lives in its own module)
import type { APIGatewayProxyHandlerV2 } from 'aws-lambda';

const processor = new OrderProcessor(
  new DynamoDBOrderRepository(),
  new StripePaymentGateway(),
  new SESNotifier(),
);

export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const input = JSON.parse(event.body!);
  const result = await processor.processOrder(input);
  return { statusCode: 200, body: JSON.stringify(result) };
};

// Google Cloud Functions adapter (a separate module, so the names do not clash)
import type { HttpFunction } from '@google-cloud/functions-framework';

const processor = new OrderProcessor(
  new FirestoreOrderRepository(),
  new StripePaymentGateway(),
  new SendGridNotifier(),
);

export const handler: HttpFunction = async (req, res) => {
  const result = await processor.processOrder(req.body);
  res.json(result);
};
Serverless Architecture Patterns
Pattern 1: API Backend
Pattern 2: Event Processing Pipeline
Pattern 3: Scheduled Jobs
functions:
  dailyReport:
    handler: src/reports/daily.handler
    timeout: 900 # 15 minutes
    memorySize: 2048
    events:
      - schedule:
          rate: cron(0 6 * * ? *) # 6am UTC daily
          input:
            reportType: "daily-summary"
  cleanupExpired:
    handler: src/maintenance/cleanup.handler
    timeout: 300
    events:
      - schedule:
          rate: rate(1 hour)
Key Takeaways
- Cold starts matter — choose runtime wisely (Go/Rust < Node.js < Python < Java), use provisioned concurrency for latency-sensitive paths
- Cost model favors spiky workloads — at steady high traffic, containers are cheaper
- 15-minute limit is real — use Step Functions for longer workflows
- Connection pooling is critical — use RDS Proxy or DynamoDB, never open new DB connections per invocation
- Put initialization outside the handler — SDK clients, config fetching, DB connections go in module scope
- Serverless shines for event-driven architectures — native integration with S3, SQS, DynamoDB Streams, EventBridge
- Vendor lock-in is manageable — isolate business logic from cloud-specific adapters
Related Pages
- AWS Lambda — detailed Lambda infrastructure guide
- Cost of Scale — comparing serverless costs at scale
- Event-Driven APIs — event sources for serverless
- Edge Computing — serverless at the edge
- SQS and SNS — queue-based Lambda triggers
- DynamoDB Internals — the serverless database
Real-World Examples
BBC Online
BBC Online serves over 60 million weekly users using a serverless architecture on AWS Lambda. Their content delivery pipeline processes article metadata, generates page variants, and serves personalized content — all without managing a single server. Lambda handles their massive traffic spikes during breaking news events (10-50x normal traffic in minutes), scaling automatically from idle to thousands of concurrent executions.
Coca-Cola
Coca-Cola migrated their vending machine backend to AWS Lambda + API Gateway. Their 1 million+ smart vending machines send telemetry data sporadically — Lambda's pay-per-use model means they pay only for the milliseconds of processing each event requires, rather than maintaining servers 24/7. This reduced their infrastructure costs by over 65% compared to EC2.
iRobot
iRobot uses AWS Step Functions to orchestrate the entire lifecycle of Roomba robot commands. When a user presses "Clean" in the app, a Step Function workflow validates the command, sends it to the robot via IoT Core, monitors execution progress, and handles failures with automatic retries. The visual state machine makes the complex workflow auditable and debuggable without custom orchestration code.
Interview Tip
What to say
"Serverless is ideal for event-driven, spiky workloads where you'd otherwise pay for idle capacity. I'd use Lambda for API endpoints under 50M requests/month, S3-triggered file processing, and cron jobs that run minutes per day. The key constraints are: 15-minute execution limit, cold starts (150-300ms for Node.js, 3-8s for Java without SnapStart), and the connection pooling problem (1000 Lambda instances opening 1000 database connections). I'd mitigate cold starts with provisioned concurrency for latency-sensitive paths, use RDS Proxy for connection pooling, and isolate business logic from the Lambda handler for portability. Beyond 100M requests/month with steady traffic, containers become cheaper — the crossover point matters."