Unverified: AI-generated content.

Security in System Design

Security is not a feature you bolt on after the architecture is done. It is a design constraint that shapes every decision — from where you terminate TLS to how you propagate identity between services. The most common security failures in system design interviews and real systems are not exotic attacks. They are basics: missing authentication on internal endpoints, secrets in environment variables, trusting user input at the wrong boundary, and ignoring the principle of least privilege.

Authentication at Architecture Level

Authentication (AuthN) answers: "Who is this request from?" The architectural question is where and how authentication happens in a distributed system.

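Authentication Patterns

There are three common places to authenticate. With Gateway Auth, the API gateway validates the credential once and forwards the caller's identity to internal services. With Per-Service Auth, every service validates the caller's token itself and grants no trust based on network location. With Token Exchange, the gateway trades the external token for a short-lived internal token scoped to the downstream call (as in OAuth 2.0 Token Exchange, RFC 8693).

Comparison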

| Pattern | Security | Performance | Complexity | Best For |
|---|---|---|---|---|
| Gateway Auth | Medium | High | Low | Small teams, < 10 services |
| Per-Service Auth | High | Medium | Medium | Zero-trust environments |
| Token Exchange | Highest | Medium | High | Large orgs, compliance needs |
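The Gateway Auth row can be sketched as follows. It assumes a hypothetical `verifyToken` function and illustrative header names (`x-user-id`, `x-user-roles`). The detail worth noticing is that downstream services trust these headers, so the gateway must strip any copies the client sent before adding its own.

```typescript
// Headers as a plain map (lower-case names), to keep the sketch framework-free.
type HeaderMap = Record<string, string | undefined>;

interface VerifiedIdentity {
  userId: string;
  roles: string[];
}

// Identity headers that only the gateway may set (illustrative names).
const IDENTITY_HEADERS = new Set(['x-user-id', 'x-user-roles']);

// Returns the headers to forward downstream, or null if the request must be rejected (401).
function forwardWithIdentity(
  incoming: HeaderMap,
  verifyToken: (token: string) => VerifiedIdentity | null, // hypothetical verifier
): HeaderMap | null {
  const auth = incoming['authorization'];
  if (!auth || !auth.startsWith('Bearer ')) return null;

  const identity = verifyToken(auth.slice('Bearer '.length));
  if (!identity) return null;

  const outgoing: HeaderMap = {};
  for (const [name, value] of Object.entries(incoming)) {
    // Drop client-supplied identity headers so callers cannot impersonate other users.
    if (!IDENTITY_HEADERS.has(name.toLowerCase())) outgoing[name] = value;
  }
  outgoing['x-user-id'] = identity.userId;
  outgoing['x-user-roles'] = identity.roles.join(',');
  return outgoing;
}
```

This dependency on the gateway is one reason the pattern rates lower on security: any service that can be reached without passing through the gateway will accept forged identity headers.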

JWT vs Session: Architectural Impact

```typescript
// JWT: stateless. Any service can validate it without calling the auth service.
// Good for microservices, bad for revocation
interface JWTClaims {
  sub: string;         // User ID
  email: string;
  roles: string[];
  permissions: string[];
  iat: number;         // Issued at
  exp: number;         // Expires at (short-lived: 15-60 min)
}

// Session: stateful. Requires a session store lookup.
// Good for revocation, bad for cross-service propagation
interface SessionLookup {
  sessionId: string;
  // Requires call to Redis/DB to resolve:
  userId: string;
  roles: string[];
  createdAt: Date;
  expiresAt: Date;
}
```

See our JWT Deep Dive and Session Management for implementation details.
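To make "any service can validate without calling the auth service" concrete, here is a minimal HS256 validator using only Node's `node:crypto`. It is an illustrative sketch, not a replacement for a maintained JWT library, and it assumes a shared HMAC secret (many deployments use asymmetric keys such as RS256 or ES256 instead, so services only hold a public key).

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustration only: HS256 with a shared secret.
interface Claims {
  sub: string;
  exp: number; // expiry, seconds since epoch
  [claim: string]: unknown;
}

const b64urlJson = (value: unknown): string =>
  Buffer.from(JSON.stringify(value)).toString('base64url');

const hmacSha256 = (data: string, secret: string): Buffer =>
  createHmac('sha256', secret).update(data).digest();

function signHS256(claims: Claims, secret: string): string {
  const signingInput = `${b64urlJson({ alg: 'HS256', typ: 'JWT' })}.${b64urlJson(claims)}`;
  return `${signingInput}.${hmacSha256(signingInput, secret).toString('base64url')}`;
}

// Any service holding the key can run this locally: no call to the auth service.
function verifyHS256(token: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): Claims | null {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  try {
    // Pin the algorithm so "alg": "none" or an unexpected algorithm is refused.
    const { alg } = JSON.parse(Buffer.from(header, 'base64url').toString('utf8'));
    if (alg !== 'HS256') return null;

    const expected = hmacSha256(`${header}.${payload}`, secret);
    const actual = Buffer.from(signature, 'base64url');
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;

    const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as Claims;
    // Only expiry is checked: there is no revocation lookup.
    return typeof claims.exp === 'number' && claims.exp > nowSec ? claims : null;
  } catch {
    return null; // malformed base64url or JSON
  }
}
```

Note that `verifyHS256` never consults a revocation list: a stolen token stays valid until `exp`, which is why the claims above are kept short-lived.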

Authorization at Architecture Level

Authorization (AuthZ) answers: "Is this user allowed to do this?" This is harder than authentication, because the answer depends on the specific resource and action as well as on the caller, and the design choices have deep architectural impact.

Authorization Models

| Model | Description | Best For |
|---|---|---|
| RBAC | Roles → permissions. Users have roles. | Simple systems, small permission sets |
| ABAC | Policies evaluate attributes (user, resource, environment) | Complex rules, dynamic conditions |
| ReBAC | Permissions based on relationships (owner, member, viewer) | Social apps, document sharing, multi-tenant |

See our RBAC, ABAC, ReBAC comparison.
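The practical difference between the three models is what each check needs as input. A toy side-by-side sketch (all types, role names, and the business-hours rule are hypothetical): RBAC looks only at the user's roles, ABAC evaluates attributes of user, resource, and environment, and ReBAC looks up relationship tuples, the model Zanzibar-style systems use.

```typescript
interface Subject { id: string; roles: string[]; department: string }
interface Doc { id: string; department: string; classification: 'public' | 'internal' | 'secret' }

// RBAC: roles map to permissions; the specific resource is not consulted.
const rolePermissions: Record<string, string[]> = {
  viewer: ['doc:read'],
  editor: ['doc:read', 'doc:write'],
};
function rbacAllows(s: Subject, permission: string): boolean {
  return s.roles.some((r) => rolePermissions[r]?.includes(permission) ?? false);
}

// ABAC: a policy over subject, resource, and environment attributes.
function abacAllowsRead(s: Subject, d: Doc, env: { hour: number }): boolean {
  if (d.classification === 'public') return true;
  const businessHours = env.hour >= 8 && env.hour < 18;
  return d.classification === 'internal' && s.department === d.department && businessHours;
}

// ReBAC: access follows from stored (object, relation, subject) tuples.
type Tuple = [objectId: string, relation: 'owner' | 'editor' | 'viewer', subjectId: string];
function rebacAllowsRead(tuples: Tuple[], docId: string, userId: string): boolean {
  // In this toy model, any relation to the document implies read access.
  return tuples.some(([obj, , sub]) => obj === docId && sub === userId);
}
```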

Where Authorization Lives

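```typescript
// Hypothetical minimal types so the three checks compile; in a real service,
// `user` is attached by authentication middleware (e.g. from verified JWT claims).
interface User {
  id: string;
  roles: string[];
  permissions: string[];
  organizations: string[];
  managedTeams: string[];
}
interface AuthenticatedRequest { user: User }
interface Order { id: string; userId: string; teamId: string }

// Layer 1 (Gateway): "Can this user access the Orders API at all?"
// Typically implemented as middleware in front of every Orders route
function gatewayAuth(req: AuthenticatedRequest): boolean {
  return req.user.permissions.includes('orders:read');
}

// Layer 2 (Service): "Can this user access orders for this organization?"
function serviceAuth(req: AuthenticatedRequest, orgId: string): boolean {
  return req.user.organizations.includes(orgId);
}

// Layer 3 (Domain): "Can this user view THIS specific order?"
// Needs the loaded order, so it can only run inside the domain logic
function domainAuth(user: User, order: Order): boolean {
  return order.userId === user.id ||
         user.roles.includes('admin') ||
         user.managedTeams.includes(order.teamId);
}
```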

Centralized Policy Engine

For complex authorization, use a policy engine like OPA (Open Policy Agent) or Cedar:

```rego
# OPA policy (Rego v1 syntax; the rego.v1 import keeps it compatible with later OPA 0.x releases)
package orders

import rego.v1

# Deny by default: without this, `allow` is undefined (not false) when no rule matches
default allow := false

# Allow users to view their own orders
allow if {
    input.action == "read"
    input.resource.type == "order"
    input.resource.owner == input.user.id
}

# Allow managers to view orders from their team
allow if {
    input.action == "read"
    input.resource.type == "order"
    input.resource.team_id == input.user.team_id
    input.user.role == "manager"
}

# Allow admins to view all orders
allow if {
    input.action == "read"
    input.resource.type == "order"
    input.user.role == "admin"
}
```

See our Policy Engines and Zanzibar pages.
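On the calling side, a service usually sends the request context as `input` to OPA's REST Data API (`POST /v1/data/<package>/<rule>`) and reads back the `result` field. The sketch below assumes an OPA sidecar at `localhost:8181` (OPA's default port) with the `orders` package above loaded. The transport is passed in as a `fetch`-compatible function, so in Node 18+ you would pass the global `fetch`.

```typescript
// Input shape mirrors the fields the Rego rules above read.
interface OrderAuthzInput {
  action: string;
  user: { id: string; role: string; team_id: string };
  resource: { type: string; owner: string; team_id: string };
}

// Narrow fetch-compatible signature so the transport can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Ask OPA for data.orders.allow; deny on any error (fail closed).
async function isAllowed(
  input: OrderAuthzInput,
  fetchFn: FetchLike,
  opaUrl = 'http://localhost:8181', // assumed sidecar address
): Promise<boolean> {
  try {
    const res = await fetchFn(`${opaUrl}/v1/data/orders/allow`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ input }),
    });
    if (!res.ok) return false;
    // OPA omits `result` when the rule is undefined; treat that as a deny.
    const body = (await res.json()) as { result?: unknown };
    return body.result === true;
  } catch {
    return false;
  }
}
```

Failing closed is the deliberate design choice here: if the policy engine is unreachable, requests are denied rather than silently allowed.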

TLS Everywhere

The TLS Architecture

| Segment | TLS Type | Certificate | Why |
|---|---|---|---|
| Client → LB | TLS 1.3 | Public CA (Let's Encrypt, ACM) | Encrypt user traffic, browser trust |
| LB → Service | mTLS | Internal CA | Verify service identity, encrypt internal traffic |
| Service → Service | mTLS | Internal CA (via service mesh) | Zero-trust internal network |
| Service → Database | TLS | Database CA cert | Encrypt data in transit |
| Service → External API | TLS | External CA | Encrypt outbound traffic |

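The mistake teams make: TLS at the edge only, plain HTTP internally. An attacker who compromises one internal service or host can then read and tamper with any internal traffic it can intercept, including bearer tokens and database credentials. Use a service mesh (for example, Istio or Linkerd) to issue workload certificates and enforce mTLS between services automatically. See our TLS Handshake page.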
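For a service that is not behind a mesh, or to see what the mesh's sidecar proxy does on your behalf, mTLS can be configured directly with Node's built-in TLS options. A sketch, assuming PEM strings issued by your internal CA: `requestCert` together with `rejectUnauthorized` is what turns one-way TLS into mutual TLS.

```typescript
import type { IncomingMessage } from 'node:http';
import type { ServerOptions } from 'node:https';
import type { TLSSocket } from 'node:tls';

// PEM-encoded material issued by the internal CA (supplied by the caller).
interface ServicePems {
  key: string;        // this service's private key
  cert: string;       // this service's certificate
  internalCa: string; // CA that signs every service certificate
}

// Options that turn a plain HTTPS server into an mTLS server.
function mtlsServerOptions(pem: ServicePems): ServerOptions {
  return {
    key: pem.key,
    cert: pem.cert,
    ca: pem.internalCa,       // trust only client certs chained to the internal CA
    requestCert: true,        // ask every caller for a certificate...
    rejectUnauthorized: true, // ...and fail the handshake if it is missing or invalid
    minVersion: 'TLSv1.3',
  };
}

// After a successful handshake the peer certificate has been verified, so its
// subject can serve as the calling service's identity (e.g. for authorization).
function callerIdentity(req: IncomingMessage): string | undefined {
  const cert = (req.socket as TLSSocket).getPeerCertificate();
  return cert?.subject?.CN;
}

// Usage (not run here):
// https.createServer(mtlsServerOptions(pems), (req, res) => res.end(`hello ${callerIdentity(req)}`)).listen(8443);
```

The identity is read from the certificate's CN for simplicity; mesh-issued certificates often carry it in a URI SAN (a SPIFFE ID) instead.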

Secrets Management

Never store secrets in code, environment variables, or config files. Use a secrets manager.

Secrets Architecture Pattern

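```typescript
// Minimal client interface (hypothetical adapter): wrap Vault, AWS Secrets
// Manager, etc. behind a single async lookup by name.
interface SecretsClient {
  getSecretValue(name: string): Promise<string>;
}

// Application fetches secrets on demand from the secrets manager and caches
// them briefly, instead of reading them from code, env vars, or config files.
class SecretProvider {
  private cache = new Map<string, { value: string; expiresAt: number }>();

  constructor(private readonly secretsManager: SecretsClient) {}

  async getSecret(name: string): Promise<string> {
    const cached = this.cache.get(name);
    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Fetch from secrets manager
    const secret = await this.secretsManager.getSecretValue(name);

    // Cache for 5 minutes (secrets rotate, so don't cache forever)
    this.cache.set(name, {
      value: secret,
      expiresAt: Date.now() + 5 * 60 * 1000,
    });

    return secret;
  }
}

// Usage (`client` is whichever SecretsClient adapter you wire in; hypothetical here)
const secrets = new SecretProvider(client);
const dbPassword = await secrets.getSecret('prod/database/password');
const apiKey = await secrets.getSecret('prod/stripe/api-key');
```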

See our Secrets Management section and Vault Deep Dive.

Multi-Layer Rate Limiting

Rate limiting at a single layer is insufficient. Apply it at multiple layers for defense in depth.

| Layer | Rate Limit By | Purpose | Tool |
|---|---|---|---|
| CDN/WAF | IP address | Block DDoS, brute force | Cloudflare, AWS WAF |
| API Gateway | API key | Enforce plan limits | Kong, AWS API Gateway |
| Application | User ID | Prevent abuse | Redis + sliding window |
| Database | Connection pool | Protect from overload | PgBouncer, connection limits |

See our Rate Limiting and Advanced Rate Limiting pages.
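The application layer's "Redis + sliding window" entry refers to a sliding-window log: keep the timestamps of recent requests per key and reject once the count inside the window reaches the limit. A single-process sketch keyed by user ID, where the limit and window length are constructor arguments chosen by the caller:

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  /** Returns true and records the request if allowed; false if the key is over the limit. */
  allow(key: string, now: number = Date.now()): boolean {
    const windowStart = now - this.windowMs;
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Across multiple replicas the same log is commonly kept in a Redis sorted set per key (trim old scores, count, add), so every instance sees the same window. This in-memory version also never evicts idle keys, which a real implementation would need to do.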

Input Validation Boundaries

The key question: where do you validate input? The answer: at every trust boundary.

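```typescript
import sanitizeHtml from 'sanitize-html';

// Validation at each boundary

// Gateway: structural validation (OpenAPI / JSON Schema)
// Rejects: missing fields, wrong types, oversized payloads
const gatewaySchema = {
  type: 'object',
  required: ['email', 'amount'],
  properties: {
    email: { type: 'string', format: 'email', maxLength: 254 },
    amount: { type: 'number', minimum: 0.01, maximum: 999999.99 },
    description: { type: 'string', maxLength: 500 },
  },
  additionalProperties: false,
};

// Illustrative types for the service layer (field names are hypothetical)
interface PaymentInput { email: string; amount: number; description?: string }
interface User { dailyLimit: number; accountStatus: string }
interface ValidationResult { valid: boolean; errors: string[]; sanitized: PaymentInput }

// Service: business validation
function validatePayment(input: PaymentInput, user: User): ValidationResult {
  const errors: string[] = [];

  if (input.amount > user.dailyLimit) {
    errors.push('Amount exceeds daily limit');
  }

  if (user.accountStatus !== 'active') {
    errors.push('Account is not active');
  }

  // Sanitize for storage into a copy instead of mutating the caller's input;
  // description is optional, so only sanitize it when present
  const sanitized: PaymentInput = {
    ...input,
    description: input.description === undefined ? undefined : sanitizeHtml(input.description),
  };

  return { valid: errors.length === 0, errors, sanitized };
}

// Database: last line of defense
// CREATE TABLE payments (
//   amount NUMERIC(10,2) CHECK (amount > 0),
//   user_id UUID REFERENCES users(id),
//   status VARCHAR(20) CHECK (status IN ('pending', 'completed', 'failed'))
// );
```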

OWASP Top 10 in System Design

How the OWASP Top 10 (2021 edition) maps to architectural decisions:

| OWASP Risk | Architectural Mitigation |
|---|---|
| A01: Broken Access Control | AuthZ at every layer, least privilege, deny-by-default |
| A02: Cryptographic Failures | TLS everywhere, secrets manager, encrypt at rest, strong hashing |
| A03: Injection | Parameterized queries, input validation at trust boundaries |
| A04: Insecure Design | Threat modeling before coding, abuse case analysis |
| A05: Security Misconfiguration | Infrastructure as code, security scanning in CI/CD, hardened defaults |
| A06: Vulnerable Components | Dependency scanning (Snyk/Dependabot), SBOM, update policy |
| A07: Auth Failures | MFA, account lockout, brute force protection, secure session management |
| A08: Data Integrity | Code signing, CI/CD pipeline security, SBOM verification |
| A09: Logging & Monitoring | Centralized logging, alerting on auth failures, audit trail |
| A10: SSRF | Allowlist outbound URLs, no internal URL access from user input |

See our OWASP section for detailed coverage of each risk.
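The A10 mitigation is one of the few rows that reduces to a small, testable function. A sketch, assuming the feature only needs to call a fixed set of hosts (the two hostnames are placeholders): parse with the WHATWG `URL` class, require `https:`, and compare the exact hostname against the allowlist.

```typescript
// Hypothetical allowlist: the only hosts this feature may call.
const ALLOWED_HOSTS = new Set(['api.stripe.com', 'hooks.slack.com']);

// Returns true only for https URLs to an exact allowlisted host on the default port.
function isAllowedOutboundUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'https:') return false;    // blocks http:, file:, gopher:, ...
  if (url.username || url.password) return false; // no embedded credentials
  if (url.port !== '') return false;              // only the default port (443)
  // Exact match: suffix checks such as endsWith('stripe.com') are easy to bypass.
  return ALLOWED_HOSTS.has(url.hostname);
}
```

A hostname check is only one layer: the HTTP client should also refuse redirects or re-run the check on every hop, and network egress filtering catches what application code misses.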

Threat Modeling for Architects

Threat modeling is the practice of systematically identifying security risks before you build. Use STRIDE:

| Threat | Definition | Example | Mitigation |
|---|---|---|---|
| Spoofing | Pretending to be someone else | Forged JWT | Token validation, mTLS |
| Tampering | Modifying data in transit or at rest | Man-in-the-middle | TLS, request signing |
| Repudiation | Denying an action | "I never made that transfer" | Audit logs, digital signatures |
| Information Disclosure | Exposing sensitive data | API returns password hashes | Field-level access control |
| Denial of Service | Making the system unavailable | Flood of requests | Rate limiting, WAF |
| Elevation of Privilege | Gaining capabilities beyond those granted | Regular user calls an admin-only endpoint | Least privilege, AuthZ checks |
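The "request signing" mitigation for Tampering can be sketched as an HMAC over a canonical form of the request. The canonical layout, the shared-key setup, and the 5-minute clock-skew window are assumptions for illustration; production schemes such as AWS Signature Version 4 follow the same idea with a more detailed canonical request.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumed policy: reject signatures more than 5 minutes away from the receiver's clock.
const MAX_SKEW_MS = 5 * 60 * 1000;

// Canonical form: every field the signature must protect, in a fixed order.
function canonicalRequest(method: string, path: string, timestamp: number, body: string): string {
  return [method.toUpperCase(), path, String(timestamp), body].join('\n');
}

function sign(secret: string, method: string, path: string, body: string, timestamp: number): string {
  return createHmac('sha256', secret).update(canonicalRequest(method, path, timestamp, body)).digest('hex');
}

// Receiver side: check freshness, then recompute the HMAC and compare in constant time.
function verifySignedRequest(
  secret: string,
  req: { method: string; path: string; body: string; timestamp: number; signature: string },
  now = Date.now(),
): boolean {
  if (Math.abs(now - req.timestamp) > MAX_SKEW_MS) return false; // bounds the replay window
  const expected = Buffer.from(sign(secret, req.method, req.path, req.body, req.timestamp), 'hex');
  const given = Buffer.from(req.signature, 'hex');
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The timestamp narrows replay to the skew window but does not close it; an idempotency key or nonce is still needed for that.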

Threat Model Example: Payment API

| Flow step | Threat | STRIDE | Mitigation |
|---|---|---|---|
| 1 | Stolen API key used to make charges | S | API key + JWT, rate limit per key |
| 1 | Modified payment amount in transit | T | TLS, server-side amount calculation |
| 2 | Replayed payment request | S | Idempotency keys, nonce |
| 3 | Stripe API key exposed | I | Secrets manager, key rotation |
| 4 | Payment record tampered | T | Database audit log, checksums |
| 5 | Message queue poisoning | T | Message signing, schema validation |
| 6 | Notification sent to wrong user | I | Verify recipient matches payment |
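The idempotency-key mitigation for step 2 works by remembering the outcome of each key. A sketch, assuming the client sends a unique key per logical payment (Stripe, for example, accepts one in an `Idempotency-Key` header); the in-memory map and the charge ID format are stand-ins.

```typescript
interface ChargeResult {
  chargeId: string;
  amountCents: number;
}

// In-memory stand-in for a shared store; in production the store must be shared
// across replicas and claim keys atomically (e.g. a unique constraint on the key).
class IdempotentPayments {
  private readonly seen = new Map<string, { fingerprint: string; result: ChargeResult }>();
  private nextId = 1;

  charge(idempotencyKey: string, userId: string, amountCents: number): ChargeResult {
    const fingerprint = `${userId}:${amountCents}`;
    const previous = this.seen.get(idempotencyKey);
    if (previous) {
      // Same key with different parameters: refuse rather than guess.
      if (previous.fingerprint !== fingerprint) {
        throw new Error('Idempotency key reused with a different request');
      }
      return previous.result; // replay: return the original outcome, charge nothing
    }
    // Hypothetical stand-in for the real call to the payment provider.
    const result: ChargeResult = { chargeId: `ch_${this.nextId++}`, amountCents };
    this.seen.set(idempotencyKey, { fingerprint, result });
    return result;
  }
}
```

Two concurrent requests with the same key are the hard case: the shared store has to claim the key atomically (insert-if-absent) before the charge runs, otherwise both requests can pass the lookup.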

Security Architecture Checklist

Authentication and Authorization

  • [ ] AuthN at the edge (gateway or first service)
  • [ ] AuthZ at every service boundary
  • [ ] Short-lived tokens (15-60 min JWTs)
  • [ ] Token refresh mechanism
  • [ ] Service-to-service identity (mTLS or service tokens)

Data Protection

  • [ ] TLS 1.3 for all external traffic
  • [ ] mTLS for all internal service-to-service traffic
  • [ ] Encryption at rest for all databases
  • [ ] Secrets in Vault/Secrets Manager (not env vars)
  • [ ] PII encrypted with envelope encryption
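Envelope encryption, from the list above, is easiest to see in code. A sketch using Node's built-in AES-256-GCM, where a locally generated key-encryption key (KEK) stands in for a key held in a KMS (an assumption; with AWS KMS or Vault's Transit engine, wrapping and unwrapping are remote calls and the KEK never leaves that service). Each record gets its own data-encryption key (DEK), and only the wrapped DEK is stored next to the ciphertext.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

interface Sealed { iv: Buffer; tag: Buffer; data: Buffer }
interface Envelope { wrappedDek: Sealed; payload: Sealed }

// AES-256-GCM encrypt: authenticated encryption with a fresh 96-bit IV each time.
function sealWith(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// AES-256-GCM decrypt: final() throws if the key is wrong or the data was altered.
function openWith(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv('aes-256-gcm', key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// kek: local stand-in for a KMS-held key-encryption key.
function encryptPii(kek: Buffer, plaintext: string): Envelope {
  const dek = randomBytes(32); // fresh data-encryption key per record
  return {
    wrappedDek: sealWith(kek, dek), // only the wrapped DEK is stored
    payload: sealWith(dek, Buffer.from(plaintext, 'utf8')),
  };
}

function decryptPii(kek: Buffer, envelope: Envelope): string {
  const dek = openWith(kek, envelope.wrappedDek); // unwrap the DEK, then decrypt the data
  return openWith(dek, envelope.payload).toString('utf8');
}
```

Rotating the KEK then means re-wrapping small DEKs instead of re-encrypting every record.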

Network Security

  • [ ] WAF in front of all public endpoints
  • [ ] Rate limiting at CDN, gateway, and service levels
  • [ ] Network segmentation (services cannot reach what they do not need)
  • [ ] VPC/private networking for databases
  • [ ] Egress filtering (services cannot call arbitrary external URLs)

Monitoring and Incident Response

  • [ ] Audit logging for all auth events
  • [ ] Alert on authentication failures (brute force detection)
  • [ ] Alert on authorization failures (privilege escalation attempts)
  • [ ] Centralized security event logging
  • [ ] Incident response runbook
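For the audit-logging items, a common way to make the trail trustworthy for incident response (and against the Repudiation threat) is hash chaining: each entry includes the hash of the previous one, so editing or deleting an earlier entry breaks verification. A sketch with hypothetical field and event names:

```typescript
import { createHash } from 'node:crypto';

interface AuditEntry {
  at: string;       // ISO timestamp
  actor: string;    // user or service identity
  event: string;    // e.g. "login.failed", "authz.denied"
  prevHash: string; // hash of the previous entry
  hash: string;     // hash of this entry's fields plus prevHash
}

const GENESIS = '0'.repeat(64);

function entryHash(e: Omit<AuditEntry, 'hash'>): string {
  return createHash('sha256').update(JSON.stringify([e.at, e.actor, e.event, e.prevHash])).digest('hex');
}

function append(log: AuditEntry[], actor: string, event: string, at = new Date().toISOString()): AuditEntry {
  const prevHash = log.length ? log[log.length - 1].hash : GENESIS;
  const entry: AuditEntry = { at, actor, event, prevHash, hash: '' };
  entry.hash = entryHash(entry);
  log.push(entry);
  return entry;
}

// Recompute every link; any edited, inserted, or deleted entry breaks the chain.
function verifyChain(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (const e of log) {
    if (e.prevHash !== prev || e.hash !== entryHash(e)) return false;
    prev = e.hash;
  }
  return true;
}
```

Chaining detects tampering but does not prevent it: someone with write access could recompute the whole chain, so the latest hash is usually also anchored somewhere they cannot write, such as write-once storage.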

"What I cannot create, I do not understand." — Richard Feynman