Design an API Gateway

1. Problem Statement & Requirements

Functional Requirements

| # | Requirement | Details |
|---|---|---|
| FR-1 | Request Routing | Route requests to appropriate backend services based on path, headers, method |
| FR-2 | Rate Limiting | Enforce per-client, per-API, and global rate limits |
| FR-3 | Authentication | Validate JWT tokens, API keys, OAuth 2.0 |
| FR-4 | Authorization | Enforce RBAC/ABAC policies per endpoint |
| FR-5 | Request Transformation | Rewrite paths, add/remove headers, transform payloads |
| FR-6 | Response Transformation | Filter fields, rename fields, aggregate responses |
| FR-7 | Circuit Breaking | Detect failing services and stop routing to them |
| FR-8 | Load Balancing | Distribute traffic across service instances |
| FR-9 | Service Discovery | Dynamically discover backend service instances |
| FR-10 | Observability | Logging, metrics, distributed tracing |

Non-Functional Requirements

| # | Requirement | Target |
|---|---|---|
| NFR-1 | Latency | < 5 ms added latency (P99) |
| NFR-2 | Throughput | 100K+ requests/second per instance |
| NFR-3 | Availability | 99.999% uptime |
| NFR-4 | Scalability | Horizontal scaling, stateless instances |
| NFR-5 | Extensibility | Plugin architecture for custom logic |
| NFR-6 | Security | TLS termination, DDoS protection, WAF |

2. Back-of-Envelope Estimation

Traffic Scale

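$$\text{Total API requests/day} = 10\text{B}$$

$$\text{Average QPS} = \frac{10\text{B}}{86{,}400} \approx 115{,}740 \text{ req/s}$$

$$\text{Peak QPS} = 115{,}740 \times 5 = 578{,}700 \text{ req/s}$$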

Gateway Instance Count

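$$\text{Capacity per instance} = 50{,}000 \text{ req/s}$$

$$\text{Min instances} = \left\lceil \frac{578{,}700}{50{,}000} \right\rceil = \lceil 11.57 \rceil = 12 \text{ instances}$$

$$\text{With 50\% headroom} = 12 \times 1.5 = 18 \text{ instances}$$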
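
The estimate above can be reproduced in a few lines. This is a minimal sketch: the helper name `estimateCapacity` and its parameter names are illustrative assumptions, and the inputs are the figures used in this section (10B requests/day, 5x peak factor, 50,000 req/s per instance, 50% headroom).

```typescript
// Hypothetical helper that reproduces the back-of-envelope numbers above.
interface CapacityEstimate {
  averageQps: number;
  peakQps: number;
  minInstances: number;
  provisionedInstances: number;
}

function estimateCapacity(
  requestsPerDay: number,
  peakFactor: number,
  capacityPerInstance: number,
  headroom: number,
): CapacityEstimate {
  const secondsPerDay = 86_400;
  const averageQps = requestsPerDay / secondsPerDay;
  const peakQps = averageQps * peakFactor;
  // Round up: a fractional instance still needs a whole machine.
  const minInstances = Math.ceil(peakQps / capacityPerInstance);
  const provisionedInstances = Math.ceil(minInstances * (1 + headroom));
  return { averageQps, peakQps, minInstances, provisionedInstances };
}

const estimate = estimateCapacity(10_000_000_000, 5, 50_000, 0.5);
console.log(estimate.minInstances, estimate.provisionedInstances);
```

Keeping the inputs as parameters makes it easy to re-run the estimate when the interviewer changes the peak factor or per-instance capacity.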

Latency Budget

$$\text{Total added latency budget} = 5 \text{ ms (P99)}$$

| Component | Budget |
|---|---|
| TLS termination | 0.5 ms |
| Authentication (JWT validation) | 0.5 ms |
| Rate limit check | 0.2 ms |
| Routing decision | 0.1 ms |
| Request transformation | 0.2 ms |
| Proxy to backend | 2.0 ms |
| Response transformation | 0.2 ms |
| Logging + metrics | 0.3 ms |
| Network overhead | 1.0 ms |
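
A quick check that the per-component budgets add up to the 5 ms target. This is a sketch only: the constant and function names are illustrative, and the values are copied from the budget above, expressed in microseconds so the sum stays in integers.

```typescript
// Budget entries in microseconds (integers avoid floating-point drift when summing).
const latencyBudgetMicros: Record<string, number> = {
  'TLS termination': 500,
  'Authentication (JWT validation)': 500,
  'Rate limit check': 200,
  'Routing decision': 100,
  'Request transformation': 200,
  'Proxy to backend': 2000,
  'Response transformation': 200,
  'Logging + metrics': 300,
  'Network overhead': 1000,
};

// Sum all components and convert back to milliseconds.
function totalBudgetMs(budget: Record<string, number>): number {
  const totalMicros = Object.values(budget).reduce((sum, micros) => sum + micros, 0);
  return totalMicros / 1000;
}

console.log(`Total: ${totalBudgetMs(latencyBudgetMicros)} ms`);
```

The components consume the full 5 ms, so any new middleware has to take its time from an existing line item.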

Rate Limiting Data

$$\text{Unique API clients} = 10\text{M}$$

$$\text{Rate limit entry size} = 50 \text{ bytes (client ID + counter + timestamp)}$$

$$\text{Rate limit storage} = 10\text{M} \times 50 \text{ B} = 500 \text{ MB (fits in Redis)}$$

3. High-Level Design

Architecture Diagram

Request Processing Pipeline

API Design (Gateway Configuration)

typescript
// Route Configuration API
POST /admin/routes
     // Body: { path: "/api/v1/users/*", serviceId: "user-service",
     //         methods: ["GET", "POST"], stripPrefix: true }
GET    /admin/routes
PUT    /admin/routes/{routeId}
DELETE /admin/routes/{routeId}

// Rate Limit Configuration
POST /admin/rate-limits
     // Body: { name: "default", limit: 1000, window: 60, key: "client_id" }
GET  /admin/rate-limits

// Service Registration
POST /admin/services
     // Body: { name: "user-service", instances: ["10.0.1.1:8080", "10.0.1.2:8080"],
     //         healthCheck: { path: "/health", interval: 10 } }

// Plugin Configuration
POST /admin/plugins
     // Body: { name: "jwt-auth", config: { publicKey: "...", algorithm: "RS256" },
     //         routes: ["user-routes"] }

// Observability
GET /admin/metrics
GET /admin/health
GET /admin/routes/{routeId}/stats

4. Database Schema (Configuration Store)

Routes Table

sql
-- Note: routes.service_id references services, so create the services table (defined below) first.
CREATE TABLE routes (
    route_id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(100) UNIQUE NOT NULL,
    path_pattern    VARCHAR(500) NOT NULL,   -- "/api/v1/users/*" or regex
    methods         VARCHAR(10)[] NOT NULL,   -- ['GET', 'POST', 'PUT']
    service_id      UUID NOT NULL REFERENCES services(service_id),
    strip_prefix    BOOLEAN DEFAULT TRUE,
    priority        INT DEFAULT 0,           -- Higher = matched first
    timeout_ms      INT DEFAULT 30000,
    retry_count     INT DEFAULT 1,
    retry_on        VARCHAR(20)[] DEFAULT ARRAY['5xx', 'reset', 'connect-failure'],
    enabled         BOOLEAN DEFAULT TRUE,
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_routes_path ON routes(path_pattern);
CREATE INDEX idx_routes_priority ON routes(priority DESC);

Services Table

sql
CREATE TABLE services (
    service_id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(100) UNIQUE NOT NULL,
    protocol        VARCHAR(10) DEFAULT 'http',  -- http, https, grpc
    discovery_type  VARCHAR(20) DEFAULT 'static', -- static, consul, kubernetes
    instances       JSONB, -- [{"host": "10.0.1.1", "port": 8080, "weight": 100}]
    health_check    JSONB, -- {"path": "/health", "interval": 10, "timeout": 5}
    lb_algorithm    VARCHAR(20) DEFAULT 'round_robin',
    -- round_robin, weighted_round_robin, least_connections, ip_hash
    circuit_breaker JSONB, -- {"threshold": 5, "timeout": 30, "halfOpen": 3}
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

Rate Limit Rules Table

sql
CREATE TABLE rate_limit_rules (
    rule_id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(100) NOT NULL,
    route_id        UUID REFERENCES routes(route_id), -- NULL = global
    key_type        VARCHAR(20) NOT NULL,
    -- client_id, ip, api_key, user_id, header:{name}
    limit_count     INT NOT NULL,           -- Max requests
    window_seconds  INT NOT NULL,           -- Time window
    burst           INT,                    -- Max burst (token bucket)
    action          VARCHAR(20) DEFAULT 'reject', -- reject, queue, throttle
    enabled         BOOLEAN DEFAULT TRUE,
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

API Keys Table

sql
CREATE TABLE api_keys (
    key_id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    key_hash        VARCHAR(64) UNIQUE NOT NULL, -- SHA-256 of API key
    key_prefix      VARCHAR(8) NOT NULL,  -- First 8 chars for identification
    client_id       UUID NOT NULL,
    client_name     VARCHAR(100),
    scopes          TEXT[],               -- ['read:users', 'write:orders']
    rate_limit_tier VARCHAR(20) DEFAULT 'standard', -- free, standard, premium
    expires_at      TIMESTAMPTZ,
    enabled         BOOLEAN DEFAULT TRUE,
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    last_used_at    TIMESTAMPTZ
);

-- key_hash is already indexed by its UNIQUE constraint, so no separate index is needed.
CREATE INDEX idx_api_keys_client ON api_keys(client_id);

5. Detailed Component Design

5.1 Request Routing

The router matches incoming requests to backend services based on configurable rules.

typescript
interface RouteRule {
  routeId: string;
  pathPattern: string;
  methods: string[];
  serviceId: string;
  priority: number;
  headers?: Record<string, string>;
  queryParams?: Record<string, string>;
  predicates?: RoutePredicate[];
  filters?: RouteFilter[];
  stripPrefix: boolean;
  timeout: number;
}

class RequestRouter {
  private routes: RouteRule[] = [];
  private compiledRoutes: CompiledRoute[] = [];

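  // Load routes from config store (cached, refreshed periodically)
  async loadRoutes(): Promise<void> {
    const routes = await this.configStore.getRoutes();
    // Copy before sorting so the config store's array is not mutated, then
    // compile in the same priority order that resolve() iterates.
    this.routes = [...routes].sort((a, b) => b.priority - a.priority);
    this.compiledRoutes = this.routes.map(r => this.compileRoute(r));
  }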

  resolve(request: IncomingRequest): RouteMatch | null {
    for (const route of this.compiledRoutes) {
      if (this.matches(route, request)) {
        return {
          route: route.original,
          pathParams: route.extractParams(request.path),
          targetPath: route.original.stripPrefix
            ? this.stripPrefix(request.path, route.original.pathPattern)
            : request.path,
        };
      }
    }
    return null; // 404
  }

  private matches(route: CompiledRoute, request: IncomingRequest): boolean {
    // Method check
    if (!route.original.methods.includes(request.method)) return false;

    // Path check
    if (!route.pathRegex.test(request.path)) return false;

    // Header check
    if (route.original.headers) {
      for (const [header, value] of Object.entries(route.original.headers)) {
        if (request.headers[header.toLowerCase()] !== value) return false;
      }
    }

    // Custom predicates
    if (route.original.predicates) {
      for (const predicate of route.original.predicates) {
        if (!this.evaluatePredicate(predicate, request)) return false;
      }
    }

    return true;
  }

  private compileRoute(route: RouteRule): CompiledRoute {
    // Convert path pattern to regex
    // /api/v1/users/{userId}/orders -> /api/v1/users/(?<userId>[^/]+)/orders
    // Note: other regex metacharacters in the pattern (e.g. ".") are not escaped here.
    const regexStr = route.pathPattern
      .replace(/\{(\w+)\}/g, '(?<$1>[^/]+)')  // Named params
      .replace(/\*/g, '.*');                     // Wildcards

    // Compile once and reuse for both matching and parameter extraction
    const pathRegex = new RegExp(`^${regexStr}$`);

    return {
      original: route,
      pathRegex,
      extractParams: (path: string) => pathRegex.exec(path)?.groups ?? {},
    };
  }

  // A/B routing: split traffic between versions
  async resolveWithTrafficSplitting(
    request: IncomingRequest
  ): Promise<RouteMatch> {
    const route = this.resolve(request);
    if (!route) throw new NotFoundError();

    // Check for canary/blue-green configuration
    const splitConfig = await this.getTrafficSplit(route.route.routeId);
    if (splitConfig) {
      const hash = this.hashRequest(request); // Deterministic per client
      const bucket = hash % 100;

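      if (bucket < splitConfig.canaryPercentage) {
        // Return a copy: mutating the shared route object would send all
        // later traffic for this route to the canary.
        return {
          ...route,
          route: { ...route.route, serviceId: splitConfig.canaryServiceId },
        };
      }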
    }

    return route;
  }
}
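
The pattern-to-regex step can be sketched in isolation. This variant is an assumption, not the router's actual code: the function name `compilePathPattern` is hypothetical, and unlike `compileRoute` above it escapes literal regex metacharacters, so a `.` in a route pattern matches only a dot.

```typescript
// Hypothetical standalone compiler for patterns like "/api/v1/users/{userId}/orders".
// {name} becomes a named capture group, "*" becomes a wildcard, and every other
// character is matched literally.
function compilePathPattern(pattern: string): RegExp {
  const source = pattern
    .split(/(\{\w+\}|\*)/) // keep the placeholders as separate parts
    .map((part) => {
      if (part === '*') return '.*';
      const param = /^\{(\w+)\}$/.exec(part);
      if (param) return `(?<${param[1]}>[^/]+)`;
      // Escape regex metacharacters in literal segments.
      return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return new RegExp(`^${source}$`);
}

const ordersRoute = compilePathPattern('/api/v1/users/{userId}/orders');
const match = ordersRoute.exec('/api/v1/users/42/orders');
console.log(match?.groups?.userId);
```

Compiling once at load time keeps the per-request cost to a single `exec` call per candidate route.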

5.2 Rate Limiting

Token Bucket Algorithm:

$$\text{tokens} = \min(\text{capacity},\ \text{tokens} + \text{rate} \times \Delta t)$$
typescript
class TokenBucketRateLimiter {
  // Uses Redis for distributed rate limiting
  async isAllowed(key: string, rule: RateLimitRule): Promise<RateLimitResult> {
    const now = Date.now();
    const redisKey = `ratelimit:${rule.name}:${key}`;

    // Lua script for atomic token bucket in Redis
    const script = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local rate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])
      local requested = tonumber(ARGV[4])

      local data = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(data[1])
      local last_refill = tonumber(data[2])

      if tokens == nil then
        tokens = capacity
        last_refill = now
      end

      -- Refill tokens based on elapsed time
      local elapsed = (now - last_refill) / 1000.0
      tokens = math.min(capacity, tokens + elapsed * rate)

      local allowed = false
      if tokens >= requested then
        tokens = tokens - requested
        allowed = true
      end

      -- HSET accepts multiple field/value pairs; HMSET is deprecated
      redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
      redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)

      return {allowed and 1 or 0, math.floor(tokens), capacity}
    `;

    const [allowed, remaining, limit] = await this.redis.eval(
      script, 1, redisKey,
      rule.limit_count,                          // capacity
      rule.limit_count / rule.window_seconds,    // rate (tokens per second)
      now,                                        // current time
      1                                           // tokens requested
    ) as [number, number, number];

    return {
      allowed: allowed === 1,
      remaining,
      limit,
      resetAt: new Date(now + (rule.window_seconds * 1000)),
      headers: {
        'X-RateLimit-Limit': limit.toString(),
        'X-RateLimit-Remaining': remaining.toString(),
        'X-RateLimit-Reset': Math.ceil((now + rule.window_seconds * 1000) / 1000).toString(),
      },
    };
  }
}

// Sliding Window Counter: combines fixed window with interpolation
class SlidingWindowCounter {
  async isAllowed(key: string, limit: number, windowSec: number): Promise<boolean> {
    const now = Date.now();
    const currentWindow = Math.floor(now / (windowSec * 1000));
    const previousWindow = currentWindow - 1;
    const windowProgress = (now % (windowSec * 1000)) / (windowSec * 1000);

    const currentKey = `rate:${key}:${currentWindow}`;
    const previousKey = `rate:${key}:${previousWindow}`;

    const [currentCount, previousCount] = await this.redis.mget(currentKey, previousKey);
    const curr = parseInt(currentCount ?? '0', 10);
    const prev = parseInt(previousCount ?? '0', 10);

    // Weighted count: previous window * remaining portion + current window
    const estimatedCount = prev * (1 - windowProgress) + curr;

    if (estimatedCount >= limit) {
      return false;
    }

    // Increment current window.
    // Note: the read above and this increment are separate round trips, so
    // concurrent requests can briefly exceed the limit. Use a Lua script
    // (as in the token bucket) if strict enforcement is required.
    const pipe = this.redis.pipeline();
    pipe.incr(currentKey);
    pipe.expire(currentKey, windowSec * 2);
    await pipe.exec();

    return true;
  }
}
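
The token bucket refill rule can be exercised without Redis. The following is a minimal in-memory sketch, not the distributed limiter: the class name `InMemoryTokenBucket` is hypothetical, and the current time is passed in explicitly (an assumption made here so the behavior is deterministic).

```typescript
// Hypothetical single-process token bucket using the same refill formula:
// tokens = min(capacity, tokens + rate * elapsedSeconds)
class InMemoryTokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly ratePerSec: number;

  constructor(capacity: number, ratePerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Returns true and deducts tokens if the request fits, false otherwise.
  tryConsume(nowMs: number, requested: number = 1): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= requested) {
      this.tokens -= requested;
      return true;
    }
    return false;
  }
}

// Capacity 2, refilling 1 token per second.
const bucket = new InMemoryTokenBucket(2, 1, 0);
console.log(bucket.tryConsume(0), bucket.tryConsume(0), bucket.tryConsume(0));
console.log(bucket.tryConsume(1000));
```

The bucket allows a burst up to `capacity` and then settles at `ratePerSec`, which is why the Redis version passes `limit_count` as capacity and `limit_count / window_seconds` as the rate.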

Multi-tier rate limiting:

typescript
class MultiTierRateLimiter {
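  // Field names follow the rule shape that isAllowed() reads (name, limit_count,
  // window_seconds). Narrowest tiers run first so a request rejected per IP or
  // per client does not also consume tokens from the broader buckets.
  private tiers: RateLimitTier[] = [
    { name: 'per-ip',     key: (r) => r.ip,       limit_count: 100,       window_seconds: 60 },
    { name: 'per-client', key: (r) => r.clientId, limit_count: 1_000,     window_seconds: 60 },
    { name: 'per-api',    key: (r) => r.path,     limit_count: 100_000,   window_seconds: 60 },
    { name: 'global',     key: () => 'global',    limit_count: 1_000_000, window_seconds: 60 },
  ];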

  async check(request: IncomingRequest): Promise<RateLimitResult> {
    for (const tier of this.tiers) {
      const key = tier.key(request);
      const result = await this.rateLimiter.isAllowed(key, tier);

      if (!result.allowed) {
        return {
          allowed: false,
          tier: tier.name,
          retryAfter: result.resetAt,
        };
      }
    }

    return { allowed: true };
  }
}

5.3 Authentication

typescript
class AuthenticationMiddleware {
  async authenticate(request: IncomingRequest): Promise<AuthResult> {
    // Try multiple authentication methods
    const authHeader = request.headers['authorization'];

    if (authHeader?.startsWith('Bearer ')) {
      return this.validateJWT(authHeader.substring(7));
    }

    if (request.headers['x-api-key']) {
      return this.validateAPIKey(request.headers['x-api-key']);
    }

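    // Fallback for clients that cannot set headers. Query-string tokens can
    // leak through access logs and Referer headers, so prefer the header form.
    if (request.query?.access_token) {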
      return this.validateJWT(request.query.access_token as string);
    }

    throw new UnauthorizedError('No valid authentication provided');
  }

  private async validateJWT(token: string): Promise<AuthResult> {
    try {
      // Fast path: check local cache
      const cached = this.tokenCache.get(token);
      if (cached) return cached;

      // Verify JWT signature
      const decoded = jwt.verify(token, this.publicKey, {
        algorithms: ['RS256'],
        issuer: this.expectedIssuer,
        audience: this.expectedAudience,
      });

      const result: AuthResult = {
        authenticated: true,
        userId: decoded.sub as string,
        scopes: decoded.scope?.split(' ') ?? [],
        clientId: decoded.client_id as string,
        expiresAt: new Date((decoded.exp as number) * 1000),
      };

      // Cache valid tokens (with remaining TTL)
      const ttl = (decoded.exp as number) - Math.floor(Date.now() / 1000);
      if (ttl > 0) {
        this.tokenCache.set(token, result, { ttl: Math.min(ttl, 300) * 1000 });
      }

      return result;
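    } catch (error) {
      // Catch variables are `unknown` under strict TypeScript, so narrow first
      if (error instanceof Error && error.name === 'TokenExpiredError') {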
        throw new UnauthorizedError('Token expired');
      }
      throw new UnauthorizedError('Invalid token');
    }
  }

  private async validateAPIKey(apiKey: string): Promise<AuthResult> {
    // Hash the key and look up in database
    const keyHash = crypto.createHash('sha256').update(apiKey).digest('hex');

    // Check cache first
    const cached = this.apiKeyCache.get(keyHash);
    if (cached) return cached;

    // Query database
    const keyRecord = await this.db.query(
      'SELECT * FROM api_keys WHERE key_hash = $1 AND enabled = true',
      [keyHash]
    );

    if (!keyRecord.rows[0]) {
      throw new UnauthorizedError('Invalid API key');
    }

    const key = keyRecord.rows[0];

    // Check expiry
    if (key.expires_at && new Date(key.expires_at) < new Date()) {
      throw new UnauthorizedError('API key expired');
    }

    const result: AuthResult = {
      authenticated: true,
      clientId: key.client_id,
      scopes: key.scopes,
      rateLimitTier: key.rate_limit_tier,
    };

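    // Cache for up to 5 minutes, but never past the key's own expiry
    const msUntilExpiry = key.expires_at
      ? new Date(key.expires_at).getTime() - Date.now()
      : Infinity;
    this.apiKeyCache.set(keyHash, result, { ttl: Math.min(300_000, msUntilExpiry) });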

    // Update last_used_at asynchronously
    this.db.query(
      'UPDATE api_keys SET last_used_at = NOW() WHERE key_id = $1',
      [key.key_id]
    ).catch(() => {}); // Fire and forget

    return result;
  }
}
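
Both validation paths cache a successful result, and the cache lifetime should not outlive the credential. A minimal sketch of that cap follows; the helper name `cacheTtlMs` and the 5-minute default are assumptions chosen to mirror the values used above.

```typescript
// Hypothetical helper: how long an auth result may stay cached.
// expiresAtMs is the credential's expiry as epoch milliseconds, or null if it never expires.
function cacheTtlMs(
  expiresAtMs: number | null,
  nowMs: number,
  maxTtlMs: number = 300_000,
): number {
  if (expiresAtMs === null) return maxTtlMs;
  // Never cache past expiry, and never return a negative TTL.
  return Math.max(0, Math.min(maxTtlMs, expiresAtMs - nowMs));
}

console.log(cacheTtlMs(null, 0));        // no expiry: capped at the maximum
console.log(cacheTtlMs(60_000, 0));      // expires in 60 s: cache for 60 s
console.log(cacheTtlMs(1_000, 5_000));   // already expired: do not cache
```

A shorter maximum also bounds how long a revoked key or token keeps working from cache, at the cost of more signature checks and database lookups.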

5.4 Circuit Breaking

typescript
interface CircuitBreakerConfig {
  failureThreshold: number;    // Number of failures to open circuit
  successThreshold: number;    // Number of successes to close circuit
  timeout: number;             // Seconds before trying half-open
  halfOpenMaxCalls: number;    // Max concurrent calls in half-open
  failureRateThreshold: number; // Percentage (0-100)
  slidingWindowSize: number;   // Number of calls to track
}

class CircuitBreaker {
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
  private failureCount: number = 0;
  private successCount: number = 0;
  private lastFailureTime: number = 0;
  private halfOpenCalls: number = 0;
  private slidingWindow: CallResult[] = [];

  constructor(
    private readonly serviceId: string,
    private readonly config: CircuitBreakerConfig
  ) {}

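  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.canExecute()) {
      this.metrics.increment('circuit_breaker.rejected', { service: this.serviceId });
      throw new ServiceUnavailableError(
        `Circuit breaker OPEN for service ${this.serviceId}`
      );
    }

    // canExecute() reserved a half-open slot if this call is a probe
    const isProbe = this.state === 'HALF_OPEN';

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure(error as Error);
      throw error;
    } finally {
      // Release the slot; transitionTo() may already have reset the counter
      if (isProbe && this.halfOpenCalls > 0) this.halfOpenCalls--;
    }
  }

  private canExecute(): boolean {
    switch (this.state) {
      case 'CLOSED':
        return true;
      case 'OPEN':
        // Check if timeout has elapsed
        if (Date.now() - this.lastFailureTime > this.config.timeout * 1000) {
          this.transitionTo('HALF_OPEN');
          this.halfOpenCalls++; // This call is the first half-open probe
          return true;
        }
        return false;
      case 'HALF_OPEN':
        // Allow limited concurrent calls through, reserving a slot for each
        if (this.halfOpenCalls < this.config.halfOpenMaxCalls) {
          this.halfOpenCalls++;
          return true;
        }
        return false;
    }
  }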

  private onSuccess(): void {
    this.slidingWindow.push({ success: true, timestamp: Date.now() });
    this.trimWindow();

    switch (this.state) {
      case 'HALF_OPEN':
        this.successCount++;
        if (this.successCount >= this.config.successThreshold) {
          this.transitionTo('CLOSED');
        }
        break;
      case 'CLOSED':
        // Reset failure count on success
        this.failureCount = 0;
        break;
    }
  }

  private onFailure(error: Error): void {
    this.slidingWindow.push({ success: false, timestamp: Date.now() });
    this.trimWindow();
    this.lastFailureTime = Date.now();

    switch (this.state) {
      case 'CLOSED':
        this.failureCount++;
        const failureRate = this.getFailureRate();
        if (
          this.failureCount >= this.config.failureThreshold ||
          failureRate >= this.config.failureRateThreshold
        ) {
          this.transitionTo('OPEN');
        }
        break;
      case 'HALF_OPEN':
        this.transitionTo('OPEN');
        break;
    }
  }

  private getFailureRate(): number {
    if (this.slidingWindow.length < this.config.slidingWindowSize) return 0;
    const failures = this.slidingWindow.filter(c => !c.success).length;
    return (failures / this.slidingWindow.length) * 100;
  }

  private transitionTo(newState: 'CLOSED' | 'OPEN' | 'HALF_OPEN'): void {
    const oldState = this.state;
    this.state = newState;

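    // Reset counters
    this.failureCount = 0;
    this.successCount = 0;
    this.halfOpenCalls = 0;
    // Start from a clean window after recovery so stale failures cannot re-open the circuit
    if (newState === 'CLOSED') this.slidingWindow = [];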

    this.metrics.increment('circuit_breaker.state_change', {
      service: this.serviceId,
      from: oldState,
      to: newState,
    });

    // Alert on state changes
    if (newState === 'OPEN') {
      this.alerting.fire(`Circuit breaker OPEN for ${this.serviceId}`);
    }
  }
}
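
The state machine is easier to follow with time made explicit. The sketch below is a simplified stand-in, not the class above: `MiniBreaker` is a hypothetical name, it tracks only consecutive failures (no failure-rate window or half-open concurrency cap), and the caller supplies the clock.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

// Hypothetical, simplified breaker with an injected clock so transitions are deterministic.
class MiniBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private successes = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly successThreshold: number;
  private readonly openTimeoutMs: number;

  constructor(failureThreshold: number, successThreshold: number, openTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.openTimeoutMs = openTimeoutMs;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true if a call may proceed at time nowMs.
  allow(nowMs: number): boolean {
    if (this.state === 'OPEN' && nowMs - this.openedAtMs >= this.openTimeoutMs) {
      this.state = 'HALF_OPEN'; // timeout elapsed: let probe traffic through
      this.successes = 0;
    }
    return this.state !== 'OPEN';
  }

  // Records the outcome of a call that was allowed through.
  record(success: boolean, nowMs: number): void {
    if (!success) {
      this.failures += 1;
      // Any failure while probing, or too many in a row while closed, opens the circuit.
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAtMs = nowMs;
        this.failures = 0;
      }
      return;
    }
    if (this.state === 'HALF_OPEN') {
      this.successes += 1;
      if (this.successes >= this.successThreshold) {
        this.state = 'CLOSED';
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success while closed resets the consecutive-failure count
    }
  }
}

const breaker = new MiniBreaker(2, 1, 1000);
breaker.record(false, 0);
breaker.record(false, 10);
console.log(breaker.getState(), breaker.allow(500), breaker.allow(1010), breaker.getState());
```

Passing the clock in rather than calling `Date.now()` inside the class is a testing convenience; production code would typically wrap it.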

5.5 Load Balancing

typescript
interface ServiceInstance {
  host: string;
  port: number;
  weight: number;
  healthy: boolean;
  activeConnections: number;
  responseTimeMs: number; // Exponential moving average
  currentWeight?: number; // Running score used by smooth weighted round robin
}

class LoadBalancer {
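  private instances: ServiceInstance[] = [];
  private roundRobinIndex: number = 0;

  constructor(
    private readonly algorithm: 'round_robin' | 'weighted_round_robin' |
                                'least_connections' | 'ip_hash' | 'random'
  ) {}

  // Called by service discovery when the instance set changes
  updateInstances(instances: ServiceInstance[]): void {
    this.instances = instances;
  }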

  selectInstance(request?: IncomingRequest): ServiceInstance {
    const healthy = this.instances.filter(i => i.healthy);
    if (healthy.length === 0) throw new NoHealthyInstanceError();

    switch (this.algorithm) {
      case 'round_robin':
        return this.roundRobin(healthy);
      case 'weighted_round_robin':
        return this.weightedRoundRobin(healthy);
      case 'least_connections':
        return this.leastConnections(healthy);
      case 'ip_hash':
        return this.ipHash(healthy, request!.ip);
      case 'random':
        return healthy[Math.floor(Math.random() * healthy.length)];
      default:
        return this.roundRobin(healthy);
    }
  }

  private roundRobin(instances: ServiceInstance[]): ServiceInstance {
    const index = this.roundRobinIndex % instances.length;
    this.roundRobinIndex++;
    return instances[index];
  }

  private weightedRoundRobin(instances: ServiceInstance[]): ServiceInstance {
    // Smooth weighted round robin (Nginx algorithm)
    const totalWeight = instances.reduce((sum, i) => sum + i.weight, 0);

    let bestInstance: ServiceInstance | null = null;
    let bestWeight = -Infinity;

    for (const instance of instances) {
      instance['currentWeight'] = (instance['currentWeight'] ?? 0) + instance.weight;

      if (instance['currentWeight'] > bestWeight) {
        bestWeight = instance['currentWeight'];
        bestInstance = instance;
      }
    }

    bestInstance!['currentWeight'] -= totalWeight;
    return bestInstance!;
  }

  private leastConnections(instances: ServiceInstance[]): ServiceInstance {
    return instances.reduce((min, i) =>
      i.activeConnections < min.activeConnections ? i : min
    );
  }

  private ipHash(instances: ServiceInstance[], clientIP: string): ServiceInstance {
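    // Modulo hashing gives session affinity while the instance set is stable.
    // This is not consistent hashing: adding or removing an instance remaps
    // most clients. Use a hash ring if affinity must survive membership changes.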
    const hash = this.hash(clientIP);
    const index = hash % instances.length;
    return instances[index];
  }
}
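
Smooth weighted round robin is easiest to understand by tracing it. The sketch below re-implements the same selection rule as a standalone function; `WeightedTarget` and `pickSmoothWeighted` are hypothetical names, and the weights 5, 1, 1 are an example chosen for illustration.

```typescript
interface WeightedTarget {
  name: string;
  weight: number;
  currentWeight: number; // running score, starts at 0
}

// Each round: add every target's weight to its score, pick the highest score,
// then subtract the total weight from the winner.
function pickSmoothWeighted(targets: WeightedTarget[]): WeightedTarget {
  const totalWeight = targets.reduce((sum, t) => sum + t.weight, 0);
  let best: WeightedTarget | undefined;
  for (const target of targets) {
    target.currentWeight += target.weight;
    if (best === undefined || target.currentWeight > best.currentWeight) {
      best = target;
    }
  }
  if (best === undefined) throw new Error('no targets available');
  best.currentWeight -= totalWeight;
  return best;
}

const targets: WeightedTarget[] = [
  { name: 'a', weight: 5, currentWeight: 0 },
  { name: 'b', weight: 1, currentWeight: 0 },
  { name: 'c', weight: 1, currentWeight: 0 },
];
const sequence = Array.from({ length: 7 }, () => pickSmoothWeighted(targets).name).join('');
console.log(sequence); // aabacaa
```

Over one full cycle of seven picks, `a` is chosen five times, but its picks are interleaved with `b` and `c` rather than sent as a burst of five, which is the point of the "smooth" variant.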

5.6 Service Discovery

typescript
class ServiceDiscovery {
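  private serviceCache: Map<string, ServiceInstance[]> = new Map();
  private cacheExpiry: Map<string, number> = new Map(); // serviceId -> expiry (epoch ms)
  private static readonly CACHE_TTL_MS = 10_000;

  async discoverInstances(serviceId: string): Promise<ServiceInstance[]> {
    // Check cache first (entries expire after 10 seconds)
    const cached = this.serviceCache.get(serviceId);
    if (cached && (this.cacheExpiry.get(serviceId) ?? 0) > Date.now()) return cached;

    // Query service registry (Consul, etcd, Kubernetes)
    const instances = await this.registry.getHealthyInstances(serviceId);

    this.serviceCache.set(serviceId, instances);
    this.cacheExpiry.set(serviceId, Date.now() + ServiceDiscovery.CACHE_TTL_MS);
    return instances;
  }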

  // Consul-based discovery
  async discoverFromConsul(serviceName: string): Promise<ServiceInstance[]> {
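    const response = await fetch(
      `${this.consulUrl}/v1/health/service/${serviceName}?passing=true`
    );
    if (!response.ok) {
      throw new Error(`Consul health query failed: HTTP ${response.status}`);
    }
    const services = await response.json();

    return services.map((svc: any) => ({
      // Consul leaves Service.Address empty when the service uses the node's address
      host: svc.Service.Address || svc.Node.Address,
      port: svc.Service.Port,
      weight: parseInt(svc.Service.Meta?.weight ?? '100', 10),
      healthy: true,
      metadata: svc.Service.Meta,
    }));
  }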

  // Kubernetes-based discovery
  async discoverFromKubernetes(serviceName: string, namespace: string): Promise<ServiceInstance[]> {
    const endpoints = await this.k8sClient.getEndpoints(serviceName, namespace);

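    // subsets and addresses are absent when no pod is ready, so default to empty lists
    return (endpoints.subsets ?? []).flatMap(subset =>
      (subset.addresses ?? []).map(addr => ({
        host: addr.ip,
        port: subset.ports[0].port,
        weight: 100,
        healthy: true,
        nodeName: addr.nodeName,
      }))
    );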
  }

  // Watch for changes (event-driven updates)
  async watchService(serviceId: string): Promise<void> {
    const watcher = this.registry.watch(serviceId);

    watcher.on('change', (instances: ServiceInstance[]) => {
      this.serviceCache.set(serviceId, instances);
      this.loadBalancers.get(serviceId)?.updateInstances(instances);
    });
  }
}

5.7 Request/Response Transformation

typescript
class RequestTransformer {
  async transform(
    request: IncomingRequest,
    route: RouteRule,
    auth: AuthResult
  ): Promise<TransformedRequest> {
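    // Copy query params as well, so filters below do not mutate the incoming request
    const transformed = { ...request, queryParams: { ...request.queryParams } };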

    // Strip path prefix
    if (route.stripPrefix) {
      transformed.path = this.stripPrefix(request.path, route.pathPattern);
    }

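    // Add headers. Keys are lowercase, matching how incoming headers are looked
    // up, so a client-supplied value is overwritten rather than duplicated.
    const priorForwardedFor = request.headers['x-forwarded-for'];
    transformed.headers = {
      ...transformed.headers,
      'x-request-id': request.headers['x-request-id'] ?? crypto.randomUUID(),
      // Append to any existing chain instead of discarding upstream proxies
      'x-forwarded-for': priorForwardedFor ? `${priorForwardedFor}, ${request.ip}` : request.ip,
      'x-forwarded-host': request.headers['host'],
      'x-forwarded-proto': request.protocol,
      'x-authenticated-user': auth.userId ?? '',
      'x-client-id': auth.clientId ?? '',
    };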

    // Remove sensitive headers
    delete transformed.headers['authorization'];
    delete transformed.headers['cookie'];

    // Apply route-specific transformations
    for (const filter of route.filters ?? []) {
      switch (filter.type) {
        case 'add_header':
          transformed.headers[filter.name] = filter.value;
          break;
        case 'remove_header':
          delete transformed.headers[filter.name];
          break;
        case 'rewrite_path':
          // Rewrite the already-transformed path so prefix stripping is preserved
          transformed.path = transformed.path.replace(
            new RegExp(filter.from), filter.to
          );
          break;
        case 'add_query_param':
          transformed.queryParams[filter.name] = filter.value;
          break;
      }
    }

    return transformed;
  }
}

class ResponseTransformer {
  async transform(
    response: BackendResponse,
    route: RouteRule
  ): Promise<TransformedResponse> {
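    // Copy headers so the backend response object is not mutated
    const transformed = { ...response, headers: { ...response.headers } };

    // Add CORS headers
    // Note: a wildcard origin suits public APIs only; restrict it per route otherwise
    transformed.headers['access-control-allow-origin'] = '*';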
    transformed.headers['access-control-allow-methods'] = route.methods.join(', ');

    // Remove internal headers
    delete transformed.headers['x-internal-trace'];
    delete transformed.headers['server'];

    // Add security headers
    transformed.headers['x-content-type-options'] = 'nosniff';
    transformed.headers['x-frame-options'] = 'DENY';
    transformed.headers['strict-transport-security'] = 'max-age=31536000';

    return transformed;
  }
}
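
Proxies conventionally append the connecting address to any `X-Forwarded-For` chain they receive, so backends can see every hop. A minimal sketch of that step follows; the function name `appendForwardedFor` is hypothetical, and it assumes the incoming header value is a comma-separated list.

```typescript
// Hypothetical helper: extend an X-Forwarded-For chain with the connecting client's IP.
function appendForwardedFor(existing: string | undefined, clientIp: string): string {
  const chain = (existing ?? '')
    .split(',')
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0); // drop empty entries from stray commas
  chain.push(clientIp);
  return chain.join(', ');
}

console.log(appendForwardedFor(undefined, '10.0.0.1'));
console.log(appendForwardedFor('203.0.113.7, 198.51.100.2', '10.0.0.1'));
```

Entries supplied by the client are not trustworthy, so anything security-sensitive (such as per-IP rate limiting) should rely only on hops added by proxies the gateway controls.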

5.8 Observability

typescript
class ObservabilityMiddleware {
  async handle(
    request: IncomingRequest,
    next: () => Promise<Response>
  ): Promise<Response> {
    const requestId = request.headers['x-request-id'] ?? crypto.randomUUID();
    const startTime = process.hrtime.bigint();

    // Start distributed trace span
    const span = this.tracer.startSpan('gateway.request', {
      attributes: {
        'http.method': request.method,
        'http.url': request.path,
        'http.client_ip': request.ip,
        'gateway.request_id': requestId,
      },
    });

    try {
      const response = await next();

      // Record metrics
      const durationMs = Number(process.hrtime.bigint() - startTime) / 1_000_000;

      this.metrics.histogram('gateway.request_duration_ms', durationMs, {
        method: request.method,
        path: this.normalizePath(request.path),
        status: response.statusCode.toString(),
        service: request.routedService ?? 'unknown',
      });

      this.metrics.increment('gateway.requests_total', {
        method: request.method,
        status: response.statusCode.toString(),
      });

      // Structured access log
      this.logger.info({
        type: 'access_log',
        requestId,
        method: request.method,
        path: request.path,
        status: response.statusCode,
        durationMs,
        clientIp: request.ip,
        userAgent: request.headers['user-agent'],
        service: request.routedService,
        bytesIn: request.contentLength,
        bytesOut: response.contentLength,
        userId: request.auth?.userId,
        clientId: request.auth?.clientId,
      });

      span.setStatus({ code: response.statusCode < 400 ? 'OK' : 'ERROR' });
      span.end();

      return response;
    } catch (error) {
      span.recordException(error);
      span.setStatus({ code: 'ERROR', message: error.message });
      span.end();

      this.metrics.increment('gateway.errors_total', {
        type: error.constructor.name,
      });

      throw error;
    }
  }

  // Normalize paths for metric cardinality control
  // /api/v1/users/12345 -> /api/v1/users/{id}
  private normalizePath(path: string): string {
    // Match whole segments only, so a slug such as "/123abc" is left intact
    return path
      .replace(/\/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}(?=\/|$)/gi, '/{uuid}')
      .replace(/\/\d+(?=\/|$)/g, '/{id}');
  }
}

6. Scaling & Bottlenecks

What Breaks First

| Component | Bottleneck | Solution |
|---|---|---|
| Gateway throughput | CPU-bound (TLS, auth, routing) | Horizontal scaling, TLS offload to L4 LB |
| Rate limiting | Redis single-point bottleneck | Redis Cluster, local approximate counters |
| Service discovery | Config propagation delay | Event-driven watches, DNS TTL reduction |
| Circuit breaker | Per-instance state inconsistency | Share state via Redis, or accept local-only |
| Logging | I/O bottleneck at high QPS | Async logging, sampling, buffer + batch send |
| Connection pool | Backend connection exhaustion | Connection pooling, HTTP/2 multiplexing |

Deployment Architecture


7. Trade-offs & Alternatives

Build vs. Buy

| Approach | Pro | Con |
|---|---|---|
| Build Custom | Full control, optimized for use case | High engineering cost |
| Kong (Open Source) | Plugin ecosystem, Lua extensible | Complex setup |
| AWS API Gateway | Managed, serverless option | Vendor lock-in, latency |
| Envoy + Istio | Cloud-native, service mesh | Steep learning curve |
| NGINX Plus | Proven performance | Less programmable |
| Traefik | Auto-discovery, Docker native | Less feature-rich at scale |

Rate Limiting: Local vs. Distributed

| Approach | Pro | Con |
|---|---|---|
| Local only | No network call, fastest | Inconsistent across instances |
| Distributed (Redis) | Globally accurate | Redis dependency, +0.5 ms latency |
| Hybrid | Fast with eventual consistency | Clients may exceed limit briefly |

Authentication Location

| Approach | Pro | Con |
|---|---|---|
| At Gateway | Single enforcement point | Gateway becomes bottleneck |
| At Service | Service-specific policies | Duplicated logic |
| Sidecar (Envoy) | Per-service, transparent | Complex mesh setup |

8. Advanced Topics

8.1 API Versioning

typescript
class APIVersionRouter {
  // Strategy 1: Path-based versioning
  // /api/v1/users -> user-service-v1
  // /api/v2/users -> user-service-v2

  // Strategy 2: Header-based versioning
  // Accept: application/vnd.api+json; version=2

  // Strategy 3: Query parameter
  // /api/users?version=2

  async route(request: IncomingRequest): Promise<RouteMatch> {
    const version = this.extractVersion(request);
    const service = await this.getServiceForVersion(request.path, version);
    return { route: service, version };
  }
}
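
The three strategies can be combined in one version lookup. This is a sketch under stated assumptions: the function name `extractVersion`, the `VersionedRequest` shape, the precedence order (path, then `Accept` header, then query parameter), and the default of version 1 are all illustrative choices, not part of the design above.

```typescript
interface VersionedRequest {
  path: string;
  headers: Record<string, string | undefined>; // lowercase header names
  query: Record<string, string | undefined>;
}

// Hypothetical resolver: path beats header, header beats query parameter.
function extractVersion(request: VersionedRequest, defaultVersion: number = 1): number {
  // Strategy 1: /api/v2/users
  const fromPath = /^\/api\/v(\d+)(?:\/|$)/.exec(request.path);
  if (fromPath) return Number(fromPath[1]);

  // Strategy 2: Accept: application/vnd.api+json; version=2
  const fromHeader = /(?:^|;)\s*version=(\d+)/i.exec(request.headers['accept'] ?? '');
  if (fromHeader) return Number(fromHeader[1]);

  // Strategy 3: /api/users?version=2
  const fromQuery = /^\d+$/.exec(request.query['version'] ?? '');
  if (fromQuery) return Number(fromQuery[0]);

  return defaultVersion;
}

console.log(extractVersion({ path: '/api/v2/users', headers: {}, query: {} }));
console.log(extractVersion({
  path: '/api/users',
  headers: { accept: 'application/vnd.api+json; version=3' },
  query: {},
}));
```

Whatever precedence is chosen, it should be fixed and documented, since a request can carry conflicting versions in more than one place.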

8.2 Request Coalescing

When many clients request the same resource simultaneously, the gateway can coalesce these into a single backend request.

typescript
class RequestCoalescer {
  private inflightRequests: Map<string, Promise<Response>> = new Map();

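  // Only safe for idempotent reads (e.g. GET). All callers receive the same
  // Response object, so buffer or clone the body before each caller consumes it.
  async get(cacheKey: string, fetcher: () => Promise<Response>): Promise<Response> {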
    const existing = this.inflightRequests.get(cacheKey);
    if (existing) return existing; // Reuse inflight request

    const promise = fetcher().finally(() => {
      this.inflightRequests.delete(cacheKey);
    });

    this.inflightRequests.set(cacheKey, promise);
    return promise;
  }
}

8.3 GraphQL Gateway

An API Gateway can aggregate multiple REST services into a single GraphQL endpoint.

8.4 WebSocket Proxying

The gateway must handle WebSocket upgrade requests and maintain long-lived connections.

typescript
class WebSocketProxy {
  handleUpgrade(request: IncomingRequest, socket: Socket, head: Buffer): void {
    const route = this.router.resolve(request);
    if (!route) {
      socket.destroy();
      return;
    }

    const instance = this.loadBalancer.selectInstance(request);
    const target = `ws://${instance.host}:${instance.port}${route.targetPath}`;

    // Proxy WebSocket connection
    this.httpProxy.ws(request, socket, head, { target });
  }
}

9. Interview Tips

Key Points to Emphasize

  1. The gateway is on the critical path — Every request passes through it. Latency and availability are paramount.
  2. Rate limiting must be distributed — Use Redis or a similar store for global enforcement.
  3. Circuit breaking protects the entire system — A failing service should not cascade.
  4. Stateless design enables horizontal scaling — All state in Redis/config store, gateway instances are interchangeable.
  5. Plugin architecture for extensibility — New cross-cutting concerns without modifying core logic.

Common Mistakes

  • Making the gateway too "smart" — it should route and enforce policies, not contain business logic.
  • Ignoring the latency budget — every middleware adds latency; be explicit about budgets.
  • Not discussing single point of failure — the gateway must be highly available itself.
  • Forgetting about WebSocket and streaming support — not all traffic is request-response.
  • Over-engineering rate limiting — start with simple token bucket before sliding windows.

Follow-Up Questions to Expect

  • How would you deploy gateway updates without downtime? (Blue-green deployment, rolling updates with health checks.)
  • How would you handle a rogue client sending millions of requests? (WAF, IP-level rate limiting, automatic blocking.)
  • How is this different from a service mesh (Istio/Envoy sidecar)? (Gateway is north-south traffic; mesh handles east-west.)
  • How would you implement API key rotation without downtime?
  • How would you implement request caching at the gateway level?

Time Allocation in 45-min Interview

| Phase | Time | Focus |
|---|---|---|
| Requirements | 3 min | Clarify scope: API Gateway vs. Service Mesh vs. LB |
| High-Level Design | 8 min | Architecture, pipeline, component overview |
| Deep Dive: Routing | 7 min | Path matching, traffic splitting, versioning |
| Deep Dive: Rate Limiting | 10 min | Token bucket, distributed via Redis, multi-tier |
| Deep Dive: Circuit Breaking | 7 min | State machine, failure detection, recovery |
| Scaling | 7 min | Horizontal scaling, multi-region, caching |
| Q&A | 3 min | Build vs. buy, service mesh comparison |

"What I cannot create, I do not understand." — Richard Feynman