Skip to content
Unverified — AI-generated content. Help verify this page

Auth Service Scaling Plan

Authentication is one of the few services where every user interacts with it on every session. Unlike feature-specific endpoints that only a subset of users hit, the auth service processes traffic proportional to your entire user base, multiplied by their session frequency. A platform with 1 million monthly active users generates approximately 3-5 million token refresh requests per day, on top of login and registration traffic.

This page covers the scaling journey from a single-instance deployment to a globally distributed auth service serving tens of millions of users.

Traffic Profile Analysis

Request Pattern Breakdown

Auth traffic has a distinctive pattern that differs from typical application traffic:

OperationRead/WriteFrequency per UserLatency TargetCPU CostI/O Cost
Token validationRead50-200/day< 5msVery Low (JWT verify)Low (Redis EXISTS)
Token refreshRead + Write4-20/day< 50msLow (JWT sign)Medium (Redis GET/SET)
LoginRead + Write1-3/day< 300msHigh (Argon2 verify)Medium (PG query)
RegistrationWriteOnce< 500msHigh (Argon2 hash)Medium (PG insert)
Password resetWriteRare< 500msHigh (Argon2 hash)Medium (PG + email)
JWKS fetchReadVaries (cached)< 10msNoneNone (in-memory)

Read/Write Ratio

Token validation:  ~85% of all requests (READ-ONLY, stateless)
Token refresh:     ~10% of all requests (READ + WRITE to Redis)
Login:             ~4%  of all requests (READ PG + WRITE Redis)
Everything else:   ~1%  of all requests

The auth service is overwhelmingly read-heavy for token validation. The key insight: token validation does not need to hit the auth service at all. By distributing the JWKS public key, any downstream service can validate tokens locally.

Traffic Estimates by Scale

User ScaleMAUDaily LoginsDaily RefreshesPeak RPS (Auth Service)Peak RPS (Token Validation)
Startup10K5K20K550
Growth100K50K200K50500
Scale1M500K2M5005,000
Large10M5M20M5,00050,000

TIP

Token validation should happen at the API gateway or within each service, not at the auth service. The JWKS endpoint provides the public key for local verification. This eliminates 85%+ of potential auth service traffic.

Scaling Phases

Phase 1: Single Instance (0 - 50K MAU)

At this scale, a single auth service instance behind a load balancer with one PostgreSQL instance and one Redis instance handles all traffic comfortably.

Configuration:

  • 1 auth service instance (2 vCPU, 2 GB RAM)
  • PostgreSQL: db.r6g.large (2 vCPU, 16 GB RAM) or equivalent
  • Redis: cache.r6g.large (2 vCPU, 13 GB RAM) or equivalent
  • Connection pool: 20 connections to PostgreSQL

Bottlenecks at this scale: None. A single Node.js instance can handle 1,000+ RPS for typical auth operations.

Phase 2: Multiple Instances with Connection Pooling (50K - 500K MAU)

As traffic grows, you need multiple auth service instances for redundancy and capacity. The primary challenge is PostgreSQL connection management — each Node.js instance opens a pool of connections, and PostgreSQL has a hard connection limit.

Key changes:

  • 3 auth service instances
  • PgBouncer for connection pooling
  • PostgreSQL read replica for read queries (user lookups)
  • Redis with 1 replica for failover

PgBouncer Configuration

ini
; pgbouncer.ini

[databases]
auth_service = host=postgres-primary.auth-system port=5432 dbname=auth_service

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0

; Pool mode: transaction-level pooling (best for Node.js)
pool_mode = transaction

; Connection limits
max_client_conn = 200          ; Total client connections allowed
default_pool_size = 25         ; Connections per database-user pair
min_pool_size = 5              ; Keep at least 5 connections warm
reserve_pool_size = 5          ; Extra connections for burst
reserve_pool_timeout = 3       ; Seconds before using reserve pool

; Timeouts
server_connect_timeout = 5
server_idle_timeout = 300
client_idle_timeout = 300
query_timeout = 30
query_wait_timeout = 30

; Logging
log_connections = 0
log_disconnections = 0
log_pooler_errors = 1
stats_period = 60

; Security
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

Why PgBouncer matters:

Without PgBouncer, 3 service instances with 20 connections each = 60 PostgreSQL connections. At 10 instances = 200 connections. PostgreSQL performs poorly beyond ~200-300 connections due to per-connection memory overhead (~10 MB each) and process scheduling. PgBouncer maintains a small pool of actual PostgreSQL connections and multiplexes application connections through them.

Phase 3: Read Replicas and Redis Cluster (500K - 5M MAU)

At this scale, read traffic starts to dominate. Login operations perform user lookups (reads), and these can be directed to read replicas. Write operations (registration, password changes) go to the primary.

Read/Write Splitting in Application Code

typescript
// src/infrastructure/database/connection-manager.ts

import { Pool, PoolClient } from 'pg';

export class ConnectionManager {
  private readonly writePool: Pool;
  private readonly readPool: Pool;

  constructor(config: {
    write: { host: string; port: number; database: string; user: string; password: string };
    read: { host: string; port: number; database: string; user: string; password: string };
    poolSize: number;
  }) {
    this.writePool = new Pool({
      ...config.write,
      max: config.poolSize,
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 5000,
    });

    this.readPool = new Pool({
      ...config.read,
      max: config.poolSize * 2, // More read capacity
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 5000,
    });
  }

  async query<T>(sql: string, params?: unknown[], useReplica = false): Promise<T[]> {
    const pool = useReplica ? this.readPool : this.writePool;
    const result = await pool.query(sql, params);
    return result.rows as T[];
  }

  async transaction<T>(fn: (client: PoolClient) => Promise<T>): Promise<T> {
    const client = await this.writePool.connect();
    try {
      await client.query('BEGIN');
      const result = await fn(client);
      await client.query('COMMIT');
      return result;
    } catch (error) {
      await client.query('ROLLBACK');
      throw error;
    } finally {
      client.release();
    }
  }

  async healthCheck(): Promise<{ write: boolean; read: boolean }> {
    const [writeOk, readOk] = await Promise.allSettled([
      this.writePool.query('SELECT 1'),
      this.readPool.query('SELECT 1'),
    ]);
    return {
      write: writeOk.status === 'fulfilled',
      read: readOk.status === 'fulfilled',
    };
  }
}

Redis Cluster Configuration

typescript
// src/infrastructure/cache/redis-cluster.ts

import { Cluster } from 'ioredis';

export function createRedisCluster(nodes: { host: string; port: number }[]): Cluster {
  return new Cluster(nodes, {
    // Cluster-specific settings
    clusterRetryStrategy: (times: number) => {
      if (times > 10) return null; // Stop retrying after 10 attempts
      return Math.min(times * 100, 3000);
    },
    enableReadyCheck: true,
    scaleReads: 'slave',  // Read from replicas when possible

    // Per-node settings
    redisOptions: {
      maxRetriesPerRequest: 3,
      connectTimeout: 5000,
      commandTimeout: 3000,
      password: process.env.REDIS_PASSWORD,
      tls: process.env.REDIS_TLS === 'true' ? {} : undefined,
    },

    // DNS-based discovery (for Kubernetes)
    dnsLookup: (address, callback) => callback(null, address),
    natMap: undefined,
  });
}

// Session operations using Redis Cluster
export class ClusteredSessionStore {
  constructor(private readonly redis: Cluster) {}

  async set(sessionId: string, data: string, ttlSeconds: number): Promise<void> {
    // Sessions for the same user should hash to the same shard
    // Use hash tags: {userId}:session:sessionId
    await this.redis.setex(`{session}:${sessionId}`, ttlSeconds, data);
  }

  async get(sessionId: string): Promise<string | null> {
    return this.redis.get(`{session}:${sessionId}`);
  }

  async delete(sessionId: string): Promise<void> {
    await this.redis.del(`{session}:${sessionId}`);
  }

  async exists(sessionId: string): Promise<boolean> {
    return (await this.redis.exists(`{session}:${sessionId}`)) === 1;
  }
}

Phase 4: Global Distribution (5M+ MAU)

At global scale, latency becomes the primary concern. Users in Asia should not have their login requests routed to US-East. The solution is multi-region deployment with regional Redis instances and a globally replicated PostgreSQL.

Write routing at global scale:

Reads (login user lookup, session validation) can be served from any region using local replicas. Writes (registration, password change) must be routed to the primary region. Two strategies:

  1. Route all writes to the primary region — Simple, but adds latency for non-primary regions (100-200ms cross-region).
  2. Use a multi-primary database (CockroachDB, Spanner) — Complex, but eliminates cross-region writes.

For most auth services, strategy 1 is sufficient. Users register once and change passwords rarely. The extra 100ms is acceptable for these infrequent operations.

Connection Pooling Deep Dive

The Connection Exhaustion Problem

Each PostgreSQL connection consumes approximately 10 MB of memory and one process slot. The maximum practical connection count for a PostgreSQL instance is:

Max ConnectionsAvailable Memory (MB)OS Reserved (MB)Per-Connection Memory (MB)

For a 16 GB instance:

Max Connections163842048101434

But performance degrades well before this limit due to process scheduling overhead. The practical sweet spot is 100-300 connections.

Pooling Architecture

Application Layer:     [Node.js Pool: 20 conn] × 10 instances = 200 app connections

Connection Pooler:     [PgBouncer: 200 client conn → 50 server conn]

Database Layer:        [PostgreSQL: 50 active connections]

Without PgBouncer, the database handles 200 connections. With PgBouncer in transaction mode, the database handles only 50 connections — a 4x reduction — because connections are returned to the pool after each transaction completes.

Pool Sizing Formula

The optimal pool size for each Node.js instance:

Pool Size=Target RPS×Avg Query Duration (s)Number of Instances

For 500 RPS with 5ms average query duration across 5 instances:

Pool Size=500×0.0055=0.5 connections

This seems unrealistically low, but it is correct — most queries complete in milliseconds, so connections are rapidly reused. In practice, set the minimum at 5-10 to handle burst traffic and long-running queries.

Rate Limiting Architecture

Multi-Layer Rate Limiting

Rate limiting for auth endpoints operates at four layers:

Redis-Based Rate Limiter

typescript
// src/infrastructure/rate-limit/sliding-window.ts

import { Redis } from 'ioredis';

interface RateLimitResult {
  allowed: boolean;
  limit: number;
  remaining: number;
  resetAt: number;     // Unix timestamp
  retryAfter?: number; // Seconds
}

export class SlidingWindowRateLimiter {
  constructor(private readonly redis: Redis) {}

  async check(
    key: string,
    limit: number,
    windowSeconds: number,
  ): Promise<RateLimitResult> {
    const now = Date.now();
    const windowStart = now - windowSeconds * 1000;
    const resetAt = Math.ceil((now + windowSeconds * 1000) / 1000);

    // Lua script for atomic sliding window check
    const luaScript = `
      local key = KEYS[1]
      local now = tonumber(ARGV[1])
      local window_start = tonumber(ARGV[2])
      local limit = tonumber(ARGV[3])
      local window_seconds = tonumber(ARGV[4])

      -- Remove entries outside the window
      redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

      -- Count current entries
      local count = redis.call('ZCARD', key)

      if count < limit then
        -- Add this request
        redis.call('ZADD', key, now, now .. ':' .. math.random(1000000))
        redis.call('EXPIRE', key, window_seconds)
        return {1, limit - count - 1}  -- allowed, remaining
      else
        return {0, 0}  -- blocked, 0 remaining
      end
    `;

    const result = await this.redis.eval(
      luaScript,
      1,
      `ratelimit:${key}`,
      now,
      windowStart,
      limit,
      windowSeconds,
    ) as [number, number];

    const allowed = result[0] === 1;
    const remaining = result[1];

    return {
      allowed,
      limit,
      remaining,
      resetAt,
      retryAfter: allowed ? undefined : windowSeconds,
    };
  }
}

// Fastify plugin
export function rateLimitPlugin(redis: Redis) {
  const limiter = new SlidingWindowRateLimiter(redis);

  return async function rateLimit(
    request: any,
    reply: any,
    config: { max: number; timeWindow: string },
  ) {
    const windowSeconds = parseTimeWindow(config.timeWindow);
    const key = `${request.routerPath}:${request.ip}`;

    const result = await limiter.check(key, config.max, windowSeconds);

    // Set rate limit headers
    reply.header('X-RateLimit-Limit', result.limit);
    reply.header('X-RateLimit-Remaining', result.remaining);
    reply.header('X-RateLimit-Reset', result.resetAt);

    if (!result.allowed) {
      reply.header('Retry-After', result.retryAfter);
      reply.code(429).send({
        error: 'RATE_LIMIT_EXCEEDED',
        message: `Too many requests. Please try again after ${result.retryAfter} seconds.`,
        statusCode: 429,
        retryAfter: result.retryAfter,
      });
    }
  };
}

function parseTimeWindow(window: string): number {
  const match = window.match(/^(\d+)\s*(s|sec|m|min|h|hr)$/i);
  if (!match) throw new Error(`Invalid time window: ${window}`);
  const value = parseInt(match[1], 10);
  const unit = match[2].toLowerCase();
  switch (unit) {
    case 's': case 'sec': return value;
    case 'm': case 'min': return value * 60;
    case 'h': case 'hr': return value * 3600;
    default: return value;
  }
}

Per-Account Brute Force Protection

typescript
// src/infrastructure/rate-limit/brute-force.ts

import { Redis } from 'ioredis';

interface BruteForceResult {
  allowed: boolean;
  failedAttempts: number;
  maxAttempts: number;
  lockoutUntil?: Date;
  requireCaptcha: boolean;
}

export class BruteForceProtection {
  constructor(
    private readonly redis: Redis,
    private readonly config: {
      maxAttempts: number;          // 10
      lockoutDurationSeconds: number; // 1800 (30 min)
      captchaThreshold: number;     // 3
      windowSeconds: number;        // 900 (15 min)
    },
  ) {}

  async recordFailedAttempt(
    identifier: string,       // email or user ID
    ipAddress: string,
  ): Promise<BruteForceResult> {
    const accountKey = `brute:account:${identifier}`;
    const ipKey = `brute:ip:${ipAddress}`;

    // Increment counters atomically
    const pipeline = this.redis.pipeline();
    pipeline.incr(accountKey);
    pipeline.expire(accountKey, this.config.windowSeconds);
    pipeline.incr(ipKey);
    pipeline.expire(ipKey, this.config.windowSeconds);
    const results = await pipeline.exec();

    const accountAttempts = results![0][1] as number;
    const ipAttempts = results![2][1] as number;
    const failedAttempts = Math.max(accountAttempts, ipAttempts);

    // Check if account should be locked
    if (failedAttempts >= this.config.maxAttempts) {
      const lockUntil = new Date(Date.now() + this.config.lockoutDurationSeconds * 1000);
      await this.redis.setex(
        `brute:locked:${identifier}`,
        this.config.lockoutDurationSeconds,
        lockUntil.toISOString(),
      );
      return {
        allowed: false,
        failedAttempts,
        maxAttempts: this.config.maxAttempts,
        lockoutUntil: lockUntil,
        requireCaptcha: true,
      };
    }

    return {
      allowed: true,
      failedAttempts,
      maxAttempts: this.config.maxAttempts,
      requireCaptcha: failedAttempts >= this.config.captchaThreshold,
    };
  }

  async isLocked(identifier: string): Promise<{ locked: boolean; until?: Date }> {
    const lockData = await this.redis.get(`brute:locked:${identifier}`);
    if (!lockData) return { locked: false };
    return { locked: true, until: new Date(lockData) };
  }

  async resetAttempts(identifier: string): Promise<void> {
    await this.redis.del(`brute:account:${identifier}`);
    await this.redis.del(`brute:locked:${identifier}`);
  }
}

DDoS Protection Strategy

Layer 1: CDN/WAF (Cloudflare, AWS Shield)

  • Rate limiting: 10,000 requests/min per IP (configurable per endpoint).
  • Bot detection: JavaScript challenge for suspicious clients.
  • IP reputation: Block known-bad IPs from threat intelligence feeds.
  • Geo-blocking: Block regions with no legitimate users (if applicable).
  • Challenge page: CAPTCHA for IPs that exceed soft limits.

Layer 2: Application-Level Protections

Attack VectorDetectionMitigation
Credential stuffingHigh login failure rate from diverse IPsCAPTCHA after 3 failures, progressive delays, breach DB check
Account enumerationTiming differences in login responsesConstant-time responses, generic error messages
Token floodingHigh refresh rate per userPer-user refresh rate limit (30/min)
Registration spamHigh registration volumeCAPTCHA, email verification, phone verification at scale
Password sprayLow failures per account, many accountsPer-IP rate limiting, anomaly detection on aggregate failure rate

Anomaly Detection

typescript
// src/infrastructure/security/anomaly-detector.ts

import { Redis } from 'ioredis';

export class AuthAnomalyDetector {
  constructor(private readonly redis: Redis) {}

  async checkLoginAnomaly(
    userId: string,
    ipAddress: string,
    userAgent: string,
  ): Promise<{
    anomalous: boolean;
    reasons: string[];
    riskScore: number;
  }> {
    const reasons: string[] = [];
    let riskScore = 0;

    // Check 1: New device
    const knownDevices = await this.redis.smembers(`known_devices:${userId}`);
    const deviceHash = this.hashDevice(userAgent);
    if (!knownDevices.includes(deviceHash)) {
      reasons.push('new_device');
      riskScore += 20;
    }

    // Check 2: New IP geolocation
    const knownGeos = await this.redis.smembers(`known_geos:${userId}`);
    const geo = await this.getGeoLocation(ipAddress);
    if (geo && !knownGeos.includes(geo.country)) {
      reasons.push('new_country');
      riskScore += 30;
    }

    // Check 3: Impossible travel
    const lastLogin = await this.redis.get(`last_login_geo:${userId}`);
    if (lastLogin && geo) {
      const last = JSON.parse(lastLogin);
      const distance = this.calculateDistance(last.lat, last.lon, geo.lat, geo.lon);
      const timeDiff = (Date.now() - last.timestamp) / 1000 / 3600; // hours
      const speedKmh = distance / timeDiff;

      if (speedKmh > 1000) { // Faster than commercial flight
        reasons.push('impossible_travel');
        riskScore += 50;
      }
    }

    // Check 4: Unusual time of day
    const hourOfDay = new Date().getUTCHours();
    const loginHours = await this.redis.get(`login_hours:${userId}`);
    if (loginHours) {
      const typicalHours = JSON.parse(loginHours) as number[];
      if (!typicalHours.includes(hourOfDay)) {
        reasons.push('unusual_time');
        riskScore += 10;
      }
    }

    // Check 5: Tor exit node or VPN
    const isTor = await this.redis.sismember('tor_exit_nodes', ipAddress);
    if (isTor) {
      reasons.push('tor_exit_node');
      riskScore += 40;
    }

    return {
      anomalous: riskScore >= 50,
      reasons,
      riskScore: Math.min(riskScore, 100),
    };
  }

  private hashDevice(userAgent: string): string {
    const crypto = require('crypto');
    return crypto.createHash('sha256').update(userAgent).digest('hex').substring(0, 16);
  }

  private async getGeoLocation(ip: string): Promise<{ country: string; lat: number; lon: number } | null> {
    // Use MaxMind GeoIP2 or similar service
    return null; // Placeholder
  }

  private calculateDistance(lat1: number, lon1: number, lat2: number, lon2: number): number {
    // Haversine formula - returns distance in km
    const R = 6371;
    const dLat = (lat2 - lat1) * Math.PI / 180;
    const dLon = (lon2 - lon1) * Math.PI / 180;
    const a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
              Math.cos(lat1 * Math.PI / 180) * Math.cos(lat2 * Math.PI / 180) *
              Math.sin(dLon / 2) * Math.sin(dLon / 2);
    return R * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
  }
}

Monitoring and Alerting

Key Metrics

typescript
// src/infrastructure/metrics/auth-metrics.ts

import { Counter, Histogram, Gauge, Registry } from 'prom-client';

export class AuthMetrics {
  private readonly registry: Registry;

  // Counters
  readonly loginAttempts: Counter;
  readonly loginSuccesses: Counter;
  readonly loginFailures: Counter;
  readonly registrations: Counter;
  readonly tokenRefreshes: Counter;
  readonly tokenRefreshFailures: Counter;
  readonly mfaChallenges: Counter;
  readonly mfaSuccesses: Counter;
  readonly mfaFailures: Counter;
  readonly accountLockouts: Counter;
  readonly passwordResets: Counter;

  // Histograms
  readonly loginLatency: Histogram;
  readonly registrationLatency: Histogram;
  readonly tokenRefreshLatency: Histogram;
  readonly passwordHashLatency: Histogram;
  readonly dbQueryLatency: Histogram;
  readonly redisLatency: Histogram;

  // Gauges
  readonly activeSessions: Gauge;
  readonly dbConnectionPoolSize: Gauge;
  readonly dbConnectionPoolUsed: Gauge;
  readonly redisConnectionPoolSize: Gauge;

  constructor() {
    this.registry = new Registry();

    this.loginAttempts = new Counter({
      name: 'auth_login_attempts_total',
      help: 'Total login attempts',
      labelNames: ['method', 'status'],
      registers: [this.registry],
    });

    this.loginSuccesses = new Counter({
      name: 'auth_login_successes_total',
      help: 'Successful logins',
      labelNames: ['method'],
      registers: [this.registry],
    });

    this.loginFailures = new Counter({
      name: 'auth_login_failures_total',
      help: 'Failed login attempts',
      labelNames: ['reason'],
      registers: [this.registry],
    });

    this.registrations = new Counter({
      name: 'auth_registrations_total',
      help: 'Total user registrations',
      labelNames: ['method'],
      registers: [this.registry],
    });

    this.tokenRefreshes = new Counter({
      name: 'auth_token_refreshes_total',
      help: 'Total token refreshes',
      registers: [this.registry],
    });

    this.tokenRefreshFailures = new Counter({
      name: 'auth_token_refresh_failures_total',
      help: 'Failed token refreshes',
      labelNames: ['reason'],
      registers: [this.registry],
    });

    this.mfaChallenges = new Counter({
      name: 'auth_mfa_challenges_total',
      help: 'MFA challenges issued',
      registers: [this.registry],
    });

    this.mfaSuccesses = new Counter({
      name: 'auth_mfa_successes_total',
      help: 'Successful MFA verifications',
      registers: [this.registry],
    });

    this.mfaFailures = new Counter({
      name: 'auth_mfa_failures_total',
      help: 'Failed MFA verifications',
      registers: [this.registry],
    });

    this.accountLockouts = new Counter({
      name: 'auth_account_lockouts_total',
      help: 'Account lockouts due to failed attempts',
      registers: [this.registry],
    });

    this.passwordResets = new Counter({
      name: 'auth_password_resets_total',
      help: 'Password reset completions',
      registers: [this.registry],
    });

    this.loginLatency = new Histogram({
      name: 'auth_login_duration_seconds',
      help: 'Login request duration in seconds',
      labelNames: ['status'],
      buckets: [0.05, 0.1, 0.2, 0.3, 0.5, 1, 2, 5],
      registers: [this.registry],
    });

    this.registrationLatency = new Histogram({
      name: 'auth_registration_duration_seconds',
      help: 'Registration request duration',
      buckets: [0.1, 0.2, 0.5, 1, 2, 5],
      registers: [this.registry],
    });

    this.tokenRefreshLatency = new Histogram({
      name: 'auth_token_refresh_duration_seconds',
      help: 'Token refresh duration',
      buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25],
      registers: [this.registry],
    });

    this.passwordHashLatency = new Histogram({
      name: 'auth_password_hash_duration_seconds',
      help: 'Argon2 password hashing duration',
      buckets: [0.1, 0.2, 0.3, 0.5, 1, 2],
      registers: [this.registry],
    });

    this.dbQueryLatency = new Histogram({
      name: 'auth_db_query_duration_seconds',
      help: 'Database query duration',
      labelNames: ['operation', 'table'],
      buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.5],
      registers: [this.registry],
    });

    this.redisLatency = new Histogram({
      name: 'auth_redis_duration_seconds',
      help: 'Redis operation duration',
      labelNames: ['operation'],
      buckets: [0.0005, 0.001, 0.005, 0.01, 0.025, 0.05],
      registers: [this.registry],
    });

    this.activeSessions = new Gauge({
      name: 'auth_active_sessions',
      help: 'Number of active sessions',
      registers: [this.registry],
    });

    this.dbConnectionPoolSize = new Gauge({
      name: 'auth_db_pool_size',
      help: 'Database connection pool size',
      registers: [this.registry],
    });

    this.dbConnectionPoolUsed = new Gauge({
      name: 'auth_db_pool_used',
      help: 'Database connections in use',
      registers: [this.registry],
    });

    this.redisConnectionPoolSize = new Gauge({
      name: 'auth_redis_pool_size',
      help: 'Redis connection pool size',
      registers: [this.registry],
    });
  }

  async getMetrics(): Promise<string> {
    return this.registry.metrics();
  }
}

Alert Rules (Prometheus)

yaml
# prometheus/auth-alerts.yaml

groups:
  - name: auth-service
    rules:
      # High error rate
      - alert: AuthHighErrorRate
        expr: |
          sum(rate(auth_login_failures_total[5m]))
          / sum(rate(auth_login_attempts_total[5m]))
          > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Auth service login failure rate above 10%"
          description: "{​{ $value | humanizePercentage }​} of login attempts are failing"

      # Credential stuffing detection
      - alert: AuthCredentialStuffing
        expr: |
          sum(rate(auth_login_failures_total{reason="invalid_credentials"}[5m])) > 50
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Possible credential stuffing attack"
          description: "More than 50 failed login attempts per second"

      # Account lockout spike
      - alert: AuthLockoutSpike
        expr: |
          sum(rate(auth_account_lockouts_total[5m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Unusual number of account lockouts"

      # Token refresh failure spike (might indicate stolen refresh tokens)
      - alert: AuthTokenReuseDetected
        expr: |
          sum(rate(auth_token_refresh_failures_total{reason="reuse_detected"}[5m])) > 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Refresh token reuse detected — possible token theft"

      # High login latency
      - alert: AuthHighLoginLatency
        expr: |
          histogram_quantile(0.95, rate(auth_login_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Login p95 latency exceeds 1 second"

      # Database connection pool exhaustion
      - alert: AuthDbPoolExhaustion
        expr: |
          auth_db_pool_used / auth_db_pool_size > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool 90% utilized"

      # Redis latency spike
      - alert: AuthRedisHighLatency
        expr: |
          histogram_quantile(0.95, rate(auth_redis_duration_seconds_bucket[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis p95 latency exceeds 10ms"

      # Service unavailable
      - alert: AuthServiceDown
        expr: up{job="auth-service"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Auth service instance is down"

Capacity Planning

Resource Calculator

Use this table to estimate the infrastructure needed for your scale:

MAUAuth ServicePostgreSQLRedisPgBouncerEst. Monthly Cost
10K1x (1 vCPU, 1 GB)1x db.t3.medium1x cache.t3.mediumNot needed$150
100K2x (2 vCPU, 2 GB)1x db.r6g.large1x cache.r6g.large1x (1 vCPU)$500
500K3x (2 vCPU, 4 GB)1x db.r6g.xlarge + replica3-node cluster2x (2 vCPU)$1,500
1M5x (2 vCPU, 4 GB)1x db.r6g.2xlarge + 2 replicas6-node cluster2x (2 vCPU)$3,000
5M10x (4 vCPU, 8 GB)1x db.r6g.4xlarge + 3 replicas9-node cluster3x (4 vCPU)$10,000
10M20x (4 vCPU, 8 GB)Multi-region (3 regions)Per-region clustersPer-region$30,000+

Load Testing Benchmarks

Expected performance on a single auth service instance (2 vCPU, 4 GB RAM):

OperationConcurrency 1Concurrency 10Concurrency 50Concurrency 100
Login (password verify)10 RPS80 RPS150 RPS180 RPS
Registration (password hash)8 RPS60 RPS120 RPS140 RPS
Token refresh500 RPS2,000 RPS3,500 RPS4,000 RPS
JWKS fetch5,000 RPS10,000 RPS15,000 RPS18,000 RPS

Login and registration are intentionally slow because Argon2id hashing consumes 64 MB of memory and significant CPU per operation. This is the trade-off: password security vs. login throughput.

Argon2 Tuning for Scale

At high scale, Argon2id's memory cost becomes a scaling bottleneck. Each concurrent login uses 64 MB of memory. With 100 concurrent logins, that is 6.4 GB of memory just for password hashing.

Options:

  1. Reduce memory cost — Trade security for throughput (not recommended below 19 MB).
  2. Offload to dedicated workers — Run password hashing on separate instances with more memory.
  3. Use worker threads — Isolate Argon2 to worker threads so the event loop stays responsive.
typescript
// Option 3: Worker thread for Argon2
// src/infrastructure/workers/argon2-worker.ts

import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';
import argon2 from 'argon2';

if (!isMainThread) {
  // Worker thread
  const { operation, password, hash, options } = workerData;

  (async () => {
    try {
      if (operation === 'hash') {
        const result = await argon2.hash(password, options);
        parentPort!.postMessage({ success: true, result });
      } else if (operation === 'verify') {
        const result = await argon2.verify(hash, password);
        parentPort!.postMessage({ success: true, result });
      }
    } catch (error: any) {
      parentPort!.postMessage({ success: false, error: error.message });
    }
  })();
}

export function hashInWorker(password: string, options: any): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(__filename, {
      workerData: { operation: 'hash', password, options },
    });
    worker.on('message', (msg) => {
      if (msg.success) resolve(msg.result);
      else reject(new Error(msg.error));
    });
    worker.on('error', reject);
  });
}

"Scaling an auth service is fundamentally about understanding that 99% of your traffic is token validation, and token validation should never hit the auth service. Build the JWKS endpoint, distribute the public key, and let every other service validate tokens locally."

"What I cannot create, I do not understand." — Richard Feynman