System Design Anti-Patterns

Anti-patterns are common solutions that appear correct but actually create more problems than they solve. In system design, these mistakes compound — a chatty service design that works for 100 requests per second collapses at 10,000. This page documents the nine most destructive anti-patterns, explains why teams fall into them, and provides concrete alternatives.

Anti-Pattern 1: The Distributed Monolith

What it looks like: You split your monolith into microservices, but they are all deployed together, share a database, and cannot be released independently.

Why teams fall into it: They read about microservices, split their code into separate repos and deployments, but do not change the data architecture or communication patterns. The result has all the complexity of microservices (network calls, distributed debugging, deployment orchestration) with none of the benefits (independent deployment, independent scaling, technology freedom).

How to detect it:

  • Deploying Service A requires deploying Service B at the same time
  • A change to the orders table schema requires changes in three services
  • Teams cannot release without coordinating with other teams
  • Integration tests require all services running together

What to do instead:

Each service owns its data, communicates via events or well-defined APIs, and can be deployed independently. See Database Per Service for how to achieve this.
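As a minimal sketch of what "owns its data, communicates via events" can look like, the example below uses an in-memory `EventBus` as a stand-in for a real broker. All names here (`EventBus`, `OrdersService`, `InventoryService`, the `order.created` topic) are hypothetical and chosen only for illustration.

```typescript
interface OrderCreated {
  orderId: string;
  productId: string;
  quantity: number;
}

// Hypothetical in-memory stand-in for a message broker.
class EventBus<E> {
  private readonly handlers = new Map<string, Array<(event: E) => void>>();

  subscribe(topic: string, handler: (event: E) => void): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  publish(topic: string, event: E): void {
    for (const handler of this.handlers.get(topic) ?? []) handler(event);
  }
}

// Owns order data. No other service reads or writes this store directly.
class OrdersService {
  private readonly orders = new Map<string, OrderCreated>();
  private readonly bus: EventBus<OrderCreated>;

  constructor(bus: EventBus<OrderCreated>) {
    this.bus = bus;
  }

  createOrder(order: OrderCreated): void {
    this.orders.set(order.orderId, order);
    this.bus.publish('order.created', order);
  }

  getOrder(orderId: string): OrderCreated | undefined {
    return this.orders.get(orderId);
  }
}

// Owns stock levels and reacts to events instead of touching order tables.
class InventoryService {
  private readonly stock: Map<string, number>;

  constructor(bus: EventBus<OrderCreated>, initialStock: Map<string, number>) {
    this.stock = new Map(initialStock);
    bus.subscribe('order.created', (event) => {
      const current = this.stock.get(event.productId) ?? 0;
      this.stock.set(event.productId, current - event.quantity);
    });
  }

  available(productId: string): number {
    return this.stock.get(productId) ?? 0;
  }
}
```

Because the only contract between the two services is the event shape, either one can change its storage schema or be redeployed without touching the other.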

Anti-Pattern 2: Premature Microservices

What it looks like: A team of 3-5 engineers building a new product starts with 10+ microservices from day one.

| Monolith | Premature Microservices |
| --- | --- |
| 1 deployment pipeline | 10+ deployment pipelines |
| Simple function calls | Network calls with retries, timeouts, serialization |
| Single debugger | Distributed tracing across 10 services |
| 1 database migration | 10 migration scripts |
| Ship a feature in 1 day | Ship a feature touching 4 services in 1 week |

Why teams fall into it: "Netflix uses microservices, so we should too." Netflix has thousands of engineers. You have five. Microservices are an organizational scaling pattern, not a technical best practice.

How to detect it:

  • More services than engineers
  • Simple features require changes in multiple repositories
  • Most engineering time is spent on infrastructure, not product
  • The team discusses service boundaries more than customer problems

What to do instead: Start with a well-structured monolith. Use clear module boundaries internally. Extract services only when a specific module needs independent scaling, independent deployment, or a different technology choice.

typescript
// Well-structured monolith with clear module boundaries
// Can be extracted to services later when the pain is real

// src/modules/orders/order.service.ts
export class OrderService {
  constructor(
    private orderRepo: OrderRepository,
    private inventoryModule: InventoryModule, // Module interface, not HTTP call
    private paymentModule: PaymentModule,
  ) {}

  async createOrder(dto: CreateOrderDTO): Promise<Order> {
    // All in-process — no network calls
    const reserved = await this.inventoryModule.reserve(dto.productId, dto.quantity);
    const payment = await this.paymentModule.charge(dto.userId, dto.total);
    return this.orderRepo.create({ ...dto, reservationId: reserved.id, paymentId: payment.id });
  }
}

// When you need to extract: change InventoryModule from in-process to HTTP client
// The service boundary was already defined

Anti-Pattern 3: Shared Database Across Services

What it looks like: Multiple services read from and write to the same database tables.

This is fully covered in Database Per Service. The short version: shared databases create schema coupling, performance coupling, technology lock-in, and ownership ambiguity. Every benefit of microservices is negated when services share a database.

What to do instead: Each service owns its data. Cross-service data access goes through APIs. Cross-service queries use API composition or CQRS.
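API composition can be sketched as an in-memory join over the owning services' APIs. The `OrderApi` and `UserApi` interfaces below are assumptions standing in for real service clients, and `getUsersBulk` is a hypothetical bulk endpoint.

```typescript
interface Order { id: string; userId: string; total: number; }
interface User { id: string; name: string; }

// Hypothetical clients: each wraps the API of the service that owns the data.
interface OrderApi { listOrders(): Promise<Order[]>; }
interface UserApi { getUsersBulk(ids: string[]): Promise<User[]>; }

interface OrderWithUser { orderId: string; total: number; userName: string; }

// Replaces a cross-service SQL JOIN with two API calls and an in-memory join.
async function ordersWithUsers(orders: OrderApi, users: UserApi): Promise<OrderWithUser[]> {
  const orderList = await orders.listOrders();

  // Deduplicate IDs so the user service is asked once per user, in one call.
  const userIds = Array.from(new Set(orderList.map((o) => o.userId)));
  const userList = await users.getUsersBulk(userIds);

  const usersById = new Map(userList.map((u): [string, User] => [u.id, u]));
  return orderList.map((o) => ({
    orderId: o.id,
    total: o.total,
    // The user service may not know every ID; degrade instead of failing.
    userName: usersById.get(o.userId)?.name ?? 'unknown',
  }));
}
```

This trades a single database join for two network calls, which is usually acceptable for small result sets; for large or frequently run queries, a CQRS read model is the more common choice.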

Anti-Pattern 4: Chatty Services

What it looks like: Fulfilling a single user request requires dozens of inter-service calls.

Why teams fall into it: Services are split too granularly — every noun becomes a service. The pricing of a product does not need its own service if it is always accessed alongside product data.

How to detect it:

  • A single API call generates 10+ internal calls (visible in distributed traces)
  • P99 latency is dominated by network hops, not computation
  • Services frequently call each other in loops

What to do instead:

  1. Merge overly granular services — Product + Pricing + Inventory can be one Catalog service
  2. Use batch/bulk APIs — instead of N calls for N items, one call with all IDs
  3. Denormalize read data — store commonly-accessed data together
  4. Backend for Frontend (BFF) — aggregate data at the edge
typescript
// Instead of 7 calls, use a BFF that aggregates
class OrderBFF {
  async getOrderDetails(orderId: string): Promise<OrderView> {
    const order = await this.orderService.getOrder(orderId);

    // Parallel bulk calls instead of sequential individual calls
    const [user, products, shipping] = await Promise.all([
      this.userService.getUser(order.userId),
      this.catalogService.getProductsBulk(order.productIds), // One call for all products
      this.shippingService.getEstimate(orderId),
    ]);

    // Catalog already includes price, stock, tax — no separate calls
    return this.assembleView(order, user, products, shipping);
  }
}

Anti-Pattern 5: Synchronous Call Chains

What it looks like: Service A calls Service B, which calls Service C, which calls Service D — all synchronously. The failure of any service in the chain fails the entire request.

The math is brutal:

| Services in Chain | Individual Availability | Chain Availability |
| --- | --- | --- |
| 2 | 99.9% | 99.8% |
| 3 | 99.9% | 99.7% |
| 5 | 99.9% | 99.5% |
| 5 | 99.0% | 95.1% |
| 10 | 99.0% | 90.4% |
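The table follows from multiplying availabilities: if every hop must succeed, the chain's availability is the product of the individual values. A small sketch of that arithmetic is below; the function names are illustrative, and it assumes failures are independent.

```typescript
// Availability of a synchronous chain: the product of each hop's availability.
// Assumes failures are independent, which is optimistic in practice.
function chainAvailability(availabilities: number[]): number {
  return availabilities.reduce((product, a) => product * a, 1);
}

// Convenience for n services that all share the same availability.
function uniformChain(n: number, availability: number): number {
  return Math.pow(availability, n);
}

// Expected downtime over a 30-day month, in minutes.
function downtimeMinutesPerMonth(availability: number): number {
  return (1 - availability) * 30 * 24 * 60;
}

// Reproduce the last two rows of the table, as percentages with one decimal.
console.log((uniformChain(5, 0.99) * 100).toFixed(1));
console.log((uniformChain(10, 0.99) * 100).toFixed(1));
```

Ten services at 99% give a chain availability of about 90.4%, which works out to roughly 69 hours of expected downtime per 30-day month, compared with about 7 hours for a single 99% service.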

What to do instead:

  1. Use asynchronous communication — events instead of synchronous HTTP calls
  2. Limit sync depth to 2 — if you need to call A → B → C, refactor C's data into B
  3. Use timeouts and circuit breakers for any remaining synchronous calls
  4. Accept eventual consistency where real-time consistency is not actually needed
typescript
// Bad: every downstream call sits on the request path
async function createOrderSync(dto: OrderDTO): Promise<Order> {
  const inventory = await inventoryService.reserve(dto.productId); // sync
  const payment = await paymentService.charge(dto.userId, dto.total); // sync
  const order = await orderRepo.create({ ...dto, status: 'confirmed' });
  const shipping = await shippingService.schedule(order.id); // sync
  const notification = await notificationService.send(dto.userId); // sync
  return order; // a failure or slow response anywhere above fails or delays the request
}

// Good: sync only for critical path, async for the rest
async function createOrder(dto: OrderDTO): Promise<Order> {
  // Only inventory and payment are on the critical path
  const inventory = await inventoryService.reserve(dto.productId);
  const payment = await paymentService.charge(dto.userId, dto.total);

  const order = await orderRepo.create({ ...dto, status: 'confirmed' });

  // Publish one event for the non-critical work. The await only waits for the
  // broker to accept the event, not for shipping or notification to finish.
  await eventBus.publish('order.created', {
    orderId: order.id,
    userId: dto.userId,
  });
  // Shipping and notification services react to this event asynchronously

  return order;
}

Anti-Pattern 6: No Circuit Breakers

What it looks like: When a downstream service fails, your service keeps sending requests, accumulating timed-out connections and eventually crashing too.

See our Circuit Breaker Pattern for full implementation details.

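The cascade: the downstream service slows down, callers hold connections open while they wait, their connection pools and threads fill up, and they begin failing requests from their own callers.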

What to do instead:

typescript
const circuitBreaker = new CircuitBreaker({
  failureThreshold: 5,    // Open after 5 failures
  resetTimeout: 30_000,   // Try again after 30s
  monitorInterval: 10_000,
});

async function callDownstream(request: Request): Promise<Response> {
  return circuitBreaker.execute(async () => {
    const response = await fetch('https://downstream/api', {
      signal: AbortSignal.timeout(3000), // Always set timeouts
    });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response;
  }, {
    fallback: () => getCachedResponse(request), // Graceful degradation
  });
}

Every synchronous inter-service call needs:

  1. A timeout (3-10 seconds, not 30)
  2. A circuit breaker (stop calling after N failures)
  3. A fallback (cached response, default value, or degraded mode)
  4. Retry with exponential backoff (for transient failures)
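To make the circuit breaker item concrete, here is a minimal sketch of the closed / open / half-open state machine. It is an illustration under simplifying assumptions, not the API of any particular library: `SimpleCircuitBreaker` and its parameters are hypothetical, it counts consecutive failures only, and it does not limit concurrent trial calls in the half-open state.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class SimpleCircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  // `now` is injectable so the breaker can be tested without real waiting.
  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  async execute<T>(call: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === 'open') {
      // Fail fast while the reset timeout has not elapsed.
      if (this.now() - this.openedAt < this.resetTimeoutMs) return fallback();
      this.state = 'half-open'; // let a trial call through
    }
    try {
      const result = await call();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      return fallback();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

While the circuit is open, `call` is never invoked, which is what relieves pressure on the struggling downstream service.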

Anti-Pattern 7: The God Service

What it looks like: One service that does everything — handles users, orders, payments, notifications, analytics, and reporting.

Why teams fall into it: It starts as "we'll just add one more endpoint." Over time, the service grows to 50 endpoints, 200K lines of code, and 15-minute build times. Nobody fully understands it anymore.

How to detect it:

  • One service has 10x more code than any other
  • Deploy frequency is low because the blast radius is too high
  • Multiple teams work in the same repository and step on each other
  • A bug in the notification module brings down order processing

What to do instead: Apply the Strangler Fig pattern to gradually extract domains.

Extract one bounded context at a time. Start with the module that has the clearest boundary or the most independent scaling needs.
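The core of a Strangler Fig migration is a routing layer that sends already-extracted paths to new services and everything else to the original service. A minimal sketch of that routing decision follows; the upstream names and path prefixes are hypothetical.

```typescript
// Hypothetical upstream names and path prefixes, for illustration only.
type Upstream = 'god-service' | 'notifications-service' | 'reporting-service';

// Grows by one entry each time a bounded context is extracted.
const extractedRoutes: Array<{ prefix: string; upstream: Upstream }> = [
  { prefix: '/api/notifications', upstream: 'notifications-service' },
  { prefix: '/api/reports', upstream: 'reporting-service' },
];

function resolveUpstream(path: string): Upstream {
  for (const route of extractedRoutes) {
    // Match the prefix itself or anything below it, but not "/api/reportsx".
    if (path === route.prefix || path.startsWith(route.prefix + '/')) {
      return route.upstream;
    }
  }
  return 'god-service'; // everything not yet extracted stays where it was
}
```

Because clients only ever see the routing layer, each extraction is a one-line routing change that can be rolled back by removing the entry.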

Anti-Pattern 8: Database as Message Queue

What it looks like: Using a database table as a message queue — polling for new rows, updating a status column, and deleting processed rows.

sql
-- The "poor man's queue"
CREATE TABLE job_queue (
    id SERIAL PRIMARY KEY,
    payload JSONB NOT NULL,
    status VARCHAR(20) DEFAULT 'pending',
    created_at TIMESTAMP DEFAULT NOW(),
    processed_at TIMESTAMP
);

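-- Worker polls every second. The SELECT and UPDATE must share one transaction:
-- the row locks taken by FOR UPDATE are released at COMMIT.
BEGIN;

SELECT * FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- After processing each job
UPDATE job_queue SET status = 'completed', processed_at = NOW() WHERE id = $1;

COMMIT;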

Why it's an anti-pattern (at scale):

| Issue | Impact |
| --- | --- |
| Polling overhead | Constant queries even when queue is empty |
| Lock contention | FOR UPDATE holds row locks while jobs run; SKIP LOCKED reduces blocking, but all workers still compete for the same table |
| No backpressure | Database has no concept of consumer capacity |
| No dead letter queue | Failed messages need custom retry logic |
| Table bloat | Completed rows accumulate, slowing queries |
| No fan-out | Cannot have multiple consumer groups |

When it's actually fine: Low volume (< 100 messages/minute), simple workflows, and a team that does not want to operate a message broker. The outbox pattern also uses a database table on purpose, but only as an intermediate step: a relay (typically CDC, sometimes a simple poller) forwards each row to a real broker.
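The outbox pattern mentioned above can be sketched as follows. This is an in-memory illustration only: `InMemoryDb` stands in for a real database, `relayOutbox` stands in for a CDC pipeline or relay process, and all names are hypothetical.

```typescript
interface OutboxRow {
  id: number;
  topic: string;
  payload: string;
  published: boolean;
}

// Stand-in for a relational database with an orders table and an outbox table.
class InMemoryDb {
  readonly orders: string[] = [];
  readonly outbox: OutboxRow[] = [];
  private nextId = 1;

  // In a real database these two writes share one transaction, so the event
  // is recorded if and only if the order is.
  createOrderWithEvent(orderId: string): void {
    this.orders.push(orderId);
    this.outbox.push({
      id: this.nextId++,
      topic: 'order.created',
      payload: JSON.stringify({ orderId }),
      published: false,
    });
  }
}

// Forwards unpublished rows to the broker and returns how many were sent.
// If publish throws, the row stays unpublished and is retried on the next run,
// which gives at-least-once delivery (consumers must tolerate duplicates).
function relayOutbox(
  db: InMemoryDb,
  publish: (topic: string, payload: string) => void,
): number {
  let forwarded = 0;
  for (const row of db.outbox) {
    if (row.published) continue;
    publish(row.topic, row.payload);
    row.published = true;
    forwarded += 1;
  }
  return forwarded;
}
```

The table here is a handoff buffer, not the queue consumers read from: consumers subscribe to the broker, so fan-out, backpressure, and dead-lettering remain the broker's job.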

What to do instead: Use a proper message queue.

| Need | Use |
| --- | --- |
| Simple job queue | SQS, Redis Streams |
| Event streaming | Kafka, Kinesis |
| Pub/sub | SNS + SQS, NATS |
| Task scheduling | Temporal, Celery |

See our Queue Selection Guide for detailed comparisons.

Anti-Pattern 9: Not Designing for Failure

What it looks like: The architecture assumes everything will work. No retry logic, no fallbacks, no graceful degradation, no health checks, no alerting.

The reality of distributed systems:

| Failure | Frequency |
| --- | --- |
| Network timeout | Multiple times per day |
| Service restart/deploy | Daily |
| Database failover | Monthly |
| Cloud provider incident | Quarterly |
| Regional outage | Yearly |
| Data center failure | Multi-year |

What to do instead: Design for failure at every level.

typescript
// Defense in depth: every external call is wrapped in resilience patterns.
// CircuitBreaker, RetryPolicy, Cache, and MetricsClient are assumed interfaces.
class ResilientServiceClient {
  constructor(
    private circuitBreaker: CircuitBreaker,
    private retryPolicy: RetryPolicy,
    private cache: Cache,
    private metrics: MetricsClient,
  ) {}

  async callService<T>(
    operation: string,
    call: (signal: AbortSignal) => Promise<T>, // the call must honor the signal
    options: {
      fallback?: () => T;
      cacheTtl?: number;
      retries?: number;
      timeout?: number; // milliseconds
    } = {},
  ): Promise<T> {
    const { fallback, cacheTtl = 60, retries = 3, timeout = 5000 } = options;

    // Layer 1: Check cache (null check, so falsy values like 0 still count as hits)
    const cached = await this.cache.get<T>(operation);
    if (cached != null) return cached;

    try {
      // Layer 2: Circuit breaker
      return await this.circuitBreaker.execute(async () => {
        // Layer 3: Retry with backoff
        return this.retryPolicy.execute(async () => {
          // Layer 4: Timeout, enforced by aborting the in-flight call
          const controller = new AbortController();
          const timer = setTimeout(() => controller.abort(), timeout);
          try {
            const result = await call(controller.signal);

            // Cache successful result
            await this.cache.set(operation, result, cacheTtl);
            this.metrics.increment(`${operation}.success`);
            return result;
          } catch (error) {
            this.metrics.increment(`${operation}.error`);
            throw error;
          } finally {
            clearTimeout(timer);
          }
        }, retries);
      });
    } catch (error) {
      // Reached when the breaker is open or all retries have failed
      this.metrics.increment(`${operation}.failure`);

      // Layer 5: Fallback
      if (fallback) {
        this.metrics.increment(`${operation}.fallback`);
        return fallback();
      }
      throw error;
    }
  }
}

The Resilience Checklist

For every external dependency your service has:

  • [ ] Timeout configured (not infinite)
  • [ ] Circuit breaker prevents cascading failure
  • [ ] Retry with exponential backoff and jitter
  • [ ] Fallback provides degraded but functional response
  • [ ] Health check monitors dependency status
  • [ ] Alert fires when error rate exceeds threshold
  • [ ] Bulkhead isolates failure to one dependency (not all)
  • [ ] Graceful degradation — the feature degrades, not the entire service
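The "retry with exponential backoff and jitter" item can be sketched as below. This uses the full-jitter variant (a random delay between zero and an exponentially growing ceiling); the function names, default values, and injectable `random` and `sleep` parameters are assumptions made for illustration and testability.

```typescript
// Full-jitter backoff: a random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retries `call` up to maxAttempts times, sleeping between failed attempts.
async function retryWithBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

Jitter matters because clients that fail together otherwise retry together, hitting a recovering dependency in synchronized waves. Retries should also be limited to idempotent operations and transient errors; retrying a non-idempotent payment call can double-charge.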

Anti-Pattern Summary Table

| # | Anti-Pattern | Core Problem | Fix |
| --- | --- | --- | --- |
| 1 | Distributed Monolith | Microservice boundaries without data boundaries | Database per service, event-driven communication |
| 2 | Premature Microservices | Solving organizational problems you don't have | Start with a well-structured monolith |
| 3 | Shared Database | Schema and performance coupling | Each service owns its data |
| 4 | Chatty Services | Too many network calls per request | Bulk APIs, BFF, denormalization |
| 5 | Synchronous Chains | Cascading failures, additive latency, multiplicative availability loss | Async events, limit sync depth |
| 6 | No Circuit Breakers | One failure takes down everything | Circuit breaker + timeout + fallback |
| 7 | God Service | Too much code, too many responsibilities, too risky to change | Strangler Fig pattern to extract domains |
| 8 | DB as Queue | Polling overhead, lock contention, no backpressure | Use SQS, Kafka, or Redis Streams |
| 9 | No Failure Design | System assumes everything works | Design for failure at every layer |

Real-World Examples

Segment (Premature Microservices)

Segment built 140+ microservices and then migrated back to a monolith. The operational overhead of maintaining, deploying, and keeping shared libraries consistent across that many services consumed more engineering time than building their product. Their blog post "Goodbye Microservices: From 100s of Problem Children to 1 Superstar" became a landmark case study against premature decomposition.

Knight Capital (No Failure Design)

Knight Capital Group lost $440 million in 45 minutes in 2012 due to a deployment anti-pattern. Old code was accidentally activated on one of eight servers (inconsistent deployment), and the system had no circuit breaker or kill switch to stop the runaway trading. This is one of the most expensive examples of Anti-Pattern #9 (Not Designing for Failure): a single deployment error with no safety net nearly bankrupted the company, which survived only through an emergency rescue investment and was acquired the following year.

Etsy (Avoiding the Distributed Monolith)

Etsy initially tried microservices, then returned to their PHP monolith after realizing they had built a distributed monolith with shared databases and synchronous chains. They instead invested in making their monolith modular and deployable 50+ times per day. Their approach proved that a well-maintained monolith with good CI/CD can be more productive than poorly-executed microservices.

Interview Tip

What to say

"The most common anti-pattern I watch for is the distributed monolith — teams split code into services but keep a shared database, eliminating every benefit of microservices while adding all the complexity. The test is simple: can each service be deployed independently? If the answer is no, it's a distributed monolith. I'd also always insist on circuit breakers for every synchronous inter-service call, because chain availability is the product of individual availabilities — five services at 99% means 95.1% combined. Segment's story of migrating 140 microservices back to a monolith is a powerful reminder that the question isn't 'should we use microservices?' but 'do we have the organizational scale and operational maturity to justify the complexity?'"

"What I cannot create, I do not understand." — Richard Feynman