
Load Balancing

Load balancing is the practice of distributing incoming network traffic across multiple backend servers so that no single server bears an unsustainable share of demand. It is one of the oldest and most consequential patterns in systems engineering. Get it wrong and your horizontally scaled fleet is an expensive collection of idle machines while one node melts. Get it right and you unlock near-linear capacity growth, zero-downtime deployments, graceful degradation, and the ability to tolerate individual server failures without users noticing.

This section does not merely list algorithms and products. It traces the full decision tree from "do I even need a load balancer?" through protocol-level trade-offs, algorithmic analysis, session management pitfalls, global traffic steering, and real production configurations for the three dominant reverse proxies (NGINX, HAProxy, Envoy). Every page assumes you will be operating these systems under pressure and need to understand why things work, not just how to configure them.

When Load Balancing Is Needed

The question is not "should I use a load balancer?" but "at which points in my architecture do I need load distribution, and what kind?"

| Scenario | Why Load Balancing Helps | Typical Approach |
|---|---|---|
| Single app server receiving more traffic than it can handle | Spread requests across multiple instances | L7 reverse proxy (NGINX, HAProxy) |
| Microservices calling each other internally | Prevent any single service pod from being overwhelmed | Client-side LB (gRPC) or service mesh (Envoy) |
| Database read replicas | Distribute read queries without overloading a single replica | TCP-level (L4) proxy or application-level routing |
| Multi-region deployment | Route users to the nearest healthy region | DNS-based (GeoDNS) or anycast |
| WebSocket servers | Maintain sticky connections while distributing new ones | L7 LB with session affinity |
| CI/CD runners, worker pools | Spread jobs across available workers | Queue-based (different model, but same principle) |
| gRPC services | HTTP/2 multiplexing defeats connection-level balancing | Application-aware L7 LB or client-side LB |
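
The gRPC row is easier to see with numbers. The sketch below is a toy simulation, not a model of any real proxy. It assumes three hypothetical backends (`pod-a`, `pod-b`, `pod-c`) and compares a balancer that chooses a backend once per connection with one that chooses once per request, which is roughly the difference between L4 and application-aware balancing when a client multiplexes many requests over one HTTP/2 connection.

```python
from collections import Counter
from itertools import cycle

BACKENDS = ["pod-a", "pod-b", "pod-c"]  # hypothetical backend names


def connection_level(clients: int, requests_per_client: int) -> Counter:
    """Pick a backend once per connection; every request on it follows."""
    rr = cycle(BACKENDS)
    load = Counter({b: 0 for b in BACKENDS})
    for _ in range(clients):
        backend = next(rr)  # chosen when the connection opens
        load[backend] += requests_per_client
    return load


def request_level(clients: int, requests_per_client: int) -> Counter:
    """Pick a backend for every individual request."""
    rr = cycle(BACKENDS)
    load = Counter({b: 0 for b in BACKENDS})
    for _ in range(clients * requests_per_client):
        load[next(rr)] += 1
    return load


if __name__ == "__main__":
    # One long-lived, multiplexed connection carrying 900 requests.
    print("per connection:", dict(connection_level(1, 900)))
    print("per request:   ", dict(request_level(1, 900)))
```

With a single connection carrying 900 requests, the per-connection strategy sends all 900 to `pod-a`, while the per-request strategy sends 300 to each pod. Real traffic is messier, but the skew has the same cause.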

If you only have a single backend instance and no plans to add more, you still benefit from a reverse proxy for TLS termination, rate limiting, and request buffering — but you do not need load balancing per se.

Concept Map

The Fundamental Decision: Where Does the Load Balancer Live?

Load balancers can be deployed at many points in an architecture, and understanding the topology is just as important as understanding the algorithm.

External (Edge) Load Balancer

Sits between the public internet and your application tier. This is what most people think of first.

                         ┌──────────────┐
       Internet ────────▶│  Edge LB     │
                         │  (L7 or L4)  │
                         └──────┬───────┘
                    ┌───────────┼───────────┐
                    ▼           ▼           ▼
               ┌────────┐ ┌────────┐ ┌────────┐
               │ App 1  │ │ App 2  │ │ App 3  │
               └────────┘ └────────┘ └────────┘

Responsibilities: TLS termination, DDoS mitigation, rate limiting, WAF rules, compression, caching, routing.

Internal (Service-to-Service) Load Balancer

Sits between microservices inside your private network. Often invisible to the outside world.

               ┌────────┐      ┌──────────────┐      ┌────────────┐
               │ API GW │─────▶│ Internal LB  │─────▶│ Service B  │
               └────────┘      └──────────────┘      │ (3 pods)   │
                                                      └────────────┘

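In Kubernetes, this is the role of a ClusterIP Service backed by kube-proxy, which distributes traffic at the connection level (L4). In a service mesh, the Envoy sidecar plays this role and can balance individual requests at L7.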

Client-Side Load Balancer

The load balancing logic runs inside the calling application. No separate proxy exists.

               ┌────────────────────────────────────┐
               │  Client Application                │
               │  ┌──────────────────────────────┐  │
               │  │  LB Library                  │  │
               │  │  (round robin / P2C / etc.)  │  │
               │  └──────┬───────────┬───────────┘  │
               └─────────┼───────────┼──────────────┘
                         ▼           ▼
                    ┌────────┐ ┌────────┐
                    │ Svc A  │ │ Svc B  │
                    └────────┘ └────────┘

gRPC uses this model heavily. The client fetches a list of backend addresses (from DNS or a service registry) and balances across them directly, avoiding the extra network hop of a proxy.
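
As a minimal sketch of the idea, the class below keeps a list of backend addresses inside the client process and rotates through them. It is illustrative only and is not gRPC's actual implementation: `RoundRobinBalancer`, `pick`, and the addresses are hypothetical names, and the address list is assumed to be static rather than refreshed from DNS or a registry.

```python
import itertools
import threading


class RoundRobinBalancer:
    """Client-side balancer that cycles through a fixed list of addresses."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one backend address is required")
        self._addresses = list(addresses)
        self._counter = itertools.count()
        self._lock = threading.Lock()  # keep the rotation safe across threads

    def pick(self) -> str:
        """Return the address the next request should be sent to."""
        with self._lock:
            index = next(self._counter) % len(self._addresses)
        return self._addresses[index]


# Hypothetical addresses, as if resolved from DNS or a service registry.
balancer = RoundRobinBalancer(
    ["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"]
)
picks = [balancer.pick() for _ in range(4)]
```

After the third pick the rotation wraps around, so the fourth request goes back to the first address. A production library would also need to react to address changes and remove backends that fail, which this sketch leaves out.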

Learning Path

Follow this order for the most coherent understanding:

| Order | Topic | Why This Order |
|---|---|---|
| 1 | L4 vs L7 Load Balancing | The foundational architectural decision: transport vs application layer |
| 2 | Algorithms | How requests get assigned to backends, from round robin to power of two choices |
| 3 | Health Checks | How the load balancer knows which backends are alive and ready |
| 4 | Session Affinity | Sticky sessions, their problems, and stateless alternatives |
| 5 | Global Load Balancing | DNS, anycast, and multi-region traffic management |
| 6 | NGINX Configuration | Production NGINX load balancer setup from scratch |
| 7 | HAProxy Configuration | Production HAProxy setup, the gold standard for TCP/HTTP proxying |
| 8 | Envoy Configuration | Envoy proxy, the modern service mesh data plane |

Quick Reference: Choosing a Load Balancer

| Criterion | NGINX | HAProxy | Envoy | Cloud LB (ALB/NLB) |
|---|---|---|---|---|
| Best for | Web serving + LB combo | Pure proxying / LB | Service mesh, gRPC | Managed, zero-ops |
| L4 support | Yes (stream module) | Excellent | Excellent | NLB |
| L7 support | Excellent | Excellent | Excellent | ALB |
| gRPC | Limited | Limited | Excellent | ALB supports it |
| Dynamic config | Reload required | Reload (hitless) | xDS API (no reload) | API-driven |
| Observability | Basic (logs, stub_status) | Excellent (stats page) | Excellent (Prometheus, tracing) | CloudWatch |
| Config complexity | Low | Medium | High | Low |
| Community | Massive | Large | Growing fast | N/A |

Key Insight

The entire field of load balancing can be reduced to three questions:

  1. Which backend should this request go to? (Algorithms)
  2. Is that backend healthy enough to handle it? (Health checks)
  3. How much does the load balancer need to understand about the request to make a good decision? (L4 vs L7)

Every configuration option, every tuning knob, and every architectural pattern in this section is an answer to one of these three questions — each making different trade-offs about performance, correctness, and operational complexity.
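
The first two questions can be sketched together in a few lines. The example below is an assumption-laden illustration rather than how any particular proxy does it: `Backend`, `pick_backend`, and the `in_flight` counter are hypothetical, health is reduced to a boolean flag, and the algorithm shown is power of two choices (sample two healthy backends, send the request to the less loaded one).

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True  # assumed to be updated by a separate health checker
    in_flight: int = 0    # requests currently being served


def pick_backend(backends, rng=random):
    """Power of two choices over the healthy subset of backends."""
    # Question 2: is the backend healthy enough to receive traffic?
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    if len(candidates) == 1:
        return candidates[0]
    # Question 1: which backend? Sample two, keep the less loaded one.
    first, second = rng.sample(candidates, 2)
    return first if first.in_flight <= second.in_flight else second


backends = [
    Backend("a", healthy=False),
    Backend("b", in_flight=5),
    Backend("c", in_flight=1),
]
chosen = pick_backend(backends)
```

Here `a` is skipped because it is unhealthy, and `c` wins over `b` because it has fewer requests in flight. The third question determines what a function like this is allowed to see: an L4 balancer makes the choice once per connection, while an L7 balancer can make it for every request and use headers, paths, or cookies as input.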

"What I cannot create, I do not understand." — Richard Feynman