Load Balancing

Load balancing is the practice of distributing incoming network traffic across multiple backend servers so that no single server bears an unsustainable share of demand. It is one of the oldest and most consequential patterns in systems engineering. Get it wrong and your horizontally-scaled fleet is nothing more than an expensive collection of idle machines while one node melts. Get it right and you unlock near-linear capacity growth, zero-downtime deployments, graceful degradation, and the ability to tolerate individual server failures without any user ever noticing.

This section does not merely list algorithms and products. It traces the full decision tree from "do I even need a load balancer?" through protocol-level trade-offs, algorithmic analysis, session management pitfalls, global traffic steering, and real production configurations for the three dominant reverse proxies (NGINX, HAProxy, Envoy). Every page assumes you will be operating these systems under pressure and need to understand why things work, not just how to configure them.

When Load Balancing Is Needed

The question is not "should I use a load balancer?" but "at which points in my architecture do I need load distribution, and what kind?"

Scenario	Why Load Balancing Helps	Typical Approach
Single app server receiving more traffic than it can handle	Spread requests across multiple instances	L7 reverse proxy (NGINX, HAProxy)
Microservices calling each other internally	Prevent any single service pod from being overwhelmed	Client-side LB (gRPC) or service mesh (Envoy)
Database read replicas	Distribute read queries without overloading a single replica	TCP-level (L4) proxy or application-level routing
Multi-region deployment	Route users to the nearest healthy region	DNS-based (GeoDNS) or anycast
WebSocket servers	Maintain sticky connections while distributing new ones	L7 LB with session affinity
CI/CD runners, worker pools	Spread jobs across available workers	Queue-based (different model, but same principle)
gRPC services	HTTP/2 multiplexing defeats connection-level balancing	Application-aware L7 LB or client-side LB

If you only have a single backend instance and no plans to add more, you still benefit from a reverse proxy for TLS termination, rate limiting, and request buffering — but you do not need load balancing per se.

Concept Map

The Fundamental Decision: Where Does the Load Balancer Live?

Load balancers can be deployed at many points in an architecture, and understanding the topology is just as important as understanding the algorithm.

External (Edge) Load Balancer

Sits between the public internet and your application tier. This is what most people think of first.

                         ┌──────────────┐
       Internet ────────▶│  Edge LB     │
                         │  (L7 or L4)  │
                         └──────┬───────┘
                    ┌───────────┼───────────┐
                    ▼           ▼           ▼
               ┌────────┐ ┌────────┐ ┌────────┐
               │ App 1  │ │ App 2  │ │ App 3  │
               └────────┘ └────────┘ └────────┘

Responsibilities: TLS termination, DDoS mitigation, rate limiting, WAF rules, compression, caching, routing.

Internal (Service-to-Service) Load Balancer

Sits between microservices inside your private network. Often invisible to the outside world.

               ┌────────┐      ┌──────────────┐      ┌────────────┐
               │ API GW │─────▶│ Internal LB  │─────▶│ Service B  │
               └────────┘      └──────────────┘      │ (3 pods)   │
                                                      └────────────┘

In Kubernetes, this is what a ClusterIP Service with kube-proxy does. In a service mesh, it is what the Envoy sidecar does.

Client-Side Load Balancer

The load balancing logic runs inside the calling application. No separate proxy exists.

               ┌────────────────────────────────────┐
               │  Client Application                │
               │  ┌──────────────────────────────┐  │
               │  │  LB Library                  │  │
               │  │  (round robin / P2C / etc.)  │  │
               │  └──────┬───────────┬───────────┘  │
               └─────────┼───────────┼──────────────┘
                         ▼           ▼
                    ┌────────┐ ┌────────┐
                    │ Svc A  │ │ Svc B  │
                    └────────┘ └────────┘

gRPC uses this model heavily. The client fetches a list of backend addresses (from DNS or a service registry) and balances across them directly, avoiding the extra network hop of a proxy.

Learning Path

Follow this order for the most coherent understanding:

Order	Topic	Why This Order
1	L4 vs L7 Load Balancing	The foundational architectural decision — transport vs application layer
2	Algorithms	How requests get assigned to backends — from round robin to power of two choices
3	Health Checks	How the load balancer knows which backends are alive and ready
4	Session Affinity	Sticky sessions, their problems, and stateless alternatives
5	Global Load Balancing	DNS, anycast, and multi-region traffic management
6	NGINX Configuration	Production NGINX load balancer setup from scratch
7	HAProxy Configuration	Production HAProxy setup — the gold standard for TCP/HTTP proxying
8	Envoy Configuration	Envoy proxy — the modern service mesh data plane

Quick Reference: Choosing a Load Balancer

Criterion	NGINX	HAProxy	Envoy	Cloud LB (ALB/NLB)
Best for	Web serving + LB combo	Pure proxying / LB	Service mesh, gRPC	Managed, zero-ops
L4 support	Yes (stream module)	Excellent	Excellent	NLB
L7 support	Excellent	Excellent	Excellent	ALB
gRPC	Limited	Limited	Excellent	ALB supports it
Dynamic config	Reload required	Reload (hitless)	xDS API (no reload)	API-driven
Observability	Basic (logs, stub_status)	Excellent (stats page)	Excellent (Prometheus, tracing)	CloudWatch
Config complexity	Low	Medium	High	Low
Community	Massive	Large	Growing fast	N/A

Key Insight

The entire field of load balancing can be reduced to three questions:

Which backend should this request go to? (Algorithms)
Is that backend healthy enough to handle it? (Health checks)
How much does the load balancer need to understand about the request to make a good decision? (L4 vs L7)

Every configuration option, every tuning knob, and every architectural pattern in this section is an answer to one of these three questions — each making different trade-offs about performance, correctness, and operational complexity.

Load Balancing ​

When Load Balancing Is Needed ​

Concept Map ​

The Fundamental Decision: Where Does the Load Balancer Live? ​

External (Edge) Load Balancer ​

Internal (Service-to-Service) Load Balancer ​

Client-Side Load Balancer ​

Learning Path ​

Quick Reference: Choosing a Load Balancer ​

Key Insight ​

Related Pages