
Load Testing Deep Dive

Load testing is the practice of simulating real-world traffic against your system to find performance bottlenecks before your users do. Most teams skip it until production breaks. The teams that do it often do it wrong: they hit a single endpoint with constant load and call it a day.

This page covers the different types of performance tests, when to use each one, how to design realistic test scenarios, and how to interpret results for capacity planning.

Related: Node.js Profiling | Database Tuning | Caching Strategies


Types of Performance Tests

| Type | Goal | Duration | Load Profile |
|---|---|---|---|
| Load Test | Validate behavior at expected load | 15-60 min | Ramp up to target, hold steady |
| Stress Test | Find the breaking point | 15-30 min | Ramp up until failure |
| Soak Test | Find memory leaks, resource exhaustion | 2-8 hours | Constant moderate load |
| Spike Test | Test sudden traffic bursts | 5-15 min | Sudden jump, then drop |
| Breakpoint Test | Find maximum capacity | Variable | Increment load stepwise |
| Scalability Test | Validate auto-scaling behavior | 30-60 min | Ramp beyond single-instance capacity |

Load Profiles Visualized

Load Test:          Stress Test:         Spike Test:
   ___________         /                    |
  /           \       /                     |___
 /             \     /                     /    \
/               \   /                     /      \___
                                         /

Soak Test:          Breakpoint Test:
   ____________        ____
  /            |      /    \____
 /             |     /    /     \____
/              |    /    /      /    \
               |
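
The shapes above map onto stage lists fairly directly. The following is a minimal sketch, assuming k6-style `{ duration, target }` stages. The helper names `spikeStages` and `soakStages` and all durations are illustrative choices that fall inside the ranges in the table above, not recommendations.

```javascript
// Hypothetical helpers that build k6-style stage arrays for two of the
// profiles above. Durations are illustrative defaults.

// Spike: sit at a low baseline, jump to the peak within seconds, then drop back.
function spikeStages(baseline, peak) {
  return [
    { duration: '2m', target: baseline },  // steady baseline
    { duration: '10s', target: peak },     // sudden jump
    { duration: '3m', target: peak },      // hold the burst
    { duration: '10s', target: baseline }, // sudden drop
    { duration: '2m', target: baseline },  // observe recovery
  ];
}

// Soak: short ramp, then hours of constant moderate load.
function soakStages(target, hours) {
  return [
    { duration: '5m', target },         // ramp up
    { duration: `${hours}h`, target },  // long steady hold
    { duration: '5m', target: 0 },      // ramp down
  ];
}

console.log(spikeStages(50, 500));
console.log(soakStages(200, 4));
```

In a k6 script, either result could be assigned to `options.stages`.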

Tool Comparison

Overview

| Feature | k6 | Gatling | Locust | Artillery |
|---|---|---|---|---|
| Language | JavaScript | Scala/Java | Python | YAML/JS |
| Protocol | HTTP, WS, gRPC | HTTP, WS, JMS | HTTP, custom | HTTP, WS, Socket.io |
| Scripting | Full JS (ES6) | Scala DSL | Python classes | YAML config + JS |
| Resource usage | Very low (Go) | Medium (JVM) | High (Python) | Medium (Node.js) |
| Distributed | k6 Cloud, k6-operator | Built-in | Built-in | Artillery Cloud |
| CI/CD integration | Excellent | Good | Good | Excellent |
| Real browser | k6 browser | No | No | Playwright engine |
| Open source | Yes (AGPL) | Yes (Apache) | Yes (MIT) | Yes (MPL) |
| Best for | Dev teams, CI | Enterprise Java | Python teams | Quick API tests |

k6 Example

javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up to 100 VUs
    { duration: '5m', target: 100 },   // Hold at 100
    { duration: '2m', target: 200 },   // Ramp to 200
    { duration: '5m', target: 200 },   // Hold at 200
    { duration: '2m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],   // 95th percentile < 500ms
    http_req_failed: ['rate<0.01'],     // Error rate < 1%
    checks: ['rate>0.99'],              // 99% of checks pass
  },
};

export default function () {
  // Simulate real user flow
  const loginRes = http.post('https://api.example.com/login', JSON.stringify({
    username: 'testuser',
    password: 'testpass',
  }), { headers: { 'Content-Type': 'application/json' } });

  check(loginRes, {
    'login status 200': (r) => r.status === 200,
    'has token': (r) => r.json('token') !== undefined,
  });

  const token = loginRes.json('token');

  const headers = {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  };

  // Browse products
  const productsRes = http.get('https://api.example.com/products', { headers });
  check(productsRes, {
    'products status 200': (r) => r.status === 200,
    'has products': (r) => r.json('items').length > 0,
  });

  sleep(Math.random() * 3 + 1);  // Think time: 1-4 seconds

  // View a product
  const productId = productsRes.json('items.0.id');
  http.get(`https://api.example.com/products/${productId}`, { headers });

  sleep(Math.random() * 2 + 1);
}

Gatling Example

scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class ApiSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .acceptHeader("application/json")

  val searchScenario = scenario("Search Flow")
    .exec(
      http("Login")
        .post("/login")
        .body(StringBody("""{"username":"test","password":"test"}""")).asJson
        .check(jsonPath("$.token").saveAs("token"))
    )
    .pause(1, 3)
    .exec(
      http("Search Products")
        .get("/products?q=laptop")
        .header("Authorization", "Bearer #{token}")
        .check(jsonPath("$.items[0].id").saveAs("productId"))
    )
    .pause(1, 2)
    .exec(
      http("View Product")
        .get("/products/#{productId}")
        .header("Authorization", "Bearer #{token}")
        .check(status.is(200))
    )

  setUp(
    searchScenario.inject(
      rampUsers(100).during(2.minutes),
      constantUsersPerSec(50).during(5.minutes),
      rampUsersPerSec(50).to(100).during(3.minutes)
    )
  ).protocols(httpProtocol)
   .assertions(
     global.responseTime.percentile(95).lt(500),
     global.successfulRequests.percent.gt(99)
   )
}

Locust Example

python
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        """Login on start"""
        response = self.client.post("/login", json={
            "username": "testuser",
            "password": "testpass"
        })
        self.token = response.json()["token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task(3)
    def browse_products(self):
        self.client.get("/products", headers=self.headers)

    @task(2)
    def search_products(self):
        self.client.get("/products?q=laptop", headers=self.headers)

    @task(1)
    def view_product(self):
        product_id = "prod-123"
        self.client.get(f"/products/{product_id}", headers=self.headers)

Artillery Example

yaml
config:
  target: "https://api.example.com"
  phases:
    - duration: 120
      arrivalRate: 10
      name: "Warm up"
    - duration: 300
      arrivalRate: 50
      name: "Sustained load"
    - duration: 120
      arrivalRate: 100
      name: "Peak load"
  defaults:
    headers:
      Content-Type: "application/json"
  ensure:
    p95: 500
    maxErrorRate: 1

scenarios:
  - name: "Browse and purchase"
    flow:
      - post:
          url: "/login"
          json:
            username: "testuser"
            password: "testpass"
          capture:
            json: "$.token"
            as: "token"
      - get:
          url: "/products"
          headers:
            Authorization: "Bearer {{ token }}"
      - think: 2
      - get:
          url: "/products/prod-123"
          headers:
            Authorization: "Bearer {{ token }}"

Test Design Principles

1. Model Real User Behavior

| Element | Bad Test | Good Test |
|---|---|---|
| Think time | No pauses between requests | Random pauses (1-5s) matching real usage |
| User flow | Hit one endpoint repeatedly | Complete user journeys (login, browse, buy) |
| Data | Same request every time | Parameterized data, varied payloads |
| Ramp-up | Full load immediately | Gradual ramp matching real traffic patterns |
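
The think-time and data rows can be sketched with two small helpers. This is a minimal illustration in plain JavaScript: the names `thinkTime` and `pickRandom` and the `searchTerms` list are hypothetical, and a real test would more likely load its data from a file.

```javascript
// Hypothetical helpers for realistic pacing and varied request data.

// Uniform random think time in seconds between min and max.
function thinkTime(minSeconds, maxSeconds) {
  return minSeconds + Math.random() * (maxSeconds - minSeconds);
}

// Pick a random element so each iteration sends different data.
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Example search terms, made up for illustration.
const searchTerms = ['laptop', 'phone', 'monitor', 'keyboard'];

const term = pickRandom(searchTerms);
const pause = thinkTime(1, 5);
console.log(`search for "${term}", then pause ${pause.toFixed(1)}s`);
```

In a k6 script the pause would be passed to `sleep()` and the term interpolated into the request URL.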

2. Ramp-Up Patterns

javascript
// k6 ramp-up patterns. A script can export only one `options` object,
// so each profile is a named constant and one of them is exported.

// Linear ramp
const linearRamp = {
  stages: [
    { duration: '5m', target: 500 },    // Ramp to 500 over 5 minutes
    { duration: '10m', target: 500 },   // Hold for 10 minutes
    { duration: '5m', target: 0 },      // Ramp down
  ],
};

// Step ramp (find breakpoint)
const stepRamp = {
  stages: [
    { duration: '3m', target: 100 },
    { duration: '3m', target: 100 },  // Hold, observe
    { duration: '3m', target: 200 },
    { duration: '3m', target: 200 },  // Hold, observe
    { duration: '3m', target: 300 },
    { duration: '3m', target: 300 },  // Hold, observe
    { duration: '3m', target: 400 },
    { duration: '3m', target: 400 },  // Hold, observe: when does it break?
  ],
};

// Choose the profile for this run
export const options = stepRamp;
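
Writing out every step by hand gets tedious for longer ramps. A small generator can produce the same shape. This is a sketch, not part of k6: the function name `stepStages` and its parameters are hypothetical.

```javascript
// Hypothetical generator for a step-ramp profile: ramp to each level, then
// hold there so metrics can settle before the next step.
function stepStages({ step, max, ramp = '3m', hold = '3m' }) {
  const stages = [];
  for (let target = step; target <= max; target += step) {
    stages.push({ duration: ramp, target }); // ramp up to this level
    stages.push({ duration: hold, target }); // hold and observe
  }
  return stages;
}

const stages = stepStages({ step: 100, max: 400 });
console.log(stages.length); // 8 stages: four levels, ramp + hold each
console.log(stages);
```

With `step: 100` and `max: 400` this reproduces the step ramp shown above.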

3. Define Pass/Fail Thresholds

Set thresholds based on your SLOs (Service Level Objectives):

javascript
export const options = {
  thresholds: {
    // Response time
    'http_req_duration': ['p(50)<200', 'p(95)<500', 'p(99)<1000'],

    // Error rate
    'http_req_failed': ['rate<0.01'],

    // Throughput
    'http_reqs': ['rate>100'],

    // Per-endpoint thresholds (requests must carry a matching `name` tag)
    'http_req_duration{name:login}': ['p(95)<300'],
    'http_req_duration{name:search}': ['p(95)<800'],

    // Check pass rate (built-in `checks` metric)
    'checks': ['rate>0.99'],
  },
};

TIP

Use percentiles (p95, p99), not averages. An average of 300 ms can hide the fact that 1 in 20 requests takes 5 seconds. The 95th and 99th percentiles expose that tail.
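
To make the tip concrete, the sketch below compares the mean with nearest-rank percentiles on made-up latencies. The data set and the helper names are assumptions for illustration. Load-testing tools may use a different percentile interpolation, so exact values can differ slightly.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of samples are less than or equal to it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank - 1, 0)];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Made-up data: 94 fast requests (50 ms) and 6 slow ones (5000 ms).
const latencies = [...Array(94).fill(50), ...Array(6).fill(5000)];

console.log(mean(latencies));           // 347
console.log(percentile(latencies, 50)); // 50
console.log(percentile(latencies, 95)); // 5000
```

The mean of 347 ms looks tolerable and the median looks excellent, yet p95 shows that the slowest requests take 5 seconds.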


Interpreting Results

Key Metrics

| Metric | What It Tells You |
|---|---|
| p50 (median) | Typical user experience |
| p95 | Worst case for most users |
| p99 | Tail latency (affects power users, API consumers) |
| Error rate | System reliability under load |
| Throughput (RPS) | How many requests per second the system handles |
| Active VUs | How many virtual users are active |
| Iteration duration | How long a complete user journey takes |

Red Flags

| Symptom | Likely Cause |
|---|---|
| Latency increases linearly with load | Resource contention (CPU, connections) |
| Latency spikes at specific VU count | Hitting a pool limit (DB connections, threads) |
| Error rate climbs while latency stays low | Rate limiting or circuit breaker |
| Everything degrades at once | Memory exhaustion, GC pressure |
| Latency is fine but throughput plateaus | Bottleneck (single thread, lock contention) |
| Increasing latency during soak test | Memory leak, connection leak |
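
Some of these symptoms can be spotted mechanically in the per-step results of a step-ramp run. The sketch below is one possible heuristic, not a standard algorithm. The function name `findSaturation`, the 10% minimum throughput gain, and the sample numbers are all assumptions.

```javascript
// Hypothetical analysis of per-step results from a step-ramp run.
// Each entry: virtual users, achieved requests/second, p95 latency in ms.
function findSaturation(steps, { sloP95Ms, minGain = 0.1 }) {
  for (let i = 0; i < steps.length; i++) {
    const cur = steps[i];
    if (cur.p95 > sloP95Ms) {
      return { vus: cur.vus, reason: 'p95 exceeds SLO' };
    }
    if (i > 0) {
      const prev = steps[i - 1];
      const gain = (cur.rps - prev.rps) / prev.rps;
      if (gain < minGain) {
        return { vus: cur.vus, reason: 'throughput plateau' };
      }
    }
  }
  return null; // no saturation observed in the tested range
}

// Made-up numbers for illustration.
const results = [
  { vus: 100, rps: 240, p95: 120 },
  { vus: 200, rps: 470, p95: 135 },
  { vus: 300, rps: 495, p95: 180 },
  { vus: 400, rps: 500, p95: 900 },
];

console.log(findSaturation(results, { sloP95Ms: 500 }));
// { vus: 300, reason: 'throughput plateau' }
```

Here throughput stops growing at 300 VUs while latency still looks fine, which matches the "throughput plateaus" row. The latency blow-up only appears one step later.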

Capacity Planning from Results

The Process

  1. Baseline test: Find throughput at acceptable latency (e.g., 500 RPS at p95 < 200ms)
  2. Per-instance capacity: Divide by number of instances (e.g., 500 RPS / 4 instances = 125 RPS per instance)
  3. Peak traffic estimate: Analyze production traffic patterns, find peak (e.g., 800 RPS)
  4. Headroom factor: Multiply peak by 1.5-2x for safety (e.g., 1200-1600 RPS capacity needed)
  5. Instance count: Divide required capacity by per-instance throughput and round up (1600 / 125 = 12.8, so 13 instances)

Required instances = (Peak RPS * Safety Factor) / RPS per Instance
                   = (800 * 2) / 125
                   = 12.8 → 13 instances
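
The same arithmetic as a small function, using the example numbers from the steps above. The function and parameter names are illustrative, and the calculation assumes per-instance throughput stays constant as instances are added, which the next section qualifies.

```javascript
// Required instances = ceil(peak RPS * safety factor / RPS per instance).
function requiredInstances({ baselineRps, baselineInstances, peakRps, safetyFactor }) {
  const perInstanceRps = baselineRps / baselineInstances; // 500 / 4 = 125
  const requiredRps = peakRps * safetyFactor;             // 800 * 2 = 1600
  return Math.ceil(requiredRps / perInstanceRps);         // round up: no partial instances
}

const base = { baselineRps: 500, baselineInstances: 4, peakRps: 800 };

console.log(requiredInstances({ ...base, safetyFactor: 2 }));   // 13
console.log(requiredInstances({ ...base, safetyFactor: 1.5 })); // 10
```

Rounding up matters: 12.8 instances of capacity means 13 machines, not 12.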

Scaling Limits

WARNING

Linear scaling is a myth. Amdahl's Law says that the speedup from parallelism is limited by the serial portion of your workload. A shared database, a global lock, or a single-threaded component will eventually cap your throughput regardless of how many instances you add.
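
A quick calculation shows how hard that cap is. The sketch below evaluates Amdahl's formula, speedup = 1 / ((1 - p) + p / n), for a workload that is assumed to be 95% parallelizable. The 95% figure is made up, and applying the law to instance counts is a simplification.

```javascript
// Amdahl's Law: speedup with n parallel units when a fraction p of the
// work can be parallelized. The serial fraction (1 - p) caps the result.
function amdahlSpeedup(p, n) {
  return 1 / ((1 - p) + p / n);
}

for (const n of [1, 2, 4, 8, 16, 64]) {
  console.log(n, amdahlSpeedup(0.95, n).toFixed(2));
}
// The ceiling as n grows without bound is 1 / (1 - p), about 20x for p = 0.95.
```

Even with only 5% serial work, going from 16 to 64 instances yields far less than a 4x gain.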


CI/CD Integration

k6 in GitHub Actions

yaml
name: Load Test
on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: grafana/k6-action@v0.3.1
        with:
          filename: tests/load/api-test.js
          flags: --out json=results.json
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: k6-results
          path: results.json

Shift-Left Testing

| Environment | Test Type | Frequency | Duration |
|---|---|---|---|
| PR/CI | Smoke test (10 VUs, 1 min) | Every PR | 1-2 min |
| Staging | Load test (target load) | Daily/weekly | 15-30 min |
| Pre-prod | Full load + stress | Before release | 1-2 hours |
| Production | Soak test (shadow traffic) | Monthly | 4-8 hours |

TIP

Start with smoke tests in CI — they catch regressions without slowing down development. Graduate to full load tests on staging for release validation.
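
A smoke-test configuration matching the PR/CI row could look like the sketch below. The threshold values are assumptions to adapt to your own SLOs. The object is declared as a plain constant here so the snippet runs anywhere. In a k6 script it would be exported as `options`.

```javascript
// Smoke-test settings for CI: 10 VUs for 1 minute, as in the table above.
// In a k6 script, declare this as `export const options = { ... }`.
const smokeOptions = {
  vus: 10,
  duration: '1m',
  thresholds: {
    http_req_failed: ['rate<0.01'],   // fail the run above 1% errors
    http_req_duration: ['p(95)<500'], // assumed SLO, adjust to your own
  },
};

console.log(JSON.stringify(smokeOptions, null, 2));
```

k6 exits with a non-zero code when a threshold fails, so a regression fails the CI step without extra scripting.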


Common Mistakes

| Mistake | Why It's Wrong |
|---|---|
| Testing from the same machine as the server | The load generator competes with the server for CPU and memory, and loopback traffic skips the real network path |
| No think time between requests | Real users pause; without pauses each virtual user sends far more requests than a real user would (often 10x or more) |
| Testing a single endpoint | Real traffic hits many endpoints with different costs |
| Using average latency | Averages hide tail latency problems |
| Not warming up the system | Cold JVMs, empty caches, cold DB connections skew results |
| Running from a single region | Does not reveal geographic latency issues |
| Ignoring client-side bottlenecks | Your load generator might be the bottleneck, not the server |
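
The think-time row can be quantified with a back-of-the-envelope model. The sketch below assumes a closed workload in which each virtual user sends a request, waits for the response, pauses, and repeats. The 200 ms response time and 2 s think time are made-up inputs.

```javascript
// In a closed model each virtual user sends one request, waits for the
// response, thinks, then repeats. Per-user request rate is therefore
// 1 / (response time + think time).
function requestsPerSecond(vus, responseSeconds, thinkSeconds) {
  return vus / (responseSeconds + thinkSeconds);
}

// Assumed numbers: 100 VUs, 200 ms responses, 2 s average think time.
const withThink = requestsPerSecond(100, 0.2, 2);
const withoutThink = requestsPerSecond(100, 0.2, 0);

console.log(withThink.toFixed(1));                  // 45.5
console.log(withoutThink.toFixed(1));               // 500.0
console.log((withoutThink / withThink).toFixed(1)); // 11.0
```

With these inputs, dropping think time makes 100 virtual users generate about 11 times the request rate of 100 realistic users.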

Last updated: 2026-03-20
