Rate Limiting
Rate limiting controls how many requests a client can make in a given time window. Without it, a single client — whether malicious, buggy, or simply enthusiastic — can monopolize your API, degrade performance for everyone, and run up your infrastructure costs. Rate limiting is not optional for any production API.
The core algorithms are simple, but the implementation decisions matter: per-user or per-IP? In-memory or distributed? Hard reject or graceful degradation? Fixed window or sliding window? Spring Boot does not include built-in rate limiting, but Bucket4j, Spring Cloud Gateway, and Redis-based solutions integrate cleanly.
Rate Limiting Algorithms
Token Bucket
The most common algorithm. A bucket holds tokens. Each request consumes a token. Tokens are refilled at a fixed rate. When the bucket is empty, requests are rejected.
Token Bucket (capacity=10, refill=2/second):
Time 0: [T T T T T T T T T T] 10 tokens, full
Client sends 3 requests
Time 0: [T T T T T T T _ _ _] 7 tokens
Time 1s: [T T T T T T T T T _] 9 tokens (2 refilled)
Client sends 5 requests
Time 1s: [T T T T _ _ _ _ _ _] 4 tokens
Time 2s: [T T T T T T _ _ _ _] 6 tokens (2 refilled)
Client sends 8 requests
6 accepted, 2 rejected (429 Too Many Requests)
Time 2s: [_ _ _ _ _ _ _ _ _ _] 0 tokens
Advantages: Allows bursts up to bucket capacity. Smooth refill rate. Simple to implement.
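The trace above can be reproduced with a small hand-rolled bucket. This is a minimal sketch for illustration, not how Bucket4j is implemented; the class name TokenBucket and the explicit nowNanos parameter are assumptions, chosen so the behaviour is deterministic and easy to follow.

```java
/** Minimal token bucket. Time is passed in explicitly (hypothetical design choice). */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;           // the bucket starts full
        this.lastRefillNanos = nowNanos;
    }

    /** Tries to take one token at the given time; returns false when the bucket is empty. */
    synchronized boolean tryConsume(long nowNanos) {
        refill(nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Whole tokens available at the given time. */
    synchronized long availableTokens(long nowNanos) {
        refill(nowNanos);
        return (long) tokens;
    }

    /** Adds tokens for the elapsed time, never exceeding capacity. */
    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        if (elapsedSeconds > 0) {
            tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
            lastRefillNanos = nowNanos;
        }
    }
}
```

With capacity 10 and a refill of 2 tokens per second, consuming 3 tokens at time 0 leaves 7, and one second later the bucket holds 9, matching the trace.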
Sliding Window Log
Track exact timestamps of each request. Count requests in the sliding window:
Window size: 1 minute, limit: 100
Timestamps: [10:00:01, 10:00:05, 10:00:12, ..., 10:00:58]
New request at 10:01:03:
- Remove timestamps before 10:00:03
- Count remaining + 1
- If count > 100, reject
Advantages: Precise. No boundary effects. Disadvantages: Memory-intensive (stores every timestamp).
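The steps above can be sketched in a few lines of plain Java. The class name SlidingWindowLog is hypothetical, timestamps are assumed to arrive in non-decreasing order, and rejected requests are not recorded in the log (one possible policy among several).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sliding window log: keeps one timestamp per accepted request. */
final class SlidingWindowLog {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLog(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true and records the request if fewer than 'limit' requests are in the window. */
    synchronized boolean tryAcquire(long nowMillis) {
        long cutoff = nowMillis - windowMillis;
        // Remove timestamps that fall before the start of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() < cutoff) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false;
        }
        timestamps.addLast(nowMillis);
        return true;
    }

    /** Number of timestamps currently stored (this is the memory cost). */
    synchronized int size() {
        return timestamps.size();
    }
}
```

The memory cost is visible in size(): the log grows up to the limit for every key being tracked.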
Fixed Window Counter
Simple counter per time window:
Window: 10:00 - 10:01 → counter = 47
Window: 10:01 - 10:02 → counter = 0 (reset)
Disadvantage: Boundary problem: 100 requests at 10:00:59 + 100 at 10:01:00 = 200 in 2 seconds.
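A small sketch makes the boundary problem concrete. FixedWindowCounter is a hypothetical name, and windows are assumed to be aligned to multiples of the window length since the epoch.

```java
/** Fixed window counter: one counter that resets when the window index changes. */
final class FixedWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private int count;

    FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;   // index of the window containing 'now'
        if (window != currentWindow) {
            currentWindow = window;               // new window: reset the counter
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```

With a limit of 100 per minute, 100 calls at millisecond 59,000 and another 100 at millisecond 60,000 are all accepted, because the counter resets between them.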
Sliding Window Counter
Weighted combination of current and previous window to approximate a sliding window:
Previous window (10:00-10:01): 84 requests
Current window (10:01-10:02): 36 requests
Current position: 10:01:15 (25% into current window)
Estimated rate = 84 * 0.75 + 36 = 99 requests
(75% of previous window still counts)
Bucket4j: In-Memory Rate Limiting
Dependencies
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-core</artifactId>
<version>8.10.1</version>
</dependency>
Basic Implementation
@Component
public class RateLimiterService {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
public Bucket resolveBucket(String key) {
return buckets.computeIfAbsent(key, this::createBucket);
}
private Bucket createBucket(String key) {
return Bucket.builder()
.addLimit(BandwidthBuilder.builder()
.capacity(100) // Max 100 tokens
.refillGreedy(100, Duration.ofMinutes(1)) // Refill 100/min
.build())
.addLimit(BandwidthBuilder.builder()
.capacity(20) // Burst limit
.refillGreedy(20, Duration.ofSeconds(10)) // 20 per 10s
.build())
.build();
}
public boolean tryConsume(String key) {
return resolveBucket(key).tryConsume(1);
}
public ConsumptionProbe tryConsumeWithProbe(String key) {
return resolveBucket(key).tryConsumeAndReturnRemaining(1);
}
}
Rate Limiting Filter
@Component
@Order(1)
public class RateLimitFilter extends OncePerRequestFilter {
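private final RateLimiterService rateLimiterService;
public RateLimitFilter(RateLimiterService rateLimiterService) {
this.rateLimiterService = rateLimiterService;
}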
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response,
FilterChain filterChain)
throws ServletException, IOException {
String key = resolveKey(request);
ConsumptionProbe probe = rateLimiterService.tryConsumeWithProbe(key);
// Set standard rate limit headers
response.setHeader("X-RateLimit-Limit", "100");
response.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
if (probe.isConsumed()) {
filterChain.doFilter(request, response);
} else {
// Round up so a sub-second wait is not reported as "Retry-After: 0"
long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;
response.setHeader("Retry-After", String.valueOf(waitSeconds));
response.setHeader("X-RateLimit-Reset",
String.valueOf(Instant.now().plusSeconds(waitSeconds).getEpochSecond()));
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
response.setContentType(MediaType.APPLICATION_JSON_VALUE);
response.getWriter().write("""
{"error": "rate_limit_exceeded", "message": "Too many requests. Retry after %d seconds."}
""".formatted(waitSeconds));
}
}
private String resolveKey(HttpServletRequest request) {
// Prefer authenticated user ID, fallback to IP
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth != null && auth.isAuthenticated()
&& !"anonymousUser".equals(auth.getPrincipal())) {
return "user:" + auth.getName();
}
return "ip:" + getClientIp(request);
}
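private String getClientIp(HttpServletRequest request) {
// X-Forwarded-For can be set by the client itself. Rely on it only when the
// application is reachable solely through a proxy that sets or sanitizes it.
String xff = request.getHeader("X-Forwarded-For");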
if (xff != null && !xff.isEmpty()) {
return xff.split(",")[0].trim();
}
return request.getRemoteAddr();
}
@Override
protected boolean shouldNotFilter(HttpServletRequest request) {
return request.getRequestURI().startsWith("/actuator");
}
}
Per-Endpoint Rate Limiting with Custom Annotation
The Annotation
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
int capacity() default 100;
int refillTokens() default 100;
int refillSeconds() default 60;
String keyExpression() default ""; // SpEL expression for custom key
}
The Interceptor
@Component
public class RateLimitInterceptor implements HandlerInterceptor {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
@Override
public boolean preHandle(HttpServletRequest request,
HttpServletResponse response,
Object handler) throws Exception {
if (!(handler instanceof HandlerMethod handlerMethod)) {
return true;
}
RateLimit rateLimit = handlerMethod.getMethodAnnotation(RateLimit.class);
if (rateLimit == null) {
return true;
}
String key = buildKey(request, handlerMethod, rateLimit);
Bucket bucket = buckets.computeIfAbsent(key, k -> createBucket(rateLimit));
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
response.setHeader("X-RateLimit-Limit", String.valueOf(rateLimit.capacity()));
response.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
if (!probe.isConsumed()) {
// Round up so a sub-second wait is not reported as 0
long retryAfter = (probe.getNanosToWaitForRefill() + 999_999_999L) / 1_000_000_000L;
response.setHeader("Retry-After", String.valueOf(retryAfter));
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
response.setContentType(MediaType.APPLICATION_JSON_VALUE);
response.getWriter().write(
"{\"error\":\"rate_limit_exceeded\",\"retry_after\":" + retryAfter + "}");
return false;
}
return true;
}
private Bucket createBucket(RateLimit rateLimit) {
return Bucket.builder()
.addLimit(BandwidthBuilder.builder()
.capacity(rateLimit.capacity())
.refillGreedy(rateLimit.refillTokens(),
Duration.ofSeconds(rateLimit.refillSeconds()))
.build())
.build();
}
private String buildKey(HttpServletRequest request,
HandlerMethod method,
RateLimit rateLimit) {
String userKey = resolveUserKey(request);
String methodKey = method.getMethod().getDeclaringClass().getSimpleName()
+ "." + method.getMethod().getName();
return methodKey + ":" + userKey;
}
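private String resolveUserKey(HttpServletRequest request) {
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
// Anonymous tokens also report isAuthenticated() == true, so exclude them;
// otherwise all anonymous callers would share a single bucket.
if (auth != null && auth.isAuthenticated()
&& !"anonymousUser".equals(auth.getPrincipal())) {
return auth.getName();
}
return request.getRemoteAddr();
}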
}
Usage
@RestController
@RequestMapping("/api")
public class ApiController {
@GetMapping("/data")
@RateLimit(capacity = 100, refillTokens = 100, refillSeconds = 60)
public ResponseEntity<List<Data>> getData() {
return ResponseEntity.ok(dataService.findAll());
}
// Stricter limit for expensive operations
@PostMapping("/reports/generate")
@RateLimit(capacity = 5, refillTokens = 5, refillSeconds = 3600)
public ResponseEntity<Report> generateReport(@RequestBody ReportRequest request) {
return ResponseEntity.ok(reportService.generate(request));
}
// Very strict limit for authentication attempts
@PostMapping("/auth/login")
@RateLimit(capacity = 10, refillTokens = 10, refillSeconds = 900)
public ResponseEntity<AuthResponse> login(@RequestBody LoginRequest request) {
return ResponseEntity.ok(authService.login(request));
}
}
Distributed Rate Limiting with Redis
In-memory rate limiting breaks with multiple application instances — each instance maintains its own counters. Redis provides a shared, atomic counter:
Dependencies
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-redis</artifactId>
<version>8.10.1</version>
</dependency>
<dependency>
<groupId>io.lettuce</groupId>
<artifactId>lettuce-core</artifactId>
</dependency>
Redis-Backed Bucket4j
@Configuration
public class RedisRateLimitConfig {
@Bean
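public ProxyManager<String> proxyManager() {
// Host and port are hard-coded for brevity; externalize them in real configuration.
RedisClient redisClient = RedisClient.create(
RedisURI.builder()
.withHost("localhost")
.withPort(6379)
.build());
// String keys, byte[] values: Bucket4j stores bucket state as bytes.
StatefulRedisConnection<String, byte[]> connection = redisClient.connect(
RedisCodec.of(StringCodec.UTF8, ByteArrayCodec.INSTANCE));
// Builder method names differ between Bucket4j releases; check them against your version.
return LettuceBasedProxyManager.builderFor(connection)
.withExpirationStrategy(
ExpirationAfterWriteStrategy.basedOnTimeForRefillingBucketUpToMax(
Duration.ofMinutes(5)))
.build();
}
}
@Service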
public class RedisRateLimiterService {
private final ProxyManager<String> proxyManager;
public RedisRateLimiterService(ProxyManager<String> proxyManager) {
this.proxyManager = proxyManager;
}
public ConsumptionProbe tryConsume(String key, RateLimitPlan plan) {
BucketConfiguration config = BucketConfiguration.builder()
.addLimit(BandwidthBuilder.builder()
.capacity(plan.getCapacity())
.refillGreedy(plan.getRefillTokens(),
Duration.ofSeconds(plan.getRefillSeconds()))
.build())
.build();
Bucket bucket = proxyManager.builder()
.build(key, () -> config);
return bucket.tryConsumeAndReturnRemaining(1);
}
}
Sliding Window Log in Redis (Manual)
@Service
public class RedisSlidingWindowRateLimiter {
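private final StringRedisTemplate redisTemplate;
public RedisSlidingWindowRateLimiter(StringRedisTemplate redisTemplate) {
this.redisTemplate = redisTemplate;
}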
public boolean isAllowed(String key, int maxRequests, Duration window) {
String redisKey = "ratelimit:" + key;
long now = Instant.now().toEpochMilli();
long windowStart = now - window.toMillis();
// Lua script for atomic sliding window check
String luaScript = """
-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1])
-- Count entries in window
local count = redis.call('ZCARD', KEYS[1])
if count < tonumber(ARGV[2]) then
-- Add the current request; the member must be unique so that two
-- requests in the same millisecond are both counted
redis.call('ZADD', KEYS[1], ARGV[3], ARGV[5])
-- Set expiry on the key
redis.call('PEXPIRE', KEYS[1], ARGV[4])
return 1
end
return 0
""";
RedisScript<Long> script = RedisScript.of(luaScript, Long.class);
Long result = redisTemplate.execute(script,
List.of(redisKey),
String.valueOf(windowStart), // ARGV[1]: window start
String.valueOf(maxRequests), // ARGV[2]: max requests
String.valueOf(now), // ARGV[3]: current time (score)
String.valueOf(window.toMillis()), // ARGV[4]: TTL
now + "-" + UUID.randomUUID()); // ARGV[5]: unique member (java.util.UUID)
return result != null && result == 1;
}
}
Spring Cloud Gateway Rate Limiter
For API gateway rate limiting:
spring:
  cloud:
    gateway:
      routes:
        - id: api-service
          uri: lb://api-service
          predicates:
            - Path=/api/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter:
                  replenishRate: 10    # 10 requests per second
                  burstCapacity: 20    # Allow bursts up to 20
                  requestedTokens: 1   # 1 token per request
                key-resolver: "#{@userKeyResolver}"
@Configuration
public class GatewayRateLimitConfig {
@Bean
public KeyResolver userKeyResolver() {
return exchange -> {
// Resolve by authenticated user
return exchange.getPrincipal()
.map(Principal::getName)
.defaultIfEmpty(
exchange.getRequest().getRemoteAddress()
.getAddress().getHostAddress());
};
}
@Bean
public KeyResolver apiKeyResolver() {
return exchange -> {
String apiKey = exchange.getRequest().getHeaders()
.getFirst("X-API-Key");
return Mono.justOrEmpty(apiKey)
.switchIfEmpty(Mono.just("anonymous"));
};
}
}
Tiered Rate Limits (API Plans)
public enum ApiPlan {
FREE(60, 60, 60), // 60 req/min
BASIC(600, 600, 60), // 600 req/min
PRO(6000, 6000, 60), // 6000 req/min
ENTERPRISE(60000, 60000, 60); // 60000 req/min
private final int capacity;
private final int refillTokens;
private final int refillSeconds;
ApiPlan(int capacity, int refillTokens, int refillSeconds) {
this.capacity = capacity;
this.refillTokens = refillTokens;
this.refillSeconds = refillSeconds;
}
// getters
}
@Service
public class TieredRateLimiter {
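private final ProxyManager<String> proxyManager;
private final ApiKeyService apiKeyService;
public TieredRateLimiter(ProxyManager<String> proxyManager, ApiKeyService apiKeyService) {
this.proxyManager = proxyManager;
this.apiKeyService = apiKeyService;
}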
public RateLimitResult checkLimit(String apiKey) {
ApiPlan plan = apiKeyService.getPlan(apiKey);
BucketConfiguration config = BucketConfiguration.builder()
.addLimit(BandwidthBuilder.builder()
.capacity(plan.getCapacity())
.refillGreedy(plan.getRefillTokens(),
Duration.ofSeconds(plan.getRefillSeconds()))
.build())
.build();
Bucket bucket = proxyManager.builder()
.build("plan:" + apiKey, () -> config);
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
return new RateLimitResult(
probe.isConsumed(),
probe.getRemainingTokens(),
plan.getCapacity(),
plan.name(),
probe.isConsumed() ? 0
: probe.getNanosToWaitForRefill() / 1_000_000_000
);
}
}
Monitoring and Alerting
@Component
public class RateLimitMetrics {
private final MeterRegistry meterRegistry;
public RateLimitMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
public void recordRateLimitHit(String endpoint, String clientId, boolean allowed) {
meterRegistry.counter("rate_limit.requests",
"endpoint", endpoint,
"result", allowed ? "allowed" : "rejected")
.increment();
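if (!allowed) {
// Tagging by client ID creates one time series per client;
// keep this tag only if the number of distinct clients is bounded.
meterRegistry.counter("rate_limit.rejections",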
"endpoint", endpoint,
"client", clientId)
.increment();
}
}
}
Response Headers Reference
| Header | Description | Example |
|---|---|---|
| X-RateLimit-Limit | Maximum requests in the window | 100 |
| X-RateLimit-Remaining | Requests remaining | 47 |
| X-RateLimit-Reset | Unix timestamp when limit resets | 1711382400 |
| Retry-After | Seconds to wait before retrying | 30 |
Rate limiting protects your API from abuse, ensures fair usage across clients, and prevents a single bad actor from affecting everyone. Start with in-memory Bucket4j for single-instance deployments, move to Redis-backed rate limiting when you scale to multiple instances, and always communicate limits clearly through response headers so clients can self-regulate.
Common Pitfalls
Pitfall 1: Using in-memory rate limiting with multiple application instances
Each instance maintains its own counters. A client hitting two instances effectively gets double the rate limit. Fix: Use Redis-backed rate limiting (Bucket4j with Redis proxy, or Redis Lua scripts) for distributed counting across all instances.
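The doubling effect is easy to reproduce without any infrastructure. In this sketch, LocalLimiter is a hypothetical stand-in for any in-memory limiter, and requests are assumed to be spread round-robin across instances, as a typical load balancer might do.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for an in-memory limiter: the counter lives inside one application instance. */
final class LocalLimiter {
    private final int limit;
    private final AtomicInteger used = new AtomicInteger();

    LocalLimiter(int limit) {
        this.limit = limit;
    }

    boolean tryAcquire() {
        return used.incrementAndGet() <= limit;
    }
}

final class MultiInstanceDemo {
    /** Sends requests round-robin across instances and returns how many were accepted in total. */
    static int accepted(int instances, int limitPerInstance, int requests) {
        LocalLimiter[] nodes = new LocalLimiter[instances];
        for (int i = 0; i < instances; i++) {
            nodes[i] = new LocalLimiter(limitPerInstance);
        }
        int total = 0;
        for (int i = 0; i < requests; i++) {
            if (nodes[i % instances].tryAcquire()) {
                total++;
            }
        }
        return total;
    }
}
```

With a nominal limit of 100, one instance accepts 100 of 300 requests, while two instances together accept 200.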
Pitfall 2: Rate limiting by IP behind a reverse proxy
All requests come from the proxy's IP address, so the rate limiter treats all users as one client. Fix: Use the X-Forwarded-For header to get the real client IP, but only trust it when it is set by a proxy you control, because clients can send an arbitrary X-Forwarded-For value themselves. Better yet, rate limit by authenticated user ID when available, falling back to IP for unauthenticated requests.
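One way to read X-Forwarded-For more defensively is to honour it only when the direct peer is a known proxy, and to walk the list from the right, skipping your own proxies. This is a sketch under those assumptions: ClientIpResolver and the exact-match trusted-proxy set are hypothetical, and real deployments often match CIDR ranges instead.

```java
import java.util.Set;

/** Resolves the client IP, trusting X-Forwarded-For only behind known proxies. */
final class ClientIpResolver {
    private final Set<String> trustedProxies;

    ClientIpResolver(Set<String> trustedProxies) {
        this.trustedProxies = Set.copyOf(trustedProxies);
    }

    String resolve(String remoteAddr, String xForwardedFor) {
        // Ignore the header unless the direct peer is one of our own proxies.
        if (xForwardedFor == null || xForwardedFor.isBlank()
                || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        String[] hops = xForwardedFor.split(",");
        // Walk from the right: the rightmost entries were appended by the proxies
        // closest to us; entries further left may have been supplied by the client.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (!hop.isEmpty() && !trustedProxies.contains(hop)) {
                return hop;
            }
        }
        return remoteAddr;
    }
}
```

A client that sends its own X-Forwarded-For value only controls the leftmost entries, so the address appended by the trusted proxy is the one used as the rate limit key.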
Pitfall 3: Not returning standard rate limit headers
Without X-RateLimit-Remaining and Retry-After headers, clients cannot self-regulate and will hammer the API until they get a 429 error. Fix: Include X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset in every response (both successful and rate-limited), and add Retry-After to every 429 response.
Pitfall 4: Applying the same rate limit to all endpoints
Login endpoints need strict limits (prevent brute force) while read endpoints can be more lenient. A one-size-fits-all approach is either too strict or too loose. Fix: Use per-endpoint rate limiting with a custom @RateLimit annotation. Apply strict limits (10 req/15min) to auth endpoints, moderate limits (100 req/min) to write endpoints, and lenient limits (1000 req/min) to read endpoints.
Pitfall 5: Not monitoring rate limit rejections
Rate limiting silently rejects requests without visibility into how often it happens, which clients are affected, and whether limits are set correctly. Fix: Emit metrics for every rate limit check (rate_limit.requests{result=allowed|rejected, endpoint=...}). Alert when rejection rate exceeds a threshold. Review limits periodically based on actual usage patterns.
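The alerting rule can be prototyped independently of any metrics backend. The following is a rough sketch: RejectionRateMonitor, its threshold, and the minimum sample count are hypothetical, and in production the same ratio would more likely be computed by the monitoring system from the emitted counters.

```java
import java.util.concurrent.atomic.LongAdder;

/** Tracks allowed/rejected counts and flags when the rejection rate is too high. */
final class RejectionRateMonitor {
    private final LongAdder allowed = new LongAdder();
    private final LongAdder rejected = new LongAdder();
    private final double alertThreshold;   // e.g. 0.05 for 5%
    private final long minSamples;         // avoid alerting on a handful of requests

    RejectionRateMonitor(double alertThreshold, long minSamples) {
        this.alertThreshold = alertThreshold;
        this.minSamples = minSamples;
    }

    void record(boolean wasAllowed) {
        if (wasAllowed) {
            allowed.increment();
        } else {
            rejected.increment();
        }
    }

    /** Fraction of recorded checks that were rejected, or 0 when nothing was recorded. */
    double rejectionRate() {
        long r = rejected.sum();
        long total = allowed.sum() + r;
        return total == 0 ? 0.0 : (double) r / total;
    }

    boolean shouldAlert() {
        long total = allowed.sum() + rejected.sum();
        return total >= minSamples && rejectionRate() > alertThreshold;
    }
}
```

The minimum sample count keeps a few early rejections from triggering an alert before there is enough traffic to judge the rate.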
Interview Questions
Q1: What is the token bucket algorithm and why is it the most popular rate limiting approach?
Answer
The token bucket maintains a bucket with a maximum capacity of tokens. Each request consumes one token. Tokens are refilled at a fixed rate (e.g., 10 tokens per second). When the bucket is empty, requests are rejected. The key advantage over fixed window counters is that it allows bursts (up to bucket capacity) while maintaining a steady long-term rate. For example, with capacity=100 and refill=100/minute, a client can burst 100 requests instantly, then must wait for refill. This matches real usage patterns better than strict per-second limits.
Q2: What is the difference between fixed window, sliding window log, and sliding window counter?
Answer
Fixed window: Divides time into fixed intervals (e.g., 1-minute windows) and counts requests per window. Simple, but it has a boundary problem: 100 requests at 10:00:59 + 100 at 10:01:00 = 200 in 2 seconds.
Sliding window log: Stores exact timestamps of all requests and counts those in the last N seconds from the current moment. Precise but memory-intensive.
Sliding window counter: Approximates a sliding window by weighting the current and previous window counts based on the position within the current window. For example, 25% into the current window: rate = previous_count * 0.75 + current_count. Good balance of accuracy and efficiency.
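The weighted estimate for the sliding window counter can be sketched as follows. SlidingWindowCounter is a hypothetical name, and the choice to admit a request only when the estimate including that request stays within the limit is one reasonable policy, not the only one.

```java
/** Sliding window counter: two counters plus a weight based on position in the window. */
final class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindow = -1;
    private long currentCount;
    private long previousCount;

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** previous * (1 - fraction of current window elapsed) + current. */
    static double estimate(long previousCount, long currentCount, double fractionElapsed) {
        return previousCount * (1.0 - fractionElapsed) + currentCount;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / windowMillis;
        if (window != currentWindow) {
            // The old counter only matters if it belongs to the directly preceding window.
            previousCount = (window == currentWindow + 1) ? currentCount : 0;
            currentCount = 0;
            currentWindow = window;
        }
        double fraction = (nowMillis % windowMillis) / (double) windowMillis;
        if (estimate(previousCount, currentCount, fraction) + 1 > limit) {
            return false;
        }
        currentCount++;
        return true;
    }
}
```

For a previous count of 84, a current count of 36, and a position 25% into the window, estimate returns 84 * 0.75 + 36 = 99.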
Q3: How do you implement distributed rate limiting with Redis?
Answer
Use Redis atomic operations to maintain shared counters across application instances. The most common approach uses a Lua script executed atomically on Redis: (1) Remove expired entries from a sorted set (sliding window). (2) Count remaining entries. (3) If under the limit, add the current timestamp as a new entry. (4) Set TTL on the key. The Lua script runs atomically, preventing race conditions. Alternatively, use Bucket4j's Redis proxy (LettuceBasedProxyManager) which handles the distributed token bucket logic internally. Both approaches ensure consistent rate limiting regardless of which application instance handles the request.
Q4: How do you implement tiered rate limits based on API subscription plans?
Answer
Define rate limit tiers as an enum or configuration: FREE(60/min), BASIC(600/min), PRO(6000/min), ENTERPRISE(60000/min). Resolve the user's plan from their API key or JWT claims. Create a BucketConfiguration dynamically based on the plan and use the API key as the bucket identifier in Redis. The rate limit filter: (1) Extracts the API key from the request header. (2) Looks up the plan. (3) Creates or retrieves the bucket for that API key with the plan's configuration. (4) Consumes a token and returns appropriate headers including X-RateLimit-Plan.
Q5: What HTTP status code and headers should a rate-limited response include?
Answer
Return 429 Too Many Requests with these headers: X-RateLimit-Limit (maximum requests in the window, e.g., 100), X-RateLimit-Remaining (requests remaining, e.g., 0), X-RateLimit-Reset (Unix timestamp when the limit resets), and Retry-After (seconds until the client should retry, e.g., 30). The response body should include a JSON error with a machine-readable error code and human-readable message. Include rate limit headers on ALL responses (not just 429s) so clients can monitor their consumption proactively.
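As an illustration of the answer above, the header set can be built in one place so that every response path uses the same values. RateLimitHeaders and its parameters are hypothetical; the wait time is rounded up so that a sub-second wait is never reported as zero.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Builds the rate limit headers for a single response. */
final class RateLimitHeaders {
    /** Converts nanoseconds to whole seconds, rounding up. */
    static long retryAfterSeconds(long nanosToWait) {
        return (nanosToWait + 999_999_999L) / 1_000_000_000L;
    }

    static Map<String, String> build(long limit, long remaining, long nanosUntilReset,
                                     long nowEpochSeconds, boolean allowed) {
        long waitSeconds = retryAfterSeconds(nanosUntilReset);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-RateLimit-Limit", String.valueOf(limit));
        headers.put("X-RateLimit-Remaining", String.valueOf(Math.max(0, remaining)));
        headers.put("X-RateLimit-Reset", String.valueOf(nowEpochSeconds + waitSeconds));
        if (!allowed) {
            // Retry-After only makes sense on the 429 response itself.
            headers.put("Retry-After", String.valueOf(waitSeconds));
        }
        return headers;
    }
}
```

The caller still sets the 429 status and the JSON error body; this helper only covers the headers.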