Distributed API Rate Limiting with NATS KV
Moving from local in-memory limits to cluster-aware NATS KV partitions for fair, durable API throttling in distributed deployments.
When I first configured rate limiting across multiple API replicas on our k3s cluster, I assumed a simple in-memory counter per instance would be enough. It was not. During load testing, I watched a single user blow past their quota by a factor of three simply because requests were round-robined across replicas. Each instance maintained its own counter, blissfully unaware of the others. That experience pushed me toward a shared counter backed by NATS KV, where compare-and-swap semantics gave me the atomic guarantees I needed without bolting on Redis or another external dependency.
Rate limiting in a single-instance API is straightforward — an in-memory counter per user, a sliding window, done. But when you scale to multiple replicas behind a load balancer, each instance has its own counter. A user can effectively multiply their quota by the number of replicas. This article shows how to use NATS KV as a shared, cluster-aware rate limit store.
The Multi-Instance Problem
With N API replicas and local rate limiting:
- User quota: 100 requests/minute
- Actual effective quota: 100 * N requests/minute
- Each replica only sees 1/N of the user's traffic
This defeats the purpose of rate limiting entirely. You need a shared counter that all replicas can read and write atomically.
[Rate Limiting Best Practices] — Google Cloud Architecture, 2023-06-15

Why NATS KV?
If you already run NATS JetStream for messaging, NATS KV gives you a distributed key-value store with:
- Atomic operations via compare-and-swap (CAS)
- TTL support for automatic window expiry
- No extra infrastructure — it’s built into NATS
- Replication matching your NATS cluster topology
sequenceDiagram
participant C1 as Client A
participant R1 as API Replica 1
participant R2 as API Replica 2
participant KV as NATS KV
participant C2 as Client B
C1->>R1: POST /documents
C2->>R2: POST /documents
R1->>KV: Get(rate:upload:user1)
KV-->>R1: value=4, rev=7
R2->>KV: Get(rate:upload:user1)
KV-->>R2: value=4, rev=7
R1->>KV: Update(key, 5, rev=7)
KV-->>R1: OK, rev=8
R2->>KV: Update(key, 5, rev=7)
KV-->>R2: WrongLastRevision!
R2->>KV: Get(rate:upload:user1)
KV-->>R2: value=5, rev=8
R2->>KV: Update(key, 6, rev=8)
KV-->>R2: OK, rev=9
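In the .NET client (NATS.Net), the sequence above maps onto CreateAsync and revision-checked UpdateAsync calls. A rough sketch, assuming an INatsKVStore instance named store and an illustrative key:

// Create fails if the key already exists; Update is a compare-and-swap
// that fails unless the supplied revision is still the latest one.
var revision = await store.CreateAsync("rate:api:user1:123", 1);

try
{
    await store.UpdateAsync("rate:api:user1:123", 2, revision);
}
catch (NatsKVWrongLastRevisionException)
{
    // Another writer got there first: re-read the entry and retry.
}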
[NATS Key/Value Store] — NATS Authors, 2024-03-20
Implementation
Rate Limit Key Strategy
The key encodes the user identity and time window:
rate:{policyName}:{userId}:{windowStart}
For authenticated users, use their user ID. For anonymous requests, hash the IP address to avoid storing raw IPs. The window index is the current UTC tick count divided by the window length, so every replica computes the same bucket for the same instant:
using System.Security.Cryptography;
using System.Text;

public static class RateLimitKeyGenerator
{
    public static string Generate(
        HttpContext context, string policyName, TimeSpan window)
    {
        var userId = context.User?.FindFirst("sub")?.Value;

        // Integer division: all replicas agree on the bucket for a given instant.
        var windowStart = DateTime.UtcNow.Ticks / window.Ticks;

        if (!string.IsNullOrEmpty(userId))
            return $"rate:{policyName}:{userId}:{windowStart}";

        // GetHashCode is randomized per process in modern .NET, so replicas would
        // disagree on the key; use a stable SHA-256 hash of the IP instead.
        var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
        var hash = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(ip)))[..16];
        return $"rate:{policyName}:anon:{hash}:{windowStart}";
    }
}

Atomic Increment with CAS
The core operation: read the current count, increment it, and write back atomically. If another replica modified the value between read and write, the CAS fails and we retry:
public class NatsKvRateLimiter
{
    private readonly INatsKVStore _store;
    private readonly int _maxRetries = 3;

    public NatsKvRateLimiter(INatsKVStore store) => _store = store;

    public async Task<RateLimitResult> CheckAsync(
        string key, int limit, CancellationToken ct)
    {
        for (int attempt = 0; attempt < _maxRetries; attempt++)
        {
            try
            {
                // Read the current count together with its KV revision.
                var entry = await _store.GetEntryAsync<int>(key, cancellationToken: ct);
                if (entry.Value >= limit)
                {
                    return RateLimitResult.Exceeded(entry.Value, limit);
                }

                // CAS write: succeeds only if the revision is still current.
                await _store.UpdateAsync(
                    key, entry.Value + 1, entry.Revision, cancellationToken: ct);
                return RateLimitResult.Allowed(entry.Value + 1, limit);
            }
            catch (NatsKVKeyNotFoundException)
            {
                // First request in this window
                try
                {
                    await _store.CreateAsync(key, 1, cancellationToken: ct);
                    return RateLimitResult.Allowed(1, limit);
                }
                catch (NatsKVCreateException)
                {
                    continue; // Another replica created it first
                }
            }
            catch (NatsKVWrongLastRevisionException)
            {
                continue; // CAS conflict, retry
            }
        }

        // After max retries, allow the request (fail open)
        return RateLimitResult.Allowed(0, limit);
    }
}
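Under real contention, many replicas can land on the same key and burn through their retries in lockstep. A small, growing delay with jitter before each retry spreads them out; a sketch of what could sit at the top of the retry loop above (the constants are illustrative):

// Back off before retrying, growing with each attempt, plus jitter so
// replicas don't retry in lockstep.
if (attempt > 0)
{
    var backoff = TimeSpan.FromMilliseconds(
        5 * (1 << attempt) + Random.Shared.Next(0, 5));
    await Task.Delay(backoff, ct);
}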
ASP.NET Core Middleware Integration

Wire the rate limiter into the ASP.NET Core pipeline. The middleware runs after authentication so the user identity is available:
app.UseAuthentication();
app.UseAuthorization();
app.UseRateLimiter(); // Must be after auth
// Configuration
builder.Services.AddRateLimiter(options =>
{
options.AddPolicy("api", context =>
{
var key = RateLimitKeyGenerator.Generate(
context, "api", TimeSpan.FromMinutes(1));
return RateLimitPartition.GetFixedWindowLimiter(key,
_ => new FixedWindowRateLimiterOptions
{
PermitLimit = 100,
Window = TimeSpan.FromMinutes(1),
});
});
});
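Note that the built-in fixed-window limiter keeps its counters in process memory, so on its own it is still per-replica. One way to make the decision against the shared NATS KV counter is a small custom middleware that calls NatsKvRateLimiter directly. A minimal sketch, assuming the limiter is registered as a singleton and that RateLimitResult exposes an IsAllowed flag (both assumptions are illustrative):

public class NatsRateLimitMiddleware
{
    private readonly RequestDelegate _next;
    private readonly NatsKvRateLimiter _limiter;

    public NatsRateLimitMiddleware(RequestDelegate next, NatsKvRateLimiter limiter)
    {
        _next = next;
        _limiter = limiter;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var window = TimeSpan.FromMinutes(1);
        var key = RateLimitKeyGenerator.Generate(context, "api", window);

        // One CAS-backed increment per request against the shared counter.
        var result = await _limiter.CheckAsync(key, 100, context.RequestAborted);

        if (!result.IsAllowed) // assumed shape of RateLimitResult
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            return;
        }

        await _next(context);
    }
}

// Registered after authentication so the "sub" claim is populated:
// app.UseAuthentication();
// app.UseMiddleware<NatsRateLimitMiddleware>();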
Bucket Configuration

var store = await kvContext.CreateStoreAsync(
    new NatsKVConfig("rate-limits")
    {
        MaxAge = TimeSpan.FromMinutes(5),    // Auto-expire old windows
        Storage = NatsKVStorageType.Memory,  // Speed over durability
        NumberOfReplicas = 3,                // Match the NATS cluster replication
    });
Using Memory storage instead of File trades durability for speed — rate limit counters don’t need to survive full cluster restarts.
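For reference, the kvContext above comes from the NATS.Net client's JetStream context. A rough wiring sketch, assuming the NATS.Client.Core, NATS.Client.JetStream and NATS.Client.KeyValueStore packages, an illustrative URL, and that NatsKvRateLimiter takes the store in its constructor:

// Connect once per replica; kvContext is then used to create the
// "rate-limits" bucket shown above.
var nats = new NatsConnection(new NatsOpts { Url = "nats://nats:4222" });
var js = new NatsJSContext(nats);
var kvContext = new NatsKVContext(js);

// Share one limiter instance (and therefore one NATS connection) per replica.
builder.Services.AddSingleton(new NatsKvRateLimiter(store));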
Response Headers
Always include rate limit headers so clients can self-throttle:
context.Response.Headers["X-RateLimit-Limit"] = limit.ToString();
context.Response.Headers["X-RateLimit-Remaining"] =
Math.Max(0, limit - count).ToString();
context.Response.Headers["X-RateLimit-Reset"] =
windowEnd.ToUnixTimeSeconds().ToString();
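The limit and count values come from the limiter's result; windowEnd can be derived from the same integer bucket index used in the key, so the advertised reset time matches the window that is actually counting. A small helper sketch under that assumption:

// End of the current fixed window, computed from the same
// integer-division bucket index used in RateLimitKeyGenerator.
static DateTimeOffset GetWindowEnd(TimeSpan window)
{
    var windowStart = DateTime.UtcNow.Ticks / window.Ticks;
    return new DateTimeOffset((windowStart + 1) * window.Ticks, TimeSpan.Zero);
}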
[RateLimit Header Fields for HTTP] — IETF HTTP API Working Group, 2024-05-01
Key Takeaways
- Local in-memory rate limiting is ineffective with multiple replicas
- NATS KV provides atomic counters with CAS — no extra infrastructure needed
- Use memory storage with short TTL for rate limit buckets
- Place middleware after authentication so user identity is available
- Prefer user ID over IP for rate limit keys when authenticated
- Fail open for most APIs, fail closed for security-critical endpoints
Looking back, the most valuable lesson from implementing distributed rate limiting was that the hardest part is not the algorithm — it is managing contention under real-world traffic patterns. CAS semantics work beautifully in theory, but you need to plan for the thundering herd scenario where many replicas compete for the same key simultaneously. The combination of exponential backoff, memory-backed storage, and a fail-open policy gave me a system that is both correct and performant under load.
[Counting Things: Token Buckets and Sliding Windows] — Cloudflare, 2023-09-12

Next Steps
- Implement sliding window rate limiting by maintaining multiple sub-window counters in NATS KV (see the sketch after this list)
- Add per-endpoint rate limit policies with different thresholds for read vs write operations
- Build a rate limit dashboard in Grafana to visualize quota consumption per user over time
- Explore token bucket algorithms for smoother traffic shaping on bursty endpoints
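For the sliding-window item, one common approximation keeps only the current and previous fixed-window counters and weights the previous one by how much of it still overlaps the sliding window. A sketch of the estimate, with the two NATS KV reads left out (names are illustrative):

// Sliding-window estimate from two adjacent fixed-window counters.
// previousCount and currentCount would come from two NATS KV keys
// (the previous and current windowStart buckets).
static double EstimateSlidingCount(
    int previousCount, int currentCount, TimeSpan window)
{
    var ticksIntoWindow = DateTime.UtcNow.Ticks % window.Ticks;
    var previousOverlap = 1.0 - (double)ticksIntoWindow / window.Ticks;
    return previousCount * previousOverlap + currentCount;
}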