Benchmarking and Stress Testing Microservices
Using NBomber and k6 to find the breaking points of our .NET API. Analysis of CPU vs Memory vs I/O bottlenecks.
When I first deployed our microservices to Kubernetes, I had no idea where the system would break under load. Everything worked perfectly in development with a single user, but I had a nagging feeling that our vector search endpoint was going to be a problem. I decided to invest a weekend into proper load testing, and what I discovered completely changed how I think about performance optimization. The bottleneck was not where I expected, and without systematic benchmarking, I would have wasted weeks optimizing the wrong component.
Introduction
“It works on my machine” is not a performance guarantee. As we move to a microservices architecture with NATS and Qdrant, we need to know: Where does the system break?
Does the API CPU spike? Does the database lock up? Does the memory leak?
Why Benchmarking Matters:
- Capacity Planning: Knowing how many concurrent users a single pod can handle.
- Regression Testing: Ensuring a new feature didn’t slow down the core loop.
- Bottleneck Identification: Checking if the Vector DB or the API is the weak link.
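The capacity-planning point above is ultimately arithmetic: divide the target load by what a single pod can sustain, reserving some safety headroom. A minimal sketch of that calculation (the 200 RPS per-pod figure and the 70% headroom factor are illustrative assumptions, not measured values):

```python
import math

def pods_needed(target_rps: float, per_pod_rps: float, headroom: float = 0.7) -> int:
    """Pods required to serve target_rps, running each pod at
    `headroom` fraction of its measured maximum throughput."""
    usable = per_pod_rps * headroom  # never plan for 100% utilization
    return math.ceil(target_rps / usable)

# Illustrative: if a load test shows one pod sustains ~200 RPS before latency
# degrades, and we want 500 RPS of capacity with 30% headroom:
print(pods_needed(500, 200))  # -> 4 (500 / (200 * 0.7) = 3.57, rounded up)
```

The headroom factor matters: a pod running at 100% of its benchmarked maximum has no slack for traffic spikes, garbage collection pauses, or a neighboring pod's failure.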
What We’ll Build
We will create a stress test suite using NBomber (a .NET-native load testing framework) to hammer our endpoints.
[NBomber: Modern Load Testing Framework for .NET] — NBomber, 2024
Architecture Overview
flowchart LR
TestRunner[NBomber Runner] -->|HTTP Requests| Ingress
Ingress -->|Load Balance| ApiPod1[API Pod]
Ingress -->|Load Balance| ApiPod2[API Pod]
ApiPod1 -->|Query| Qdrant
ApiPod1 -->|Read| Postgres
classDef primary fill:#7c3aed,color:#fff
classDef secondary fill:#06b6d4,color:#fff
classDef db fill:#f43f5e,color:#fff
classDef warning fill:#fbbf24,color:#000
class ApiPod1,ApiPod2 primary
class Ingress secondary
class Qdrant,Postgres db
class TestRunner warning
Section 1: Writing the Scenario
We want to test the “Search” endpoint, as it is the most resource-intensive.
[Performance Testing Guidance for Web Applications] — Microsoft Patterns and Practices, 2007
using NBomber.CSharp;
using NBomber.Http.CSharp;

// NBomber v5-style scenario. Earlier NBomber versions used Step.Create and
// client factories; adjust to the API of the version you have installed.
using var httpClient = new HttpClient();

var scenario = Scenario.Create("search_load", async context =>
{
    var request = Http.CreateRequest("GET", "https://api.bluerobin.local/documents/search?q=invoice")
        .WithHeader("Authorization", "Bearer token");

    // Http.Send records latency and marks non-success status codes as failures,
    // so we can return the response directly.
    var response = await Http.Send(httpClient, request);
    return response;
})
.WithWarmUpDuration(TimeSpan.FromSeconds(10))
.WithLoadSimulations(
    // Ramp the injection rate linearly up to 50 requests per interval (1s) over 2 minutes.
    Simulation.RampingInject(rate: 50, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(2))
);

NBomberRunner
    .RegisterScenarios(scenario)
    .Run();
Section 2: Analyzing the Crash
We ran the test ramping up to 500 requests per second (RPS).
Results:
- 0-100 RPS: Sub-50ms latency. Smooth.
- 200 RPS: Latency jumped to 400ms.
- 350 RPS: Errors started appearing (HTTP 503).
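This shape, flat latency followed by a sudden cliff, is exactly what queueing theory predicts: as utilization approaches 100%, response time grows without bound. A toy M/M/1 model makes the non-linearity concrete (the 400 RPS capacity and 40 ms base service time below are illustrative assumptions, not our measured figures):

```python
def mm1_latency_ms(arrival_rps: float, service_rps: float, service_ms: float) -> float:
    """Mean response time of an M/M/1 queue: W = S / (1 - rho), rho = arrivals/capacity.
    Illustrative only -- a real service is not M/M/1, but the shape (latency
    exploding as utilization nears 1) matches what we observed."""
    rho = arrival_rps / service_rps
    if rho >= 1:
        return float("inf")  # the queue grows without bound
    return service_ms / (1 - rho)

for rps in (100, 200, 300, 350):
    print(rps, round(mm1_latency_ms(rps, 400, 40), 1))
# 100 ->  53.3 ms   (25% utilization: barely above base service time)
# 350 -> 320.0 ms   (87.5% utilization: 8x the base service time)
```

The practical lesson: a service at 50% CPU is not "half way to capacity" in latency terms, because most of the degradation happens in the last stretch before saturation.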
The Investigation
We looked at our SigNoz dashboards during the test.
[SigNoz: Open-Source Application Performance Monitoring] — SigNoz, 2024
- API CPU: 40%. Not the bottleneck.
- Postgres CPU: 15%. Sleeping.
- Qdrant CPU: 98%.
Diagnosis: The vector search calculations were saturating the CPU cores allocated to Qdrant.
[Qdrant Performance Optimization] — Qdrant, 2024
Section 3: Continuous Performance Testing
We integrated this into our CI/CD pipeline. We don’t run the full stress test on every commit, but we run a “smoke test” (50 RPS) to ensure no gross regressions.
[Continuous Performance Testing in CI/CD Pipelines] — Grafana Labs, 2024
- name: Run Performance Smoke Test
  run: dotnet run --project tests/MyApp.Performance
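For the smoke test to actually gate the pipeline, the run must fail the build when a latency or error budget is breached. A sketch of such a gate, assuming the test writes a small summary JSON first (the file format and the `p95_ms`/`error_rate` field names are our own hypothetical convention, not an NBomber output format):

```python
import json

# Budgets for the 50 RPS smoke test; breaching either fails the CI step.
BUDGET = {"p95_ms": 200.0, "error_rate": 0.01}

def violations(summary: dict, budget: dict = BUDGET) -> list:
    """Return human-readable budget violations; an empty list means the gate passes."""
    return [
        f"{metric}={summary.get(metric)} exceeds budget {limit}"
        for metric, limit in budget.items()
        if summary.get(metric) is None or summary[metric] > limit
    ]

def gate(summary_path: str) -> int:
    """Exit code for the CI step: 0 on pass, 1 on regression."""
    with open(summary_path) as f:
        problems = violations(json.load(f))
    for p in problems:
        print("PERF REGRESSION:", p)
    return 1 if problems else 0
```

Wiring it up is one more CI step that runs `sys.exit(gate("perf-summary.json"))`; a non-zero exit code fails the build, turning a silent slowdown into a red pipeline.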
Conclusion
You cannot optimize what you do not measure. By identifying Qdrant as our bottleneck, we focused our optimization efforts where they mattered, rather than wasting time optimizing C# code that wasn’t the problem.
This experience fundamentally changed my approach to performance work. Before this, I would have guessed the .NET API was the bottleneck and spent days profiling C# code, adding caching at the application layer, or optimizing serialization. The data told a completely different story. Now, the first thing I do before any optimization work is run a load test and look at the dashboards. It has saved me countless hours of misguided effort, and I wish I had adopted this discipline earlier in my career.
[Systems Performance: Enterprise and the Cloud] — Brendan Gregg, 2020
Next Steps
- Learn about Storage Performance impacting database speed.
- See how Pact Testing handles functional correctness.
- Explore k6 as a complementary load testing tool for more complex user journey scenarios.
- Set up automated performance regression alerts using Grafana dashboards connected to CI results.