Benchmarking and Stress Testing Microservices
Using NBomber and k6 to find the breaking points of our .NET API. Analysis of CPU vs Memory vs I/O bottlenecks.
When I first deployed our microservices to Kubernetes, I had no idea where the system would break under load. Everything worked perfectly in development with a single user, but I had a nagging feeling that our vector search endpoint was going to be a problem. I decided to invest a weekend into proper load testing, and what I discovered completely changed how I think about performance optimization. The bottleneck was not where I expected, and without systematic benchmarking, I would have wasted weeks optimizing the wrong component.
Introduction
“It works on my machine” is not a performance guarantee. As we move to a microservices architecture with NATS and Qdrant, we need to know: Where does the system break?
Does the API CPU spike? Does the database lock up? Does the memory leak?
Why Benchmarking Matters:
- Capacity Planning: Knowing how many concurrent users a single pod can handle.
- Regression Testing: Ensuring a new feature didn’t slow down the core loop.
- Bottleneck Identification: Checking if the Vector DB or the API is the weak link.
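The capacity-planning point above is ultimately arithmetic: divide the target load by what a single pod can sustain, reserving some safety headroom. A minimal sketch of that calculation (the 200 RPS per-pod figure and the 70% headroom factor are illustrative assumptions, not measured values):

```python
import math

def pods_needed(target_rps: float, per_pod_rps: float, headroom: float = 0.7) -> int:
    """Pods required to serve target_rps, running each pod at
    `headroom` fraction of its measured maximum throughput."""
    usable = per_pod_rps * headroom  # never plan for 100% utilization
    return math.ceil(target_rps / usable)

# Illustrative: if a load test shows one pod sustains ~200 RPS before latency
# degrades, and we want 500 RPS of capacity with 30% headroom:
print(pods_needed(500, 200))  # -> 4 (500 / (200 * 0.7) = 3.57, rounded up)
```

The headroom factor matters: a pod running at 100% of its benchmarked maximum has no slack for traffic spikes, garbage collection pauses, or a neighboring pod's failure.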
What We’ll Build
We will create a stress test suite using NBomber (a .NET-native load testing framework) to hammer our endpoints.
[NBomber: Modern Load Testing Framework for .NET] — NBomber, 2024
Architecture Overview
flowchart LR
TestRunner[NBomber Runner] -->|HTTP Requests| Ingress
Ingress -->|Load Balance| ApiPod1[API Pod]
Ingress -->|Load Balance| ApiPod2[API Pod]
ApiPod1 -->|Query| Qdrant
ApiPod1 -->|Read| Postgres
classDef primary fill:#7c3aed,color:#fff
classDef secondary fill:#06b6d4,color:#fff
classDef db fill:#f43f5e,color:#fff
classDef warning fill:#fbbf24,color:#000
class ApiPod1,ApiPod2 primary
class Ingress secondary
class Qdrant,Postgres db
class TestRunner warning
Section 1: Writing the Scenario
We want to test the “Search” endpoint, as it is the most resource-intensive.
[Performance Testing Guidance for Web Applications] — Microsoft Patterns and Practices, 2007
using NBomber.CSharp;
using NBomber.Http.CSharp;

// NBomber v5-style scenario. Earlier NBomber versions used Step.Create and
// client factories; adjust to the API of the version you have installed.
using var httpClient = new HttpClient();

var scenario = Scenario.Create("search_load", async context =>
{
    var request = Http.CreateRequest("GET", "https://api.bluerobin.local/documents/search?q=invoice")
        .WithHeader("Authorization", "Bearer token");

    // Http.Send records latency and marks non-success status codes as failures,
    // so we can return the response directly.
    var response = await Http.Send(httpClient, request);
    return response;
})
.WithWarmUpDuration(TimeSpan.FromSeconds(10))
.WithLoadSimulations(
    // Ramp the injection rate linearly up to 50 requests per interval (1s) over 2 minutes.
    Simulation.RampingInject(rate: 50, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(2))
);

NBomberRunner
    .RegisterScenarios(scenario)
    .Run();
Section 2: Analyzing the Crash
We ran the test ramping up to 500 requests per second (RPS).
Results:
- 0-100 RPS: Sub-50ms latency. Smooth.
- 200 RPS: Latency jumped to 400ms.
- 350 RPS: Errors started appearing (HTTP 503).
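This shape, flat latency followed by a sudden cliff, is exactly what queueing theory predicts: as utilization approaches 100%, response time grows without bound. A toy M/M/1 model makes the non-linearity concrete (the 400 RPS capacity and 40 ms base service time below are illustrative assumptions, not our measured figures):

```python
def mm1_latency_ms(arrival_rps: float, service_rps: float, service_ms: float) -> float:
    """Mean response time of an M/M/1 queue: W = S / (1 - rho), rho = arrivals/capacity.
    Illustrative only -- a real service is not M/M/1, but the shape (latency
    exploding as utilization nears 1) matches what we observed."""
    rho = arrival_rps / service_rps
    if rho >= 1:
        return float("inf")  # the queue grows without bound
    return service_ms / (1 - rho)

for rps in (100, 200, 300, 350):
    print(rps, round(mm1_latency_ms(rps, 400, 40), 1))
# 100 ->  53.3 ms   (25% utilization: barely above base service time)
# 350 -> 320.0 ms   (87.5% utilization: 8x the base service time)
```

The practical lesson: a service at 50% CPU is not "half way to capacity" in latency terms, because most of the degradation happens in the last stretch before saturation.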
The Investigation
We looked at our SigNoz dashboards during the test.
[SigNoz: Open-Source Application Performance Monitoring] — SigNoz, 2024
- API CPU: 40%. Not the bottleneck.
- Postgres CPU: 15%. Sleeping.
- Qdrant CPU: 98%.
Diagnosis: The vector search calculations were saturating the CPU cores allocated to Qdrant.
[Qdrant Performance Optimization] — Qdrant, 2024
Section 3: Continuous Performance Testing
We integrated this into our CI/CD pipeline. We don’t run the full stress test on every commit, but we run a “smoke test” (50 RPS) to ensure no gross regressions.
[Continuous Performance Testing in CI/CD Pipelines] — Grafana Labs, 2024
- name: Run Performance Smoke Test
  run: dotnet run --project tests/MyApp.Performance
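For the smoke test to actually gate the pipeline, the run must fail the build when a latency or error budget is breached. A sketch of such a gate, assuming the test writes a small summary JSON first (the file format and the `p95_ms`/`error_rate` field names are our own hypothetical convention, not an NBomber output format):

```python
import json

# Budgets for the 50 RPS smoke test; breaching either fails the CI step.
BUDGET = {"p95_ms": 200.0, "error_rate": 0.01}

def violations(summary: dict, budget: dict = BUDGET) -> list:
    """Return human-readable budget violations; an empty list means the gate passes."""
    return [
        f"{metric}={summary.get(metric)} exceeds budget {limit}"
        for metric, limit in budget.items()
        if summary.get(metric) is None or summary[metric] > limit
    ]

def gate(summary_path: str) -> int:
    """Exit code for the CI step: 0 on pass, 1 on regression."""
    with open(summary_path) as f:
        problems = violations(json.load(f))
    for p in problems:
        print("PERF REGRESSION:", p)
    return 1 if problems else 0
```

Wiring it up is one more CI step that runs `sys.exit(gate("perf-summary.json"))`; a non-zero exit code fails the build, turning a silent slowdown into a red pipeline.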
Conclusion
You cannot optimize what you do not measure. By identifying Qdrant as our bottleneck, we focused our optimization efforts where they mattered, rather than wasting time optimizing C# code that wasn’t the problem.
This experience fundamentally changed my approach to performance work. Before this, I would have guessed the .NET API was the bottleneck and spent days profiling C# code, adding caching at the application layer, or optimizing serialization. The data told a completely different story. Now, the first thing I do before any optimization work is run a load test and look at the dashboards. It has saved me countless hours of misguided effort, and I wish I had adopted this discipline earlier in my career.
[Systems Performance: Enterprise and the Cloud] — Brendan Gregg, 2020
Next Steps
- Learn about Storage Performance impacting database speed.
- See how Pact Testing handles functional correctness.
- Explore k6 as a complementary load testing tool for more complex user journey scenarios.
- Set up automated performance regression alerts using Grafana dashboards connected to CI results.