GraphRAG Routing in .NET: Safe Fallback Between Classic RAG and Agent Retrieval

When I first encountered a production incident where GraphRAG returned an empty result set, I was baffled. The user’s query was perfectly reasonable, the document was in the vector store, and the embedding looked fine. The problem? The document had been uploaded and embedded, but entity extraction hadn’t completed yet — the async pipeline was still processing. The graph had no knowledge of the document’s entities, so graph traversal returned nothing, and the system served an empty response. That failure taught me a lesson I now consider non-negotiable: every retrieval path needs a fallback strategy, and the routing between them needs to be deterministic, not hopeful.

Agentic retrieval (where the LLM decides which tools and retrievers to invoke) can produce dramatically better results than simple vector search. But it’s also slower, more expensive, and more fragile. In production, you need a routing layer that can fall back to classic RAG when the agentic pipeline is unhealthy.

The Reliability Problem

An agentic retrieval pipeline has more moving parts:

LLM for query classification
Graph database for traversal
Multiple vector stores for ensemble search
Tool-calling infrastructure

If any one of these components is unhealthy, the entire agentic path fails. Classic RAG (embed query, vector search, return results) has far fewer dependencies, so it usually keeps working when the agentic path does not. The routing layer decides which path to use at query time.

Routing Architecture

Query --> RoutingService
              |
              |-- Is agentic enabled? (feature flag)
              |-- Is graph DB healthy?
              |-- Is LLM responsive?
              |
              |-- YES to all --> Agentic Pipeline
              |                      |
              |                      |-- Timeout or error? --> Fallback
              |                      +-- Success --> Return
              |
              +-- NO to any --> Classic RAG Pipeline

Implementation

RetrievalRoutingService.cs

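using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public class RetrievalRoutingService
{
    private readonly IClassicRagService _classicRag;
    private readonly IAgenticRagService _agenticRag;
    private readonly IHealthChecker _health;
    private readonly RetrievalOptions _options;
    private readonly ILogger<RetrievalRoutingService> _logger;

    public RetrievalRoutingService(
        IClassicRagService classicRag,
        IAgenticRagService agenticRag,
        IHealthChecker health,
        RetrievalOptions options,
        ILogger<RetrievalRoutingService> logger)
    {
        _classicRag = classicRag;
        _agenticRag = agenticRag;
        _health = health;
        _options = options;
        _logger = logger;
    }

    public async Task<SearchResult> SearchAsync(
        string query, CancellationToken ct)
    {
        // Feature flag off: skip the agentic path entirely.
        if (!_options.AgenticEnabled)
            return await _classicRag.SearchAsync(query, ct);

        var isHealthy = await _health.CheckAgenticHealthAsync(ct);
        if (!isHealthy)
        {
            _logger.LogWarning(
                "Agentic pipeline unhealthy, falling back to classic RAG");
            return await _classicRag.SearchAsync(query, ct);
        }

        try
        {
            // Linked token: cancelled by the caller OR by the timeout below.
            using var cts = CancellationTokenSource
                .CreateLinkedTokenSource(ct);
            cts.CancelAfter(_options.AgenticTimeout);

            return await _agenticRag.SearchAsync(query, cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the timeout fired. If the caller cancelled, the filter is
            // false and the cancellation propagates instead of falling back.
            _logger.LogWarning(
                "Agentic retrieval timed out after {Timeout}ms",
                _options.AgenticTimeout.TotalMilliseconds);
            return await _classicRag.SearchAsync(query, ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            _logger.LogError(ex, "Agentic retrieval failed");
            return await _classicRag.SearchAsync(query, ct);
        }
    }
}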
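The subtle part of the routing service is telling a timeout apart from a cancellation requested by the caller: the first should fall back, the second should not. Here is a minimal, self-contained sketch of that one decision. FallbackRouter and RunWithFallbackAsync are hypothetical names, and plain delegates stand in for the two RAG services.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original service.
public static class FallbackRouter
{
    // Runs `primary` under a deadline and uses `fallback` on timeout or failure.
    // If the caller's own token is cancelled, the cancellation propagates
    // instead of triggering the fallback.
    public static async Task<T> RunWithFallbackAsync<T>(
        Func<CancellationToken, Task<T>> primary,
        Func<CancellationToken, Task<T>> fallback,
        TimeSpan timeout,
        CancellationToken ct)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);
        try
        {
            return await primary(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the internal deadline fired: the caller still wants an answer.
            return await fallback(ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Any other failure of the primary path also degrades to the fallback.
            return await fallback(ct);
        }
    }
}
```

A primary delegate that never completes ends up on the fallback once the timeout elapses, while a caller that cancels its own token gets an OperationCanceledException back rather than a late classic-RAG result nobody is waiting for.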

Health Checking

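The health checker probes the graph database and the LLM endpoint in parallel, and treats the agentic pipeline as unhealthy if either probe fails or the pair takes longer than five seconds: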

AgenticHealthChecker.cs

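using System;
using System.Threading;
using System.Threading.Tasks;

public class AgenticHealthChecker : IHealthChecker
{
    private readonly IGraphClient _graph;
    private readonly IOllamaService _ollama;

    public AgenticHealthChecker(IGraphClient graph, IOllamaService ollama)
    {
        _graph = graph;
        _ollama = ollama;
    }

    public async Task<bool> CheckAgenticHealthAsync(
        CancellationToken ct)
    {
        try
        {
            // Bound the whole probe so a hung dependency cannot stall routing.
            using var cts = CancellationTokenSource
                .CreateLinkedTokenSource(ct);
            cts.CancelAfter(TimeSpan.FromSeconds(5));

            // Probe both dependencies in parallel.
            var graphTask = _graph.PingAsync(cts.Token);
            var ollamaTask = _ollama.PingAsync(cts.Token);

            await Task.WhenAll(graphTask, ollamaTask);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The caller gave up; do not report that as an unhealthy pipeline.
            throw;
        }
        catch
        {
            // Any probe failure or probe timeout marks the agentic path unhealthy.
            return false;
        }
    }
}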
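The configuration section below lists a HealthCacheDurationSeconds setting, but the checker as shown pings both dependencies on every call. One way to honor that setting is a small caching wrapper around the probe. This is a sketch under assumptions: CachedHealthCheck is a hypothetical class, the probe is passed in as a delegate, and the clock is injectable only to make the behavior easy to test.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: remembers the last probe result for `ttl`.
public sealed class CachedHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private DateTimeOffset _checkedAt = DateTimeOffset.MinValue;
    private bool _lastResult;

    public CachedHealthCheck(
        Func<CancellationToken, Task<bool>> probe,
        TimeSpan ttl,
        Func<DateTimeOffset>? clock = null)
    {
        _probe = probe;
        _ttl = ttl;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    public async Task<bool> IsHealthyAsync(CancellationToken ct)
    {
        // Serialize callers so an expired entry triggers one probe, not many.
        await _gate.WaitAsync(ct);
        try
        {
            var now = _clock();
            if (now - _checkedAt < _ttl)
                return _lastResult; // still fresh: no network call

            _lastResult = await _probe(ct);
            _checkedAt = now;
            return _lastResult;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

The trade-off is staleness: with a 30-second cache, the router can keep sending queries to a dependency that failed up to 30 seconds ago, so the per-query timeout and fallback are still needed. Callers also queue behind the gate while a probe is in flight, which is acceptable for a probe bounded to a few seconds.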

Configuration

{
  "Retrieval": {
    "AgenticEnabled": true,
    "AgenticTimeoutMs": 15000,
    "HealthCacheDurationSeconds": 30
  }
}

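The feature flag (AgenticEnabled) lets you disable the entire agentic path without a deployment, which is useful during incidents. That only holds if the service re-reads configuration at runtime; an options object captured once at startup needs a restart to pick up the change.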
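The JSON uses AgenticTimeoutMs while the routing code reads _options.AgenticTimeout as a TimeSpan, so something has to bridge the two. The original options class is not shown; the following is one plausible shape, with the property names taken from the JSON keys and the two TimeSpan helpers added as assumptions.

```csharp
using System;

// Assumed shape of the options class; the article does not show the original.
public sealed class RetrievalOptions
{
    // These three names mirror the keys under "Retrieval" in the JSON.
    public bool AgenticEnabled { get; set; }
    public int AgenticTimeoutMs { get; set; } = 15000;
    public int HealthCacheDurationSeconds { get; set; } = 30;

    // Convenience views so calling code works with TimeSpan, not raw numbers.
    public TimeSpan AgenticTimeout =>
        TimeSpan.FromMilliseconds(AgenticTimeoutMs);

    public TimeSpan HealthCacheDuration =>
        TimeSpan.FromSeconds(HealthCacheDurationSeconds);
}
```

For the flag to take effect without a restart, the router would read the current value on each request, for example through IOptionsMonitor<RetrievalOptions>.CurrentValue from Microsoft.Extensions.Options, with the JSON file loaded with reload-on-change enabled.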

Telemetry

Track which path each query takes:

_telemetry.TrackEvent("retrieval.routed", new
{
    Path = usedAgentic ? "agentic" : "classic",
    QueryLength = query.Length,
    DurationMs = stopwatch.ElapsedMilliseconds,
    ResultCount = results.Count,
});

This gives you data on:

What percentage of queries use the agentic path
Agentic vs classic latency comparison
Fallback frequency (indicates reliability issues)
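One caveat: the event as shown records only the final path, so a query served by classic RAG because the flag was off looks the same as one that fell back after a timeout. To measure fallback frequency, the event needs one more field. The sketch below assumes a hypothetical FallbackReason value (null when no fallback happened) and shows how the three numbers could be computed from a batch of events; RoutedEvent and RoutingStats are illustrative names, not part of any telemetry SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event shape: FallbackReason is null unless the agentic path
// was attempted (or health-checked) and the query ended on classic RAG.
public record RoutedEvent(string Path, string? FallbackReason, long DurationMs);

public static class RoutingStats
{
    // Fraction of queries answered by the agentic path (0..1).
    public static double AgenticShare(IReadOnlyCollection<RoutedEvent> events) =>
        events.Count == 0
            ? 0
            : (double)events.Count(e => e.Path == "agentic") / events.Count;

    // Fraction of queries that ended on classic RAG because of a fallback.
    public static double FallbackRate(IReadOnlyCollection<RoutedEvent> events) =>
        events.Count == 0
            ? 0
            : (double)events.Count(e => e.FallbackReason != null) / events.Count;

    // Mean latency for one path; 0 when that path has no events.
    public static double AverageLatencyMs(
        IReadOnlyCollection<RoutedEvent> events, string path) =>
        events.Where(e => e.Path == path)
              .Select(e => (double)e.DurationMs)
              .DefaultIfEmpty(0)
              .Average();
}
```

In practice these aggregations would more likely live in a dashboard query than in application code; the point is that the fallback rate is only computable if the reason is recorded at routing time.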

Implementation Note

The circuit breaker on the graph database connection was essential — and I learned this the hard way. When FalkorDB went down for scheduled maintenance one evening, our system didn’t have a circuit breaker yet. Every GraphRAG query waited the full 15-second timeout before falling through to the fallback, and users experienced agonizing delays. After adding a Polly circuit breaker that opened after 3 consecutive failed requests, the behavior changed completely: the first 3 queries after a FalkorDB outage experienced the timeout, but then the circuit opened and all subsequent queries fell through to vector-only search seamlessly in under 100ms. The circuit breaker half-opened every 60 seconds to test if FalkorDB was back, and when it recovered, traffic resumed automatically. Without this pattern, every maintenance window would have been a user-facing incident.
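To make the mechanics concrete, here is a hand-rolled sketch of the behavior described above: open after 3 consecutive failures, fail fast while open, and let a trial call through after 60 seconds. This is not Polly's API and not the production code; SimpleCircuitBreaker is a hypothetical class written only to illustrate the state machine, and it simplifies the half-open state by letting every call through once the break has elapsed rather than a single trial.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only: a minimal consecutive-failure circuit breaker.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _lock = new();
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    public SimpleCircuitBreaker(
        int failureThreshold, TimeSpan breakDuration,
        Func<DateTimeOffset>? clock = null)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // True while calls should skip the protected dependency.
    public bool IsOpen
    {
        get
        {
            lock (_lock)
            {
                return _openedAt is { } opened
                    && _clock() - opened < _breakDuration;
            }
        }
    }

    public async Task<T> ExecuteAsync<T>(
        Func<Task<T>> action, Func<Task<T>> fallback)
    {
        if (IsOpen)
            return await fallback(); // fail fast, no waiting on a dead dependency

        try
        {
            var result = await action();
            lock (_lock) { _consecutiveFailures = 0; _openedAt = null; }
            return result;
        }
        catch (Exception)
        {
            lock (_lock)
            {
                _consecutiveFailures++;
                // After the break elapses the count is still at the threshold,
                // so a single failed trial call re-opens the circuit.
                if (_consecutiveFailures >= _failureThreshold)
                    _openedAt = _clock();
            }
            return await fallback();
        }
    }
}
```

With a threshold of 3 and a 60-second break, the first three calls during an outage each pay for the failed graph call, the fourth goes straight to the fallback, and one trial call is attempted after the break elapses. A production system would use a maintained library such as Polly, which also handles concurrency in the half-open state and exception filtering.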

Conclusion

Building this routing layer taught me that reliability engineering is just as important as retrieval quality in production RAG systems. It doesn’t matter how good your GraphRAG answers are if the system returns empty results or times out when a component goes down. The combination of health-aware routing, circuit breakers, and a deterministic fallback chain means that users always get a response — and in most failure scenarios, they don’t even notice the degradation.

The approach I’d recommend to anyone building a similar system: start with the fallback chain. Get pure vector search working reliably, then add the agentic path on top with routing and health checks. Never let the pursuit of better retrieval quality compromise the reliability of the system your users depend on.

Next Steps

Hybrid Retrieval with Graph Filters: FalkorDB + Qdrant — the graph traversal and vector search components that this router orchestrates
Agentic RRF Ensembling for Production Search — how the RRF ensemble works within the agentic retrieval path
