
GraphRAG Routing in .NET: Safe Fallback Between Classic RAG and Agent Retrieval

How to introduce an agent-based retrieval routing layer with health-aware fallback to keep retrieval reliable in production.

By Victor Robin

When I first encountered a production incident where GraphRAG returned an empty result set, I was baffled. The user’s query was perfectly reasonable, the document was in the vector store, and the embedding looked fine. The problem? The document had been uploaded and embedded, but entity extraction hadn’t completed yet — the async pipeline was still processing. The graph had no knowledge of the document’s entities, so graph traversal returned nothing, and the system served an empty response. That failure taught me a lesson I now consider non-negotiable: every retrieval path needs a fallback strategy, and the routing between them needs to be deterministic, not hopeful.

Agentic retrieval (where the LLM decides which tools and retrievers to invoke) can produce dramatically better results than simple vector search. But it’s also slower, more expensive, and more fragile. In production, you need a routing layer that can fall back to classic RAG when the agentic pipeline is unhealthy.


The Reliability Problem

An agentic retrieval pipeline has more moving parts:

  • LLM for query classification
  • Graph database for traversal
  • Multiple vector stores for ensemble search
  • Tool-calling infrastructure

If any component is unhealthy, the entire agentic path fails. But classic RAG (embed query, vector search, return results) still works fine. The routing layer decides which path to use at query time.


Routing Architecture

Query --> RoutingService
              |
              |-- Is agentic enabled? (feature flag)
              |-- Is graph DB healthy?
              |-- Is LLM responsive?
              |
              |-- YES to all --> Agentic Pipeline
              |                      |
              |                      |-- Timeout? --> Fallback
              |                      +-- Success --> Return
              |
              +-- NO to any --> Classic RAG Pipeline

Implementation

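RetrievalRoutingService.cs
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public class RetrievalRoutingService
{
    private readonly IClassicRagService _classicRag;
    private readonly IAgenticRagService _agenticRag;
    private readonly IHealthChecker _health;
    private readonly RetrievalOptions _options;
    private readonly ILogger<RetrievalRoutingService> _logger;

    public RetrievalRoutingService(
        IClassicRagService classicRag,
        IAgenticRagService agenticRag,
        IHealthChecker health,
        RetrievalOptions options,
        ILogger<RetrievalRoutingService> logger)
    {
        _classicRag = classicRag;
        _agenticRag = agenticRag;
        _health = health;
        _options = options;
        _logger = logger;
    }

    public async Task<SearchResult> SearchAsync(
        string query, CancellationToken ct)
    {
        // Feature flag off: skip the agentic path entirely.
        if (!_options.AgenticEnabled)
            return await _classicRag.SearchAsync(query, ct);

        var isHealthy = await _health.CheckAgenticHealthAsync(ct);
        if (!isHealthy)
        {
            _logger.LogWarning(
                "Agentic pipeline unhealthy, falling back to classic RAG");
            return await _classicRag.SearchAsync(query, ct);
        }

        try
        {
            // Linked token: cancels on caller cancellation or on timeout.
            using var cts = CancellationTokenSource
                .CreateLinkedTokenSource(ct);
            cts.CancelAfter(_options.AgenticTimeout);

            return await _agenticRag.SearchAsync(query, cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // Only the timeout fired; the caller is still waiting.
            _logger.LogWarning(
                "Agentic retrieval timed out after {Timeout}ms",
                _options.AgenticTimeout.TotalMilliseconds);
            return await _classicRag.SearchAsync(query, ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Caller cancellation is excluded here so it propagates unchanged.
            _logger.LogError(ex, "Agentic retrieval failed");
            return await _classicRag.SearchAsync(query, ct);
        }
    }
}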
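One detail worth isolating is how a timeout is told apart from a caller cancellation, since both surface as OperationCanceledException. Below is a minimal sketch of that pattern on its own, assuming a hypothetical helper named WithFallbackAsync that takes the primary and fallback paths as delegates. The helper and its name are illustrative and not part of the service above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetrievalFallback
{
    // Hypothetical helper: runs `primary` under a timeout and falls back
    // when it times out or throws. Caller cancellation is propagated.
    public static async Task<string> WithFallbackAsync(
        Func<CancellationToken, Task<string>> primary,
        Func<CancellationToken, Task<string>> fallback,
        TimeSpan timeout,
        CancellationToken ct)
    {
        // The linked source cancels when the caller cancels or the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(timeout);

        try
        {
            return await primary(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // The caller's token is untouched, so the timeout must have fired.
            return await fallback(ct);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // Any other failure of the primary path also triggers the fallback.
            return await fallback(ct);
        }
    }
}
```

The exception filters are the point of the sketch: without them, a request the user has already abandoned would be logged as a timeout and retried on the classic path, only to be cancelled again.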

Health Checking

The health checker probes the two dependencies the router checks before choosing the agentic path, the graph database and the LLM endpoint:

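AgenticHealthChecker.cs
using System;
using System.Threading;
using System.Threading.Tasks;

public class AgenticHealthChecker : IHealthChecker
{
    private readonly IGraphClient _graph;
    private readonly IOllamaService _ollama;

    public AgenticHealthChecker(IGraphClient graph, IOllamaService ollama)
    {
        _graph = graph;
        _ollama = ollama;
    }

    public async Task<bool> CheckAgenticHealthAsync(
        CancellationToken ct)
    {
        try
        {
            // Bound the whole probe so a hung dependency cannot stall routing.
            using var cts = CancellationTokenSource
                .CreateLinkedTokenSource(ct);
            cts.CancelAfter(TimeSpan.FromSeconds(5));

            // Probe both dependencies concurrently.
            var graphTask = _graph.PingAsync(cts.Token);
            var ollamaTask = _ollama.PingAsync(cts.Token);

            await Task.WhenAll(graphTask, ollamaTask);
            return true;
        }
        catch (OperationCanceledException) when (ct.IsCancellationRequested)
        {
            // The caller gave up; propagate instead of reporting "unhealthy".
            throw;
        }
        catch (Exception)
        {
            // A probe timeout or any ping failure marks the agentic path unhealthy.
            return false;
        }
    }
}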
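Probing on every query adds up to five seconds of latency whenever a dependency hangs, which is presumably why the configuration carries a HealthCacheDurationSeconds setting. A minimal sketch of a cache around the probe follows. The CachedHealthProbe class, its injectable clock, and the delegate-based constructor are assumptions made for illustration; in the article's code the same logic could sit in an IHealthChecker decorator wrapping AgenticHealthChecker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical cache around a health probe: reuses the last result for `ttl`.
public sealed class CachedHealthProbe
{
    private readonly Func<CancellationToken, Task<bool>> _probe;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private bool _lastResult;
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;

    public CachedHealthProbe(
        Func<CancellationToken, Task<bool>> probe, TimeSpan ttl)
        : this(probe, ttl, () => DateTimeOffset.UtcNow) { }

    // The clock is injectable so expiry can be exercised without waiting.
    public CachedHealthProbe(
        Func<CancellationToken, Task<bool>> probe,
        TimeSpan ttl,
        Func<DateTimeOffset> clock)
    {
        _probe = probe;
        _ttl = ttl;
        _clock = clock;
    }

    public async Task<bool> CheckAsync(CancellationToken ct)
    {
        // One probe at a time: concurrent callers wait and then read the cache.
        await _gate.WaitAsync(ct);
        try
        {
            var now = _clock();
            if (now < _expiresAt)
                return _lastResult;

            _lastResult = await _probe(ct);
            _expiresAt = now + _ttl;
            return _lastResult;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Holding the gate during the probe is a deliberate trade-off in this sketch: it prevents a burst of queries from all pinging the graph database at once, at the cost of queueing callers behind a slow probe. An unhealthy result is cached for the same duration as a healthy one, so recovery is noticed at most one cache period late.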

Configuration

{
  "Retrieval": {
    "AgenticEnabled": true,
    "AgenticTimeoutMs": 15000,
    "HealthCacheDurationSeconds": 30
  }
}

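The feature flag (AgenticEnabled) lets you disable the entire agentic path without a deployment, which is useful during incidents. That only holds if the options are re-read at runtime, for example through IOptionsMonitor<RetrievalOptions> backed by a configuration source that reloads on change. An options object captured once at startup will not see the new value until the process restarts.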
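The JSON stores the timeout as AgenticTimeoutMs, while the routing service reads a TimeSpan named AgenticTimeout. Here is a minimal sketch of an options class that bridges the two, assuming the property names mirror the JSON keys. The RetrievalOptionsLoader helper is hypothetical and uses System.Text.Json only so the example runs without the hosting packages.

```csharp
using System;
using System.Text.Json;

public sealed class RetrievalOptions
{
    // Property names match the keys of the "Retrieval" section.
    public bool AgenticEnabled { get; set; }
    public int AgenticTimeoutMs { get; set; } = 15000;
    public int HealthCacheDurationSeconds { get; set; } = 30;

    // Derived values; they have no setters, so deserialization ignores them.
    public TimeSpan AgenticTimeout =>
        TimeSpan.FromMilliseconds(AgenticTimeoutMs);
    public TimeSpan HealthCacheDuration =>
        TimeSpan.FromSeconds(HealthCacheDurationSeconds);
}

// Hypothetical loader used only to keep this sketch self-contained.
public static class RetrievalOptionsLoader
{
    public static RetrievalOptions Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var section = doc.RootElement.GetProperty("Retrieval");
        return JsonSerializer.Deserialize<RetrievalOptions>(section.GetRawText())
               ?? new RetrievalOptions();
    }
}
```

In an ASP.NET Core host the loader would normally be replaced by binding the "Retrieval" configuration section to RetrievalOptions and injecting IOptionsMonitor<RetrievalOptions>, so that CurrentValue reflects edits to the configuration file.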

Telemetry

Track which path each query takes:

// usedAgentic, stopwatch, and results are locals captured around the routed call.
_telemetry.TrackEvent("retrieval.routed", new
{
    Path = usedAgentic ? "agentic" : "classic",
    QueryLength = query.Length,
    DurationMs = stopwatch.ElapsedMilliseconds,
    ResultCount = results.Count,
});

This gives you data on:

  • What percentage of queries use the agentic path
  • Agentic vs classic latency comparison
  • Fallback frequency (indicates reliability issues)
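A Path value of "classic" alone cannot separate queries that were routed to classic RAG because the flag was off from queries that fell back after a failed health check, timeout, or error, so fallback frequency needs one more field. The following is a minimal sketch of how the three metrics could be derived, assuming a hypothetical RouteOutcome enum and RoutedEvent record that are not part of the original event.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outcome field that distinguishes planned classic routing
// from a fallback while the agentic flag was on.
public enum RouteOutcome { Agentic, ClassicFlagOff, ClassicFallback }

public sealed record RoutedEvent(
    RouteOutcome Outcome, long DurationMs, int ResultCount);

public static class RoutingStats
{
    // Share of all queries answered by the agentic path, in [0, 1].
    public static double AgenticShare(IReadOnlyCollection<RoutedEvent> events) =>
        events.Count == 0
            ? 0
            : (double)events.Count(e => e.Outcome == RouteOutcome.Agentic)
              / events.Count;

    // Share of flag-on queries that ended up on classic RAG, in [0, 1].
    public static double FallbackRate(IReadOnlyCollection<RoutedEvent> events)
    {
        var flagOn = events
            .Where(e => e.Outcome != RouteOutcome.ClassicFlagOff)
            .ToList();
        return flagOn.Count == 0
            ? 0
            : (double)flagOn.Count(e => e.Outcome == RouteOutcome.ClassicFallback)
              / flagOn.Count;
    }

    // Mean latency for one outcome; 0 when there are no matching events.
    public static double AverageLatencyMs(
        IReadOnlyCollection<RoutedEvent> events, RouteOutcome outcome)
    {
        var durations = events
            .Where(e => e.Outcome == outcome)
            .Select(e => (double)e.DurationMs)
            .ToList();
        return durations.Count == 0 ? 0 : durations.Average();
    }
}
```

In production these aggregates would more likely be computed by queries in the telemetry backend than in process; the sketch only pins down what each number means.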

Conclusion

Building this routing layer taught me that reliability engineering is just as important as retrieval quality in production RAG systems. It doesn’t matter how good your GraphRAG answers are if the system returns empty results or times out when a component goes down. The combination of health-aware routing, circuit breakers, and a deterministic fallback chain means that users always get a response — and in most failure scenarios, they don’t even notice the degradation.

The approach I’d recommend to anyone building a similar system: start with the fallback chain. Get pure vector search working reliably, then add the agentic path on top with routing and health checks. Never let the pursuit of better retrieval quality compromise the reliability of the system your users depend on.
