Hybrid Retrieval with Graph Filters: FalkorDB + Qdrant

A production blueprint for combining graph traversal and vector search to improve recall and precision in document retrieval.

By Victor Robin

When I first started building retrieval pipelines, pure vector search seemed like it could handle everything. That illusion shattered the day a user asked, “Which vendor has sent us the most invoices this quarter?” Vector search dutifully returned chunks about individual invoices: semantically relevant, technically correct, and useless for answering the actual question. Answering it meant traversing entity relationships, linking invoice documents to vendor entities and counting across the graph. That was my aha moment. Semantic similarity and structural connectivity are fundamentally different retrieval axes, and the best systems need both. The hybrid GraphRAG pipeline in this article queries along both axes and fuses the results.

Vector search finds semantically similar documents. Graph traversal finds structurally connected ones. Combining both in a hybrid retrieval pipeline gives you the precision of graph relationships with the flexibility of dense embeddings.

[] — Microsoft Research

Vector search excels at “find documents about X.” But it struggles with relational queries:

  • “Show me all lab results for the same patient as this report”
  • “What documents reference the same organization?”
  • “Find related documents within two hops of this entity”

These require structural knowledge — which entities appear in which documents, and how they relate. That’s where a knowledge graph comes in.

[] — Schneider et al.
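To make the structural side concrete, here is a minimal in-memory sketch of a "documents that share entities with this one" lookup. It is an illustration only: the `CoOccurrence` class, the `RelatedDocuments` method, and the dictionary that stands in for the graph are hypothetical names, not part of FalkorDB or of the pipeline described later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CoOccurrence
{
    // docEntities maps a document id to the normalized entities it contains.
    // It is an in-memory stand-in for Document -> Entity edges in a graph.
    public static List<(string DocId, int Shared)> RelatedDocuments(
        IReadOnlyDictionary<string, HashSet<string>> docEntities,
        string sourceDocId)
    {
        if (!docEntities.TryGetValue(sourceDocId, out var sourceEntities))
            return new List<(string DocId, int Shared)>();

        // Two hops: source document -> entity -> other document.
        return docEntities
            .Where(kv => kv.Key != sourceDocId)
            .Select(kv => (DocId: kv.Key,
                           Shared: kv.Value.Count(e => sourceEntities.Contains(e))))
            .Where(x => x.Shared > 0)
            .OrderByDescending(x => x.Shared)
            .ThenBy(x => x.DocId, StringComparer.Ordinal)
            .ToList();
    }
}
```

Nothing in this lookup depends on how similar the documents read. A lab result and a discharge report can share a patient entity while having almost no vocabulary in common, which is the case embeddings tend to miss.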

Architecture

Query
  |-->  Vector Search (Qdrant).....Top-K by similarity
  |
  |-->  Entity Extraction (NER)
  |                    |
  |-->  Graph Traversal (FalkorDB)..Connected documents
                       |
                       v
                RRF Fusion --> Final ranked results

The architecture follows a pattern well-documented in graph-based reasoning literature, where graph structure augments the retrieval process rather than replacing it.

[] — Vashishth et al.

Graph Schema

A minimal schema for document-entity relationships in FalkorDB:

// Nodes
(:Document {id, title, created_at})
(:Entity {text, label, normalized_text})
(:User {id})

// Relationships
(:Document)-[:CONTAINS]->(:Entity)
(:Entity)-[:SAME_AS]->(:Entity)
(:User)-[:OWNS]->(:Document)
[] — FalkorDB

Step 1: Extract Entities and Build the Graph

When a document is processed, NER extracts its entities, and a sync service upserts them into the graph and links them to the document:

GraphSyncService.cs
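using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// IGraphClient and ExtractedEntity are application types not shown in this article.
public class GraphSyncService
{
    private readonly IGraphClient _graph;

    public GraphSyncService(IGraphClient graph) => _graph = graph;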

    public async Task SyncDocumentAsync(
        string docId, IReadOnlyList<ExtractedEntity> entities,
        CancellationToken ct)
    {
        // Upsert document node
        await _graph.ExecuteAsync(
            "MERGE (d:Document {id: $id})",
            new { id = docId }, ct);

        foreach (var entity in entities)
        {
            var normalized = entity.Text.ToLowerInvariant().Trim();

            // Upsert entity and link to document
            await _graph.ExecuteAsync(@"
                MERGE (e:Entity {normalized_text: $normalized})
                ON CREATE SET e.text = $text, e.label = $label
                WITH e
                MATCH (d:Document {id: $docId})
                MERGE (d)-[:CONTAINS]->(e)",
                new { normalized, text = entity.Text,
                      label = entity.Label, docId }, ct);
        }
    }
}
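The `ToLowerInvariant().Trim()` normalization above keeps "Acme" and "ACME Corp" as two separate entity nodes. A slightly stronger normalizer might look like the sketch below. It assumes that dropping punctuation and trailing corporate suffixes is acceptable for your corpus; the `EntityNormalizer` class and its suffix list are hypothetical and would need tuning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityNormalizer
{
    // Hypothetical suffix list; extend or shrink it for your own data.
    private static readonly HashSet<string> CorporateSuffixes =
        new HashSet<string> { "corp", "corporation", "inc", "ltd", "llc", "gmbh" };

    public static string Normalize(string text)
    {
        // Lowercase, then keep only runs of letters and digits as tokens.
        // This drops punctuation and collapses whitespace in one pass.
        var tokens = Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        // Strip trailing suffixes, but never reduce the name to nothing.
        while (tokens.Count > 1 && CorporateSuffixes.Contains(tokens[tokens.Count - 1]))
            tokens.RemoveAt(tokens.Count - 1);

        return string.Join(" ", tokens);
    }
}
```

Whatever normalization you choose has to run identically at index time and at query time, otherwise the `normalized_text` lookup will miss. A rule-based approach like this one will also over-merge in some cases (for example a company and a product that share a name), so treat it as a starting point rather than real disambiguation.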

Step 2: Hybrid Query

For a search query, run vector search and graph traversal in parallel, then fuse results:

HybridSearchService.cs
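using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class HybridSearchService
{
    private readonly IVectorStore _vectors;
    private readonly IGraphClient _graph;
    private readonly RrfEnsembleSearch _rrf;
    // Assumed name for the NER abstraction; the code below only needs ExtractAsync.
    private readonly IEntityExtractor _ner;

    public HybridSearchService(
        IVectorStore vectors, IGraphClient graph,
        RrfEnsembleSearch rrf, IEntityExtractor ner)
    {
        _vectors = vectors;
        _graph = graph;
        _rrf = rrf;
        _ner = ner;
    }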

    public async Task<List<ScoredDocument>> SearchAsync(
        string query, string userId, CancellationToken ct)
    {
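        // Vector search. userId is not passed to this branch, so per-user
        // filtering is assumed to happen inside IVectorStore.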
        var vectorTask = _vectors.SearchAsync(query, topK: 20, ct);

        // Graph: find entities in query, traverse to documents
        var graphTask = GraphSearchAsync(query, userId, ct);

        await Task.WhenAll(vectorTask, graphTask);

        return _rrf.Fuse(new[]
        {
            new RankedList(await vectorTask, Weight: 1.0),
            new RankedList(await graphTask, Weight: 0.7),
        });
    }

    private async Task<List<RetrievedDocument>> GraphSearchAsync(
        string query, string userId, CancellationToken ct)
    {
        // Extract entities from query text
        var queryEntities = await _ner.ExtractAsync(query, ct);
        if (queryEntities.Count == 0) return new();

        var normalized = queryEntities
            .Select(e => e.Text.ToLowerInvariant().Trim())
            .ToList();

        // Find documents the user owns that contain these entities
        // (User -> Document -> Entity, two hops from the user)
        var result = await _graph.ExecuteAsync<GraphDoc>(@"
            MATCH (e:Entity)
            WHERE e.normalized_text IN $entities
            MATCH (d:Document)-[:CONTAINS]->(e)
            MATCH (u:User {id: $userId})-[:OWNS]->(d)
            RETURN d.id AS id, d.title AS title,
                   count(DISTINCT e) AS entity_overlap
            ORDER BY entity_overlap DESC
            LIMIT 20",
            new { entities = normalized, userId }, ct);

        return result.Select(r => new RetrievedDocument(
            r.Id, r.Title, Score: r.EntityOverlap
        )).ToList();
    }
}
[] — Qdrant
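The article never shows `RrfEnsembleSearch`, so here is a minimal sketch of what a weighted Reciprocal Rank Fusion step could look like. The record shapes are inferred from how `HybridSearchService` calls them and are assumptions, as is the default `k = 60` (a commonly used RRF constant, best treated as tunable). Each document scores the sum of `weight / (k + rank)` over the lists it appears in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes, inferred from the call sites in HybridSearchService.
public record RetrievedDocument(string Id, string Title, double Score);
public record ScoredDocument(string Id, string Title, double Score);
public record RankedList(IReadOnlyList<RetrievedDocument> Documents, double Weight);

public class RrfEnsembleSearch
{
    private readonly int _k;

    public RrfEnsembleSearch(int k = 60) => _k = k;

    public List<ScoredDocument> Fuse(IEnumerable<RankedList> lists)
    {
        var scores = new Dictionary<string, double>();
        var titles = new Dictionary<string, string>();

        foreach (var list in lists)
        {
            for (var i = 0; i < list.Documents.Count; i++)
            {
                var doc = list.Documents[i];

                // Rank is 1-based. Only the position counts; the branch's
                // raw score (similarity or entity overlap) is ignored.
                var contribution = list.Weight / (_k + i + 1);

                scores.TryGetValue(doc.Id, out var current);
                scores[doc.Id] = current + contribution;

                if (!titles.ContainsKey(doc.Id))
                    titles[doc.Id] = doc.Title;
            }
        }

        // Highest fused score first; ties broken by id for stable output.
        return scores
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => new ScoredDocument(kv.Key, titles[kv.Key], kv.Value))
            .ToList();
    }
}
```

Rank-based fusion suits this pipeline because the two branches score on unrelated scales: the vector branch returns a similarity, the graph branch returns an entity count. With a vector list of [A, B] at weight 1.0 and a graph list of [B, C] at weight 0.7, B ranks first because both branches returned it, followed by A and then C.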

When to Use Hybrid vs Vector-Only

Scenario                          Best approach
--------------------------------  -----------------------------
Free-form semantic search         Vector-only
Entity-specific queries           Graph + vector
“Related documents” features      Graph traversal
Cold-start (no graph data yet)    Vector with graceful fallback
[] — Neo4j
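The table can be turned into a small routing function. The sketch below is one possible reading of it, not the author's implementation: `RetrievalStrategy`, `RetrievalRouter`, and the three inputs are hypothetical names, and how you detect "graph has data" is left to your application.

```csharp
using System;

public enum RetrievalStrategy { VectorOnly, GraphOnly, Hybrid }

public static class RetrievalRouter
{
    public static RetrievalStrategy Choose(
        int queryEntityCount, bool graphHasData, bool relatedDocumentsFeature)
    {
        // Cold start: with no graph data, every request falls back to vectors.
        if (!graphHasData) return RetrievalStrategy.VectorOnly;

        // "Related documents" features are purely structural.
        if (relatedDocumentsFeature) return RetrievalStrategy.GraphOnly;

        // Entity-specific queries use both branches; free-form ones use vectors.
        return queryEntityCount > 0
            ? RetrievalStrategy.Hybrid
            : RetrievalStrategy.VectorOnly;
    }
}
```

Checking the cold-start condition first is what makes the fallback graceful: a missing or empty graph never produces an empty result set, only a vector-only one.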

Conclusion

Building this hybrid pipeline reinforced a core belief I keep coming back to: the best retrieval systems don’t pick one approach — they layer complementary strengths. Vector search gives you the semantic flexibility to handle vague, exploratory queries. Graph traversal gives you the structural precision to answer relational questions that embeddings simply cannot encode. The combination, fused through RRF, consistently outperforms either approach in isolation.

The biggest lesson was that building the graph itself is the real investment. The retrieval code is straightforward; the months of work went into entity extraction quality, disambiguation (is “Acme” the same entity as “ACME Corp”?), and keeping the graph in sync with document updates. If you’re starting a GraphRAG project, spend your first sprint on entity extraction quality — everything downstream depends on it.

Further Reading

[FalkorDB Cypher Reference] — FalkorDB, 2024
[Qdrant Hybrid Search] — Qdrant, 2024
[] — Microsoft Research
[] — Schneider et al.