Entity Analysis and Modeling for Graph Databases
How to extract entities (NER) from documents using LLMs and model them effectively in FalkorDB for knowledge graphs.
When I first started exploring knowledge graphs for our document archive system, I was surprised at how quickly the limitations of pure vector search became apparent. A user would ask “show me all contracts linked to Project Alpha,” and the vector database would return semantically similar documents but completely miss the structural relationship. That frustration led me down the rabbit hole of entity extraction and graph modeling, and the results transformed how we think about document intelligence. This article captures everything I learned building that pipeline from scratch.
Introduction
Vector databases like Qdrant excel at similarity search, but they do not model explicit relationships between entities. Ask “Who are all the vendors connected to Project Alpha?” and a vector search returns passages that resemble the question, not the set of vendors actually linked to the project. This is where a graph database such as FalkorDB shines.
[Knowledge Graphs: Fundamentals, Techniques, and Applications] (Hogan et al., 2021)
Why Graph Modeling Matters:
- Structured Knowledge: Maps real-world relationships, such as (Person)-[:WORKS_ON]->(Project).
- Multi-Hop Reasoning: Allows queries that traverse several connections (for example, friends of friends).
- Entity Disambiguation: Distinguishes between “Apple” (the fruit) and “Apple” (the company).
What We’ll Build
We will build a pipeline that extracts entities from our documents and inserts them into FalkorDB.
- Perform NER (Named Entity Recognition): Use an LLM to find People, Organizations, Projects, and Contracts.
- Model the Schema: Define distinct Nodes and Relationships.
- Ingest into FalkorDB: Use Cypher queries to build the graph.
Architecture Overview
The flow runs in parallel with our vector embedding pipeline.
flowchart LR
Doc[Document Text] -->|Prompt| LLM[LLM NER]
LLM -->|JSON| Entities[Entity List]
Entities -->|Map| GraphModel[Graph Logic]
GraphModel -->|Cypher| FalkorDB[(FalkorDB)]
classDef primary fill:#7c3aed,color:#fff
classDef secondary fill:#06b6d4,color:#fff
classDef db fill:#f43f5e,color:#fff
class GraphModel,LLM primary
class Doc,Entities secondary
class FalkorDB db
Section 1: Extracting Entities with LLMs
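Traditional NLP libraries (like spaCy) are fast but often struggle with custom domain contexts, because their pretrained models only know a fixed set of entity types. LLMs can perform zero-shot NER: given only a list of entity types in the prompt, they can extract them without task-specific training data.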
[Named Entity Recognition with Large Language Models: A Survey] (Wang et al., 2023)
We design a system prompt that lists the entity types to extract and asks the LLM to return only a structured JSON response:
var systemPrompt = @"
You are an expert data analyst. Extract the following entities from the text:
- Person
- Organization
- Project
- Contract
Return ONLY valid JSON in the following format:
{
  ""entities"": [
    { ""type"": ""Person"", ""name"": ""Alice Smith"", ""role"": ""Manager"" },
    { ""type"": ""Organization"", ""name"": ""Acme Corp"" }
  ],
  ""relationships"": [
    { ""from"": ""Alice Smith"", ""to"": ""Acme Corp"", ""type"": ""WORKS_FOR"" }
  ]
}";
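LLM output is not guaranteed to be clean JSON, so it helps to parse it defensively before anything reaches the graph. The following is a minimal sketch using System.Text.Json. `GraphData` is the type name the ingestion method in Section 2 expects, but its shape here, along with the `Entity` and `Relationship` records and the `NerResponseParser` helper, is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical DTOs mirroring the JSON shape requested in the prompt.
public record Entity(string Type, string Name, string? Role = null);
public record Relationship(string From, string To, string Type);
public record GraphData(List<Entity> Entities, List<Relationship> Relationships);

public static class NerResponseParser
{
    // The prompt asks for lower-case keys ("type", "name"), so match case-insensitively.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Returns null when no JSON object can be found or deserialization fails.
    public static GraphData? TryParse(string llmOutput)
    {
        // Keep only the outermost {...} in case the model wrapped the JSON in prose.
        int start = llmOutput.IndexOf('{');
        int end = llmOutput.LastIndexOf('}');
        if (start < 0 || end <= start) return null;

        try
        {
            string json = llmOutput.Substring(start, end - start + 1);
            return JsonSerializer.Deserialize<GraphData>(json, Options);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

Returning null rather than throwing lets the caller decide whether to retry the prompt or skip the document.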
Section 2: Modeling for FalkorDB
In FalkorDB (which uses the Redis protocol and Cypher query language), we need to be careful about node merging. We don’t want five “Alice Smith” nodes; we want one node with multiple connections.
[FalkorDB Documentation: Graph Data Modeling] (FalkorDB, 2024)
Cypher Merge Strategy
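Use the MERGE keyword to make ingestion idempotent: MERGE matches an existing pattern and creates it only if no match is found, so re-running the pipeline does not duplicate nodes or relationships.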
MERGE (p:Person {name: 'Alice Smith'})
MERGE (o:Organization {name: 'Acme Corp'})
MERGE (p)-[:WORKS_FOR]->(o)
[openCypher: The Property Graph Query Language] (openCypher Project, 2023)
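MERGE compares property values exactly, so “Alice Smith” and “alice  smith” would still become two separate nodes. One way to reduce such duplicates is to merge on a normalized key instead of the raw name. The sketch below is an assumption for illustration, not part of the original pipeline: `EntityKey.Normalize` is a hypothetical helper that trims, collapses whitespace, and lower-cases a name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityKey
{
    // Produces a stable lookup key for an entity name (hypothetical helper).
    public static string Normalize(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("Entity name is empty.", nameof(name));

        // Trim the ends, collapse internal whitespace runs to one space, lower-case.
        string collapsed = Regex.Replace(name.Trim(), @"\s+", " ");
        return collapsed.ToLowerInvariant();
    }
}
```

The normalized value could then be stored in a separate property (say, `key`) that MERGE matches on, while the original spelling is kept in `name` for display. This only handles formatting differences; telling apart two different people with the same name still needs extra context, such as the organization they belong to.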
In C#, using NRedisStack or another Redis client:
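public async Task IngestGraphAsync(GraphData data)
{
    var db = _redis.GetDatabase();
    foreach (var rel in data.Relationships)
    {
        // Cypher cannot take a relationship type as a parameter, so it is concatenated below.
        // Skip anything that is not a plain upper-case identifier to avoid query injection.
        if (rel.Type is null || !System.Text.RegularExpressions.Regex.IsMatch(rel.Type, @"^[A-Z][A-Z0-9_]*\z"))
            continue;
        // Node names are passed as query parameters rather than concatenated.
        string query = "MERGE (a {name: $from}) MERGE (b {name: $to}) MERGE (a)-[:" + rel.Type + "]->(b)";
        await db.GraphQueryAsync("files", query, new { from = rel.From, to = rel.To });
    }
}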
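The query above merges nodes without a label, so a relationship could create an unlabeled “Alice Smith” next to the `Person` node from the earlier Cypher example. Labels, like relationship types, cannot be supplied as Cypher parameters, so they have to be validated and written into the query text. Here is a minimal sketch of that step. `CypherBuilder`, its whitelist (taken from the entity types in the prompt), and the naming rule for relationship types are all assumptions, not part of any client library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CypherBuilder
{
    // Assumed whitelist: the entity types listed in the extraction prompt.
    private static readonly HashSet<string> AllowedLabels =
        new HashSet<string> { "Person", "Organization", "Project", "Contract" };

    // Assumed convention: relationship types are UPPER_SNAKE_CASE identifiers.
    private static readonly Regex RelTypePattern = new Regex(@"^[A-Z][A-Z0-9_]*\z");

    // Builds the MERGE statement text; node names stay as $from / $to parameters.
    public static string BuildMerge(string fromLabel, string relType, string toLabel)
    {
        if (!AllowedLabels.Contains(fromLabel) || !AllowedLabels.Contains(toLabel))
            throw new ArgumentException("Unknown node label.");
        if (relType is null || !RelTypePattern.IsMatch(relType))
            throw new ArgumentException("Invalid relationship type.");

        return $"MERGE (a:{fromLabel} {{name: $from}}) " +
               $"MERGE (b:{toLabel} {{name: $to}}) " +
               $"MERGE (a)-[:{relType}]->(b)";
    }
}
```

The labels for each relationship could be looked up from the extracted entity list by name before calling `BuildMerge`. Rejecting unknown labels outright, rather than trying to escape them, keeps anything the LLM invents out of the query text.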
Section 3: Querying the Graph
Now that the data is in the graph, we can run relationship queries that vector-based RAG alone cannot answer.
Question: “Find all contracts signed by anyone who works for Acme Corp.”
MATCH (p:Person)-[:WORKS_FOR]->(o:Organization {name: 'Acme Corp'})
MATCH (p)-[:SIGNED]->(c:Contract)
RETURN c.title, p.name
[Graph-Enhanced Retrieval Augmented Generation (GraphRAG)] (Microsoft Research, 2024)
Conclusion
By combining the fuzzy search capabilities of Qdrant with the structured relationship modeling of FalkorDB, we create a “GraphRAG” system. This gives our AI two brains: one for “vibe” (vectors) and one for “facts” (graph).
Looking back, the journey from pure vector search to a hybrid GraphRAG approach was one of the most rewarding technical decisions I made on this project. The ability to answer structured relationship questions that were previously impossible gave our archive system a qualitative leap in usefulness. If you are building any kind of document intelligence system, I strongly recommend investing the time to model your domain as a graph — the returns compound as your data grows.
[Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks] (Lewis et al., 2020)
Next Steps
- Explore how we Benchmark Microservices to handle graph ingestion load.
- Review specific optimization techniques in Kubernetes Image Optimization.
- Investigate adding temporal properties to graph edges so that relationship history can be queried across time.
- Experiment with graph embeddings (e.g., Node2Vec) to combine vector similarity with structural proximity.