Self-hosted document AI,
built end-to-end
Upload any document. Ask any question.
Get answers cited directly from your own archive.
A homelab project — a complete document intelligence stack covering OCR, entity extraction, knowledge graphs, multi-model vector search, and LLM-powered RAG. All running on self-hosted Kubernetes.
How It Works
Three steps: parse with OCR, enrich with AI, then index across multiple vector models. Each step is a separate background worker communicating over NATS.
Upload & Parse
Docling converts uploaded documents to structured text, preserving tables, headings, and layout. The structure is used for accurate chunking during indexing.
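Why the preserved structure matters for chunking can be sketched in a few lines: split at headings so a chunk never straddles two sections, instead of cutting blindly every N characters. This is an illustrative Python sketch, not the project's .NET code, and the input format (lines prefixed `# ` for headings) is an assumption, not Docling's actual output:

```python
# Heading-aware chunking sketch: flush the current chunk whenever a new
# section starts, and also when a chunk grows past max_chars.
def chunk_by_structure(lines, max_chars=500):
    chunks, current, heading = [], [], ""
    for line in lines:
        if line.startswith("# "):          # new section: close the old chunk
            if current:
                chunks.append((heading, "\n".join(current)))
            heading, current = line[2:], []
        else:
            current.append(line)
            if sum(len(l) for l in current) > max_chars:
                chunks.append((heading, "\n".join(current)))
                current = []
    if current:                            # flush whatever remains
        chunks.append((heading, "\n".join(current)))
    return chunks

chunks = chunk_by_structure(["# Intro", "a", "b", "# Methods", "c"])
```

Each chunk keeps its section heading, which later gives the retriever and the LLM useful context about where a passage came from.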
Extract & Index
Named entities are extracted and linked into a knowledge graph. Each chunk is embedded across 8 vector models and stored in Qdrant, alongside a generated summary and keyword set.
Search & Ask
Reciprocal Rank Fusion combines scores from all vector models to find the most relevant passages. A locally hosted LLM answers in plain English, with citations to the exact source paragraphs.
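Reciprocal Rank Fusion itself is a small formula: each model contributes `1/(k + rank)` for every passage it returns, so passages ranked well by several models rise to the top. A minimal Python sketch (the `k=60` default comes from the original RRF paper; the example rankings are made up):

```python
# Reciprocal Rank Fusion over several ranked lists, one per vector model.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:                          # one list per model
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Three models disagree on the single best passage, but "p2" is near the
# top of every list, so it wins the fused ranking.
fused = rrf([["p1", "p2", "p3"],
             ["p2", "p3", "p1"],
             ["p3", "p2", "p1"]])
```

The appeal of RRF is that it works on ranks alone, so the eight models' raw scores never need to be calibrated against each other.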
Cross-document reasoning with a knowledge graph
Entities from each document are linked across your archive. When a question spans multiple files, the graph connects them — the LLM then reasons across the combined result.
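A toy version of that cross-document lookup, sketched in Python (the project itself is .NET). The graph shape here (document-to-entity edges) and the file names are illustrative assumptions:

```python
from collections import defaultdict

# Invert document -> entities into entity -> documents, the edge we need
# for "which files mention X?" lookups.
def build_graph(doc_entities):
    entity_docs = defaultdict(set)
    for doc, entities in doc_entities.items():
        for e in entities:
            entity_docs[e].add(doc)
    return entity_docs

# Union of every document mentioning any entity in the question.
def docs_for_question(entity_docs, entities):
    hits = set()
    for e in entities:
        hits |= entity_docs.get(e, set())
    return sorted(hits)

graph = build_graph({
    "invoice.pdf":  {"Acme Corp", "2024-03-01"},
    "contract.pdf": {"Acme Corp", "Jane Doe"},
    "email.pdf":    {"Jane Doe"},
})
# A question touching both Acme Corp and Jane Doe spans all three files:
related = docs_for_question(graph, {"Acme Corp", "Jane Doe"})
```

No single document mentions both entities, yet the graph pulls all three into the answer context.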
This is not keyword search: the graph links the relevant documents, and the LLM reasons across them as a single context.
What's under the hood
The full stack — API, workers, frontend, messaging, vector search, graph, and AI inference — running on a homelab K3s cluster.
Knowledge Graph & Entity Intelligence
Each document is analysed to extract named entities — people, organisations, dates, financial figures, medical terms. Entities are deduplicated, resolved to canonical records, and linked into a queryable graph that grows with every upload. Connections surface across the full archive, not just within a single document.
Semantic Search & RAG
Each document chunk is embedded across 8 vector models and stored in Qdrant. At query time, Reciprocal Rank Fusion combines rankings from each model. A locally hosted LLM answers the question, citing the exact source paragraphs it used.
Archive Management
Per-document lifecycle: Pending → Stored → Processing → Enriched → Indexed. Per-user isolated storage. Real-time Blazor UI updates via NATS JetStream.
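The lifecycle reads naturally as an explicit state machine, which lets a worker reject out-of-order events (say, an Indexed event arriving before Enriched). A minimal Python sketch mirroring the states above (the real pipeline is .NET):

```python
# Allowed transitions for the per-document lifecycle.
TRANSITIONS = {
    "Pending":    {"Stored"},
    "Stored":     {"Processing"},
    "Processing": {"Enriched"},
    "Enriched":   {"Indexed"},
    "Indexed":    set(),        # terminal state
}

def advance(state, event):
    # Refuse any event the current state does not permit.
    if event not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {event}")
    return event
```

Encoding the transitions as data keeps the pipeline honest: a duplicate or reordered message fails loudly instead of silently corrupting a document's status.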
Per-User Encryption
Dedicated MinIO bucket per user. Unique KES encryption key per user. SSE-KMS at rest. Physically and cryptographically isolated from every other account.
Event-Driven Pipeline
NATS JetStream for async processing, distributed worker deduplication via KV store, and real-time status propagation to the Blazor UI without polling.
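The deduplication idea in miniature: whichever worker first creates a per-message key in the shared KV store owns that message; everyone else skips it. Sketched in Python with an in-memory stand-in for the KV store (NATS JetStream KV's atomic create-if-absent gives the same guarantee in the real system):

```python
# In-memory stand-in for a shared KV store with atomic create semantics.
class FakeKV:
    def __init__(self):
        self._data = {}

    def create(self, key, value):
        # Atomic create: fails if the key already exists.
        if key in self._data:
            return False
        self._data[key] = value
        return True

def handle(kv, msg_id, work, done):
    # Claim the message before doing any work.
    if not kv.create(f"dedup.{msg_id}", "claimed"):
        return False            # another worker already claimed it
    done.append(work)
    return True

kv, done = FakeKV(), []
first  = handle(kv, "msg-1", "index document", done)   # wins the claim
second = handle(kv, "msg-1", "index document", done)   # duplicate, skipped
```

Because the claim and the existence check are a single atomic operation on the store, two workers racing on the same message can never both succeed.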
100% Self-Hosted on a Homelab K3s Cluster
Every component runs locally — Ollama for AI inference, Flux for GitOps, Linkerd for mTLS service mesh, KES for key management. Your data never leaves your hardware.
Architecture
.NET 10 with Clean Architecture and DDD aggregates. NATS JetStream for async event processing. Flux GitOps on K3s. Full documentation in the Architecture section.
100+ Engineering Articles
The decisions behind each component: why NATS, how multi-model RAG works in practice, what broke in production, and how it was fixed.