Personal Project · Document AI · Self-Hosted

Self-hosted document AI,
built end-to-end

Upload any document. Ask any question.
Get answers cited directly from your own archive.

A homelab project — a complete document intelligence stack covering OCR, entity extraction, knowledge graphs, multi-model vector search, and LLM-powered RAG. All running on self-hosted Kubernetes.

Multi-model Vector Search Knowledge Graph Per-User Encryption 100% Self-Hosted

How It Works

Three steps: parse with OCR, enrich and index with AI, then search and answer with RAG. The processing steps run as separate background workers communicating over NATS.

1. Upload & Parse

Docling converts uploaded documents to structured text, preserving tables, headings, and layout. The structure is used for accurate chunking during indexing.

Docling OCR Structure Extraction
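Structure-aware chunking means splitting at the boundaries Docling recovers (headings, paragraphs) instead of blind character windows. A minimal sketch of the idea, with Docling's output simplified to (heading, body) pairs; function names and the size limit are illustrative, not the project's actual code:

```python
def chunk_by_structure(blocks: list[tuple[str, str]], max_chars: int = 800) -> list[str]:
    """Split parsed blocks into chunks without ever crossing a heading boundary.

    `blocks` is a simplified stand-in for Docling's structured output:
    (section heading, body text) pairs. Each chunk keeps its heading as
    context, and long sections are split on paragraph breaks.
    """
    chunks = []
    for heading, body in blocks:
        current = heading
        for para in body.split("\n\n"):
            # Start a new chunk when adding this paragraph would overflow.
            if len(current) + len(para) + 2 > max_chars and current != heading:
                chunks.append(current)
                current = heading
            current += "\n\n" + para
        chunks.append(current)  # a chunk never spans two sections
    return chunks
```

Because every chunk carries its section heading, a retrieved passage arrives with enough context for the LLM to cite it sensibly.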
2. Extract & Index

Named entities are extracted and linked into a knowledge graph. Each chunk is embedded across 8 vector models and stored in Qdrant, alongside a generated summary and keyword set.

Entity Extraction Knowledge Graph Embeddings
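Storing eight embeddings per chunk maps naturally onto Qdrant's named vectors: one point can carry several vectors keyed by model name. A hedged sketch of that fan-out with dummy embedding functions; the model names and helper are illustrative stand-ins, not the project's real pipeline:

```python
# Illustrative stand-ins for real embedding models; each maps text to a vector.
EMBEDDERS = {
    "minilm": lambda text: [float(len(text)), 0.0],
    "bge":    lambda text: [0.0, float(len(text))],
}

def build_point(chunk_id: int, text: str) -> dict:
    """Build a Qdrant-style point with one named vector per model.

    The shape mirrors Qdrant's named-vector points: an id, a mapping of
    vector-name -> floats, and a payload. An actual upsert would go
    through qdrant-client against a running server.
    """
    return {
        "id": chunk_id,
        "vector": {name: embed(text) for name, embed in EMBEDDERS.items()},
        "payload": {"text": text},
    }
```

Keeping all models' vectors on one point means a single payload (text, summary, keywords) serves every retrieval model.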
3. Search & Ask

Reciprocal Rank Fusion combines scores from all vector models to find the most relevant passages. A locally hosted LLM answers in plain English, with citations to the exact source paragraphs.

Qdrant RRF Ensemble Ollama RAG
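Reciprocal Rank Fusion itself is a small formula: each chunk scores the sum of 1/(k + rank) over the model rankings it appears in, with k conventionally set to 60. A minimal sketch (the constant and function name are illustrative, not the project's actual code):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs via Reciprocal Rank Fusion.

    Each model contributes 1 / (k + rank) for every chunk it returns,
    so chunks ranked well by several models rise to the top even if no
    single model put them first.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three models disagree on order; the chunk ranked well everywhere wins.
fused = rrf_fuse([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
# → ["b", "a", "c", "d"]
```

Because RRF works on ranks rather than raw similarity scores, it needs no calibration across the eight models' incomparable score scales.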

Cross-document reasoning with a knowledge graph

Entities from each document are linked across your archive. When a question spans multiple files, the graph connects them — the LLM then reasons across the combined result.

3 Documents
📄 Medical letter — Jan 2024
📄 Lab results — Mar 2024
📄 Prescription — Apr 2024

Entity extraction & graph linking
- Dr. Nguyen → mentioned in all 3 documents
- Diagnosis → linked to Treatment → linked to Prescription
- Lab value: HbA1c 7.2 → referenced in follow-up letter + prescription context

RAG query across graph + vectors
You ask: “What has Dr. Nguyen prescribed since my diagnosis, and why?”
BlueRobin: Based on your medical letter (Jan 2024) and prescription (Apr 2024), Dr. Nguyen prescribed Lisinopril 10mg following a diagnosis of stage-1 hypertension. The decision was informed by the HbA1c reading of 7.2 from your March lab results, which indicated elevated cardiovascular risk.
Cited: medical-letter-jan-2024.pdf · lab-results-mar-2024.pdf

This is not keyword search. The graph connected three documents. The LLM reasoned across them.

What's under the hood

The full stack — API, workers, frontend, messaging, vector search, graph, and AI inference — running on a homelab K3s cluster.

Knowledge Graph & Entity Intelligence

Each document is analysed to extract named entities — people, organisations, dates, financial figures, medical terms. Entities are deduplicated, resolved to canonical records, and linked into a queryable graph that grows with every upload. Connections surface across the full archive, not just within a single document.

Entity Extraction Canonical Resolution Graph Traversal Inferred Relationships
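Canonical resolution boils down to collapsing surface variants of the same entity onto one record. A deliberately simple sketch of that grouping step: normalize each mention, then bucket mentions under their canonical key. The normalization rules here (honorific stripping, casefolding) are illustrative assumptions, not the project's actual resolver:

```python
import re

def canonicalize(mention: str) -> str:
    """Reduce a mention to a canonical key: lowercase, honorifics stripped,
    whitespace collapsed. "Dr. Nguyen", "dr nguyen" and "DR. NGUYEN" all
    resolve to the same record."""
    m = mention.casefold()
    m = re.sub(r"\b(dr|prof|mr|mrs|ms)\.?\s+", "", m)
    return re.sub(r"\s+", " ", m).strip()

def resolve(mentions: list[str]) -> dict[str, list[str]]:
    """Group raw mentions from many documents under their canonical key."""
    records: dict[str, list[str]] = {}
    for mention in mentions:
        records.setdefault(canonicalize(mention), []).append(mention)
    return records
```

Once mentions share a canonical key, graph edges from different documents attach to the same node, which is what lets connections surface across the whole archive.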

Semantic Search & RAG

Documents are embedded across 8 vector models and stored in Qdrant. At query time, Reciprocal Rank Fusion combines rankings from each model. A locally hosted LLM answers the question, citing the exact source paragraphs it used.

Ensemble Retrieval Reciprocal Rank Fusion Cited Answers Self-Hosted LLM

Archive Management

Per-document lifecycle: Pending → Stored → Processing → Enriched → Indexed. Per-user isolated storage. Real-time Blazor UI updates via NATS JetStream.
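The lifecycle is a strict linear state machine, so a transition guard is enough to keep workers honest. An illustrative sketch only; the project's actual aggregate logic lives in .NET:

```python
# Allowed transitions in the document lifecycle. A guard like this stops
# a worker from, say, marking a document Indexed before enrichment ran.
TRANSITIONS = {
    "Pending":    {"Stored"},
    "Stored":     {"Processing"},
    "Processing": {"Enriched"},
    "Enriched":   {"Indexed"},
    "Indexed":    set(),  # terminal state
}

def advance(state: str, target: str) -> str:
    """Return the new state, or raise if the transition is illegal."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```

Each successful transition is what gets pushed to the Blazor UI as a real-time status update.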

Per-User Encryption

Dedicated MinIO bucket per user. Unique KES encryption key per user. SSE-KMS at rest. Physically and cryptographically isolated from every other account.

Event-Driven Pipeline

NATS JetStream for async processing, distributed worker deduplication via KV store, and real-time status propagation to the Blazor UI without polling.
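The deduplication trick relies on one property of a KV store: a create succeeds only if the key does not exist yet, so exactly one worker wins the claim for a message. A sketch of that idempotent-consumer pattern with an in-memory stand-in for the NATS JetStream KV bucket; class and key names are illustrative:

```python
class KVStore:
    """In-memory stand-in for a NATS JetStream KV bucket. The dedup scheme
    relies on create-if-absent semantics: creating an existing key fails."""
    def __init__(self):
        self._data: dict[str, str] = {}

    def create(self, key: str, value: str) -> bool:
        if key in self._data:
            return False          # another worker already claimed this key
        self._data[key] = value
        return True

def handle_once(kv: KVStore, msg_id: str, worker: str, work) -> bool:
    """Run `work` only if this worker wins the claim for the message."""
    if not kv.create(f"dedup.{msg_id}", worker):
        return False              # duplicate delivery; skip silently
    work()
    return True
```

With at-least-once delivery from JetStream, this claim step is what turns duplicate deliveries into no-ops instead of double processing.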

100% Self-Hosted on a Homelab K3s Cluster

Every component runs locally — Ollama for AI inference, Flux for GitOps, Linkerd for mTLS service mesh, KES for key management. Your data never leaves your hardware.

Architecture

.NET 10 with Clean Architecture and DDD aggregates. NATS JetStream for async event processing. Flux GitOps on K3s. Full documentation in the Architecture section.

.NET 10 Blazor NATS PostgreSQL Qdrant MinIO FalkorDB Ollama Kubernetes
Frontend: Blazor Server · Interactive UI · Real-time updates
Application: FastEndpoints API (REST · OpenAPI) · Background Workers (OCR · Embeddings · NER)
Messaging: NATS JetStream · Event-Driven Messaging · KV Store
Data Layer: PostgreSQL (Metadata) · Qdrant (Vectors) · MinIO (Documents) · FalkorDB (Graph)
AI Services: Ollama (LLM · Embeddings) · Docling OCR (Text Extraction) · NER Pipeline (Entity Extraction)
Orchestration: Kubernetes (K3s) · Container Orchestration · Flux GitOps · Linkerd mTLS

100+ Engineering Articles

The decisions behind each component: why NATS, how multi-model RAG works in practice, what broke in production, and how it was fixed.