Self-hosted document AI,
built end-to-end
Upload any document. Ask any question.
Get answers cited directly from your own archive.
A homelab project — a complete document intelligence stack covering OCR, entity extraction, knowledge graphs, multi-model vector search, and LLM-powered RAG. All running on self-hosted Kubernetes.
How It Works
Three steps: parse with OCR, enrich with AI, then index across multiple vector models. Each step is a separate background worker communicating over NATS.
Upload & Parse
Docling converts uploaded documents to structured text, preserving tables, headings, and layout. The structure is used for accurate chunking during indexing.
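Why the preserved structure matters for chunking can be sketched in a few lines: split at headings so a chunk never straddles two sections, instead of cutting blindly every N characters. This is an illustrative Python sketch, not the project's .NET code, and the input format (lines prefixed `# ` for headings) is an assumption, not Docling's actual output:

```python
# Heading-aware chunking sketch: flush the current chunk whenever a new
# section starts, and also when a chunk grows past max_chars.
def chunk_by_structure(lines, max_chars=500):
    chunks, current, heading = [], [], ""
    for line in lines:
        if line.startswith("# "):          # new section: close the old chunk
            if current:
                chunks.append((heading, "\n".join(current)))
            heading, current = line[2:], []
        else:
            current.append(line)
            if sum(len(l) for l in current) > max_chars:
                chunks.append((heading, "\n".join(current)))
                current = []
    if current:                            # flush whatever remains
        chunks.append((heading, "\n".join(current)))
    return chunks

chunks = chunk_by_structure(["# Intro", "a", "b", "# Methods", "c"])
```

Each chunk keeps its section heading, which later gives the retriever and the LLM useful context about where a passage came from.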
Extract & Index
Named entities are extracted and linked into a knowledge graph. Each chunk is embedded across 8 vector models and stored in Qdrant, alongside a generated summary and keyword set.
Search & Ask
Reciprocal Rank Fusion combines scores from all vector models to find the most relevant passages. A locally hosted LLM answers in plain English, with citations to the exact source paragraphs.
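Reciprocal Rank Fusion itself is a small formula: each model contributes `1/(k + rank)` for every passage it returns, so passages ranked well by several models rise to the top. A minimal Python sketch (the `k=60` default comes from the original RRF paper; the example rankings are made up):

```python
# Reciprocal Rank Fusion over several ranked lists, one per vector model.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:                          # one list per model
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Three models disagree on the single best passage, but "p2" is near the
# top of every list, so it wins the fused ranking.
fused = rrf([["p1", "p2", "p3"],
             ["p2", "p3", "p1"],
             ["p3", "p2", "p1"]])
```

The appeal of RRF is that it works on ranks alone, so the eight models' raw scores never need to be calibrated against each other.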
Cross-document reasoning with a knowledge graph
Entities from each document are linked across your archive. When a question spans multiple files, the graph connects them — the LLM then reasons across the combined result.
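A toy version of that cross-document lookup, sketched in Python (the project itself is .NET). The graph shape here (document-to-entity edges) and the file names are illustrative assumptions:

```python
from collections import defaultdict

# Invert document -> entities into entity -> documents, the edge we need
# for "which files mention X?" lookups.
def build_graph(doc_entities):
    entity_docs = defaultdict(set)
    for doc, entities in doc_entities.items():
        for e in entities:
            entity_docs[e].add(doc)
    return entity_docs

# Union of every document mentioning any entity in the question.
def docs_for_question(entity_docs, entities):
    hits = set()
    for e in entities:
        hits |= entity_docs.get(e, set())
    return sorted(hits)

graph = build_graph({
    "invoice.pdf":  {"Acme Corp", "2024-03-01"},
    "contract.pdf": {"Acme Corp", "Jane Doe"},
    "email.pdf":    {"Jane Doe"},
})
# A question touching both Acme Corp and Jane Doe spans all three files:
related = docs_for_question(graph, {"Acme Corp", "Jane Doe"})
```

No single document mentions both entities, yet the graph pulls all three into the answer context.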
This is not keyword search: the graph links the relevant documents, and the LLM reasons across them as a single context.
What's under the hood
The full stack — API, workers, frontend, messaging, vector search, graph, and AI inference — running on a homelab K3s cluster.
Knowledge Graph & Entity Intelligence
Each document is analysed to extract named entities — people, organisations, dates, financial figures, medical terms. Entities are deduplicated, resolved to canonical records, and linked into a queryable graph that grows with every upload. Connections surface across the full archive, not just within a single document.
Semantic Search & RAG
Each document chunk is embedded across 8 vector models and stored in Qdrant. At query time, Reciprocal Rank Fusion combines rankings from each model. A locally hosted LLM answers the question, citing the exact source paragraphs it used.
Archive Management
Per-document lifecycle: Pending → Stored → Processing → Enriched → Indexed. Per-user isolated storage. Real-time Blazor UI updates via NATS JetStream.
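The lifecycle reads naturally as an explicit state machine, which lets a worker reject out-of-order events (say, an Indexed event arriving before Enriched). A minimal Python sketch mirroring the states above (the real pipeline is .NET):

```python
# Allowed transitions for the per-document lifecycle.
TRANSITIONS = {
    "Pending":    {"Stored"},
    "Stored":     {"Processing"},
    "Processing": {"Enriched"},
    "Enriched":   {"Indexed"},
    "Indexed":    set(),        # terminal state
}

def advance(state, event):
    # Refuse any event the current state does not permit.
    if event not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {event}")
    return event
```

Encoding the transitions as data keeps the pipeline honest: a duplicate or reordered message fails loudly instead of silently corrupting a document's status.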
Per-User Encryption
Dedicated MinIO bucket per user. Unique KES encryption key per user. SSE-KMS at rest. Physically and cryptographically isolated from every other account.
Event-Driven Pipeline
NATS JetStream for async processing, distributed worker deduplication via KV store, and real-time status propagation to the Blazor UI without polling.
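The deduplication idea in miniature: whichever worker first creates a per-message key in the shared KV store owns that message; everyone else skips it. Sketched in Python with an in-memory stand-in for the KV store (NATS JetStream KV's atomic create-if-absent gives the same guarantee in the real system):

```python
# In-memory stand-in for a shared KV store with atomic create semantics.
class FakeKV:
    def __init__(self):
        self._data = {}

    def create(self, key, value):
        # Atomic create: fails if the key already exists.
        if key in self._data:
            return False
        self._data[key] = value
        return True

def handle(kv, msg_id, work, done):
    # Claim the message before doing any work.
    if not kv.create(f"dedup.{msg_id}", "claimed"):
        return False            # another worker already claimed it
    done.append(work)
    return True

kv, done = FakeKV(), []
first  = handle(kv, "msg-1", "index document", done)   # wins the claim
second = handle(kv, "msg-1", "index document", done)   # duplicate, skipped
```

Because the claim and the existence check are a single atomic operation on the store, two workers racing on the same message can never both succeed.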
100% Self-Hosted on a Homelab K3s Cluster
Every component runs locally — Ollama for AI inference, Flux for GitOps, Linkerd for mTLS service mesh, KES for key management. Your data never leaves your hardware.
Architecture
.NET 10 with Clean Architecture and DDD aggregates. NATS JetStream for async event processing. Flux GitOps on K3s. Full documentation in the Architecture section.
100+ Engineering Articles
The decisions behind each component: why NATS, how multi-model RAG works in practice, what broke in production, and how it was fixed.