A self-hosted team of AI agents for your documents
Upload any document. Ask any question.
Get answers cited directly from your own archive.
A homelab project — not one model, but a collection of agents. A triage agent reads and routes every upload; specialists for retrieval, finance, health, and tax do the work; an entity & knowledge-graph agent connects it all. The deep stack — OCR, ensemble entity extraction, multi-model vector search, a graph database — is the toolbox they reach for. All on self-hosted Kubernetes.
Meet the agents
The Archives isn't one model answering questions — it's a team. Specialists handle what you came for; a core of shared-intelligence agents reads, connects, and enriches everything behind them.
Specialist agents — what you came for
Retrieval agent
Answers questions across your whole archive in plain English, citing the exact source paragraphs it used.
Finance agent
Pulls transactions, merchants, amounts, and line items out of receipts and statements into a single spend view.
Health agent
Extracts biomarkers, providers, prescriptions, and appointments, then plots the values on a timeline.
Tax agent
Gathers tax-relevant documents, figures, and deductions across the year — so filing season is a query, not a shoebox.
Core intelligence — the engine behind them
Triage agent
Reads every upload, classifies its domain, and routes it to the right specialist — the dispatcher for the whole team.
Entity & graph agent
Runs a multi-provider NER ensemble, resolves duplicates to canonical entities, and links them into a graph that grows with every document.
Insight agent
Builds rich entity profiles, generates LLM insights, detects when they go stale, and refreshes them on demand.
A document arrives → the triage agent routes it → specialists extract what matters → the entity & graph agent links it into the knowledge graph → the insight agent enriches it → the retrieval agent answers across all of it. Every hand-off is an event on NATS JetStream.
How the agents work together
Three stages: the triage agent parses and routes, specialists enrich and index, and the retrieval agent answers. Each stage is a separate worker communicating over NATS.
Triage agent — parse & route
Docling converts uploaded documents to structured text, preserving tables, headings, and layout. The triage agent classifies the domain and hands the document to the right specialist. The preserved structure also drives accurate chunking during indexing.
Specialists & entity agent — extract & index
Domain specialists pull structured data (transactions, biomarkers, tax figures); the entity agent extracts and links named entities into a knowledge graph. Each chunk is embedded across 8 vector models and stored in Qdrant, alongside a generated summary and keyword set.
Retrieval agent — search & ask
The retrieval agent fuses the rankings from all vector models with Reciprocal Rank Fusion and traverses the graph to find the most relevant passages. A locally hosted LLM answers in plain English, with citations to the exact source paragraphs.
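As a rough illustration of the fusion step, here is a minimal Reciprocal Rank Fusion sketch. The chunk ids, the three sample rankings (standing in for the eight models), and the constant k=60 are assumptions for the example, not values taken from The Archives.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-model results: one ranked list of chunk ids per model.
rankings = [
    ["c1", "c2", "c3"],
    ["c2", "c1", "c4"],
    ["c2", "c3", "c1"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Because only ranks are used, models whose raw similarity scores live on different scales can be combined without normalisation.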
Two agents, one cross-document answer
The entity agent links entities from each document across your archive. When a question spans multiple files, the graph connects them — and the retrieval agent reasons across the combined result.
This is not keyword search. The entity agent connected three documents. The retrieval agent reasoned across them.
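The idea of documents being connected through shared entities can be sketched with a small in-memory index. The file names and entity values below are hypothetical, and a real deployment would query a graph database rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical extraction output: document id -> canonical entity names.
doc_entities = {
    "invoice.pdf": {"Acme Dental", "2024-03-02"},
    "bank_statement.pdf": {"Acme Dental", "Checking Account"},
    "insurance_claim.pdf": {"Acme Dental", "Policy 1234"},
    "lease.pdf": {"Landlord LLC"},
}


def build_entity_index(doc_entities):
    """Invert the mapping: entity -> set of documents that mention it."""
    index = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc)
    return index


def related_documents(doc, doc_entities, index):
    """Return other documents sharing an entity with `doc`, plus the shared entities."""
    related = defaultdict(set)
    for entity in doc_entities[doc]:
        for other in index[entity]:
            if other != doc:
                related[other].add(entity)
    return dict(related)


index = build_entity_index(doc_entities)
links = related_documents("invoice.pdf", doc_entities, index)
```

In this toy data the invoice, the bank statement, and the insurance claim are linked only because they share one entity, which a keyword match on the question text alone could miss.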
What powers the agents
The toolbox the agents reach for — vector search, a knowledge graph, AI inference, messaging, and per-user encryption — all running on a homelab K3s cluster.
Entity & Knowledge-Graph Agent
The entity agent analyses each document to extract named entities — people, organisations, dates, financial figures, medical terms — using a multi-provider NER ensemble. Entities are deduplicated, resolved to canonical records, and linked into a queryable graph that grows with every upload. Connections surface across the full archive, not just within a single document.
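Resolving duplicates to a canonical record can be illustrated with simple name normalisation. This is a simplified sketch: the suffix list, the sample mentions, and the normalisation rule are assumptions, and the actual resolution logic in The Archives is not described here and is likely more sophisticated.

```python
import re
from collections import defaultdict

# Assumed list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "llc", "corp", "co"}


def canonical_key(name):
    """Lowercase, strip punctuation, and drop trailing company suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(mentions):
    """Group raw mentions under one canonical key each."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical_key(mention)].append(mention)
    return dict(groups)


# Hypothetical mentions as different NER providers might emit them.
mentions = ["Acme Corp.", "ACME Corp", "acme", "Globex, Inc.", "Globex"]
resolved = resolve(mentions)
```

Each canonical key would then become a single node in the graph, with every source mention pointing at it.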
Retrieval Agent
Every document chunk is embedded with 8 vector models, and the resulting vectors are stored in Qdrant. At query time the retrieval agent fuses the rankings from each model with Reciprocal Rank Fusion and traverses the knowledge graph. A locally hosted LLM answers the question, citing the exact source paragraphs it used.
Archive Management
Per-document lifecycle: Pending → Stored → Processing → Enriched → Indexed. Per-user isolated storage. Real-time Blazor UI updates via NATS JetStream.
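The lifecycle above can be modelled as a small state machine. The sketch below assumes a strictly linear progression with no failure or retry states, which is a simplification made for the example.

```python
from enum import Enum


class DocumentStatus(Enum):
    PENDING = "Pending"
    STORED = "Stored"
    PROCESSING = "Processing"
    ENRICHED = "Enriched"
    INDEXED = "Indexed"


# Enum members iterate in definition order, which matches the lifecycle order.
ORDER = list(DocumentStatus)


def advance(status):
    """Return the next lifecycle status, or raise if the document is already indexed."""
    position = ORDER.index(status)
    if position == len(ORDER) - 1:
        raise ValueError(f"{status.value} is the final status")
    return ORDER[position + 1]


# Walk one document through the whole lifecycle.
status = DocumentStatus.PENDING
history = [status.value]
while status is not DocumentStatus.INDEXED:
    status = advance(status)
    history.append(status.value)
```

Keeping the transition rule in one function makes it easy to reject out-of-order status events before they reach the UI.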
Per-User Encryption
Per-user Cloudflare R2 storage with app-layer envelope encryption — a unique key per user (HKDF-SHA256 + AES-256-GCM), so every account is cryptographically isolated.
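The per-user key derivation can be sketched with HKDF-SHA256 built from the standard library, following RFC 5869. The master key, the `info` label, and the user ids are hypothetical, and the AES-256-GCM wrapping step is left out because it needs a vetted cryptography library rather than hand-written code.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * HASH_LEN
    # Extract: concentrate the input key material into a fixed-size key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key, user_id):
    """Derive a 256-bit key bound to one user id (the info label is an assumption)."""
    info = b"archive-user-key:" + user_id.encode("utf-8")
    return hkdf_sha256(master_key, salt=b"", info=info, length=32)


# Demo-only master key; a real one would come from a secret store.
master_key = bytes(range(32))
alice_key = derive_user_key(master_key, "alice")
bob_key = derive_user_key(master_key, "bob")
```

Binding the user id into `info` means two accounts never share a key even though both are derived from the same master secret.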
Event-Driven Pipeline
NATS JetStream for async processing, distributed worker deduplication via KV store, and real-time status propagation to the Blazor UI without polling.
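Worker deduplication through a key-value store can be illustrated with a create-if-absent claim. The `InMemoryKV` class is a stand-in written for this example, not the NATS client API, and the key naming scheme is an assumption.

```python
class InMemoryKV:
    """Stand-in for a key-value bucket offering an atomic create-if-absent."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        """Store the value only if the key is new; report whether it was stored."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


def handle_once(kv, message_id, worker, handler):
    """Run the handler only for the first worker that claims the message id."""
    if not kv.create(f"processed.{message_id}", worker):
        return False  # another worker already claimed this message
    handler(message_id)
    return True


kv = InMemoryKV()
processed = []

# Two hypothetical workers receive the same redelivered event.
first = handle_once(kv, "doc-42", "worker-a", processed.append)
second = handle_once(kv, "doc-42", "worker-b", processed.append)
```

In a distributed setup the claim only works if the store performs the create atomically, so that two workers racing on the same key cannot both succeed.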
100% Self-Hosted on a Homelab K3s Cluster
Compute runs on a self-hosted K3s cluster: Ollama for AI inference, Flux for GitOps, and Linkerd for an mTLS service mesh, with app-layer envelope encryption for every user's data at rest.
Architecture
.NET 10 with Clean Architecture and DDD aggregates. NATS JetStream for async event processing. Flux GitOps on K3s. Full documentation in the Architecture section.
100+ Engineering Articles
The decisions behind each component: why NATS, how multi-model RAG works in practice, what broke in production, and how it was fixed.