Homelab project · The Archives · Multi-Agent Document AI

A self-hosted team of AI agents for your documents

Upload any document. Ask any question.
Get answers cited directly from your own archive.

A homelab project — not one model, but a collection of agents. A triage agent reads and routes every upload; specialists for retrieval, finance, health, and tax do the work; an entity & knowledge-graph agent connects it all. The deep stack — OCR, ensemble entity extraction, multi-model vector search, a graph database — is the toolbox they reach for. All on self-hosted Kubernetes.

Specialist Agents · Knowledge Graph · Per-User Encryption · 100% Self-Hosted

Meet the agents

The Archives isn't one model answering questions — it's a team. Specialists handle what you came for; a core of shared-intelligence agents reads, connects, and enriches everything behind them.

Specialist agents — what you came for

Retrieval agent

Answers questions across your whole archive in plain English, citing the exact source paragraphs it used.

8-model embeddings · Reciprocal Rank Fusion · Graph traversal · Local LLM

Finance agent

Pulls transactions, merchants, amounts, and line items out of receipts and statements into a single spend view.

Transaction schema · Merchant resolution · Spend dashboard

Health agent

Extracts biomarkers, providers, prescriptions, and appointments, then plots the values on a timeline.

Biomarker timeline · Provider linking · Structured medical schema

Tax agent

Gathers tax-relevant documents, figures, and deductions across the year — so filing season is a query, not a shoebox.

Document gathering · Deduction hints · Year grouping

Core intelligence — the engine behind them

Triage agent

Reads every upload, classifies its domain, and routes it to the right specialist — the dispatcher for the whole team.

Docling OCR · Content analysis · Domain classification

Entity & graph agent

Runs a multi-provider NER ensemble, resolves duplicates to canonical entities, and links them into a graph that grows with every document.

NER ensemble · Canonical resolution · FalkorDB graph · Cross-doc reasoning

Insight agent

Builds rich entity profiles, generates LLM insights, detects when they go stale, and refreshes them on demand.

Entity profiles · LLM insights · Staleness detection · Web enrichment

A document arrives → the triage agent routes it → specialists extract what matters → the entity & graph agent links it into the knowledge graph → the insight agent enriches it → the retrieval agent answers across all of it. Every hand-off is an event on NATS JetStream.

How the agents work together

Three stages: the triage agent parses and routes, specialists enrich and index, and the retrieval agent answers. Each stage is a separate worker communicating over NATS.

1. Triage agent: parse & route

Docling converts uploaded documents to structured text, preserving tables, headings, and layout. The triage agent classifies the domain and hands the document to the right specialist — the structure also drives accurate chunking during indexing.

Docling OCR · Structure Extraction
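
As a rough illustration of the classify-and-route step, here is a minimal sketch. The keyword lists, the `general` fallback, and the `documents.routed.<domain>` subject naming are all assumptions made for the example; the page does not describe how the triage agent actually classifies documents.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists, for illustration only.
DOMAIN_KEYWORDS = {
    "finance": {"invoice", "receipt", "statement", "total", "merchant"},
    "health": {"prescription", "biomarker", "diagnosis", "lab", "patient"},
    "tax": {"deduction", "taxable", "withholding", "filing"},
}


@dataclass
class Routing:
    domain: str
    subject: str  # illustrative NATS-style subject name (assumed, not from the project)


def classify(text: str) -> str:
    """Pick the domain whose keyword list overlaps the text the most."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {domain: len(words & kws) for domain, kws in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to a catch-all domain when nothing matches.
    return best if scores[best] > 0 else "general"


def route(text: str) -> Routing:
    """Classify the text and name the subject a specialist would listen on."""
    domain = classify(text)
    return Routing(domain, f"documents.routed.{domain}")
```

A real classifier would likely use the document structure and a model rather than keyword overlap; the point here is only the shape of the hand-off: text in, one domain and one destination out.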
2. Specialists & entity agent: extract & index

Domain specialists pull structured data (transactions, biomarkers, tax figures); the entity agent extracts and links named entities into a knowledge graph. Each chunk is embedded across 8 vector models and stored in Qdrant, alongside a generated summary and keyword set.

Entity Extraction · Knowledge Graph · Embeddings
3. Retrieval agent: search & ask

The retrieval agent fuses scores from all vector models with Reciprocal Rank Fusion and traverses the graph to find the most relevant passages. A locally hosted LLM answers in plain English, with citations to the exact source paragraphs.

Qdrant · RRF Ensemble · Ollama LLM
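
Reciprocal Rank Fusion is simple enough to sketch in a few lines. The version below assumes each vector model returns an ordered list of chunk IDs and uses k = 60, a commonly used default; the constant the project actually uses is not stated here, and the chunk IDs are made up. Three rankings stand in for the eight models to keep the example short.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Ties are broken by chunk ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical per-model rankings (best match first).
rankings = [
    ["c2", "c1", "c3"],
    ["c1", "c2", "c4"],
    ["c1", "c3", "c2"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Because RRF only looks at ranks, it does not need the models' raw similarity scores to be comparable, which is what makes it a convenient way to combine several embedding models.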

Two agents, one cross-document answer

The entity agent links entities from each document across your archive. When a question spans multiple files, the graph connects them — and the retrieval agent reasons across the combined result.

3 documents
📄 Medical letter (Jan 2024)
📄 Lab results (Mar 2024)
📄 Prescription (Apr 2024)

Entity agent: extraction & graph linking
Graph links
Dr. Nguyen → mentioned in all 3 documents
Diagnosis → linked to Treatment → linked to Prescription
Lab value: HbA1c 7.2 → referenced in follow-up letter + prescription context

Retrieval agent: graph + vector search
You ask: “What has Dr. Nguyen prescribed since my diagnosis, and why?”
BlueRobin: Based on your medical letter (Jan 2024) and prescription (Apr 2024), Dr. Nguyen prescribed Lisinopril 10mg following a diagnosis of stage-1 hypertension. The decision was informed by the HbA1c reading of 7.2 from your March lab results, which indicated elevated cardiovascular risk.
Cited: medical-letter-jan-2024.pdf · lab-results-mar-2024.pdf

This is not keyword search. The entity agent connected three documents. The retrieval agent reasoned across them.
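
To make the linking idea concrete, here is a toy sketch that uses plain dictionaries in place of the graph database. The mention sets are loosely based on the example above, the third file name is hypothetical, and the helper names are invented for illustration; the real graph lives in FalkorDB and is certainly richer than this.

```python
from collections import defaultdict

# Illustrative mentions per document. "prescription-apr-2024.pdf" is a
# hypothetical file name used only for this sketch.
mentions = {
    "medical-letter-jan-2024.pdf": {"Dr. Nguyen", "Diagnosis"},
    "lab-results-mar-2024.pdf": {"Dr. Nguyen", "HbA1c"},
    "prescription-apr-2024.pdf": {"Dr. Nguyen", "Prescription"},
}


def build_entity_index(mentions):
    """Invert document -> entities into entity -> documents."""
    index = defaultdict(set)
    for doc, entities in mentions.items():
        for entity in entities:
            index[entity].add(doc)
    return index


def documents_linked_by(index, entity):
    """All documents that mention the given entity."""
    return sorted(index.get(entity, set()))


def cross_document_entities(index, min_docs=2):
    """Entities that connect at least `min_docs` documents."""
    return sorted(e for e, docs in index.items() if len(docs) >= min_docs)


index = build_entity_index(mentions)
print(cross_document_entities(index))
print(documents_linked_by(index, "Dr. Nguyen"))
```

An entity shared by several documents is what lets a question about one file pull in passages from the others before the LLM answers.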

What powers the agents

The toolbox the agents reach for — vector search, a knowledge graph, AI inference, messaging, and per-user encryption — all running on a homelab K3s cluster.

Entity & Knowledge-Graph Agent

The entity agent analyses each document to extract named entities — people, organisations, dates, financial figures, medical terms — using a multi-provider NER ensemble. Entities are deduplicated, resolved to canonical records, and linked into a queryable graph that grows with every upload. Connections surface across the full archive, not just within a single document.

Entity Extraction · Canonical Resolution · Graph Traversal · Inferred Relationships
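
One plausible way to combine several NER providers and collapse duplicates is majority voting over a normalised key. The sketch below assumes that approach; the normalisation rules, the vote threshold, the labels, and the sample provider outputs are illustrative and not taken from the project.

```python
import re
from collections import Counter

TITLES = {"dr", "mr", "mrs", "ms"}  # small illustrative list


def canonical_key(name: str) -> str:
    """Normalise a surface form: lowercase, drop punctuation and titles."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in TITLES)


def ensemble_entities(provider_outputs, min_votes=2):
    """Keep (canonical name, label) pairs found by at least `min_votes` providers."""
    votes = Counter()
    for entities in provider_outputs:
        # Use a set so each provider votes at most once per entity.
        for key in {(canonical_key(text), label) for text, label in entities}:
            votes[key] += 1
    return sorted(key for key, n in votes.items() if n >= min_votes)


# Hypothetical outputs from three NER providers for the same document.
provider_outputs = [
    [("Dr. Nguyen", "PERSON"), ("Lisinopril", "DRUG")],
    [("dr nguyen", "PERSON"), ("March", "DATE")],
    [("Nguyen", "PERSON"), ("Lisinopril", "DRUG")],
]
resolved = ensemble_entities(provider_outputs)
print(resolved)
```

Here "Dr. Nguyen", "dr nguyen", and "Nguyen" collapse to one canonical record, while "March" is dropped because only one provider reported it. Real canonical resolution would need more than string normalisation, for example to tell two different people named Nguyen apart.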

Retrieval Agent

The retrieval agent embeds every document chunk with 8 vector models and stores the resulting vectors in Qdrant. At query time it fuses the rankings from each model with Reciprocal Rank Fusion and traverses the knowledge graph. A locally hosted LLM answers the question, citing the exact source paragraphs it used.

Ensemble Retrieval · Reciprocal Rank Fusion · Cited Answers · Self-Hosted LLM

Archive Management

Per-document lifecycle: Pending → Stored → Processing → Enriched → Indexed. Per-user isolated storage. Real-time Blazor UI updates via NATS JetStream.
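
The lifecycle reads naturally as a small state machine. The sketch below assumes the five states are strictly linear with no failure or retry states, which the page does not confirm; the `DocumentStatus` and `advance` names are made up for the example.

```python
from enum import Enum


class DocumentStatus(Enum):
    PENDING = "Pending"
    STORED = "Stored"
    PROCESSING = "Processing"
    ENRICHED = "Enriched"
    INDEXED = "Indexed"


# Enum members iterate in definition order, which matches the lifecycle order.
_ORDER = list(DocumentStatus)


def advance(status: DocumentStatus) -> DocumentStatus:
    """Return the next lifecycle state, or raise if the document is already indexed."""
    position = _ORDER.index(status)
    if position == len(_ORDER) - 1:
        raise ValueError(f"{status.value} is the final state")
    return _ORDER[position + 1]


status = DocumentStatus.PENDING
while status is not DocumentStatus.INDEXED:
    status = advance(status)
    print(status.value)
```

Keeping the allowed transitions in one place makes it easy to reject out-of-order status events before they reach the UI.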

Per-User Encryption

Per-user Cloudflare R2 storage with app-layer envelope encryption — a unique key per user (HKDF-SHA256 + AES-256-GCM), so every account is cryptographically isolated.
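
The key-derivation half of this can be sketched with the standard library alone. The code below implements HKDF with SHA-256 as specified in RFC 5869 and derives a 32-byte key per user, the size AES-256-GCM expects. The master key, salt, and `info` layout are assumptions for the example; how the project actually combines HKDF with envelope encryption is not described here, and the AES-256-GCM step is omitted because it needs a third-party library.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: an empty salt is replaced by HASH_LEN zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user 256-bit key. Salt and info values are illustrative."""
    return hkdf_sha256(
        master_key,
        salt=b"archives-demo-salt",       # hypothetical salt
        info=b"user:" + user_id.encode(),  # binds the key to one account
    )


demo_master = bytes(range(32))  # demo value only, never a real secret
alice_key = derive_user_key(demo_master, "alice")
bob_key = derive_user_key(demo_master, "bob")
```

Binding the user ID into `info` means two accounts never share a derived key, so a ciphertext stored under one account cannot be decrypted with the key derived for another.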

Event-Driven Pipeline

NATS JetStream for async processing, distributed worker deduplication via KV store, and real-time status propagation to the Blazor UI without polling.
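
Worker deduplication through a key-value store usually comes down to a create-if-absent claim: the first worker to create the key handles the event, and everyone else skips it. The sketch below assumes that pattern and uses an in-memory stand-in rather than a real NATS connection; the class, function, and key names are hypothetical.

```python
class InMemoryKV:
    """Stand-in for a KV bucket with create-if-absent semantics (single process only)."""

    def __init__(self):
        self._data = {}

    def create(self, key: str, value: str) -> bool:
        """Store the value only if the key is new; report whether the claim succeeded."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


def handle_once(kv, worker_id: str, event_id: str, handler) -> bool:
    """Run `handler` only if this worker is the first to claim the event."""
    if not kv.create(f"processed.{event_id}", worker_id):
        return False  # another worker already claimed it
    handler(event_id)
    return True


kv = InMemoryKV()
handled = []
first = handle_once(kv, "worker-a", "doc-123", handled.append)
second = handle_once(kv, "worker-b", "doc-123", handled.append)
print(first, second, handled)
```

With a real shared store, the claim must be atomic on the server side. This stand-in is not thread-safe and only demonstrates the control flow.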

100% Self-Hosted on a Homelab K3s Cluster

Compute runs on a self-hosted K3s cluster — Ollama for AI inference, Flux for GitOps, Linkerd for mTLS service mesh, with app-layer envelope encryption for every user's data at rest.

Architecture

.NET 10 with Clean Architecture and DDD aggregates. NATS JetStream for async event processing. Flux GitOps on K3s. Full documentation in the Architecture section.

.NET 10 · Blazor · NATS · PostgreSQL · Qdrant · Cloudflare R2 · FalkorDB · Ollama · Kubernetes
Frontend: Blazor Server (Interactive UI · Real-time updates)
Application: FastEndpoints API (REST · OpenAPI); Background Workers (OCR · Embeddings · NER)
Messaging: NATS JetStream (Event-Driven Messaging · KV Store)
Data Layer: PostgreSQL (Metadata); Qdrant (Vectors); Cloudflare R2 (Documents); FalkorDB (Graph)
AI Services: Ollama (LLM · Embeddings); Docling OCR (Text Extraction); Entity Extraction (NER Pipeline)
Orchestration: Kubernetes (K3s) (Container Orchestration · Flux GitOps · Linkerd mTLS)

100+ Engineering Articles

The decisions behind each component: why NATS, how multi-model RAG works in practice, what broke in production, and how it was fixed.