Research project · The Debug Agent · Autonomous RCA

An agent that finds the root cause

When production breaks, an LLM agent investigates — grounded in a graph,
disciplined by evidence, and honest about what it does not know.

A graph-grounded root-cause-analysis agent: it traverses a FalkorDB world-model of the platform, ranks suspects with correlation-first scoring, investigates with real observability tools, and only opens a pull request when hard evidence supports it. Research-grade RCA, running on a €50/month homelab budget.

Graph-grounded · Correlation-first · MAST-aligned reasoning · €50/month

How it investigates

An alert fires; the agent recalls similar incidents, ranks suspects from the graph, runs a bounded tool-using investigation, and decides on an action by confidence — never auto-remediating.

1

Detect

An Alertmanager webhook is de-duplicated, then deterministic passes gather traces, logs and code context before any model call.
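The de-duplication step can be sketched as below. This is a minimal illustration, not the agent's actual code: it assumes alerts are keyed on the `fingerprint` field that Alertmanager includes for each alert in its webhook payload, and the `AlertDeduplicator` class and its five-minute window are hypothetical.

```python
class AlertDeduplicator:
    """Drops repeat deliveries of the same firing alert (hypothetical helper)."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds  # assumed quiet period, not from the source
        self._last_seen = {}                  # fingerprint -> last time it was seen

    def filter_new(self, payload, now):
        """Return firing alerts whose fingerprint has been quiet for the whole window."""
        fresh = []
        for alert in payload.get("alerts", []):
            if alert.get("status") != "firing":
                continue  # resolved alerts do not start an investigation
            fingerprint = alert.get("fingerprint")
            if fingerprint is None:
                continue  # cannot de-duplicate without a key
            last = self._last_seen.get(fingerprint)
            self._last_seen[fingerprint] = now
            if last is None or now - last > self.window_seconds:
                fresh.append(alert)
        return fresh
```

Passing `now` explicitly keeps the pass deterministic and replayable. Because the timestamp is refreshed on every delivery, an alert that keeps re-firing inside the window never starts a second investigation.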

2

Recall & rank

Personalized PageRank + BARO change-point scoring rank candidate causes; hybrid memory recalls related past incidents — all without an LLM.

3

Investigate

A bounded ReAct loop queries real tools. A verification-gate FSM — not the model — decides when evidence is sufficient to stop.
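One way to picture a gate that sits outside the model is the small state machine below. It is a sketch under stated assumptions: the state names, the `min_evidence` threshold and the `max_steps` budget are all hypothetical, and the real gate may track richer evidence than a counter.

```python
from dataclasses import dataclass
from enum import Enum


class GateState(Enum):
    INVESTIGATING = "investigating"
    VERIFIED = "verified"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class VerificationGate:
    min_evidence: int = 2   # assumed number of tool confirmations required
    max_steps: int = 8      # assumed bound on the ReAct loop
    steps: int = 0
    evidence: int = 0
    state: GateState = GateState.INVESTIGATING

    def record_step(self, tool_confirmed: bool) -> GateState:
        """Advance the gate after one tool call; the model's opinion is not an input."""
        if self.state is not GateState.INVESTIGATING:
            return self.state  # terminal states are sticky
        self.steps += 1
        if tool_confirmed:
            self.evidence += 1
        if self.evidence >= self.min_evidence:
            self.state = GateState.VERIFIED
        elif self.steps >= self.max_steps:
            self.state = GateState.BUDGET_EXHAUSTED
        return self.state
```

The loop driver would stop as soon as `record_step` returns anything other than `INVESTIGATING`. The only input is whether a tool result confirmed the hypothesis, so a model that claims to be finished cannot end the loop on its own.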

4

Decide

A confidence ladder, soft-capped by graph grounding, chooses the action: notify, open an issue, attach patches, or open a PR for a human.
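The ladder and its cap might look roughly like this. The thresholds, the cap value, and the notion of grounding as "share of claims backed by graph evidence" are illustrative assumptions; the source only states that weakly grounded conclusions are held below the PR tier.

```python
# Hypothetical tiers, highest first: (minimum confidence, action).
LADDER = [
    (0.90, "open_pr_for_human"),
    (0.75, "attach_patches"),
    (0.50, "open_issue"),
    (0.00, "notify"),
]


def capped_confidence(model_confidence, grounded_claims, total_claims,
                      cap=0.89, min_grounding=0.8):
    """Clamp to [0, 1], then hold weakly grounded conclusions under the PR tier."""
    grounding = grounded_claims / total_claims if total_claims else 0.0
    confidence = min(max(model_confidence, 0.0), 1.0)
    if grounding < min_grounding:
        confidence = min(confidence, cap)  # cap sits just below the 0.90 tier
    return confidence


def choose_action(confidence):
    """Pick the highest tier whose threshold the confidence reaches."""
    for threshold, action in LADDER:
        if confidence >= threshold:
            return action
    return "notify"
```

With these numbers, a self-reported confidence of 0.97 with only one of five claims grounded yields `attach_patches`, while the same confidence with all five claims grounded yields `open_pr_for_human`. Every tier ends in an action for a human to review, consistent with the agent never remediating on its own.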

Five ideas, one agent

The Debug Agent combines a graph world-model, correlation-first reasoning, an evidence-disciplined LLM loop, a rigorous eval gate, and temporal memory. Each is written up as a paper below.

01

The StackGraph

A persistent neuro-symbolic world-model in FalkorDB that fuses code, infra, runtime telemetry, change history and past RCA outcomes into one property graph the agent reasons over.

02

Correlation-first blame propagation

Personalized PageRank over an error-weighted dependency graph, blended with parameter-free BOCPD change-point scoring (BARO) — deterministic graph priors that seed the LLM's hypothesis space. No causal discovery.
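To make the blame-propagation idea concrete, here is a plain power-iteration sketch of personalized PageRank. It assumes edges point from caller to dependency and carry an error-derived weight, and that the alerting service is the restart seed; the function name, damping factor and service names are illustrative and not taken from the agent.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    """edges: {src: {dst: weight}} with non-negative weights; seeds: {node: weight}."""
    total_seed = sum(seeds.values())
    if total_seed <= 0:
        raise ValueError("at least one seed needs positive weight")
    nodes = set(edges) | {d for outs in edges.values() for d in outs} | set(seeds)
    restart = {n: seeds.get(n, 0.0) / total_seed for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for src in nodes:
            outs = edges.get(src, {})
            out_total = sum(outs.values())
            if out_total == 0:
                # Dangling node: hand its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += alpha * rank[src] * restart[n]
                continue
            for dst, weight in outs.items():
                nxt[dst] += alpha * rank[src] * weight / out_total
        rank = nxt
    return rank


# Hypothetical call graph: checkout alerts; most of its errors come via payments.
edges = {"checkout": {"payments": 0.9, "catalog": 0.1}, "payments": {"db": 1.0}}
rank = personalized_pagerank(edges, seeds={"checkout": 1.0})
suspects = sorted(rank, key=rank.get, reverse=True)
```

Because the walk restarts only at the alerting service, the scores stay local to its dependency neighbourhood, and the result is a deterministic prior that needs no model call. The scores express correlation along dependency edges and make no causal claim.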

03

Don't grade your own homework

An externalized verification-gate FSM governs loop termination; hypothesis state is derived only from hard tool evidence (never LLM-asserted); a grounding-based soft-cap keeps thin conclusions below the auto-PR tier.

04

Eval-first agentic RCA

Every change passes a deterministic AC@1 / MTTR replay gate with distractor injection, behind a shadow → active rollout — so an LLM-in-the-loop system ships safely against a labelled ground-truth set.
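The scorecard arithmetic behind such a gate is simple enough to sketch. AC@1 is the share of labelled incidents whose top-ranked suspect matches the ground-truth root cause. The gate function, its tolerances and the dictionary keys below are assumptions for illustration, and distractor injection is not modelled here.

```python
def ac_at_1(predictions, labels):
    """predictions: {incident: ranked suspects}; labels: {incident: true root cause}."""
    if not labels:
        raise ValueError("need at least one labelled incident")
    hits = 0
    for incident, truth in labels.items():
        ranked = predictions.get(incident, [])
        if ranked and ranked[0] == truth:
            hits += 1
    return hits / len(labels)


def regression_gate(baseline, candidate, ac1_tolerance=0.0, mttr_tolerance_s=0.0):
    """Pass only if accuracy has not dropped and MTTR has not grown beyond tolerance."""
    ac1_ok = candidate["ac1"] >= baseline["ac1"] - ac1_tolerance
    mttr_ok = candidate["mttr_s"] <= baseline["mttr_s"] + mttr_tolerance_s
    return ac1_ok and mttr_ok
```

A missing or empty prediction counts as a miss, so a change that makes the agent silently give up on incidents lowers AC@1 rather than escaping the gate.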

05

Bi-temporal incident memory

A hand-rolled Graphiti-on-FalkorDB temporal knowledge graph with no-LLM-at-retrieval hybrid recall (semantic + BM25 + graph-BFS, fused by RRF) and Reflexion-style lessons — the agent learns from every incident.
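Reciprocal Rank Fusion itself is small: each retrieval leg contributes 1 / (k + rank) for every item it returns, and the contributions are summed. The sketch below uses the conventional k = 60; the incident identifiers and the three example lists are made up, and the agent's own constant is not stated in the source.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Ties break on the id so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


semantic = ["inc-7", "inc-2", "inc-9"]   # vector-similarity leg
bm25 = ["inc-2", "inc-7", "inc-4"]       # keyword leg
graph_bfs = ["inc-2", "inc-9"]           # graph-neighbourhood leg
fused = reciprocal_rank_fusion([semantic, bm25, graph_bfs])
# fused == ["inc-2", "inc-7", "inc-9", "inc-4"]
```

RRF only looks at ranks, so the three legs do not need comparable score scales, and no model call is involved at retrieval time.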

Research · arXiv-style write-ups

Papers

Each paper follows a scientific structure (abstract, related work, method, evaluation, discussion) and is published here and on arXiv.

Observability Expert 18 min

Correlation-First Blame Propagation: Personalized PageRank and Parameter-Free Change-Point Detection as Pre-LLM Graph Priors for Root-Cause Analysis

A deterministic, training-free pre-LLM scoring layer that seeds an RCA agent's hypothesis space: error-weighted personalized PageRank blame propagation, parameter-free BOCPD change-point onset detection, a BARO-style robust scorer, and a five-term re-normalizing additive blend with provable fail-open collapse — and a principled, CI-enforced refusal of causal discovery.
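A re-normalizing additive blend can be read as a weighted average taken only over the terms that are actually available. The sketch below is one plausible interpretation, not the paper's formula: the weights are invented, the five terms are not enumerated on this page so two keys are placeholders, and "fail-open" is taken to mean returning a neutral score when no term is available.

```python
def blend(signals, weights, neutral=0.0):
    """Weighted average over available terms; None marks a term as unavailable."""
    available = {
        name: value
        for name, value in signals.items()
        if value is not None and name in weights
    }
    total = sum(weights[name] for name in available)
    if total <= 0:
        return neutral  # fail open: no evidence, no opinion
    return sum(weights[name] * value for name, value in available.items()) / total


# Hypothetical weights; "term4" and "term5" are placeholders for unnamed terms.
weights = {"ppr": 0.4, "changepoint": 0.3, "baro": 0.1, "term4": 0.1, "term5": 0.1}
partial = {"ppr": 0.8, "changepoint": 0.2, "baro": None, "term4": None, "term5": None}
score = blend(partial, weights)  # (0.4 * 0.8 + 0.3 * 0.2) / 0.7
```

Dividing by the sum of the surviving weights keeps scores on the same scale whether one term or all five report, so a telemetry outage degrades the ranking instead of zeroing it.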

AI/ML Expert 20 min

Do Not Grade Your Own Homework: Externalized Verification Gates, Evidence-Derived Hypothesis Trees, and Grounding-Based Confidence Capping for LLM Root-Cause Agents

A systems paper on the reasoning-safety core of the BlueRobin Debug Agent: a deterministic verification-gate FSM outside the model, hypothesis state derived only from real-tool evidence, and a grounding-based soft-cap that bounds LLM self-confidence below the action tiers — all driven by the MAST failure taxonomy.

CI/CD Expert 18 min

Eval-First Agentic RCA: A Deterministic Accuracy/MTTR Regression Gate and Shadow-to-Active Rollout for Shipping LLM-in-the-Loop Reliability Tooling

How the BlueRobin Debug Agent makes every RCA claim measurable and safe to ship: a deterministic offline replay harness over a labelled incident set emitting an AC@1 + MTTR scorecard, a per-commit regression gate that blocks merges, distractor-injected honest-accuracy arms, and a shadow-to-active live-activation rollout — at a 50 EUR/month homelab budget.

AI/ML Expert 21 min

Bi-Temporal Incident Memory on a Property Graph: Hand-Rolled Graphiti-on-FalkorDB with No-LLM-at-Retrieval Hybrid Recall and Reflexion Lessons

How the BlueRobin Debug Agent gives an LLM RCA agent durable, time-aware memory: a bi-temporal MemoryEpisode/MemoryEntity subgraph on the existing FalkorDB, a three-leg hybrid recall (semantic + BM25 + graph-BFS) fused by Reciprocal Rank Fusion with no LLM at retrieval, and Reflexion/Voyager-style post-incident lessons — all inside a ~50 EUR/month homelab budget.

Built with

FalkorDB · Qdrant · Ollama · Claude · Cloudflare AI Gateway · .NET 10 · Tempo · Loki · Prometheus