🤖 AI/ML Advanced
⏱️ 10 min
From Single-Model NER to Ensemble NER: Adding spaCy + LLM Voting
How BlueRobin evolved entity extraction by combining deterministic and LLM-based providers with confidence-aware ensembling.
By Victor Robin
Introduction
A single NER model is rarely enough in noisy real-world documents. BlueRobin improved extraction quality by introducing spaCy as an additional provider and combining outputs with confidence-aware aggregation.
What Changed
- bc7c191: spaCy added to ensemble NER providers.
- c0d3a8a + fdf0e61 (infra): restored spaCy deployment and enabled worker access.
Ensembling Strategy
- Run multiple NER providers in parallel.
- Normalize entities into a shared schema.
- Merge by entity type and surface form.
- Rank entities that several providers agree on with high confidence ahead of single-provider results.
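The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BlueRobin's actual code: the `Entity` schema, the `spacy_stub` and `llm_stub` providers, and the ranking rule (provider votes first, then mean confidence) are all hypothetical stand-ins chosen to keep the example self-contained. It assumes Python 3.9 or later.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    """Shared schema every provider's output is mapped into (hypothetical)."""
    text: str
    label: str
    confidence: float
    provider: str


Provider = Callable[[str], list[Entity]]


def spacy_stub(text: str) -> list[Entity]:
    # Stand-in for a deterministic provider; a real one would call a spaCy pipeline.
    return [
        Entity("Acme Corp", "ORG", 0.90, "spacy"),
        Entity("Paris", "GPE", 0.80, "spacy"),
    ]


def llm_stub(text: str) -> list[Entity]:
    # Stand-in for an LLM-based provider; note the different casing and spacing.
    return [
        Entity("acme  corp", "org", 0.70, "llm"),
        Entity("Project Falcon", "PRODUCT", 0.60, "llm"),
    ]


def merge_key(entity: Entity) -> tuple[str, str]:
    """Normalize to (entity type, surface form) so provider outputs can be merged."""
    return entity.label.upper(), " ".join(entity.text.lower().split())


def ensemble(text: str, providers: list[Provider]) -> list[dict]:
    # Step 1: run all providers in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        results = list(pool.map(lambda provider: provider(text), providers))

    # Steps 2 and 3: normalize, then group by entity type and surface form.
    groups: dict[tuple[str, str], list[Entity]] = defaultdict(list)
    for entities in results:
        for entity in entities:
            groups[merge_key(entity)].append(entity)

    merged = []
    for (label, surface), members in groups.items():
        merged.append({
            "label": label,
            "text": surface,
            "votes": len({m.provider for m in members}),  # distinct providers agreeing
            "confidence": sum(m.confidence for m in members) / len(members),
        })

    # Step 4: consensus entities first, then higher mean confidence.
    merged.sort(key=lambda m: (m["votes"], m["confidence"]), reverse=True)
    return merged


if __name__ == "__main__":
    sample = "Acme Corp opened an office in Paris for Project Falcon."
    for item in ensemble(sample, [spacy_stub, llm_stub]):
        print(item)
```

With these stubs, the organization reported by both providers ranks ahead of the entities seen by only one. Sorting on votes before confidence is one possible design choice; a weighted score per provider would be an equally valid alternative.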
Why This Works
- Deterministic models improve stability for known patterns.
- LLM-based extraction catches long-tail entities.
- Aggregation reduces provider-specific blind spots.
Conclusion
Entity quality improves most when extraction is treated as an ensemble problem, not a single-model selection problem.
Related reading:
- /spacy-ner-document-analysis/
- /agentic-rrf-ensembling-production-search/