From Single-Model NER to Ensemble NER: Adding spaCy + LLM Voting

Introduction

A single NER model is rarely enough for noisy, real-world documents. BlueRobin improved extraction quality by adding spaCy as another NER provider and combining the providers' outputs with confidence-aware aggregation.

What Changed

bc7c191: spaCy added to ensemble NER providers.
c0d3a8a + fdf0e61 (infra): restored spaCy deployment and enabled worker access.
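Adding spaCy as a provider mostly means wrapping a loaded pipeline so it returns the same kind of output as the other providers. The sketch below is a hypothetical adapter, not BlueRobin's code: the `(text, label, confidence)` tuple shape, the name `make_spacy_provider`, and the fixed confidence value are all assumptions. It relies only on spaCy's documented behavior that calling a pipeline on text returns a `Doc` whose `ents` are spans with `.text` and `.label_`. spaCy's standard `ner` component does not expose a per-entity confidence score, so the sketch assigns every spaCy entity the same assumed value.

```python
from typing import Any, Callable

# (surface form, provider-specific label, confidence); an assumed common provider output.
RawEntity = tuple[str, str, float]


def make_spacy_provider(
    nlp: Callable[[str], Any], fixed_confidence: float = 0.9
) -> Callable[[str], list[RawEntity]]:
    """Wrap a loaded spaCy pipeline as an NER provider (hypothetical adapter).

    The standard spaCy `ner` component gives no per-entity score, so every
    entity receives the same assumed `fixed_confidence`.
    """
    def extract(text: str) -> list[RawEntity]:
        doc = nlp(text)  # spaCy pipelines are callable and return a Doc
        return [(ent.text, ent.label_, fixed_confidence) for ent in doc.ents]

    return extract


# Usage with a real model (assumes `pip install spacy` and
# `python -m spacy download en_core_web_sm` have been run):
#   import spacy
#   provider = make_spacy_provider(spacy.load("en_core_web_sm"))
#   provider("Acme Corp opened an office in Berlin.")
```

Taking `nlp` as a parameter instead of loading a model inside the adapter keeps model loading (the expensive part) in one place and makes the adapter easy to test with a stand-in pipeline.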

Ensembling Strategy

Run multiple NER providers in parallel.
Normalize their entities into a shared schema.
Merge entities that share an entity type and surface form.
Rank high-confidence consensus entities (those found by more than one provider) first.
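The four steps above can be sketched end to end in Python. This is a minimal, self-contained illustration under stated assumptions, not BlueRobin's implementation: the providers are stubs returning `(text, label, confidence)` tuples, `LABEL_MAP` is a hypothetical shared schema, and the score (each provider's confidence averaged over all providers, with a missing vote counting as 0) is one assumed way to make the aggregation confidence-aware.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# A provider takes document text and returns (surface form, provider label, confidence).
RawEntity = tuple[str, str, float]
Provider = Callable[[str], list[RawEntity]]

# Hypothetical shared schema: provider-specific labels mapped onto one label set.
LABEL_MAP = {
    "PERSON": "PERSON", "Person": "PERSON",
    "ORG": "ORGANIZATION", "Organization": "ORGANIZATION",
    "GPE": "LOCATION", "LOC": "LOCATION", "Location": "LOCATION",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    confidence: float
    provider: str


@dataclass
class MergedEntity:
    text: str
    label: str
    score: float
    providers: list[str]


def normalize(provider: str, raw: list[RawEntity]) -> list[Entity]:
    """Map labels to the shared schema and collapse whitespace; drop unmapped labels."""
    entities = []
    for text, label, confidence in raw:
        shared = LABEL_MAP.get(label)
        if shared is not None:
            entities.append(Entity(" ".join(text.split()), shared, confidence, provider))
    return entities


def run_providers(text: str, providers: dict[str, Provider]) -> list[Entity]:
    """Call every provider concurrently, then normalize each provider's result."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in providers.items()}
        return [e for name, f in futures.items() for e in normalize(name, f.result())]


def merge(entities: list[Entity], n_providers: int) -> list[MergedEntity]:
    """Group by (label, case-folded text), score each group, rank consensus first."""
    groups: dict[tuple[str, str], list[Entity]] = {}
    for e in entities:
        groups.setdefault((e.label, e.text.casefold()), []).append(e)

    merged = []
    for (label, _), group in groups.items():
        # Keep each provider's best confidence so a repeated mention votes only once.
        best: dict[str, float] = {}
        for e in group:
            best[e.provider] = max(best.get(e.provider, 0.0), e.confidence)
        score = sum(best.values()) / n_providers  # providers that missed it count as 0
        merged.append(MergedEntity(group[0].text, label, score, sorted(best)))

    # More agreeing providers first, then higher score.
    merged.sort(key=lambda m: (len(m.providers), m.score), reverse=True)
    return merged


# Stub providers standing in for spaCy and an LLM (outputs are made up).
def fake_spacy(text: str) -> list[RawEntity]:
    return [("Acme Corp", "ORG", 0.9), ("Berlin", "GPE", 0.9)]


def fake_llm(text: str) -> list[RawEntity]:
    return [("acme  corp", "Organization", 0.8), ("Dr. Ada Byrne", "Person", 0.7),
            ("Project Nightjar", "Product", 0.6)]


if __name__ == "__main__":
    ranked = merge(run_providers("...", {"spacy": fake_spacy, "llm": fake_llm}), n_providers=2)
    for m in ranked:
        print(f"{m.label:<13} {m.text:<14} {m.score:.2f} {m.providers}")
```

Sorting on the number of agreeing providers before the score puts the consensus entity ("Acme Corp") first, while single-provider finds such as the LLM-only person name are kept at a lower rank instead of being discarded. Dropping labels outside the schema (here "Product") is a choice of this sketch; a real pipeline might map them to a catch-all type instead.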

Why This Works

Deterministic models such as spaCy give stable, repeatable results on known patterns.
LLM-based extraction catches long-tail entities.
Aggregation reduces provider-specific blind spots.
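One way to make "confidence-aware" concrete is to average each provider's confidence over the whole provider set $P$, counting a missing vote as 0. This scoring rule is an assumption for illustration, not necessarily the rule BlueRobin uses, and the confidence values in the worked example are made up.

```latex
s(e) = \frac{1}{|P|} \sum_{p \in P} c_p(e),
\qquad
c_p(e) =
\begin{cases}
  \text{confidence reported by } p & \text{if } p \text{ returned } e \\
  0 & \text{otherwise}
\end{cases}

% Worked example with two providers (spaCy and an LLM), so |P| = 2:
s(\text{found by both}) = \frac{0.9 + 0.8}{2} = 0.85,
\qquad
s(\text{LLM only}) = \frac{0 + 0.7}{2} = 0.35
```

Under this rule the LLM-only entity survives but ranks below the consensus entity, so one provider's blind spot is covered without letting any single provider's guesses dominate the output.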

Conclusion

Entity quality improves most when extraction is treated as an ensemble problem rather than as a single-model selection problem.

Related reading:

/spacy-ner-document-analysis/
/agentic-rrf-ensembling-production-search/