From Single-Model NER to Ensemble NER: Adding spaCy + LLM Voting

When I first started comparing NER outputs side-by-side, one case made the need for ensembling immediately obvious. I was processing a compliance document and noticed that spaCy consistently missed “GDPR” as a legal entity — it simply was not in the model’s training data as a LAW type. Meanwhile, the LLM provider caught it every time but hallucinated entities that did not exist in the text at all, inventing organization names that appeared nowhere in the document. The moment I combined both providers with a voting mechanism, both failure modes disappeared: the LLM covered spaCy’s blind spots, and spaCy’s deterministic precision filtered out the LLM’s hallucinations. That was the day I stopped relying on any single NER model.

Named entity recognition (NER) is a core building block for document intelligence. A single NER model, whether a transformer or a rule-based system, inevitably has blind spots. Combining multiple NER providers into an ensemble with voting can improve both precision and recall, and the level of agreement between providers doubles as a confidence score for downstream filtering.

[spaCy Linguistic Features: Named Entity Recognition] — Explosion AI , 2024-06-01

The Single-Provider Problem

spaCy’s en_core_web_trf is accurate and reliable for common entities (person names, organizations, dates), and much faster than an LLM call. But it struggles with:

Domain-specific terms (medical conditions, financial instruments)
Ambiguous spans (“Apple” as company vs. fruit)
Nested entities (location within organization name)

An LLM-based extractor handles ambiguity better but hallucinates entities and is slower. Combine them, and their weaknesses largely offset each other. This principle, that diverse models with uncorrelated errors produce better results when combined, is well-established in both classical machine learning and NLP specifically.

[Speech and Language Processing (3rd ed.) - Chapter 8: Sequence Labeling] — Dan Jurafsky and James H. Martin , 2024-02-03

Architecture: Parallel NER with Voting

Input Text
    |-->  spaCy NER --------->  Entities + spans
    |-->  LLM NER (Ollama) -->  Entities + spans
    |-->  Rule-based NER ----->  Entities + spans
              |
              v
      Normalize + Align
              |
              v
      Consensus Voting
              |
              v
      Merged Entities (with confidence)

Provider Interface

Each NER provider implements a common interface:

INerProvider.cs

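using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Contract shared by every provider so the ensemble can treat them uniformly.
public interface INerProvider
{
    string Name { get; }
    Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct);
}

// One extracted span. StartChar and EndChar are character offsets into the input text.
public record ExtractedEntity(
    string Text,
    string Label,
    int StartChar,
    int EndChar,
    double Confidence,  // provider-assigned score between 0 and 1
    string Source);     // Name of the provider that produced the entity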
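
The architecture diagram also lists a rule-based provider, which the article does not show. As a minimal sketch of what implementing the interface could look like, here is a hypothetical regex-based provider. The class name `RegexNerProvider`, the two patterns, and the fixed confidence of 0.9 are all assumptions for illustration, not part of the original system.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Repeated from the article so the snippet compiles on its own.
public interface INerProvider
{
    string Name { get; }
    Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct);
}

public record ExtractedEntity(
    string Text, string Label, int StartChar, int EndChar,
    double Confidence, string Source);

// Hypothetical rule-based provider: one regular expression per label.
public class RegexNerProvider : INerProvider
{
    public string Name => "rules";

    // Illustrative patterns only; real rules would be domain-specific.
    private static readonly (string Label, Regex Pattern)[] Rules =
    {
        ("LAW", new Regex(@"\b(GDPR|HIPAA|CCPA)\b", RegexOptions.Compiled)),
        ("DATE", new Regex(@"\b\d{4}-\d{2}-\d{2}\b", RegexOptions.Compiled)),
    };

    public Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct)
    {
        var entities = new List<ExtractedEntity>();
        foreach (var (label, pattern) in Rules)
        {
            ct.ThrowIfCancellationRequested();
            foreach (Match m in pattern.Matches(text))
            {
                // EndChar is exclusive: start index plus match length.
                entities.Add(new ExtractedEntity(
                    m.Value, label, m.Index, m.Index + m.Length,
                    Confidence: 0.9, Source: Name)); // 0.9 is an assumed placeholder
            }
        }
        return Task.FromResult<IReadOnlyList<ExtractedEntity>>(entities);
    }
}
```

Because the work is synchronous, the method wraps its result in `Task.FromResult` rather than using `async`. A provider like this is one way to cover terms such as "GDPR" that a statistical model misses.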

spaCy Provider (via HTTP API)

SpacyNerProvider.cs

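public class SpacyNerProvider : INerProvider
{
    public string Name => "spacy";
    private readonly HttpClient _http;

    // The injected HttpClient's BaseAddress must point at the spaCy service.
    public SpacyNerProvider(HttpClient http) => _http = http;

    public async Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct)
    {
        var response = await _http.PostAsJsonAsync(
            "/ner", new { text }, ct);
        response.EnsureSuccessStatusCode();

        // SpacyResponse is the DTO for the service's JSON payload (not shown).
        var result = await response.Content
            .ReadFromJsonAsync<SpacyResponse>(ct);
        if (result?.Entities is null) return [];

        // The service returns no per-entity score here, so 0.85 is a
        // fixed placeholder rather than a measured confidence.
        return result.Entities.Select(e => new ExtractedEntity(
            e.Text, e.Label, e.Start, e.End,
            Confidence: 0.85, Source: Name
        )).ToList();
    }
}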

The entity labels returned by spaCy follow the OntoNotes 5 annotation scheme, which defines 18 entity categories such as PERSON, ORG, GPE, DATE, and MONEY.

[OntoNotes Release 5.0 - Entity Types] — Linguistic Data Consortium , 2013-10-16

LLM Provider (via Ollama)

OllamaNerProvider.cs

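public class OllamaNerProvider : INerProvider
{
    public string Name => "ollama";

    // _ollama is an injected client wrapper (not shown) whose GenerateAsync
    // returns the raw model output as a string.

    // The prompt asks for lowercase keys, so match property names
    // case-insensitively when deserializing.
    private static readonly JsonSerializerOptions JsonOptions =
        new() { PropertyNameCaseInsensitive = true };

    public async Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct)
    {
        // Only the first 2,000 characters are sent; entities beyond that
        // point are never seen by the model.
        var prompt = $@"Extract named entities from this text.
Return JSON array: [{{""text"":"".."",""label"":"".."",""start"":0,""end"":5}}]
Labels: PERSON, ORG, DATE, LOCATION, MEDICAL, FINANCIAL
Text: {text[..Math.Min(text.Length, 2000)]}";

        var response = await _ollama.GenerateAsync(prompt, ct);

        List<LlmEntity>? entities;
        try
        {
            entities = JsonSerializer
                .Deserialize<List<LlmEntity>>(response, JsonOptions);
        }
        catch (JsonException)
        {
            // The model returned something other than the requested JSON array.
            return [];
        }

        // 0.7 is a fixed placeholder; the model reports no score of its own.
        return entities?.Select(e => new ExtractedEntity(
            e.Text, e.Label, e.Start, e.End,
            Confidence: 0.7, Source: Name
        )).ToList() ?? [];
    }
}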

[Ollama API Documentation] — Ollama , 2024-11-20
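
Character offsets reported by an LLM can be wrong even when the entity text is right, and a hallucinated entity will not appear in the document at all. One way to handle both before voting is to check each LLM entity against the source text. The sketch below is an assumption about how that could be done, not the article's implementation; `LlmSpanVerifier` and `Reanchor` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the article so the snippet compiles on its own.
public record ExtractedEntity(
    string Text, string Label, int StartChar, int EndChar,
    double Confidence, string Source);

public static class LlmSpanVerifier
{
    // Keeps only entities whose text occurs verbatim in the source and
    // rewrites StartChar/EndChar to the position where it was found.
    public static List<ExtractedEntity> Reanchor(
        string source, IEnumerable<ExtractedEntity> entities)
    {
        var verified = new List<ExtractedEntity>();
        foreach (var e in entities)
        {
            if (string.IsNullOrWhiteSpace(e.Text)) continue;

            // Trust the model's offsets only if they match the text exactly.
            bool offsetsValid =
                e.StartChar >= 0
                && e.EndChar <= source.Length
                && e.EndChar - e.StartChar == e.Text.Length
                && string.CompareOrdinal(
                       source, e.StartChar, e.Text, 0, e.Text.Length) == 0;
            if (offsetsValid)
            {
                verified.Add(e);
                continue;
            }

            // Otherwise fall back to the first verbatim occurrence.
            int idx = source.IndexOf(e.Text, StringComparison.Ordinal);
            if (idx < 0) continue; // not in the document: likely hallucinated

            verified.Add(e with
            {
                StartChar = idx,
                EndChar = idx + e.Text.Length
            });
        }
        return verified;
    }
}
```

The fallback uses the first occurrence only, so repeated mentions of the same string collapse onto one position. That is a known limitation of this sketch; a fuller version would search near the offset the model reported.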

Ensemble: Normalize, Align, Vote

EnsembleNerService.cs

public class EnsembleNerService
{
    private readonly IReadOnlyList<INerProvider> _providers;

    public EnsembleNerService(IEnumerable<INerProvider> providers)
        => _providers = providers.ToList();

    public async Task<IReadOnlyList<ExtractedEntity>> ExtractAsync(
        string text, CancellationToken ct)
    {
        // 1. Run providers in parallel
        var tasks = _providers.Select(
            p => p.ExtractAsync(text, ct));
        var results = await Task.WhenAll(tasks);
        var allEntities = results.SelectMany(r => r).ToList();

        // 2. Normalize labels
        var normalized = allEntities.Select(Normalize).ToList();

        // 3. Group by overlapping spans
        var groups = GroupByOverlappingSpans(normalized);

        // 4. Vote within each group
        return groups.Select(VoteOnGroup).ToList();
    }

    private ExtractedEntity VoteOnGroup(
        List<ExtractedEntity> group)
    {
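        // Majority label wins: one vote per provider, with ties broken
        // by the highest individual confidence
        var bestLabel = group
            .GroupBy(e => e.Label)
            .OrderByDescending(g => g.Select(e => e.Source).Distinct().Count())
            .ThenByDescending(g => g.Max(e => e.Confidence))
            .First().Key;

        // Confidence = fraction of providers that agree. Count distinct
        // sources so a provider that emits two overlapping spans cannot
        // push the ratio above 1.
        var agreement = group
            .Where(e => e.Label == bestLabel)
            .Select(e => e.Source).Distinct().Count()
            / (double)_providers.Count;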

        var representative = group
            .First(e => e.Label == bestLabel);

        return representative with
        {
            Confidence = agreement,
            Source = string.Join("+",
                group.Select(e => e.Source).Distinct())
        };
    }
}
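
The `Normalize` step called above is not shown in the article. A minimal sketch, assuming a simple lookup table, might look like the following. The PER and LOC mappings come from the takeaways at the end of the article; folding GPE into LOCATION and ORGANIZATION into ORG are assumptions made here so that spaCy's labels line up with the ones requested in the LLM prompt.

```csharp
using System;
using System.Collections.Generic;

// Repeated from the article so the snippet compiles on its own.
public record ExtractedEntity(
    string Text, string Label, int StartChar, int EndChar,
    double Confidence, string Source);

public static class LabelNormalizer
{
    // Hypothetical mapping table; lookups ignore case.
    private static readonly Dictionary<string, string> Canonical =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["PER"] = "PERSON",
            ["LOC"] = "LOCATION",
            ["GPE"] = "LOCATION",      // assumption: treat geopolitical entities as locations
            ["ORGANIZATION"] = "ORG",  // assumption: an LLM may spell the label out
        };

    // Returns a copy of the entity with a canonical, upper-case label.
    public static ExtractedEntity Normalize(ExtractedEntity e)
    {
        var label = e.Label.Trim();
        return e with
        {
            Label = Canonical.TryGetValue(label, out var mapped)
                ? mapped
                : label.ToUpperInvariant()
        };
    }
}
```

Labels missing from the table pass through upper-cased, so two providers that already agree on a label still vote together.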

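Using the agreement ratio as the confidence score is a form of model combination that has been studied extensively in the NLP literature. The core insight is that diverse classifiers with uncorrelated errors tend to be more accurate combined than individually. The ratio measures agreement, not calibrated probability: with three providers it can only take the values 1/3, 2/3, and 1.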

[Ensemble Methods for NLP: A Survey] — Wenliang Dai et al. , 2020-10-12
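
To act on the agreement score, a downstream consumer can drop entities below a chosen threshold. The helper below is a hypothetical sketch of that filtering step; `ConfidenceFilter`, `KeepAgreed`, and the threshold values are illustrative and not taken from the original system.

```csharp
using System.Collections.Generic;
using System.Linq;

// Repeated from the article so the snippet compiles on its own.
public record ExtractedEntity(
    string Text, string Label, int StartChar, int EndChar,
    double Confidence, string Source);

public static class ConfidenceFilter
{
    // Keeps merged entities whose agreement ratio meets the threshold.
    public static List<ExtractedEntity> KeepAgreed(
        IEnumerable<ExtractedEntity> merged, double minConfidence)
        => merged.Where(e => e.Confidence >= minConfidence).ToList();
}
```

With three providers, a threshold of 0.6 keeps entities backed by at least two of them and drops single-provider hits. That favors precision: it removes an entity only the LLM proposed, but it also removes a correct entity that only one provider found, so the threshold is a trade-off to tune per use case.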

Span Overlap Detection

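Two entities “overlap” if their character ranges intersect. Grouping is transitive: a long span that intersects two otherwise separate entities pulls all three into one group.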

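private List<List<ExtractedEntity>> GroupByOverlappingSpans(
    List<ExtractedEntity> entities)
{
    // Sort by start offset so groups can be built in a single pass.
    var sorted = entities.OrderBy(e => e.StartChar).ToList();
    var groups = new List<List<ExtractedEntity>>();
    var current = new List<ExtractedEntity>();
    var currentEnd = 0; // furthest EndChar seen in the current group

    foreach (var entity in sorted)
    {
        // Assumes EndChar is exclusive (as with spaCy's end_char), so a
        // span that starts exactly where the group ends is adjacent,
        // not overlapping.
        if (current.Count == 0 || entity.StartChar < currentEnd)
        {
            current.Add(entity);
            currentEnd = Math.Max(currentEnd, entity.EndChar);
        }
        else
        {
            groups.Add(current);
            current = new List<ExtractedEntity> { entity };
            currentEnd = entity.EndChar;
        }
    }

    if (current.Count > 0) groups.Add(current);
    return groups;
}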

After entities are grouped and voted on, the resulting entity records can be linked back to their source documents for downstream entity linking and knowledge graph construction.

[Entity Linking: A Survey of Recent Approaches] — Ozge Sevgili et al. , 2022-01-15

Conclusion

Building the ensemble NER system was a turning point in how I approach information extraction. The single biggest insight is that no individual model — no matter how sophisticated — is sufficient for production NER across diverse document types. The voting mechanism is simple to implement and reason about, and the agreement-based confidence score gives downstream consumers a meaningful signal for filtering. If I were starting over, I would add the ensemble from day one rather than treating it as an optimization.

Key Takeaways:

Single NER models have blind spots — domain terms, ambiguity, nested entities
Run providers in parallel so total latency is set by the slowest provider rather than the sum of all of them
Normalize labels before voting (PER to PERSON, LOC to LOCATION)
Confidence = agreement ratio — simple, interpretable, filterable
Use the Source field to trace which providers contributed to each entity

Next Steps

NER with spaCy for Document Analysis — the single-provider foundation this article builds on
FalkorDB Knowledge Graph — where ensemble NER entities get stored as graph nodes and relationships
