Building a Semantic Embedding Pipeline with Ollama and Qdrant
Create a production-ready embedding pipeline in .NET that generates vector embeddings with Ollama and stores them in Qdrant for semantic search.
2026 Upgrade Note
This article was updated on 2026-02-26 with the latest production context: multi-model embedding fanout, model-specific completion events, and stronger alignment with the ensemble retrieval strategies introduced in recent worker and infra updates.
Introduction
Traditional keyword search fails when users search for “budget reports” but your documents say “financial statements.” Semantic search solves this by converting text into vector representations (embeddings) that capture meaning, enabling similarity matching based on concepts rather than exact words.
In this guide, we’ll build a production-ready embedding pipeline that:
- Chunks documents into semantically meaningful segments
- Generates vector embeddings using Ollama (local inference)
- Stores vectors in Qdrant with metadata for filtered search
- Processes documents asynchronously via NATS events
- Enables semantic search with score thresholds and user isolation
Why Ollama + Qdrant?
- Ollama: Run embedding models locally without API costs or data leaving your infrastructure. Perfect for sensitive documents.
- Qdrant: Purpose-built vector database with filtering, payloads, and horizontal scaling. Generally faster than Postgres with pgvector as collections grow large.
Prerequisites
- Ollama running with the nomic-embed-text model pulled
- Qdrant instance (we use shared infrastructure in the data-layer namespace)
- NATS JetStream for event-driven processing
Architecture Overview
Before diving into code, understand the data flow:
flowchart LR
subgraph Ingestion["📥 Ingestion"]
A["📄 Content<br/>Markdown"] --> B["✂️ Chunking<br/>Semantic"]
B --> C["🧠 Embedding<br/>Ollama"]
end
subgraph Storage["💾 Storage"]
C --> D["📤 Upsert<br/>Batched"]
D --> E[("🔮 Qdrant<br/>Vectors")]
end
subgraph Query["🔍 Query"]
F["💬 Search Query"] --> E
E --> G["✅ Results"]
end
classDef primary fill:#7c3aed,color:#fff
classDef secondary fill:#06b6d4,color:#fff
classDef db fill:#f43f5e,color:#fff
classDef warning fill:#fbbf24,color:#000
class Ingestion primary
class Storage db
class Query warning
The flow works like this:
- After OCR completes, a NATS event triggers the embedding worker
- Document content is split into ~500 token chunks with overlap
- Each chunk is sent to Ollama for embedding generation
- Vectors are stored in Qdrant with metadata (document ID, user ID, chunk content)
- At search time, the query is embedded and compared against stored vectors
Choosing an Embedding Model
Model selection significantly impacts both search quality and performance. For local inference with Ollama, we evaluated three options:
| Model | Dimensions | Context | Speed | Quality |
|---|---|---|---|---|
| nomic-embed-text | 768 | 8192 | Fast | Excellent |
| mxbai-embed-large | 1024 | 512 | Medium | Very Good |
| all-minilm | 384 | 256 | Very Fast | Good |
We chose nomic-embed-text for several reasons:
- 768 dimensions: Good balance between expressiveness and storage
- 8K context window: Handles longer chunks without truncation
- Competitive quality: Benchmarks well against commercial models
- Fast inference: ~50ms per embedding on modest hardware
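Dimensionality also drives storage directly: each float32 component is 4 bytes, so raw vector size scales linearly with dimensions and chunk count (Qdrant's index and payload overhead come on top). A quick back-of-the-envelope check:

```python
# Rough storage estimate for raw float32 vectors
# (index and payload overhead excluded)
BYTES_PER_FLOAT = 4

def raw_vector_bytes(dimensions: int, num_chunks: int) -> int:
    """Bytes needed to store num_chunks vectors of the given dimensionality."""
    return dimensions * BYTES_PER_FLOAT * num_chunks

# One million chunks at each candidate model's dimensionality
for model, dims in [("nomic-embed-text", 768),
                    ("mxbai-embed-large", 1024),
                    ("all-minilm", 384)]:
    gib = raw_vector_bytes(dims, 1_000_000) / 1024**3
    print(f"{model}: {dims} dims -> {gib:.2f} GiB per million chunks")
```

At 768 dimensions, a million chunks costs under 3 GiB of raw vector data, which is why the middle ground rarely hurts.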
The Embedding Service
Let’s start with the interface. Defining a clean abstraction allows us to swap implementations (e.g., switch to OpenAI embeddings for comparison):
// Application/Services/IEmbeddingService.cs
namespace Archives.Application.Services;
public interface IEmbeddingService
{
/// <summary>
/// Generates an embedding for a single text.
/// </summary>
Task<float[]> GenerateEmbeddingAsync(
string text,
CancellationToken ct = default);
/// <summary>
/// Generates embeddings for multiple texts in a batch.
/// More efficient than calling GenerateEmbeddingAsync repeatedly.
/// </summary>
Task<IReadOnlyList<float[]>> GenerateEmbeddingsAsync(
IReadOnlyList<string> texts,
CancellationToken ct = default);
}
Ollama Implementation
The Ollama API is straightforward—send text, receive a vector. The complexity comes from batching and error handling:
// Infrastructure/AI/OllamaEmbeddingService.cs
namespace Archives.Infrastructure.AI;
public sealed class OllamaEmbeddingService : IEmbeddingService
{
private readonly HttpClient _httpClient;
private readonly ILogger<OllamaEmbeddingService> _logger;
private readonly string _model;
private const int MaxBatchSize = 10; // Prevent memory spikes
public OllamaEmbeddingService(
HttpClient httpClient,
IConfiguration configuration,
ILogger<OllamaEmbeddingService> logger)
{
_httpClient = httpClient;
_logger = logger;
_model = configuration["Ollama:EmbeddingModel"] ?? "nomic-embed-text";
}
public async Task<float[]> GenerateEmbeddingAsync(
string text,
CancellationToken ct = default)
{
var result = await GenerateEmbeddingsAsync([text], ct);
return result[0];
}
public async Task<IReadOnlyList<float[]>> GenerateEmbeddingsAsync(
IReadOnlyList<string> texts,
CancellationToken ct = default)
{
if (texts.Count == 0)
return [];
var embeddings = new List<float[]>(texts.Count);
// Process in batches to manage memory and provide progress
foreach (var batch in texts.Chunk(MaxBatchSize))
{
var batchEmbeddings = await GenerateBatchAsync(batch, ct);
embeddings.AddRange(batchEmbeddings);
}
return embeddings;
}
Note on batching: Ollama's /api/embed can take an array of inputs, but we send one input per request to keep memory per call predictable. Application-level batching still helps by limiting concurrent requests and providing natural checkpoints for logging.
private async Task<IEnumerable<float[]>> GenerateBatchAsync(
IEnumerable<string> texts,
CancellationToken ct)
{
var results = new List<float[]>();
foreach (var text in texts)
{
var request = new OllamaEmbedRequest
{
Model = _model,
Input = text
};
var response = await _httpClient.PostAsJsonAsync(
"/api/embed",
request,
ct);
response.EnsureSuccessStatusCode();
var result = await response.Content
.ReadFromJsonAsync<OllamaEmbedResponse>(ct);
// Use pattern matching to safely extract the embedding
if (result?.Embeddings is [var embedding, ..])
{
results.Add(embedding);
}
else
{
throw new EmbeddingException("No embedding returned for input text");
}
}
return results;
}
private sealed record OllamaEmbedRequest
{
[JsonPropertyName("model")]
public required string Model { get; init; }
[JsonPropertyName("input")]
public required string Input { get; init; }
}
private sealed record OllamaEmbedResponse
{
[JsonPropertyName("embeddings")]
public float[][]? Embeddings { get; init; }
}
}
public class EmbeddingException : Exception
{
public EmbeddingException(string message) : base(message) { }
}
Semantic Chunking
Embedding models work best on focused, coherent text. A 10,000-word document embedded as-is produces a vector that represents “everything,” making it hard to match against specific queries. Chunking splits documents into meaningful segments.
Key chunking decisions:
- Chunk size: ~500 tokens balances specificity with context. Too small = fragments; too large = diluted relevance.
- Overlap: 50-100 tokens ensures concepts spanning chunk boundaries appear in at least one chunk.
- Boundary respect: Split at markdown headers or sentence boundaries, not mid-word.
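The decisions above can be sketched as a simple sliding window over words (a deliberate simplification of the markdown-aware chunker below; real token counts would come from the model's tokenizer):

```python
def chunk_words(words: list[str], max_words: int = 500,
                overlap: int = 50) -> list[list[str]]:
    """Fixed-size chunks with overlap, so phrases spanning a chunk
    boundary appear intact in at least one chunk."""
    if not words:
        return []
    step = max_words - overlap  # advance less than a full window
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + max_words])
        if start + max_words >= len(words):
            break  # final window already covered the tail
    return chunks

doc = [f"w{i}" for i in range(1200)]
chunks = chunk_words(doc, max_words=500, overlap=50)
# The last 50 words of each chunk reappear at the head of the next one
print([len(c) for c in chunks])
```

With 1,200 words this yields chunks of 500, 500, and 300 words, the middle one starting 450 words in.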
// Application/Services/ITextChunker.cs
namespace Archives.Application.Services;
public interface ITextChunker
{
IReadOnlyList<TextChunk> ChunkText(
string text,
ChunkingOptions? options = null);
}
public record TextChunk(
string Content,
int StartOffset,
int EndOffset,
int Index);
public record ChunkingOptions
{
public int MaxTokens { get; init; } = 500;
public int OverlapTokens { get; init; } = 50;
public bool PreserveMarkdown { get; init; } = true;
}
Markdown-Aware Implementation
Our documents are markdown from OCR output, so we should respect heading boundaries:
// Infrastructure/Text/SemanticTextChunker.cs
namespace Archives.Infrastructure.Text;
public sealed class SemanticTextChunker : ITextChunker
{
private readonly ILogger<SemanticTextChunker> _logger;
// Approximate tokens as words * 1.3 (accounts for subword tokenization)
private const double TokenToWordRatio = 1.3;
public SemanticTextChunker(ILogger<SemanticTextChunker> logger)
{
_logger = logger;
}
public IReadOnlyList<TextChunk> ChunkText(
string text,
ChunkingOptions? options = null)
{
options ??= new ChunkingOptions();
if (string.IsNullOrWhiteSpace(text))
return [];
var chunks = new List<TextChunk>();
// Choose splitting strategy based on content type
var sections = options.PreserveMarkdown
? SplitByMarkdownSections(text)
: SplitBySentences(text);
var currentChunk = new StringBuilder();
var currentStart = 0;
var chunkIndex = 0;
foreach (var section in sections)
{
var sectionTokens = EstimateTokens(section);
var currentTokens = EstimateTokens(currentChunk.ToString());
// Check if adding this section would exceed our limit
if (currentTokens + sectionTokens > options.MaxTokens && currentChunk.Length > 0)
{
// Finalize current chunk
var content = currentChunk.ToString().Trim();
if (content.Length > 0)
{
chunks.Add(new TextChunk(
content,
currentStart,
currentStart + content.Length,
chunkIndex++));
}
// Start new chunk with overlap from previous
var overlap = GetOverlapText(currentChunk.ToString(), options.OverlapTokens);
currentChunk.Clear();
currentChunk.Append(overlap);
// Track where this section starts in the source text; guard against -1,
// since AppendLine may have normalized line endings so the section is
// no longer an exact substring
var sectionStart = text.IndexOf(section, StringComparison.Ordinal);
if (sectionStart >= 0)
    currentStart = sectionStart;
}
currentChunk.Append(section);
currentChunk.AppendLine();
}
// Don't forget the final chunk!
var lastContent = currentChunk.ToString().Trim();
if (lastContent.Length > 0)
{
chunks.Add(new TextChunk(
lastContent,
currentStart,
currentStart + lastContent.Length,
chunkIndex));
}
_logger.LogDebug(
"Split {TextLength} chars into {ChunkCount} chunks",
text.Length,
chunks.Count);
return chunks;
}
The helper methods handle the two splitting strategies:
/// <summary>
/// Split at markdown headers, keeping each header with its content.
/// This preserves document structure in the chunks.
/// </summary>
private static IEnumerable<string> SplitByMarkdownSections(string text)
{
var lines = text.Split('\n');
var currentSection = new StringBuilder();
foreach (var line in lines)
{
// When we hit a header, yield the previous section
if (line.StartsWith('#') && currentSection.Length > 0)
{
yield return currentSection.ToString();
currentSection.Clear();
}
currentSection.AppendLine(line);
}
if (currentSection.Length > 0)
{
yield return currentSection.ToString();
}
}
/// <summary>
/// Fallback for non-markdown content: split at sentence boundaries.
/// </summary>
private static IEnumerable<string> SplitBySentences(string text)
{
var sentenceEnders = new[] { ". ", "! ", "? ", ".\n", "!\n", "?\n" };
var start = 0;
for (var i = 0; i < text.Length - 1; i++)
{
var twoChars = text.Substring(i, 2);
if (sentenceEnders.Contains(twoChars))
{
yield return text.Substring(start, i - start + 1).Trim();
start = i + 1;
}
}
if (start < text.Length)
{
yield return text.Substring(start).Trim();
}
}
private static int EstimateTokens(string text)
{
if (string.IsNullOrWhiteSpace(text))
return 0;
var wordCount = text.Split(
[' ', '\n', '\r', '\t'],
StringSplitOptions.RemoveEmptyEntries).Length;
return (int)(wordCount * TokenToWordRatio);
}
/// <summary>
/// Get the last N tokens worth of text for overlap.
/// </summary>
private static string GetOverlapText(string text, int overlapTokens)
{
var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
var overlapWordCount = (int)(overlapTokens / TokenToWordRatio);
if (words.Length <= overlapWordCount)
return text;
return string.Join(' ', words.TakeLast(overlapWordCount));
}
}
Qdrant Vector Storage
Qdrant is a purpose-built vector database that handles similarity search at scale. Key features we use:
- Payloads: Store metadata alongside vectors for filtering
- Collections: Namespace vectors by environment (dev, staging, prod)
- Filtering: Pre-filter by user_id before similarity search
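Over Qdrant's REST API, that pre-filter is a `filter` clause in the search request body. A sketch of the equivalent JSON for the user-isolation filter (the user id is illustrative; field names follow Qdrant's REST schema):

```python
import json

# Search body equivalent to a filtered similarity search; it would be
# POSTed to /collections/{collection}/points/search
search_request = {
    "vector": [0.0] * 768,      # the embedded query (all zeros here, for brevity)
    "limit": 10,
    "score_threshold": 0.7,
    "with_payload": True,
    "filter": {
        "must": [
            # Only points whose payload user_id matches the caller survive
            {"key": "user_id", "match": {"value": "user-123"}}
        ]
    },
}

print(json.dumps(search_request["filter"]))
```

Because the filter is applied against indexed payload fields before scoring, other users' vectors never enter the similarity computation at all.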
The Vector Store Interface
We abstract Qdrant behind an interface for testability:
// Application/Services/IVectorStore.cs
namespace Archives.Application.Services;
public interface IVectorStore
{
Task UpsertAsync(
string collection,
IReadOnlyList<VectorPoint> points,
CancellationToken ct = default);
Task<IReadOnlyList<SearchResult>> SearchAsync(
string collection,
float[] queryVector,
SearchOptions options,
CancellationToken ct = default);
Task DeleteByDocumentIdAsync(
string collection,
string documentId,
CancellationToken ct = default);
}
public record VectorPoint(
Guid Id,
float[] Vector,
Dictionary<string, object> Payload);
public record SearchResult(
Guid Id,
float Score,
Dictionary<string, object> Payload);
public record SearchOptions
{
public int Limit { get; init; } = 10;
public float ScoreThreshold { get; init; } = 0.7f;
public Dictionary<string, object>? Filters { get; init; }
}
Qdrant Implementation
The implementation handles collection creation, upserts, search, and deletion:
// Infrastructure/VectorStore/QdrantVectorStore.cs
namespace Archives.Infrastructure.VectorStore;
public sealed class QdrantVectorStore : IVectorStore
{
private readonly QdrantClient _client;
private readonly ILogger<QdrantVectorStore> _logger;
private readonly string _environment;
public QdrantVectorStore(
QdrantClient client,
IConfiguration configuration,
ILogger<QdrantVectorStore> logger)
{
_client = client;
_logger = logger;
_environment = configuration["Environment"] ?? "dev";
}
Upserting vectors with payloads:
The UpsertAsync method stores vectors along with metadata that enables filtering during search:
public async Task UpsertAsync(
string collection,
IReadOnlyList<VectorPoint> points,
CancellationToken ct = default)
{
if (points.Count == 0)
return;
var fullCollectionName = GetCollectionName(collection);
// Ensure collection exists
await EnsureCollectionAsync(fullCollectionName, points[0].Vector.Length, ct);
var qdrantPoints = points.Select(p => new PointStruct
{
Id = new PointId { Uuid = p.Id.ToString() },
Vectors = p.Vector,
Payload = { p.Payload.ToDictionary(kv => kv.Key, kv => ToQdrantValue(kv.Value)) }
}).ToList();
await _client.UpsertAsync(
fullCollectionName,
qdrantPoints,
cancellationToken: ct);
_logger.LogInformation(
"Upserted {PointCount} points to {Collection}",
points.Count,
fullCollectionName);
}
public async Task<IReadOnlyList<SearchResult>> SearchAsync(
string collection,
float[] queryVector,
SearchOptions options,
CancellationToken ct = default)
{
var fullCollectionName = GetCollectionName(collection);
Filter? filter = null;
if (options.Filters?.TryGetValue("user_id", out var userId) == true)
{
filter = new Filter
{
Must =
{
new Condition
{
Field = new FieldCondition
{
Key = "user_id",
Match = new Match { Keyword = userId.ToString()! }
}
}
}
};
}
var results = await _client.SearchAsync(
fullCollectionName,
queryVector,
filter: filter,
limit: (ulong)options.Limit,
scoreThreshold: options.ScoreThreshold,
cancellationToken: ct);
return results.Select(r => new SearchResult(
Guid.Parse(r.Id.Uuid),
r.Score,
r.Payload.ToDictionary(kv => kv.Key, kv => FromQdrantValue(kv.Value))
)).ToList();
}
public async Task DeleteByDocumentIdAsync(
string collection,
string documentId,
CancellationToken ct = default)
{
var fullCollectionName = GetCollectionName(collection);
await _client.DeleteAsync(
fullCollectionName,
new Filter
{
Must =
{
new Condition
{
Field = new FieldCondition
{
Key = "document_id",
Match = new Match { Keyword = documentId }
}
}
}
},
cancellationToken: ct);
_logger.LogInformation(
"Deleted vectors for document {DocumentId} from {Collection}",
documentId,
fullCollectionName);
}
private string GetCollectionName(string baseName)
=> $"{_environment}-{baseName}";
private async Task EnsureCollectionAsync(
string name,
int vectorSize,
CancellationToken ct)
{
try
{
await _client.GetCollectionInfoAsync(name, ct);
}
catch (QdrantException)
{
// Collection doesn't exist, create it
await _client.CreateCollectionAsync(
name,
new VectorParams
{
Size = (ulong)vectorSize,
Distance = Distance.Cosine
},
cancellationToken: ct);
_logger.LogInformation("Created collection {Collection}", name);
}
}
private static Value ToQdrantValue(object obj) => obj switch
{
string s => new Value { StringValue = s },
int i => new Value { IntegerValue = i },
long l => new Value { IntegerValue = l },
double d => new Value { DoubleValue = d },
bool b => new Value { BoolValue = b },
_ => new Value { StringValue = obj.ToString() ?? "" }
};
private static object FromQdrantValue(Value value) => value.KindCase switch
{
Value.KindOneofCase.StringValue => value.StringValue,
Value.KindOneofCase.IntegerValue => value.IntegerValue,
Value.KindOneofCase.DoubleValue => value.DoubleValue,
Value.KindOneofCase.BoolValue => value.BoolValue,
_ => value.StringValue
};
}
The Embedding Worker
Now we orchestrate everything in a background worker. This worker:
- Subscribes to analysis.completed events (published after OCR)
- Fetches document content from MinIO
- Chunks → Embeds → Stores
- Publishes an embedding.completed event
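The events on either side of the worker are small JSON payloads. A sketch of what they might look like (field names inferred from the worker code below; exact casing and shape depend on your event contracts and serializer settings):

```python
import json

# Incoming event from the analysis/OCR stage
analysis_completed = {
    "documentId": "doc-42",
    "bucket": "archives",
    "userId": "user-123",
}

# Outgoing event the embedding worker publishes on success:
# same identifiers, plus what was produced and when
embedding_completed = {
    **analysis_completed,
    "chunkCount": 17,
    "processedAt": "2026-02-26T12:00:00Z",
}

print(json.dumps(embedding_completed, indent=2))
```

Carrying the document and user identifiers forward lets downstream consumers correlate the two events without extra lookups.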
// Workers/EmbeddingWorker.cs
namespace Archives.Workers;
public sealed class EmbeddingWorker : BackgroundService
{
private readonly INatsConnection _nats;
private readonly IServiceScopeFactory _scopeFactory;
private readonly ILogger<EmbeddingWorker> _logger;
private readonly string _environment;
public EmbeddingWorker(
INatsConnection nats,
IServiceScopeFactory scopeFactory,
IConfiguration configuration,
ILogger<EmbeddingWorker> logger)
{
_nats = nats;
_scopeFactory = scopeFactory;
_logger = logger;
_environment = configuration["Environment"] ?? "dev";
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
var js = _nats.CreateJetStreamContext();
var consumer = await js.CreateOrUpdateConsumerAsync(
stream: "DOCUMENTS",
config: new ConsumerConfig
{
DurableName = $"{_environment}-embedding-worker",
FilterSubject = $"{_environment}.archives.documents.analysis.completed",
AckPolicy = ConsumerConfigAckPolicy.Explicit,
AckWait = TimeSpan.FromMinutes(5),
MaxDeliver = 3
},
stoppingToken);
_logger.LogInformation("Embedding worker started");
await foreach (var msg in consumer.ConsumeAsync<AnalysisCompletedEvent>(
cancellationToken: stoppingToken))
{
await ProcessAsync(msg, stoppingToken);
}
}
private async Task ProcessAsync(
NatsJSMsg<AnalysisCompletedEvent> msg,
CancellationToken ct)
{
using var scope = _scopeFactory.CreateScope();
var embeddingService = scope.ServiceProvider.GetRequiredService<IEmbeddingService>();
var chunker = scope.ServiceProvider.GetRequiredService<ITextChunker>();
var vectorStore = scope.ServiceProvider.GetRequiredService<IVectorStore>();
var storage = scope.ServiceProvider.GetRequiredService<IObjectStorage>();
var evt = msg.Data!;
try
{
_logger.LogInformation(
"Processing embeddings for document {DocumentId}",
evt.DocumentId);
// 1. Get document content
var contentPath = $"processed/{evt.DocumentId}/content.md";
var content = await storage.GetObjectAsStringAsync(
evt.Bucket, contentPath, ct);
// 2. Chunk the content
var chunks = chunker.ChunkText(content, new ChunkingOptions
{
MaxTokens = 500,
OverlapTokens = 50,
PreserveMarkdown = true
});
_logger.LogDebug(
"Document {DocumentId} split into {ChunkCount} chunks",
evt.DocumentId,
chunks.Count);
// 3. Generate embeddings
var texts = chunks.Select(c => c.Content).ToList();
var embeddings = await embeddingService.GenerateEmbeddingsAsync(texts, ct);
// 4. Create vector points with metadata
var points = chunks.Zip(embeddings, (chunk, embedding) =>
{
var payload = new Dictionary<string, object>
{
["document_id"] = evt.DocumentId,
["user_id"] = evt.UserId,
["chunk_index"] = chunk.Index,
["content"] = chunk.Content,
["start_offset"] = chunk.StartOffset,
["end_offset"] = chunk.EndOffset
};
return new VectorPoint(Guid.NewGuid(), embedding, payload);
}).ToList();
// 5. Delete existing vectors for this document
await vectorStore.DeleteByDocumentIdAsync(
"documents", evt.DocumentId, ct);
// 6. Upsert new vectors
await vectorStore.UpsertAsync("documents", points, ct);
// 7. Publish completion event
await _nats.PublishAsync(
$"{_environment}.archives.documents.embedding.completed",
new EmbeddingCompletedEvent
{
DocumentId = evt.DocumentId,
Bucket = evt.Bucket,
UserId = evt.UserId,
ChunkCount = chunks.Count,
ProcessedAt = DateTimeOffset.UtcNow
},
cancellationToken: ct);
await msg.AckAsync(cancellationToken: ct);
_logger.LogInformation(
"Embeddings complete for {DocumentId}: {ChunkCount} chunks",
evt.DocumentId,
chunks.Count);
}
catch (Exception ex)
{
_logger.LogError(ex,
"Failed to generate embeddings for {DocumentId}",
evt.DocumentId);
await msg.NakAsync(
delay: TimeSpan.FromMinutes(1),
cancellationToken: ct);
}
}
}
Semantic Search Implementation
Finally, the search service that makes it all useful. At query time, we:
- Embed the search query
- Search Qdrant with user_id filter
- Return matching chunks with scores
// Application/Services/SemanticSearchService.cs
namespace Archives.Application.Services;
public sealed class SemanticSearchService : ISemanticSearchService
{
private readonly IEmbeddingService _embeddingService;
private readonly IVectorStore _vectorStore;
private readonly ILogger<SemanticSearchService> _logger;
public SemanticSearchService(
IEmbeddingService embeddingService,
IVectorStore vectorStore,
ILogger<SemanticSearchService> logger)
{
_embeddingService = embeddingService;
_vectorStore = vectorStore;
_logger = logger;
}
public async Task<IReadOnlyList<SemanticSearchResult>> SearchAsync(
string query,
string userId,
int limit = 10,
CancellationToken ct = default)
{
// Generate query embedding
var queryEmbedding = await _embeddingService.GenerateEmbeddingAsync(query, ct);
// Search vector store
var results = await _vectorStore.SearchAsync(
"documents",
queryEmbedding,
new SearchOptions
{
Limit = limit,
ScoreThreshold = 0.7f,
Filters = new Dictionary<string, object> { ["user_id"] = userId }
},
ct);
// Map to search results
return results.Select(r => new SemanticSearchResult
{
DocumentId = r.Payload["document_id"].ToString()!,
Content = r.Payload["content"].ToString()!,
ChunkIndex = Convert.ToInt32(r.Payload["chunk_index"]),
Score = r.Score
}).ToList();
}
}
public record SemanticSearchResult
{
public required string DocumentId { get; init; }
public required string Content { get; init; }
public required int ChunkIndex { get; init; }
public required float Score { get; init; }
}
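To make the 0.7 threshold concrete: with Distance.Cosine, the score Qdrant returns is the cosine similarity between the query vector and each stored vector. A minimal sketch of the computation (toy 3-component vectors; real embeddings have 768):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b, normalized by their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
close = [0.9, 0.1, 0.8]    # similar direction -> above threshold
far = [-1.0, 1.0, 0.0]     # different direction -> filtered out

for name, vec in [("close", close), ("far", far)]:
    score = cosine_similarity(query, vec)
    print(name, round(score, 3), "kept" if score >= 0.7 else "dropped")
```

Vectors pointing in nearly the same direction score close to 1.0 and pass the threshold; unrelated or opposing vectors fall well below it and are dropped before they reach the caller.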
Service Registration
Wire everything up in dependency injection:
// Program.cs
// Ollama embedding service with timeout for large batches
builder.Services.AddHttpClient<IEmbeddingService, OllamaEmbeddingService>(client =>
{
client.BaseAddress = new Uri(
builder.Configuration["Ollama:Url"]
?? "http://ollama.ai.svc.cluster.local:11434");
client.Timeout = TimeSpan.FromMinutes(2); // Large documents take time
});
// Qdrant vector store - singleton since client is thread-safe
builder.Services.AddSingleton(sp =>
{
var config = sp.GetRequiredService<IConfiguration>();
var host = config["Qdrant:Host"] ?? "qdrant.data-layer.svc.cluster.local";
var port = config.GetValue("Qdrant:GrpcPort", 6334);
return new QdrantClient(host, port);
});
builder.Services.AddSingleton<IVectorStore, QdrantVectorStore>();
// Chunking (stateless) and search (needs DI)
builder.Services.AddSingleton<ITextChunker, SemanticTextChunker>();
builder.Services.AddScoped<ISemanticSearchService, SemanticSearchService>();
Conclusion
We’ve built a complete semantic embedding pipeline:
| Component | Purpose |
|---|---|
| IEmbeddingService | Generate vectors via Ollama |
| ITextChunker | Markdown-aware semantic splitting |
| IVectorStore | Qdrant storage with filtering |
| EmbeddingWorker | Event-driven pipeline orchestration |
| SemanticSearchService | Query-time embedding and search |
Key takeaways:
- Chunk thoughtfully: Respect document structure for better retrieval
- Store metadata: User IDs, document IDs, and chunk content enable filtering and reconstruction
- Delete before upsert: Prevent duplicate vectors when re-processing documents
- Use score thresholds: A 0.7 threshold filters out low-confidence matches
The pipeline processes documents asynchronously, scales horizontally (add more workers), and isolates user data through Qdrant filters.
Next in the series: We’ll build hybrid search that combines semantic similarity with full-text keyword search for the best of both approaches.