Kubernetes Cluster Setup for Agentic AI Workloads

Introduction

Agentic and LLM-heavy systems place heavy demands on infrastructure. To run them reliably, your cluster needs clear boundaries between application workloads, data services, messaging, and platform controls.

Baseline Topology

archives-* namespaces for API, workers, web, and AI services.
data-layer for Postgres, MinIO, Qdrant, and NATS.
ai for model-serving components like Ollama.
monitoring for observability and telemetry pipelines.

Critical Setup Decisions

Use GitOps (base + overlays) as the only deployment path.
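Manage secrets through ExternalSecrets/Infisical flows rather than committing them to Git.
Address services in other namespaces by FQDN (for example, <service>.<namespace>.svc.cluster.local) rather than by short name.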
Validate health checks and endpoint ports per environment.
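
As an illustration of FQDN addressing, the following minimal sketch builds the cluster-internal DNS name of a Service from its name and namespace. It assumes the default cluster domain cluster.local. The service name qdrant is hypothetical (only the data-layer namespace comes from the topology above), and 6333 is Qdrant's default HTTP port, which may differ in your deployment.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Return the cluster-internal DNS name of a Kubernetes Service.

    Kubernetes publishes Services as <service>.<namespace>.svc.<cluster-domain>.
    The default cluster domain is assumed here; pass another one if your
    cluster is configured differently.
    """
    for label in (service, namespace):
        # Service and namespace names are single DNS labels, so a dot or an
        # empty string signals a mistake (for example, passing an FQDN twice).
        if not label or "." in label:
            raise ValueError(f"invalid DNS label: {label!r}")
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service: str, namespace: str, port: int,
                scheme: str = "http") -> str:
    """Build a URL that resolves from any namespace in the cluster."""
    return f"{scheme}://{service_fqdn(service, namespace)}:{port}"


if __name__ == "__main__":
    # Hypothetical service name; the namespace is taken from the baseline topology.
    print(service_url("qdrant", "data-layer", 6333))
```

A short name such as qdrant only resolves from inside its own namespace, which is why a workload in an archives-* namespace needs the full form to reach the data layer.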

Operational Lessons from Recent Changes

The GraphRAG service deployment needed overlay and secret fixes before it stabilized.
Incorrect service ports and OTEL endpoints caused avoidable runtime failures.
CI integration for new services is required from day one to avoid drift.
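
One way to catch the port and OTEL endpoint mistakes described above before they reach a cluster is a small check that runs in CI. The sketch below is an assumption about how such a check might look, not the pipeline actually used here. The collector name otel-collector, the ENVIRONMENTS table, and the cluster.local domain are hypothetical. The port table uses the default OTLP receiver ports (4317 for gRPC, 4318 for HTTP), which you should adjust if your collector listens elsewhere.

```python
from urllib.parse import urlsplit

# Default OTLP receiver ports. A collector can be configured to listen on
# other ports, so treat this table as an assumption to adjust.
OTLP_DEFAULT_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def check_otlp_endpoint(endpoint: str, protocol: str) -> list[str]:
    """Return a list of problems found; an empty list means the endpoint passed."""
    problems = []
    parts = urlsplit(endpoint)

    if parts.scheme not in ("http", "https"):
        problems.append(f"{endpoint}: expected an http:// or https:// URL")
        return problems

    host = parts.hostname or ""
    if not host.endswith(".svc.cluster.local"):
        problems.append(f"{endpoint}: host is not a cluster FQDN")

    try:
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        port = None

    expected = OTLP_DEFAULT_PORTS.get(protocol)
    if expected is None:
        problems.append(f"{endpoint}: unknown OTLP protocol {protocol!r}")
    elif port != expected:
        problems.append(
            f"{endpoint}: port {port} does not match {protocol} default {expected}"
        )
    return problems


# Hypothetical per-environment settings, as a CI job might load them from overlays.
ENVIRONMENTS = {
    "staging": ("http://otel-collector.monitoring.svc.cluster.local:4317", "grpc"),
    "production": ("http://otel-collector.monitoring.svc.cluster.local:4318", "grpc"),
}


def check_all(environments: dict) -> dict:
    """Run the endpoint check for every environment and collect the results."""
    return {
        env: check_otlp_endpoint(url, protocol)
        for env, (url, protocol) in environments.items()
    }


if __name__ == "__main__":
    for env, problems in check_all(ENVIRONMENTS).items():
        print(env, "OK" if not problems else problems)
```

In this made-up table, the production entry pairs the gRPC protocol with the HTTP port, so the check reports it while staging passes. Failing the CI job on a non-empty result keeps a new service from drifting out of line with the environments it deploys to.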

Conclusion

Cluster setup is the main determinant of reliability for agentic systems. Start with strong namespace boundaries, declarative deployment, and explicit secrets/runtime wiring.

Related reading:

/gitops-flux-cd-introduction/
/graphrag-gitops-kustomize-externalsecrets/