Running PostgreSQL on Kubernetes with CloudNative-PG
Deploy a production-ready PostgreSQL cluster on Kubernetes using CloudNative-PG, with automated backups, failover, and multi-database support.
CloudNative-PG (CNPG) is a Kubernetes operator that manages the full lifecycle of PostgreSQL clusters. It provides high availability, automated failover, backup/recovery, and connection pooling—all with declarative configuration.
flowchart TD
subgraph "CNPG Cluster"
OP["CNPG Operator"] -->|"manages"| PRI["Primary"]
OP -->|"manages"| STB1["Standby 1"]
OP -->|"manages"| STB2["Standby 2"]
PRI -->|"streaming replication"| STB1
PRI -->|"streaming replication"| STB2
end
PRI -->|"WAL archive"| MINIO["MinIO\n(Backups)"]
PGB["PgBouncer\n(Connection Pool)"] --> PRI
APP["Archives API"] --> PGB
WORK["Workers"] --> PGB
style OP fill:#1a2744,stroke:#6366f1,color:#e2e8f0
style PRI fill:#1a2744,stroke:#22c55e,color:#e2e8f0
style STB1 fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
style STB2 fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
style MINIO fill:#1a2744,stroke:#a855f7,color:#e2e8f0
Why CNPG?
| Feature | CNPG | StatefulSet | Helm Chart |
|---|---|---|---|
| Automated failover | ✅ | ❌ | Varies |
| Point-in-time recovery | ✅ | ❌ | ❌ |
| Connection pooling | ✅ Built-in | Manual | Optional |
| Backup to S3/GCS | ✅ | Manual | Manual |
| Rolling updates | ✅ | Basic | Basic |
| Declarative config | ✅ | ✅ | ❌ |
Installing CNPG Operator
# Add the CNPG Helm repository
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
# Install the operator
helm install cnpg cnpg/cloudnative-pg \
--namespace cnpg-system \
--create-namespace
Verify installation:
kubectl get deployment -n cnpg-system
# NAME READY AGE
# cnpg-cloudnative-pg 1/1 2m
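You can also confirm that the operator's custom resource definitions were registered (the exact list varies slightly by CNPG version):

# The CNPG CRDs all live under the cnpg.io API group
kubectl get crds | grep cnpg.io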
Creating a PostgreSQL Cluster
Basic Cluster
# infrastructure/data-layer/postgres/cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres
namespace: data-layer
spec:
instances: 3
postgresql:
parameters:
max_connections: "200"
shared_buffers: "256MB"
effective_cache_size: "768MB"
maintenance_work_mem: "128MB"
checkpoint_completion_target: "0.9"
wal_buffers: "8MB"
random_page_cost: "1.1"
effective_io_concurrency: "200"
work_mem: "4MB"
min_wal_size: "1GB"
max_wal_size: "4GB"
bootstrap:
initdb:
database: postgres
owner: postgres
storage:
size: 20Gi
storageClass: local-path
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "1"
Multi-Database Configuration
For our project, we create environment-specific databases in one cluster. Because postInitSQL runs during bootstrap, before the managed roles below are reconciled, the staging and prod roles are created there ahead of the databases they own:
# infrastructure/data-layer/postgres/cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres
namespace: data-layer
spec:
instances: 3
postgresql:
parameters:
max_connections: "200"
shared_buffers: "256MB"
bootstrap:
initdb:
database: myapp_dev
owner: myapp_dev
postInitSQL:
- CREATE ROLE myapp_staging LOGIN;
- CREATE ROLE myapp_prod LOGIN;
- CREATE DATABASE myapp_staging OWNER myapp_staging;
- CREATE DATABASE myapp_prod OWNER myapp_prod;
postInitApplicationSQL:
- GRANT ALL PRIVILEGES ON DATABASE myapp_dev TO myapp_dev;
- GRANT ALL PRIVILEGES ON DATABASE myapp_staging TO myapp_staging;
- GRANT ALL PRIVILEGES ON DATABASE myapp_prod TO myapp_prod;
managed:
roles:
- name: myapp_dev
ensure: present
login: true
passwordSecret:
name: postgres-myapp-dev-password
- name: myapp_staging
ensure: present
login: true
passwordSecret:
name: postgres-myapp-staging-password
- name: myapp_prod
ensure: present
login: true
passwordSecret:
name: postgres-myapp-prod-password
storage:
size: 50Gi
storageClass: local-path
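Once the cluster is running, a quick sanity check from the primary pod (assumed here to be postgres-1) confirms the databases and roles were created:

# List databases and roles as the local superuser
kubectl exec -it postgres-1 -n data-layer -c postgres -- psql -U postgres -c '\l'
kubectl exec -it postgres-1 -n data-layer -c postgres -- psql -U postgres -c '\du'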
Secrets for Roles
Create secrets for each database user. CNPG expects a role's passwordSecret to be of type kubernetes.io/basic-auth with both username and password keys, so each ExternalSecret renders that shape through a template:

# infrastructure/data-layer/postgres/secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-myapp-dev-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-myapp-dev-password
    template:
      type: kubernetes.io/basic-auth
      data:
        username: myapp_dev
        password: "{{ .password }}"
  data:
    - secretKey: password
      remoteRef:
        key: APP_DEV_DB_PASSWORD
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-myapp-staging-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-myapp-staging-password
    template:
      type: kubernetes.io/basic-auth
      data:
        username: myapp_staging
        password: "{{ .password }}"
  data:
    - secretKey: password
      remoteRef:
        key: APP_STAGING_DB_PASSWORD
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-myapp-prod-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-myapp-prod-password
    template:
      type: kubernetes.io/basic-auth
      data:
        username: myapp_prod
        password: "{{ .password }}"
  data:
    - secretKey: password
      remoteRef:
        key: APP_PROD_DB_PASSWORD
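Before the cluster tries to use these secrets, it's worth confirming that External Secrets has synced them:

# Each entry should report a synced, ready status
kubectl get externalsecrets -n data-layer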
Backup Configuration
S3-Compatible Backup (MinIO)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres
namespace: data-layer
spec:
# ... previous configuration ...
backup:
barmanObjectStore:
destinationPath: s3://postgres-backups/
endpointURL: http://minio.data-layer.svc.cluster.local:9000
s3Credentials:
accessKeyId:
name: minio-credentials
key: access-key
secretAccessKey:
name: minio-credentials
key: secret-key
wal:
compression: gzip
maxParallel: 2
data:
compression: gzip
immediateCheckpoint: true
retentionPolicy: "30d"
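The minio-credentials secret referenced above is assumed to already exist in the data-layer namespace. If you are not managing it through External Secrets, a minimal sketch with placeholder values:

kubectl create secret generic minio-credentials \
  --namespace data-layer \
  --from-literal=access-key=REPLACE_ME \
  --from-literal=secret-key=REPLACE_ME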
Scheduled Backups
# infrastructure/data-layer/postgres/scheduled-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
name: postgres-daily
namespace: data-layer
spec:
schedule: "0 2 * * *" # 2 AM daily
backupOwnerReference: self
cluster:
name: postgres
target: prefer-standby # Don't impact primary
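For an ad-hoc backup outside the schedule (before a risky migration, for example), CNPG also offers a one-shot Backup resource; a sketch with a hypothetical name:

# on-demand-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgres-manual
  namespace: data-layer
spec:
  cluster:
    name: postgres

Apply it and follow progress with kubectl get backup -n data-layer.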
Connection Pooling with PgBouncer
CNPG includes optional PgBouncer pooling:
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
name: postgres-pooler
namespace: data-layer
spec:
cluster:
name: postgres
instances: 2
type: rw # Read-write pooler
pgbouncer:
poolMode: transaction
parameters:
max_client_conn: "1000"
default_pool_size: "25"
Applications connect through the pooler's service, which takes the name of the Pooler resource:
postgres-pooler.data-layer.svc.cluster.local:5432
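Routing an application through the pooler only changes the host; the staging connection string shown later in this post would become, as a sketch:

Host=postgres-pooler.data-layer.svc.cluster.local;Port=5432;Database=myapp_staging;Username=myapp_staging

Keep in mind that transaction pooling gives up session-level features such as LISTEN/NOTIFY and session advisory locks in exchange for much higher connection density.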
Monitoring
Exposing Metrics
CNPG exposes Prometheus metrics from every instance on port 9187. The metrics port lives on the pods rather than on the CNPG services, so create a PodMonitor:

# infrastructure/data-layer/postgres/podmonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: postgres-metrics
  namespace: data-layer
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: postgres
  podMetricsEndpoints:
    - port: metrics
      interval: 30s
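If you'd rather not maintain the PodMonitor yourself, CNPG can create one per cluster whenever the Prometheus Operator CRDs are installed; this is added to the Cluster spec from earlier:

spec:
  monitoring:
    enablePodMonitor: true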
Key Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| cnpg_pg_replication_lag | Replication lag behind primary (seconds) | > 10 s |
| cnpg_pg_stat_archiver_archived_count | WAL files archived | Stalled counter |
| cnpg_pg_database_size_bytes | Database size | Near storage limit |
| cnpg_pg_stat_activity_count | Active connections | > 80% of max_connections |
High Availability
Failover Behavior
CNPG automatically handles failover:
1. The primary becomes unavailable.
2. CNPG detects the failure (configurable timeout).
3. The most up-to-date standby is promoted.
4. Services are updated to point to the new primary.
5. The old primary rejoins as a standby once it recovers.
Testing Failover
# Delete the primary pod (CNPG will failover)
kubectl delete pod postgres-1 -n data-layer
# Watch failover
kubectl get pods -n data-layer -w
# Check cluster status
kubectl get cluster postgres -n data-layer -o yaml
Accessing from Applications
Service Discovery
CNPG creates services for each role:
| Service | Purpose |
|---|---|
| postgres-rw | Read-write (always the current primary) |
| postgres-ro | Read-only (standbys only) |
| postgres-r | Any instance, primary included (reads) |
Connection String Configuration
# apps/myapp-api/staging/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: myapp-api-config
namespace: myapp-staging
data:
ConnectionStrings__AppDb: >-
Host=postgres-rw.data-layer.svc.cluster.local;
Port=5432;
Database=myapp_staging;
Username=myapp_staging;
Include Error Detail=true;
SSL Mode=Disable
The ConfigMap value above omits the password. In the deployment, the full connection string (password included) is injected from a secret, overriding the ConfigMap entry:
# apps/myapp-api/staging/deployment.yaml
env:
- name: ConnectionStrings__AppDb
valueFrom:
secretKeyRef:
name: myapp-api-secrets
key: DATABASE_CONNECTION_STRING
.NET Configuration
// Program.cs
builder.Services.AddDbContext<AppDbContext>(options =>
{
var connectionString = builder.Configuration
.GetConnectionString("AppDb");
options.UseNpgsql(connectionString, npgsql =>
{
npgsql.EnableRetryOnFailure(
maxRetryCount: 5,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorCodesToAdd: null);
npgsql.CommandTimeout(30);
});
});
Local Development Access
Expose PostgreSQL via NodePort for local development:
# infrastructure/data-layer/postgres/nodeport.yaml
apiVersion: v1
kind: Service
metadata:
name: postgres-nodeport
namespace: data-layer
spec:
type: NodePort
selector:
cnpg.io/cluster: postgres
role: primary
ports:
- port: 5432
targetPort: 5432
nodePort: 30432
Connection via Tailscale:
Host=192.168.0.6;Port=30432;Database=myapp_dev;Username=myapp_dev;Password=xxx
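If you'd rather not expose a NodePort at all, a temporary port-forward against the -rw service covers occasional local access:

# Forward local port 5432 to whichever instance is currently primary
kubectl port-forward svc/postgres-rw 5432:5432 -n data-layer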
Point-in-Time Recovery
Recover to a specific timestamp:
# recovery-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: postgres-recovery
namespace: data-layer
spec:
instances: 1
bootstrap:
recovery:
source: postgres
recoveryTarget:
targetTime: "2026-01-30 10:00:00+00"
externalClusters:
- name: postgres
barmanObjectStore:
destinationPath: s3://postgres-backups/
endpointURL: http://minio.data-layer.svc.cluster.local:9000
s3Credentials:
accessKeyId:
name: minio-credentials
key: access-key
secretAccessKey:
name: minio-credentials
key: secret-key
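The recovery cluster restores the latest base backup, replays WAL up to the target time, and then runs as an independent cluster; apply and monitor it like any other:

kubectl apply -f recovery-cluster.yaml
kubectl get cluster postgres-recovery -n data-layer -w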
Maintenance Operations
Rolling Updates
# Update PostgreSQL version
kubectl patch cluster postgres -n data-layer \
--type merge \
-p '{"spec":{"imageName":"ghcr.io/cloudnative-pg/postgresql:16.2"}}'
CNPG performs the update as a rolling restart, updating the standbys first and the primary last.
Manual Switchover
# Promote a standby
kubectl cnpg promote postgres postgres-2 -n data-layer
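The kubectl cnpg commands used here come from the CNPG kubectl plugin; if it isn't installed yet, krew is the usual route (assuming krew itself is set up):

kubectl krew install cnpg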
Check Cluster Status
kubectl cnpg status postgres -n data-layer
Output:
Cluster Summary
Name: postgres
Namespace: data-layer
PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance: postgres-1
Status: Cluster in healthy state
Instances: 3
Ready instances: 3
Instances status
Name Database Size Current LSN Replication role Status QoS
---- ------------- ----------- ---------------- ------ ---
postgres-1 1.5 GB 0/A000060 Primary OK Guaranteed
postgres-2 1.5 GB 0/A000060 Standby (sync) OK Guaranteed
postgres-3 1.5 GB 0/A000060 Standby (async) OK Guaranteed
Summary
We’ve deployed a production-ready PostgreSQL cluster with CNPG:
- 3-instance HA cluster with automatic failover
- Multi-database setup for environment isolation
- Automated backups to S3-compatible storage
- Connection pooling with PgBouncer
- Monitoring with Prometheus metrics
- Point-in-time recovery capability
CNPG transforms PostgreSQL management from complex operations into declarative Kubernetes resources.
The most valuable lesson from running CNPG in production was the importance of testing failover before you need it. We scheduled monthly failover drills and discovered that our connection pooler configuration needed tuning — PgBouncer’s default pool_mode=session caused a 15-second connection storm after failover. Switching to transaction mode eliminated the issue entirely.
Next Steps
- PostgreSQL Security Design — harden your CNPG cluster with row-level security and audit logging.
- External Secrets with Infisical — manage database credentials through Infisical instead of static secrets.
- Infrastructure Performance: SSD vs NAS — choose the right storage backend for your PostgreSQL data.