Recipe · Advanced · 20 min

Running PostgreSQL on Kubernetes with CloudNative-PG

Deploy a production-ready PostgreSQL cluster on Kubernetes using CloudNative-PG, with automated backups, failover, and multi-database support.

By Victor Robin

CloudNative-PG (CNPG) is a Kubernetes operator that manages the full lifecycle of PostgreSQL clusters. It provides high availability, automated failover, backup/recovery, and connection pooling—all with declarative configuration.

flowchart TD
    subgraph "CNPG Cluster"
        OP["CNPG Operator"] -->|"manages"| PRI["Primary"]
        OP -->|"manages"| STB1["Standby 1"]
        OP -->|"manages"| STB2["Standby 2"]
        PRI -->|"streaming replication"| STB1
        PRI -->|"streaming replication"| STB2
    end

    PRI -->|"WAL archive"| MINIO["MinIO\n(Backups)"]
    PGB["PgBouncer\n(Connection Pool)"] --> PRI
    APP["Archives API"] --> PGB
    WORK["Workers"] --> PGB

    style OP fill:#1a2744,stroke:#6366f1,color:#e2e8f0
    style PRI fill:#1a2744,stroke:#22c55e,color:#e2e8f0
    style STB1 fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
    style STB2 fill:#1a2744,stroke:#f59e0b,color:#e2e8f0
    style MINIO fill:#1a2744,stroke:#a855f7,color:#e2e8f0

Why CNPG?

Feature                  CNPG         StatefulSet  Helm Chart
Automated failover       ✅           ❌           Varies
Point-in-time recovery   ✅           ❌           ❌
Connection pooling       ✅ Built-in  Manual       Optional
Backup to S3/GCS         ✅           Manual       Manual
Rolling updates          ✅           Basic        Basic
Declarative config       ✅           ❌           ❌
[CloudNative-PG] — EDB

Installing CNPG Operator

# Add the CNPG Helm repository
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update

# Install the operator
helm install cnpg cnpg/cloudnative-pg \
  --namespace cnpg-system \
  --create-namespace

Verify installation:

kubectl get deployment -n cnpg-system
# NAME                          READY   AGE
# cnpg-cloudnative-pg          1/1     2m
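It is also worth confirming that the operator's CRDs were registered, since every resource in the rest of this recipe depends on them:

```shell
# Confirm the CNPG CRDs used below are registered
kubectl get crd clusters.postgresql.cnpg.io \
  backups.postgresql.cnpg.io \
  scheduledbackups.postgresql.cnpg.io \
  poolers.postgresql.cnpg.io
```

If any of these are missing, the Cluster manifests in the next section will be rejected by the API server.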

Creating a PostgreSQL Cluster

Basic Cluster

# infrastructure/data-layer/postgres/cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: data-layer
spec:
  instances: 3
  
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
      effective_cache_size: "768MB"
      maintenance_work_mem: "128MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "8MB"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
      work_mem: "4MB"
      min_wal_size: "1GB"
      max_wal_size: "4GB"
  
  bootstrap:
    initdb:
      database: postgres
      owner: postgres
  
  storage:
    size: 20Gi
    storageClass: local-path
  
  resources:
    requests:
      memory: "512Mi"
      cpu: "250m"
    limits:
      memory: "1Gi"
      cpu: "1"
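The memory parameters above follow common PostgreSQL sizing heuristics relative to the container limit (shared_buffers around 25% of RAM, effective_cache_size around 75%); they are starting points, not CNPG defaults. A minimal sketch of that arithmetic, assuming the 1Gi limit from the manifest:

```shell
# Derive the manifest's memory parameters from a 1Gi container limit.
# The 25% / 75% ratios are conventional heuristics, not CNPG defaults.
mem_mb=1024                                 # 1Gi limit expressed in MB
shared_buffers=$(( mem_mb / 4 ))            # ~25% of RAM -> 256
effective_cache_size=$(( mem_mb * 3 / 4 ))  # ~75% of RAM -> 768
echo "shared_buffers=${shared_buffers}MB effective_cache_size=${effective_cache_size}MB"
```

If you raise the memory limit, scale these values with it rather than leaving them fixed.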

Multi-Database Configuration

For our project, we create environment-specific databases in one cluster:

# infrastructure/data-layer/postgres/cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: data-layer
spec:
  instances: 3
  
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  
  bootstrap:
    initdb:
      database: myapp_dev
      owner: myapp_dev
      postInitSQL:
        - CREATE DATABASE myapp_staging OWNER myapp_staging;
        - CREATE DATABASE myapp_prod OWNER myapp_prod;
      postInitApplicationSQL:
        - GRANT ALL PRIVILEGES ON DATABASE myapp_dev TO myapp_dev;
        - GRANT ALL PRIVILEGES ON DATABASE myapp_staging TO myapp_staging;
        - GRANT ALL PRIVILEGES ON DATABASE myapp_prod TO myapp_prod;
  
  managed:
    roles:
      - name: myapp_dev
        ensure: present
        login: true
        passwordSecret:
          name: postgres-archives-dev-password
      - name: myapp_staging
        ensure: present
        login: true
        passwordSecret:
          name: postgres-myapp-staging-password
      - name: myapp_prod
        ensure: present
        login: true
        passwordSecret:
          name: postgres-myapp-prod-password
  
  storage:
    size: 50Gi
    storageClass: local-path
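After applying the manifest, you can check that all three databases were created. This sketch assumes the kubectl-cnpg plugin is installed; its psql subcommand runs psql against the primary:

```shell
kubectl apply -f infrastructure/data-layer/postgres/cluster.yaml

# List databases on the primary (requires the kubectl-cnpg plugin)
kubectl cnpg psql postgres -n data-layer -- -c '\l'
```

You should see myapp_dev, myapp_staging, and myapp_prod alongside the template databases.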

Secrets for Roles

Create secrets for each database user:

# infrastructure/data-layer/postgres/secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-archives-dev-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-archives-dev-password
  data:
    - secretKey: password
      remoteRef:
        key: APP_DEV_DB_PASSWORD
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-myapp-staging-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-myapp-staging-password
  data:
    - secretKey: password
      remoteRef:
        key: APP_STAGING_DB_PASSWORD
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-myapp-prod-password
  namespace: data-layer
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: infisical-store
  target:
    name: postgres-myapp-prod-password
  data:
    - secretKey: password
      remoteRef:
        key: APP_PROD_DB_PASSWORD

Backup Configuration

S3-Compatible Backup (MinIO)

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: data-layer
spec:
  # ... previous configuration ...
  
  backup:
    barmanObjectStore:
      destinationPath: s3://postgres-backups/
      endpointURL: http://minio.data-layer.svc.cluster.local:9000
      s3Credentials:
        accessKeyId:
          name: minio-credentials
          key: access-key
        secretAccessKey:
          name: minio-credentials
          key: secret-key
      wal:
        compression: gzip
        maxParallel: 2
      data:
        compression: gzip
        immediateCheckpoint: true
    retentionPolicy: "30d"
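With the object store configured, it is worth taking one on-demand backup to validate the whole pipeline before relying on a schedule. A sketch using the kubectl-cnpg plugin (creating a Backup resource directly works as well):

```shell
# Trigger an on-demand backup (requires the kubectl-cnpg plugin)
kubectl cnpg backup postgres -n data-layer

# Inspect its progress and final phase
kubectl get backup -n data-layer
```

A backup stuck outside the completed phase usually points at MinIO credentials or the endpointURL.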

Scheduled Backups

# infrastructure/data-layer/postgres/scheduled-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: postgres-daily
  namespace: data-layer
spec:
  schedule: "0 0 2 * * *"  # 2 AM daily (CNPG cron expressions have six fields, with seconds first)
  backupOwnerReference: self
  cluster:
    name: postgres
  target: prefer-standby  # Don't impact primary

Connection Pooling with PgBouncer

CNPG includes optional PgBouncer pooling:

apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: postgres-pooler
  namespace: data-layer
spec:
  cluster:
    name: postgres
  instances: 2
  type: rw  # Read-write pooler
  
  pgbouncer:
    poolMode: transaction
    parameters:
      max_client_conn: "1000"
      default_pool_size: "25"

Access via the pooler service:

postgres-pooler.data-layer.svc.cluster.local:5432
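A quick way to verify pooling end to end is a throwaway psql pod pointed at the pooler service. The image tag and the placeholder password here are illustrative:

```shell
# One-off client pod; connects through PgBouncer rather than directly to the primary
kubectl run psql-client --rm -it --restart=Never -n data-layer \
  --image=postgres:16 --env="PGPASSWORD=xxx" -- \
  psql -h postgres-pooler -U myapp_dev -d myapp_dev -c 'SELECT 1;'
```

With poolMode: transaction, avoid session-level features (advisory locks, LISTEN/NOTIFY, session temp tables) in application code, since connections are shared per transaction.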

Monitoring

Exposing Metrics

CNPG exposes Prometheus metrics automatically. Create a ServiceMonitor:

# infrastructure/data-layer/postgres/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
  namespace: data-layer
spec:
  selector:
    matchLabels:
      cnpg.io/cluster: postgres
  endpoints:
    - port: metrics
      interval: 30s
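To confirm metrics are flowing before wiring up dashboards, you can port-forward an instance's metrics port (CNPG serves Prometheus metrics on 9187) and sample it:

```shell
# Forward the metrics port of one instance and sample the CNPG metrics
kubectl port-forward pod/postgres-1 9187:9187 -n data-layer &
PF_PID=$!
sleep 2
curl -s http://localhost:9187/metrics | grep '^cnpg_' | head
kill "$PF_PID"
```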

Key Metrics

Metric                                Description               Alert Threshold
cnpg_pg_replication_lag               Replication lag in bytes  > 10MB
cnpg_pg_stat_archiver_archived_count  WAL files archived        Stalled
cnpg_pg_database_size_bytes           Database size             Near storage limit
cnpg_pg_stat_activity_count           Active connections        > 80% of max

High Availability

Failover Behavior

CNPG automatically handles failover:

  1. Primary becomes unavailable
  2. CNPG detects failure (configurable timeout)
  3. Most up-to-date standby is promoted
  4. Services updated to point to new primary
  5. Old primary rejoins as standby when recovered

Testing Failover

# Delete the primary pod (CNPG will failover)
kubectl delete pod postgres-1 -n data-layer

# Watch failover
kubectl get pods -n data-layer -w

# Check cluster status
kubectl get cluster postgres -n data-layer -o yaml

Accessing from Applications

Service Discovery

CNPG creates services for each role:

Service       Purpose
postgres-rw   Read-write (primary)
postgres-ro   Read-only (standbys)
postgres-r    Any instance (for reads)

Connection String Configuration

# apps/myapp-api/staging/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-api-config
  namespace: myapp-staging
data:
  ConnectionStrings__AppDb: >-
    Host=postgres-rw.data-layer.svc.cluster.local;
    Port=5432;
    Database=myapp_staging;
    Username=myapp_staging;
    Include Error Detail=true;
    SSL Mode=Disable

Alternatively, inject the complete connection string (including the password) from a secret:

# apps/myapp-api/staging/deployment.yaml
env:
  - name: ConnectionStrings__AppDb
    valueFrom:
      secretKeyRef:
        name: archives-secrets
        key: DATABASE_CONNECTION_STRING

.NET Configuration

// Program.cs
builder.Services.AddDbContext<AppDbContext>(options =>
{
    var connectionString = builder.Configuration
        .GetConnectionString("AppDb");
    
    options.UseNpgsql(connectionString, npgsql =>
    {
        npgsql.EnableRetryOnFailure(
            maxRetryCount: 5,
            maxRetryDelay: TimeSpan.FromSeconds(30),
            errorCodesToAdd: null);
        
        npgsql.CommandTimeout(30);
    });
});

Local Development Access

Expose PostgreSQL via NodePort for local development:

# infrastructure/data-layer/postgres/nodeport.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-nodeport
  namespace: data-layer
spec:
  type: NodePort
  selector:
    cnpg.io/cluster: postgres
    role: primary
  ports:
    - port: 5432
      targetPort: 5432
      nodePort: 30432

Connection via Tailscale:

Host=192.168.0.6;Port=30432;Database=myapp_dev;Username=myapp_dev;Password=xxx
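From a machine on the tailnet, the same parameters work with plain psql. The node IP and the placeholder password come from the snippet above:

```shell
# Connect from a dev machine over Tailscale to the NodePort service
PGPASSWORD=xxx psql "host=192.168.0.6 port=30432 dbname=myapp_dev user=myapp_dev"
```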

Point-in-Time Recovery

Recover to a specific timestamp:

# recovery-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-recovery
  namespace: data-layer
spec:
  instances: 1
  
  bootstrap:
    recovery:
      source: postgres
      recoveryTarget:
        targetTime: "2026-01-30 10:00:00+00"
  
  externalClusters:
    - name: postgres
      barmanObjectStore:
        destinationPath: s3://postgres-backups/
        endpointURL: http://minio.data-layer.svc.cluster.local:9000
        s3Credentials:
          accessKeyId:
            name: minio-credentials
            key: access-key
          secretAccessKey:
            name: minio-credentials
            key: secret-key
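Applying this manifest spins up a separate cluster restored from the archive, leaving the original untouched; once it is healthy you can inspect the recovered data before pointing applications at it:

```shell
kubectl apply -f recovery-cluster.yaml

# Watch the restore and the recovered instance come up
kubectl get pods -n data-layer -l cnpg.io/cluster=postgres-recovery -w
```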

Maintenance Operations

Rolling Updates

# Update PostgreSQL version
kubectl patch cluster postgres -n data-layer \
  --type merge \
  -p '{"spec":{"imageName":"ghcr.io/cloudnative-pg/postgresql:16.2"}}'

CNPG performs rolling updates automatically.

Manual Switchover

# Promote a standby
kubectl cnpg promote postgres postgres-2 -n data-layer

Check Cluster Status

kubectl cnpg status postgres -n data-layer

Output:

Cluster Summary
Name:               postgres
Namespace:          data-layer
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:16.2
Primary instance:   postgres-1
Status:             Cluster in healthy state
Instances:          3
Ready instances:    3

Instances status
Name        Database Size  Current LSN  Replication role  Status  QoS
----        -------------  -----------  ----------------  ------  ---
postgres-1  1.5 GB         0/A000060    Primary           OK      Guaranteed
postgres-2  1.5 GB         0/A000060    Standby (sync)    OK      Guaranteed
postgres-3  1.5 GB         0/A000060    Standby (async)   OK      Guaranteed

Summary

We’ve deployed a production-ready PostgreSQL cluster with CNPG:

  • 3-instance HA cluster with automatic failover
  • Multi-database setup for environment isolation
  • Automated backups to S3-compatible storage
  • Connection pooling with PgBouncer
  • Monitoring with Prometheus metrics
  • Point-in-time recovery capability

CNPG transforms PostgreSQL management from complex operations into declarative Kubernetes resources.

The most valuable lesson from running CNPG in production was the importance of testing failover before you need it. We scheduled monthly failover drills and discovered that our connection pooler configuration needed tuning — PgBouncer’s default pool_mode=session caused a 15-second connection storm after failover. Switching to transaction mode eliminated the issue entirely.

Further Reading

[CloudNative-PG Documentation] — CloudNative-PG Community, 2024
[PostgreSQL High Availability with Kubernetes] — EDB, 2024
[PostgreSQL Documentation] — The PostgreSQL Global Development Group, 2024