
Health Checks and Readiness Probes in .NET

Implement comprehensive health checks for .NET applications with database, cache, and dependency monitoring for Kubernetes deployments.

By Victor Robin

When I first wired up health checks for our .NET API running on Kubernetes, I made the classic mistake of putting all dependency checks into the liveness probe. The result was catastrophic: a brief Redis connection hiccup caused Kubernetes to restart every API pod simultaneously, turning a minor blip into a full outage. That incident taught me the critical distinction between liveness and readiness probes the hard way. Liveness should only answer “is the process alive?” while readiness answers “can this pod handle traffic right now?” Getting this separation right eliminated our health check flapping problem entirely and gave us stable, predictable behavior during transient dependency failures.

Introduction

In a containerized environment like Kubernetes, the orchestrator needs to know the exact state of your application to manage traffic routing and restarts. Health checks enable Kubernetes to determine when your application is ready to receive traffic (Readiness) and when it needs to be restarted (Liveness).

This guide covers implementing comprehensive health monitoring in .NET, ensuring your services are observable and resilient.

[Health checks in ASP.NET Core], Microsoft, 2024-11-01
[Configure Liveness, Readiness and Startup Probes], Kubernetes Authors, 2024-06-15

What We’ll Build

  1. Liveness & Readiness Probes: Separate checks for startup, liveness, and readiness.
  2. Infrastructure Checks: Library-provided checks for PostgreSQL and Redis, plus custom checks for NATS, MinIO, and Qdrant.
  3. Health Dashboard: A graphical UI to visualize the state of your cluster dependencies.

Architecture Overview

flowchart LR
    Kubelet["☸️ Kubelet"] -->|Probes /health/live| API["🚀 API Pod"]
    LB["⚖️ Load Balancer"] -.->|Traffic when /health/ready passes| API
    
    subgraph Checks["🏥 Health Checks"]
        API --> DB[(SQL)]
        API --> Redis[(Cache)]
        API --> Msg[(NATS)]
    end

    classDef primary fill:#7c3aed,color:#fff
    classDef secondary fill:#06b6d4,color:#fff
    classDef db fill:#f43f5e,color:#fff
    classDef warning fill:#fbbf24,color:#000

    class API primary
    class LB secondary
    class DB,Redis,Msg db
    class Kubelet warning

Implementation

Basic Setup

Service Registration

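// Program.cs
// AddNpgSql and AddRedis come from the AspNetCore.HealthChecks.NpgSql and
// AspNetCore.HealthChecks.Redis NuGet packages (Xabaril).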
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    
    // Database
    .AddNpgSql(
        builder.Configuration.GetConnectionString("AppDb")!,
        name: "postgresql",
        tags: ["ready", "db"])
    
    // Redis
    .AddRedis(
        builder.Configuration.GetConnectionString("Redis")!,
        name: "redis",
        tags: ["ready", "cache"])
    
    // Custom checks
    .AddCheck<NatsHealthCheck>("nats", tags: ["ready", "messaging"])
    .AddCheck<MinioHealthCheck>("minio", tags: ["ready", "storage"])
    .AddCheck<QdrantHealthCheck>("qdrant", tags: ["ready", "search"]);

Endpoint Mapping

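// Program.cs
// HealthCheckOptions lives in Microsoft.AspNetCore.Diagnostics.HealthChecks.
// The response writers are static methods on HealthCheckResponseWriter (see Response Writers).
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    // Liveness: only checks tagged "live", no external dependencies
    Predicate = check => check.Tags.Contains("live"),
    ResponseWriter = HealthCheckResponseWriter.WriteMinimalResponse
});

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    // Readiness: every dependency check tagged "ready"
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteDetailedResponse
});

// No predicate: runs every registered check
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = HealthCheckResponseWriter.WriteDetailedResponse
});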
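One detail the endpoint mappings leave implicit is how a report's status becomes an HTTP status code, which is what the kubelet actually evaluates. By default ASP.NET Core returns 200 for Healthy and Degraded and 503 for Unhealthy. The sketch below spells that mapping out through `HealthCheckOptions.ResultStatusCodes` so it can be changed deliberately. The `ready-placeholder` check is hypothetical and only stands in for real dependency checks.

```csharp
// Program.cs (minimal sketch; assumes the ASP.NET Core web SDK with implicit usings)
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    // Hypothetical stand-in for a real dependency check.
    .AddCheck("ready-placeholder", () => HealthCheckResult.Degraded("slow"), tags: ["ready"]);

var app = builder.Build();

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        // Degraded still returns 200, so the pod keeps receiving traffic.
        [HealthStatus.Degraded] = StatusCodes.Status200OK,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

app.Run();
```

If a degraded dependency should take the pod out of rotation, mapping `HealthStatus.Degraded` to 503 here is one way to do it, at the cost of more readiness flapping.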

Custom Health Checks

[AspNetCore.Diagnostics.HealthChecks], Xabaril, 2024-03-01

NATS Health Check

// Infrastructure/HealthChecks/NatsHealthCheck.cs
public sealed class NatsHealthCheck : IHealthCheck
{
    private readonly INatsConnection _connection;
    private readonly ILogger<NatsHealthCheck> _logger;

    public NatsHealthCheck(INatsConnection connection, ILogger<NatsHealthCheck> logger)
    {
        _connection = connection;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            var connectionState = _connection.ConnectionState;
            
            if (connectionState != NatsConnectionState.Open)
            {
                return HealthCheckResult.Unhealthy(
                    $"NATS connection state: {connectionState}");
            }

            // Verify with a ping
            var rtt = await _connection.PingAsync(ct);
            
            var data = new Dictionary<string, object>
            {
                ["server"] = _connection.ServerInfo?.Name ?? "unknown",
                ["rtt_ms"] = rtt.TotalMilliseconds,
                ["jetstream"] = _connection.ServerInfo?.JetStreamAvailable ?? false
            };

            return HealthCheckResult.Healthy("NATS connection is healthy", data);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "NATS health check failed");
            return HealthCheckResult.Unhealthy("NATS connection failed", ex);
        }
    }
}

MinIO Health Check

// Infrastructure/HealthChecks/MinioHealthCheck.cs
public sealed class MinioHealthCheck : IHealthCheck
{
    private readonly IMinioClient _client;
    private readonly ILogger<MinioHealthCheck> _logger;

    public MinioHealthCheck(IMinioClient client, ILogger<MinioHealthCheck> logger)
    {
        _client = client;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            // List buckets as a connectivity test
            var buckets = await _client.ListBucketsAsync(ct);
            
            var data = new Dictionary<string, object>
            {
                ["bucket_count"] = buckets.Buckets.Count,
                ["endpoint"] = _client.Config.Endpoint
            };

            return HealthCheckResult.Healthy("MinIO is accessible", data);
        }
        catch (MinioException ex)
        {
            _logger.LogWarning(ex, "MinIO health check failed");
            return HealthCheckResult.Unhealthy("MinIO connection failed", ex);
        }
    }
}

Qdrant Health Check

// Infrastructure/HealthChecks/QdrantHealthCheck.cs
public sealed class QdrantHealthCheck : IHealthCheck
{
    private readonly QdrantClient _client;
    private readonly ILogger<QdrantHealthCheck> _logger;

    public QdrantHealthCheck(QdrantClient client, ILogger<QdrantHealthCheck> logger)
    {
        _client = client;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            var healthInfo = await _client.HealthAsync(ct);
            
            var data = new Dictionary<string, object>
            {
                ["version"] = healthInfo.Version,
                ["status"] = "healthy"
            };

            return HealthCheckResult.Healthy("Qdrant is healthy", data);
        }
        catch (RpcException ex) when (ex.StatusCode == StatusCode.Unavailable)
        {
            _logger.LogWarning(ex, "Qdrant unavailable");
            return HealthCheckResult.Unhealthy("Qdrant is unavailable", ex);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "Qdrant health check failed");
            return HealthCheckResult.Degraded("Qdrant check failed", ex);
        }
    }
}
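The three checks above hard-code `Unhealthy` or `Degraded` in their failure paths. An alternative is to return the status configured at registration, available as `context.Registration.FailureStatus`, so the same class can be treated as critical in one service and non-critical in another. A minimal sketch, assuming a hypothetical `PingHealthCheck` whose `ping` delegate stands in for a real client call:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: `ping` stands in for a real client call such as a NATS ping.
public sealed class PingHealthCheck(Func<CancellationToken, Task<TimeSpan>> ping) : IHealthCheck
{
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            var rtt = await ping(ct);
            var data = new Dictionary<string, object> { ["rtt_ms"] = rtt.TotalMilliseconds };
            return HealthCheckResult.Healthy("Ping succeeded", data);
        }
        catch (Exception ex)
        {
            // Use the status chosen at registration instead of hard-coding one.
            return new HealthCheckResult(context.Registration.FailureStatus, "Ping failed", ex);
        }
    }
}
```

When the check is registered with `failureStatus: HealthStatus.Degraded`, a failed ping reports Degraded rather than Unhealthy, and the readiness endpoint keeps returning 200 under the default status-code mapping.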

Response Writers

Detailed Response Writer

// Infrastructure/HealthChecks/HealthCheckResponseWriter.cs
public static class HealthCheckResponseWriter
{
    public static Task WriteDetailedResponse(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";
        
        var response = new
        {
            status = report.Status.ToString(),
            duration = report.TotalDuration.TotalMilliseconds,
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                duration = e.Value.Duration.TotalMilliseconds,
                description = e.Value.Description,
                data = e.Value.Data,
                exception = e.Value.Exception?.Message
            }),
            timestamp = DateTime.UtcNow
        };

        return context.Response.WriteAsJsonAsync(response);
    }

    public static Task WriteMinimalResponse(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "text/plain";
        return context.Response.WriteAsync(report.Status.ToString());
    }
}

Sample Response

{
  "status": "Healthy",
  "duration": 245.32,
  "checks": [
    {
      "name": "postgresql",
      "status": "Healthy",
      "duration": 12.5,
      "description": null,
      "data": {}
    },
    {
      "name": "nats",
      "status": "Healthy",
      "duration": 5.2,
      "description": "NATS connection is healthy",
      "data": {
        "server": "nats-0",
        "rtt_ms": 1.23,
        "jetstream": true
      }
    },
    {
      "name": "minio",
      "status": "Healthy",
      "duration": 45.8,
      "description": "MinIO is accessible",
      "data": {
        "bucket_count": 12,
        "endpoint": "minio.data-layer.svc.cluster.local:9000"
      }
    }
  ],
  "timestamp": "2026-03-14T10:30:00Z"
}

Kubernetes Probes

[Pod Lifecycle], Kubernetes Authors, 2024-05-20

Deployment Configuration

# apps/myapp-api/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-api
spec:
  # replicas, selector, and pod labels omitted for brevity; apps/v1 requires spec.selector
  template:
    spec:
      containers:
        - name: api
          image: myapp-api:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30

Health Check UI

Dashboard Configuration

// Program.cs
builder.Services.AddHealthChecksUI(setup =>
{
    setup.SetEvaluationTimeInSeconds(30);
    setup.MaximumHistoryEntriesPerEndpoint(50);
    
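    // The UI parses the JSON format produced by UIResponseWriter.WriteHealthCheckUIResponse
    // (HealthChecks.UI.Client package), so the endpoints listed here must use that writer
    // rather than a custom one such as WriteDetailedResponse.
    setup.AddHealthCheckEndpoint("API", "/health");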
    setup.AddHealthCheckEndpoint("Workers", "http://myapp-workers:8080/health");
})
.AddInMemoryStorage();

// Map the UI
app.MapHealthChecksUI(options =>
{
    options.UIPath = "/health-ui";
    options.ApiPath = "/health-api";
});

Startup Health Checks

Deferred Initialization Check

// Infrastructure/HealthChecks/StartupHealthCheck.cs
public sealed class StartupHealthCheck : IHealthCheck
{
    private volatile bool _isReady;

    public bool IsReady
    {
        get => _isReady;
        set => _isReady = value;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        return Task.FromResult(_isReady
            ? HealthCheckResult.Healthy("Application started")
            : HealthCheckResult.Unhealthy("Application is starting"));
    }
}

// Program.cs
builder.Services.AddSingleton<StartupHealthCheck>();
builder.Services.AddHealthChecks()
    .AddCheck<StartupHealthCheck>("startup", tags: ["live"]);

// After configuration is complete
app.Lifetime.ApplicationStarted.Register(() =>
{
    var startupCheck = app.Services.GetRequiredService<StartupHealthCheck>();
    startupCheck.IsReady = true;
});

Dependency Timeout Configuration

Configuring Check Timeouts

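// Program.cs
// These replace the earlier "postgresql" and "nats" registrations; registering the same
// name twice fails at startup.
// Keep each check timeout below the probe's timeoutSeconds (3s for the readiness probe),
// otherwise the kubelet gives up before the check can report its own timeout.
builder.Services.AddHealthChecks()
    .AddNpgSql(
        connectionString,
        name: "postgresql",
        timeout: TimeSpan.FromSeconds(2),
        tags: ["ready", "db"])
    .AddCheck<NatsHealthCheck>(
        "nats",
        // Applied when the check throws or times out; a result the check returns itself is used as-is.
        failureStatus: HealthStatus.Degraded,
        timeout: TimeSpan.FromSeconds(2),
        tags: ["ready", "messaging"]);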

Publishing Health Status

Health Check Publisher

// Infrastructure/HealthChecks/MetricsHealthCheckPublisher.cs
// Requires .NET 9 or later for Gauge<T>.
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class MetricsHealthCheckPublisher : IHealthCheckPublisher
{
    private readonly Gauge<double> _healthStatus;

    public MetricsHealthCheckPublisher(IMeterFactory meterFactory)
    {
        // IMeterFactory ties the Meter's lifetime to the DI container.
        var meter = meterFactory.Create("MyApp.Health");
        _healthStatus = meter.CreateGauge<double>("health_status");
    }

    public Task PublishAsync(HealthReport report, CancellationToken ct)
    {
        // Overall status: 1 only when every check is healthy
        _healthStatus.Record(
            report.Status == HealthStatus.Healthy ? 1.0 : 0.0,
            new KeyValuePair<string, object?>("check", "overall"));

        // Individual checks: 1 = healthy, 0.5 = degraded, 0 = unhealthy
        foreach (var entry in report.Entries)
        {
            var value = entry.Value.Status switch
            {
                HealthStatus.Healthy => 1.0,
                HealthStatus.Degraded => 0.5,
                _ => 0.0
            };

            _healthStatus.Record(value,
                new KeyValuePair<string, object?>("check", entry.Key));
        }

        return Task.CompletedTask;
    }
}

// Program.cs
builder.Services.Configure<HealthCheckPublisherOptions>(options =>
{
    options.Delay = TimeSpan.FromSeconds(5);
    options.Period = TimeSpan.FromSeconds(30);
});

builder.Services.AddSingleton<IHealthCheckPublisher, MetricsHealthCheckPublisher>();
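Before building dashboards on the gauge, it can help to confirm locally that measurements are actually emitted. The sketch below uses `MeterListener` from `System.Diagnostics.Metrics` to capture values recorded on a meter named `MyApp.Health`. It records two sample values directly instead of running the publisher, and it assumes .NET 9 or later because `Gauge<T>` was introduced there.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Latest value seen per "check" tag.
var readings = new Dictionary<string, double>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Subscribe only to instruments from the health meter.
    if (instrument.Meter.Name == "MyApp.Health")
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<double>((instrument, measurement, tags, state) =>
{
    foreach (var tag in tags)
    {
        if (tag.Key == "check" && tag.Value is string checkName)
            readings[checkName] = measurement;
    }
});
listener.Start();

// Stand-in for the publisher: record the same shape of measurements it would emit.
using var meter = new Meter("MyApp.Health");
var gauge = meter.CreateGauge<double>("health_status");
gauge.Record(1.0, new KeyValuePair<string, object?>("check", "overall"));
gauge.Record(0.5, new KeyValuePair<string, object?>("check", "nats"));

foreach (var reading in readings)
    Console.WriteLine($"{reading.Key} = {reading.Value}");
```

The same listener pattern can be used in an integration test that runs the real publisher and asserts on the captured values.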

Conclusion

Which probe each check feeds, based on the tags in the service registration:

Check       Liveness  Readiness  Purpose
Self        Yes       No         App is running
PostgreSQL  No        Yes        Can serve requests
NATS        No        Yes        Can publish events
MinIO       No        Yes        Can access storage
Qdrant      No        Yes        Can perform search

Health checks enable Kubernetes to maintain application availability by routing traffic only to healthy pods and restarting unresponsive ones.

Implementing health checks properly has been one of the highest-leverage improvements I have made to our platform’s reliability. Before this work, we had frequent incidents where pods would sit in a broken state, silently dropping requests because a downstream dependency was unreachable. Now, Kubernetes automatically removes pods that fail readiness from the Service endpoints and restarts a pod only when its liveness probe fails. The key insight was treating liveness and readiness as fundamentally different questions: liveness is about the process, readiness is about the system. Getting this distinction right eliminated our false-positive restart loops and made our deployments significantly more stable.

[Health checks in ASP.NET Core], Microsoft, 2024-11-01
[Kubernetes Best Practices: Setting Resource Requests and Limits], Google Cloud, 2024-04-01

Next Steps

  • Add structured logging to health check failures so you can correlate probe failures with specific dependency issues in your log aggregator.
  • Implement circuit breaker patterns alongside health checks to prevent cascading failures when a shared dependency goes down.
  • Set up Grafana dashboards using the health_status gauge metric published by the MetricsHealthCheckPublisher to track health trends over time.
  • Consider implementing warm-up health checks that validate cache priming and connection pool establishment before marking the pod as ready.

Further Reading

[ASP.NET Core Health Checks Documentation], Microsoft, 2024
[Kubernetes Probes Deep Dive], Kubernetes Authors, 2024
[AspNetCore.Diagnostics.HealthChecks], GitHub Community, 2024