Health Checks and Readiness Probes in .NET
Implement comprehensive health checks for .NET applications with database, cache, and dependency monitoring for Kubernetes deployments.
When I first wired up health checks for our .NET API running on Kubernetes, I made the classic mistake of putting all dependency checks into the liveness probe. The result was catastrophic: a brief Redis connection hiccup caused Kubernetes to restart every API pod simultaneously, turning a minor blip into a full outage. That incident taught me the critical distinction between liveness and readiness probes the hard way. Liveness should only answer “is the process alive?” while readiness answers “can this pod handle traffic right now?” Getting this separation right eliminated our health check flapping problem entirely and gave us stable, predictable behavior during transient dependency failures.
Introduction
In a containerized environment like Kubernetes, the orchestrator needs to know the exact state of your application to manage traffic routing and restarts. Health checks enable Kubernetes to determine when your application is ready to receive traffic (Readiness) and when it needs to be restarted (Liveness).
This guide covers implementing comprehensive health monitoring in .NET, ensuring your services are observable and resilient.
[Health checks in ASP.NET Core], Microsoft, 2024-11-01
[Configure Liveness, Readiness and Startup Probes], Kubernetes Authors, 2024-06-15
What We’ll Build
- Liveness & Readiness Probes: Separate checks for startup, liveness, and readiness.
- Infrastructure Checks: Library checks for PostgreSQL and Redis, plus custom checks for NATS, MinIO, and Qdrant.
- Health Dashboard: A graphical UI to visualize the state of your cluster dependencies.
Architecture Overview
flowchart LR
    Kubelet["☸️ Kubelet"] -->|Probes /health/live| API["🚀 API Pod"]
    LB["⚖️ Load Balancer"] -.->|Traffic gated by /health/ready| API
    subgraph Checks["🏥 Health Checks"]
        API --> DB[(SQL)]
        API --> Redis[(Cache)]
        API --> Msg[(NATS)]
    end
    classDef primary fill:#7c3aed,color:#fff
    classDef secondary fill:#06b6d4,color:#fff
    classDef db fill:#f43f5e,color:#fff
    classDef warning fill:#fbbf24,color:#000
    class API primary
    class LB secondary
    class DB,Redis,Msg db
    class Kubelet warning
Implementation
Basic Setup
Service Registration
// Program.cs
// AddNpgSql and AddRedis come from the AspNetCore.HealthChecks.NpgSql
// and AspNetCore.HealthChecks.Redis packages.
builder.Services.AddHealthChecks()
    // Liveness: no dependencies, only proves the process can respond
    .AddCheck("self", () => HealthCheckResult.Healthy(), tags: ["live"])
    // Database
    .AddNpgSql(
        builder.Configuration.GetConnectionString("AppDb")!,
        name: "postgresql",
        tags: ["ready", "db"])
    // Redis
    .AddRedis(
        builder.Configuration.GetConnectionString("Redis")!,
        name: "redis",
        tags: ["ready", "cache"])
    // Custom checks
    .AddCheck<NatsHealthCheck>("nats", tags: ["ready", "messaging"])
    .AddCheck<MinioHealthCheck>("minio", tags: ["ready", "storage"])
    .AddCheck<QdrantHealthCheck>("qdrant", tags: ["ready", "search"]);
Endpoint Mapping
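// Program.cs
// The response writers are static methods on HealthCheckResponseWriter,
// so they are referenced through the class name.

// Liveness: only checks tagged "live"
app.MapHealthChecks("/health/live", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("live"),
    ResponseWriter = HealthCheckResponseWriter.WriteMinimalResponse
});

// Readiness: only checks tagged "ready"
app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckResponseWriter.WriteDetailedResponse
});

// Full report: no predicate, so every registered check runs
app.MapHealthChecks("/health", new HealthCheckOptions
{
    ResponseWriter = HealthCheckResponseWriter.WriteDetailedResponse
});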
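One detail worth making explicit is how each health status maps to an HTTP status code, because Kubernetes treats any response from 200 to 399 as a passing probe. By default ASP.NET Core returns 200 for both Healthy and Degraded and 503 for Unhealthy, so a Degraded dependency does not take the pod out of rotation. The following is a minimal sketch that spells the mapping out through `HealthCheckOptions.ResultStatusCodes`; the placeholder check named `placeholder-dependency` is hypothetical and stands in for the real readiness checks.

```csharp
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Hypothetical stand-in for a real dependency check.
builder.Services.AddHealthChecks()
    .AddCheck("placeholder-dependency",
        () => HealthCheckResult.Degraded("slow but reachable"),
        tags: ["ready"]);

var app = builder.Build();

app.MapHealthChecks("/health/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    // These values repeat the framework defaults; change the Degraded
    // entry to 503 if a degraded pod should stop receiving traffic.
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status200OK,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

app.Run();
```

Whether Degraded should pass readiness is a judgment call: keeping it at 200 favors availability during partial failures, while 503 favors correctness.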
Custom Health Checks
[AspNetCore.Diagnostics.HealthChecks], Xabaril, 2024-03-01
NATS Health Check
// Infrastructure/HealthChecks/NatsHealthCheck.cs
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Logging;
using NATS.Client.Core;

public sealed class NatsHealthCheck : IHealthCheck
{
    private readonly INatsConnection _connection;
    private readonly ILogger<NatsHealthCheck> _logger;

    public NatsHealthCheck(INatsConnection connection, ILogger<NatsHealthCheck> logger)
    {
        _connection = connection;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        // Report the status chosen at registration (Unhealthy unless overridden),
        // so a failureStatus passed to AddCheck actually takes effect.
        var failureStatus = context.Registration.FailureStatus;
        try
        {
            var connectionState = _connection.ConnectionState;
            if (connectionState != NatsConnectionState.Open)
            {
                return new HealthCheckResult(
                    failureStatus,
                    $"NATS connection state: {connectionState}");
            }

            // Verify with a ping
            var rtt = await _connection.PingAsync(ct);
            var data = new Dictionary<string, object>
            {
                ["server"] = _connection.ServerInfo?.Name ?? "unknown",
                ["rtt_ms"] = rtt.TotalMilliseconds,
                // NATS.Client.Core exposes the JetStream flag as JetStreamAvailable
                ["jetstream"] = _connection.ServerInfo?.JetStreamAvailable ?? false
            };
            return HealthCheckResult.Healthy("NATS connection is healthy", data);
        }
        catch (Exception ex)
        {
            _logger.LogWarning(ex, "NATS health check failed");
            return new HealthCheckResult(failureStatus, "NATS connection failed", ex);
        }
    }
}
MinIO Health Check
// Infrastructure/HealthChecks/MinioHealthCheck.cs
public sealed class MinioHealthCheck : IHealthCheck
{
    private readonly IMinioClient _client;
    private readonly ILogger<MinioHealthCheck> _logger;

    public MinioHealthCheck(IMinioClient client, ILogger<MinioHealthCheck> logger)
    {
        _client = client;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            // List buckets as a connectivity test
            var buckets = await _client.ListBucketsAsync(ct);
            var data = new Dictionary<string, object>
            {
                ["bucket_count"] = buckets.Buckets.Count,
                ["endpoint"] = _client.Config.Endpoint
            };
            return HealthCheckResult.Healthy("MinIO is accessible", data);
        }
        catch (MinioException ex)
        {
            _logger.LogWarning(ex, "MinIO health check failed");
            return HealthCheckResult.Unhealthy("MinIO connection failed", ex);
        }
    }
}
Qdrant Health Check
// Infrastructure/HealthChecks/QdrantHealthCheck.cs
public sealed class QdrantHealthCheck : IHealthCheck
{
    private readonly QdrantClient _client;
    private readonly ILogger<QdrantHealthCheck> _logger;

    public QdrantHealthCheck(QdrantClient client, ILogger<QdrantHealthCheck> logger)
    {
        _client = client;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        try
        {
            var healthInfo = await _client.HealthAsync(ct);
            var data = new Dictionary<string, object>
            {
                ["version"] = healthInfo.Version,
                ["status"] = "healthy"
            };
            return HealthCheckResult.Healthy("Qdrant is healthy", data);
        }
        catch (RpcException ex) when (ex.StatusCode == StatusCode.Unavailable)
        {
            // The gRPC endpoint is unreachable: search cannot work at all
            _logger.LogWarning(ex, "Qdrant unavailable");
            return HealthCheckResult.Unhealthy("Qdrant is unavailable", ex);
        }
        catch (Exception ex)
        {
            // Any other failure is reported as Degraded rather than Unhealthy
            _logger.LogWarning(ex, "Qdrant health check failed");
            return HealthCheckResult.Degraded("Qdrant check failed", ex);
        }
    }
}
Response Writers
Detailed Response Writer
// Infrastructure/HealthChecks/HealthCheckResponseWriter.cs
public static class HealthCheckResponseWriter
{
    public static Task WriteDetailedResponse(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "application/json";
        var response = new
        {
            status = report.Status.ToString(),
            duration = report.TotalDuration.TotalMilliseconds,
            checks = report.Entries.Select(e => new
            {
                name = e.Key,
                status = e.Value.Status.ToString(),
                duration = e.Value.Duration.TotalMilliseconds,
                description = e.Value.Description,
                data = e.Value.Data,
                exception = e.Value.Exception?.Message
            }),
            timestamp = DateTime.UtcNow
        };
        return context.Response.WriteAsJsonAsync(response);
    }

    public static Task WriteMinimalResponse(HttpContext context, HealthReport report)
    {
        context.Response.ContentType = "text/plain";
        return context.Response.WriteAsync(report.Status.ToString());
    }
}
Sample Response
{
  "status": "Healthy",
  "duration": 245.32,
  "checks": [
    {
      "name": "postgresql",
      "status": "Healthy",
      "duration": 12.5,
      "description": null,
      "data": {}
    },
    {
      "name": "nats",
      "status": "Healthy",
      "duration": 5.2,
      "description": "NATS connection is healthy",
      "data": {
        "server": "nats-0",
        "rtt_ms": 1.23,
        "jetstream": true
      }
    },
    {
      "name": "minio",
      "status": "Healthy",
      "duration": 45.8,
      "description": "MinIO is accessible",
      "data": {
        "bucket_count": 12,
        "endpoint": "minio.data-layer.svc.cluster.local:9000"
      }
    }
  ],
  "timestamp": "2026-03-14T10:30:00Z"
}
Kubernetes Probes
[Pod Lifecycle], Kubernetes Authors, 2024-05-20
Deployment Configuration
# apps/myapp-api/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-api
spec:
  # apps/v1 requires a selector that matches the template labels;
  # the label value here is illustrative.
  selector:
    matchLabels:
      app: myapp-api
  template:
    metadata:
      labels:
        app: myapp-api
    spec:
      containers:
        - name: api
          image: myapp-api:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
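The probe settings above translate into rough time budgets. Kubernetes acts after `failureThreshold` consecutive failures spaced `periodSeconds` apart, so the product of the two is a reasonable approximation of how long a failure persists before the kubelet reacts (the Kubernetes documentation uses the same estimate for startup probes). A small sketch of that arithmetic, with a hypothetical helper named `ProbeBudget`, ignoring `initialDelaySeconds` and per-probe timeouts:

```csharp
using System;

public static class ProbeBudget
{
    // Approximate seconds of consecutive failures before Kubernetes acts:
    // failureThreshold probes, one every periodSeconds.
    public static int ApproxSeconds(int periodSeconds, int failureThreshold)
        => periodSeconds * failureThreshold;

    public static void Main()
    {
        // startupProbe: periodSeconds 5, failureThreshold 30
        Console.WriteLine($"startup budget:    ~{ApproxSeconds(5, 30)} s");  // ~150 s
        // livenessProbe: periodSeconds 10, failureThreshold 3
        Console.WriteLine($"liveness restart:  ~{ApproxSeconds(10, 3)} s");  // ~30 s
        // readinessProbe: periodSeconds 5, failureThreshold 3
        Console.WriteLine($"readiness removal: ~{ApproxSeconds(5, 3)} s");   // ~15 s
    }
}
```

With these values the application gets roughly two and a half minutes to start, an unresponsive process is restarted after about 30 seconds, and a pod with a failing dependency stops receiving traffic after about 15 seconds.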
Health Check UI
Dashboard Configuration
// Program.cs
// Note: HealthChecks UI polls the endpoints below and expects the JSON format
// produced by UIResponseWriter.WriteHealthCheckUIResponse (HealthChecks.UI.Client),
// not a custom response shape.
builder.Services.AddHealthChecksUI(setup =>
    {
        setup.SetEvaluationTimeInSeconds(30);
        setup.MaximumHistoryEntriesPerEndpoint(50);
        setup.AddHealthCheckEndpoint("API", "/health");
        setup.AddHealthCheckEndpoint("Workers", "http://myapp-workers:8080/health");
    })
    .AddInMemoryStorage();

// Map the UI
app.MapHealthChecksUI(options =>
{
    options.UIPath = "/health-ui";
    options.ApiPath = "/health-api";
});
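HealthChecks UI only understands the payload written by `UIResponseWriter.WriteHealthCheckUIResponse` from the AspNetCore.HealthChecks.UI.Client package, so an endpoint that uses a custom response writer will not render in the dashboard. One way to keep both is to expose a separate endpoint for the UI. The sketch below assumes a hypothetical path, `/health/ui-data`, and a single placeholder check.

```csharp
using HealthChecks.UI.Client;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy());

// Point the dashboard at the UI-formatted endpoint (hypothetical path).
builder.Services.AddHealthChecksUI(setup =>
        setup.AddHealthCheckEndpoint("API", "/health/ui-data"))
    .AddInMemoryStorage();

var app = builder.Build();

// Same checks, serialized in the format the dashboard can parse.
app.MapHealthChecks("/health/ui-data", new HealthCheckOptions
{
    ResponseWriter = UIResponseWriter.WriteHealthCheckUIResponse
});

app.MapHealthChecksUI(options => options.UIPath = "/health-ui");

app.Run();
```

The probe endpoints can keep their own writers, since Kubernetes only looks at the HTTP status code.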
Startup Health Checks
Deferred Initialization Check
// Infrastructure/HealthChecks/StartupHealthCheck.cs
public sealed class StartupHealthCheck : IHealthCheck
{
    private volatile bool _isReady;

    public bool IsReady
    {
        get => _isReady;
        set => _isReady = value;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        return Task.FromResult(_isReady
            ? HealthCheckResult.Healthy("Application started")
            : HealthCheckResult.Unhealthy("Application is starting"));
    }
}

// Program.cs
// Register as a singleton so the check and the startup callback share one instance
builder.Services.AddSingleton<StartupHealthCheck>();
builder.Services.AddHealthChecks()
    .AddCheck<StartupHealthCheck>("startup", tags: ["live"]);

// After configuration is complete
app.Lifetime.ApplicationStarted.Register(() =>
{
    var startupCheck = app.Services.GetRequiredService<StartupHealthCheck>();
    startupCheck.IsReady = true;
});
Dependency Timeout Configuration
Configuring Check Timeouts
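// Program.cs
// These registrations replace the earlier "postgresql" and "nats" entries;
// check names must be unique within the application.
var connectionString = builder.Configuration.GetConnectionString("AppDb")!;

builder.Services.AddHealthChecks()
    .AddNpgSql(
        connectionString,
        name: "postgresql",
        timeout: TimeSpan.FromSeconds(5),
        tags: ["ready", "db"])
    // failureStatus applies when the check throws, times out, or
    // returns context.Registration.FailureStatus itself.
    .AddCheck<NatsHealthCheck>(
        "nats",
        failureStatus: HealthStatus.Degraded,
        timeout: TimeSpan.FromSeconds(3),
        tags: ["ready", "messaging"]);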
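The `failureStatus` argument is easy to misread: it does not rewrite whatever a check returns. The framework applies it when a check throws or times out, and a check that catches its own exceptions has to report `context.Registration.FailureStatus` explicitly. The sketch below illustrates the mechanism with a hypothetical `AlwaysFailingCheck` and a small helper that invokes it with a hand-built context, the way a unit test might.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check that always fails, used only to illustrate the mechanism.
public sealed class AlwaysFailingCheck : IHealthCheck
{
    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken ct = default)
    {
        // Report the status chosen at registration instead of hard-coding Unhealthy.
        var result = new HealthCheckResult(
            context.Registration.FailureStatus,
            description: "Dependency unreachable");
        return Task.FromResult(result);
    }
}

public static class FailureStatusDemo
{
    // Runs the check as if it had been registered with the given failureStatus.
    public static async Task<HealthStatus> RunAsync(HealthStatus configured)
    {
        var check = new AlwaysFailingCheck();
        var context = new HealthCheckContext
        {
            Registration = new HealthCheckRegistration("demo", check, configured, tags: null)
        };
        var result = await check.CheckHealthAsync(context);
        return result.Status;
    }
}
```

Calling `FailureStatusDemo.RunAsync(HealthStatus.Degraded)` yields `Degraded`, whereas a check that returns `HealthCheckResult.Unhealthy(...)` directly would stay `Unhealthy` regardless of the registration.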
Publishing Health Status
Health Check Publisher
// Infrastructure/HealthChecks/MetricsHealthCheckPublisher.cs
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public sealed class MetricsHealthCheckPublisher : IHealthCheckPublisher
{
    // Gauge<T> is available in System.Diagnostics.Metrics from .NET 9
    private readonly Gauge<double> _healthStatus;

    public MetricsHealthCheckPublisher(IMeterFactory meterFactory)
    {
        // IMeterFactory ties the Meter's lifetime to the DI container
        var meter = meterFactory.Create("MyApp.Health");
        _healthStatus = meter.CreateGauge<double>("health_status");
    }

    public Task PublishAsync(HealthReport report, CancellationToken ct)
    {
        // Overall status
        _healthStatus.Record(
            report.Status == HealthStatus.Healthy ? 1.0 : 0.0,
            new KeyValuePair<string, object?>("check", "overall"));

        // Individual checks: 1 = healthy, 0.5 = degraded, 0 = unhealthy
        foreach (var entry in report.Entries)
        {
            var value = entry.Value.Status switch
            {
                HealthStatus.Healthy => 1.0,
                HealthStatus.Degraded => 0.5,
                HealthStatus.Unhealthy => 0.0,
                _ => 0.0
            };
            _healthStatus.Record(value,
                new KeyValuePair<string, object?>("check", entry.Key));
        }
        return Task.CompletedTask;
    }
}
// Program.cs
builder.Services.Configure<HealthCheckPublisherOptions>(options =>
{
    options.Delay = TimeSpan.FromSeconds(5);    // wait after startup before the first run
    options.Period = TimeSpan.FromSeconds(30);  // interval between runs
});
builder.Services.AddSingleton<IHealthCheckPublisher, MetricsHealthCheckPublisher>();
Conclusion
Health check configuration by probe:
| Check | Liveness | Readiness | Purpose |
|---|---|---|---|
| Self | ✓ | | App is running |
| PostgreSQL | | ✓ | Can serve requests |
| Redis | | ✓ | Can reach the cache |
| NATS | | ✓ | Can publish events |
| MinIO | | ✓ | Can access storage |
| Qdrant | | ✓ | Can perform search |
Health checks enable Kubernetes to maintain application availability by routing traffic only to healthy pods and restarting unresponsive ones.
Implementing health checks properly has been one of the highest-leverage improvements I have made to our platform’s reliability. Before this work, we had frequent incidents where pods would sit in a broken state, silently dropping requests because a downstream dependency was unreachable. Now, Kubernetes automatically removes pods that fail readiness from the Service endpoints and restarts them only when liveness fails. The key insight was treating liveness and readiness as fundamentally different questions: liveness is about the process, readiness is about the system. Getting this distinction right eliminated our false-positive restart loops and made our deployments significantly more stable.
[Health checks in ASP.NET Core], Microsoft, 2024-11-01
[Kubernetes Best Practices: Setting Resource Requests and Limits], Google Cloud, 2024-04-01
Next Steps
- Add structured logging to health check failures so you can correlate probe failures with specific dependency issues in your log aggregator.
- Implement circuit breaker patterns alongside health checks to prevent cascading failures when a shared dependency goes down.
- Set up Grafana dashboards using the health_status gauge metric published by the MetricsHealthCheckPublisher to track health trends over time.
- Consider implementing warm-up health checks that validate cache priming and connection pool establishment before marking the pod as ready.