How do I use Curl to perform health checks on web services?

Health checks are essential for monitoring web service availability and performance. Curl provides powerful options for automated health monitoring, allowing you to verify service status, response times, and content validation. This guide covers comprehensive techniques for implementing robust health checks using Curl.

Basic Health Check Commands

Simple Status Check

The most basic health check verifies that a service responds with a successful HTTP status code:

# Basic health check
curl -f -s -o /dev/null -w "%{http_code}" https://api.example.com/health

# With connection timeout
curl -f -s -o /dev/null -w "%{http_code}" --connect-timeout 5 https://api.example.com/health

The -f flag makes curl fail silently on HTTP errors, -s suppresses progress output, -o /dev/null discards response body, and -w "%{http_code}" outputs only the HTTP status code.

Comprehensive Health Check

For more detailed monitoring, include response time and connection details:

curl -w "HTTP Status: %{http_code}\nResponse Time: %{time_total}s\nDNS Lookup: %{time_namelookup}s\nConnect Time: %{time_connect}s\n" \
     -o /dev/null -s --connect-timeout 10 --max-time 30 \
     https://api.example.com/health

Advanced Health Check Techniques

JSON Response Validation

Many health endpoints return JSON with status information. Use jq to parse and validate responses:

# Check if service status is "healthy"
response=$(curl -s --connect-timeout 5 --max-time 10 https://api.example.com/health)
status=$(echo "$response" | jq -r '.status')

if [ "$status" = "healthy" ]; then
    echo "Service is healthy"
    exit 0
else
    echo "Service is unhealthy: $status"
    exit 1
fi

Multiple Endpoint Monitoring

Check multiple services or endpoints simultaneously:

#!/bin/bash
endpoints=(
    "https://api.example.com/health"
    "https://database.example.com/status"
    "https://cache.example.com/ping"
)

for endpoint in "${endpoints[@]}"; do
    http_code=$(curl -f -s -o /dev/null -w "%{http_code}" --connect-timeout 5 "$endpoint")
    if [ "$http_code" -eq 200 ]; then
        echo "✓ $endpoint - OK"
    else
        echo "✗ $endpoint - FAILED (HTTP $http_code)"
    fi
done

Timeout and Error Handling

Setting Appropriate Timeouts

Configure timeouts to prevent hanging health checks:

# Conservative timeouts for production monitoring
curl --connect-timeout 5 \     # 5 seconds for connection
     --max-time 15 \           # 15 seconds total timeout
     --retry 3 \               # Retry 3 times on failure
     --retry-delay 2 \         # 2 seconds between retries
     -f -s -o /dev/null -w "%{http_code}" \
     https://api.example.com/health

Error Code Interpretation

Handle different types of failures appropriately:

#!/bin/bash
health_check() {
    local url=$1
    local http_code

    http_code=$(curl -f -s -o /dev/null -w "%{http_code}" \
                     --connect-timeout 5 --max-time 10 "$url" 2>/dev/null)

    case $http_code in
        200)
            echo "Service healthy"
            return 0
            ;;
        404|502|503|504)
            echo "Service unavailable (HTTP $http_code)"
            return 1
            ;;
        000)
            echo "Connection failed"
            return 2
            ;;
        *)
            echo "Unexpected response (HTTP $http_code)"
            return 3
            ;;
    esac
}

Authentication and Headers

API Key Authentication

Include authentication headers for protected health endpoints:

# API key authentication
curl -H "Authorization: Bearer YOUR_API_TOKEN" \
     -H "Content-Type: application/json" \
     -f -s -o /dev/null -w "%{http_code}" \
     https://api.example.com/health

# Basic authentication
curl -u username:password \
     -f -s -o /dev/null -w "%{http_code}" \
     https://api.example.com/health

Custom User Agent

Set a custom user agent to identify health check requests:

curl -H "User-Agent: HealthChecker/1.0" \
     -f -s -o /dev/null -w "%{http_code}" \
     https://api.example.com/health

Monitoring Scripts and Automation

Comprehensive Health Check Script

Create a reusable script for production monitoring:

#!/bin/bash
# health_check.sh

SERVICE_URL="${1:-https://api.example.com/health}"
TIMEOUT="${2:-10}"
EXPECTED_STATUS="${3:-200}"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

health_check() {
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')

    # Perform health check with detailed timing
    local curl_output
    curl_output=$(curl -w "http_code:%{http_code};time_total:%{time_total};time_connect:%{time_connect}" \
                       -s -o /dev/null --connect-timeout 5 --max-time "$TIMEOUT" "$SERVICE_URL" 2>&1)

    local exit_code=$?

    if [ $exit_code -eq 0 ]; then
        local http_code=$(echo "$curl_output" | grep -o 'http_code:[0-9]*' | cut -d: -f2)
        local time_total=$(echo "$curl_output" | grep -o 'time_total:[0-9.]*' | cut -d: -f2)
        local time_connect=$(echo "$curl_output" | grep -o 'time_connect:[0-9.]*' | cut -d: -f2)

        if [ "$http_code" -eq "$EXPECTED_STATUS" ]; then
            echo -e "${GREEN}[$timestamp] ✓ Service healthy${NC} - HTTP $http_code (${time_total}s)"
            return 0
        else
            echo -e "${YELLOW}[$timestamp] ⚠ Unexpected status${NC} - HTTP $http_code (${time_total}s)"
            return 1
        fi
    else
        echo -e "${RED}[$timestamp] ✗ Service unreachable${NC} - Connection failed"
        return 2
    fi
}

# Run health check
health_check
exit $?

Continuous Monitoring Loop

For ongoing monitoring, create a loop with configurable intervals:

#!/bin/bash
# continuous_monitor.sh

INTERVAL="${1:-60}"  # Check every 60 seconds
SERVICE_URL="${2:-https://api.example.com/health}"

echo "Starting continuous health monitoring (interval: ${INTERVAL}s)"
echo "Monitoring: $SERVICE_URL"
echo "Press Ctrl+C to stop"
echo

while true; do
    ./health_check.sh "$SERVICE_URL"
    sleep "$INTERVAL"
done

Integration with Monitoring Systems

Nagios Integration

Create a Nagios-compatible health check plugin:

#!/bin/bash
# nagios_health_check.sh

SERVICE_URL="$1"
WARNING_TIME="$2"
CRITICAL_TIME="$3"

if [ -z "$SERVICE_URL" ]; then
    echo "UNKNOWN - Missing service URL parameter"
    exit 3
fi

# Perform health check
curl_output=$(curl -w "%{http_code};%{time_total}" -s -o /dev/null \
                   --connect-timeout 5 --max-time 15 "$SERVICE_URL" 2>/dev/null)

if [ $? -ne 0 ]; then
    echo "CRITICAL - Service unreachable"
    exit 2
fi

http_code=$(echo "$curl_output" | cut -d';' -f1)
response_time=$(echo "$curl_output" | cut -d';' -f2)

if [ "$http_code" -ne 200 ]; then
    echo "CRITICAL - HTTP $http_code"
    exit 2
fi

# Check response time thresholds
if (( $(echo "$response_time > $CRITICAL_TIME" | bc -l) )); then
    echo "CRITICAL - Response time ${response_time}s exceeds ${CRITICAL_TIME}s"
    exit 2
elif (( $(echo "$response_time > $WARNING_TIME" | bc -l) )); then
    echo "WARNING - Response time ${response_time}s exceeds ${WARNING_TIME}s"
    exit 1
else
    echo "OK - Service responding in ${response_time}s"
    exit 0
fi

Prometheus Metrics

Export health check metrics for Prometheus monitoring:

#!/bin/bash
# prometheus_health_exporter.sh

METRICS_FILE="/var/lib/prometheus/node-exporter/health_check.prom"

perform_check() {
    local service_name="$1"
    local service_url="$2"

    local start_time=$(date +%s.%N)
    local http_code=$(curl -f -s -o /dev/null -w "%{http_code}" \
                           --connect-timeout 5 --max-time 10 "$service_url" 2>/dev/null)
    local end_time=$(date +%s.%N)
    local response_time=$(echo "$end_time - $start_time" | bc)

    local status=0
    if [ "$http_code" -eq 200 ]; then
        status=1
    fi

    # Write metrics
    echo "health_check_status{service=\"$service_name\"} $status" >> "$METRICS_FILE.tmp"
    echo "health_check_response_time{service=\"$service_name\"} $response_time" >> "$METRICS_FILE.tmp"
    echo "health_check_http_code{service=\"$service_name\"} $http_code" >> "$METRICS_FILE.tmp"
}

# Clear previous metrics
> "$METRICS_FILE.tmp"

# Check multiple services
perform_check "api" "https://api.example.com/health"
perform_check "database" "https://db.example.com/status"
perform_check "cache" "https://cache.example.com/ping"

# Atomically update metrics file
mv "$METRICS_FILE.tmp" "$METRICS_FILE"

Best Practices and Considerations

Performance Optimization

Use appropriate timeouts to prevent hanging checks
Implement exponential backoff for retry logic
Cache DNS lookups when checking multiple endpoints
Use connection pooling for frequent checks

Security Considerations

When implementing health checks, consider:

Use HTTPS for sensitive health endpoints
Implement proper authentication for internal services
Avoid exposing sensitive information in health responses
Use dedicated health check endpoints separate from main APIs

Alerting Strategy

Design effective alerting based on health check results:

# Example alerting logic
consecutive_failures=0
max_failures=3

while true; do
    if ! ./health_check.sh; then
        ((consecutive_failures++))
        if [ $consecutive_failures -ge $max_failures ]; then
            # Send alert (email, Slack, PagerDuty, etc.)
            echo "ALERT: Service down for $consecutive_failures consecutive checks"
        fi
    else
        consecutive_failures=0
    fi
    sleep 60
done

Conclusion

Curl provides a robust foundation for implementing comprehensive health checks. By combining proper timeout configuration, response validation, and integration with monitoring systems, you can create reliable service monitoring solutions. Remember to implement appropriate retry logic, handle different failure scenarios, and design alerting strategies that minimize false positives while ensuring rapid incident response.

For more advanced monitoring scenarios involving JavaScript-heavy applications, consider complementing Curl-based checks with browser automation tools for comprehensive application monitoring.

Table of contents