What are the best practices for API error handling and recovery?

API error handling and recovery are critical components of building robust, production-ready applications. Proper error handling ensures your application can gracefully handle failures, provide meaningful feedback to users, and automatically recover from temporary issues. This guide covers comprehensive strategies for implementing effective API error handling and recovery mechanisms.

Understanding API Error Types

Before implementing error handling strategies, it's essential to understand the different types of errors you'll encounter when working with APIs.

HTTP Status Code Categories

Client Errors (4xx)
- 400 Bad Request: Invalid request syntax or parameters
- 401 Unauthorized: Authentication required or failed
- 403 Forbidden: The server understood the request but refuses to authorize it
- 404 Not Found: The requested resource doesn't exist
- 429 Too Many Requests: Rate limit exceeded

Server Errors (5xx)
- 500 Internal Server Error: Generic server error
- 502 Bad Gateway: Invalid response from an upstream server
- 503 Service Unavailable: Server temporarily unavailable
- 504 Gateway Timeout: Upstream server timed out
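The category of a status code is simply its hundreds digit, and Python's standard library already knows the reason phrases. The describe_status helper below is a hypothetical illustration, not part of any library; it assumes the code is one that http.HTTPStatus knows about (unregistered codes raise ValueError).

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a label such as '429 Too Many Requests (client error)'."""
    # The hundreds digit gives the category: 4xx client, 5xx server
    category = {4: "client error", 5: "server error"}.get(code // 100, "other")
    return f"{code} {HTTPStatus(code).phrase} ({category})"


print(describe_status(429))  # 429 Too Many Requests (client error)
print(describe_status(503))  # 503 Service Unavailable (server error)
```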

Network-Level Errors

Connection timeouts
DNS resolution failures
Network connectivity issues
SSL/TLS certificate problems
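With the requests library used throughout this guide, these failures surface as exception classes, and the hierarchy has a few traps: ConnectTimeout is a subclass of both Timeout and ConnectionError, SSLError is a subclass of ConnectionError, and DNS resolution failures arrive as a plain ConnectionError. The hypothetical describe_network_error helper below sketches one way to sort them; the order of the checks matters.

```python
import requests


def describe_network_error(exc: Exception) -> str:
    """Map a requests exception to one of the network-level categories above."""
    e = requests.exceptions
    # SSLError must be checked before ConnectionError, its parent class
    if isinstance(exc, e.SSLError):
        return "ssl/tls"
    # Timeout covers ConnectTimeout (also a ConnectionError) and ReadTimeout
    if isinstance(exc, e.Timeout):
        return "timeout"
    # DNS failures and refused connections are reported as ConnectionError
    if isinstance(exc, e.ConnectionError):
        return "connection"
    return "other"
```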

Core Error Handling Strategies

1. Proper Error Classification

The first step in effective error handling is classifying errors appropriately to determine the correct response strategy.

import requests
from enum import Enum

class ErrorType(Enum):
    RETRIABLE = "retriable"
    NON_RETRIABLE = "non_retriable"
    AUTHENTICATION = "authentication"
    RATE_LIMITED = "rate_limited"

def classify_error(response, exception=None):
    """Classify API errors to determine appropriate handling strategy"""
    if exception is not None:
        # SSLError subclasses ConnectionError, but a bad certificate will not
        # fix itself on retry, so check it first
        if isinstance(exception, requests.exceptions.SSLError):
            return ErrorType.NON_RETRIABLE
        if isinstance(exception, (requests.exceptions.Timeout,
                                  requests.exceptions.ConnectionError)):
            return ErrorType.RETRIABLE
        return ErrorType.NON_RETRIABLE

    if response.status_code in (500, 502, 503, 504):
        return ErrorType.RETRIABLE
    elif response.status_code == 429:
        return ErrorType.RATE_LIMITED
    elif response.status_code in (401, 403):
        return ErrorType.AUTHENTICATION
    elif response.status_code >= 400:
        return ErrorType.NON_RETRIABLE

    return None  # Not an error (2xx/3xx)
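The AUTHENTICATION category usually needs its own strategy rather than a blind retry. A common approach, sketched below under the assumption that your API issues short-lived tokens, is to refresh credentials once on a 401 and replay the request; a 403 is returned as-is because new credentials will not change what the account is allowed to do. Both send and refresh_credentials are hypothetical callables you would supply.

```python
def call_with_auth_refresh(send, refresh_credentials):
    """Replay a request once after refreshing credentials on HTTP 401.

    send: hypothetical zero-argument callable that performs the request and
          returns an object with a .status_code attribute.
    refresh_credentials: hypothetical callable that obtains a new token.
    """
    response = send()
    if response.status_code == 401:
        refresh_credentials()
        # Only one replay: a second 401 means the credentials themselves are bad
        response = send()
    # 403 is returned unchanged; a fresh token will not grant missing permissions
    return response
```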

2. Implementing Retry Logic with Exponential Backoff

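Exponential backoff multiplies the delay between retries by a constant factor after each failed attempt (for example 1, 2, then 4 seconds), giving an overloaded or restarting server time to recover instead of hitting it with immediate retries. Adding a small random jitter keeps many clients that failed at the same moment from retrying in lockstep.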

import random
import time

import requests

class RetryConfig:
    def __init__(self, max_attempts=3, base_delay=1, max_delay=60, backoff_factor=2):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff_factor = backoff_factor

def backoff_delay(config, attempt):
    """Exponential delay capped at max_delay, plus up to 10% random jitter"""
    delay = min(config.base_delay * (config.backoff_factor ** attempt), config.max_delay)
    return delay + delay * 0.1 * random.random()

def api_request_with_retry(url, config=None, **kwargs):
    """Make API request with exponential backoff retry logic

    Uses classify_error() and ErrorType from the previous section.
    """
    if config is None:
        config = RetryConfig()
    # requests never times out unless told to, so Timeout could never be raised;
    # 10 seconds is an arbitrary default that callers can override
    kwargs.setdefault('timeout', 10)

    for attempt in range(config.max_attempts):
        is_last_attempt = attempt == config.max_attempts - 1

        try:
            response = requests.get(url, **kwargs)
        except requests.exceptions.RequestException as e:
            # Network-level failure: retry only transient errors
            if classify_error(None, e) != ErrorType.RETRIABLE or is_last_attempt:
                raise
            time.sleep(backoff_delay(config, attempt))
            continue

        error_type = classify_error(response)
        if error_type in (ErrorType.RETRIABLE, ErrorType.RATE_LIMITED):
            if is_last_attempt:
                response.raise_for_status()  # Out of attempts: surface the HTTPError
            time.sleep(backoff_delay(config, attempt))
            continue
        if error_type in (ErrorType.NON_RETRIABLE, ErrorType.AUTHENTICATION):
            # Retrying will not fix a bad request or bad credentials
            response.raise_for_status()

        return response
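To see what the backoff formula produces, the small helper below (hypothetical, mirroring RetryConfig's default parameters) lists the delay before each retry without jitter. The cap at max_delay kicks in at the seventh attempt: 2^6 = 64 seconds is clamped to 60.

```python
def backoff_schedule(attempts, base_delay=1, backoff_factor=2, max_delay=60):
    """Pre-jitter delays in seconds: base_delay * backoff_factor**n, capped at max_delay."""
    return [min(base_delay * backoff_factor ** n, max_delay) for n in range(attempts)]


print(backoff_schedule(8))  # [1, 2, 4, 8, 16, 32, 60, 60]
```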

3. Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures by temporarily stopping requests to a failing service.

class CircuitBreaker {
    constructor(threshold = 5, timeout = 60000, successThreshold = 3) {
        this.threshold = threshold;               // Failures before opening
        this.timeout = timeout;                   // ms to stay OPEN before a trial call
        this.successThreshold = successThreshold; // HALF_OPEN successes needed to close
        this.failureCount = 0;
        this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
        this.nextAttempt = Date.now();
        this.successCount = 0;
    }

    async call(apiFunction) {
        if (this.state === 'OPEN') {
            if (Date.now() < this.nextAttempt) {
                throw new Error('Circuit breaker is OPEN');
            }
            this.state = 'HALF_OPEN';
            this.successCount = 0;
        }

        try {
            const result = await apiFunction();
            this.onSuccess();
            return result;
        } catch (error) {
            this.onFailure();
            throw error;
        }
    }

    onSuccess() {
        this.failureCount = 0;
        if (this.state === 'HALF_OPEN') {
            this.successCount++;
            if (this.successCount >= this.successThreshold) {
                this.state = 'CLOSED';
            }
        }
    }

    onFailure() {
        this.failureCount++;
        // A failed trial call while HALF_OPEN reopens the circuit immediately
        if (this.state === 'HALF_OPEN' || this.failureCount >= this.threshold) {
            this.state = 'OPEN';
            this.nextAttempt = Date.now() + this.timeout;
        }
    }
}

// Usage example
const circuitBreaker = new CircuitBreaker(5, 30000); // 5 failures, 30s timeout

async function makeAPICall() {
    try {
        return await circuitBreaker.call(async () => {
            const response = await fetch('/api/data');
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}`);
            }
            return response.json();
        });
    } catch (error) {
        console.error('API call failed:', error.message);
        throw error;
    }
}
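For Python clients, the same state machine can be ported almost line for line. The sketch below is our own port, not from the original: it keeps the same states and defaults, but timeout is in seconds, it uses time.monotonic() so wall-clock changes cannot reopen or close the circuit early, and rejected calls raise a hypothetical CircuitOpenError.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, threshold=5, timeout=60.0, success_threshold=3):
        self.threshold = threshold                  # Failures before opening
        self.timeout = timeout                      # Seconds to stay OPEN
        self.success_threshold = success_threshold  # HALF_OPEN successes to close
        self.failure_count = 0
        self.success_count = 0
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN
        self.next_attempt = 0.0

    async def call(self, api_function):
        if self.state == 'OPEN':
            if time.monotonic() < self.next_attempt:
                raise CircuitOpenError('Circuit breaker is OPEN')
            self.state = 'HALF_OPEN'  # Let trial requests through
            self.success_count = 0
        try:
            result = await api_function()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        if self.state == 'HALF_OPEN':
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = 'CLOSED'

    def _on_failure(self):
        self.failure_count += 1
        # Any HALF_OPEN failure, or too many CLOSED failures, opens the circuit
        if self.state == 'HALF_OPEN' or self.failure_count >= self.threshold:
            self.state = 'OPEN'
            self.next_attempt = time.monotonic() + self.timeout
```

Usage mirrors the JavaScript version: await breaker.call(fetch_data) inside a coroutine, where fetch_data is your own async request function.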

Advanced Error Handling Techniques

4. Rate Limit Handling

When dealing with rate-limited APIs, implement intelligent backoff strategies based on response headers.

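import time
from email.utils import parsedate_to_datetime

def handle_rate_limit(response):
    """Handle rate limit responses intelligently; returns True if the caller should retry"""
    if response.status_code != 429:
        return False

    # Retry-After is either a number of seconds ("120") or an HTTP-date
    retry_after = response.headers.get('Retry-After')
    if retry_after:
        try:
            delay = int(retry_after)
        except ValueError:
            try:
                delay = parsedate_to_datetime(retry_after).timestamp() - time.time()
            except (TypeError, ValueError):
                delay = None  # Unparseable header: fall through to the next strategy
        if delay is not None:
            time.sleep(max(0, delay))
            return True

    # X-RateLimit-Reset is non-standard; this assumes a Unix timestamp, but some
    # APIs send seconds-until-reset instead, so check your API's documentation
    reset_time = response.headers.get('X-RateLimit-Reset')
    if reset_time:
        try:
            delay = max(0, int(reset_time) - int(time.time()))
            time.sleep(delay + 1)  # Add 1 second buffer
            return True
        except ValueError:
            pass

    # Default backoff if no usable headers are available
    time.sleep(60)
    return True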
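Reacting to 429 responses is only half the job; a client-side limiter can keep request volume under the published quota so fewer 429s happen in the first place. The token bucket below is a minimal single-threaded sketch (it is not thread-safe), and rate and capacity are assumptions you would set from your API's documented limits.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # Start full so an initial burst is allowed
        self.updated = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available, sleeping only as long as needed."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)
```

Call bucket.acquire() before each request; the 429 handling above then only has to deal with limits you could not predict.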

5. Timeout Configuration

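Always set explicit timeouts. By default, requests waits indefinitely for a server to respond, so a single unresponsive endpoint can hang a worker forever. The adapter below applies a default (connect, read) timeout to every request sent through a session.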

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, timeout=None, *args, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        timeout = kwargs.get('timeout')
        if timeout is None and self.timeout is not None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)

def create_robust_session():
    """Create a requests session with default timeouts and automatic retries"""
    session = requests.Session()

    # Configure retry strategy. urllib3 honors Retry-After on 429/503 by default,
    # and only idempotent methods such as GET are retried (POST is not)
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )

    # Pass the retry policy through the constructor; timeouts are (connect, read)
    adapter = TimeoutHTTPAdapter(timeout=(5, 30), max_retries=retry_strategy)

    session.mount("http://", adapter)
    session.mount("https://", adapter)

    return session
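Per-attempt timeouts do not bound the total time a caller waits: the read timeout in requests applies to gaps between bytes received, not to the whole response, and several retries with a 30-second read timeout plus backoff can add up to minutes. One way to cap it, sketched below with a hypothetical Deadline helper, is to give each logical request an overall budget and shrink every attempt's timeout to whatever remains.

```python
import time


class Deadline:
    """Overall time budget shared by every attempt of one logical request."""

    def __init__(self, seconds: float):
        self.expires_at = time.monotonic() + seconds

    def remaining(self) -> float:
        return max(0.0, self.expires_at - time.monotonic())

    def attempt_timeout(self, per_attempt: float) -> float:
        """Per-attempt timeout, shrunk so the attempt cannot outlive the deadline."""
        remaining = self.remaining()
        if remaining == 0:
            raise TimeoutError("overall deadline exceeded")
        return min(per_attempt, remaining)
```

A caller might create deadline = Deadline(45) once, then pass timeout=(5, deadline.attempt_timeout(30)) on each attempt, stopping retries as soon as TimeoutError is raised.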

6. Graceful Degradation

Implement fallback mechanisms when primary APIs fail.

class APIService {
    constructor() {
        this.primaryEndpoint = 'https://api.primary.com';
        this.fallbackEndpoint = 'https://api.fallback.com';
        this.cache = new Map();
    }

    async getData(id) {
        // Try cache first
        const cached = this.cache.get(id);
        if (cached && Date.now() - cached.timestamp < 300000) { // 5 min cache
            return cached.data;
        }

        try {
            // Try primary endpoint
            const data = await this.fetchFromEndpoint(this.primaryEndpoint, id);
            this.cache.set(id, { data, timestamp: Date.now() });
            return data;
        } catch (primaryError) {
            console.warn('Primary API failed:', primaryError.message);

            try {
                // Try fallback endpoint
                const data = await this.fetchFromEndpoint(this.fallbackEndpoint, id);
                this.cache.set(id, { data, timestamp: Date.now() });
                return data;
            } catch (fallbackError) {
                console.error('Fallback API failed:', fallbackError.message);

                // Return cached data if available
                if (cached) {
                    console.info('Returning stale cached data');
                    return cached.data;
                }

                throw new Error('All API endpoints failed and no cached data available');
            }
        }
    }

    async fetchFromEndpoint(endpoint, id) {
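        // Abort after 5 s so a hanging endpoint counts as a failure and triggers the fallback
        const response = await fetch(`${endpoint}/data/${id}`, { signal: AbortSignal.timeout(5000) });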
        if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }
        return response.json();
    }
}
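The same chain (fresh cache, primary, fallback, stale cache) translates directly to Python. The sketch below is a hypothetical helper, not a library API: sources are callables you supply (for example one per endpoint), and the cache is a plain dict mapping key to (timestamp, value) with the same five-minute freshness window.

```python
import time


def get_with_fallback(key, sources, cache, ttl=300.0):
    """Return fresh cache, else the first source that succeeds, else stale cache."""
    cached = cache.get(key)
    if cached and time.time() - cached[0] < ttl:
        return cached[1]

    errors = []
    for source in sources:  # e.g. [fetch_primary, fetch_fallback]
        try:
            value = source(key)
        except Exception as exc:  # Narrow this to your client's error types
            errors.append(exc)
            continue
        cache[key] = (time.time(), value)
        return value

    if cached:
        return cached[1]  # Stale, but better than failing outright
    raise RuntimeError(f"all {len(sources)} sources failed for {key!r}: {errors}")
```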

Error Logging and Monitoring

7. Structured Error Logging

Implement comprehensive logging to track and analyze API errors.

import json
import logging
import random
import time
from datetime import datetime, timezone

import requests

class APIErrorLogger:
    def __init__(self):
        self.logger = logging.getLogger('api_errors')
        # Attach a handler only once, even if the class is instantiated repeatedly
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_error(self, url, method, status_code, error_type, attempt, max_attempts,
                  response_time=None, error_message=None):
        """Log API errors with structured data"""
        error_data = {
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'url': url,
            'method': method,
            'status_code': status_code,
            'error_type': error_type.value if error_type else None,
            'attempt': attempt,
            'max_attempts': max_attempts,
            'response_time': response_time,
            'error_message': error_message
        }

        self.logger.error("API Error: %s", json.dumps(error_data))

# Usage in retry function (reuses RetryConfig, classify_error and ErrorType from above)
logger = APIErrorLogger()

def api_request_with_logging(url, config=None, **kwargs):
    """API request that logs every failed attempt exactly once"""
    if config is None:
        config = RetryConfig()
    kwargs.setdefault('timeout', 10)  # Without a timeout, Timeout is never raised

    for attempt in range(config.max_attempts):
        is_last_attempt = attempt == config.max_attempts - 1
        start_time = time.time()
        try:
            response = requests.get(url, **kwargs)
        except requests.exceptions.RequestException as e:
            # Network-level failure: there is no status code to record
            error_type = classify_error(None, e)
            logger.log_error(
                url, 'GET', None, error_type,
                attempt + 1, config.max_attempts, time.time() - start_time, str(e)
            )
            if error_type != ErrorType.RETRIABLE or is_last_attempt:
                raise
        else:
            response_time = time.time() - start_time
            error_type = classify_error(response)
            if error_type is None:
                return response

            logger.log_error(
                url, 'GET', response.status_code, error_type,
                attempt + 1, config.max_attempts, response_time
            )
            # Exceptions raised in this else clause are not caught by the except
            # above, so an HTTPError is never logged a second time
            if error_type not in (ErrorType.RETRIABLE, ErrorType.RATE_LIMITED) or is_last_attempt:
                response.raise_for_status()

        # Back off before the next attempt, as in api_request_with_retry
        delay = min(config.base_delay * (config.backoff_factor ** attempt), config.max_delay)
        time.sleep(delay + delay * 0.1 * random.random())
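Embedding JSON inside a formatted message string works, but log pipelines can parse records more reliably when the whole line is JSON. A sketch using only the standard logging module: a custom Formatter that serializes the message plus any of the listed fields passed through extra=. The field names are assumptions chosen to match log_error above.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    # Custom attributes copied from `extra=` when present (hypothetical field set)
    FIELDS = ("url", "method", "status_code", "error_type", "attempt", "response_time")

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


json_logger = logging.getLogger("api_errors_json")
json_handler = logging.StreamHandler()
json_handler.setFormatter(JsonFormatter())
json_logger.addHandler(json_handler)
```

A call such as json_logger.error("API Error", extra={"url": url, "status_code": 503, "attempt": 2}) then emits one JSON line. Keys in extra must not collide with built-in LogRecord attributes such as message or asctime, or logging raises KeyError.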

Testing Error Handling

8. Error Simulation for Testing

Create comprehensive tests for your error handling logic.

import asyncio
from unittest.mock import Mock, patch

import pytest
import requests

class TestAPIErrorHandling:
    @patch('time.sleep')  # Skip real backoff delays so the test runs instantly
    def test_retry_on_server_error(self, mock_sleep):
        """Test retry logic for server errors"""
        mock_response = Mock()
        mock_response.status_code = 500
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()

        with patch('requests.get', return_value=mock_response) as mock_get:
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com', RetryConfig(max_attempts=2))

            assert mock_get.call_count == 2

    def test_no_retry_on_client_error(self):
        """Test no retry for client errors"""
        mock_response = Mock()
        mock_response.status_code = 400
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()

        with patch('requests.get', return_value=mock_response) as mock_get:
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com')

            assert mock_get.call_count == 1

    def test_circuit_breaker_opens_after_failures(self):
        """Test circuit breaker opens after threshold failures.

        Assumes a Python port of the CircuitBreaker class with the same
        constructor arguments, state names and an async call() method.
        """
        async def failing_api():
            raise Exception("API Error")

        async def scenario():
            circuit_breaker = CircuitBreaker(threshold=2, timeout=1000)

            # First two calls should fail and increment failure count
            for _ in range(2):
                with pytest.raises(Exception, match="API Error"):
                    await circuit_breaker.call(failing_api)

            # Circuit should now be open
            assert circuit_breaker.state == 'OPEN'

            # Next call should fail immediately without calling the API
            with pytest.raises(Exception, match="Circuit breaker is OPEN"):
                await circuit_breaker.call(failing_api)

        # The test method is synchronous, so drive the coroutine explicitly
        asyncio.run(scenario())
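Patching time.sleep works, but designing retry code to accept the sleep function as a parameter makes error simulation simpler and keeps tests fast without global patches. The retry_call helper below is hypothetical (a trimmed-down stand-in for api_request_with_retry that uses Python's built-in ConnectionError in place of your client's transient errors), and the test scripts two simulated failures followed by a success with Mock(side_effect=[...]).

```python
import time
from unittest.mock import Mock


def retry_call(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Hypothetical retry helper; `sleep` is injectable so tests never really wait."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Out of attempts
            sleep(base_delay * 2 ** attempt)


def test_recovers_after_transient_failures():
    # Script the simulated API: two connection failures, then a successful payload
    api = Mock(side_effect=[ConnectionError("reset"), ConnectionError("reset"), {"ok": True}])
    fake_sleep = Mock()

    assert retry_call(api, sleep=fake_sleep) == {"ok": True}
    assert api.call_count == 3
    # Backoff delays were requested but never actually slept
    assert [c.args[0] for c in fake_sleep.call_args_list] == [1.0, 2.0]
```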

Best Practices Summary

Classify Errors Properly: Distinguish between retriable and non-retriable errors
Implement Exponential Backoff: Use increasing delays between retry attempts
Add Jitter: Randomize retry delays to prevent thundering herd problems
Use Circuit Breakers: Prevent cascading failures in distributed systems
Handle Rate Limits Intelligently: Respect API rate limit headers and implement appropriate backoff
Configure Appropriate Timeouts: Set reasonable connection and read timeouts
Implement Graceful Degradation: Provide fallback mechanisms and use cached data when possible
Log Errors Comprehensively: Capture structured error data for monitoring and debugging
Test Error Scenarios: Create comprehensive tests for all error handling paths
Monitor Error Rates: Set up alerts for unusual error patterns
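Monitoring error rates can start as simply as a sliding-window counter that your alerting code polls. The class below is a hypothetical sketch: the window length, alert threshold and minimum sample size are assumptions to tune, and in production you would more likely export these counts to a metrics system.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Track request outcomes over a sliding time window and flag high error rates."""

    def __init__(self, window_seconds=60.0, alert_threshold=0.2, min_samples=20):
        self.window = window_seconds
        self.alert_threshold = alert_threshold
        self.min_samples = min_samples
        self.events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded, now=None):
        now = time.monotonic() if now is None else now
        self.events.append((now, succeeded))
        # Drop events that have fallen out of the window
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def error_rate(self):
        if not self.events:
            return 0.0
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

    def should_alert(self):
        # Require a minimum sample size so a single failed call does not page anyone
        return len(self.events) >= self.min_samples and self.error_rate() >= self.alert_threshold
```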

When implementing these strategies, consider the specific requirements of your application and of the APIs you're working with. Some APIs document their own error-handling rules or send additional headers that can inform your retry strategy. The same principles carry over to browser automation: if you scrape with Puppeteer, navigation and selector timeouts benefit from the same classification, retry, and fallback logic.

Proper API error handling and recovery are essential for building resilient applications that can handle the unpredictable nature of network communications and external service dependencies. By implementing these best practices, you'll create more robust applications that provide better user experiences and easier maintenance.
