What are the best practices for API error handling and recovery?

API error handling and recovery are critical components of building robust, production-ready applications. Proper error handling ensures your application can gracefully handle failures, provide meaningful feedback to users, and automatically recover from temporary issues. This guide covers comprehensive strategies for implementing effective API error handling and recovery mechanisms.

Understanding API Error Types

Before implementing error handling strategies, it's essential to understand the different types of errors you'll encounter when working with APIs.

HTTP Status Code Categories

Client Errors (4xx)

  • 400 Bad Request: Invalid request syntax or parameters
  • 401 Unauthorized: Authentication required or failed
  • 403 Forbidden: Server understood request but refuses to authorize
  • 404 Not Found: Requested resource doesn't exist
  • 429 Too Many Requests: Rate limit exceeded

Server Errors (5xx)

  • 500 Internal Server Error: Generic server error
  • 502 Bad Gateway: Invalid response from upstream server
  • 503 Service Unavailable: Server temporarily unavailable
  • 504 Gateway Timeout: Upstream server timeout

Network-Level Errors

  • Connection timeouts
  • DNS resolution failures
  • Network connectivity issues
  • SSL/TLS certificate problems
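
As a rough illustration of how these categories surface in code, the sketch below maps Python's standard-library exceptions onto them. The function name classify_network_error and the category strings are hypothetical, chosen for this example only. HTTP client libraries such as requests wrap these low-level errors in their own exception types, so treat this as a starting point rather than a complete mapping.

```python
import socket
import ssl


def classify_network_error(exc: BaseException) -> str:
    """Map a low-level exception to one of the network error categories above."""
    # Check the most specific types first: all of these derive from OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls_certificate"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, socket.gaierror):
        return "dns"  # name resolution failed
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionError):
        return "connectivity"  # refused, reset, or aborted connections
    return "unknown"


print(classify_network_error(ConnectionRefusedError()))  # connectivity
```

The order of the checks matters because the certificate error is a subclass of the general TLS error, and a broad check placed first would hide the more specific category.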

Core Error Handling Strategies

1. Proper Error Classification

The first step in effective error handling is classifying errors appropriately to determine the correct response strategy.

import requests
from enum import Enum

class ErrorType(Enum):
    RETRIABLE = "retriable"
    NON_RETRIABLE = "non_retriable"
    AUTHENTICATION = "authentication"
    RATE_LIMITED = "rate_limited"

def classify_error(response, exception=None):
    """Classify API errors to determine appropriate handling strategy"""
    if exception:
        if isinstance(exception, requests.exceptions.Timeout):
            return ErrorType.RETRIABLE
        elif isinstance(exception, requests.exceptions.ConnectionError):
            return ErrorType.RETRIABLE
        else:
            return ErrorType.NON_RETRIABLE

    if response.status_code in [500, 502, 503, 504]:
        return ErrorType.RETRIABLE
    elif response.status_code == 429:
        return ErrorType.RATE_LIMITED
    elif response.status_code in [401, 403]:
        return ErrorType.AUTHENTICATION
    elif response.status_code >= 400:
        return ErrorType.NON_RETRIABLE

    return None

2. Implementing Retry Logic with Exponential Backoff

Exponential backoff is a strategy where retry delays increase exponentially with each attempt, reducing server load and improving success rates.

import time
import random
from typing import Optional

class RetryConfig:
    def __init__(self, max_attempts=3, base_delay=1, max_delay=60, backoff_factor=2):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff_factor = backoff_factor

def api_request_with_retry(url, config=None, **kwargs):
    """Make API request with exponential backoff retry logic"""
    if config is None:
        config = RetryConfig()

    last_exception = None

    for attempt in range(config.max_attempts):
        try:
            response = requests.get(url, **kwargs)
            error_type = classify_error(response)

            if error_type in [ErrorType.RETRIABLE, ErrorType.RATE_LIMITED]:
                if attempt == config.max_attempts - 1:
                    response.raise_for_status()

                # Calculate delay with jitter
                delay = min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                )
                jitter = delay * 0.1 * random.random()
                time.sleep(delay + jitter)
                continue
            elif error_type == ErrorType.NON_RETRIABLE:
                response.raise_for_status()

            return response

        except requests.exceptions.RequestException as e:
            last_exception = e
            error_type = classify_error(None, e)

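            if error_type == ErrorType.RETRIABLE and attempt < config.max_attempts - 1:
                # Back off with the same capped, jittered delay as the status-code path
                delay = min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                )
                jitter = delay * 0.1 * random.random()
                time.sleep(delay + jitter)
                continue
            else:
                raise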

    raise last_exception
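
To make the delay schedule concrete, here is a small worked sketch of the arithmetic used in the retry loop, using the same defaults as RetryConfig (base delay 1, factor 2, cap 60). The helper names backoff_delay and with_jitter are illustrative and not part of the code above.

```python
import random


def backoff_delay(attempt, base_delay=1.0, backoff_factor=2.0, max_delay=60.0):
    """Capped exponential delay for a zero-based attempt number, without jitter."""
    return min(base_delay * (backoff_factor ** attempt), max_delay)


def with_jitter(delay, fraction=0.1, rng=random.random):
    """Add up to `fraction` of the delay as random jitter."""
    return delay + delay * fraction * rng()


schedule = [backoff_delay(attempt) for attempt in range(8)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The cap takes effect from the seventh attempt onward, where the uncapped value would be 64 seconds. With 10% jitter, a 16-second delay becomes a random value between 16 and 17.6 seconds, which spreads out clients that failed at the same moment.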

3. Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures by temporarily stopping requests to a failing service.

class CircuitBreaker {
    constructor(threshold = 5, timeout = 60000, monitoringPeriod = 10000) {
        this.threshold = threshold;
        this.timeout = timeout;
        this.monitoringPeriod = monitoringPeriod;
        this.failureCount = 0;
        this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
        this.nextAttempt = Date.now();
        this.successCount = 0;
    }

    async call(apiFunction) {
        if (this.state === 'OPEN') {
            if (Date.now() < this.nextAttempt) {
                throw new Error('Circuit breaker is OPEN');
            }
            this.state = 'HALF_OPEN';
            this.successCount = 0;
        }

        try {
            const result = await apiFunction();
            this.onSuccess();
            return result;
        } catch (error) {
            this.onFailure();
            throw error;
        }
    }

    onSuccess() {
        this.failureCount = 0;
        if (this.state === 'HALF_OPEN') {
            this.successCount++;
            if (this.successCount >= 3) {
                this.state = 'CLOSED';
            }
        }
    }

    onFailure() {
        this.failureCount++;
        if (this.failureCount >= this.threshold) {
            this.state = 'OPEN';
            this.nextAttempt = Date.now() + this.timeout;
        }
    }
}

// Usage example
const circuitBreaker = new CircuitBreaker(5, 30000); // 5 failures, 30s timeout

async function makeAPICall() {
    try {
        return await circuitBreaker.call(async () => {
            const response = await fetch('/api/data');
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}`);
            }
            return response.json();
        });
    } catch (error) {
        console.error('API call failed:', error.message);
        throw error;
    }
}
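
For readers working in Python, the following is a minimal async sketch that mirrors the JavaScript class above: the same three states, the same failure threshold, and the same rule of three successes to close the circuit. The injectable clock parameter is an addition for testability, and timeout is kept in milliseconds only to match the JavaScript version.

```python
import asyncio
import time


class CircuitBreaker:
    """Async Python sketch of the JavaScript circuit breaker above."""

    def __init__(self, threshold=5, timeout=60000, clock=time.monotonic):
        self.threshold = threshold  # failures before the circuit opens
        self.timeout = timeout      # milliseconds to stay open
        self._clock = clock         # returns seconds; injectable for tests
        self.failure_count = 0
        self.success_count = 0
        self.state = "CLOSED"       # CLOSED, OPEN, HALF_OPEN
        self.next_attempt = 0.0

    async def call(self, api_function):
        if self.state == "OPEN":
            if self._clock() < self.next_attempt:
                raise RuntimeError("Circuit breaker is OPEN")
            self.state = "HALF_OPEN"  # let trial requests through
            self.success_count = 0
        try:
            result = await api_function()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        if self.state == "HALF_OPEN":
            self.success_count += 1
            if self.success_count >= 3:
                self.state = "CLOSED"

    def _on_failure(self):
        self.failure_count += 1
        if self.failure_count >= self.threshold:
            self.state = "OPEN"
            self.next_attempt = self._clock() + self.timeout / 1000


async def demo():
    breaker = CircuitBreaker(threshold=2, timeout=30000)

    async def failing_api():
        raise ConnectionError("upstream down")

    for _ in range(2):
        try:
            await breaker.call(failing_api)
        except ConnectionError:
            pass
    return breaker.state


print(asyncio.run(demo()))  # OPEN
```

Rejections raised while the circuit is open are deliberately not counted as failures, since no request reached the service.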

Advanced Error Handling Techniques

4. Rate Limit Handling

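When an API responds with 429, prefer the server's own hints over a fixed delay. The standard Retry-After header carries either a number of seconds or an HTTP date. Many APIs also send a non-standard X-RateLimit-Reset header; the code below assumes it holds a Unix timestamp, so check your provider's documentation, because some APIs use it for the number of seconds remaining instead.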

def handle_rate_limit(response):
    """Handle rate limit responses intelligently"""
    if response.status_code == 429:
        # Check for Retry-After header
        retry_after = response.headers.get('Retry-After')
        if retry_after:
            try:
                delay = int(retry_after)
                time.sleep(delay)
                return True
            except ValueError:
                pass

        # Check for X-RateLimit headers
        reset_time = response.headers.get('X-RateLimit-Reset')
        if reset_time:
            try:
                reset_timestamp = int(reset_time)
                current_time = int(time.time())
                delay = max(0, reset_timestamp - current_time)
                time.sleep(delay + 1)  # Add 1 second buffer
                return True
            except ValueError:
                pass

        # Default backoff if no headers available
        time.sleep(60)
        return True

    return False
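
The function above only handles the numeric form of Retry-After and silently falls through when the header holds an HTTP date. A minimal sketch of a parser that accepts both forms follows. The name parse_retry_after and the optional now parameter are illustrative; the date parsing relies on the standard library's email.utils.parsedate_to_datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds described by a Retry-After value, or None."""
    value = value.strip()
    try:
        return max(0.0, float(int(value)))  # delay-seconds form, e.g. "120"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None  # unparseable: let the caller pick a default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(parse_retry_after("120"))  # 120.0
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", now=reference))  # 60.0
```

Returning None for an unparseable value leaves the fallback policy to the caller, which keeps the parser free of sleeping or other side effects.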

5. Timeout Configuration

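By default, the requests library applies no timeout, so a stalled connection can hang indefinitely. Set connect and read timeouts explicitly; the adapter below applies a default timeout to every request made through the session unless the caller passes its own.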

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, timeout=None, *args, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        timeout = kwargs.get('timeout')
        if timeout is None and self.timeout is not None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)

def create_robust_session():
    """Create a requests session with robust error handling"""
    session = requests.Session()

    # Configure retry strategy
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )

    # Configure timeout adapter; max_retries is a standard HTTPAdapter argument
    adapter = TimeoutHTTPAdapter(
        timeout=(5, 30),  # Connect: 5s, Read: 30s
        max_retries=retry_strategy,
    )

    session.mount("http://", adapter)
    session.mount("https://", adapter)

    return session

6. Graceful Degradation

Implement fallback mechanisms when primary APIs fail.

class APIService {
    constructor() {
        this.primaryEndpoint = 'https://api.primary.com';
        this.fallbackEndpoint = 'https://api.fallback.com';
        this.cache = new Map();
    }

    async getData(id) {
        // Try cache first
        const cached = this.cache.get(id);
        if (cached && Date.now() - cached.timestamp < 300000) { // 5 min cache
            return cached.data;
        }

        try {
            // Try primary endpoint
            const data = await this.fetchFromEndpoint(this.primaryEndpoint, id);
            this.cache.set(id, { data, timestamp: Date.now() });
            return data;
        } catch (primaryError) {
            console.warn('Primary API failed:', primaryError.message);

            try {
                // Try fallback endpoint
                const data = await this.fetchFromEndpoint(this.fallbackEndpoint, id);
                this.cache.set(id, { data, timestamp: Date.now() });
                return data;
            } catch (fallbackError) {
                console.error('Fallback API failed:', fallbackError.message);

                // Return cached data if available
                if (cached) {
                    console.info('Returning stale cached data');
                    return cached.data;
                }

                throw new Error('All API endpoints failed and no cached data available');
            }
        }
    }

    async fetchFromEndpoint(endpoint, id) {
        const response = await fetch(`${endpoint}/data/${id}`);
        if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }
        return response.json();
    }
}
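
The same fallback order (fresh cache, primary, fallback, stale cache) can be sketched in Python. In this version the data sources are plain callables supplied by the caller, which is an assumption made to keep the example free of network access; the class name FallbackFetcher and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


class FallbackFetcher:
    """Try each source in order, falling back to stale cache as a last resort."""

    def __init__(self, sources: Sequence[Callable[[str], Any]], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.sources = sources  # primary first, then fallbacks
        self.ttl = ttl          # seconds a cached entry counts as fresh
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        cached = self._cache.get(key)
        if cached is not None and self._clock() - cached[0] < self.ttl:
            return cached[1]  # fresh cache hit
        for source in self.sources:
            try:
                data = source(key)
            except Exception:
                continue  # this source failed; try the next one
            self._cache[key] = (self._clock(), data)
            return data
        if cached is not None:
            return cached[1]  # stale data is better than no data
        raise RuntimeError("All sources failed and no cached data available")


def broken_primary(key):
    raise ConnectionError("primary down")


fetcher = FallbackFetcher([broken_primary, lambda key: f"fallback:{key}"])
print(fetcher.get("42"))  # fallback:42
```

Whether serving stale data is acceptable depends on the application: it suits product listings or profile data far better than prices or account balances.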

Error Logging and Monitoring

7. Structured Error Logging

Implement comprehensive logging to track and analyze API errors.

import logging
import json
from datetime import datetime, timezone

class APIErrorLogger:
    def __init__(self):
        self.logger = logging.getLogger('api_errors')
        # Attach the handler only once so repeated instantiation
        # does not produce duplicate log lines
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_error(self, url, method, status_code, error_type, attempt, max_attempts,
                  response_time=None, error_message=None):
        """Log API errors with structured data"""
        error_data = {
            # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'url': url,
            'method': method,
            'status_code': status_code,
            'error_type': error_type.value if error_type else None,
            'attempt': attempt,
            'max_attempts': max_attempts,
            'response_time': response_time,
            'error_message': error_message
        }

        self.logger.error(f"API Error: {json.dumps(error_data)}")

# Usage in retry function
logger = APIErrorLogger()

def api_request_with_logging(url, config=None, **kwargs):
    """API request with comprehensive error logging"""
    if config is None:
        config = RetryConfig()

    for attempt in range(config.max_attempts):
        start_time = time.time()
        try:
            response = requests.get(url, **kwargs)
            response_time = time.time() - start_time

            error_type = classify_error(response)
            if error_type:
                logger.log_error(
                    url, 'GET', response.status_code, error_type,
                    attempt + 1, config.max_attempts, response_time
                )

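            if error_type in [ErrorType.RETRIABLE, ErrorType.RATE_LIMITED]:
                if attempt == config.max_attempts - 1:
                    response.raise_for_status()
                # Back off before the next attempt instead of retrying immediately
                time.sleep(min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                ))
                continue
            elif error_type == ErrorType.NON_RETRIABLE:
                response.raise_for_status()

            return response

        except requests.exceptions.RequestException as e:
            response_time = time.time() - start_time
            error_type = classify_error(None, e)

            logger.log_error(
                url, 'GET', None, error_type,
                attempt + 1, config.max_attempts, response_time, str(e)
            )

            if error_type == ErrorType.RETRIABLE and attempt < config.max_attempts - 1:
                # Same capped exponential delay as the status-code path
                time.sleep(min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                ))
                continue
            else:
                raise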

Testing Error Handling

8. Error Simulation for Testing

Create comprehensive tests for your error handling logic.

import pytest
from unittest.mock import Mock, patch
import requests

class TestAPIErrorHandling:
    def test_retry_on_server_error(self):
        """Test retry logic for server errors"""
        mock_response = Mock()
        mock_response.status_code = 500
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()

        # Patch time.sleep so the backoff delay does not slow the test down
        with patch('requests.get', return_value=mock_response) as mock_get, \
                patch('time.sleep'):
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com', RetryConfig(max_attempts=2))

            assert mock_get.call_count == 2

    def test_no_retry_on_client_error(self):
        """Test no retry for client errors"""
        mock_response = Mock()
        mock_response.status_code = 400
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()

        with patch('requests.get', return_value=mock_response) as mock_get:
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com')

            assert mock_get.call_count == 1

    def test_circuit_breaker_opens_after_failures(self):
        """Test circuit breaker opens after threshold failures"""
        # Assumes a Python port of the CircuitBreaker class with an async call() method
        import asyncio

        async def failing_api():
            raise Exception("API Error")

        async def scenario():
            circuit_breaker = CircuitBreaker(threshold=2, timeout=1000)

            # First two calls should fail and increment failure count
            with pytest.raises(Exception):
                await circuit_breaker.call(failing_api)
            with pytest.raises(Exception):
                await circuit_breaker.call(failing_api)

            # Circuit should now be open
            assert circuit_breaker.state == 'OPEN'

            # Next call should fail immediately without calling the API
            with pytest.raises(Exception, match="Circuit breaker is OPEN"):
                await circuit_breaker.call(failing_api)

        # await is only valid inside a coroutine, so drive the scenario with asyncio.run
        asyncio.run(scenario())

Best Practices Summary

  1. Classify Errors Properly: Distinguish between retriable and non-retriable errors
  2. Implement Exponential Backoff: Use increasing delays between retry attempts
  3. Add Jitter: Randomize retry delays to prevent thundering herd problems
  4. Use Circuit Breakers: Prevent cascading failures in distributed systems
  5. Handle Rate Limits Intelligently: Respect API rate limit headers and implement appropriate backoff
  6. Configure Appropriate Timeouts: Set reasonable connection and read timeouts
  7. Implement Graceful Degradation: Provide fallback mechanisms and use cached data when possible
  8. Log Errors Comprehensively: Capture structured error data for monitoring and debugging
  9. Test Error Scenarios: Create comprehensive tests for all error handling paths
  10. Monitor Error Rates: Set up alerts for unusual error patterns
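
For the last item in the list above, a sliding-window counter is one simple way to turn individual failures into an alert signal. The sketch below is illustrative only: the class name ErrorRateMonitor, the 50% threshold, and the minimum request count are assumptions, and production systems usually delegate this to a metrics platform.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Track the error rate over a sliding time window."""

    def __init__(self, window_seconds=60.0, alert_threshold=0.5,
                 min_requests=10, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.alert_threshold = alert_threshold  # error fraction that triggers an alert
        self.min_requests = min_requests        # avoid alerting on tiny samples
        self._clock = clock
        self._events = deque()                  # (timestamp, is_error) pairs

    def record(self, is_error: bool) -> None:
        now = self._clock()
        self._events.append((now, is_error))
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Drop events that have aged out of the window
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def error_rate(self) -> float:
        self._trim(self._clock())
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self) -> bool:
        rate = self.error_rate()
        return len(self._events) >= self.min_requests and rate >= self.alert_threshold
```

Calling record(True) from the error branch of a retry loop and record(False) on success keeps the monitor current. The minimum request count prevents a single failed call during a quiet period from registering as a 100% error rate.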

When implementing these strategies, consider the specific requirements of your application and the APIs you're working with. Some APIs may have specific error handling requirements or provide additional headers that can inform your retry strategies. For web scraping applications that need to handle timeouts in Puppeteer, similar principles apply when dealing with browser automation errors and timeouts.

Proper API error handling and recovery are essential for building resilient applications that can handle the unpredictable nature of network communications and external service dependencies. By implementing these best practices, you'll create more robust applications that provide better user experiences and easier maintenance.
