What are the best practices for API error handling and recovery?
API error handling and recovery are critical components of building robust, production-ready applications. Proper error handling ensures your application can gracefully handle failures, provide meaningful feedback to users, and automatically recover from temporary issues. This guide covers comprehensive strategies for implementing effective API error handling and recovery mechanisms.
Understanding API Error Types
Before implementing error handling strategies, it's essential to understand the different types of errors you'll encounter when working with APIs.
HTTP Status Code Categories
Client Errors (4xx)
- 400 Bad Request: Invalid request syntax or parameters
- 401 Unauthorized: Authentication required or failed
- 403 Forbidden: Server understood request but refuses to authorize
- 404 Not Found: Requested resource doesn't exist
- 429 Too Many Requests: Rate limit exceeded
Server Errors (5xx)
- 500 Internal Server Error: Generic server error
- 502 Bad Gateway: Invalid response from upstream server
- 503 Service Unavailable: Server temporarily unavailable
- 504 Gateway Timeout: Upstream server timeout
Network-Level Errors
- Connection timeouts
- DNS resolution failures
- Network connectivity issues
- SSL/TLS certificate problems
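To make the categories above concrete, here is a minimal sketch that maps low-level Python exceptions onto them. It uses only the standard library; the function name `describe_network_error` and the category labels are hypothetical, chosen for illustration. HTTP client libraries usually wrap these exceptions in their own types, so treat this as a picture of what sits underneath rather than a drop-in classifier.

```python
import socket
import ssl


def describe_network_error(exc):
    """Map a standard-library exception to one of the network-level categories.

    The labels returned here are illustrative, not part of any library.
    """
    # DNS lookups fail with socket.gaierror (getaddrinfo error)
    if isinstance(exc, socket.gaierror):
        return "dns_resolution_failure"
    # Certificate and handshake problems surface as ssl.SSLError subclasses
    if isinstance(exc, ssl.SSLError):
        return "tls_problem"
    # socket.timeout is an alias of TimeoutError on Python 3.10+, a separate class before
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # Refused, reset, or aborted connections and other OS-level socket errors
    if isinstance(exc, (ConnectionError, OSError)):
        return "connectivity_issue"
    return "unknown"


print(describe_network_error(ConnectionRefusedError()))  # connectivity_issue
```

The order of the checks matters because all four groups derive from `OSError`; the most specific types are tested first.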
Core Error Handling Strategies
1. Proper Error Classification
The first step in effective error handling is classifying errors appropriately to determine the correct response strategy.
import requests
from enum import Enum

class ErrorType(Enum):
    RETRIABLE = "retriable"
    NON_RETRIABLE = "non_retriable"
    AUTHENTICATION = "authentication"
    RATE_LIMITED = "rate_limited"

def classify_error(response, exception=None):
    """Classify API errors to determine appropriate handling strategy"""
    if exception:
        if isinstance(exception, requests.exceptions.Timeout):
            return ErrorType.RETRIABLE
        elif isinstance(exception, requests.exceptions.ConnectionError):
            return ErrorType.RETRIABLE
        else:
            return ErrorType.NON_RETRIABLE
    if response.status_code in [500, 502, 503, 504]:
        return ErrorType.RETRIABLE
    elif response.status_code == 429:
        return ErrorType.RATE_LIMITED
    elif response.status_code in [401, 403]:
        return ErrorType.AUTHENTICATION
    elif response.status_code >= 400:
        return ErrorType.NON_RETRIABLE
    return None
2. Implementing Retry Logic with Exponential Backoff
Exponential backoff is a strategy where retry delays increase exponentially with each attempt, reducing server load and improving success rates.
import random
import time

import requests

class RetryConfig:
    def __init__(self, max_attempts=3, base_delay=1, max_delay=60, backoff_factor=2):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.backoff_factor = backoff_factor

def api_request_with_retry(url, config=None, **kwargs):
    """Make API request with exponential backoff retry logic"""
    if config is None:
        config = RetryConfig()
    last_exception = None
    for attempt in range(config.max_attempts):
        try:
            response = requests.get(url, **kwargs)
            error_type = classify_error(response)
            if error_type in [ErrorType.RETRIABLE, ErrorType.RATE_LIMITED]:
                if attempt == config.max_attempts - 1:
                    # Out of attempts: surface the failure as an HTTPError
                    response.raise_for_status()
                # Calculate delay with jitter
                delay = min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                )
                jitter = delay * 0.1 * random.random()
                time.sleep(delay + jitter)
                continue
            elif error_type == ErrorType.NON_RETRIABLE:
                response.raise_for_status()
            # Success, or a 401/403 that is returned for the caller to handle
            return response
        except requests.exceptions.RequestException as e:
            last_exception = e
            error_type = classify_error(None, e)
            if error_type == ErrorType.RETRIABLE and attempt < config.max_attempts - 1:
                # Same capped exponential delay with jitter as the status-code path
                delay = min(
                    config.base_delay * (config.backoff_factor ** attempt),
                    config.max_delay
                )
                jitter = delay * 0.1 * random.random()
                time.sleep(delay + jitter)
                continue
            else:
                raise
    raise last_exception
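To see what the delay formula produces, the sketch below computes the capped exponential schedule on its own, using the same default values as `RetryConfig`. The helper names `backoff_delays` and `full_jitter` are hypothetical. The "full jitter" variant, which picks a random delay between zero and the computed cap, is a common alternative to the 10% jitter used above and is shown only for comparison.

```python
import random


def backoff_delays(retries=5, base_delay=1.0, max_delay=60.0, backoff_factor=2.0):
    """Return the capped exponential delay (in seconds) before each retry, without jitter."""
    return [
        min(base_delay * backoff_factor ** attempt, max_delay)
        for attempt in range(retries)
    ]


def full_jitter(delay, rng=random):
    """Pick a random delay in [0, delay], spreading clients across the whole interval."""
    return rng.uniform(0, delay)


print(backoff_delays(8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The delay doubles until it reaches `max_delay`, after which every further retry waits the capped amount.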
3. Circuit Breaker Pattern
The circuit breaker pattern prevents cascading failures by temporarily stopping requests to a failing service.
class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000, monitoringPeriod = 10000) {
    this.threshold = threshold;
    this.timeout = timeout; // How long the circuit stays OPEN, in milliseconds
    this.monitoringPeriod = monitoringPeriod; // Not used by this simplified breaker
    this.failureCount = 0;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.nextAttempt = Date.now();
    this.successCount = 0;
  }

  async call(apiFunction) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      // Timeout elapsed: let trial requests through
      this.state = 'HALF_OPEN';
      this.successCount = 0;
    }
    try {
      const result = await apiFunction();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    if (this.state === 'HALF_OPEN') {
      this.successCount++;
      // Close the circuit after 3 consecutive successful trial requests
      if (this.successCount >= 3) {
        this.state = 'CLOSED';
      }
    }
  }

  onFailure() {
    this.failureCount++;
    // A failed trial request in HALF_OPEN reopens the circuit immediately
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.threshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}
// Usage example
const circuitBreaker = new CircuitBreaker(5, 30000); // 5 failures, 30s timeout

async function makeAPICall() {
  try {
    return await circuitBreaker.call(async () => {
      const response = await fetch('/api/data');
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`);
      }
      return response.json();
    });
  } catch (error) {
    console.error('API call failed:', error.message);
    throw error;
  }
}
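For readers working in Python, the same state machine might be sketched as below. This is an illustrative port and not a reference implementation. It assumes the timeout is given in seconds, raises a hypothetical `CircuitOpenError`, accepts an injectable `clock` so tests can control time, and reopens the circuit as soon as a trial request fails in the half-open state.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, threshold=5, timeout=60.0, half_open_successes=3,
                 clock=time.monotonic):
        self.threshold = threshold                      # failures before opening
        self.timeout = timeout                          # seconds to stay open
        self.half_open_successes = half_open_successes  # successes needed to close
        self.clock = clock
        self.state = "CLOSED"                           # CLOSED, OPEN, HALF_OPEN
        self.failure_count = 0
        self.success_count = 0
        self.next_attempt = 0.0

    async def call(self, api_function):
        if self.state == "OPEN":
            if self.clock() < self.next_attempt:
                raise CircuitOpenError("Circuit breaker is OPEN")
            # Timeout elapsed: allow trial requests
            self.state = "HALF_OPEN"
            self.success_count = 0
        try:
            result = await api_function()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        if self.state == "HALF_OPEN":
            self.success_count += 1
            if self.success_count >= self.half_open_successes:
                self.state = "CLOSED"

    def _on_failure(self):
        self.failure_count += 1
        # A failed trial request reopens the circuit immediately
        if self.state == "HALF_OPEN" or self.failure_count >= self.threshold:
            self.state = "OPEN"
            self.next_attempt = self.clock() + self.timeout


async def demo():
    breaker = CircuitBreaker(threshold=2, timeout=30.0)

    async def failing_api():
        raise RuntimeError("API Error")

    for _ in range(2):
        try:
            await breaker.call(failing_api)
        except RuntimeError:
            pass
    return breaker.state


print(asyncio.run(demo()))  # OPEN
```

Using a monotonic clock avoids surprises when the system time is adjusted while the circuit is open.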
Advanced Error Handling Techniques
4. Rate Limit Handling
When dealing with rate-limited APIs, implement intelligent backoff strategies based on response headers.
import time

def handle_rate_limit(response):
    """Handle rate limit responses intelligently"""
    if response.status_code == 429:
        # Check for Retry-After header (only the delay-in-seconds form is handled here)
        retry_after = response.headers.get('Retry-After')
        if retry_after:
            try:
                delay = int(retry_after)
                time.sleep(delay)
                return True
            except ValueError:
                pass
        # Check for X-RateLimit headers
        reset_time = response.headers.get('X-RateLimit-Reset')
        if reset_time:
            try:
                reset_timestamp = int(reset_time)
                current_time = int(time.time())
                delay = max(0, reset_timestamp - current_time)
                time.sleep(delay + 1)  # Add 1 second buffer
                return True
            except ValueError:
                pass
        # Default backoff if no headers available
        time.sleep(60)
        return True
    return False
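The `Retry-After` header can carry either a number of seconds or an HTTP date, and the function above only understands the first form. A minimal sketch of a parser for both forms follows, using only the standard library. The name `parse_retry_after` and its `now` parameter are hypothetical and exist so the example can be tested without waiting.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the delay in seconds encoded by a Retry-After value, or None if unparseable."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: a non-negative integer number of seconds
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat dates without an offset as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(parse_retry_after("120"))                                       # 120.0
print(parse_retry_after("Wed, 21 Oct 2015 07:28:00 GMT", reference))  # 60.0
```

Returning `None` for unparseable values lets the caller fall back to its default backoff instead of guessing.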
5. Timeout Configuration
Implement comprehensive timeout strategies to prevent hanging requests.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class TimeoutHTTPAdapter(HTTPAdapter):
    def __init__(self, timeout=None, *args, **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        timeout = kwargs.get('timeout')
        if timeout is None and self.timeout is not None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)

def create_robust_session():
    """Create a requests session with robust error handling"""
    session = requests.Session()
    # Configure retry strategy
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    # Configure timeout adapter
    adapter = TimeoutHTTPAdapter(timeout=(5, 30))  # Connect: 5s, Read: 30s
    adapter.max_retries = retry_strategy
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
6. Graceful Degradation
Implement fallback mechanisms when primary APIs fail.
class APIService {
  constructor() {
    this.primaryEndpoint = 'https://api.primary.com';
    this.fallbackEndpoint = 'https://api.fallback.com';
    this.cache = new Map();
  }

  async getData(id) {
    // Try cache first
    const cached = this.cache.get(id);
    if (cached && Date.now() - cached.timestamp < 300000) { // 5 min cache
      return cached.data;
    }
    try {
      // Try primary endpoint
      const data = await this.fetchFromEndpoint(this.primaryEndpoint, id);
      this.cache.set(id, { data, timestamp: Date.now() });
      return data;
    } catch (primaryError) {
      console.warn('Primary API failed:', primaryError.message);
      try {
        // Try fallback endpoint
        const data = await this.fetchFromEndpoint(this.fallbackEndpoint, id);
        this.cache.set(id, { data, timestamp: Date.now() });
        return data;
      } catch (fallbackError) {
        console.error('Fallback API failed:', fallbackError.message);
        // Return cached data if available
        if (cached) {
          console.info('Returning stale cached data');
          return cached.data;
        }
        throw new Error('All API endpoints failed and no cached data available');
      }
    }
  }

  async fetchFromEndpoint(endpoint, id) {
    const response = await fetch(`${endpoint}/data/${id}`);
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
    return response.json();
  }
}
Error Logging and Monitoring
7. Structured Error Logging
Implement comprehensive logging to track and analyze API errors.
import json
import logging
import time
from datetime import datetime, timezone

import requests

class APIErrorLogger:
    def __init__(self):
        self.logger = logging.getLogger('api_errors')
        # Attach the handler only once so repeated instantiation does not duplicate log lines
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
        self.logger.setLevel(logging.INFO)

    def log_error(self, url, method, status_code, error_type, attempt, max_attempts,
                  response_time=None, error_message=None):
        """Log API errors with structured data"""
        error_data = {
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'url': url,
            'method': method,
            'status_code': status_code,
            'error_type': error_type.value if error_type else None,
            'attempt': attempt,
            'max_attempts': max_attempts,
            'response_time': response_time,
            'error_message': error_message
        }
        self.logger.error(f"API Error: {json.dumps(error_data)}")

# Usage in retry function
logger = APIErrorLogger()

def api_request_with_logging(url, config=None, **kwargs):
    """API request with comprehensive error logging"""
    if config is None:
        config = RetryConfig()
    for attempt in range(config.max_attempts):
        start_time = time.time()
        # Capped exponential delay to wait before the next attempt
        delay = min(config.base_delay * (config.backoff_factor ** attempt), config.max_delay)
        try:
            response = requests.get(url, **kwargs)
            response_time = time.time() - start_time
            error_type = classify_error(response)
            if error_type:
                logger.log_error(
                    url, 'GET', response.status_code, error_type,
                    attempt + 1, config.max_attempts, response_time
                )
                if error_type in [ErrorType.RETRIABLE, ErrorType.RATE_LIMITED]:
                    if attempt == config.max_attempts - 1:
                        response.raise_for_status()
                    time.sleep(delay)
                    continue
                elif error_type == ErrorType.NON_RETRIABLE:
                    response.raise_for_status()
            return response
        except requests.exceptions.RequestException as e:
            response_time = time.time() - start_time
            error_type = classify_error(None, e)
            logger.log_error(
                url, 'GET', None, error_type,
                attempt + 1, config.max_attempts, response_time, str(e)
            )
            if error_type == ErrorType.RETRIABLE and attempt < config.max_attempts - 1:
                time.sleep(delay)
                continue
            else:
                raise
Testing Error Handling
8. Error Simulation for Testing
Create comprehensive tests for your error handling logic.
import asyncio
from unittest.mock import Mock, patch

import pytest
import requests

class TestAPIErrorHandling:
    def test_retry_on_server_error(self):
        """Test retry logic for server errors"""
        mock_response = Mock()
        mock_response.status_code = 500
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()
        # Patch time.sleep so the backoff delays do not slow the test down
        with patch('requests.get', return_value=mock_response) as mock_get, patch('time.sleep'):
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com', RetryConfig(max_attempts=2))
            assert mock_get.call_count == 2

    def test_no_retry_on_client_error(self):
        """Test no retry for client errors"""
        mock_response = Mock()
        mock_response.status_code = 400
        mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError()
        with patch('requests.get', return_value=mock_response) as mock_get:
            with pytest.raises(requests.exceptions.HTTPError):
                api_request_with_retry('http://test.com')
            assert mock_get.call_count == 1

    def test_circuit_breaker_opens_after_failures(self):
        """Test circuit breaker opens after threshold failures"""
        # Assumes a Python CircuitBreaker with the same interface as the
        # JavaScript class in section 3 (an async call() method and a state attribute)
        async def failing_api():
            raise Exception("API Error")

        async def scenario():
            circuit_breaker = CircuitBreaker(threshold=2, timeout=1000)
            # First two calls should fail and increment failure count
            with pytest.raises(Exception):
                await circuit_breaker.call(failing_api)
            with pytest.raises(Exception):
                await circuit_breaker.call(failing_api)
            # Circuit should now be open
            assert circuit_breaker.state == 'OPEN'
            # Next call should fail immediately without calling the API
            with pytest.raises(Exception, match="Circuit breaker is OPEN"):
                await circuit_breaker.call(failing_api)

        # await is only valid inside a coroutine, so run the scenario on an event loop
        asyncio.run(scenario())
Best Practices Summary
- Classify Errors Properly: Distinguish between retriable and non-retriable errors
- Implement Exponential Backoff: Use increasing delays between retry attempts
- Add Jitter: Randomize retry delays to prevent thundering herd problems
- Use Circuit Breakers: Prevent cascading failures in distributed systems
- Handle Rate Limits Intelligently: Respect API rate limit headers and implement appropriate backoff
- Configure Appropriate Timeouts: Set reasonable connection and read timeouts
- Implement Graceful Degradation: Provide fallback mechanisms and use cached data when possible
- Log Errors Comprehensively: Capture structured error data for monitoring and debugging
- Test Error Scenarios: Create comprehensive tests for all error handling paths
- Monitor Error Rates: Set up alerts for unusual error patterns
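The last point, monitoring error rates, has no example elsewhere in this guide, so here is a minimal sketch of a sliding-window error rate tracker. The class name `ErrorRateMonitor`, its thresholds, and the injectable `clock` are all assumptions made for illustration. A production system would more likely export these counts to a metrics or alerting service than compute them in process.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Track request outcomes in a sliding time window and flag high error rates."""

    def __init__(self, window_seconds=60.0, alert_threshold=0.5, min_requests=10,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.alert_threshold = alert_threshold  # fraction of failed requests that triggers an alert
        self.min_requests = min_requests        # ignore windows with too few samples
        self.clock = clock
        self._events = deque()                  # (timestamp, is_error) pairs, oldest first

    def record(self, is_error):
        """Record one request outcome at the current clock time."""
        now = self.clock()
        self._events.append((now, bool(is_error)))
        self._evict(now)

    def _evict(self, now):
        # Drop events that have fallen out of the window
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def error_rate(self):
        """Return the fraction of requests in the window that failed (0.0 if empty)."""
        self._evict(self.clock())
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self):
        rate = self.error_rate()
        return len(self._events) >= self.min_requests and rate >= self.alert_threshold


monitor = ErrorRateMonitor(min_requests=4)
for outcome in (True, True, True, False):
    monitor.record(outcome)
print(monitor.error_rate(), monitor.should_alert())  # 0.75 True
```

The `min_requests` guard keeps a single failure during a quiet period from triggering an alert.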
When implementing these strategies, consider the specific requirements of your application and the APIs you're working with. Some APIs have their own error handling requirements or provide additional headers that can inform your retry strategy. Similar principles apply to browser automation, for example when handling timeouts in Puppeteer for web scraping.
Proper API error handling and recovery are essential for building resilient applications that can cope with unreliable networks and external service dependencies. Applying these practices makes an application more robust, gives users a better experience, and makes the code easier to maintain.