How do I retry failed requests automatically with Requests?
When building robust web scraping applications or API clients, handling failed requests gracefully is crucial for reliability. Requests has no retry logic of its own, but it integrates cleanly with urllib3's retry machinery and is easy to extend with custom strategies. This guide covers practical approaches to implementing automatic retries for failed HTTP requests.
Using urllib3 Retry with Requests Session
The most common way to implement automatic retries is to use the urllib3.util.Retry class with a Requests Session. Because the retries happen at the transport layer, this approach gives you fine-grained control over retry behavior without any extra code at each call site.
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

# Create a retry strategy
retry_strategy = Retry(
    total=3,                                     # Total number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry
    backoff_factor=1,                            # Multiplier for exponential backoff between attempts
    respect_retry_after_header=True              # Respect the server's Retry-After header
)

# Create an HTTP adapter with the retry strategy
adapter = HTTPAdapter(max_retries=retry_strategy)

# Create a session and mount the adapter for both schemes
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Make requests that will automatically retry on failure
try:
    response = session.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()
    print("Request successful:", response.json())
except requests.exceptions.RequestException as e:
    print(f"Request failed after retries: {e}")
Advanced Retry Configuration
For more complex scenarios, you can customize the retry behavior with additional parameters:
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

# Advanced retry configuration
retry_strategy = Retry(
    total=5,       # Overall cap on retries across all error types
    read=2,        # Retries for read errors
    connect=3,     # Retries for connection errors
    status=3,      # Retries for HTTP status errors
    status_forcelist=[408, 429, 500, 502, 503, 504, 520, 522, 524],
    backoff_factor=2,                 # Exponential backoff multiplier
    respect_retry_after_header=True,  # Honor the server's Retry-After header
    raise_on_redirect=False,          # Return the response instead of raising when redirects are exhausted
    raise_on_status=False             # Return the last response instead of raising when status retries are exhausted
)

# Custom adapter that logs each top-level request. Note that urllib3
# performs the retries internally, so send() runs once per session
# call, not once per retry attempt.
class RetryAdapter(HTTPAdapter):
    def send(self, request, **kwargs):
        try:
            response = super().send(request, **kwargs)
            print(f"Request to {request.url} succeeded with status {response.status_code}")
            return response
        except Exception as e:
            print(f"Request to {request.url} failed: {e}")
            raise

# Set up a session with the advanced retry strategy
session = requests.Session()
adapter = RetryAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Add custom headers for better success rates
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
Custom Retry Decorator
For more control over retry logic, you can implement a custom retry decorator:
import requests
import time
import random
from functools import wraps

def retry_request(max_retries=3, backoff_factor=1, status_codes=None):
    """
    Decorator for retrying requests with exponential backoff.

    Args:
        max_retries: Maximum number of retry attempts
        backoff_factor: Multiplier for exponential backoff
        status_codes: List of status codes that should trigger a retry
    """
    if status_codes is None:
        status_codes = [408, 429, 500, 502, 503, 504]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    response = func(*args, **kwargs)
                    # Check if the status code should trigger a retry;
                    # on the final attempt, surface the error instead
                    if response.status_code in status_codes:
                        if attempt < max_retries:
                            wait_time = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
                            print(f"Request failed with status {response.status_code}. "
                                  f"Retrying in {wait_time:.2f} seconds... (Attempt {attempt + 1})")
                            time.sleep(wait_time)
                            continue
                        response.raise_for_status()
                    return response
                except requests.exceptions.RequestException as e:
                    if attempt == max_retries:
                        raise
                    wait_time = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
                    print(f"Request failed: {e}. "
                          f"Retrying in {wait_time:.2f} seconds... (Attempt {attempt + 1})")
                    time.sleep(wait_time)
        return wrapper
    return decorator

# Usage example
@retry_request(max_retries=3, backoff_factor=2, status_codes=[429, 500, 502, 503])
def fetch_data(url):
    return requests.get(url, timeout=10)

# Make a request with automatic retries
try:
    response = fetch_data("https://api.example.com/data")
    print("Data retrieved successfully:", response.json())
except requests.exceptions.RequestException as e:
    print(f"All retry attempts failed: {e}")
Handling Rate Limiting
When dealing with APIs that implement rate limiting, it's important to respect the Retry-After header. urllib3's Retry already honors it for 429 responses when respect_retry_after_header=True, but a custom adapter is useful when you want manual control over the wait:
import requests
import time
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def create_session_with_rate_limit_handling():
    """Create a session that handles rate limiting explicitly"""

    class RateLimitAdapter(HTTPAdapter):
        def send(self, request, **kwargs):
            response = super().send(request, **kwargs)
            # Handle rate limiting with the Retry-After header
            if response.status_code == 429:
                retry_after = response.headers.get('Retry-After')
                if retry_after and retry_after.isdigit():
                    wait_time = int(retry_after)
                    print(f"Rate limited. Waiting {wait_time} seconds...")
                    time.sleep(wait_time)
                    # Retry the request once after waiting
                    return super().send(request, **kwargs)
            return response

    # Let urllib3 retry transient server errors; 429 is left out of
    # status_forcelist so the manual handling above actually runs
    retry_strategy = Retry(
        total=5,
        status_forcelist=[500, 502, 503, 504],
        backoff_factor=1,
        respect_retry_after_header=True
    )

    session = requests.Session()
    adapter = RateLimitAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session

# Usage
session = create_session_with_rate_limit_handling()
response = session.get("https://api.example.com/data")
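One wrinkle the adapter above glosses over: Retry-After may also carry an HTTP date rather than a number of seconds. A small normalizing helper (a sketch with a hypothetical name) covers both forms:

from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def parse_retry_after(value, default=1.0):
    """Return the wait time in seconds for a Retry-After header value."""
    if value is None:
        return default
    if value.isdigit():
        # Delay-seconds form, e.g. "120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2026 07:28:00 GMT"
        dt = parsedate_to_datetime(value)
        return max((dt - datetime.now(timezone.utc)).total_seconds(), 0.0)
    except (TypeError, ValueError):
        return default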
Retry with Circuit Breaker Pattern
For production applications, implementing a circuit breaker pattern can prevent cascading failures:
import requests
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = 1
    OPEN = 2
    HALF_OPEN = 3

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold  # Failures before the circuit opens
        self.timeout = timeout                      # Seconds to stay open before a trial call
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # After the cooldown, allow one trial request through
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.reset()
            return result
        except Exception:
            self.record_failure()
            raise

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def reset(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

# Usage with circuit breaker
circuit_breaker = CircuitBreaker(failure_threshold=3, timeout=30)

def make_request_with_circuit_breaker(url):
    def _make_request():
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response
    return circuit_breaker.call(_make_request)
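A brief usage sketch against a placeholder URL: after three consecutive failures the breaker trips, and further calls fail fast with "Circuit breaker is OPEN" until the 30-second cooldown elapses.

for _ in range(5):
    try:
        response = make_request_with_circuit_breaker("https://api.example.com/data")
        print("OK:", response.status_code)
    except Exception as e:
        # Either the request itself failed or the breaker is open
        print("Failed or blocked:", e)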
Best Practices for Request Retries
1. Implement Exponential Backoff with Jitter
Adding randomness to backoff intervals prevents thundering herd problems:
import random

def exponential_backoff_with_jitter(attempt, base_delay=1, max_delay=60):
    """Calculate delay with exponential backoff and jitter"""
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)  # Add up to 10% jitter
    return delay + jitter
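Here is a minimal retry loop that plugs the helper in (the URL and max_retries default are illustrative):

import time
import requests

def get_with_backoff(url, max_retries=3):
    """Retry a GET using the jittered backoff computed above."""
    for attempt in range(max_retries + 1):
        try:
            return requests.get(url, timeout=10)
        except requests.exceptions.RequestException:
            if attempt == max_retries:
                raise
            time.sleep(exponential_backoff_with_jitter(attempt))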
2. Set Appropriate Timeouts
Always use timeouts to prevent hanging requests:
import requests

# Requests has no session-wide timeout setting, so pass a timeout on
# each call as a (connect_timeout, read_timeout) tuple
session = requests.Session()
response = session.get("https://api.example.com/data", timeout=(5, 30))
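If you do want a session-wide default, one common pattern is a small adapter subclass that injects a timeout whenever the caller doesn't supply one (a sketch; TimeoutHTTPAdapter is not part of Requests):

from requests.adapters import HTTPAdapter

class TimeoutHTTPAdapter(HTTPAdapter):
    """Adapter that applies a default timeout to every request."""
    def __init__(self, *args, timeout=(5, 30), **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Only fill in the timeout if the caller didn't pass one
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.timeout
        return super().send(request, **kwargs)

session = requests.Session()
session.mount("http://", TimeoutHTTPAdapter())
session.mount("https://", TimeoutHTTPAdapter())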
3. Monitor and Log Retry Attempts
Implement proper logging for debugging and monitoring:
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_retry_attempt(url, attempt, total_attempts, error=None):
    if error:
        logger.warning(f"Request to {url} failed (attempt {attempt}/{total_attempts}): {error}")
    else:
        logger.info(f"Retrying request to {url} (attempt {attempt}/{total_attempts})")
Integration with Web Scraping Workflows
When building web scraping applications, retry mechanisms work well with other resilience patterns. Similar to how you might handle timeouts in Puppeteer for browser-based scraping, implementing robust retry logic ensures your HTTP-based scrapers can handle temporary failures gracefully.
For complex scraping workflows that require both HTTP requests and browser automation, you can combine the retry strategies shown here with browser-based approaches for handling errors in Puppeteer to create comprehensive error handling across your entire scraping pipeline.
Conclusion
Implementing automatic retry mechanisms for failed requests is essential for building robust web scraping and API client applications. The urllib3 Retry class is the simplest and most robust option for most use cases, while custom decorators and circuit breakers offer additional control for complex scenarios.
Key takeaways for implementing request retries:
- Use urllib3.util.Retry with requests.Session for transport-level retry handling
- Implement exponential backoff with jitter to avoid overwhelming servers
- Respect rate limiting headers and implement proper backoff strategies
- Add circuit breaker patterns for production applications
- Always set appropriate timeouts and implement comprehensive logging
- Test your retry logic thoroughly to ensure it behaves correctly under various failure conditions
By following these patterns, you can build resilient applications that gracefully handle network failures, temporary server issues, and rate limiting while maintaining good performance and user experience.