How do I retry failed requests automatically with Requests?

When building robust web scraping applications or API clients, handling failed requests gracefully is crucial for reliability. Requests does not retry failed requests on its own, but it exposes urllib3's retry machinery through transport adapters, and you can layer custom retry strategies on top of that. This guide covers practical approaches to implementing automatic retries for failed HTTP requests.

Using urllib3 Retry with Requests Session

The most reliable way to implement automatic retries is to combine the urllib3.util.Retry class with a Requests session via an HTTPAdapter. Retries then happen at the transport level, which gives you fine-grained control over retry behavior without writing any retry loops yourself.

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

# Create a retry strategy
retry_strategy = Retry(
    total=3,                    # Total number of retries
    status_forcelist=[429, 500, 502, 503, 504],  # HTTP status codes to retry
    backoff_factor=1,           # Backoff factor for exponential backoff
    respect_retry_after_header=True  # Respect server's Retry-After header
)

# Create an HTTP adapter with the retry strategy
adapter = HTTPAdapter(max_retries=retry_strategy)

# Create a session and mount the adapter
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Make requests that will automatically retry on failure
try:
    response = session.get("https://api.example.com/data", timeout=10)
    response.raise_for_status()
    print("Request successful:", response.json())
except requests.exceptions.RequestException as e:
    print(f"Request failed after retries: {e}")

Advanced Retry Configuration

For more complex scenarios, you can customize the retry behavior with additional parameters:

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

# Advanced retry configuration
retry_strategy = Retry(
    total=5,                              # Maximum number of retries
    read=2,                              # Retries for read errors
    connect=3,                           # Retries for connection errors
    status=3,                            # Retries for HTTP status errors
    status_forcelist=[408, 429, 500, 502, 503, 504, 520, 522, 524],
    backoff_factor=2,                    # Exponential backoff multiplier
    respect_retry_after_header=True,     # Honor server's retry-after header
    raise_on_redirect=False,             # Don't raise on redirects
    raise_on_status=False                # Don't raise on status errors initially
)

# Custom adapter with retry strategy
class RetryAdapter(HTTPAdapter):
    """HTTPAdapter subclass that logs the outcome of each request."""

    def send(self, request, **kwargs):
        """Override send to add custom logging"""
        try:
            response = super().send(request, **kwargs)
            print(f"Request to {request.url} succeeded with status {response.status_code}")
            return response
        except Exception as e:
            print(f"Request to {request.url} failed: {e}")
            raise

# Setup session with advanced retry
session = requests.Session()
adapter = RetryAdapter(max_retries=retry_strategy)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Add custom headers for better success rates
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
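
Because raise_on_status=False is set above, exhausted status retries generally hand back the final error response instead of raising requests.exceptions.RetryError, so check the result explicitly. A minimal usage sketch (the URL is a placeholder):

response = session.get("https://api.example.com/data", timeout=(5, 30))

if response.ok:
    print("Success:", response.status_code)
else:
    # The last failing response is returned once retries are exhausted
    print("Still failing after retries:", response.status_code)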

Custom Retry Decorator

For more control over retry logic, you can implement a custom retry decorator:

import requests
import time
import random
from functools import wraps

def retry_request(max_retries=3, backoff_factor=1, status_codes=None):
    """
    Decorator for retrying requests with exponential backoff

    Args:
        max_retries: Maximum number of retry attempts
        backoff_factor: Multiplier for exponential backoff
        status_codes: List of status codes that should trigger retry
    """
    if status_codes is None:
        status_codes = [408, 429, 500, 502, 503, 504]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    response = func(*args, **kwargs)

                    # Check if status code should trigger retry
                    if response.status_code in status_codes:
                        if attempt < max_retries:
                            wait_time = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
                            print(f"Request failed with status {response.status_code}. "
                                  f"Retrying in {wait_time:.2f} seconds... (Attempt {attempt + 1})")
                            time.sleep(wait_time)
                            continue
                        else:
                            response.raise_for_status()

                    return response

                except requests.exceptions.RequestException as e:
                    if attempt < max_retries:
                        wait_time = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
                        print(f"Request failed: {e}. "
                              f"Retrying in {wait_time:.2f} seconds... (Attempt {attempt + 1})")
                        time.sleep(wait_time)
                    else:
                        raise e

            return response
        return wrapper
    return decorator

# Usage example
@retry_request(max_retries=3, backoff_factor=2, status_codes=[429, 500, 502, 503])
def fetch_data(url):
    response = requests.get(url, timeout=10)
    return response

# Make request with automatic retries
try:
    response = fetch_data("https://api.example.com/data")
    print("Data retrieved successfully:", response.json())
except requests.exceptions.RequestException as e:
    print(f"All retry attempts failed: {e}")

Handling Rate Limiting

When dealing with APIs that implement rate limiting, it's important to respect the Retry-After header:

import requests
import time
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def create_session_with_rate_limit_handling():
    """Create a session that properly handles rate limiting"""

    class RateLimitAdapter(HTTPAdapter):
        def send(self, request, **kwargs):
            response = super().send(request, **kwargs)

            # Handle rate limiting with the Retry-After header
            if response.status_code == 429:
                retry_after = response.headers.get('Retry-After')
                # Retry-After is usually a number of seconds, but it can also be
                # an HTTP-date; see the parsing helper further below
                if retry_after and retry_after.strip().isdigit():
                    wait_time = int(retry_after)
                    print(f"Rate limited. Waiting {wait_time} seconds...")
                    time.sleep(wait_time)
                    # Retry the request once after waiting
                    return super().send(request, **kwargs)

            return response

    # Configure transport-level retries for transient server errors;
    # 429 is deliberately left out so the adapter above handles rate limits
    retry_strategy = Retry(
        total=5,
        status_forcelist=[500, 502, 503, 504],
        backoff_factor=1,
        respect_retry_after_header=True
    )

    session = requests.Session()
    adapter = RateLimitAdapter(max_retries=retry_strategy)
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    return session

# Usage
session = create_session_with_rate_limit_handling()
response = session.get("https://api.example.com/data")
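
Note that Retry-After may be either a number of seconds or an HTTP-date (RFC 7231). A small standard-library helper covers both forms; this is a sketch of our own, and the function name is not part of Requests:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value):
    """Return the number of seconds to wait, or None if the header is missing or invalid."""
    if value is None:
        return None
    if value.strip().isdigit():
        return int(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2025 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    return max(0.0, (retry_at - datetime.now(timezone.utc)).total_seconds())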

Retry with Circuit Breaker Pattern

For production applications, implementing a circuit breaker pattern can prevent cascading failures:

import requests
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = 1
    OPEN = 2
    HALF_OPEN = 3

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
            self.reset()
            return result
        except Exception as e:
            self.record_failure()
            raise e

    def record_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()

        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def reset(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

# Usage with circuit breaker
circuit_breaker = CircuitBreaker(failure_threshold=3, timeout=30)

def make_request_with_circuit_breaker(url):
    def _make_request():
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response

    return circuit_breaker.call(_make_request)
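
A short usage sketch (placeholder URL): once the failure threshold is reached, further calls are rejected immediately until the timeout window passes:

for attempt in range(5):
    try:
        response = make_request_with_circuit_breaker("https://api.example.com/data")
        print("Request succeeded:", response.status_code)
        break
    except Exception as e:
        # Either the request itself failed or the circuit breaker rejected the call
        print(f"Attempt {attempt + 1} failed or was rejected: {e}")
        time.sleep(5)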

Best Practices for Request Retries

1. Implement Exponential Backoff with Jitter

Adding randomness to backoff intervals prevents thundering herd problems:

import random

def exponential_backoff_with_jitter(attempt, base_delay=1, max_delay=60):
    """Calculate delay with exponential backoff and jitter"""
    delay = min(base_delay * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.1)  # Add 10% jitter
    return delay + jitter
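
For example, a plain retry loop can use the helper to space out its attempts; a minimal sketch with a placeholder URL:

import time
import requests

url = "https://api.example.com/data"
for attempt in range(3):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        break
    except requests.exceptions.RequestException:
        if attempt == 2:
            raise  # give up after the final attempt
        time.sleep(exponential_backoff_with_jitter(attempt))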

2. Set Appropriate Timeouts

Always use timeouts to prevent hanging requests:

# Requests has no session-wide timeout setting; assigning session.timeout has no
# effect, so pass a (connect_timeout, read_timeout) tuple on each request
session = requests.Session()
response = session.get("https://api.example.com/data", timeout=(5, 30))
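
If you want a session-wide default without repeating the argument, one common pattern (a sketch of our own, not a built-in Requests feature) is an adapter that injects a default timeout:

import requests
from requests.adapters import HTTPAdapter

class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default timeout when the caller does not pass one."""

    def __init__(self, *args, timeout=(5, 30), **kwargs):
        self.default_timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.send passes timeout=None when the caller omits it
        if kwargs.get("timeout") is None:
            kwargs["timeout"] = self.default_timeout
        return super().send(request, **kwargs)

session = requests.Session()
session.mount("http://", TimeoutHTTPAdapter(timeout=(5, 30)))
session.mount("https://", TimeoutHTTPAdapter(timeout=(5, 30)))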

3. Monitor and Log Retry Attempts

Implement proper logging for debugging and monitoring:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_retry_attempt(url, attempt, total_attempts, error=None):
    if error:
        logger.warning(f"Request to {url} failed (attempt {attempt}/{total_attempts}): {error}")
    else:
        logger.info(f"Retrying request to {url} (attempt {attempt}/{total_attempts})")

Integration with Web Scraping Workflows

When building web scraping applications, retry mechanisms work well with other resilience patterns. Similar to how you might handle timeouts in Puppeteer for browser-based scraping, implementing robust retry logic ensures your HTTP-based scrapers can handle temporary failures gracefully.

For complex scraping workflows that require both HTTP requests and browser automation, you can combine the retry strategies shown here with browser-based approaches for handling errors in Puppeteer to create comprehensive error handling across your entire scraping pipeline.

Conclusion

Implementing automatic retry mechanisms for failed requests is essential for building robust web scraping and API client applications. The urllib3 Retry class provides the most efficient approach for most use cases, while custom decorators and circuit breakers offer additional control for complex scenarios.

Key takeaways for implementing request retries:

  • Use urllib3.util.Retry with requests.Session for built-in retry functionality
  • Implement exponential backoff with jitter to avoid overwhelming servers
  • Respect rate limiting headers and implement proper backoff strategies
  • Add circuit breaker patterns for production applications
  • Always set appropriate timeouts and implement comprehensive logging
  • Test your retry logic thoroughly to ensure it behaves correctly under various failure conditions (see the test sketch below)
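
As an illustration of that last point, here is a minimal test sketch for the custom retry decorator from earlier, using unittest.mock so no real network calls are made (the module name and URL are placeholders of our own):

import unittest
from unittest import mock

import requests

# Assumes the retry_request decorator defined above lives in your own module;
# "retry_utils" is a placeholder name
from retry_utils import retry_request

class RetryLogicTest(unittest.TestCase):
    def test_retries_then_succeeds(self):
        failing = mock.Mock(status_code=503)
        succeeding = mock.Mock(status_code=200)

        # Two failures followed by a success; patch time.sleep to skip real backoff delays
        with mock.patch("requests.get", side_effect=[failing, failing, succeeding]) as fake_get, \
             mock.patch("time.sleep"):

            @retry_request(max_retries=3, backoff_factor=0)
            def fetch(url):
                return requests.get(url, timeout=10)

            response = fetch("https://api.example.com/data")

        self.assertEqual(response.status_code, 200)
        self.assertEqual(fake_get.call_count, 3)

if __name__ == "__main__":
    unittest.main()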

By following these patterns, you can build resilient applications that gracefully handle network failures, temporary server issues, and rate limiting while maintaining good performance and user experience.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
