How can I implement HTTP retry logic for failed requests?

HTTP retry logic is crucial for building resilient web scraping applications that can handle temporary network failures, server errors, and rate limiting. Implementing proper retry mechanisms ensures your applications can recover from transient issues without manual intervention.

Understanding When to Retry Requests

Not all HTTP errors should trigger a retry. Generally, you should retry requests for:

  • 5xx Server Errors (500, 502, 503, 504): Temporary server issues
  • Network timeouts: Connection or read timeouts
  • 429 Too Many Requests: Rate limiting (with proper backoff)
  • 408 Request Timeout: Request took too long

Avoid retrying for:

  • 4xx Client Errors (except 408, 429): Bad request, unauthorized, not found
  • Authentication failures: 401, 403 errors
  • Malformed requests: These won't succeed on retry
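
These rules can be captured in a small, reusable predicate. This is a minimal sketch; the exact status-code set is an assumption you should tune to the APIs you target:

RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}

def should_retry(status_code):
    """Return True when a response status is worth retrying."""
    # Transient server errors, timeouts, and rate limits only;
    # other 4xx responses will not succeed on a retry.
    return status_code in RETRYABLE_STATUS_CODES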

Basic Retry Implementation in Python

Here's a simple retry implementation using Python's requests library:

import time
import random
import requests
from requests.exceptions import Timeout, ConnectionError

def make_request_with_retry(url, max_retries=3, backoff_factor=1):
    """
    Make HTTP request with exponential backoff retry logic
    """
    for attempt in range(max_retries + 1):
        try:
            response = requests.get(url, timeout=10)

            # Check if we should retry based on status code
            if response.status_code in [500, 502, 503, 504, 429]:
                if attempt == max_retries:
                    response.raise_for_status()

                # Calculate delay with exponential backoff
                delay = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
                print(f"Attempt {attempt + 1} failed with status {response.status_code}")
                print(f"Retrying in {delay:.2f} seconds...")
                time.sleep(delay)
                continue

            # Success - return the response
            return response

        except (ConnectionError, Timeout) as e:
            if attempt == max_retries:
                raise e

            delay = backoff_factor * (2 ** attempt) + random.uniform(0, 1)
            print(f"Network error on attempt {attempt + 1}: {e}")
            print(f"Retrying in {delay:.2f} seconds...")
            time.sleep(delay)

    raise Exception(f"Max retries ({max_retries}) exceeded")

# Usage example
try:
    response = make_request_with_retry("https://httpbin.org/status/500")
    print(f"Success: {response.status_code}")
except Exception as e:
    print(f"Failed after all retries: {e}")

Advanced Python Implementation with Decorators

For more reusable retry logic, you can create a decorator:

import functools
import time
import random
from typing import Callable, Tuple

import requests

def retry_http_request(
    max_retries: int = 3,
    backoff_factor: float = 1.0,
    retry_status_codes: Tuple[int, ...] = (429, 500, 502, 503, 504),
    jitter: bool = True
):
    """
    Decorator for HTTP request retry logic with exponential backoff
    """
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None

            for attempt in range(max_retries + 1):
                try:
                    response = func(*args, **kwargs)

                    # Check if response status code requires retry
                    if hasattr(response, 'status_code') and response.status_code in retry_status_codes:
                        if attempt == max_retries:
                            return response  # Return failed response on last attempt

                        delay = calculate_delay(attempt, backoff_factor, jitter)
                        print(f"HTTP {response.status_code} on attempt {attempt + 1}, retrying in {delay:.2f}s")
                        time.sleep(delay)
                        continue

                    return response

                except requests.exceptions.RequestException as e:  # covers ConnectionError and Timeout
                    last_exception = e
                    if attempt == max_retries:
                        raise e

                    delay = calculate_delay(attempt, backoff_factor, jitter)
                    print(f"Request failed on attempt {attempt + 1}: {e}")
                    print(f"Retrying in {delay:.2f} seconds...")
                    time.sleep(delay)

            raise last_exception

        return wrapper
    return decorator

def calculate_delay(attempt: int, backoff_factor: float, jitter: bool) -> float:
    """Calculate delay with exponential backoff and optional jitter"""
    delay = backoff_factor * (2 ** attempt)
    if jitter:
        delay += random.uniform(0, delay * 0.1)  # Add up to 10% jitter
    return delay

# Usage with decorator
@retry_http_request(max_retries=5, backoff_factor=0.5)
def fetch_data(url):
    return requests.get(url, timeout=10)

# Example usage
try:
    response = fetch_data("https://api.example.com/data")
    print(f"Data fetched successfully: {response.status_code}")
except Exception as e:
    print(f"Failed to fetch data: {e}")

JavaScript/Node.js Implementation

Here's how to implement retry logic in JavaScript using async/await:

const axios = require('axios');

class RetryClient {
    constructor(maxRetries = 3, backoffFactor = 1000) {
        this.maxRetries = maxRetries;
        this.backoffFactor = backoffFactor;
        this.retryStatusCodes = [429, 500, 502, 503, 504];
    }

    async makeRequest(url, options = {}) {
        let lastError;

        for (let attempt = 0; attempt <= this.maxRetries; attempt++) {
            try {
                const response = await axios.get(url, {
                    timeout: 10000,
                    // Resolve all statuses so the retry check below runs;
                    // by default axios throws on any non-2xx response.
                    validateStatus: () => true,
                    ...options
                });

                // Check if we should retry based on status code
                if (this.retryStatusCodes.includes(response.status)) {
                    if (attempt === this.maxRetries) {
                        return response; // Return failed response on last attempt
                    }

                    const delay = this.calculateDelay(attempt);
                    console.log(`HTTP ${response.status} on attempt ${attempt + 1}, retrying in ${delay}ms`);
                    await this.sleep(delay);
                    continue;
                }

                return response;

            } catch (error) {
                lastError = error;

                // With validateStatus above, this catch sees only network
                // errors and timeouts; the guard below still protects callers
                // that override validateStatus and get client errors here.
                if (error.response && error.response.status < 500 && error.response.status !== 429) {
                    throw error;
                }

                if (attempt === this.maxRetries) {
                    throw error;
                }

                const delay = this.calculateDelay(attempt);
                console.log(`Request failed on attempt ${attempt + 1}: ${error.message}`);
                console.log(`Retrying in ${delay}ms...`);
                await this.sleep(delay);
            }
        }

        throw lastError;
    }

    calculateDelay(attempt) {
        // Exponential backoff with jitter
        const baseDelay = this.backoffFactor * Math.pow(2, attempt);
        const jitter = Math.random() * baseDelay * 0.1;
        return Math.floor(baseDelay + jitter);
    }

    sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
}

// Usage example
async function fetchWithRetry() {
    const client = new RetryClient(5, 500);

    try {
        const response = await client.makeRequest('https://api.example.com/data');
        console.log('Success:', response.data);
        return response.data;
    } catch (error) {
        console.error('Failed after all retries:', error.message);
        throw error;
    }
}

// Example with specific configuration
fetchWithRetry().catch(console.error);

Using Third-Party Libraries

Python: Using the tenacity library

First, install the library:

pip install tenacity

Then declare the retry policy with tenacity's retry decorator:

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
import requests

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type((requests.exceptions.RequestException, ConnectionError))
)
def fetch_with_tenacity(url):
    response = requests.get(url, timeout=10)
    # Raise exception for HTTP errors that should trigger retry
    if response.status_code in [500, 502, 503, 504, 429]:
        response.raise_for_status()
    return response

# Usage
try:
    response = fetch_with_tenacity("https://api.example.com/data")
    print(f"Success: {response.status_code}")
except Exception as e:
    print(f"Failed: {e}")

JavaScript: Using the axios-retry library

First, install the packages:

npm install axios axios-retry

Then attach the retry interceptor to axios:

const axios = require('axios');
const axiosRetry = require('axios-retry');

// Configure axios-retry
axiosRetry(axios, {
    retries: 5,
    retryDelay: axiosRetry.exponentialDelay,
    retryCondition: (error) => {
        // Retry on network errors, 5xx responses, or 429 rate limits
        return axiosRetry.isNetworkError(error) || 
               axiosRetry.isRetryableError(error) ||
               (error.response && error.response.status === 429);
    }
});

// Usage
async function fetchData() {
    try {
        const response = await axios.get('https://api.example.com/data', {
            timeout: 10000,
            'axios-retry': {
                retries: 3,
                retryDelay: (retryCount) => {
                    return retryCount * 1000; // 1s, 2s, 3s
                }
            }
        });

        console.log('Data fetched:', response.data);
        return response.data;
    } catch (error) {
        console.error('Request failed:', error.message);
        throw error;
    }
}

Implementing Circuit Breaker Pattern

For additional resilience, consider implementing a circuit breaker pattern alongside retries:

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        # When open, reject calls until the cool-down elapses, then let a
        # single trial request through (half-open state)
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time >= self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()

        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage with circuit breaker
circuit_breaker = CircuitBreaker(failure_threshold=3, timeout=30)

def protected_request(url):
    return circuit_breaker.call(make_request_with_retry, url)

Best Practices for HTTP Retry Logic

1. Use Exponential Backoff with Jitter

Always implement exponential backoff to avoid overwhelming servers, and add jitter to prevent the "thundering herd" problem when multiple clients retry simultaneously.
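
The calculate_delay helper above adds up to 10% proportional jitter. A widely used alternative is "full jitter" (popularized by the AWS Architecture Blog), where the delay is drawn uniformly between zero and the exponential cap. A minimal sketch:

import random

def full_jitter_delay(attempt, base=1.0, cap=60.0):
    """Delay drawn uniformly from [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

Full jitter spreads simultaneous retries across the whole backoff window, which is usually more effective at de-synchronizing clients than a small additive jitter.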

2. Set Maximum Retry Limits

Implement both maximum retry attempts and total timeout limits to prevent infinite retry loops.
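
As a sketch of the second limit, the wrapper below enforces an overall time budget with time.monotonic(); the function name and budget values are illustrative, not a standard API:

import time
import requests

def fetch_with_deadline(url, max_retries=5, total_timeout=30.0):
    """Retry with exponential backoff, but never past an overall deadline."""
    deadline = time.monotonic() + total_timeout
    for attempt in range(max_retries + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Retry budget of {total_timeout}s exhausted for {url}")
        try:
            # Cap the per-request timeout at whatever budget remains
            return requests.get(url, timeout=min(10, remaining))
        except requests.exceptions.RequestException:
            if attempt == max_retries:
                raise
            time.sleep(min(2 ** attempt, max(0.0, deadline - time.monotonic())))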

3. Log Retry Attempts

Include comprehensive logging for debugging and monitoring purposes.
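
The examples in this guide print to stdout for brevity; in production code, route retry events through Python's logging module instead so they can be filtered and shipped to your monitoring stack. A minimal sketch:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("http_retry")

def log_retry(url, attempt, delay, reason):
    """Record one retry event with enough context to debug it later."""
    logger.warning("retrying %s (attempt %d) in %.2fs: %s",
                   url, attempt, delay, reason)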

4. Handle Rate Limiting Properly

When encountering 429 (Too Many Requests) responses, respect the Retry-After header if present:

import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def handle_rate_limit(response):
    """Sleep for the duration requested by a 429 response, if any."""
    if response.status_code == 429:
        retry_after = response.headers.get('Retry-After')
        if retry_after:
            try:
                delay = float(retry_after)  # usually delay-seconds...
            except ValueError:
                try:
                    # ...but Retry-After may also be an HTTP date
                    when = parsedate_to_datetime(retry_after)
                    delay = (when - datetime.now(timezone.utc)).total_seconds()
                except (TypeError, ValueError):
                    return False
            if delay > 0:
                print(f"Rate limited. Waiting {delay:.0f} seconds...")
                time.sleep(delay)
            return True
    return False

5. Consider Idempotency

Only retry requests whose repetition is safe. GET, PUT, and DELETE are idempotent by the HTTP specification, but POST generally is not: retrying a POST can duplicate side effects such as orders or payments, so non-idempotent requests need a deduplication mechanism before they are safe to retry.
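
A common deduplication pattern is an idempotency key: the client generates a unique token per logical operation and resends the same token on every retry, letting the server drop duplicates. The Idempotency-Key header below is used by some APIs (Stripe, for example) but is not universal, so treat this as an illustrative sketch:

import uuid
import requests

def post_with_idempotency_key(url, payload, max_retries=3):
    """POST with a client-generated idempotency key so retries are safe."""
    # Generate the key once and reuse it across retries so the server
    # can recognize and discard duplicate submissions.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_retries + 1):
        try:
            return requests.post(url, json=payload, headers=headers, timeout=10)
        except requests.exceptions.RequestException:
            if attempt == max_retries:
                raise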

Integration with Web Scraping Tools

When working with browser automation tools, retry logic becomes even more important. For instance, when handling timeouts in Puppeteer, you might want to implement page-level retry mechanisms alongside your HTTP retry logic.

Similarly, understanding how to handle errors in Puppeteer can help you create more robust scraping applications that gracefully handle both network-level and browser-level failures.

Monitoring and Alerting

Implement monitoring to track retry patterns and failure rates:

from collections import defaultdict

class RetryMetrics:
    def __init__(self):
        self.retry_counts = defaultdict(int)
        self.failure_counts = defaultdict(int)

    def record_retry(self, url, attempt):
        self.retry_counts[url] += 1

    def record_failure(self, url):
        self.failure_counts[url] += 1

    def get_stats(self):
        return {
            'total_retries': sum(self.retry_counts.values()),
            'total_failures': sum(self.failure_counts.values()),
            'retry_by_url': dict(self.retry_counts),
            'failures_by_url': dict(self.failure_counts)
        }

# Usage
metrics = RetryMetrics()
metrics.record_retry("https://api.example.com/data", attempt=1)
print(metrics.get_stats())

Conclusion

Implementing robust HTTP retry logic is essential for building reliable web scraping applications. By combining exponential backoff, jitter, proper error classification, and circuit breaker patterns, you can create resilient systems that handle temporary failures gracefully while avoiding unnecessary load on target servers.

Remember to always respect rate limits, implement proper logging, and monitor your retry patterns to ensure optimal performance and reliability in your web scraping projects.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
