What is the best way to implement retry logic with Reqwest?

When building robust web scraping applications with Reqwest in Rust, implementing proper retry logic is essential for handling network failures, temporary server issues, and rate limiting. This guide covers multiple approaches to implement retry mechanisms that ensure your applications can gracefully recover from transient errors.

Why Retry Logic is Important

Network requests can fail for various reasons:

  • Temporary network connectivity issues
  • Server overload or maintenance
  • Rate limiting responses (HTTP 429)
  • Timeout errors
  • DNS resolution failures

Implementing retry logic helps your applications handle these scenarios gracefully and improves their overall reliability.

Basic Retry Implementation

Here's a simple retry implementation using a loop and exponential backoff:

use reqwest;
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_retry(url: &str, max_retries: u32) -> Result<reqwest::Response, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;

    loop {
        match client.get(url).send().await {
            Ok(response) => {
                if response.status().is_success() {
                    return Ok(response);
                } else if response.status().as_u16() == 429 || response.status().is_server_error() {
                    // Retry on rate limiting or server errors
                    if attempt >= max_retries {
                        return response.error_for_status();
                    }
                } else {
                    // Don't retry on client errors (4xx except 429)
                    return response.error_for_status();
                }
            }
            Err(e) => {
                if attempt >= max_retries {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
        sleep(delay).await;
    }
}

// Usage
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = fetch_with_retry("https://api.example.com/data", 3).await?;
    let body = response.text().await?;
    println!("Response: {}", body);
    Ok(())
}

Advanced Retry with Custom Error Handling

For more sophisticated retry logic, you can create a custom retry function that handles different types of errors differently:

use reqwest::{Client, Response, Error};
use std::error::Error as _; // bring the std Error trait into scope so `source()` is callable
use std::time::Duration;
use tokio::time::sleep;

#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub base_delay_ms: u64,
    pub max_delay_ms: u64,
    pub backoff_multiplier: f64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig {
            max_retries: 3,
            base_delay_ms: 1000,
            max_delay_ms: 30000,
            backoff_multiplier: 2.0,
        }
    }
}

pub async fn retry_request<F, Fut>(
    request_fn: F,
    config: RetryConfig,
) -> Result<Response, Error>
where
    F: Fn() -> Fut,
    Fut: std::future::Future<Output = Result<Response, Error>>,
{
    let mut attempt = 0;

    loop {
        match request_fn().await {
            Ok(response) => {
                let status = response.status();

                // Success cases
                if status.is_success() {
                    return Ok(response);
                }

                // Determine if we should retry based on status code
                let should_retry = match status.as_u16() {
                    429 => true, // Rate limited
                    500..=599 => true, // Server errors
                    408 => true, // Request timeout
                    _ => false, // Don't retry on other client errors
                };

                if !should_retry || attempt >= config.max_retries {
                    // Hand the non-success response back to the caller so it
                    // can inspect the status itself.
                    return Ok(response);
                }
            }
            Err(e) => {
                // Check if error is retryable
                let should_retry = is_retryable_error(&e);

                if !should_retry || attempt >= config.max_retries {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = calculate_delay(attempt, &config);
        sleep(delay).await;
    }
}

fn is_retryable_error(error: &Error) -> bool {
    if error.is_timeout() || error.is_connect() {
        return true;
    }

    // Check for specific error types that warrant retry
    if let Some(source) = error.source() {
        if source.to_string().contains("dns") {
            return true;
        }
    }

    false
}

fn calculate_delay(attempt: u32, config: &RetryConfig) -> Duration {
    let delay_ms = (config.base_delay_ms as f64 * config.backoff_multiplier.powi(attempt as i32 - 1)) as u64;
    let capped_delay = delay_ms.min(config.max_delay_ms);
    Duration::from_millis(capped_delay)
}

// Usage example
async fn scrape_with_retry() -> Result<String, Box<dyn std::error::Error>> {
    let client = Client::new();
    let url = "https://api.example.com/data";

    let config = RetryConfig {
        max_retries: 5,
        base_delay_ms: 500,
        max_delay_ms: 10000,
        backoff_multiplier: 1.5,
    };

    let response = retry_request(
        || client.get(url).send(),
        config,
    ).await?;

    let body = response.text().await?;
    Ok(body)
}

Using External Retry Crates

For production applications, consider using dedicated retry crates that provide more features:

Using the tokio-retry crate

First, add the dependency to your Cargo.toml:

[dependencies]
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1.0", features = ["full"] }
tokio-retry = "0.3"

Then implement retry logic:

use reqwest::Client;
use tokio_retry::{strategy::ExponentialBackoff, Retry};
use std::time::Duration;

async fn fetch_with_tokio_retry(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let client = Client::new();

    // tokio-retry's ExponentialBackoff grows as base^attempt milliseconds
    // (here 100 ms, then 10 s), capped by max_delay.
    let retry_strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .take(5); // At most 5 retries after the initial attempt

    let result = Retry::spawn(retry_strategy, || async {
        let response = client.get(url).send().await?;

        // Check if we should retry based on status
        if response.status().is_server_error() || response.status() == 429 {
            return Err(format!("Server error: {}", response.status()).into());
        }

        let text = response.text().await?;
        Ok::<String, Box<dyn std::error::Error>>(text)
    }).await?;

    Ok(result)
}
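
tokio-retry also ships a jitter helper that can be mapped over a backoff strategy. Below is a minimal sketch of building such a strategy with the same dependencies as above; the resulting iterator can be passed to Retry::spawn exactly like the strategy in the previous example:

use std::time::Duration;
use tokio_retry::strategy::{jitter, ExponentialBackoff};

// Same backoff as before, but each delay is randomized by tokio-retry's
// jitter helper so concurrent clients don't retry in lockstep.
fn jittered_strategy() -> impl Iterator<Item = Duration> {
    ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .map(jitter)
        .take(5)
}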

Retry with Rate Limiting Awareness

When dealing with APIs that implement rate limiting, it's important to respect Retry-After headers:

use reqwest::{Client, Response};
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_rate_limit_handling(
    url: &str,
    max_retries: u32,
) -> Result<Response, reqwest::Error> {
    let client = Client::new();
    let mut attempt = 0;

    loop {
        let response = client.get(url).send().await?;

        if response.status().is_success() {
            return Ok(response);
        }

        if response.status() == 429 {
            if attempt >= max_retries {
                return Ok(response);
            }

            // Check for Retry-After header
            let delay = if let Some(retry_after) = response.headers().get("retry-after") {
                if let Ok(seconds) = retry_after.to_str().unwrap_or("").parse::<u64>() {
                    Duration::from_secs(seconds)
                } else {
                    Duration::from_secs(60) // Default fallback
                }
            } else {
                Duration::from_secs(60)
            };

            attempt += 1;
            sleep(delay).await;
            continue;
        }

        // For other errors, use exponential backoff
        if response.status().is_server_error() && attempt < max_retries {
            attempt += 1;
            let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
            sleep(delay).await;
            continue;
        }

        return Ok(response);
    }
}
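
Note that Retry-After can also carry an HTTP-date instead of a number of seconds. Here is a sketch of handling both forms, assuming the chrono crate as an extra dependency (the helper name is illustrative):

use chrono::{DateTime, Utc};
use std::time::Duration;

// Parses a Retry-After value that is either delta-seconds ("120") or an
// HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT"), falling back to a default.
fn parse_retry_after(value: &str, default: Duration) -> Duration {
    if let Ok(seconds) = value.parse::<u64>() {
        return Duration::from_secs(seconds);
    }
    if let Ok(date) = DateTime::parse_from_rfc2822(value) {
        let until = date.with_timezone(&Utc) - Utc::now();
        return until.to_std().unwrap_or(default);
    }
    default
}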

Integration with Web Scraping Workflows

When implementing retry logic in web scraping applications, much as when handling timeouts in Puppeteer, you need to consider the broader context of your scraping workflow:

use reqwest::Client;
use serde_json::Value;
use std::time::Duration;
use tokio::time::sleep;

// Reuses RetryConfig and retry_request from the "Advanced Retry with Custom
// Error Handling" section above.

pub struct WebScraper {
    client: Client,
    retry_config: RetryConfig,
}

impl WebScraper {
    pub fn new() -> Self {
        let client = Client::builder()
            .timeout(Duration::from_secs(30))
            .user_agent("Mozilla/5.0 (compatible; WebScraper/1.0)")
            .build()
            .expect("Failed to create HTTP client");

        WebScraper {
            client,
            retry_config: RetryConfig::default(),
        }
    }

    pub async fn scrape_json_data(&self, url: &str) -> Result<Value, Box<dyn std::error::Error>> {
        let response = retry_request(
            || self.client.get(url).send(),
            self.retry_config.clone(),
        ).await?;

        let json: Value = response.json().await?;
        Ok(json)
    }

    pub async fn scrape_multiple_pages(&self, urls: Vec<&str>) -> Vec<Result<String, Box<dyn std::error::Error>>> {
        let mut results = Vec::new();

        for url in urls {
            let result = async {
                let response = retry_request(
                    || self.client.get(url).send(),
                    self.retry_config.clone(),
                ).await?;

                let body = response.text().await?;
                Ok::<String, Box<dyn std::error::Error>>(body)
            }.await;

            results.push(result);

            // Add delay between requests to be respectful
            sleep(Duration::from_millis(500)).await;
        }

        results
    }
}
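
A usage sketch tying the pieces together (the URLs below are placeholders):

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let scraper = WebScraper::new();

    // A single JSON endpoint, fetched with the retry logic applied.
    let data = scraper.scrape_json_data("https://api.example.com/data").await?;
    println!("Fetched JSON: {}", data);

    // Several pages in sequence, each with its own retry handling
    // and a polite delay in between.
    let results = scraper
        .scrape_multiple_pages(vec![
            "https://example.com/page/1",
            "https://example.com/page/2",
        ])
        .await;
    println!("Fetched {} pages", results.len());

    Ok(())
}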

Best Practices for Retry Logic

  1. Use Exponential Backoff: Gradually increase delays between retries to avoid overwhelming servers
  2. Set Maximum Retry Limits: Prevent infinite retry loops that could waste resources
  3. Respect Rate Limiting: Honor Retry-After headers and implement appropriate delays
  4. Log Retry Attempts: Keep track of retry attempts for debugging and monitoring
  5. Handle Different Error Types: Distinguish between retryable and non-retryable errors
  6. Use Jitter: Add randomness to delays to prevent thundering herd problems (a sketch follows this list)
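
One way to add jitter (point 6) is to scale each computed delay by a small random factor. The sketch below is a hypothetical variant of the calculate_delay helper from earlier, assuming the rand crate as an extra dependency; the +/-20% range is an arbitrary choice:

use rand::Rng;
use std::time::Duration;

// Exponential backoff as before, but multiplied by a random factor in
// [0.8, 1.2) so many clients don't all retry at exactly the same moment.
fn calculate_delay_with_jitter(attempt: u32, config: &RetryConfig) -> Duration {
    let base = config.base_delay_ms as f64
        * config.backoff_multiplier.powi(attempt.saturating_sub(1) as i32);
    let capped = base.min(config.max_delay_ms as f64);
    let factor = rand::thread_rng().gen_range(0.8..1.2);
    Duration::from_millis((capped * factor) as u64)
}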

As with handling errors in Puppeteer, proper error handling and retry mechanisms are crucial for building reliable web scraping applications.

Monitoring and Logging

Implement comprehensive logging for your retry logic:

use log::{info, warn, error};
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_logging(url: &str, max_retries: u32) -> Result<reqwest::Response, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;

    loop {
        info!("Attempting request to {} (attempt {})", url, attempt + 1);

        match client.get(url).send().await {
            Ok(response) => {
                if response.status().is_success() {
                    info!("Successfully fetched {} after {} attempts", url, attempt + 1);
                    return Ok(response);
                } else {
                    warn!("Request to {} failed with status: {}", url, response.status());
                    if attempt >= max_retries {
                        error!("Max retries exceeded for {}", url);
                        return Ok(response);
                    }
                }
            }
            Err(e) => {
                warn!("Request to {} failed with error: {}", url, e);
                if attempt >= max_retries {
                    error!("Max retries exceeded for {} due to error: {}", url, e);
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
        info!("Retrying {} in {:?}", url, delay);
        sleep(delay).await;
    }
}
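
These macros only produce output when a logger backend is installed. As a sketch, assuming the env_logger crate (any log-compatible backend works), you would initialize it once at startup:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the RUST_LOG environment variable, e.g. RUST_LOG=info.
    env_logger::init();

    let response = fetch_with_logging("https://api.example.com/data", 3).await?;
    println!("Final status: {}", response.status());
    Ok(())
}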

Conclusion

Implementing robust retry logic with Reqwest is essential for building reliable web scraping applications. Whether you choose to implement custom retry mechanisms or use external crates like tokio-retry, the key is to handle different types of errors appropriately, respect server limitations, and implement proper backoff strategies. By following these patterns and best practices, your Rust applications will be better equipped to handle the challenges of real-world web scraping scenarios.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
