What is the best way to implement retry logic with Reqwest?
When building robust web scraping applications with Reqwest in Rust, implementing proper retry logic is essential for handling network failures, temporary server issues, and rate limiting. This guide covers multiple approaches to implement retry mechanisms that ensure your applications can gracefully recover from transient errors.
Why Retry Logic is Important
Network requests can fail for various reasons:

- Temporary network connectivity issues
- Server overload or maintenance
- Rate limiting responses (HTTP 429)
- Timeout errors
- DNS resolution failures
Implementing retry logic helps your applications handle these scenarios gracefully and improves overall reliability.
Basic Retry Implementation
Here's a simple retry implementation using a loop and exponential backoff:
```rust
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_retry(url: &str, max_retries: u32) -> Result<reqwest::Response, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;

    loop {
        match client.get(url).send().await {
            Ok(response) => {
                if response.status().is_success() {
                    return Ok(response);
                } else if response.status().as_u16() == 429 || response.status().is_server_error() {
                    // Retry on rate limiting or server errors
                    if attempt >= max_retries {
                        return response.error_for_status();
                    }
                } else {
                    // Don't retry on other statuses (4xx client errors besides 429);
                    // error_for_status() turns them into an Err without panicking
                    return response.error_for_status();
                }
            }
            Err(e) => {
                if attempt >= max_retries {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
        sleep(delay).await;
    }
}
```
```rust
// Usage
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = fetch_with_retry("https://api.example.com/data", 3).await?;
    let body = response.text().await?;
    println!("Response: {}", body);
    Ok(())
}
```
Advanced Retry with Custom Error Handling
For more sophisticated retry logic, you can create a custom retry function that handles different types of errors differently:
```rust
use reqwest::{Client, Error, Response};
use std::error::Error as StdError; // brings Error::source() into scope
use std::time::Duration;
use tokio::time::sleep;

#[derive(Debug, Clone)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub base_delay_ms: u64,
    pub max_delay_ms: u64,
    pub backoff_multiplier: f64,
}

impl Default for RetryConfig {
    fn default() -> Self {
        RetryConfig {
            max_retries: 3,
            base_delay_ms: 1000,
            max_delay_ms: 30000,
            backoff_multiplier: 2.0,
        }
    }
}

pub async fn retry_request<F, Fut>(
    request_fn: F,
    config: RetryConfig,
) -> Result<Response, Error>
where
    F: Fn() -> Fut,
    Fut: std::future::Future<Output = Result<Response, Error>>,
{
    let mut attempt = 0;

    loop {
        match request_fn().await {
            Ok(response) => {
                let status = response.status();

                // Success case
                if status.is_success() {
                    return Ok(response);
                }

                // Determine if we should retry based on status code
                let should_retry = match status.as_u16() {
                    429 => true,       // Rate limited
                    500..=599 => true, // Server errors
                    408 => true,       // Request timeout
                    _ => false,        // Don't retry on other client errors
                };

                if !should_retry || attempt >= config.max_retries {
                    // Hand the last response back; the caller decides how to treat it
                    return Ok(response);
                }
            }
            Err(e) => {
                // Check if the error is retryable
                if !is_retryable_error(&e) || attempt >= config.max_retries {
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = calculate_delay(attempt, &config);
        sleep(delay).await;
    }
}

fn is_retryable_error(error: &Error) -> bool {
    if error.is_timeout() || error.is_connect() {
        return true;
    }
    // Walk the error chain for DNS-related failures
    if let Some(source) = error.source() {
        if source.to_string().contains("dns") {
            return true;
        }
    }
    false
}

fn calculate_delay(attempt: u32, config: &RetryConfig) -> Duration {
    let delay_ms = (config.base_delay_ms as f64
        * config.backoff_multiplier.powi(attempt as i32 - 1)) as u64;
    Duration::from_millis(delay_ms.min(config.max_delay_ms))
}

// Usage example
async fn scrape_with_retry() -> Result<String, Box<dyn std::error::Error>> {
    let client = Client::new();
    let url = "https://api.example.com/data";

    let config = RetryConfig {
        max_retries: 5,
        base_delay_ms: 500,
        max_delay_ms: 10000,
        backoff_multiplier: 1.5,
    };

    let response = retry_request(|| client.get(url).send(), config).await?;
    let body = response.text().await?;
    Ok(body)
}
```
Using External Retry Crates
For production applications, consider using dedicated retry crates that provide more features:
Using the tokio-retry crate
First, add the dependency to your Cargo.toml:
```toml
[dependencies]
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1.0", features = ["full"] }
tokio-retry = "0.3"
```
Then implement retry logic:
```rust
use reqwest::Client;
use std::time::Duration;
use tokio_retry::{strategy::ExponentialBackoff, Retry};

async fn fetch_with_tokio_retry(url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let client = Client::new();

    let retry_strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(10))
        .take(5); // at most 5 retries after the initial attempt

    let result = Retry::spawn(retry_strategy, || async {
        let response = client.get(url).send().await?;

        // tokio-retry retries whenever the future resolves to Err,
        // so map retryable statuses to errors here
        if response.status().is_server_error() || response.status() == 429 {
            return Err(format!("Server error: {}", response.status()).into());
        }

        let text = response.text().await?;
        Ok::<String, Box<dyn std::error::Error>>(text)
    })
    .await?;

    Ok(result)
}
```
Retry with Rate Limiting Awareness
When dealing with APIs that implement rate limiting, it's important to respect Retry-After headers:
```rust
use reqwest::{Client, Response};
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_rate_limit_handling(
    url: &str,
    max_retries: u32,
) -> Result<Response, reqwest::Error> {
    let client = Client::new();
    let mut attempt = 0;

    loop {
        // Note: network errors propagate immediately here via `?`;
        // only HTTP status codes are retried in this example
        let response = client.get(url).send().await?;

        if response.status().is_success() {
            return Ok(response);
        }

        if response.status() == 429 {
            if attempt >= max_retries {
                return Ok(response);
            }

            // Check for a Retry-After header
            let delay = if let Some(retry_after) = response.headers().get("retry-after") {
                if let Ok(seconds) = retry_after.to_str().unwrap_or("").parse::<u64>() {
                    Duration::from_secs(seconds)
                } else {
                    Duration::from_secs(60) // Default fallback
                }
            } else {
                Duration::from_secs(60)
            };

            attempt += 1;
            sleep(delay).await;
            continue;
        }

        // For server errors, use exponential backoff
        if response.status().is_server_error() && attempt < max_retries {
            attempt += 1;
            let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
            sleep(delay).await;
            continue;
        }

        return Ok(response);
    }
}
```
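The example above only handles the delta-seconds form of Retry-After. Per RFC 9110 the header may instead carry an HTTP-date; parsing dates would need a date library, so a minimal stdlib-only helper (the name `parse_retry_after` is illustrative, not a library function) can simply fall back to a default delay when the value is not a plain number:

```rust
use std::time::Duration;

/// Parse the delta-seconds form of `Retry-After` (e.g. "120").
/// The header may also be an HTTP-date (RFC 9110); handling that
/// requires a date-parsing crate, so this sketch falls back to a
/// caller-supplied default instead.
fn parse_retry_after(value: &str, fallback: Duration) -> Duration {
    value
        .trim()
        .parse::<u64>()
        .map(Duration::from_secs)
        .unwrap_or(fallback)
}
```

This keeps the retry loop simple while degrading gracefully for header values it cannot interpret.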
Integration with Web Scraping Workflows
When implementing retry logic in web scraping applications, much as when handling timeouts in Puppeteer, you need to consider the broader context of your scraping workflow:
```rust
use reqwest::Client;
use serde_json::Value;
use std::time::Duration;
use tokio::time::sleep;

// Uses RetryConfig and retry_request from the earlier section;
// RetryConfig must derive Clone for the calls below.
pub struct WebScraper {
    client: Client,
    retry_config: RetryConfig,
}

impl WebScraper {
    pub fn new() -> Self {
        let client = Client::builder()
            .timeout(Duration::from_secs(30))
            .user_agent("Mozilla/5.0 (compatible; WebScraper/1.0)")
            .build()
            .expect("Failed to create HTTP client");

        WebScraper {
            client,
            retry_config: RetryConfig::default(),
        }
    }

    pub async fn scrape_json_data(&self, url: &str) -> Result<Value, Box<dyn std::error::Error>> {
        let response = retry_request(
            || self.client.get(url).send(),
            self.retry_config.clone(),
        )
        .await?;

        let json: Value = response.json().await?;
        Ok(json)
    }

    pub async fn scrape_multiple_pages(&self, urls: Vec<&str>) -> Vec<Result<String, Box<dyn std::error::Error>>> {
        let mut results = Vec::new();

        for url in urls {
            let result = async {
                let response = retry_request(
                    || self.client.get(url).send(),
                    self.retry_config.clone(),
                )
                .await?;
                let body = response.text().await?;
                Ok::<String, Box<dyn std::error::Error>>(body)
            }
            .await;

            results.push(result);

            // Add a delay between requests to be respectful
            sleep(Duration::from_millis(500)).await;
        }

        results
    }
}
```
Best Practices for Retry Logic
- Use Exponential Backoff: Gradually increase delays between retries to avoid overwhelming servers
- Set Maximum Retry Limits: Prevent infinite retry loops that could waste resources
- Respect Rate Limiting: Honor Retry-After headers and implement appropriate delays
- Log Retry Attempts: Keep track of retry attempts for debugging and monitoring
- Handle Different Error Types: Distinguish between retryable and non-retryable errors
- Use Jitter: Add randomness to delays to prevent thundering herd problems
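The jitter point deserves a quick sketch. One common scheme is "full jitter": instead of sleeping exactly base * 2^attempt, sleep a random amount between zero and that (capped) ceiling, so simultaneous clients don't all retry at the same instant. The helper below is illustrative, not from any library; it uses the subsecond clock as a stand-in for a real RNG to stay dependency-free, whereas production code would typically use the rand crate:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Full-jitter backoff: a pseudo-random delay in [0, min(cap, base * 2^attempt)].
/// The subsecond clock is a cheap entropy source used here only to keep the
/// sketch dependency-free; real code would draw from a proper RNG.
fn backoff_with_full_jitter(attempt: u32, base_ms: u64, cap_ms: u64) -> Duration {
    // Clamp the shift so the multiplication cannot overflow for large attempts
    let ceiling = base_ms
        .saturating_mul(1u64 << attempt.min(20))
        .min(cap_ms);
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as u64)
        .unwrap_or(0);
    Duration::from_millis(nanos % (ceiling + 1))
}
```

Spreading retries across the whole backoff window keeps a burst of failing clients from hammering a recovering server in lockstep.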
As with handling errors in Puppeteer, proper error handling and retry mechanisms are crucial for building reliable web scraping applications.
Monitoring and Logging
Implement comprehensive logging for your retry logic:
```rust
use log::{error, info, warn};
use std::time::Duration;
use tokio::time::sleep;

async fn fetch_with_logging(url: &str, max_retries: u32) -> Result<reqwest::Response, reqwest::Error> {
    let client = reqwest::Client::new();
    let mut attempt = 0;

    loop {
        info!("Attempting request to {} (attempt {})", url, attempt + 1);

        match client.get(url).send().await {
            Ok(response) => {
                if response.status().is_success() {
                    info!("Successfully fetched {} after {} attempts", url, attempt + 1);
                    return Ok(response);
                } else {
                    warn!("Request to {} failed with status: {}", url, response.status());
                    if attempt >= max_retries {
                        error!("Max retries exceeded for {}", url);
                        return Ok(response);
                    }
                }
            }
            Err(e) => {
                warn!("Request to {} failed with error: {}", url, e);
                if attempt >= max_retries {
                    error!("Max retries exceeded for {} due to error: {}", url, e);
                    return Err(e);
                }
            }
        }

        attempt += 1;
        let delay = Duration::from_millis(1000 * 2_u64.pow(attempt));
        info!("Retrying {} in {:?}", url, delay);
        sleep(delay).await;
    }
}
```
Conclusion
Implementing robust retry logic with Reqwest is essential for building reliable web scraping applications. Whether you implement custom retry mechanisms or use external crates like tokio-retry, the key is to handle different types of errors appropriately, respect server limitations, and implement proper backoff strategies. By following these patterns and best practices, your Rust applications will be better equipped to handle the challenges of real-world web scraping scenarios.