How to Implement Retry Logic for Failed Requests in Rust?
When building robust web scraping or API client applications in Rust, proper retry logic is essential for handling transient network failures, rate limiting, and server errors. This guide covers several approaches to implementing retry mechanisms in Rust, from basic hand-rolled implementations to advanced patterns using popular crates.
Why Implement Retry Logic?
Retry logic is crucial for:
- Network resilience: Handling temporary connectivity issues
- Rate limiting: Respecting API limits with backoff strategies
- Server errors: Recovering from temporary server issues (5xx errors)
- Improved reliability: Reducing application failures due to transient issues
Basic Retry Implementation
Let's start with a simple retry mechanism built from std::time::Duration and Tokio's async sleep:
use std::time::Duration;
use tokio::time::sleep;
async fn retry_with_backoff<F, T, E, Fut>(
mut operation: F,
max_retries: usize,
base_delay: Duration,
) -> Result<T, E>
where
F: FnMut() -> Fut,
Fut: std::future::Future<Output = Result<T, E>>,
{
let mut attempts = 0;
loop {
match operation().await {
Ok(result) => return Ok(result),
Err(err) => {
attempts += 1;
if attempts >= max_retries {
return Err(err);
}
// Exponential backoff
let delay = base_delay * 2_u32.pow(attempts as u32 - 1);
sleep(delay).await;
}
}
}
}
// Example usage: clone the client into each attempt so the returned
// future owns its data instead of borrowing from the closure
async fn fetch_data() -> Result<String, reqwest::Error> {
let client = reqwest::Client::new();
retry_with_backoff(
|| {
let client = client.clone();
async move {
let response = client.get("https://api.example.com/data").send().await?;
response.text().await
}
},
3,
Duration::from_millis(100),
).await
}
Advanced Retry with Custom Error Handling
For more sophisticated retry logic, you can implement conditional retries based on error types:
use reqwest::{Error, StatusCode};
use std::time::Duration;
use tokio::time::sleep;
#[derive(Debug)]
pub enum RetryError {
MaxRetriesExceeded,
NonRetryableError(Error),
}
#[derive(Clone)]
pub struct RetryConfig {
pub max_retries: usize,
pub base_delay: Duration,
pub max_delay: Duration,
pub backoff_multiplier: f64,
}
impl Default for RetryConfig {
fn default() -> Self {
Self {
max_retries: 3,
base_delay: Duration::from_millis(100),
max_delay: Duration::from_secs(30),
backoff_multiplier: 2.0,
}
}
}
pub async fn retry_request<F, Fut>(
operation: F,
config: RetryConfig,
) -> Result<reqwest::Response, RetryError>
where
F: Fn() -> Fut,
Fut: std::future::Future<Output = Result<reqwest::Response, Error>>,
{
let mut attempts = 0;
let mut delay = config.base_delay;
loop {
match operation().await {
Ok(response) => {
// Check if response status indicates we should retry
if should_retry_status(response.status()) {
attempts += 1;
if attempts >= config.max_retries {
return Err(RetryError::MaxRetriesExceeded);
}
} else {
return Ok(response);
}
}
Err(err) => {
if !should_retry_error(&err) {
return Err(RetryError::NonRetryableError(err));
}
attempts += 1;
if attempts >= config.max_retries {
return Err(RetryError::MaxRetriesExceeded);
}
}
}
// Apply exponential backoff, capped at max_delay (add jitter here if desired)
sleep(delay).await;
delay = std::cmp::min(
Duration::from_secs_f64(delay.as_secs_f64() * config.backoff_multiplier),
config.max_delay,
);
}
}
fn should_retry_status(status: StatusCode) -> bool {
matches!(
status.as_u16(),
429 | // Too Many Requests
500..=599 // Server errors
)
}
fn should_retry_error(error: &Error) -> bool {
error.is_timeout() || error.is_connect() || error.is_request()
}
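A minimal usage sketch with the default configuration (the URL is a placeholder):
async fn fetch_status() -> Result<u16, RetryError> {
    let client = reqwest::Client::new();
    let response = retry_request(
        || client.get("https://api.example.com/data").send(),
        RetryConfig::default(),
    )
    .await?;
    Ok(response.status().as_u16())
}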
Using the tokio-retry Crate
For production applications, consider using the tokio-retry crate, which provides robust retry mechanisms.
First, add it to your Cargo.toml:
[dependencies]
tokio-retry = "0.3"
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1.0", features = ["full"] }
Then implement retry logic:
use tokio_retry::{strategy::ExponentialBackoff, Retry};
use reqwest::{Client, Error};
use std::time::Duration;

async fn fetch_with_retry(url: &str) -> Result<String, Error> {
    let client = Client::new();
    let retry_strategy = ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(60))
        .take(5); // Up to 5 retries after the initial attempt

    Retry::spawn(retry_strategy, || {
        // Clone the (cheap, Arc-backed) client and URL into each attempt
        // so the future owns its data instead of borrowing from the closure
        let client = client.clone();
        let url = url.to_string();
        async move {
            let response = client.get(&url).send().await?;
            // Treat 429 and 5xx responses as retryable errors
            if response.status().is_server_error() || response.status() == 429 {
                return Err(response.error_for_status().unwrap_err());
            }
            response.text().await
        }
    })
    .await
}

// Usage example
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = fetch_with_retry("https://api.example.com/data").await?;
    println!("Fetched data: {}", data);
    Ok(())
}
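tokio-retry also ships a jitter helper that randomizes each delay, which helps avoid synchronized retry storms. A small sketch of building the same strategy with jitter applied (the wrapper function is only for illustration); pass the returned iterator to Retry::spawn exactly as before:
use std::time::Duration;
use tokio_retry::strategy::{jitter, ExponentialBackoff};

// Same strategy as above, but each delay is randomized before use
fn jittered_strategy() -> impl Iterator<Item = Duration> {
    ExponentialBackoff::from_millis(100)
        .max_delay(Duration::from_secs(60))
        .map(jitter)
        .take(5)
}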
Circuit Breaker Pattern
For advanced error handling, implement a circuit breaker pattern to prevent cascading failures:
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};
#[derive(Debug, Clone)]
pub enum CircuitState {
Closed,
Open,
HalfOpen,
}
pub struct CircuitBreaker {
state: Arc<Mutex<CircuitState>>,
failure_count: Arc<Mutex<usize>>,
last_failure_time: Arc<Mutex<Option<Instant>>>,
failure_threshold: usize,
timeout: Duration,
}
impl CircuitBreaker {
pub fn new(failure_threshold: usize, timeout: Duration) -> Self {
Self {
state: Arc::new(Mutex::new(CircuitState::Closed)),
failure_count: Arc::new(Mutex::new(0)),
last_failure_time: Arc::new(Mutex::new(None)),
failure_threshold,
timeout,
}
}
pub async fn call<F, T, E>(&self, operation: F) -> Result<T, String>
where
F: FnOnce() -> Result<T, E>,
E: std::fmt::Debug,
{
// Check circuit state
if self.is_open() {
// Circuit is open, fail fast
return Err("Circuit breaker is open".to_string());
}
match operation() {
Ok(result) => {
self.on_success();
Ok(result)
}
Err(_err) => {
self.on_failure();
Err("Operation failed".to_string())
}
}
}
fn is_open(&self) -> bool {
let state = self.state.lock().unwrap();
match *state {
CircuitState::Open => {
// Copy the timestamp out so this lock is released before the state
// lock is re-acquired (avoids a potential lock-order inversion)
let last_failure = *self.last_failure_time.lock().unwrap();
drop(state);
match last_failure {
Some(time) if time.elapsed() > self.timeout => {
// Timeout elapsed: allow a trial request in half-open state
*self.state.lock().unwrap() = CircuitState::HalfOpen;
false
}
Some(_) => true,
None => false,
}
}
_ => false,
}
}
fn on_success(&self) {
let mut state = self.state.lock().unwrap();
let mut failure_count = self.failure_count.lock().unwrap();
*state = CircuitState::Closed;
*failure_count = 0;
}
fn on_failure(&self) {
let mut failure_count = self.failure_count.lock().unwrap();
*failure_count += 1;
if *failure_count >= self.failure_threshold {
let mut state = self.state.lock().unwrap();
let mut last_failure = self.last_failure_time.lock().unwrap();
*state = CircuitState::Open;
*last_failure = Some(Instant::now());
}
}
}
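A brief usage sketch, with a hypothetical check_health operation standing in for the guarded call:
use std::time::Duration;

// Hypothetical guarded operation; it always fails here to show the breaker opening
fn check_health() -> Result<&'static str, String> {
    Err("upstream unavailable".to_string())
}

#[tokio::main]
async fn main() {
    let breaker = CircuitBreaker::new(3, Duration::from_secs(10));
    for attempt in 1..=5 {
        match breaker.call(check_health).await {
            Ok(msg) => println!("attempt {}: ok ({})", attempt, msg),
            // After three failures the breaker opens and later calls fail fast
            Err(err) => println!("attempt {}: {}", attempt, err),
        }
    }
}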
Web Scraping with Retry Logic
When building web scrapers, robust retry mechanisms are essential. Similar to how to handle timeouts in Puppeteer, implementing proper retry logic helps deal with unreliable network conditions:
use reqwest::{Client, header::{HeaderMap, HeaderValue, USER_AGENT}};
use tokio::time::Duration;
use scraper::{Html, Selector};
pub struct WebScraper {
client: Client,
retry_config: RetryConfig,
}
impl WebScraper {
pub fn new() -> Self {
let mut headers = HeaderMap::new();
headers.insert(
USER_AGENT,
HeaderValue::from_static("Mozilla/5.0 (compatible; WebScraper/1.0)")
);
let client = Client::builder()
.default_headers(headers)
.timeout(Duration::from_secs(30))
.build()
.unwrap();
Self {
client,
retry_config: RetryConfig::default(),
}
}
pub async fn scrape_with_retry(&self, url: &str) -> Result<Vec<String>, Box<dyn std::error::Error>> {
let response = retry_request(
|| self.client.get(url).send(),
self.retry_config.clone()
).await
.map_err(|_| "Failed to fetch after retries")?;
let html_content = response.text().await?;
let document = Html::parse_document(&html_content);
let selector = Selector::parse("h1, h2, h3").unwrap();
let headings: Vec<String> = document
.select(&selector)
.map(|element| element.text().collect::<String>())
.collect();
Ok(headings)
}
}
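Using the scraper to pull headings from a page might look like this (the URL is a placeholder):
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let scraper = WebScraper::new();
    let headings = scraper.scrape_with_retry("https://example.com").await?;
    for heading in headings {
        println!("{}", heading);
    }
    Ok(())
}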
Rate Limiting Integration
Combine retry logic with rate limiting for responsible web scraping:
use std::collections::VecDeque;
use std::time::{Duration, Instant};
use tokio::time::sleep;
pub struct RateLimiter {
requests: VecDeque<Instant>,
max_requests: usize,
window: Duration,
}
impl RateLimiter {
pub fn new(max_requests: usize, window: Duration) -> Self {
Self {
requests: VecDeque::new(),
max_requests,
window,
}
}
pub async fn acquire(&mut self) {
let now = Instant::now();
// Remove old requests outside the window
while let Some(&front) = self.requests.front() {
if now.duration_since(front) > self.window {
self.requests.pop_front();
} else {
break;
}
}
// If we're at the limit, wait
if self.requests.len() >= self.max_requests {
if let Some(&front) = self.requests.front() {
let wait_time = self.window.saturating_sub(now.duration_since(front));
if wait_time > Duration::ZERO {
sleep(wait_time).await;
}
}
}
// Record when the request actually proceeds (after any wait)
self.requests.push_back(Instant::now());
}
}
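A sketch of pairing the limiter with the earlier fetch_with_retry helper, assuming an illustrative limit of five requests per second and a caller-supplied URL list:
use std::time::Duration;

async fn crawl(urls: &[&str]) -> Result<(), Box<dyn std::error::Error>> {
    // Allow at most five requests per rolling one-second window
    let mut limiter = RateLimiter::new(5, Duration::from_secs(1));
    for &url in urls {
        limiter.acquire().await;
        let body = fetch_with_retry(url).await?;
        println!("{}: {} bytes", url, body.len());
    }
    Ok(())
}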
Handling Different Error Types
Create a comprehensive error handling system that categorizes different types of failures:
use std::fmt;
#[derive(Debug)]
pub enum ScrapingError {
Network(reqwest::Error),
Parse(String),
RateLimit,
Timeout,
AuthenticationRequired,
NonRetryable(String),
}
impl fmt::Display for ScrapingError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
ScrapingError::Network(e) => write!(f, "Network error: {}", e),
ScrapingError::Parse(e) => write!(f, "Parse error: {}", e),
ScrapingError::RateLimit => write!(f, "Rate limit exceeded"),
ScrapingError::Timeout => write!(f, "Request timeout"),
ScrapingError::AuthenticationRequired => write!(f, "Authentication required"),
ScrapingError::NonRetryable(e) => write!(f, "Non-retryable error: {}", e),
}
}
}
impl std::error::Error for ScrapingError {}
impl ScrapingError {
pub fn is_retryable(&self) -> bool {
matches!(
self,
ScrapingError::Network(_) |
ScrapingError::RateLimit |
ScrapingError::Timeout
)
}
}
// Enhanced retry function with error categorization
pub async fn smart_retry<F, Fut, T>(
operation: F,
config: RetryConfig,
) -> Result<T, ScrapingError>
where
F: Fn() -> Fut,
Fut: std::future::Future<Output = Result<T, ScrapingError>>,
{
let mut attempts = 0;
let mut delay = config.base_delay;
loop {
match operation().await {
Ok(result) => return Ok(result),
Err(err) => {
if !err.is_retryable() {
return Err(err);
}
attempts += 1;
if attempts >= config.max_retries {
return Err(err);
}
// Adjust delay based on error type
let actual_delay = match err {
ScrapingError::RateLimit => delay * 3, // Longer delay for rate limits
_ => delay,
};
tokio::time::sleep(actual_delay).await;
delay = std::cmp::min(
Duration::from_secs_f64(delay.as_secs_f64() * config.backoff_multiplier),
config.max_delay,
);
}
}
}
}
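A sketch of driving smart_retry, mapping reqwest failures into the ScrapingError categories above (the URL is a placeholder):
async fn fetch_page(url: &str) -> Result<String, ScrapingError> {
    let client = reqwest::Client::new();
    smart_retry(
        || {
            let client = client.clone();
            let url = url.to_string();
            async move {
                let response = client
                    .get(&url)
                    .send()
                    .await
                    .map_err(ScrapingError::Network)?;
                // Categorize by status so is_retryable() can make the right call
                match response.status().as_u16() {
                    429 => Err(ScrapingError::RateLimit),
                    401 | 403 => Err(ScrapingError::AuthenticationRequired),
                    _ => response.text().await.map_err(ScrapingError::Network),
                }
            }
        },
        RetryConfig::default(),
    )
    .await
}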
Testing Retry Logic
Create comprehensive tests for your retry mechanisms. The examples below assume the mockito crate (0.x API) is available as a dev-dependency to simulate a flaky server:
#[cfg(test)]
mod tests {
use super::*;
use mockito::mock;
#[tokio::test]
async fn test_retry_on_server_error() {
let _m = mock("GET", "/test")
.with_status(500)
.with_header("content-type", "text/plain")
.with_body("Server Error")
.expect(6) // One initial attempt plus five retries (matches take(5) above)
.create();
let url = &mockito::server_url();
let result = fetch_with_retry(&format!("{}/test", url)).await;
assert!(result.is_err());
}
#[tokio::test]
async fn test_successful_retry() {
let _m1 = mock("GET", "/test")
.with_status(500)
.expect(2)
.create();
let _m2 = mock("GET", "/test")
.with_status(200)
.with_body("Success")
.expect(1)
.create();
let url = &mockito::server_url();
let result = fetch_with_retry(&format!("{}/test", url)).await;
assert!(result.is_ok());
assert_eq!(result.unwrap(), "Success");
}
#[tokio::test]
async fn test_circuit_breaker() {
let cb = CircuitBreaker::new(2, Duration::from_millis(100));
// First failure
let result1 = cb.call(|| -> Result<(), &str> { Err("error") }).await;
assert!(result1.is_err());
// Second failure should open circuit
let result2 = cb.call(|| -> Result<(), &str> { Err("error") }).await;
assert!(result2.is_err());
// Third call should fail fast (circuit open)
let result3 = cb.call(|| -> Result<(), &str> { Ok(()) }).await;
assert!(result3.is_err());
}
}
Best Practices
- Configure appropriate timeouts: Set reasonable timeouts for both individual requests and overall operations
- Use exponential backoff: Implement exponential backoff with jitter to avoid thundering herd problems (a hand-rolled jitter sketch follows this list)
- Limit retry attempts: Set maximum retry limits to prevent infinite loops
- Log retry attempts: Include comprehensive logging for debugging and monitoring
- Handle different error types: Distinguish between retryable and non-retryable errors
- Consider circuit breakers: Use circuit breaker patterns for high-traffic applications
- Test thoroughly: Create comprehensive tests covering various failure scenarios
- Monitor metrics: Track retry rates, success rates, and error patterns
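The tokio-retry strategy shown earlier can apply jitter via its jitter helper; for the hand-rolled helpers, a minimal full-jitter sketch is shown below. It assumes the rand crate as a dependency, and the function name is only illustrative:
use rand::Rng;
use std::time::Duration;

// Full jitter: sleep for a random duration between zero and the exponentially
// growing cap, which spreads out retries from many concurrent clients
fn jittered_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let cap = base
        .checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(max)
        .min(max);
    let millis = rand::thread_rng().gen_range(0..=cap.as_millis() as u64);
    Duration::from_millis(millis)
}
Call it in place of the fixed delay computation in retry_with_backoff or retry_request.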
Console Commands for Testing
Test your retry implementation with these commands:
# Run the retry-related unit tests
cargo test retry
# Run with debug logging
RUST_LOG=debug cargo run
# Benchmark retry performance (assuming a benches/retry_benchmarks.rs target)
cargo bench --bench retry_benchmarks
# Run integration tests (assuming a tests/integration_tests.rs file)
cargo test --test integration_tests
# Check code coverage
cargo tarpaulin --out Html
Production Monitoring
Implement comprehensive monitoring for your retry logic:
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
#[derive(Default)]
pub struct RetryMetrics {
pub total_requests: AtomicU64,
pub successful_requests: AtomicU64,
pub failed_requests: AtomicU64,
pub retry_attempts: AtomicU64,
}
impl RetryMetrics {
pub fn record_request(&self) {
self.total_requests.fetch_add(1, Ordering::Relaxed);
}
pub fn record_success(&self) {
self.successful_requests.fetch_add(1, Ordering::Relaxed);
}
pub fn record_failure(&self) {
self.failed_requests.fetch_add(1, Ordering::Relaxed);
}
pub fn record_retry(&self) {
self.retry_attempts.fetch_add(1, Ordering::Relaxed);
}
pub fn success_rate(&self) -> f64 {
let total = self.total_requests.load(Ordering::Relaxed);
if total == 0 {
return 0.0;
}
let successful = self.successful_requests.load(Ordering::Relaxed);
successful as f64 / total as f64
}
}
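Wiring the counters into the earlier retry_with_backoff helper might look like this sketch (the attempt counter and URL are illustrative):
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

async fn fetch_with_metrics(
    url: &str,
    metrics: Arc<RetryMetrics>,
) -> Result<String, reqwest::Error> {
    metrics.record_request();
    let client = reqwest::Client::new();
    let attempts = AtomicU64::new(0);
    let result = retry_with_backoff(
        || {
            // Every attempt after the first counts as a retry
            if attempts.fetch_add(1, Ordering::Relaxed) > 0 {
                metrics.record_retry();
            }
            let client = client.clone();
            let url = url.to_string();
            async move { client.get(&url).send().await?.text().await }
        },
        3,
        Duration::from_millis(100),
    )
    .await;
    match &result {
        Ok(_) => metrics.record_success(),
        Err(_) => metrics.record_failure(),
    }
    result
}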
Conclusion
Implementing robust retry logic in Rust requires careful consideration of error types, backoff strategies, and rate limiting. Whether using custom implementations or established crates like tokio-retry, the key is to balance resilience with performance while respecting server resources.
The examples provided offer a foundation for building production-ready retry mechanisms that can handle the challenges of modern web scraping and API integration. Much like how to handle errors in Puppeteer, proper error handling and retry logic are essential components of any reliable web automation system.
Remember to always test your retry logic thoroughly, monitor its behavior in production, and adjust your strategies based on real-world performance data. With Rust's type safety and performance characteristics, you can build highly reliable systems that gracefully handle network failures and provide excellent user experiences.