Yes, Reqwest is Thread-Safe
The reqwest library is fully thread-safe for concurrent scraping tasks. As one of Rust's most popular HTTP clients, it is designed to handle concurrent requests safely through Rust's ownership system and its Send and Sync guarantees.
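A quick way to see this is a compile-time check: both client types implement Send and Sync, so the sketch below compiles (and would fail to compile if they did not). The assert_send_sync helper is just an illustrative name, not part of reqwest, and the blocking client assumes the blocking cargo feature is enabled.
use reqwest::Client;

// Generic helper that only compiles when T is Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    // Both the async and blocking clients satisfy the bound,
    // so they can be shared freely across threads and tasks.
    assert_send_sync::<Client>();
    assert_send_sync::<reqwest::blocking::Client>();
}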
Key Thread Safety Features
1. Client Sharing
A reqwest::Client can be safely shared across multiple threads, for example by wrapping it in Arc<T> (atomic reference counting):
use reqwest::blocking::Client;
use std::sync::Arc;

let client = Arc::new(Client::new());
// The Arc<Client> can now be cloned and moved into worker threads
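Strictly speaking, the Arc is optional: reqwest's Client already keeps its connection pool behind an internal Arc, so calling .clone() is cheap and every clone shares the same pool (the same applies to the blocking client). A minimal sketch, assuming the blocking cargo feature is enabled and using example.com as a placeholder URL:
use reqwest::blocking::Client;
use std::thread;

fn main() {
    let client = Client::new();

    let handles: Vec<_> = (0..4)
        .map(|i| {
            // Each clone shares the same underlying connection pool.
            let client = client.clone();
            thread::spawn(move || {
                let resp = client.get("https://example.com").send();
                println!("worker {i}: {:?}", resp.map(|r| r.status()));
            })
        })
        .collect();

    for handle in handles {
        let _ = handle.join();
    }
}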
2. Internal Implementation
- Uses a connection pool guarded by thread-safe synchronization
- Built on hyper, an async HTTP implementation designed for concurrent use
- Leverages Rust's type system (Send/Sync) to prevent data races at compile time
3. Zero-Cost Abstractions
Because Send and Sync are enforced at compile time, these thread-safety guarantees add essentially no runtime overhead.
Concurrent Scraping Examples
Blocking Client with Threads
use reqwest::blocking::Client;
use std::sync::Arc;
use std::thread;
use std::time::Duration;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create a shared client with custom configuration
let client = Arc::new(
Client::builder()
.timeout(Duration::from_secs(10))
.user_agent("Mozilla/5.0 (compatible; RustScraper/1.0)")
.build()?
);
let urls = vec![
"https://httpbin.org/json",
"https://httpbin.org/html",
"https://httpbin.org/xml",
"https://httpbin.org/robots.txt",
];
let mut handles = vec![];
for (i, url) in urls.into_iter().enumerate() {
let client = Arc::clone(&client);
let handle = thread::spawn(move || {
println!("Thread {} started for {}", i, url);
match client.get(url).send() {
Ok(response) => {
let status = response.status();
let content_length = response.content_length().unwrap_or(0);
println!("Thread {}: {} - Status: {}, Size: {} bytes",
i, url, status, content_length);
// Process response body if needed
if let Ok(text) = response.text() {
println!("Thread {}: Got {} characters", i, text.len());
}
}
Err(e) => eprintln!("Thread {}: Error - {}", i, e),
}
});
handles.push(handle);
}
// Wait for all threads to complete
for handle in handles {
if let Err(e) = handle.join() {
eprintln!("Thread panicked: {:?}", e);
}
}
Ok(())
}
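If you prefer not to clone an Arc for each worker, scoped threads (std::thread::scope, stable since Rust 1.63) let every thread borrow the same client directly, since the threads are joined before the scope returns. A minimal sketch with placeholder httpbin URLs:
use reqwest::blocking::Client;
use std::thread;

fn main() {
    let client = Client::new();
    let urls = ["https://httpbin.org/json", "https://httpbin.org/html"];

    // Scoped threads may borrow `client`; they are all joined before scope() returns.
    thread::scope(|s| {
        let client = &client;
        for url in urls {
            s.spawn(move || match client.get(url).send() {
                Ok(resp) => println!("{} -> {}", url, resp.status()),
                Err(e) => eprintln!("{} failed: {}", url, e),
            });
        }
    });
}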
Async Client with Tokio
use reqwest::Client;
use tokio;
use std::time::Duration;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create async client with configuration
let client = Client::builder()
.timeout(Duration::from_secs(10))
.pool_max_idle_per_host(10)
.build()?;
let urls = vec![
"https://httpbin.org/delay/1",
"https://httpbin.org/delay/2",
"https://httpbin.org/delay/3",
"https://httpbin.org/json",
];
// Spawn concurrent tasks
let tasks: Vec<_> = urls.into_iter().enumerate().map(|(i, url)| {
let client = client.clone(); // Cheap clone: all clones share one connection pool
tokio::spawn(async move {
println!("Task {} started for {}", i, url);
match client.get(url).send().await {
Ok(response) => {
let status = response.status();
println!("Task {}: {} - Status: {}", i, url, status);
// Parse JSON response example
if status.is_success() {
if let Ok(json) = response.json::<serde_json::Value>().await {
println!("Task {}: JSON keys: {:?}", i,
json.as_object().map(|o| o.keys().collect::<Vec<_>>()));
}
}
}
Err(e) => eprintln!("Task {}: Error - {}", i, e),
}
})
}).collect();
// Wait for all tasks to complete
for task in tasks {
if let Err(e) = task.await {
eprintln!("Task failed: {}", e);
}
}
Ok(())
}
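When the URL list is large, spawning one task per URL can overwhelm both your process and the target site. One common way to cap concurrency is buffer_unordered from the futures crate; the sketch below assumes futures is added as a dependency, and the limit of 5 in-flight requests is an arbitrary choice.
use futures::stream::{self, StreamExt};
use reqwest::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();
    let urls: Vec<String> = (1..=20)
        .map(|i| format!("https://httpbin.org/anything/{}", i))
        .collect();

    // Turn the URL list into a stream of futures and poll at most 5 at a time.
    let results: Vec<_> = stream::iter(urls)
        .map(|url| {
            let client = client.clone();
            async move {
                let status = client.get(&url).send().await.map(|r| r.status());
                (url, status)
            }
        })
        .buffer_unordered(5)
        .collect()
        .await;

    for (url, status) in results {
        println!("{} -> {:?}", url, status);
    }
    Ok(())
}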
Real-World Scraping Example
use reqwest::Client;
use tokio;
use serde::Deserialize;
#[derive(Deserialize, Debug)]
struct ApiResponse {
    // Optional fields tolerate responses that omit these keys.
    title: Option<String>,
    status: Option<String>,
}
async fn scrape_with_rate_limiting() -> Result<(), Box<dyn std::error::Error>> {
let client = Client::builder()
.timeout(std::time::Duration::from_secs(30))
.build()?;
let urls = (1..=10).map(|i| format!("https://httpbin.org/json?page={}", i));
// Process in batches to avoid overwhelming the server
for batch in urls.collect::<Vec<_>>().chunks(3) {
let tasks: Vec<_> = batch.iter().map(|url| {
let client = client.clone();
let url = url.clone();
tokio::spawn(async move {
// Add delay to be respectful
tokio::time::sleep(std::time::Duration::from_millis(100)).await;
let response = client
.get(&url)
.header("Accept", "application/json")
.send()
.await?;
let data: ApiResponse = response.json().await?;
Ok::<_, Box<dyn std::error::Error + Send + Sync>>((url, data))
})
}).collect();
// Await current batch
for task in tasks {
match task.await? {
Ok((url, data)) => println!("✓ {}: {:?}", url, data),
Err(e) => eprintln!("✗ Error: {}", e),
}
}
// Pause between batches
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
}
Ok(())
}
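The function above is not wired to an entry point; one minimal way to run it is:
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    scrape_with_rate_limiting().await
}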
Best Practices for Concurrent Scraping
1. Use Connection Pooling
let client = Client::builder()
.pool_max_idle_per_host(10) // Reuse connections
.pool_idle_timeout(Duration::from_secs(30))
.build()?;
2. Configure Timeouts
let client = Client::builder()
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(5))
.build()?;
3. Implement Rate Limiting
use tokio::time::{sleep, Duration};
// Add delays between requests
sleep(Duration::from_millis(100)).await;
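Fixed sleeps only space out requests within a single task; to bound how many requests run at the same time across tasks, a tokio::sync::Semaphore works well. A sketch (the fetch_all name and the limit of 5 permits are arbitrary choices):
use reqwest::Client;
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn fetch_all(client: &Client, urls: Vec<String>) {
    // At most 5 requests are in flight at any moment.
    let semaphore = Arc::new(Semaphore::new(5));
    let mut handles = Vec::new();

    for url in urls {
        let client = client.clone();
        let semaphore = Arc::clone(&semaphore);
        handles.push(tokio::spawn(async move {
            // The permit is held for the duration of the request.
            let _permit = semaphore.acquire_owned().await.expect("semaphore closed");
            match client.get(&url).send().await {
                Ok(resp) => println!("{} -> {}", url, resp.status()),
                Err(e) => eprintln!("{} failed: {}", url, e),
            }
        }));
    }

    for handle in handles {
        let _ = handle.await;
    }
}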
4. Handle Errors Gracefully
match client.get(url).send().await {
Ok(response) if response.status().is_success() => {
// Process successful response
}
Ok(response) => {
eprintln!("HTTP error: {}", response.status());
}
Err(e) if e.is_timeout() => {
eprintln!("Request timed out: {}", e);
}
Err(e) => {
eprintln!("Request failed: {}", e);
}
}
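Building on that match, transient failures such as timeouts are often worth retrying with a short backoff. A hedged sketch (the get_with_retry helper name, the 3 attempts, and the 250 ms backoff step are arbitrary choices):
use reqwest::{Client, Response};
use std::time::Duration;

async fn get_with_retry(client: &Client, url: &str) -> Result<Response, reqwest::Error> {
    let mut last_err = None;
    for attempt in 1..=3u64 {
        match client.get(url).send().await {
            Ok(resp) => return Ok(resp),
            // Retry only timeouts and connection errors; other errors bubble up immediately.
            Err(e) if e.is_timeout() || e.is_connect() => {
                eprintln!("attempt {} failed: {}", attempt, e);
                last_err = Some(e);
                tokio::time::sleep(Duration::from_millis(250 * attempt)).await;
            }
            Err(e) => return Err(e),
        }
    }
    Err(last_err.expect("at least one retryable error occurred"))
}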
Performance Considerations
- Async is preferred for I/O-bound scraping tasks
- Connection pooling reduces overhead
- Batch processing prevents overwhelming target servers
- Resource limits prevent memory exhaustion with large-scale scraping
The reqwest library's thread safety, combined with Rust's ownership model, makes it an excellent choice for building robust, concurrent web scrapers that are both safe and performant.