While reqwest
is the most popular HTTP client for Rust web scraping, several excellent alternatives offer different features, performance characteristics, and complexity levels. Here's a comprehensive guide to the best reqwest alternatives for web scraping in Rust.
Quick Comparison Table
| Library | Async/Sync | Complexity | Best For | Binary Size |
|---------|------------|------------|----------|-------------|
| ureq | Sync only | Low | Simple scripts, CLI tools | Small |
| surf | Async only | Medium | Cross-runtime compatibility | Medium |
| isahc | Both | Medium | curl features, HTTP/2 | Medium |
| hyper | Async only | High | Performance-critical apps | Small |
| attohttpc | Sync only | Low | Minimal dependencies | Very small |
1. ureq - Simple and Synchronous
Best for: Simple scripts, command-line tools, and applications where async isn't needed.
Pros: Zero async dependencies, small binary size, simple API
Cons: Blocking operations, no HTTP/2 support
```rust
// Cargo.toml
// [dependencies]
// ureq = { version = "2.9", features = ["json"] }
// serde = { version = "1.0", features = ["derive"] }

use serde::Deserialize;

#[derive(Deserialize)]
struct ApiResponse {
    title: String,
    body: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic GET request
    let response = ureq::get("https://httpbin.org/get")
        .set("User-Agent", "my-scraper/1.0")
        .call()?;
    let html = response.into_string()?;
    println!("Response: {}", html);

    // JSON handling
    let json_response: ApiResponse = ureq::get("https://jsonplaceholder.typicode.com/posts/1")
        .call()?
        .into_json()?;
    println!("Title: {}", json_response.title);

    // POST with form data
    let form_response = ureq::post("https://httpbin.org/post")
        .send_form(&[
            ("key1", "value1"),
            ("key2", "value2"),
        ])?;
    println!("Form POST status: {}", form_response.status());

    Ok(())
}
```
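When scraping many pages from the same host, it is usually worth creating a reusable `ureq::Agent`, which pools connections and shares settings across requests. Here is a minimal sketch assuming ureq 2.x's builder API; the timeout, user agent string, and paginated URL are purely illustrative:

```rust
use std::time::Duration;

fn main() -> Result<(), ureq::Error> {
    // The Agent keeps a connection pool, so repeated requests to the
    // same host can reuse sockets instead of reconnecting each time.
    let agent = ureq::builder()
        .timeout(Duration::from_secs(10))
        .user_agent("my-scraper/1.0")
        .build();

    for page in 1..=3 {
        // Hypothetical paginated endpoint, used only for illustration.
        let url = format!("https://httpbin.org/anything/page/{}", page);
        let body = agent.get(&url).call()?.into_string()?;
        println!("page {}: {} bytes", page, body.len());
    }

    Ok(())
}
```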
2. surf - Framework-Agnostic Async
Best for: Applications that need to work with different async runtimes (tokio, async-std).
Pros: Runtime-agnostic, clean API, good middleware support
Cons: Less mature ecosystem, limited HTTP/2 support
```rust
// Cargo.toml
// [dependencies]
// surf = "2.3"
// async-std = { version = "1.12", features = ["attributes"] }
// serde = { version = "1.0", features = ["derive"] }
// serde_json = "1.0"

use serde::Deserialize;
use std::time::Duration;
use surf::{Client, Config};

#[derive(Deserialize)]
struct Quote {
    text: String,
    author: String,
}

#[async_std::main]
async fn main() -> surf::Result<()> {
    // Create a configured client
    let client: Client = Config::new()
        .set_timeout(Some(Duration::from_secs(30)))
        .try_into()?;

    // Basic scraping with custom headers
    let mut response = client
        .get("https://quotes.toscrape.com/")
        .header("User-Agent", "Mozilla/5.0 (compatible; MyBot/1.0)")
        .header("Accept", "text/html,application/xhtml+xml")
        .await?;
    let html = response.body_string().await?;
    println!("Scraped {} characters", html.len());

    // Handle JSON APIs
    let quotes: Vec<Quote> = surf::get("https://api.quotable.io/quotes")
        .await?
        .body_json()
        .await?;
    for quote in quotes.iter().take(3) {
        println!("\"{}\" - {}", quote.text, quote.author);
    }

    // POST request with JSON
    let new_post = surf::post("https://jsonplaceholder.typicode.com/posts")
        .body_json(&serde_json::json!({
            "title": "My Post",
            "body": "This is the body",
            "userId": 1
        }))?
        .await?;
    println!("Status: {}", new_post.status());

    Ok(())
}
```
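The middleware support mentioned above is one of surf's main draws. Below is a rough sketch of a request-logging middleware built on surf 2.x's `Middleware` trait; the `Logger` type and log format are invented for the example:

```rust
use surf::middleware::{Middleware, Next};
use surf::{Client, Request, Response};

// A tiny middleware that logs each request and how long it took.
#[derive(Debug)]
struct Logger;

#[surf::utils::async_trait]
impl Middleware for Logger {
    async fn handle(
        &self,
        req: Request,
        client: Client,
        next: Next<'_>,
    ) -> surf::Result<Response> {
        println!("-> {} {}", req.method(), req.url());
        let start = std::time::Instant::now();
        let res = next.run(req, client).await?;
        println!("<- {} ({:?})", res.status(), start.elapsed());
        Ok(res)
    }
}

#[async_std::main]
async fn main() -> surf::Result<()> {
    // Every request made through this client now passes through Logger.
    let client = surf::client().with(Logger);
    let mut response = client.get("https://httpbin.org/get").await?;
    println!("{} bytes", response.body_string().await?.len());
    Ok(())
}
```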
3. isahc - libcurl-Powered HTTP Client
Best for: Applications needing advanced HTTP features or curl compatibility.
Pros: Built on proven libcurl, supports HTTP/2, both sync and async APIs
Cons: Larger dependency footprint, requires libcurl
```rust
// Cargo.toml
// [dependencies]
// isahc = { version = "1.7", features = ["json"] }
// tokio = { version = "1.0", features = ["full"] }

use isahc::config::{RedirectPolicy, VersionNegotiation};
use isahc::prelude::*;
use isahc::{HttpClient, Request};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), isahc::Error> {
    // Create a configured client
    let client = HttpClient::builder()
        .timeout(Duration::from_secs(30))
        .redirect_policy(RedirectPolicy::Follow)
        .version_negotiation(VersionNegotiation::http2())
        .build()?;

    // Async scraping with custom configuration
    let request = Request::get("https://httpbin.org/headers")
        .header("User-Agent", "isahc-scraper/1.0")
        .body(())?;
    let mut response = client.send_async(request).await?;
    let headers_info = response.text().await?;
    println!("Headers response: {}", headers_info);

    // Sync API example (blocks the current thread)
    let mut sync_response = isahc::get("https://httpbin.org/ip")?;
    let ip_info = sync_response.text()?;
    println!("IP info: {}", ip_info);

    // File download example
    let mut file_response = isahc::get_async("https://httpbin.org/robots.txt").await?;
    let robots_txt = file_response.text().await?;
    println!("Robots.txt:\n{}", robots_txt);

    Ok(())
}
```
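The robots.txt example above buffers the whole body in memory. For larger downloads you can stream the body straight to disk instead; isahc's synchronous `Body` implements `std::io::Read`, so plain `io::copy` works. A minimal sketch, with the URL and output filename as placeholders:

```rust
use std::fs::File;
use std::io;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Copy the response body to a file without collecting it into a String.
    let mut response = isahc::get("https://httpbin.org/image/png")?;
    let mut file = File::create("image.png")?;
    io::copy(response.body_mut(), &mut file)?;
    println!("Saved image.png (status {})", response.status());
    Ok(())
}
```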
4. hyper - Low-Level Performance
Best for: High-performance applications, custom HTTP implementations, when you need maximum control.
Pros: Fastest performance, low-level control, minimal dependencies
Cons: Complex API, requires more boilerplate code
```rust
// Cargo.toml
// [dependencies]
// hyper = { version = "1.0", features = ["full"] }
// hyper-util = { version = "0.1", features = ["full"] }
// http-body-util = "0.1"
// tokio = { version = "1.0", features = ["full"] }
// futures = "0.3"

use http_body_util::{BodyExt, Empty};
use hyper::body::Bytes;
use hyper::Request;
use hyper_util::client::legacy::Client;
use hyper_util::rt::TokioExecutor;

type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

#[tokio::main]
async fn main() -> Result<()> {
    // Plain-HTTP client; for https:// URLs you would plug in a TLS
    // connector such as hyper-tls or hyper-rustls instead of build_http().
    let client = Client::builder(TokioExecutor::new()).build_http::<Empty<Bytes>>();

    // GET request with custom headers
    let req = Request::builder()
        .method("GET")
        .uri("http://httpbin.org/user-agent")
        .header("User-Agent", "hyper-scraper/1.0")
        .body(Empty::<Bytes>::new())?;

    let res = client.request(req).await?;
    let status = res.status();
    let body_bytes = res.into_body().collect().await?.to_bytes();
    let body = String::from_utf8(body_bytes.to_vec())?;
    println!("Status: {}", status);
    println!("Response: {}", body);

    // Multiple concurrent requests
    let urls = vec![
        "http://httpbin.org/delay/1",
        "http://httpbin.org/delay/2",
        "http://httpbin.org/delay/3",
    ];

    let futures: Vec<_> = urls.into_iter().map(|url| {
        let client = client.clone();
        async move {
            let req = Request::builder()
                .uri(url)
                .body(Empty::<Bytes>::new())?;
            let res = client.request(req).await?;
            let body_bytes = res.into_body().collect().await?.to_bytes();
            Ok::<_, Box<dyn std::error::Error + Send + Sync>>(body_bytes.len())
        }
    }).collect();

    let results = futures::future::join_all(futures).await;
    println!("Concurrent requests completed: {:?}", results);

    Ok(())
}
```
5. attohttpc - Minimal Dependencies
Best for: Applications where binary size and dependency count matter most.
Pros: Minimal dependencies, small binary size, simple API
Cons: Sync only, limited features
```rust
// Cargo.toml
// [dependencies]
// attohttpc = { version = "0.24", features = ["json", "form"] }
// serde_json = "1.0"

use std::collections::HashMap;

fn main() -> attohttpc::Result<()> {
    // Simple GET request
    let response = attohttpc::get("https://httpbin.org/get")
        .header("User-Agent", "attohttpc-scraper/1.0")
        .send()?;
    println!("Status: {}", response.status());
    let text = response.text()?;
    println!("Body length: {}", text.len());

    // JSON handling
    let json_response: HashMap<String, serde_json::Value> =
        attohttpc::get("https://httpbin.org/json")
            .send()?
            .json()?;
    println!("JSON keys: {:?}", json_response.keys().collect::<Vec<_>>());

    // POST with form data
    let form_response = attohttpc::post("https://httpbin.org/post")
        .form(&[("key", "value")])?
        .send()?;
    println!("Form POST status: {}", form_response.status());

    Ok(())
}
```
Choosing the Right Alternative
Use ureq when:
- Building CLI tools or simple scripts
- You don't need async/await
- Binary size matters
- You want minimal dependencies
Use surf when:
- You need async support but want runtime flexibility
- Building web services that might switch between tokio/async-std
- You prefer a high-level, clean API
Use isahc when:
- You need advanced HTTP features (HTTP/2, custom SSL)
- You want both sync and async APIs
- You're migrating from curl-based solutions
Use hyper when:
- Performance is critical
- You're building HTTP infrastructure
- You need fine-grained control over the HTTP stack
- You're comfortable with low-level APIs
Use attohttpc when:
- You're building embedded applications
- Dependency count is critical
- You need a simple, no-frills HTTP client
Performance Considerations
For web scraping workloads:
- Concurrent requests: hyper > surf ≈ isahc > ureq > attohttpc (see the thread-based sketch after this list)
- Memory usage: attohttpc < hyper < ureq < surf < isahc
- Binary size: attohttpc < hyper < ureq < surf < isahc
- Ease of use: ureq ≈ attohttpc > surf > isahc > hyper
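To make the concurrency comparison concrete, here is a minimal sketch of parallel scraping with a synchronous client (ureq), which has to spend one OS thread per in-flight request; async clients such as hyper or surf can keep many more requests in flight on a handful of threads. The URLs are placeholders:

```rust
use std::thread;

fn main() {
    let urls = vec![
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/3",
    ];

    // One OS thread per request: each blocking call ties up its thread
    // until the response has been fully read.
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            thread::spawn(move || -> Result<(&'static str, usize), String> {
                let response = ureq::get(url).call().map_err(|e| e.to_string())?;
                let body = response.into_string().map_err(|e| e.to_string())?;
                Ok((url, body.len()))
            })
        })
        .collect();

    for handle in handles {
        match handle.join().expect("scraper thread panicked") {
            Ok((url, len)) => println!("{}: {} bytes", url, len),
            Err(err) => eprintln!("request failed: {}", err),
        }
    }
}
```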
Choose based on your specific requirements for performance, simplicity, and feature set.