What are the alternatives to Reqwest for web scraping in Rust?

While reqwest is the most popular HTTP client for Rust web scraping, several excellent alternatives offer different features, performance characteristics, and complexity levels. Here's a comprehensive guide to the best reqwest alternatives for web scraping in Rust.

Quick Comparison Table

| Library | Async/Sync | Complexity | Best For | Binary Size |
|---------|------------|------------|----------|-------------|
| ureq | Sync only | Low | Simple scripts, CLI tools | Small |
| surf | Async only | Medium | Cross-runtime compatibility | Medium |
| isahc | Both | Medium | curl features, HTTP/2 | Medium |
| hyper | Async only | High | Performance-critical apps | Small |
| attohttpc | Sync only | Low | Minimal dependencies | Very small |

1. ureq - Simple and Synchronous

Best for: Simple scripts, command-line tools, and applications where async isn't needed.

Pros: Zero async dependencies, small binary size, simple API
Cons: Blocking operations, no HTTP/2 support

// Cargo.toml
// [dependencies]
// ureq = { version = "2.9", features = ["json"] }
// serde = { version = "1.0", features = ["derive"] }

use serde::Deserialize;

#[derive(Deserialize)]
struct ApiResponse {
    title: String,
    body: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic GET request
    let response = ureq::get("https://httpbin.org/get")
        .set("User-Agent", "my-scraper/1.0")
        .call()?;

    let html = response.into_string()?;
    println!("Response: {}", html);

    // JSON handling
    let json_response: ApiResponse = ureq::get("https://jsonplaceholder.typicode.com/posts/1")
        .call()?
        .into_json()?;

    println!("Title: {}", json_response.title);

    // POST with form data
    let form_response = ureq::post("https://httpbin.org/post")
        .send_form(&[
            ("key1", "value1"),
            ("key2", "value2"),
        ])?;

    println!("Form POST status: {}", form_response.status());

    Ok(())
}
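
Because ureq blocks the calling thread, concurrency comes from OS threads rather than async tasks. Here is a minimal sketch (the httpbin URLs are placeholders) that fans requests out with std::thread and joins the results:

use std::thread;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let urls = vec![
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/2",
    ];

    // One OS thread per URL; fine for small batches of blocking requests.
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            thread::spawn(move || -> Result<usize, Box<dyn std::error::Error + Send + Sync>> {
                let body = ureq::get(url).call()?.into_string()?;
                Ok(body.len())
            })
        })
        .collect();

    for handle in handles {
        let len = handle.join().expect("worker thread panicked")?;
        println!("Fetched {} bytes", len);
    }

    Ok(())
}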

2. surf - Framework-Agnostic Async

Best for: Applications that need to work with different async runtimes (tokio, async-std).

Pros: Runtime-agnostic, clean API, good middleware support
Cons: Less mature ecosystem, limited HTTP/2 support

// Cargo.toml
// [dependencies]
// surf = "2.3"
// async-std = { version = "1.12", features = ["attributes"] }
// serde = { version = "1.0", features = ["derive"] }
// serde_json = "1.0"

use surf::{Client, Config};
use serde::Deserialize;
use std::time::Duration;

#[derive(Deserialize)]
struct Post {
    title: String,
    body: String,
}

#[async_std::main]
async fn main() -> surf::Result<()> {
    // Create a configured client
    let client: Client = Config::new()
        .set_timeout(Some(Duration::from_secs(30)))
        .try_into()?;

    // Basic scraping with custom headers
    let mut response = client
        .get("https://quotes.toscrape.com/")
        .header("User-Agent", "Mozilla/5.0 (compatible; MyBot/1.0)")
        .header("Accept", "text/html,application/xhtml+xml")
        .await?;

    let html = response.body_string().await?;
    println!("Scraped {} characters", html.len());

    // Handle JSON APIs
    let posts: Vec<Post> = surf::get("https://jsonplaceholder.typicode.com/posts")
        .recv_json()
        .await?;

    for post in posts.iter().take(3) {
        println!("{} ({} chars)", post.title, post.body.len());
    }

    // POST request with JSON
    let new_post = surf::post("https://jsonplaceholder.typicode.com/posts")
        .body(surf::Body::from_json(&serde_json::json!({
            "title": "My Post",
            "body": "This is the body",
            "userId": 1
        }))?)
        .await?;

    println!("Status: {}", new_post.status());

    Ok(())
}
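
For scraping many pages, surf composes with the futures crate (an extra dependency assumed here as futures = "0.3") to cap how many requests run at once. A minimal sketch using buffer_unordered, with placeholder httpbin URLs:

use futures::stream::{self, StreamExt};

#[async_std::main]
async fn main() -> surf::Result<()> {
    let urls = vec![
        "https://httpbin.org/delay/1",
        "https://httpbin.org/delay/2",
        "https://httpbin.org/delay/3",
    ];

    // At most two requests in flight at any time.
    let bodies: Vec<surf::Result<String>> = stream::iter(urls)
        .map(|url| async move { surf::get(url).recv_string().await })
        .buffer_unordered(2)
        .collect()
        .await;

    for body in bodies {
        match body {
            Ok(html) => println!("fetched {} characters", html.len()),
            Err(err) => eprintln!("request failed: {}", err),
        }
    }

    Ok(())
}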

3. isahc - libcurl-Powered HTTP Client

Best for: Applications needing advanced HTTP features or curl compatibility.

Pros: Built on proven libcurl, supports HTTP/2, both sync and async APIs
Cons: Larger dependency footprint, requires libcurl

// Cargo.toml
// [dependencies]
// isahc = { version = "1.7", features = ["json"] }
// tokio = { version = "1.0", features = ["full"] }

use isahc::config::{RedirectPolicy, VersionNegotiation};
use isahc::prelude::*;
use isahc::{HttpClient, Request};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a configured client
    let client = HttpClient::builder()
        .timeout(Duration::from_secs(30))
        .redirect_policy(RedirectPolicy::Follow)
        .version_negotiation(VersionNegotiation::http2())
        .build()?;

    // Async scraping with custom configuration
    let request = Request::get("https://httpbin.org/headers")
        .header("User-Agent", "isahc-scraper/1.0")
        .body(())?;

    let mut response = client.send_async(request).await?;
    let headers_info = response.text().await?;
    println!("Headers response: {}", headers_info);

    // Sync API example (text() needs a mutable response)
    let mut sync_response = isahc::get("https://httpbin.org/ip")?;
    let ip_info = sync_response.text()?;
    println!("IP info: {}", ip_info);

    // File download example
    let mut file_response = isahc::get_async("https://httpbin.org/robots.txt").await?;
    let robots_txt = file_response.text().await?;
    println!("Robots.txt:\n{}", robots_txt);

    Ok(())
}
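
Because it rides on libcurl, isahc also picks up curl niceties such as proxy support, which matters for scraping behind rotating proxies. A minimal sketch; the proxy address is a placeholder:

use isahc::prelude::*;
use isahc::HttpClient;
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Route all requests through a proxy (placeholder address).
    let client = HttpClient::builder()
        .proxy(Some("http://localhost:8080".parse()?))
        .timeout(Duration::from_secs(30))
        .build()?;

    let mut response = client.get("https://httpbin.org/ip")?;
    println!("Exit IP: {}", response.text()?);

    Ok(())
}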

4. hyper - Low-Level Performance

Best for: High-performance applications, custom HTTP implementations, when you need maximum control.

Pros: Fastest performance, low-level control, minimal dependencies
Cons: Complex API, requires more boilerplate code

// Cargo.toml
// [dependencies]
// hyper = { version = "1.0", features = ["full"] }
// hyper-util = { version = "0.1", features = ["full"] }
// http-body-util = "0.1"
// tokio = { version = "1.0", features = ["full"] }
// futures = "0.3"

use http_body_util::{BodyExt, Empty};
use hyper::body::Bytes;
use hyper::Request;
use hyper_util::client::legacy::Client;
use hyper_util::rt::TokioExecutor;

type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

#[tokio::main]
async fn main() -> Result<()> {
    // build_http() gives a plain-HTTP connector; add a TLS connector
    // (e.g. hyper-tls or hyper-rustls) for https:// URLs.
    let client = Client::builder(TokioExecutor::new()).build_http::<Empty<Bytes>>();

    // GET request with custom headers
    let req = Request::builder()
        .method("GET")
        .uri("http://httpbin.org/user-agent")
        .header("User-Agent", "hyper-scraper/1.0")
        .body(Empty::<Bytes>::new())?;

    let res = client.request(req).await?;
    let status = res.status();
    // In hyper 1.x, bodies are collected via http-body-util's BodyExt.
    let body_bytes = res.into_body().collect().await?.to_bytes();
    let body = String::from_utf8(body_bytes.to_vec())?;

    println!("Status: {}", status);
    println!("Response: {}", body);

    // Multiple concurrent requests
    let urls = vec![
        "http://httpbin.org/delay/1",
        "http://httpbin.org/delay/2",
        "http://httpbin.org/delay/3",
    ];

    let futures: Vec<_> = urls.into_iter().map(|url| {
        let client = client.clone();
        async move {
            let req = Request::builder()
                .uri(url)
                .body(Empty::<Bytes>::new())?;
            let res = client.request(req).await?;
            let body_bytes = res.into_body().collect().await?.to_bytes();
            Ok::<_, Box<dyn std::error::Error + Send + Sync>>(body_bytes.len())
        }
    }).collect();

    let results = futures::future::join_all(futures).await;
    println!("Concurrent requests completed: {:?}", results);

    Ok(())
}
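
The build_http() client above speaks plain HTTP only. For https:// URLs you would wrap the connector with a TLS layer; here is a sketch using the hyper-tls crate (an assumed extra dependency, hyper-tls = "0.6"):

use http_body_util::{BodyExt, Empty};
use hyper::body::Bytes;
use hyper::Request;
use hyper_tls::HttpsConnector;
use hyper_util::client::legacy::Client;
use hyper_util::rt::TokioExecutor;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // HttpsConnector wraps the default HttpConnector with native-tls.
    let https = HttpsConnector::new();
    let client = Client::builder(TokioExecutor::new()).build::<_, Empty<Bytes>>(https);

    let req = Request::builder()
        .uri("https://httpbin.org/get")
        .body(Empty::<Bytes>::new())?;

    let res = client.request(req).await?;
    let body = res.into_body().collect().await?.to_bytes();
    println!("Fetched {} bytes over TLS", body.len());

    Ok(())
}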

5. attohttpc - Minimal Dependencies

Best for: Applications where binary size and dependency count matter most.

Pros: Minimal dependencies, small binary size, simple API
Cons: Sync only, limited features

// Cargo.toml
// [dependencies]
// attohttpc = { version = "0.24", features = ["json", "form"] }
// serde_json = "1.0"

use std::collections::HashMap;

fn main() -> attohttpc::Result<()> {
    // Simple GET request
    let response = attohttpc::get("https://httpbin.org/get")
        .header("User-Agent", "attohttpc-scraper/1.0")
        .send()?;

    println!("Status: {}", response.status());
    let text = response.text()?;
    println!("Body length: {}", text.len());

    // JSON handling
    let json_response: HashMap<String, serde_json::Value> = 
        attohttpc::get("https://httpbin.org/json")
            .send()?
            .json()?;

    println!("JSON keys: {:?}", json_response.keys().collect::<Vec<_>>());

    // POST with form data
    let form_response = attohttpc::post("https://httpbin.org/post")
        .form(&[("key", "value")])?
        .send()?;

    println!("Form POST status: {}", form_response.status());

    Ok(())
}
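
For crawls that hit the same site repeatedly, attohttpc's Session type reuses default headers across requests. A minimal sketch, assuming attohttpc 0.24:

use attohttpc::Session;

fn main() -> attohttpc::Result<()> {
    // A Session applies its default headers to every request built from it.
    let mut session = Session::new();
    session.header("User-Agent", "attohttpc-scraper/1.0");

    for path in ["get", "headers"] {
        let response = session
            .get(format!("https://httpbin.org/{}", path))
            .send()?;
        println!("{}: {}", path, response.status());
    }

    Ok(())
}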

Choosing the Right Alternative

Use ureq when:

  • Building CLI tools or simple scripts
  • You don't need async/await
  • Binary size matters
  • You want minimal dependencies

Use surf when:

  • You need async support but want runtime flexibility
  • Building web services that might switch between tokio/async-std
  • You prefer a high-level, clean API

Use isahc when:

  • You need advanced HTTP features (HTTP/2, custom SSL)
  • You want both sync and async APIs
  • You're migrating from curl-based solutions

Use hyper when:

  • Performance is critical
  • You're building HTTP infrastructure
  • You need fine-grained control over the HTTP stack
  • You're comfortable with low-level APIs

Use attohttpc when:

  • You're building embedded applications
  • Dependency count is critical
  • You need a simple, no-frills HTTP client

Performance Considerations

For web scraping workloads:

  1. Concurrent request throughput: hyper > surf ≈ isahc > ureq > attohttpc
  2. Memory usage (lowest first): attohttpc < hyper < ureq < surf < isahc
  3. Binary size (smallest first): attohttpc < hyper < ureq < surf < isahc
  4. Ease of use: ureq ≈ attohttpc > surf > isahc > hyper

Choose based on your specific requirements for performance, simplicity, and feature set; the sketch below shows one way to measure for yourself.
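
A tiny timing harness, using ureq here though any of the clients above could be swapped in (httpbin.org stands in for your real target):

use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://httpbin.org/get"; // substitute your real target
    let start = Instant::now();

    // Time a fixed batch of sequential requests.
    for _ in 0..10 {
        ureq::get(url).call()?.into_string()?;
    }

    println!("10 sequential requests took {:?}", start.elapsed());
    Ok(())
}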

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering, and a built-in HTML parser for web scraping.