What is the best way to handle large file downloads with Reqwest?

When downloading large files with Reqwest, streaming the response is essential to avoid memory exhaustion and provide a responsive user experience. Instead of loading the entire file into memory, stream the content chunk by chunk directly to disk.

Basic Streaming Download

Here's a minimal example that streams a large file download directly to disk:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use reqwest::Client;
use futures_util::StreamExt; // required for calling .next() on the byte stream

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://example.com/largefile.zip";
    let client = Client::new();

    let response = client.get(url).send().await?;

    if !response.status().is_success() {
        return Err(format!("HTTP error: {}", response.status()).into());
    }

    let mut file = File::create("largefile.zip").await?;
    let mut stream = response.bytes_stream();

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;
    }

    file.flush().await?;
    println!("Download completed successfully");
    Ok(())
}
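
If you prefer to avoid the manual write loop, the byte stream can also be adapted into an AsyncRead and piped to the file with tokio::io::copy. The sketch below is one possible variation; it assumes the tokio-util crate (with its io feature) is added alongside the dependencies listed later in this article:

use futures_util::TryStreamExt;
use reqwest::Client;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use tokio_util::io::StreamReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let response = Client::new()
        .get("https://example.com/largefile.zip")
        .send()
        .await?
        .error_for_status()?;

    // Adapt the byte stream into an AsyncRead; StreamReader requires an io::Error
    // error type, so map reqwest's error accordingly
    let stream = response
        .bytes_stream()
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e));
    let mut reader = StreamReader::new(stream);

    // Copy chunk by chunk to disk without buffering the whole body in memory
    let mut file = File::create("largefile.zip").await?;
    tokio::io::copy(&mut reader, &mut file).await?;
    file.flush().await?;

    println!("Download completed successfully");
    Ok(())
}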

Advanced Download with Progress Tracking

For production applications, you'll want progress tracking and better error handling:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use reqwest::Client;
use futures_util::StreamExt;
use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let url = "https://example.com/largefile.zip";
    let client = Client::new();

    // Send HEAD request to get file size
    let head_response = client.head(url).send().await?;
    let total_size = head_response
        .headers()
        .get(reqwest::header::CONTENT_LENGTH)
        .and_then(|val| val.to_str().ok())
        .and_then(|s| s.parse::<u64>().ok())
        .unwrap_or(0);

    // Start the actual download
    let response = client.get(url).send().await?;

    if !response.status().is_success() {
        return Err(format!("HTTP error: {}", response.status()).into());
    }

    let mut file = File::create("largefile.zip").await?;
    let mut stream = response.bytes_stream();
    let mut downloaded = 0u64;
    let mut last_report = 0u64;
    let start_time = Instant::now();

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;

        downloaded += chunk.len() as u64;

        // Report progress roughly every 1 MB (chunk sizes rarely land on exact multiples of 1 MB)
        if downloaded - last_report >= 1024 * 1024 || downloaded == total_size {
            last_report = downloaded;
            let elapsed = start_time.elapsed().as_secs_f64();
            let speed = downloaded as f64 / elapsed / 1024.0 / 1024.0; // MB/s
            let progress = if total_size > 0 {
                (downloaded as f64 / total_size as f64 * 100.0) as u32
            } else {
                0
            };

            println!(
                "Downloaded: {:.1} MB | Progress: {}% | Speed: {:.1} MB/s",
                downloaded as f64 / 1024.0 / 1024.0,
                progress,
                speed
            );
        }
    }

    file.flush().await?;
    println!("Download completed in {:.2} seconds", start_time.elapsed().as_secs_f64());
    Ok(())
}
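
Note that reqwest also exposes the size of the response you are already downloading through Response::content_length(), so the extra HEAD request can often be skipped. A minimal sketch of that variation (the start_download helper name is just for illustration):

use reqwest::Client;

// Sketch: read the total size from the GET response itself instead of sending a
// separate HEAD request. content_length() returns None when the server omits the
// Content-Length header (e.g. chunked transfer encoding), matching the 0 fallback
// used in the example above.
async fn start_download(
    client: &Client,
    url: &str,
) -> Result<(reqwest::Response, u64), reqwest::Error> {
    let response = client.get(url).send().await?.error_for_status()?;
    let total_size = response.content_length().unwrap_or(0);
    Ok((response, total_size))
}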

Download with Retry Logic

For unreliable network connections, implement retry logic:

use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use reqwest::Client;
use futures_util::StreamExt;
use std::time::Duration;

async fn download_with_retry(
    url: &str,
    output_path: &str,
    max_retries: u32,
) -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::builder()
        // Use a connect timeout rather than Client::timeout(), which caps the whole
        // request (including the body transfer) and would abort large downloads
        // that legitimately take longer than the limit.
        .connect_timeout(Duration::from_secs(30))
        .build()?;

    for attempt in 1..=max_retries {
        println!("Download attempt {} of {}", attempt, max_retries);

        match download_file(&client, url, output_path).await {
            Ok(_) => {
                println!("Download successful!");
                return Ok(());
            }
            Err(e) => {
                eprintln!("Attempt {} failed: {}", attempt, e);
                if attempt < max_retries {
                    tokio::time::sleep(Duration::from_secs(2_u64.pow(attempt - 1))).await;
                }
            }
        }
    }

    Err("All download attempts failed".into())
}

async fn download_file(
    client: &Client,
    url: &str,
    output_path: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let response = client.get(url).send().await?;

    if !response.status().is_success() {
        return Err(format!("HTTP error: {}", response.status()).into());
    }

    let mut file = File::create(output_path).await?;
    let mut stream = response.bytes_stream();

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;
    }

    file.flush().await?;
    Ok(())
}
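
A minimal entry point showing how these two functions might be wired together (the URL, file name, and retry count are placeholders):

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Retry up to 3 times with exponential backoff between attempts
    download_with_retry("https://example.com/largefile.zip", "largefile.zip", 3).await
}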

Resumable Downloads

For very large files, implement resumable downloads using HTTP range requests:

use tokio::fs::{File, OpenOptions};
use tokio::io::AsyncWriteExt;
use reqwest::{Client, header};
use futures_util::StreamExt; // required for calling .next() on the byte stream
use std::path::Path;

async fn resume_download(
    url: &str,
    output_path: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new();

    // Check if partial file exists
    let start_pos = if Path::new(output_path).exists() {
        tokio::fs::metadata(output_path).await?.len()
    } else {
        0
    };

    let mut request = client.get(url);

    // Add Range header for resumable download
    if start_pos > 0 {
        request = request.header(
            header::RANGE,
            format!("bytes={}-", start_pos)
        );
        println!("Resuming download from byte {}", start_pos);
    }

    let response = request.send().await?;
    let status = response.status();

    if !status.is_success() {
        return Err(format!("HTTP error: {}", status).into());
    }

    // The server only honored the Range header if it replied 206 Partial Content.
    // A plain 200 OK means it sent the whole file, so appending would corrupt the
    // output; in that case start writing from the beginning instead.
    let resuming = start_pos > 0 && status == reqwest::StatusCode::PARTIAL_CONTENT;

    // Open the file in append mode when resuming, otherwise create (truncate) it
    let mut file = if resuming {
        OpenOptions::new().append(true).open(output_path).await?
    } else {
        File::create(output_path).await?
    };

    let mut stream = response.bytes_stream();
    let mut downloaded = if resuming { start_pos } else { 0 };
    let mut last_report = downloaded;

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;
        downloaded += chunk.len() as u64;

        // Report progress roughly every 1 MB
        if downloaded - last_report >= 1024 * 1024 {
            last_report = downloaded;
            println!("Downloaded: {:.1} MB", downloaded as f64 / 1024.0 / 1024.0);
        }
    }

    file.flush().await?;
    println!("Download completed! Total size: {:.1} MB", downloaded as f64 / 1024.0 / 1024.0);
    Ok(())
}
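
Range requests only work when the server supports them, so it can help to check the Accept-Ranges header up front. This helper is a sketch (the supports_resume name is illustrative); some servers honor ranges without advertising the header, so treat a false result as "fall back to a full download" rather than a hard error:

use reqwest::{Client, header};

// Sketch: returns true if the server advertises byte-range support via Accept-Ranges
async fn supports_resume(client: &Client, url: &str) -> Result<bool, reqwest::Error> {
    let response = client.head(url).send().await?;
    let accept_ranges = response
        .headers()
        .get(header::ACCEPT_RANGES)
        .and_then(|val| val.to_str().ok())
        .unwrap_or("");
    Ok(accept_ranges.eq_ignore_ascii_case("bytes"))
}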

Required Dependencies

Add these dependencies to your Cargo.toml:

[dependencies]
reqwest = { version = "0.11", features = ["stream"] }
tokio = { version = "1", features = ["full"] }
futures-util = "0.3"

Key Benefits of Streaming

  1. Memory Efficiency: Only small chunks are held in memory at any time
  2. Progress Tracking: Real-time download progress and speed calculation
  3. Error Recovery: Ability to resume interrupted downloads
  4. Timeout Handling: Prevents hanging on slow connections
  5. Scalability: Can handle files of any size without memory constraints

Best Practices

  • Always use bytes_stream() instead of loading the entire response body
  • Implement proper error handling and retry logic for network failures
  • Add progress indicators for better user experience
  • Use appropriate timeouts; for large downloads a connect timeout is usually safer than a whole-request timeout, which can cut off long transfers
  • Consider implementing resumable downloads for very large files (a sketch combining resume with retry follows this list)
  • Flush the file buffer after writing to ensure data persistence
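
The last two points combine naturally: wrapping resume_download from the previous section in a retry loop means each failed attempt continues from the bytes already on disk instead of starting over. A minimal sketch, assuming resume_download as defined above is in scope:

use std::time::Duration;

// Sketch: retry loop around resume_download so interrupted attempts resume from
// the bytes already written rather than restarting from zero
async fn download_resumable_with_retry(
    url: &str,
    output_path: &str,
    max_retries: u32,
) -> Result<(), Box<dyn std::error::Error>> {
    for attempt in 1..=max_retries {
        match resume_download(url, output_path).await {
            Ok(_) => return Ok(()),
            Err(e) => {
                eprintln!("Attempt {} failed: {}", attempt, e);
                if attempt < max_retries {
                    // Exponential backoff: 1s, 2s, 4s, ...
                    tokio::time::sleep(Duration::from_secs(2_u64.pow(attempt - 1))).await;
                }
            }
        }
    }
    Err("All download attempts failed".into())
}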

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
