What is the 'stream' parameter in Requests, and when should I use it?

The stream parameter in Python's requests library controls how response content is downloaded and handled. When set to True, it enables chunked, memory-efficient data processing instead of loading entire responses into memory at once.

How Stream Parameter Works

Default Behavior (stream=False):

  • Downloads the entire response body immediately
  • Stores the complete data in memory before returning the Response object
  • Simple, but memory-intensive for large files

Streaming Behavior (stream=True):

  • Returns the Response object as soon as the headers arrive, without downloading the body
  • Downloads data on demand as you iterate over it
  • Memory-efficient for large files and real-time data
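
To make the difference concrete, here is a minimal sketch (placeholder URL) contrasting the two modes:

import requests

url = 'https://example.com/largefile.zip'  # placeholder

# stream=False (default): the full body is downloaded before get() returns
response = requests.get(url)
data = response.content  # already held entirely in memory

# stream=True: only the headers are read up front; the body is pulled from
# the socket as you iterate (accessing .content would still load all of it)
with requests.get(url, stream=True) as response:
    for chunk in response.iter_content(chunk_size=8192):
        ...  # handle each chunk without buffering the whole body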

Basic File Download Example

import requests

# Download large file with streaming
response = requests.get('https://example.com/largefile.zip', stream=True)

if response.status_code == 200:
    with open('largefile.zip', 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)

Advanced Examples

Download with Progress Tracking

import requests
from tqdm import tqdm

def download_with_progress(url, filename):
    response = requests.get(url, stream=True)
    total_size = int(response.headers.get('content-length', 0))

    with open(filename, 'wb') as file, tqdm(
        desc=filename,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as progress_bar:
        for chunk in response.iter_content(chunk_size=8192):
            size = file.write(chunk)
            progress_bar.update(size)

download_with_progress('https://example.com/file.zip', 'file.zip')

Line-by-Line Text Processing

import requests

response = requests.get('https://example.com/logfile.txt', stream=True)

# Process large text files line by line
for line in response.iter_lines(decode_unicode=True):
    if line:  # Filter out empty lines
        process_log_line(line)

JSON Streaming API

import requests
import json

def stream_json_api(url):
    response = requests.get(url, stream=True)

    for line in response.iter_lines():
        if line:
            try:
                data = json.loads(line.decode('utf-8'))
                yield data
            except json.JSONDecodeError:
                continue

# Process streaming JSON data
for item in stream_json_api('https://api.example.com/stream'):
    handle_data(item)

When to Use Stream Parameter

✅ Use stream=True when:

  1. Large File Downloads - Files that exceed available memory
  2. Progress Tracking - Need to show download progress to users
  3. Real-time Data - Streaming APIs or live data feeds
  4. Memory Constraints - Limited memory environments
  5. Processing on the Fly - Data processing during download (see the checksum sketch below)

❌ Avoid stream=True when:

  1. Small Responses - Files under a few MB where memory isn't a concern
  2. Simple API Calls - JSON responses that fit comfortably in memory
  3. Response Processing - When you need the complete response for parsing
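
As an illustration of item 5 in the first list (processing data on the fly), here is a minimal sketch that computes a SHA-256 checksum while writing the download to disk, so the full payload is never held in memory (URL and filename are placeholders):

import hashlib
import requests

def download_and_hash(url, filename):
    sha256 = hashlib.sha256()
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(filename, 'wb') as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
                sha256.update(chunk)  # hash each chunk as it arrives
    return sha256.hexdigest()

checksum = download_and_hash('https://example.com/file.zip', 'file.zip')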

Important Considerations

Connection Management

import requests

# Always use context manager or explicitly close
response = requests.get('https://example.com/file', stream=True)
try:
    for chunk in response.iter_content(chunk_size=8192):
        process_chunk(chunk)
finally:
    response.close()  # Important: close the connection

# Or use with statement (recommended)
with requests.get('https://example.com/file', stream=True) as response:
    for chunk in response.iter_content(chunk_size=8192):
        process_chunk(chunk)

Optimal Chunk Sizes

# Different chunk sizes for different use cases
import requests

url = 'https://example.com/largefile.zip'

# Smaller chunks when data should be handled in near real time
response = requests.get(url, stream=True)
for chunk in response.iter_content(chunk_size=1024):  # 1 KB
    process_immediately(chunk)

# Larger chunks for plain file downloads (note: a response body can only be
# iterated once, so a fresh request is needed here)
response = requests.get(url, stream=True)
for chunk in response.iter_content(chunk_size=65536):  # 64 KB
    write_to_file(chunk)

Error Handling

import requests
from requests.exceptions import RequestException

def safe_download(url, filename):
    try:
        with requests.get(url, stream=True, timeout=30) as response:
            response.raise_for_status()

            with open(filename, 'wb') as file:
                for chunk in response.iter_content(chunk_size=8192):
                    if chunk:  # Filter out keep-alive chunks
                        file.write(chunk)

    except RequestException as e:
        print(f"Download failed: {e}")
        return False
    return True
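
A quick usage sketch for the helper above (placeholder URL):

if safe_download('https://example.com/file.zip', 'file.zip'):
    print("Download complete")
else:
    print("Download failed")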

Key Methods for Streaming

  • iter_content(chunk_size=1) - Iterate over response content in chunks
  • iter_lines(chunk_size=512) - Iterate over response lines
  • raw.read(amt=None) - Read raw bytes directly from the underlying urllib3 response (only useful with stream=True, since otherwise the body has already been consumed; see the sketch below)
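
Since raw is just the underlying urllib3 file-like object, a common pattern is to copy it straight to disk with shutil.copyfileobj; here is a minimal sketch (placeholder URL and filename), with decode_content enabled so gzip/deflate-encoded bodies are decompressed:

import shutil
import requests

with requests.get('https://example.com/largefile.zip', stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True  # have urllib3 undo gzip/deflate encoding
    with open('largefile.zip', 'wb') as file:
        shutil.copyfileobj(response.raw, file)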

The stream parameter is essential for building memory-efficient, scalable applications that handle large files or real-time data streams.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
