Yes, urllib3 supports streaming large files efficiently without loading the entire content into memory. This matters for large downloads, for processing files on memory-constrained systems, and for any application whose memory footprint must stay bounded.
## Basic File Streaming
The key to streaming with urllib3 is using `preload_content=False` in your request:
```python
import urllib3

# Create a PoolManager instance
http = urllib3.PoolManager()

# Stream a large file
url = "https://example.com/largefile.zip"
response = http.request('GET', url, preload_content=False)

# Download in chunks
chunk_size = 8192  # 8 KB chunks
with open('largefile.zip', 'wb') as out:
    while True:
        data = response.read(chunk_size)
        if not data:
            break
        out.write(data)

# Always release the connection back to the pool
response.release_conn()
```
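Because a response opened with `preload_content=False` behaves like a file object, the read loop above can also be replaced with `shutil.copyfileobj` from the standard library. A compact variant of the same download, under that assumption:

```python
import shutil
import urllib3

http = urllib3.PoolManager()
response = http.request('GET', 'https://example.com/largefile.zip',
                        preload_content=False)
try:
    with open('largefile.zip', 'wb') as out:
        # copyfileobj pulls data through in bounded chunks,
        # so memory use stays flat regardless of file size
        shutil.copyfileobj(response, out)
finally:
    response.release_conn()
```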
## Advanced Streaming with Progress Tracking
Here's a more robust example with progress tracking and error handling:
```python
import urllib3
from urllib3.util.retry import Retry

def download_large_file(url, filename, chunk_size=8192):
    # Configure retry strategy for transient failures
    retry_strategy = Retry(
        total=3,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504]
    )
    http = urllib3.PoolManager(retries=retry_strategy)

    response = None
    try:
        # Get file size for progress tracking
        head_response = http.request('HEAD', url)
        total_size = int(head_response.headers.get('Content-Length', 0))

        # Stream the file
        response = http.request('GET', url, preload_content=False)
        downloaded = 0
        with open(filename, 'wb') as out:
            while True:
                data = response.read(chunk_size)
                if not data:
                    break
                out.write(data)
                downloaded += len(data)

                # Show progress
                if total_size > 0:
                    progress = (downloaded / total_size) * 100
                    print(f"Downloaded: {progress:.1f}%", end='\r')
        print(f"\nDownload completed: {filename}")
    except urllib3.exceptions.HTTPError as e:
        print(f"HTTP error occurred: {e}")
    except Exception as e:
        print(f"Error downloading file: {e}")
    finally:
        # Guard against `response` never being assigned (e.g. the HEAD failed)
        if response is not None:
            response.release_conn()

# Usage
download_large_file("https://example.com/largefile.zip", "local_file.zip")
```
## Processing Streaming Data
You can also process streaming data without saving to disk:
```python
import urllib3
import hashlib

def process_stream(url):
    http = urllib3.PoolManager()
    response = http.request('GET', url, preload_content=False)

    # Example: calculate an MD5 hash while streaming
    md5_hash = hashlib.md5()
    total_bytes = 0
    try:
        for chunk in response.stream(1024):
            md5_hash.update(chunk)
            total_bytes += len(chunk)
        print(f"File size: {total_bytes} bytes")
        print(f"MD5 hash: {md5_hash.hexdigest()}")
    finally:
        response.release_conn()
```
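For completeness, a one-line usage sketch with the same example URL used earlier:

```python
process_stream("https://example.com/largefile.zip")
```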
## Key Parameters and Best Practices

### Chunk Size Selection

Pick a chunk size to match the file size and your memory budget (a small helper applying these ranges is sketched after the list):

- Small files (< 1MB): 1024-4096 bytes
- Medium files (1-100MB): 8192-65536 bytes
- Large files (> 100MB): 65536-1048576 bytes
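The thresholds below simply mirror the table above; they are guidelines, not urllib3 requirements:

```python
def pick_chunk_size(total_size):
    """Choose a read size from the file-size guidelines above."""
    MB = 1024 * 1024
    if total_size < 1 * MB:        # small files
        return 4096
    elif total_size <= 100 * MB:   # medium files
        return 65536
    else:                          # large files
        return 1048576
```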
### Important Notes

- Always use `preload_content=False` to enable streaming
- Always call `response.release_conn()` to prevent connection leaks
- Choose appropriate chunk sizes based on file size and memory constraints
- Handle network errors with retry strategies
- Use context managers when possible for automatic cleanup (see the sketch below)
## Memory Benefits

Streaming with urllib3 keeps memory usage roughly constant (on the order of the chunk size) regardless of file size, making it ideal for:

- Downloading large datasets
- Processing log files
- Handling media files
- Building file proxy services
- Working in memory-constrained environments