What are the performance differences between urllib3 and requests library?
When building Python applications that make HTTP requests, developers often choose between urllib3 and requests. While requests provides a more user-friendly API, urllib3 offers lower-level control and typically better performance. Understanding these performance differences is crucial for building efficient web scraping and API integration applications.
Overview of urllib3 vs requests
urllib3 is a powerful, low-level HTTP client library that provides connection pooling, thread safety, and extensive configuration options. It serves as the foundation for many higher-level libraries, including requests.
requests is a high-level HTTP library built on top of urllib3 that provides an elegant, human-friendly API. While it offers convenience and ease of use, this abstraction layer introduces some performance overhead.
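To see the difference in abstraction level, here is a minimal sketch of the same GET request in both libraries (httpbin.org serves only as an example endpoint):
import json
import urllib3
import requests

# urllib3: explicit pool manager, manual JSON decoding
http = urllib3.PoolManager()
resp = http.request('GET', 'https://httpbin.org/json')
low_level_data = json.loads(resp.data.decode('utf-8'))

# requests: one call, JSON decoding built in
high_level_data = requests.get('https://httpbin.org/json').json()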
Performance Comparison Metrics
Speed and Latency
urllib3 consistently outperforms requests in terms of raw speed due to its minimal overhead:
import time
import urllib3
import requests

# urllib3 example
def urllib3_request():
    http = urllib3.PoolManager()
    start_time = time.time()
    response = http.request('GET', 'https://httpbin.org/json')
    end_time = time.time()
    return end_time - start_time

# requests example
def requests_request():
    start_time = time.time()
    response = requests.get('https://httpbin.org/json')
    end_time = time.time()
    return end_time - start_time

# Benchmark results typically show 10-20% faster response times with urllib3
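To run the comparison, call both helpers defined above; single-sample timings vary from run to run, so treat the numbers as indicative only:
print(f"urllib3:  {urllib3_request():.3f}s")
print(f"requests: {requests_request():.3f}s")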
Memory Usage
urllib3 typically shows lower memory consumption, which you can measure with tracemalloc:
import tracemalloc
import urllib3
import requests

# Memory profiling with urllib3
tracemalloc.start()
http = urllib3.PoolManager()
for i in range(100):
    response = http.request('GET', 'https://httpbin.org/json')
current, peak = tracemalloc.get_traced_memory()
print(f"urllib3 - Current: {current / 1024 / 1024:.2f} MB, Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop()

# Memory profiling with requests
tracemalloc.start()
for i in range(100):
    response = requests.get('https://httpbin.org/json')
current, peak = tracemalloc.get_traced_memory()
print(f"requests - Current: {current / 1024 / 1024:.2f} MB, Peak: {peak / 1024 / 1024:.2f} MB")
tracemalloc.stop()
Connection Pooling Performance
urllib3 Connection Pooling
urllib3 provides explicit control over connection pooling, which significantly improves performance for multiple requests:
import urllib3

# Efficient connection pooling with urllib3
http = urllib3.PoolManager(
    num_pools=10,
    maxsize=10,
    block=True,
    retries=urllib3.Retry(total=3, backoff_factor=0.1)
)

# Reuse connections across multiple requests
urls = ['https://httpbin.org/json', 'https://httpbin.org/ip', 'https://httpbin.org/headers']
for url in urls:
    response = http.request('GET', url)
    print(f"Status: {response.status}")
requests Session Performance
requests supports connection pooling through Session objects, but the configuration is less explicit and carries the overhead of its higher-level abstraction:
import requests

# Connection pooling with a requests session
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=10,
    pool_maxsize=10,
    max_retries=3
)
session.mount('http://', adapter)
session.mount('https://', adapter)

urls = ['https://httpbin.org/json', 'https://httpbin.org/ip', 'https://httpbin.org/headers']
for url in urls:
    response = session.get(url)
    print(f"Status: {response.status_code}")
Concurrent Request Performance
urllib3 with Threading
urllib3's PoolManager is thread-safe, so a single instance can be shared across worker threads, which keeps connections pooled even under concurrency:
import concurrent.futures
import urllib3

# Share one thread-safe PoolManager across all worker threads
http = urllib3.PoolManager(maxsize=10)

def make_request(url):
    response = http.request('GET', url)
    return response.status

# Concurrent requests with urllib3
urls = ['https://httpbin.org/delay/1'] * 10
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(make_request, url) for url in urls]
    results = [future.result() for future in concurrent.futures.as_completed(futures)]
print(f"Completed {len(results)} requests")
Async Support Comparison
Neither urllib3 nor requests is async-native. For asynchronous workloads, a dedicated async client such as aiohttp is usually a better fit:
import asyncio
import aiohttp
import urllib3

async def async_request_aiohttp():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://httpbin.org/json') as response:
            return await response.json()

# urllib3 doesn't natively support async, but it can be offloaded to a thread pool
async def async_request_urllib3():
    loop = asyncio.get_running_loop()
    http = urllib3.PoolManager()
    response = await loop.run_in_executor(None, http.request, 'GET', 'https://httpbin.org/json')
    return response.data
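As a usage sketch (assuming the two coroutines defined above), both can be awaited from one event loop:
async def main():
    aiohttp_payload = await async_request_aiohttp()   # parsed JSON dict
    urllib3_payload = await async_request_urllib3()   # raw response bytes
    print(f"aiohttp keys: {list(aiohttp_payload.keys())}")
    print(f"urllib3 bytes: {len(urllib3_payload)}")

asyncio.run(main())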
Real-World Benchmarks
High-Volume Request Scenarios
For applications making thousands of requests, the performance difference becomes significant:
import time
import urllib3
import requests

def benchmark_libraries(num_requests=1000):
    # urllib3 benchmark
    http = urllib3.PoolManager()
    start_time = time.time()
    for i in range(num_requests):
        response = http.request('GET', 'https://httpbin.org/json')
    urllib3_time = time.time() - start_time

    # requests benchmark
    session = requests.Session()
    start_time = time.time()
    for i in range(num_requests):
        response = session.get('https://httpbin.org/json')
    requests_time = time.time() - start_time

    print(f"urllib3: {urllib3_time:.2f}s")
    print(f"requests: {requests_time:.2f}s")
    print(f"Performance improvement: {((requests_time - urllib3_time) / requests_time * 100):.1f}%")

benchmark_libraries()
When to Choose Each Library
Choose urllib3 when:
- Performance is critical: High-volume applications, real-time systems
- Memory constraints exist: Resource-limited environments
- Fine-grained control needed: Custom connection pooling, advanced retry logic
- Building library foundations: Creating higher-level HTTP abstractions
Choose requests when:
- Development speed matters: Rapid prototyping, simple integrations
- Team familiarity: Most developers know requests syntax
- Feature richness required: Built-in JSON handling, authentication helpers (compare the basic-auth sketch after this list)
- Maintenance simplicity: Less configuration, more defaults
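As an illustration of that convenience gap, here is the same basic-auth request in both libraries; this is a hedged sketch, and httpbin's /basic-auth endpoint is used purely as an example:
import urllib3
import requests

# requests: authentication helper built in
r = requests.get('https://httpbin.org/basic-auth/user/passwd', auth=('user', 'passwd'))
print(r.status_code, r.json())

# urllib3: build the Authorization header yourself
http = urllib3.PoolManager()
headers = urllib3.util.make_headers(basic_auth='user:passwd')
resp = http.request('GET', 'https://httpbin.org/basic-auth/user/passwd', headers=headers)
print(resp.status, resp.data)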
Optimization Strategies
urllib3 Performance Tuning
import urllib3

# Optimized urllib3 configuration
http = urllib3.PoolManager(
    num_pools=50,   # Increase pool count for many hosts
    maxsize=20,     # More connections per pool
    block=True,     # Block when pool is full
    timeout=urllib3.Timeout(
        connect=5.0,
        read=10.0
    ),
    retries=urllib3.Retry(
        total=3,
        backoff_factor=0.1,
        status_forcelist=[500, 502, 503, 504]
    )
)
requests Performance Tuning
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Optimized requests session
session = requests.Session()
retry_strategy = Retry(
    total=3,
    backoff_factor=0.1,
    status_forcelist=[429, 500, 502, 503, 504]
)
adapter = HTTPAdapter(
    pool_connections=50,
    pool_maxsize=20,
    max_retries=retry_strategy
)
session.mount("http://", adapter)
session.mount("https://", adapter)
Performance Monitoring
Measuring Request Performance
import time
import urllib3
from contextlib import contextmanager

@contextmanager
def timer():
    start = time.perf_counter()
    yield
    end = time.perf_counter()
    print(f"Execution time: {end - start:.4f} seconds")

# Monitor urllib3 performance
http = urllib3.PoolManager()
with timer():
    response = http.request('GET', 'https://httpbin.org/json')
    print(f"Status: {response.status}, Size: {len(response.data)} bytes")
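The same context manager can wrap a requests call for a side-by-side reading (a small sketch reusing the timer defined above):
import requests

with timer():
    r = requests.get('https://httpbin.org/json')
    print(f"Status: {r.status_code}, Size: {len(r.content)} bytes")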
JavaScript Alternative: Node.js HTTP Libraries
While Python dominates the web scraping space, Node.js offers comparable libraries:
// Node.js with axios (similar to requests)
const axios = require('axios');

async function fetchWithAxios() {
  const start = Date.now();
  const response = await axios.get('https://httpbin.org/json');
  const end = Date.now();
  console.log(`Axios: ${end - start}ms`);
  return response.data;
}

// Node.js with the native https module (similar to urllib3)
const https = require('https');

function fetchWithNative(url) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    https.get(url, (response) => {
      let data = '';
      response.on('data', chunk => data += chunk);
      response.on('end', () => {
        const end = Date.now();
        console.log(`Native HTTP: ${end - start}ms`);
        resolve(JSON.parse(data));
      });
    }).on('error', reject);
  });
}
Best Practices for High-Performance Applications
- Use connection pooling: Always reuse connections for multiple requests
- Configure appropriate timeouts: Prevent hanging requests from degrading performance (see the per-request sketch after this list)
- Implement proper retry logic: Handle transient failures efficiently
- Monitor memory usage: Especially important for long-running applications
- Consider async alternatives: For I/O-bound applications with high concurrency needs
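For the timeout recommendation, both libraries also accept per-request timeouts; a minimal sketch with illustrative (not tuned) values:
import urllib3
import requests

# urllib3: per-request Timeout object (connect/read in seconds)
http = urllib3.PoolManager()
resp = http.request('GET', 'https://httpbin.org/delay/1',
                    timeout=urllib3.Timeout(connect=2.0, read=5.0))
print(resp.status)

# requests: (connect, read) tuple on the call itself
r = requests.get('https://httpbin.org/delay/1', timeout=(2.0, 5.0))
print(r.status_code)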
Command Line Testing
Test performance differences using command-line tools:
# Install both libraries
pip install urllib3 requests

# Run performance comparison script
python -c "
import time
import urllib3
import requests

# Quick urllib3 test
http = urllib3.PoolManager()
start = time.time()
for i in range(100):
    http.request('GET', 'https://httpbin.org/json')
print(f'urllib3: {time.time() - start:.2f}s')

# Quick requests test
session = requests.Session()
start = time.time()
for i in range(100):
    session.get('https://httpbin.org/json')
print(f'requests: {time.time() - start:.2f}s')
"
When building web scraping applications or API integrations that require optimal performance, urllib3 provides the speed and efficiency advantages necessary for production workloads. However, for most development scenarios where ease of use is prioritized over raw performance, requests remains an excellent choice.
The key is understanding your application's specific requirements and choosing the library that best balances performance needs with development efficiency.