Persistent connections in urllib3 allow you to reuse HTTP connections across multiple requests, significantly improving performance by avoiding the overhead of establishing new connections. This is achieved through connection pooling, which is enabled by default in urllib3.
Understanding Connection Pooling
Connection pooling maintains a pool of open connections that can be reused for subsequent requests to the same host. This reduces:
- Connection establishment time
- TCP handshake overhead
- SSL/TLS negotiation time (for HTTPS)
- Overall latency
Basic Usage with PoolManager
Creating a PoolManager
The PoolManager class handles connection pooling automatically:
import urllib3
# Create a PoolManager instance
http = urllib3.PoolManager()
Making Requests
Once created, the PoolManager reuses connections automatically:
# First request - establishes connection
response1 = http.request('GET', 'https://httpbin.org/ip')
print(f"Response 1: {response1.data.decode()}")
# Second request - reuses existing connection
response2 = http.request('GET', 'https://httpbin.org/user-agent')
print(f"Response 2: {response2.data.decode()}")
# Third request to same host - connection reused again
response3 = http.request('POST', 'https://httpbin.org/post',
                         fields={'key': 'value'})
print(f"Response 3 status: {response3.status}")
Advanced Configuration
Customizing Pool Parameters
You can fine-tune the connection pool behavior:
import urllib3
from urllib3.util.retry import Retry
# Advanced PoolManager configuration
http = urllib3.PoolManager(
    num_pools=10,   # Number of host pools to keep cached
    maxsize=20,     # Max connections kept open per pool
    block=False,    # When the pool is exhausted, open extra connections
                    # instead of blocking (extras are discarded after use)
    retries=Retry(
        total=3,                               # Total retry attempts
        backoff_factor=0.3,                    # Exponential backoff factor between retries
        status_forcelist=[500, 502, 503, 504]  # HTTP status codes to retry
    ),
    timeout=urllib3.Timeout(
        connect=5.0,  # Connection timeout in seconds
        read=30.0     # Read timeout in seconds
    )
)
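These settings become defaults for every request the manager makes, and individual calls can override them. A minimal sketch of per-request overrides, using httpbin.org as a placeholder endpoint:
# Per-request settings take precedence over the PoolManager defaults
response = http.request(
    'GET',
    'https://httpbin.org/delay/1',
    timeout=urllib3.Timeout(connect=2.0, read=5.0),  # tighter timeout for this call only
    retries=False                                    # disable retries for this call only
)
print(response.status)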
Working with Multiple Hosts
import urllib3
import time
http = urllib3.PoolManager(num_pools=3, maxsize=5)
# Requests to different hosts - each gets its own pool
hosts = [
'https://httpbin.org',
'https://jsonplaceholder.typicode.com',
'https://api.github.com'
]
for host in hosts:
    start_time = time.time()
    # First request to each host
    response = http.request('GET', f'{host}/headers' if 'httpbin' in host else host)
    first_request_time = time.time() - start_time

    # Second request to same host (should be faster due to connection reuse)
    start_time = time.time()
    response = http.request('GET', f'{host}/ip' if 'httpbin' in host else host)
    second_request_time = time.time() - start_time

    print(f"Host: {host}")
    print(f"First request: {first_request_time:.3f}s")
    print(f"Second request: {second_request_time:.3f}s")
    print(f"Speed improvement: {((first_request_time - second_request_time) / first_request_time * 100):.1f}%\n")
HTTPConnectionPool for Single Hosts
For applications that primarily communicate with a single host, use HTTPConnectionPool (or its TLS counterpart, HTTPSConnectionPool) directly:
import urllib3
# Create a pool for a specific host; use HTTPSConnectionPool here because
# httpbin.org is served over TLS (HTTPConnectionPool on port 443 would
# attempt plain HTTP and fail)
pool = urllib3.HTTPSConnectionPool('httpbin.org', port=443,
                                   maxsize=10, block=True)
# Make requests using the pool
response = pool.request('GET', '/json')
print(response.data.decode())
# Clean up
pool.close()
Connection Pool Management
Monitoring Pool Status
import urllib3
http = urllib3.PoolManager(maxsize=5)
# Make some requests
for i in range(10):
    response = http.request('GET', f'https://httpbin.org/delay/{i % 3}')

# Check pool statistics. PoolManager.pools is an internal container that
# disallows direct iteration, so go through its keys() method; these
# attributes may change between urllib3 versions.
for pool_key in http.pools.keys():
    pool = http.pools[pool_key]
    print(f"Pool {pool_key}:")
    print(f"  Pool size: {pool.pool.qsize()}")  # idle connections plus unused slots
    print(f"  Pool maxsize: {pool.maxsize}")
Proper Cleanup
Always clean up resources when done:
import urllib3
import atexit
http = urllib3.PoolManager()
# Register cleanup function
def cleanup():
    http.clear()
    print("Connection pools cleared")

atexit.register(cleanup)
# Your application code here
response = http.request('GET', 'https://httpbin.org/get')
Error Handling with Persistent Connections
import urllib3
from urllib3.exceptions import MaxRetryError, NewConnectionError, TimeoutError
http = urllib3.PoolManager(
    retries=urllib3.Retry(total=3, backoff_factor=0.3),
    timeout=urllib3.Timeout(connect=5.0, read=10.0)
)
try:
    response = http.request('GET', 'https://httpbin.org/delay/2')
    print(f"Status: {response.status}")
    print(f"Data: {response.data.decode()}")
except MaxRetryError as e:
    # With retries enabled, most connection and timeout failures surface
    # here, wrapped in MaxRetryError once the retry budget is exhausted
    print(f"Max retries exceeded: {e}")
except NewConnectionError as e:
    print(f"Connection failed: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
Performance Comparison
Here's a practical example showing the performance benefits:
import urllib3
import time
import requests # For comparison
def test_urllib3_with_pooling():
    http = urllib3.PoolManager()
    start = time.time()
    for i in range(10):
        response = http.request('GET', 'https://httpbin.org/uuid')
    return time.time() - start

def test_requests_without_session():
    start = time.time()
    for i in range(10):
        response = requests.get('https://httpbin.org/uuid')
    return time.time() - start
# Run tests
urllib3_time = test_urllib3_with_pooling()
requests_time = test_requests_without_session()
print(f"urllib3 with pooling: {urllib3_time:.2f}s")
print(f"requests without session: {requests_time:.2f}s")
print(f"Performance improvement: {((requests_time - urllib3_time) / requests_time * 100):.1f}%")
Best Practices
- Reuse PoolManager instances: Create one PoolManager and reuse it throughout your application
- Configure appropriate pool sizes: Set maxsize based on your concurrent request needs
- Handle timeouts: Always set reasonable connection and read timeouts
- Implement retry logic: Use urllib3.Retry for robust error handling
- Clean up resources: Call clear() when shutting down your application
- Monitor pool usage: Keep track of pool statistics in production applications
- Use context managers: Consider wrapping PoolManager usage in context managers for automatic cleanup (see the sketch below)
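On the last point: PoolManager implements the context manager protocol and clears its pools on exit, which makes cleanup automatic. A minimal sketch, again using httpbin.org as a placeholder endpoint:
import urllib3

# __exit__ calls clear(), so pooled connections are released automatically
with urllib3.PoolManager() as http:
    response = http.request('GET', 'https://httpbin.org/get')
    print(response.status)
# Pools are cleared here, even if the block raised an exception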
By leveraging persistent connections in urllib3, you can significantly improve the performance of your HTTP-based applications while maintaining clean, maintainable code.