Adding headers to HTTP requests in urllib3 is straightforward: pass a dictionary of headers to the request() method. Headers are essential for web scraping, API authentication, and controlling request behavior.
Quick Example
import urllib3
http = urllib3.PoolManager()
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Python urllib3)'}
response = http.request('GET', 'https://example.com', headers=headers)
Basic Setup
Installation and Import
# Install urllib3 if needed
# pip install urllib3
import urllib3
Creating Headers Dictionary
Headers are passed as a Python dictionary where keys are header names and values are header values:
headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
Common Header Examples
Web Scraping Headers
import urllib3
http = urllib3.PoolManager()
# Common web scraping headers
scraping_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://google.com'
}
response = http.request('GET', 'https://example.com', headers=scraping_headers)
print(response.status)
API Authentication
# Bearer token authentication
api_headers = {
    'Authorization': 'Bearer your-api-token-here',
    'Content-Type': 'application/json',
    'Accept': 'application/json'
}
# API key authentication
api_key_headers = {
    'X-API-Key': 'your-api-key-here',
    'User-Agent': 'MyApp/1.0'
}
response = http.request('GET', 'https://api.example.com/data', headers=api_headers)
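For HTTP Basic authentication, which the examples above do not cover, urllib3 provides a make_headers helper that builds the encoded Authorization header. The following is a minimal sketch, not a definitive setup: the credentials user:pass and the fetch_with_basic_auth function are placeholders introduced here for illustration.

```python
import urllib3

# Placeholder credentials in "username:password" form (assumption for illustration).
basic_headers = urllib3.make_headers(
    basic_auth='user:pass',
    user_agent='MyApp/1.0',
)


def fetch_with_basic_auth(url):
    """Send a GET request with the Basic auth headers built above (hypothetical helper)."""
    http = urllib3.PoolManager()
    return http.request('GET', url, headers=basic_headers)
```

make_headers returns lowercase header names such as 'authorization'. HTTP header names are case-insensitive, so servers treat them the same as the capitalized form.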
Custom Headers for Different Methods
import urllib3
import json
http = urllib3.PoolManager()
# GET request with headers
get_headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json'
}
get_response = http.request('GET', 'https://api.example.com/users', headers=get_headers)
# POST request with JSON data
post_headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'User-Agent': 'MyApp/1.0'
}
# Encode the JSON string as UTF-8 bytes so non-ASCII characters are sent correctly
post_data = json.dumps({'name': 'John', 'email': 'john@example.com'}).encode('utf-8')
post_response = http.request('POST', 'https://api.example.com/users',
                             headers=post_headers, body=post_data)
Advanced Usage
Multiple Requests with Same Headers
import urllib3
http = urllib3.PoolManager()
# Define headers once for multiple requests
common_headers = {
    'User-Agent': 'MyBot/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer your-token'
}
urls = ['https://api.example.com/users', 'https://api.example.com/posts']
for url in urls:
    response = http.request('GET', url, headers=common_headers)
    print(f"Status: {response.status}, URL: {url}")
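An alternative to passing the same dictionary on every call is to set default headers on the PoolManager itself through its headers argument. The sketch below assumes that approach; DEFAULT_HEADERS, merged_headers, and fetch are hypothetical names chosen for illustration. One caveat worth knowing: headers passed to request() replace the manager's defaults rather than merging with them, so the helper merges explicitly.

```python
import urllib3

# Default headers sent with every request made through this manager.
DEFAULT_HEADERS = {
    'User-Agent': 'MyBot/1.0',
    'Accept': 'application/json',
}
http = urllib3.PoolManager(headers=DEFAULT_HEADERS)


def merged_headers(extra):
    """Return the defaults plus per-request extras (extras win on conflict)."""
    return {**DEFAULT_HEADERS, **extra}


def fetch(url, extra=None):
    """GET a URL with the defaults, optionally adding per-request headers."""
    if extra:
        # headers= overrides the manager defaults, so merge them first.
        return http.request('GET', url, headers=merged_headers(extra))
    # No headers argument: the manager's defaults are used as-is.
    return http.request('GET', url)
```

Calling fetch(url) sends only the defaults, while fetch(url, {'X-Debug': 'true'}) sends the defaults plus the extra header.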
Dynamic Headers
import urllib3
import os
http = urllib3.PoolManager()
# Headers with environment variables
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}",
    'Accept': 'application/json'
}
# Add conditional headers
if os.getenv('DEBUG'):
    headers['X-Debug'] = 'true'
response = http.request('GET', 'https://api.example.com/data', headers=headers)
Error Handling and Security
Proper Exception Handling
import urllib3
from urllib3.exceptions import MaxRetryError, TimeoutError
http = urllib3.PoolManager()
headers = {'User-Agent': 'MyApp/1.0'}
try:
    response = http.request('GET', 'https://example.com',
                            headers=headers, timeout=10)
    print(f"Success: {response.status}")
    print(response.data.decode('utf-8'))
except MaxRetryError as e:
    print(f"Connection failed: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Secure Header Management
import urllib3
import os
# Use environment variables for sensitive data
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}",  # From environment
    'Accept': 'application/json'
}
# Don't hardcode sensitive information
# BAD: 'Authorization': 'Bearer abc123token456'
# GOOD: 'Authorization': f"Bearer {os.getenv('API_TOKEN')}"
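One gap in reading the token with os.getenv directly is that an unset variable yields None, which would produce the literal header value "Bearer None". A minimal sketch of a guard against that follows; build_auth_headers is a hypothetical helper, and API_TOKEN is the variable name used in the examples above.

```python
import os


def build_auth_headers(env_var='API_TOKEN'):
    """Build request headers, failing loudly if the token is not set."""
    token = os.getenv(env_var)
    if not token:
        # Without this check the header would literally read "Bearer None".
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {
        'User-Agent': 'MyApp/1.0',
        'Authorization': f"Bearer {token}",
        'Accept': 'application/json',
    }
```

Failing at startup is usually easier to diagnose than a 401 response caused by a malformed token.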
Best Practices
- Always include User-Agent: Many servers block requests without proper User-Agent headers
- Use environment variables: Store API keys and tokens securely
- Handle exceptions: Wrap requests in try/except blocks
- Verify SSL certificates: Use proper SSL verification for production
- Rate limiting: Respect server rate limits and add delays if needed
import urllib3
import os
# Recommended production setup
# cert_reqs='CERT_REQUIRED' enforces certificate verification; with no ca_certs
# given, urllib3 falls back to the system's default CA certificates
http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED'
)
headers = {
    'User-Agent': 'MyApp/1.0 (contact@example.com)',
    'Accept': 'application/json',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}"
}
try:
    response = http.request('GET', 'https://api.example.com/data',
                            headers=headers, timeout=30)
    if response.status == 200:
        data = response.data.decode('utf-8')
        print(data)
    else:
        print(f"Request failed with status: {response.status}")
except Exception as e:
    print(f"Error: {e}")
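The rate-limiting advice in the list above can be handled partly by urllib3's own Retry class, which retries selected status codes with an increasing delay between attempts. The sketch below is one possible configuration, not a recommendation from the original text: the retry count, backoff factor, and status list are assumed values to adjust per API.

```python
import urllib3
from urllib3.util import Retry

# Assumed values for illustration: retry up to 3 times, backing off between
# attempts, when the server answers with 429 or a transient 5xx status.
retry_policy = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[429, 500, 502, 503, 504],
)

# Every request made through this manager uses the retry policy by default.
http = urllib3.PoolManager(retries=retry_policy)
```

Requests made through this manager accept the same headers argument as in the earlier examples. By default, Retry also honors a Retry-After header sent by the server on responses such as 429.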
This approach ensures your urllib3 requests include the necessary headers for successful web scraping and API interactions while maintaining security best practices.