How do I add headers to my requests in urllib3?

Adding headers to HTTP requests in urllib3 is straightforward: pass a dictionary to the headers argument of the request() method. Headers matter for web scraping, API authentication, and content negotiation.

Quick Example

import urllib3

http = urllib3.PoolManager()
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Python urllib3)'}
response = http.request('GET', 'https://example.com', headers=headers)

Basic Setup

Installation and Import

# Install urllib3 if needed
# pip install urllib3

import urllib3

Creating Headers Dictionary

Headers are passed as a Python dictionary where keys are header names and values are header values:

headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

Common Header Examples

Web Scraping Headers

import urllib3

http = urllib3.PoolManager()

# Common web scraping headers
scraping_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://google.com'
}

response = http.request('GET', 'https://example.com', headers=scraping_headers)
print(response.status)

API Authentication

# Bearer token authentication
api_headers = {
    'Authorization': 'Bearer your-api-token-here',
    'Content-Type': 'application/json',
    'Accept': 'application/json'
}

# API key authentication
api_key_headers = {
    'X-API-Key': 'your-api-key-here',
    'User-Agent': 'MyApp/1.0'
}

response = http.request('GET', 'https://api.example.com/data', headers=api_headers)
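
For HTTP Basic authentication, urllib3 ships a helper, urllib3.util.make_headers, that builds the Authorization value for you. The following is a minimal sketch; the credentials and app name are placeholders, and fetch_with_basic_auth is a hypothetical wrapper, not a urllib3 function.

```python
import urllib3
from urllib3.util import make_headers

# Placeholder credentials in "username:password" form
basic_headers = make_headers(basic_auth='alice:s3cret', user_agent='MyApp/1.0')

# make_headers returns a plain dict, so further headers can be added as usual
basic_headers['Accept'] = 'application/json'


def fetch_with_basic_auth(url):
    """Send a GET request using the Basic-auth headers built above."""
    http = urllib3.PoolManager()
    return http.request('GET', url, headers=basic_headers)
```

Note that make_headers uses lowercase header names ('authorization', 'user-agent'). HTTP header names are case-insensitive, so this makes no difference to the server.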

Custom Headers for Different Methods

import urllib3
import json

http = urllib3.PoolManager()

# GET request with headers
get_headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json'
}
get_response = http.request('GET', 'https://api.example.com/users', headers=get_headers)

# POST request with JSON data
post_headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'User-Agent': 'MyApp/1.0'
}
# Encode the JSON string to bytes before sending it as the request body
post_data = json.dumps({'name': 'John', 'email': 'john@example.com'}).encode('utf-8')
post_response = http.request('POST', 'https://api.example.com/users',
                             headers=post_headers, body=post_data)
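
When several endpoints take JSON, it can help to build the body and the matching Content-Type header in one place. This is a sketch only: json_request_kwargs is a hypothetical helper, not part of urllib3, and the URL in the usage comment is the placeholder from the example above.

```python
import json


def json_request_kwargs(payload, extra_headers=None):
    """Build the body and headers keyword arguments for a JSON request.

    Hypothetical helper, not part of urllib3.
    """
    headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
    }
    # Caller-supplied headers override the defaults above
    headers.update(extra_headers or {})
    # Encode explicitly so the bytes on the wire are UTF-8
    body = json.dumps(payload).encode('utf-8')
    return {'body': body, 'headers': headers}


kwargs = json_request_kwargs({'name': 'John'}, {'User-Agent': 'MyApp/1.0'})
# Usage, where http is a urllib3.PoolManager:
# http.request('POST', 'https://api.example.com/users', **kwargs)
```

If you are on urllib3 2.x, request() also accepts a json= argument that serializes the payload and sets Content-Type when it is missing, which may make a helper like this unnecessary.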

Advanced Usage

Multiple Requests with Same Headers

import urllib3

http = urllib3.PoolManager()

# Define headers once for multiple requests
common_headers = {
    'User-Agent': 'MyBot/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer your-token'
}

urls = ['https://api.example.com/users', 'https://api.example.com/posts']

for url in urls:
    response = http.request('GET', url, headers=common_headers)
    print(f"Status: {response.status}, URL: {url}")
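
Instead of passing the same dictionary on every call, you can give PoolManager a headers argument; those headers are then used for any request that does not supply its own. A minimal sketch follows; the X-Request-Source header and the fetch function are made up for illustration.

```python
import urllib3

default_headers = {
    'User-Agent': 'MyBot/1.0',
    'Accept': 'application/json',
}

# Headers given to PoolManager are sent with every request
# that does not pass its own headers argument
http = urllib3.PoolManager(headers=default_headers)

# Per-request headers replace the pool defaults rather than merging with them,
# so merge explicitly when one call needs an extra header
merged = {**http.headers, 'X-Request-Source': 'nightly-job'}


def fetch(url):
    """GET with the pool defaults plus the extra (hypothetical) header."""
    return http.request('GET', url, headers=merged)
```

The replace-not-merge behavior is easy to miss: calling http.request(..., headers={'X-Request-Source': 'nightly-job'}) alone would drop the default User-Agent and Accept values.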

Dynamic Headers

import urllib3
import os

http = urllib3.PoolManager()

# Headers with environment variables
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}",
    'Accept': 'application/json'
}

# Add conditional headers
if os.getenv('DEBUG'):
    headers['X-Debug'] = 'true'

response = http.request('GET', 'https://api.example.com/data', headers=headers)
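
One caveat with the pattern above: os.getenv returns None when the variable is unset, so the header would be sent as the literal string "Bearer None". A small builder function can fail early instead. This is a sketch; build_headers is a hypothetical helper, and API_TOKEN and DEBUG are the variable names from the example above.

```python
import os


def build_headers(env=None):
    """Build request headers from environment-style settings.

    Hypothetical helper; env defaults to os.environ but any mapping works.
    """
    env = os.environ if env is None else env
    headers = {'User-Agent': 'MyApp/1.0', 'Accept': 'application/json'}

    token = env.get('API_TOKEN')
    if not token:
        # Fail early instead of sending the literal string "Bearer None"
        raise RuntimeError('API_TOKEN is not set')
    headers['Authorization'] = f'Bearer {token}'

    # Add the debug header only when DEBUG is set to a non-empty value
    if env.get('DEBUG'):
        headers['X-Debug'] = 'true'
    return headers


# A plain dict stands in for os.environ here
example = build_headers({'API_TOKEN': 'example-token', 'DEBUG': '1'})
```

Taking the environment as a parameter also makes the function easy to test without touching real environment variables.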

Error Handling and Security

Proper Exception Handling

import urllib3
from urllib3.exceptions import MaxRetryError, TimeoutError

http = urllib3.PoolManager()
headers = {'User-Agent': 'MyApp/1.0'}

try:
    response = http.request('GET', 'https://example.com', 
                          headers=headers, timeout=10)
    print(f"Success: {response.status}")
    print(response.data.decode('utf-8'))
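except MaxRetryError as e:
    # Raised once the retries are exhausted (connection errors, repeated timeouts)
    print(f"Connection failed: {e}")
except TimeoutError as e:
    # urllib3's TimeoutError (imported above), typically seen when retries are disabled
    print(f"Request timed out: {e}")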
except Exception as e:
    print(f"Unexpected error: {e}")
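
The try block above only catches transport-level failures. urllib3 does not raise an exception for 4xx or 5xx responses by default, so a rejected Authorization header (for example, a 401) still comes back as a normal response. A sketch of an explicit status check follows; HTTPStatusError and ensure_success are hypothetical names, not part of urllib3.

```python
from types import SimpleNamespace


class HTTPStatusError(Exception):
    """Hypothetical exception for non-2xx responses (not part of urllib3)."""


def ensure_success(response):
    """Return the response if its status is 2xx, otherwise raise."""
    if not 200 <= response.status < 300:
        raise HTTPStatusError(f'Unexpected HTTP status: {response.status}')
    return response


# A stand-in object with a .status attribute keeps the sketch runnable offline;
# in real code, pass the result of http.request(...)
ok = ensure_success(SimpleNamespace(status=200))
```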

Secure Header Management

import urllib3
import os

# Use environment variables for sensitive data
headers = {
    'User-Agent': 'MyApp/1.0',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}",  # From environment
    'Accept': 'application/json'
}

# Don't hardcode sensitive information
# BAD: 'Authorization': 'Bearer abc123token456'
# GOOD: 'Authorization': f"Bearer {os.getenv('API_TOKEN')}"
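
Redirects are another place where secrets can leak. urllib3's Retry class takes a remove_headers_on_redirect argument listing headers to drop when a redirect leads to a different host. Authorization is in the default set, but a custom header is not. A sketch, assuming the X-API-Key header name from the earlier API key example:

```python
import urllib3
from urllib3.util.retry import Retry

# Extend the default set so the custom API key header is also stripped
sensitive = set(Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT) | {'X-API-Key'}
retries = Retry(total=3, remove_headers_on_redirect=sensitive)

# The retry policy applies to every request made through this pool
http = urllib3.PoolManager(retries=retries)
```

The exact contents of the default set vary between urllib3 releases, which is why the sketch extends it rather than replacing it.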

Best Practices

  1. Always include User-Agent: Many servers block requests without proper User-Agent headers
  2. Use environment variables: Store API keys and tokens securely
  3. Handle exceptions: Wrap requests in try/except blocks
  4. Verify SSL certificates: Use proper SSL verification for production
  5. Rate limiting: Respect server rate limits and add delays if needed

import urllib3
import os

# Recommended production setup
# urllib3 verifies certificates by default; cert_reqs makes that explicit.
# With no ca_certs argument, the system's default CA store is used.
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED')

headers = {
    'User-Agent': 'MyApp/1.0 (contact@example.com)',
    'Accept': 'application/json',
    'Authorization': f"Bearer {os.getenv('API_TOKEN')}"
}

try:
    response = http.request('GET', 'https://api.example.com/data', 
                          headers=headers, timeout=30)
    if response.status == 200:
        data = response.data.decode('utf-8')
        print(data)
    else:
        print(f"Request failed with status: {response.status}")
except Exception as e:
    print(f"Error: {e}")
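
The rate-limiting advice from the list above can be sketched as a small loop that pauses between requests. fetch_politely is a hypothetical helper, and the one-second default delay is an arbitrary assumption; pick a value that matches the target server's published limits.

```python
import time


def fetch_politely(http, urls, headers, delay=1.0, sleep=time.sleep):
    """GET each URL in turn, pausing between requests.

    Hypothetical helper: http is a urllib3.PoolManager (or anything with a
    compatible request() method); delay is the pause in seconds.
    """
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        responses.append(http.request('GET', url, headers=headers))
    return responses
```

The sleep function is a parameter only so the pause can be swapped out in tests; in normal use the default time.sleep is what you want.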

This approach ensures your urllib3 requests include the necessary headers for successful web scraping and API interactions while maintaining security best practices.
