How do I Handle Custom HTTP Headers in Selenium Requests?
Custom HTTP headers are essential for web scraping as they allow you to mimic real browser behavior, authenticate with APIs, and bypass certain restrictions. Unlike direct HTTP libraries, Selenium WebDriver doesn't provide a straightforward method to set custom headers since it operates at the browser level. However, there are several effective approaches to accomplish this.
Understanding the Challenge
Selenium WebDriver controls browsers through the WebDriver protocol, which doesn't directly expose HTTP header manipulation. This limitation exists because Selenium is primarily designed for testing web applications rather than low-level HTTP operations. However, modern browsers provide developer tools protocols that can be leveraged to set custom headers.
Method 1: Chrome DevTools Protocol (CDP)
The most direct way to set custom HTTP headers in Selenium is the Chrome DevTools Protocol, which ChromeDriver exposes. It gives direct access to Chrome's networking layer, but it works only with Chromium-based browsers.
Python Implementation
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def setup_chrome_with_headers(headers):
    """Start Chrome and attach the given headers to every request via CDP."""
    # Configure Chrome options
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Optional: run in headless mode
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")

    # Initialize the driver
    driver = webdriver.Chrome(options=chrome_options)

    # Enable Network domain for CDP
    driver.execute_cdp_cmd("Network.enable", {})

    # The User-Agent has its own CDP command, so handle it separately
    extra_headers = dict(headers)
    user_agent = extra_headers.pop("User-Agent", None)
    if user_agent:
        driver.execute_cdp_cmd("Network.setUserAgentOverride", {
            "userAgent": user_agent
        })

    # Send the remaining headers with every request. This is used instead of
    # request interception because execute_cdp_cmd cannot subscribe to CDP
    # events, so a paused request could never be resumed from Python.
    driver.execute_cdp_cmd("Network.setExtraHTTPHeaders", {
        "headers": extra_headers
    })
    return driver


# Usage example
custom_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Authorization": "Bearer your-token-here",
    "X-Custom-Header": "custom-value"
}

driver = setup_chrome_with_headers(custom_headers)
try:
    driver.get("https://httpbin.org/headers")
    # Your scraping logic here
    response = driver.page_source
    print(response)
finally:
    driver.quit()
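To make the mapping from a header dictionary to CDP commands explicit, here is a minimal sketch that only builds the command payloads. The helper name `build_cdp_commands` is hypothetical; it assumes the User-Agent goes through `Network.setUserAgentOverride` and everything else through `Network.setExtraHTTPHeaders`, and it leaves the actual `driver.execute_cdp_cmd` calls to the caller.

```python
def build_cdp_commands(headers):
    """Return a list of (cdp_command, params) pairs for the given headers."""
    extra = {}
    user_agent = None
    for name, value in headers.items():
        # Header names are case-insensitive, so match User-Agent loosely
        if name.lower() == "user-agent":
            user_agent = value
        else:
            extra[name] = str(value)

    commands = [("Network.enable", {})]
    if user_agent is not None:
        commands.append(("Network.setUserAgentOverride", {"userAgent": user_agent}))
    if extra:
        commands.append(("Network.setExtraHTTPHeaders", {"headers": extra}))
    return commands


if __name__ == "__main__":
    example = {"User-Agent": "Custom UA", "X-Custom-Header": "custom-value"}
    for command, params in build_cdp_commands(example):
        print(command, params)
```

Each pair can then be passed to `driver.execute_cdp_cmd(command, params)` in order.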
JavaScript (Node.js) Implementation
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function setupChromeWithHeaders() {
  const options = new chrome.Options();
  options.addArguments('--headless');
  options.addArguments('--no-sandbox');
  options.addArguments('--disable-dev-shm-usage');

  const driver = await new Builder()
    .forBrowser('chrome')
    .setChromeOptions(options)
    .build();

  // Enable Network domain
  await driver.sendDevToolsCommand('Network.enable', {});

  // Set custom headers
  const customHeaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Authorization': 'Bearer your-token-here',
    'X-Custom-Header': 'custom-value'
  };

  // Split off the User-Agent, which has its own CDP command
  const { 'User-Agent': userAgent, ...extraHeaders } = customHeaders;

  // Override User-Agent
  await driver.sendDevToolsCommand('Network.setUserAgentOverride', { userAgent });

  // Attach the remaining headers to every request
  await driver.sendDevToolsCommand('Network.setExtraHTTPHeaders', {
    headers: extraHeaders
  });

  return driver;
}

// Usage
(async () => {
  const driver = await setupChromeWithHeaders();
  try {
    await driver.get('https://httpbin.org/headers');
    const pageSource = await driver.getPageSource();
    console.log(pageSource);
  } finally {
    await driver.quit();
  }
})();
Method 2: Browser Extension Approach
Another approach involves creating a lightweight browser extension that modifies headers before requests are sent.
Creating a Header Modification Extension
First, create a manifest file for the extension:
{
  "manifest_version": 3,
  "name": "Header Modifier",
  "version": "1.0",
  "permissions": ["declarativeNetRequest"],
  "host_permissions": ["<all_urls>"],
  "background": {
    "service_worker": "background.js"
  }
}
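Background script (background.js):
chrome.declarativeNetRequest.updateDynamicRules({
  removeRuleIds: [1],
  addRules: [{
    id: 1,
    priority: 1,
    action: {
      type: "modifyHeaders",
      requestHeaders: [
        {
          header: "User-Agent",
          operation: "set",
          value: "Custom User Agent String"
        },
        {
          header: "Authorization",
          operation: "set",
          value: "Bearer your-token"
        }
      ]
    },
    condition: {
      urlFilter: "*",
      // Rules without resourceTypes skip top-level page loads (main_frame),
      // so list the types that should carry the headers
      resourceTypes: [
        "main_frame", "sub_frame", "xmlhttprequest", "script",
        "stylesheet", "image", "font", "media", "other"
      ]
    }
  }]
});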
Then load this extension in your Selenium script:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def setup_chrome_with_extension():
    chrome_options = Options()
    # Path to the packed extension file
    chrome_options.add_extension("/path/to/extension.crx")
    driver = webdriver.Chrome(options=chrome_options)
    return driver
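If the header values change between runs, the extension files can be generated instead of written by hand. The following is a minimal sketch under stated assumptions: the helper names `build_rule` and `build_extension_archive` are hypothetical, and it relies on ChromeDriver accepting a plain zip archive through `add_extension`, which you should verify against your Chrome version.

```python
import io
import json
import zipfile

# Resource types the rule applies to; "main_frame" is listed explicitly
# so that top-level page loads are covered as well.
RESOURCE_TYPES = ["main_frame", "sub_frame", "xmlhttprequest",
                  "script", "stylesheet", "image", "other"]


def build_rule(headers, rule_id=1):
    """Build one declarativeNetRequest rule that sets the given request headers."""
    return {
        "id": rule_id,
        "priority": 1,
        "action": {
            "type": "modifyHeaders",
            "requestHeaders": [
                {"header": name, "operation": "set", "value": value}
                for name, value in headers.items()
            ],
        },
        "condition": {"urlFilter": "*", "resourceTypes": RESOURCE_TYPES},
    }


def build_extension_archive(headers):
    """Return the bytes of a zip archive holding manifest.json and background.js."""
    rule = build_rule(headers)
    manifest = {
        "manifest_version": 3,
        "name": "Header Modifier",
        "version": "1.0",
        "permissions": ["declarativeNetRequest"],
        "host_permissions": ["<all_urls>"],
        "background": {"service_worker": "background.js"},
    }
    # A JSON object is also a valid JavaScript object literal
    background = (
        "chrome.declarativeNetRequest.updateDynamicRules({\n"
        f"  removeRuleIds: [{rule['id']}],\n"
        f"  addRules: [{json.dumps(rule)}]\n"
        "});\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        archive.writestr("background.js", background)
    return buffer.getvalue()
```

Writing the returned bytes to a file such as `header_modifier.zip` (a placeholder name) gives a path that can be passed to `chrome_options.add_extension(...)`.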
Method 3: Proxy-Based Header Injection
For more complex scenarios, you can use a proxy server that intercepts and modifies HTTP headers.
Using mitmproxy with Python
import subprocess
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# mitmproxy addon script: runs inside mitmdump and rewrites every request
PROXY_SCRIPT = '''\
from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    # Add custom headers
    flow.request.headers["Authorization"] = "Bearer your-token"
    flow.request.headers["X-Custom-Header"] = "custom-value"
    flow.request.headers["User-Agent"] = "Custom Selenium Bot"
'''


def start_mitm_proxy():
    """Start mitmproxy with custom script"""
    # Save script to file
    with open("proxy_script.py", "w") as f:
        f.write(PROXY_SCRIPT)

    # Start mitmproxy
    process = subprocess.Popen([
        "mitmdump",
        "-s", "proxy_script.py",
        "-p", "8080",
        "--set", "confdir=~/.mitmproxy"
    ])
    time.sleep(3)  # Wait for proxy to start
    return process


def setup_chrome_with_proxy():
    chrome_options = Options()
    chrome_options.add_argument("--proxy-server=http://localhost:8080")
    # mitmproxy re-signs HTTPS traffic with its own CA certificate;
    # ignoring certificate errors is acceptable for local scraping only
    chrome_options.add_argument("--ignore-certificate-errors")
    driver = webdriver.Chrome(options=chrome_options)
    return driver


# Usage
proxy_process = start_mitm_proxy()
driver = None
try:
    driver = setup_chrome_with_proxy()
    driver.get("https://httpbin.org/headers")
    # Your scraping logic
finally:
    # The driver may not exist if Chrome failed to start
    if driver is not None:
        driver.quit()
    proxy_process.terminate()
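A fixed `time.sleep(3)` is either too long or too short depending on the machine. One alternative, sketched here with a hypothetical helper named `wait_for_port`, is to poll the proxy's port until it accepts TCP connections. It only checks that something is listening, not that the listener is mitmproxy.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and retry
            time.sleep(interval)
    return False
```

In the proxy setup above, `wait_for_port("localhost", 8080)` could stand in for the sleep, with startup aborted if it returns `False`.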
Method 4: Using Selenium Wire
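Selenium Wire is a Python library that extends Selenium WebDriver with request and response inspection and modification. The project is no longer actively maintained, so check that it still works with your Selenium version before relying on it.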
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options


def interceptor(request):
    """Modify requests before they are sent"""
    request.headers['Authorization'] = 'Bearer your-token'
    request.headers['X-Custom-Header'] = 'custom-value'
    # Delete an existing header before setting it again;
    # otherwise the new value is added as a duplicate
    del request.headers['User-Agent']
    request.headers['User-Agent'] = 'Custom Selenium Wire Bot'


# Setup Chrome with Selenium Wire
chrome_options = Options()
chrome_options.add_argument("--headless")

# Optional upstream proxy settings. Replace the placeholders,
# or pass an empty dict to connect directly.
seleniumwire_options = {
    'proxy': {
        'http': 'http://username:password@host:port',
        'https': 'https://username:password@host:port',
    }
}

driver = webdriver.Chrome(
    options=chrome_options,
    seleniumwire_options=seleniumwire_options
)

# Set the request interceptor
driver.request_interceptor = interceptor

try:
    driver.get("https://httpbin.org/headers")
    # Access request/response details
    for request in driver.requests:
        if request.response:
            print(f"Status: {request.response.status_code}")
            print(f"Headers: {dict(request.headers)}")
finally:
    driver.quit()
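Selenium Wire's documentation advises deleting a header before setting it again, because its header container allows duplicate names. The sketch below illustrates that behaviour with the standard library's `email.message.Message` as a stand-in; treating it as equivalent to Selenium Wire's own header object is an assumption, and `replace_header` is a hypothetical helper name.

```python
from email.message import Message


def replace_header(headers, name, value):
    """Replace every existing occurrence of a header with a single new value."""
    del headers[name]      # removes all matches; no error if the header is absent
    headers[name] = value


if __name__ == "__main__":
    headers = Message()
    headers["User-Agent"] = "Browser default"
    headers["User-Agent"] = "Custom Selenium Wire Bot"  # appended, not replaced
    print(headers.get_all("User-Agent"))

    replace_header(headers, "User-Agent", "Custom Selenium Wire Bot")
    print(headers.get_all("User-Agent"))
```

Calling a helper like this inside an interceptor keeps a request from carrying two conflicting values for the same header.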
Advanced Header Management
Dynamic Header Injection
For scenarios requiring dynamic header values based on the target URL or other conditions:
def dynamic_header_interceptor(request):
    """Apply different headers based on request URL"""
    if 'api.example.com' in request.url:
        request.headers['Authorization'] = 'Bearer api-token'
        request.headers['Content-Type'] = 'application/json'
    elif 'auth.example.com' in request.url:
        request.headers['X-Auth-Token'] = 'auth-specific-token'

    # Always set a custom user agent (delete first to avoid a duplicate header)
    del request.headers['User-Agent']
    request.headers['User-Agent'] = 'Advanced Selenium Bot 1.0'
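Substring checks such as `'api.example.com' in request.url` also match URLs that merely mention the domain in a path or query string. A minimal sketch of a stricter variant looks up headers by exact hostname; the `HEADER_RULES` table and `headers_for_url` helper are hypothetical names, and the domains are the placeholders used above.

```python
from urllib.parse import urlsplit

# Hypothetical rule table: exact hostname -> headers to add
HEADER_RULES = {
    "api.example.com": {"Authorization": "Bearer api-token"},
    "auth.example.com": {"X-Auth-Token": "auth-specific-token"},
}


def headers_for_url(url, rules=HEADER_RULES):
    """Return a copy of the headers configured for the URL's hostname, if any."""
    hostname = (urlsplit(url).hostname or "").lower()
    return dict(rules.get(hostname, {}))
```

Inside an interceptor, the result can be applied with a loop such as `for name, value in headers_for_url(request.url).items(): request.headers[name] = value`.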
Handling Authentication Headers
import base64


def add_basic_auth_header(username, password):
    """Generate Basic Authentication header"""
    credentials = f"{username}:{password}"
    encoded_credentials = base64.b64encode(credentials.encode()).decode()
    return f"Basic {encoded_credentials}"


def auth_interceptor(request):
    """Add authentication headers"""
    if 'secure-api.com' in request.url:
        request.headers['Authorization'] = add_basic_auth_header('user', 'pass')

    # Add API key for specific endpoints
    if '/api/' in request.url:
        request.headers['X-API-Key'] = 'your-api-key'
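Hard-coded credentials like `'user'` and `'pass'` tend to end up in version control. One option, sketched here as an assumption rather than a requirement, is to read them from environment variables. The variable names `SCRAPER_USER` and `SCRAPER_PASSWORD` and the helper `basic_auth_from_env` are hypothetical.

```python
import base64
import os


def basic_auth_from_env(user_var="SCRAPER_USER", password_var="SCRAPER_PASSWORD"):
    """Build a Basic Authorization value from environment variables.

    Returns None when either variable is unset, so the caller can decide
    whether to skip the header or fail loudly.
    """
    username = os.environ.get(user_var)
    password = os.environ.get(password_var)
    if username is None or password is None:
        return None
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

An interceptor can then set the `Authorization` header only when the function returns a value.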
Best Practices and Considerations
1. Header Validation
Always validate that your headers are being sent correctly:
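import json

from selenium.webdriver.common.by import By


def validate_headers(driver):
    """Navigate to a header inspection service to verify headers"""
    driver.get("https://httpbin.org/headers")
    # page_source is the browser's HTML rendering of the response,
    # so read the visible body text to get the raw JSON back
    body_text = driver.find_element(By.TAG_NAME, "body").text

    # Parse the response to check if headers were applied
    try:
        data = json.loads(body_text)
    except json.JSONDecodeError:
        print("Could not parse header response")
        return None

    headers = data.get('headers', {})
    print("Applied headers:", headers)
    return headers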
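Printing the received headers still leaves the comparison to the reader. As a minimal sketch, the hypothetical helper `find_missing_headers` below compares the headers you intended to send with an httpbin-style JSON body (an object with a top-level `headers` key) and returns those that are absent or different.

```python
import json


def find_missing_headers(expected, response_text):
    """Return expected headers that are absent or different in the response body."""
    received = json.loads(response_text).get("headers", {})
    # Header names are case-insensitive, so compare on lowercase names
    received_lower = {name.lower(): value for name, value in received.items()}
    return {
        name: value
        for name, value in expected.items()
        if received_lower.get(name.lower()) != value
    }
```

An empty result means every expected header arrived with the expected value.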
2. Error Handling
Implement robust error handling for network operations:
from selenium.common.exceptions import TimeoutException, WebDriverException


def safe_request_with_headers(driver, url, timeout=30):
    """Make a request with proper error handling"""
    try:
        driver.set_page_load_timeout(timeout)
        driver.get(url)
        return True
    except TimeoutException:
        print(f"Request to {url} timed out")
        return False
    except WebDriverException as e:
        print(f"WebDriver error: {e}")
        return False
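Transient failures often succeed on a second attempt. The sketch below shows one possible retry wrapper with exponential backoff; the name `retry` and its parameters are hypothetical, and it works with any callable rather than being tied to Selenium.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call operation() until it succeeds or attempts run out.

    The delay doubles after each failure; the last error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With Selenium, this could wrap a page load as `retry(lambda: driver.get(url), retry_on=(WebDriverException,))`.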
3. Performance Optimization
When using request interception, be mindful of performance impacts:
def optimized_interceptor(request):
    """Only modify headers for specific domains"""
    # Skip static resources before doing any other work
    if request.url.endswith(('.css', '.js', '.png', '.jpg')):
        return

    target_domains = ['api.example.com', 'secure.example.com']
    if any(domain in request.url for domain in target_domains):
        request.headers['Authorization'] = 'Bearer token'
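A plain `endswith` check on the full URL misses static files served with a query string, such as `app.js?v=3`. A minimal sketch of a check on the URL path alone follows; `is_static_resource` and `STATIC_EXTENSIONS` are hypothetical names, and the extension list mirrors the one used above.

```python
from urllib.parse import urlsplit

STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg")


def is_static_resource(url):
    """Return True if the URL's path (ignoring query and fragment) looks static."""
    path = urlsplit(url).path.lower()
    return path.endswith(STATIC_EXTENSIONS)
```

An interceptor can call `is_static_resource(request.url)` first and return early when it is true.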
Conclusion
Handling custom HTTP headers in Selenium requires different approaches depending on your specific needs. The Chrome DevTools Protocol method offers the most control and reliability, while proxy-based solutions provide flexibility for complex scenarios. For Python developers, Selenium Wire offers an excellent balance of functionality and ease of use.
When implementing custom headers, always test thoroughly with header inspection services to ensure your headers are being applied correctly. Consider the performance implications of request interception and implement appropriate error handling for production environments.
For more advanced automation scenarios, you might also want to explore how to handle authentication in Puppeteer or learn about monitoring network requests in Puppeteer for alternative approaches to header management in web automation.