How Do I Handle Custom HTTP Headers in Selenium Requests?

Custom HTTP headers are essential for web scraping as they allow you to mimic real browser behavior, authenticate with APIs, and bypass certain restrictions. Unlike direct HTTP libraries, Selenium WebDriver doesn't provide a straightforward method to set custom headers since it operates at the browser level. However, there are several effective approaches to accomplish this.
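
For contrast, here is how a direct HTTP library attaches headers in a single step, using only the Python standard library (the header values are illustrative):

```python
import urllib.request

# With a plain HTTP library, headers are attached directly to the request
# object before it is sent -- no browser is involved.
req = urllib.request.Request(
    "https://httpbin.org/headers",
    headers={"X-Custom-Header": "custom-value", "User-Agent": "MyScraper/1.0"},
)

# urllib normalizes header names to Capitalized-with-dashes form
print(req.get_header("X-custom-header"))  # custom-value
```

Selenium offers no equivalent one-liner, which is why the methods below work through the browser instead.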

Understanding the Challenge

Selenium WebDriver controls browsers through the WebDriver protocol, which doesn't directly expose HTTP header manipulation. This limitation exists because Selenium is primarily designed for testing web applications rather than low-level HTTP operations. However, modern browsers provide developer tools protocols that can be leveraged to set custom headers.

Method 1: Chrome DevTools Protocol (CDP)

The most reliable way to set custom HTTP headers in Selenium is through the Chrome DevTools Protocol, which Selenium 4 exposes on ChromeDriver via execute_cdp_cmd. This gives direct access to Chrome's networking layer.

Python Implementation

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def setup_chrome_with_headers():
    # Configure Chrome options
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Optional: run in headless mode
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")

    # Initialize the driver
    driver = webdriver.Chrome(options=chrome_options)

    # Enable Network domain for CDP
    driver.execute_cdp_cmd('Network.enable', {})

    # Set custom headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Authorization": "Bearer your-token-here",
        "X-Custom-Header": "custom-value"
    }

    # Override the User-Agent via CDP
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {
        "userAgent": headers.pop("User-Agent")
    })

    # Apply the remaining headers to every request the browser sends.
    # Note: execute_cdp_cmd can only send CDP commands, not subscribe to
    # CDP events, so event-based request interception is not usable here;
    # Network.setExtraHTTPHeaders is the reliable way to inject headers.
    driver.execute_cdp_cmd('Network.setExtraHTTPHeaders', {"headers": headers})

    return driver

# Usage example
driver = setup_chrome_with_headers()
try:
    driver.get("https://httpbin.org/headers")
    # Your scraping logic here
    response = driver.page_source
    print(response)
finally:
    driver.quit()

JavaScript (Node.js) Implementation

const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function setupChromeWithHeaders() {
    const options = new chrome.Options();
    options.addArguments('--headless');
    options.addArguments('--no-sandbox');
    options.addArguments('--disable-dev-shm-usage');

    const driver = await new Builder()
        .forBrowser('chrome')
        .setChromeOptions(options)
        .build();

    // Enable Network domain
    await driver.sendDevToolsCommand('Network.enable', {});

    // Set custom headers
    const customHeaders = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Authorization': 'Bearer your-token-here',
        'X-Custom-Header': 'custom-value'
    };

    // Override User-Agent
    await driver.sendDevToolsCommand('Network.setUserAgentOverride', {
        userAgent: customHeaders['User-Agent']
    });

    // Apply the remaining headers to every request the browser sends.
    // (selenium-webdriver can send CDP commands but cannot subscribe to
    // CDP events, so event-based request interception is not usable here;
    // Network.setExtraHTTPHeaders is the reliable option.)
    const { 'User-Agent': _ua, ...extraHeaders } = customHeaders;
    await driver.sendDevToolsCommand('Network.setExtraHTTPHeaders', {
        headers: extraHeaders
    });

    return driver;
}

// Usage
(async () => {
    const driver = await setupChromeWithHeaders();
    try {
        await driver.get('https://httpbin.org/headers');
        const pageSource = await driver.getPageSource();
        console.log(pageSource);
    } finally {
        await driver.quit();
    }
})();

Method 2: Browser Extension Approach

Another approach involves creating a lightweight browser extension that modifies headers before requests are sent.

Creating a Header Modification Extension

First, create a manifest file for the extension:

{
    "manifest_version": 3,
    "name": "Header Modifier",
    "version": "1.0",
    "permissions": ["declarativeNetRequest"],
    "host_permissions": ["<all_urls>"],
    "background": {
        "service_worker": "background.js"
    }
}

Background script (background.js):

chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
        id: 1,
        priority: 1,
        action: {
            type: "modifyHeaders",
            requestHeaders: [
                {
                    header: "User-Agent",
                    operation: "set",
                    value: "Custom User Agent String"
                },
                {
                    header: "Authorization",
                    operation: "set",
                    value: "Bearer your-token"
                }
            ]
        },
        condition: {
            urlFilter: "*"
        }
    }]
});
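
Chrome can also load the extension unpacked from a directory, which avoids packing a .crx. A small helper that writes the two files above to a folder might look like this (the directory name and header values are illustrative):

```python
import json
import os

def write_header_extension(directory, headers):
    """Write a minimal MV3 header-modifier extension into `directory`."""
    os.makedirs(directory, exist_ok=True)

    manifest = {
        "manifest_version": 3,
        "name": "Header Modifier",
        "version": "1.0",
        "permissions": ["declarativeNetRequest"],
        "host_permissions": ["<all_urls>"],
        "background": {"service_worker": "background.js"},
    }
    with open(os.path.join(directory, "manifest.json"), "w") as f:
        json.dump(manifest, f, indent=2)

    # Build one declarativeNetRequest rule that sets each header
    rules = [{"header": name, "operation": "set", "value": value}
             for name, value in headers.items()]
    background = (
        "chrome.declarativeNetRequest.updateDynamicRules({"
        "removeRuleIds: [1], addRules: [{id: 1, priority: 1, "
        "action: {type: 'modifyHeaders', requestHeaders: "
        + json.dumps(rules) +
        "}, condition: {urlFilter: '*'}}]});"
    )
    with open(os.path.join(directory, "background.js"), "w") as f:
        f.write(background)

write_header_extension("header_ext", {"X-Custom-Header": "custom-value"})
```

The resulting directory can then be loaded with chrome_options.add_argument("--load-extension=header_ext") instead of add_extension.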

Then load this extension in your Selenium script. Note that Chrome's classic headless mode does not support extensions; use the --headless=new flag if you need to run headless with an extension loaded:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def setup_chrome_with_extension():
    chrome_options = Options()
    chrome_options.add_extension("/path/to/extension.crx")

    driver = webdriver.Chrome(options=chrome_options)
    return driver

Method 3: Proxy-Based Header Injection

For more complex scenarios, you can use a proxy server that intercepts and modifies HTTP headers.

Using mitmproxy with Python

import subprocess
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def start_mitm_proxy():
    """Start mitmproxy with custom script"""
    proxy_script = """
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    # Add custom headers
    flow.request.headers["Authorization"] = "Bearer your-token"
    flow.request.headers["X-Custom-Header"] = "custom-value"
    flow.request.headers["User-Agent"] = "Custom Selenium Bot"
"""

    # Save script to file
    with open("proxy_script.py", "w") as f:
        f.write(proxy_script)

    # Start mitmproxy
    process = subprocess.Popen([
        "mitmdump", 
        "-s", "proxy_script.py",
        "-p", "8080",
        "--set", "confdir=~/.mitmproxy"
    ])

    time.sleep(3)  # Wait for proxy to start
    return process

def setup_chrome_with_proxy():
    chrome_options = Options()
    chrome_options.add_argument("--proxy-server=http://localhost:8080")
    # mitmproxy re-signs TLS traffic; either install its CA certificate
    # (recommended) or tell Chrome to skip certificate validation
    chrome_options.add_argument("--ignore-certificate-errors")

    driver = webdriver.Chrome(options=chrome_options)
    return driver

# Usage
proxy_process = start_mitm_proxy()
driver = setup_chrome_with_proxy()
try:
    driver.get("https://httpbin.org/headers")
    # Your scraping logic
finally:
    driver.quit()
    proxy_process.terminate()

Method 4: Using Selenium Wire

Selenium Wire is a third-party Python library (pip install selenium-wire) that extends Selenium WebDriver with request/response inspection and modification. The project is no longer actively maintained, so pin its version and verify it works with your Selenium release before relying on it.

from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options

def interceptor(request):
    """Modify requests before they are sent"""
    request.headers['Authorization'] = 'Bearer your-token'
    request.headers['X-Custom-Header'] = 'custom-value'
    request.headers['User-Agent'] = 'Custom Selenium Wire Bot'

# Setup Chrome with Selenium Wire
chrome_options = Options()
chrome_options.add_argument("--headless")

# Configure wire options for proxy settings if needed
seleniumwire_options = {
    'proxy': {
        'http': 'http://username:password@host:port',
        'https': 'https://username:password@host:port',
    }
}

driver = webdriver.Chrome(
    options=chrome_options,
    seleniumwire_options=seleniumwire_options
)

# Set the request interceptor
driver.request_interceptor = interceptor

try:
    driver.get("https://httpbin.org/headers")

    # Access request/response details
    for request in driver.requests:
        if request.response:
            print(f"Status: {request.response.status_code}")
            print(f"Headers: {dict(request.headers)}")

finally:
    driver.quit()

Advanced Header Management

Dynamic Header Injection

For scenarios requiring dynamic header values based on the target URL or other conditions:

def dynamic_header_interceptor(request):
    """Apply different headers based on request URL"""
    if 'api.example.com' in request.url:
        request.headers['Authorization'] = 'Bearer api-token'
        request.headers['Content-Type'] = 'application/json'
    elif 'auth.example.com' in request.url:
        request.headers['X-Auth-Token'] = 'auth-specific-token'

    # Always set a custom user agent
    request.headers['User-Agent'] = 'Advanced Selenium Bot 1.0'
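
Substring checks like 'api.example.com' in request.url can produce false positives (for example, a URL whose host is api.example.com.evil.host would match). A stricter variant parses the hostname first; this sketch factors the logic into a standalone, testable helper (the domains and tokens are illustrative):

```python
from urllib.parse import urlparse

def headers_for(url):
    """Return the extra headers to apply for a given request URL."""
    # Compare against the parsed hostname, not the raw URL string
    host = urlparse(url).hostname or ""
    headers = {"User-Agent": "Advanced Selenium Bot 1.0"}
    if host == "api.example.com":
        headers["Authorization"] = "Bearer api-token"
        headers["Content-Type"] = "application/json"
    elif host == "auth.example.com":
        headers["X-Auth-Token"] = "auth-specific-token"
    return headers

print(headers_for("https://api.example.com/v1/items"))
```

An interceptor can then simply do request.headers.update(headers_for(request.url)), keeping the URL-matching logic easy to unit-test.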

Handling Authentication Headers

import base64

def add_basic_auth_header(username, password):
    """Generate Basic Authentication header"""
    credentials = f"{username}:{password}"
    encoded_credentials = base64.b64encode(credentials.encode()).decode()
    return f"Basic {encoded_credentials}"

def auth_interceptor(request):
    """Add authentication headers"""
    if 'secure-api.com' in request.url:
        request.headers['Authorization'] = add_basic_auth_header('user', 'pass')

    # Add API key for specific endpoints
    if '/api/' in request.url:
        request.headers['X-API-Key'] = 'your-api-key'
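
A quick sanity check of the Basic auth helper: the encoding for user:pass is fixed by RFC 7617, so its output is easy to verify (self-contained copy of the function above):

```python
import base64

def add_basic_auth_header(username, password):
    """Generate a Basic Authentication header value."""
    credentials = f"{username}:{password}"
    return "Basic " + base64.b64encode(credentials.encode()).decode()

print(add_basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```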

Best Practices and Considerations

1. Header Validation

Always validate that your headers are being sent correctly:

import json

from selenium.webdriver.common.by import By

def validate_headers(driver):
    """Navigate to a header inspection service to verify headers"""
    driver.get("https://httpbin.org/headers")
    # Chrome wraps a JSON response in an HTML <pre> element, so read the
    # element text rather than parsing the full page source
    raw = driver.find_element(By.TAG_NAME, "pre").text

    try:
        headers = json.loads(raw).get('headers', {})
        print("Applied headers:", headers)
        return headers
    except json.JSONDecodeError:
        print("Could not parse header response")
        return None

2. Error Handling

Implement robust error handling for network operations:

from selenium.common.exceptions import TimeoutException, WebDriverException

def safe_request_with_headers(driver, url, timeout=30):
    """Make a request with proper error handling"""
    try:
        driver.set_page_load_timeout(timeout)
        driver.get(url)
        return True
    except TimeoutException:
        print(f"Request to {url} timed out")
        return False
    except WebDriverException as e:
        print(f"WebDriver error: {e}")
        return False

3. Performance Optimization

When using request interception, be mindful of performance impacts:

def optimized_interceptor(request):
    """Only modify headers for specific domains"""
    # Skip static resources entirely before doing any other work
    if request.url.endswith(('.css', '.js', '.png', '.jpg')):
        return

    target_domains = ['api.example.com', 'secure.example.com']
    if any(domain in request.url for domain in target_domains):
        request.headers['Authorization'] = 'Bearer token'

Conclusion

Handling custom HTTP headers in Selenium requires different approaches depending on your specific needs. The Chrome DevTools Protocol method offers the most control and reliability, while proxy-based solutions provide flexibility for complex scenarios. For Python developers, Selenium Wire offers an excellent balance of functionality and ease of use.

When implementing custom headers, always test thoroughly with header inspection services to ensure your headers are being applied correctly. Consider the performance implications of request interception and implement appropriate error handling for production environments.

For more advanced automation scenarios, you might also want to explore how to handle authentication in Puppeteer or learn about monitoring network requests in Puppeteer for alternative approaches to header management in web automation.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
