How do I set custom headers for all requests in a session?

Setting custom headers once on a session, rather than on every individual call, is a common need in web scraping and API work. Whether you need to authenticate, mimic browser behavior, or meet an API's requirements, session-level headers keep every request consistent.

Understanding Sessions and Headers

A session maintains certain parameters across multiple HTTP requests, including cookies, authentication tokens, and headers. Custom headers are particularly useful for:

  • Authentication: Bearer tokens, API keys, or custom auth schemes
  • User-Agent spoofing: Mimicking different browsers or devices
  • Content negotiation: Specifying preferred response formats
  • Rate limiting compliance: Including required tracking headers
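  • Origin checks: Sending the Origin or Referer value a server expects (CORS itself is enforced by browsers through server response headers, so a client cannot add it)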

Python Requests Library

The Python requests library provides a straightforward approach: headers set on a Session object are sent with every request made through that session.

Basic Session Configuration

import requests

# Create a session object
session = requests.Session()

# Set headers for all requests in this session
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json, text/html, */*',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',  # add 'br' only if a brotli package is installed
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
})

# All subsequent requests will include these headers
response1 = session.get('https://api.example.com/data')
response2 = session.post('https://api.example.com/submit', json={'key': 'value'})
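
Session headers act as defaults rather than fixed values. As a minimal sketch (the `X-Client` header and the URL are illustrative assumptions, not part of any real API), the snippet below uses `Session.prepare_request` to show that a per-request `headers` argument overrides a session default, and that a value of `None` drops a header for that one request. Nothing is sent over the network:

```python
import requests

session = requests.Session()
session.headers.update({
    "Accept": "application/json",
    "X-Client": "demo",  # hypothetical header, for illustration only
})

# Per-request headers are merged over the session defaults.
request = requests.Request(
    "GET",
    "https://api.example.com/data",
    headers={"Accept": "text/html", "X-Client": None},  # None removes the header
)
prepared = session.prepare_request(request)

print(prepared.headers["Accept"])      # the per-request value wins
print("X-Client" in prepared.headers)  # dropped for this request only
print(session.headers["X-Client"])     # the session default is untouched
```

The same merge applies when you pass `headers=` directly to `session.get()` or `session.post()`.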

Authentication Headers

import requests

session = requests.Session()

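# Static token or API key authentication
session.headers.update({
    'Authorization': 'Bearer your-api-token-here',
    'X-API-Key': 'your-api-key-here',
    'Content-Type': 'application/json'
})

# OAuth 2.0 example: this replaces the Authorization value set above
access_token = 'token-from-your-oauth-flow'  # placeholder; obtain it from your OAuth flow
session.headers.update({
    'Authorization': f'Bearer {access_token}',
    'Accept': 'application/vnd.api+json'
})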

# Make authenticated requests
user_data = session.get('https://api.example.com/user/profile')
update_response = session.patch('https://api.example.com/user/settings', 
                               json={'theme': 'dark'})

Dynamic Header Updates

import requests

class WebScrapingSession:
    def __init__(self):
        self.session = requests.Session()
        self._setup_default_headers()

    def _setup_default_headers(self):
        self.session.headers.update({
            'User-Agent': 'Custom-Scraper/1.0',
            'Accept': '*/*',
            'Accept-Language': 'en-US,en;q=0.5',
            'Cache-Control': 'no-cache',
            'Pragma': 'no-cache'
        })

    def set_authentication(self, token):
        self.session.headers['Authorization'] = f'Bearer {token}'

    def set_custom_header(self, key, value):
        self.session.headers[key] = value

    def get(self, url, **kwargs):
        return self.session.get(url, **kwargs)

    def post(self, url, **kwargs):
        return self.session.post(url, **kwargs)

# Usage
scraper = WebScrapingSession()
scraper.set_authentication('your-token')
scraper.set_custom_header('X-Client-Version', '2.1.0')

response = scraper.get('https://api.example.com/protected-endpoint')
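
When a token expires during a long-running session, updating the header by hand becomes error-prone. One possible alternative, sketched below, is a `requests.auth.AuthBase` subclass assigned to `session.auth`, which requests calls for every prepared request. The class name `ExpiringBearerAuth`, the `fetch_token` callable and its `(token, lifetime_seconds)` return shape, and `fake_fetch_token` are all assumptions made for illustration:

```python
import time

import requests
from requests.auth import AuthBase


class ExpiringBearerAuth(AuthBase):
    """Attach a bearer token, fetching a new one shortly before it expires."""

    def __init__(self, fetch_token, leeway=30):
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds)
        self._fetch_token = fetch_token
        self._leeway = leeway
        self._token = None
        self._expires_at = 0.0

    def __call__(self, request):
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self._leeway:
            self._token, lifetime = self._fetch_token()
            self._expires_at = now + lifetime
        request.headers["Authorization"] = f"Bearer {self._token}"
        return request


def fake_fetch_token():
    # Stand-in for a real call to a token endpoint.
    return "token-abc", 3600


session = requests.Session()
session.auth = ExpiringBearerAuth(fake_fetch_token)

# prepare_request applies session.auth without sending anything.
prepared = session.prepare_request(
    requests.Request("GET", "https://api.example.com/user/profile")
)
print(prepared.headers["Authorization"])
```

Because the auth object runs at request time, the session's own header dictionary never holds a stale token.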

JavaScript and Node.js

Using Axios

const axios = require('axios');

// Create an axios instance with default headers
const apiClient = axios.create({
  baseURL: 'https://api.example.com',
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Requested-With': 'XMLHttpRequest'
  },
  timeout: 10000
});

// Add authentication token to all requests
apiClient.defaults.headers.common['Authorization'] = 'Bearer your-token-here';

// Use the configured client
async function fetchData() {
  try {
    const response = await apiClient.get('/data');
    const postResponse = await apiClient.post('/submit', { data: 'value' });
    return { response, postResponse };
  } catch (error) {
    console.error('Request failed:', error.message);
  }
}

Using Fetch with Custom Headers

class HTTPSession {
  constructor(baseURL = '', defaultHeaders = {}) {
    this.baseURL = baseURL;
    this.headers = new Map(Object.entries(defaultHeaders));
  }

  setHeader(key, value) {
    this.headers.set(key, value);
  }

  setHeaders(headers) {
    Object.entries(headers).forEach(([key, value]) => {
      this.headers.set(key, value);
    });
  }

  async request(url, options = {}) {
    const fullURL = this.baseURL + url;
    const sessionHeaders = Object.fromEntries(this.headers);

    const config = {
      ...options,
      headers: {
        ...sessionHeaders,
        ...(options.headers || {})
      }
    };

    const response = await fetch(fullURL, config);

    if (!response.ok) {
      throw new Error(`HTTP Error: ${response.status}`);
    }

    return response;
  }

  async get(url, options = {}) {
    return this.request(url, { ...options, method: 'GET' });
  }

  async post(url, data, options = {}) {
    return this.request(url, {
      ...options,
      method: 'POST',
      body: JSON.stringify(data),
      headers: {
        'Content-Type': 'application/json',
        ...(options.headers || {})
      }
    });
  }
}

// Usage
const session = new HTTPSession('https://api.example.com', {
  'User-Agent': 'Custom-Client/1.0',
  'Accept': 'application/json',
  'X-Client-ID': 'web-client'
});

session.setHeader('Authorization', 'Bearer your-token');

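// Make requests (top-level await needs an ES module; otherwise wrap these in an async function)
const usersResponse = await session.get('/users');
const users = await usersResponse.json();
const result = await session.post('/users', { name: 'John Doe' });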

Advanced Session Management

Request Interceptors and Middleware

import requests
import time
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class CustomSession(requests.Session):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.setup_session()

    def setup_session(self):
        # Default headers
        self.headers.update({
            'User-Agent': 'Advanced-Scraper/2.0',
            'Accept': 'application/json, text/html, */*',
            'Accept-Language': 'en-US,en;q=0.9',
            'Cache-Control': 'no-cache'
        })

        # Retry strategy
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504]
        )

        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.mount("http://", adapter)
        self.mount("https://", adapter)

    def request(self, method, url, **kwargs):
        # Add timestamp header to all requests
        self.headers['X-Request-Time'] = str(int(time.time()))

        # Log request
        print(f"Making {method} request to {url}")

        return super().request(method, url, **kwargs)

# Usage with context manager
def scrape_with_session():
    with CustomSession() as session:
        session.headers['Authorization'] = 'Bearer token'

        response = session.get('https://api.example.com/data')
        return response.json()
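
Writing a per-request value into `self.headers` changes shared session state, which can surprise callers when one session is used from several threads. A possible variation, sketched here under the assumption that the value only needs to exist on the outgoing request, is to override `prepare_request` and stamp the prepared copy instead. `TimestampSession` is a hypothetical name:

```python
import time

import requests


class TimestampSession(requests.Session):
    """Stamp each outgoing request without touching the shared session headers."""

    def prepare_request(self, request):
        prepared = super().prepare_request(request)
        # Only the prepared copy is modified; self.headers stays unchanged.
        prepared.headers.setdefault("X-Request-Time", str(int(time.time())))
        return prepared


session = TimestampSession()
prepared = session.prepare_request(
    requests.Request("GET", "https://api.example.com/data")
)
print("X-Request-Time" in prepared.headers)  # present on the outgoing request
print("X-Request-Time" in session.headers)   # absent from the session defaults
```

`Session.request` calls `prepare_request` internally, so `session.get()` and `session.post()` pick up the header as well.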

Browser Session Simulation

For more complex scenarios involving browser session handling, you might need to coordinate between HTTP sessions and browser automation:

import requests
from selenium import webdriver

class BrowserHTTPSession:
    def __init__(self):
        self.http_session = requests.Session()
        self.driver = None

    def start_browser(self):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        self.driver = webdriver.Chrome(options=options)

    def sync_cookies_to_http(self):
        if self.driver:
            # Transfer cookies from browser to HTTP session
            for cookie in self.driver.get_cookies():
                self.http_session.cookies.set(
                    cookie['name'], 
                    cookie['value'],
                    domain=cookie.get('domain')
                )

    def set_headers(self, headers):
        self.http_session.headers.update(headers)

        # Mirror only the User-Agent in the browser, and only when one was provided
        user_agent = headers.get('User-Agent')
        if self.driver and user_agent:
            self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
                "userAgent": user_agent
            })

# Usage
session = BrowserHTTPSession()
session.set_headers({
    'Authorization': 'Bearer token',
    'User-Agent': 'Custom Browser Agent'
})

Best Practices and Security Considerations

Header Security

import os
import random
from requests import Session

class SecureSession(Session):
    def __init__(self):
        super().__init__()
        self.setup_secure_headers()

    def setup_secure_headers(self):
        # Never hardcode sensitive headers
        api_key = os.getenv('API_KEY')
        if api_key:
            self.headers['Authorization'] = f'Bearer {api_key}'

        # X-Content-Type-Options, X-Frame-Options and X-XSS-Protection are
        # response headers set by servers; sending them in a request has no
        # effect, so only a client identifier is set here.
        self.headers['User-Agent'] = 'SecureClient/1.0'

    def rotate_user_agent(self):
        """Pick a different User-Agent at random for subsequent requests"""
        user_agents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ]
        self.headers['User-Agent'] = random.choice(user_agents)
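
If a missing credential should stop the program rather than let it silently send unauthenticated requests, the lookup can fail fast. The helper below is a small sketch using only the standard library; the function name `build_auth_headers` and its `env` and `var_name` parameters are assumptions for illustration:

```python
import os


def build_auth_headers(env=os.environ, var_name="API_KEY"):
    """Return an Authorization header built from an environment variable."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; refusing to send unauthenticated requests"
        )
    return {"Authorization": f"Bearer {token}"}


# A plain dict stands in for the real environment in this example.
headers = build_auth_headers({"API_KEY": "example-token"})
print(headers)
```

The returned dictionary can be passed straight to `session.headers.update(...)`.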

Error Handling and Debugging

import logging
from requests import Session
from requests.exceptions import RequestException

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class DebugSession(Session):
    def request(self, method, url, **kwargs):
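        # Log session-level headers (per-request headers passed in kwargs are not
        # included); note that this output contains secrets such as Authorization
        logger.info(f"Request Headers: {dict(self.headers)}")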

        try:
            response = super().request(method, url, **kwargs)
            logger.info(f"Response Status: {response.status_code}")
            return response
        except RequestException as e:
            logger.error(f"Request failed: {e}")
            raise

# Environment-specific configuration
def create_session(environment='production'):
    session = DebugSession()

    if environment == 'development':
        session.headers.update({
            'X-Debug-Mode': 'true',
            'X-Environment': 'dev'
        })
    elif environment == 'production':
        session.headers.update({
            'X-Environment': 'prod',
            'X-Client-Version': '1.0.0'
        })

    return session
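
Logging headers verbatim can leak credentials into log files. A minimal sketch of masking them first is shown below; the set of sensitive names and the helper `redact_headers` are assumptions, so adjust the list to whatever your API treats as secret:

```python
# Header names compared case-insensitively; extend as needed.
SENSITIVE_HEADERS = {"authorization", "cookie", "proxy-authorization", "x-api-key"}


def redact_headers(headers, sensitive=SENSITIVE_HEADERS):
    """Return a copy of headers with sensitive values masked for logging."""
    return {
        name: "[REDACTED]" if name.lower() in sensitive else value
        for name, value in headers.items()
    }


safe = redact_headers({
    "Authorization": "Bearer secret-token",
    "User-Agent": "Custom-Client/1.0",
})
print(safe)
```

In a logging session class, the log call would then receive `redact_headers(dict(self.headers))` instead of the raw dictionary.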

Working with Different HTTP Libraries

cURL Command Line

# cURL can persist cookies with a cookie jar (-c writes it, -b reads it), but it does not persist headers
curl -c cookies.txt -H "Authorization: Bearer your-token" \
     -H "User-Agent: Custom-Client/1.0" \
     https://api.example.com/login

# Subsequent requests reuse the saved cookies; the headers must be repeated on every call
curl -b cookies.txt -H "Authorization: Bearer your-token" \
     -H "User-Agent: Custom-Client/1.0" \
     https://api.example.com/data

PHP with Guzzle

<?php
require 'vendor/autoload.php'; // Composer autoloader

use GuzzleHttp\Client;

$client = new Client([
    'base_uri' => 'https://api.example.com',
    'headers' => [
        'User-Agent' => 'Custom-PHP-Client/1.0',
        'Accept' => 'application/json',
        'Authorization' => 'Bearer your-token'
    ]
]);

// All requests will include the default headers
$response = $client->get('/data');
$postResponse = $client->post('/submit', [
    'json' => ['key' => 'value']
]);
?>

Testing Session Headers

import unittest
from unittest.mock import patch, Mock
import requests

class TestSessionHeaders(unittest.TestCase):
    def setUp(self):
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': 'Bearer test-token',
            'User-Agent': 'Test-Client/1.0'
        })

    @patch('requests.Session.send')
    def test_headers_included_in_request(self, mock_send):
        mock_response = Mock()
        mock_response.status_code = 200
        mock_send.return_value = mock_response

        self.session.get('https://api.example.com/test')

        # Session.send receives the PreparedRequest, which carries the merged headers
        prepared = mock_send.call_args[0][0]
        self.assertEqual(
            prepared.headers['Authorization'],
            'Bearer test-token'
        )
        self.assertEqual(prepared.headers['User-Agent'], 'Test-Client/1.0')

if __name__ == '__main__':
    unittest.main()

Performance Considerations

Connection Pooling and Keep-Alive

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# Configure connection pooling
adapter = HTTPAdapter(
    pool_connections=100,  # Number of per-host connection pools to cache
    pool_maxsize=100,      # Max connections kept alive in each pool
    max_retries=3
)

session.mount('http://', adapter)
session.mount('https://', adapter)

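# Set persistent headers. requests already sends Connection: keep-alive by
# default, and the Keep-Alive parameters are only a hint that servers may ignore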
session.headers.update({
    'Connection': 'keep-alive',
    'Keep-Alive': 'timeout=5, max=1000',
    'User-Agent': 'Persistent-Client/1.0'
})

Monitoring and Analytics

import time
import uuid
from collections import defaultdict

import requests

class AnalyticsSession(requests.Session):
    def __init__(self):
        super().__init__()
        self.request_count = defaultdict(int)
        self.response_times = []

    def request(self, method, url, **kwargs):
        start_time = time.time()

        # Add analytics headers (a UUID keeps IDs unique even within the same second)
        self.headers['X-Request-ID'] = f"req_{uuid.uuid4().hex}"
        self.headers['X-Client-Session'] = 'analytics-enabled'

        response = super().request(method, url, **kwargs)

        # Track metrics
        elapsed = time.time() - start_time
        self.request_count[method] += 1
        self.response_times.append(elapsed)

        return response

    def get_stats(self):
        return {
            'total_requests': sum(self.request_count.values()),
            'methods': dict(self.request_count),
            'avg_response_time': sum(self.response_times) / len(self.response_times) if self.response_times else 0
        }

Conclusion

Setting custom headers for all requests in a session is essential for effective web scraping and API integration. Whether using Python's requests library, JavaScript's axios, or implementing custom session managers, the key principles remain consistent: establish default headers, provide methods for dynamic updates, and implement proper error handling.

For more complex scenarios involving browser automation and AJAX request handling, consider combining HTTP sessions with headless browser tools to maintain consistent session state across different interaction methods.

Remember to always secure sensitive header information, implement proper logging for debugging, and test your session configuration thoroughly to ensure reliable web scraping operations. When working with authentication flows, proper browser session management becomes crucial for maintaining state across complex user interactions.
