How do I set custom headers for all requests in a session?
Setting custom headers for all requests in a session is a fundamental requirement in web scraping and API interactions. Whether you need to authenticate, mimic browser behavior, or comply with API requirements, properly configuring session headers ensures consistent and successful HTTP communications.
Understanding Sessions and Headers
A session maintains certain parameters across multiple HTTP requests, including cookies, authentication tokens, and headers. Custom headers are particularly useful for:
- Authentication: Bearer tokens, API keys, or custom auth schemes
- User-Agent spoofing: Mimicking different browsers or devices
- Content negotiation: Specifying preferred response formats
- Rate limiting compliance: Including required tracking headers
- Origin checks: Sending the Origin or Referer header that some servers validate (CORS itself is enforced by browsers through response headers)
Python Requests Library
The Python requests library provides the most straightforward approach to session management with custom headers.
Basic Session Configuration
import requests
# Create a session object
session = requests.Session()
# Set headers for all requests in this session
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json, text/html, */*',
    'Accept-Language': 'en-US,en;q=0.9',
    # Advertise 'br' only if a Brotli decoder (the brotli package) is installed
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
})
# All subsequent requests will include these headers
response1 = session.get('https://api.example.com/data')
response2 = session.post('https://api.example.com/submit', json={'key': 'value'})
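Session headers act as defaults rather than fixed values. In requests, a headers argument passed to an individual call is merged over the session's headers, and a value of None drops that header for that one request. The sketch below checks this without touching the network by using Session.prepare_request; the URL, the header values, and the X-Trace header are placeholders chosen for illustration.

```python
import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Custom-Scraper/1.0',
    'Accept': 'application/json',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Build the request the session would send, without sending it.
request = requests.Request(
    'GET',
    'https://api.example.com/data',  # placeholder URL
    headers={
        'Accept': 'text/html',     # overrides the session default for this request only
        'Accept-Language': None,   # None drops the session default for this request only
        'X-Trace': 'abc123',       # hypothetical extra header
    },
)
prepared = session.prepare_request(request)

print(dict(prepared.headers))
# The session's own defaults are left unchanged.
print(session.headers['Accept'])
```

Because the merge happens per request, the session defaults stay intact for every later call.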
Authentication Headers
import requests
session = requests.Session()
# API key authentication
session.headers.update({
    'Authorization': 'Bearer your-api-token-here',
    'X-API-Key': 'your-api-key-here',
    'Content-Type': 'application/json'
})
# OAuth 2.0 example (this replaces the Authorization value set above)
access_token = 'your-oauth-access-token'  # placeholder; obtain it from your OAuth flow
session.headers.update({
    'Authorization': f'Bearer {access_token}',
    'Accept': 'application/vnd.api+json'
})
# Make authenticated requests
user_data = session.get('https://api.example.com/user/profile')
update_response = session.patch('https://api.example.com/user/settings',
                                json={'theme': 'dark'})
Dynamic Header Updates
import requests
class WebScrapingSession:
    def __init__(self):
        self.session = requests.Session()
        self._setup_default_headers()
    def _setup_default_headers(self):
        self.session.headers.update({
            'User-Agent': 'Custom-Scraper/1.0',
            'Accept': '*/*',
            'Accept-Language': 'en-US,en;q=0.5',
            'Cache-Control': 'no-cache',
            'Pragma': 'no-cache'
        })
    def set_authentication(self, token):
        self.session.headers['Authorization'] = f'Bearer {token}'
    def set_custom_header(self, key, value):
        self.session.headers[key] = value
    def get(self, url, **kwargs):
        return self.session.get(url, **kwargs)
    def post(self, url, **kwargs):
        return self.session.post(url, **kwargs)
# Usage
scraper = WebScrapingSession()
scraper.set_authentication('your-token')
scraper.set_custom_header('X-Client-Version', '2.1.0')
response = scraper.get('https://api.example.com/protected-endpoint')
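Dynamic updates sometimes need to be undone, for example when a short-lived token or a debugging header should apply to only a few calls. One possible approach is a small context manager that applies overrides and then restores the previous headers. The helper name temporary_headers and the X-Dry-Run header are hypothetical; neither is part of requests.

```python
from contextlib import contextmanager

import requests


@contextmanager
def temporary_headers(session, overrides):
    """Apply header overrides to a session, then restore the previous values."""
    saved = session.headers.copy()
    session.headers.update(overrides)
    try:
        yield session
    finally:
        # Restore in place so other references to session.headers stay valid.
        session.headers.clear()
        session.headers.update(saved)


session = requests.Session()
session.headers['Authorization'] = 'Bearer long-lived-token'

with temporary_headers(session, {
    'Authorization': 'Bearer short-lived-token',
    'X-Dry-Run': '1',  # hypothetical header
}):
    inside = dict(session.headers)

after = dict(session.headers)
print(inside['Authorization'], '->', after['Authorization'])
```

To drop a single header permanently, session.headers.pop('X-Client-Version', None) is enough, since session.headers behaves like a dictionary.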
JavaScript and Node.js
Using Axios
const axios = require('axios');
// Create an axios instance with default headers
const apiClient = axios.create({
  baseURL: 'https://api.example.com',
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Requested-With': 'XMLHttpRequest'
  },
  timeout: 10000
});
// Add authentication token to all requests
apiClient.defaults.headers.common['Authorization'] = 'Bearer your-token-here';
// Use the configured client
async function fetchData() {
  try {
    const response = await apiClient.get('/data');
    const postResponse = await apiClient.post('/submit', { data: 'value' });
    return { response, postResponse };
  } catch (error) {
    console.error('Request failed:', error.message);
  }
}
Using Fetch with Custom Headers
class HTTPSession {
  constructor(baseURL = '', defaultHeaders = {}) {
    this.baseURL = baseURL;
    this.headers = new Map(Object.entries(defaultHeaders));
  }
  setHeader(key, value) {
    this.headers.set(key, value);
  }
  setHeaders(headers) {
    Object.entries(headers).forEach(([key, value]) => {
      this.headers.set(key, value);
    });
  }
  async request(url, options = {}) {
    const fullURL = this.baseURL + url;
    const sessionHeaders = Object.fromEntries(this.headers);
    // Per-request headers take precedence over session headers
    const config = {
      ...options,
      headers: {
        ...sessionHeaders,
        ...(options.headers || {})
      }
    };
    const response = await fetch(fullURL, config);
    if (!response.ok) {
      throw new Error(`HTTP Error: ${response.status}`);
    }
    return response;
  }
  async get(url, options = {}) {
    return this.request(url, { ...options, method: 'GET' });
  }
  async post(url, data, options = {}) {
    return this.request(url, {
      ...options,
      method: 'POST',
      body: JSON.stringify(data),
      headers: {
        'Content-Type': 'application/json',
        ...(options.headers || {})
      }
    });
  }
}
// Usage
const session = new HTTPSession('https://api.example.com', {
  'User-Agent': 'Custom-Client/1.0',
  'Accept': 'application/json',
  'X-Client-ID': 'web-client'
});
session.setHeader('Authorization', 'Bearer your-token');
// Make requests (top-level await requires an ES module; otherwise wrap these calls in an async function)
const data = await session.get('/users');
const result = await session.post('/users', { name: 'John Doe' });
Advanced Session Management
Request Interceptors and Middleware
import requests
import time
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
class CustomSession(requests.Session):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.setup_session()
    def setup_session(self):
        # Default headers
        self.headers.update({
            'User-Agent': 'Advanced-Scraper/2.0',
            'Accept': 'application/json, text/html, */*',
            'Accept-Language': 'en-US,en;q=0.9',
            'Cache-Control': 'no-cache'
        })
        # Retry strategy
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504]
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.mount("http://", adapter)
        self.mount("https://", adapter)
    def request(self, method, url, **kwargs):
        # Add timestamp header to all requests
        self.headers['X-Request-Time'] = str(int(time.time()))
        # Log request
        print(f"Making {method} request to {url}")
        return super().request(method, url, **kwargs)
# Usage with context manager
def scrape_with_session():
    with CustomSession() as session:
        session.headers['Authorization'] = 'Bearer token'
        response = session.get('https://api.example.com/data')
        return response.json()
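Overriding request() and writing to self.headers changes shared session state on every call. Where a header has to be computed per request, requests also accepts a callable auth object (a subclass of requests.auth.AuthBase) that receives each prepared request and may modify its headers. The following is a minimal sketch of that alternative; the TimestampAuth class name is hypothetical, and the X-Request-Time header simply mirrors the example above.

```python
import time

import requests
from requests.auth import AuthBase


class TimestampAuth(AuthBase):
    """Hypothetical auth hook that stamps each outgoing request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, prepared_request):
        # Runs once per request, on the prepared request only.
        prepared_request.headers['Authorization'] = f'Bearer {self.token}'
        prepared_request.headers['X-Request-Time'] = str(int(time.time()))
        return prepared_request


session = requests.Session()
session.auth = TimestampAuth('your-token')

# Prepare (but do not send) a request to inspect the resulting headers.
prepared = session.prepare_request(
    requests.Request('GET', 'https://api.example.com/data')  # placeholder URL
)
print(prepared.headers['Authorization'])
print('X-Request-Time' in session.headers)  # the session defaults are not modified
```

The design trade-off is that the per-request values never leak into session.headers, at the cost of using the auth slot, which then cannot also hold a (username, password) tuple.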
Browser Session Simulation
For more complex scenarios involving browser session handling, you might need to coordinate between HTTP sessions and browser automation:
import requests
from selenium import webdriver
class BrowserHTTPSession:
    def __init__(self):
        self.http_session = requests.Session()
        self.driver = None
    def start_browser(self):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        self.driver = webdriver.Chrome(options=options)
    def sync_cookies_to_http(self):
        if self.driver:
            # Transfer cookies from browser to HTTP session
            for cookie in self.driver.get_cookies():
                self.http_session.cookies.set(
                    cookie['name'],
                    cookie['value'],
                    domain=cookie.get('domain')
                )
    def set_headers(self, headers):
        self.http_session.headers.update(headers)
        # Also set headers in browser if needed
        if self.driver:
            self.driver.execute_cdp_cmd('Network.setUserAgentOverride', {
                "userAgent": headers.get('User-Agent', '')
            })
# Usage
session = BrowserHTTPSession()
session.set_headers({
    'Authorization': 'Bearer token',
    'User-Agent': 'Custom Browser Agent'
})
Best Practices and Security Considerations
Header Security
import os
import random
from requests import Session
class SecureSession(Session):
    def __init__(self):
        super().__init__()
        self.setup_secure_headers()
    def setup_secure_headers(self):
        # Never hardcode sensitive headers
        api_key = os.getenv('API_KEY')
        if api_key:
            self.headers['Authorization'] = f'Bearer {api_key}'
        # X-Content-Type-Options, X-Frame-Options and X-XSS-Protection are response
        # headers set by servers; sending them in a request has no effect, so only
        # request headers are configured here
        self.headers.update({
            'User-Agent': 'SecureClient/1.0'
        })
    def rotate_user_agent(self):
        """Rotate user agent to avoid detection"""
        user_agents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
            'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ]
        self.headers['User-Agent'] = random.choice(user_agents)
Error Handling and Debugging
import logging
from requests import Session
from requests.exceptions import RequestException
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class DebugSession(Session):
    def request(self, method, url, **kwargs):
        # Log request headers
        logger.info(f"Request Headers: {dict(self.headers)}")
        try:
            response = super().request(method, url, **kwargs)
            logger.info(f"Response Status: {response.status_code}")
            return response
        except RequestException as e:
            logger.error(f"Request failed: {e}")
            raise
# Environment-specific configuration
def create_session(environment='production'):
    session = DebugSession()
    if environment == 'development':
        session.headers.update({
            'X-Debug-Mode': 'true',
            'X-Environment': 'dev'
        })
    elif environment == 'production':
        session.headers.update({
            'X-Environment': 'prod',
            'X-Client-Version': '1.0.0'
        })
    return session
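DebugSession logs the session headers verbatim, which would write any Authorization value into the log. A small masking step before logging avoids that. The sketch below uses only the standard library; the function name redact_headers and the list of sensitive header names are assumptions to adapt to your own API.

```python
# Header names treated as sensitive; an illustrative list, not an exhaustive one.
SENSITIVE_HEADERS = {'authorization', 'proxy-authorization', 'cookie', 'x-api-key'}


def redact_headers(headers, sensitive=SENSITIVE_HEADERS):
    """Return a plain dict copy of headers with sensitive values masked."""
    return {
        name: '***' if name.lower() in sensitive else value
        for name, value in headers.items()
    }


safe = redact_headers({
    'Authorization': 'Bearer secret-token',
    'User-Agent': 'Custom-Client/1.0',
})
print(safe)  # {'Authorization': '***', 'User-Agent': 'Custom-Client/1.0'}
```

In a logging session, logger.info("Request Headers: %s", redact_headers(self.headers)) could then replace the unmasked call.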
Working with Different HTTP Libraries
cURL Command Line
# cURL keeps no state between invocations: the cookie jar persists cookies, but headers must be repeated
# Log in and save cookies to the jar
curl -c cookies.txt -H "Authorization: Bearer your-token" \
  -H "User-Agent: Custom-Client/1.0" \
  https://api.example.com/login
# Subsequent requests read cookies from the jar and pass the same headers again
curl -b cookies.txt -H "Authorization: Bearer your-token" \
  -H "User-Agent: Custom-Client/1.0" \
  https://api.example.com/data
PHP with Guzzle
<?php
use GuzzleHttp\Client;
$client = new Client([
    'base_uri' => 'https://api.example.com',
    'headers' => [
        'User-Agent' => 'Custom-PHP-Client/1.0',
        'Accept' => 'application/json',
        'Authorization' => 'Bearer your-token'
    ]
]);
// All requests will include the default headers
$response = $client->get('/data');
$postResponse = $client->post('/submit', [
    'json' => ['key' => 'value']
]);
?>
Testing Session Headers
import unittest
from unittest.mock import patch, Mock
import requests
class TestSessionHeaders(unittest.TestCase):
    def setUp(self):
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': 'Bearer test-token',
            'User-Agent': 'Test-Client/1.0'
        })
    # Patch Session.send so nothing goes over the network; it receives the prepared request
    @patch('requests.Session.send')
    def test_headers_included_in_request(self, mock_send):
        mock_response = Mock()
        mock_response.status_code = 200
        mock_send.return_value = mock_response
        self.session.get('https://api.example.com/test')
        # Verify the session headers were merged into the outgoing request
        prepared_request = mock_send.call_args[0][0]
        self.assertEqual(
            prepared_request.headers['Authorization'],
            'Bearer test-token'
        )
        self.assertEqual(
            prepared_request.headers['User-Agent'],
            'Test-Client/1.0'
        )
if __name__ == '__main__':
    unittest.main()
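As an alternative to patching, a transport adapter can be mounted that records each prepared request instead of sending it, so a test exercises the full session pipeline offline. This is a sketch under assumptions: RecordingAdapter is a hypothetical class built on requests.adapters.BaseAdapter, and it returns a bare 200 response with no body.

```python
import requests
from requests.adapters import BaseAdapter


class RecordingAdapter(BaseAdapter):
    """Hypothetical test adapter: records prepared requests instead of sending them."""

    def __init__(self):
        super().__init__()
        self.sent = []

    def send(self, request, **kwargs):
        # Keep the prepared request and return a canned response.
        self.sent.append(request)
        response = requests.Response()
        response.status_code = 200
        response.url = request.url
        response.request = request
        return response

    def close(self):
        pass


session = requests.Session()
session.headers.update({
    'Authorization': 'Bearer test-token',
    'User-Agent': 'Test-Client/1.0',
})
recorder = RecordingAdapter()
session.mount('https://', recorder)  # replaces the default HTTPS adapter

response = session.get('https://api.example.com/test')  # no network traffic occurs
sent_headers = recorder.sent[0].headers
print(response.status_code, sent_headers['Authorization'])
```

Assertions can then be made on recorder.sent, which holds exactly what the session would have handed to the network layer.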
Performance Considerations
Connection Pooling and Keep-Alive
import requests
from requests.adapters import HTTPAdapter
session = requests.Session()
# Configure connection pooling
adapter = HTTPAdapter(
    pool_connections=100,  # Number of per-host connection pools to cache
    pool_maxsize=100,      # Max connections kept in each pool
    max_retries=3
)
session.mount('http://', adapter)
session.mount('https://', adapter)
# Set persistent headers (requests already sends Connection: keep-alive by default)
session.headers.update({
    'Connection': 'keep-alive',
    'Keep-Alive': 'timeout=5, max=1000',
    'User-Agent': 'Persistent-Client/1.0'
})
Monitoring and Analytics
import time
import uuid
from collections import defaultdict
import requests
class AnalyticsSession(requests.Session):
    def __init__(self):
        super().__init__()
        self.request_count = defaultdict(int)
        self.response_times = []
    def request(self, method, url, **kwargs):
        start_time = time.time()
        # Add analytics headers (a UUID keeps the ID unique even for requests made within the same second)
        self.headers['X-Request-ID'] = f"req_{uuid.uuid4().hex}"
        self.headers['X-Client-Session'] = 'analytics-enabled'
        response = super().request(method, url, **kwargs)
        # Track metrics
        elapsed = time.time() - start_time
        self.request_count[method] += 1
        self.response_times.append(elapsed)
        return response
    def get_stats(self):
        return {
            'total_requests': sum(self.request_count.values()),
            'methods': dict(self.request_count),
            'avg_response_time': sum(self.response_times) / len(self.response_times) if self.response_times else 0
        }
Conclusion
Setting custom headers for all requests in a session is essential for effective web scraping and API integration. Whether using Python's requests library, JavaScript's axios, or implementing custom session managers, the key principles remain consistent: establish default headers, provide methods for dynamic updates, and implement proper error handling.
For more complex scenarios involving browser automation and AJAX request handling, consider combining HTTP sessions with headless browser tools to maintain consistent session state across different interaction methods.
Remember to always secure sensitive header information, implement proper logging for debugging, and test your session configuration thoroughly to ensure reliable web scraping operations. When working with authentication flows, proper browser session management becomes crucial for maintaining state across complex user interactions.