How to Set Custom Headers in Puppeteer Requests
Setting custom headers in Puppeteer requests is useful in many scraping and automation scenarios: sending credentials, calling APIs that require keys, emulating other browsers or devices, and getting past simple header-based restrictions. This guide covers three ways to set them (page-wide headers, request interception, and launch arguments), followed by common use cases, best practices, and troubleshooting.
Understanding HTTP Headers in Puppeteer
HTTP headers are key-value pairs sent with HTTP requests that provide additional information about the request or the client. In web scraping, custom headers help you:
- Authenticate with APIs or protected resources
- Mimic different browsers or devices
- Pass additional metadata to servers
- Bypass basic bot detection mechanisms
Method 1: Setting Headers Using page.setExtraHTTPHeaders()
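The most common way to set custom headers in Puppeteer is the page.setExtraHTTPHeaders() method. It attaches the given headers to every request the page initiates from then on, including subresources such as scripts, images, and XHR/fetch calls. All header values must be strings.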
Basic Implementation
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set custom headers; they apply to every request this page makes from now on.
  // A User-Agent set here changes the HTTP header only, not navigator.userAgent
  // (page.setUserAgent() changes both).
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer your-token-here',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'X-Custom-Header': 'custom-value'
  });
  await page.goto('https://example.com/api/data');
  // Your scraping logic here
  const content = await page.content();
  console.log(content);
  await browser.close();
})();
Advanced Example with Multiple Headers
const puppeteer = require('puppeteer');
async function scrapeWithCustomHeaders() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // Set comprehensive custom headers
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'X-Requested-With': 'XMLHttpRequest',
    'X-API-Key': 'your-api-key-here',
    'Referer': 'https://example.com'
  });
  try {
    await page.goto('https://api.example.com/protected-endpoint', {
      waitUntil: 'networkidle2'
    });
    // Chrome typically shows a JSON response inside a <pre> element
    const data = await page.evaluate(() => {
      const pre = document.querySelector('pre');
      return pre ? pre.textContent : null;
    });
    if (data === null) {
      throw new Error('Expected a JSON response rendered in a <pre> element');
    }
    console.log('API Response:', JSON.parse(data));
  } catch (error) {
    console.error('Error:', error);
  } finally {
    // Always close the browser, even if navigation or parsing failed
    await browser.close();
  }
}
scrapeWithCustomHeaders();
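When the endpoint returns JSON, parsing the HTTPResponse that page.goto() resolves with may be sturdier than reading Chrome's <pre> rendering, and it also exposes the status code. The sketch below uses a hypothetical helper name, gotoJson; setExtraHTTPHeaders(), goto(), ok(), status(), and json() are Puppeteer methods. It takes the page as a parameter, so it can also be exercised with a stand-in object.

```javascript
// Sketch: load a JSON endpoint with extra headers and parse the HTTPResponse
// that page.goto() resolves with. `page` is expected to be a Puppeteer Page
// (or any object with the same setExtraHTTPHeaders/goto methods).
async function gotoJson(page, url, headers) {
  // Extra headers apply to every request the page makes from now on
  await page.setExtraHTTPHeaders(headers);
  const response = await page.goto(url, { waitUntil: 'networkidle2' });
  if (response === null) {
    // goto() resolves to null in some cases, e.g. same-document navigations
    throw new Error(`No response received for ${url}`);
  }
  if (!response.ok()) {
    // goto() does not throw for 4xx/5xx statuses, so check explicitly
    throw new Error(`Request to ${url} failed with HTTP ${response.status()}`);
  }
  // Parses the response body as JSON (rejects if it is not valid JSON)
  return response.json();
}
```

With a real page this would be called as const data = await gotoJson(page, 'https://api.example.com/protected-endpoint', { 'X-API-Key': 'your-api-key-here' });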
Method 2: Using Request Interception
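For more granular control, request interception lets you modify headers on a per-request basis. Once interception is enabled, every request is paused until your handler calls request.continue(), request.abort(), or request.respond(), so the handler must resolve each request it receives.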
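const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Enable request interception
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    // Merge extra headers into a copy of this request's headers.
    // Puppeteer reports header names in lowercase, so lowercase keys
    // override existing values instead of adding duplicates.
    // Caution: this attaches the token to every request, including third-party origins.
    const headers = {
      ...request.headers(),
      'authorization': 'Bearer your-dynamic-token',
      'x-custom-header': 'value-for-this-request'
    };
    // Every intercepted request must be continued, aborted, or answered
    request.continue({ headers });
  });
  await page.goto('https://example.com');
  await browser.close();
})();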
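Because the handler above attaches the token to every request, including scripts or images served from other domains, you may want to add credentials only for the origin they belong to. This is a minimal sketch: headersForRequest and its parameters are hypothetical, and it assumes Puppeteer reports request header names in lowercase.

```javascript
// Sketch (hypothetical helper): add an Authorization header only when the
// request goes to the trusted origin, so the token is not leaked elsewhere.
function headersForRequest(requestUrl, currentHeaders, trustedOrigin, token) {
  // Copy so the object returned by request.headers() is not mutated
  const headers = { ...currentHeaders };
  let origin;
  try {
    origin = new URL(requestUrl).origin;
  } catch {
    // new URL() throws on unparseable input; leave such requests unchanged
    return headers;
  }
  if (origin === trustedOrigin) {
    // Lowercase name, matching how Puppeteer reports request headers
    headers['authorization'] = `Bearer ${token}`;
  }
  return headers;
}
```

Inside the 'request' handler you could then call request.continue({ headers: headersForRequest(request.url(), request.headers(), 'https://example.com', token) }), where token is your own credential.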
Conditional Header Setting
const puppeteer = require('puppeteer');
async function scrapeWithConditionalHeaders() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const url = request.url();
    // Work on a copy; header names from request.headers() are lowercase,
    // so lowercase keys replace existing values instead of duplicating them
    const headers = { ...request.headers() };
    // Set different headers based on URL patterns
    if (url.includes('/api/')) {
      headers['authorization'] = 'Bearer api-token';
      headers['content-type'] = 'application/json';
    } else if (url.includes('/images/')) {
      headers['accept'] = 'image/webp,image/apng,image/*,*/*;q=0.8';
    } else {
      headers['user-agent'] = 'Mozilla/5.0 (compatible; CustomBot/1.0)';
    }
    request.continue({ headers });
  });
  await page.goto('https://example.com');
  await browser.close();
}
scrapeWithConditionalHeaders();
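As the list of URL patterns grows, the if/else chain can be replaced by an ordered list of rules. The sketch below is one way to do it, assuming a hypothetical rule shape ({ test, headers }) and a hypothetical factory name, createHeaderInterceptor; the first matching rule wins, and unmatched requests pass through unchanged.

```javascript
// Sketch: build a 'request' handler from an ordered list of URL rules.
// The first rule whose test() returns true contributes its headers.
function createHeaderInterceptor(rules) {
  return (request) => {
    const url = request.url();
    // Copy the current headers instead of mutating Puppeteer's object
    const headers = { ...request.headers() };
    const rule = rules.find((r) => r.test(url));
    if (rule) {
      for (const [name, value] of Object.entries(rule.headers)) {
        // Lowercase names so an override replaces, rather than duplicates,
        // a header that is already present
        headers[name.toLowerCase()] = value;
      }
    }
    // Every intercepted request must be continued, aborted, or answered
    request.continue({ headers });
  };
}

// Example rules (hypothetical values)
const rules = [
  { test: (url) => url.includes('/api/'), headers: { Authorization: 'Bearer api-token' } },
  { test: (url) => /\.(png|jpe?g|webp)(\?|$)/.test(url), headers: { Accept: 'image/webp,image/*,*/*;q=0.8' } },
];
```

Register it after enabling interception: await page.setRequestInterception(true); page.on('request', createHeaderInterceptor(rules));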
Method 3: Setting Headers During Browser Launch
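Chromium has no launch flag for setting arbitrary request headers, but a couple of browser-wide defaults can be set through launch arguments: --user-agent replaces the User-Agent string for every page, and --lang sets the browser locale that Accept-Language is derived from:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    args: [
      // Overrides the User-Agent for every page in this browser
      '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      // Sets the browser locale; it may not change Accept-Language on every
      // platform, in which case set that header with page.setExtraHTTPHeaders()
      '--lang=en-US'
    ]
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();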
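Puppeteer passes the args array to the browser process directly, without a shell, so each flag should be a single array element and its value should not be wrapped in quotes (the quotes would become part of the value). A small builder can keep this tidy. buildLaunchArgs is a hypothetical name, and whether --lang changes Accept-Language on your platform is an assumption worth verifying.

```javascript
// Sketch: assemble Chromium flags for puppeteer.launch({ args }).
// Each flag is one array element; no shell quoting is needed.
function buildLaunchArgs({ userAgent, lang } = {}) {
  const args = [];
  if (userAgent) {
    // Spaces are fine: the whole string is passed as one argument
    args.push(`--user-agent=${userAgent}`);
  }
  if (lang) {
    // Browser locale; its effect on Accept-Language may vary by platform
    args.push(`--lang=${lang}`);
  }
  return args;
}
```

Usage: const browser = await puppeteer.launch({ args: buildLaunchArgs({ userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36', lang: 'en-US' }) });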
Common Use Cases and Examples
Authentication Headers
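// These are alternatives: each setExtraHTTPHeaders() call replaces the previous set
// API Key Authentication
await page.setExtraHTTPHeaders({
  'X-API-Key': 'your-api-key',
  // If ACCESS_TOKEN is unset, this sends the literal string "Bearer undefined"
  'Authorization': 'Bearer ' + process.env.ACCESS_TOKEN
});
// Basic Authentication (base64-encoded "username:password"; encoding, not encryption)
const credentials = Buffer.from('username:password').toString('base64');
await page.setExtraHTTPHeaders({
  'Authorization': 'Basic ' + credentials
});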
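Concatenating 'Bearer ' + process.env.ACCESS_TOKEN silently produces 'Bearer undefined' when the variable is unset. Two small helpers, bearerAuth and basicAuth (hypothetical names), can build these values and fail fast on a missing token.

```javascript
// Sketch: build Authorization header values, failing fast on a missing token
// instead of sending "Bearer undefined".
function bearerAuth(token) {
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('Missing access token');
  }
  return `Bearer ${token}`;
}

function basicAuth(username, password) {
  // Basic auth is base64("username:password"): an encoding, not encryption
  return 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64');
}
```

Usage: await page.setExtraHTTPHeaders({ 'Authorization': bearerAuth(process.env.ACCESS_TOKEN) }); For servers that answer with an HTTP authentication challenge, Puppeteer's page.authenticate({ username, password }) is an alternative to building the Basic header yourself.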
Device and Browser Emulation
// Mobile device headers
// These change request headers only; the viewport and navigator.userAgent stay
// desktop unless you also call page.setUserAgent() or page.emulate()
await page.setExtraHTTPHeaders({
  'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate'
});
// Chrome browser headers
// Extra headers go out with every request, so these navigation-style sec-fetch-*
// values would also accompany image, script, and XHR requests
await page.setExtraHTTPHeaders({
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'sec-ch-ua': '"Google Chrome";v="91", "Chromium";v="91", ";Not A Brand";v="99"',
  'sec-ch-ua-mobile': '?0',
  'sec-fetch-dest': 'document',
  'sec-fetch-mode': 'navigate',
  'sec-fetch-site': 'none'
});
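A User-Agent passed to page.setExtraHTTPHeaders() changes only the network header, while page.setUserAgent() also updates navigator.userAgent, which scripts on the page can read. The sketch below, using the hypothetical name splitUserAgent, pulls the User-Agent out of a header profile so each part can go to the matching Puppeteer method.

```javascript
// Sketch: separate the User-Agent (matched case-insensitively) from the
// rest of a header profile.
function splitUserAgent(profile) {
  let userAgent = null;
  const extraHeaders = {};
  for (const [name, value] of Object.entries(profile)) {
    if (name.toLowerCase() === 'user-agent') {
      userAgent = value;
    } else {
      extraHeaders[name] = value;
    }
  }
  return { userAgent, extraHeaders };
}
```

Usage, where mobileProfile stands for a header object like the mobile example above: const { userAgent, extraHeaders } = splitUserAgent(mobileProfile); if (userAgent) await page.setUserAgent(userAgent); await page.setExtraHTTPHeaders(extraHeaders); For full device emulation, including viewport and touch, page.emulate() with one of Puppeteer's device descriptors covers more than headers alone.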
Content Type and Accept Headers
// JSON API requests
// Content-Type describes a request body; set page-wide, it would also be sent
// on GET requests that have no body
await page.setExtraHTTPHeaders({
  'Content-Type': 'application/json',
  'Accept': 'application/json',
  'Cache-Control': 'no-cache'
});
// Form submission headers
await page.setExtraHTTPHeaders({
  'Content-Type': 'application/x-www-form-urlencoded',
  'Accept': 'text/html,application/xhtml+xml',
  'Origin': 'https://example.com'
});
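Content-Type describes a request body, so it fits best on the one request that actually carries a body. With request interception, request.continue() accepts method, postData, and headers overrides; the sketch below, using the hypothetical helper formPostOverrides, builds those overrides for a URL-encoded form POST.

```javascript
// Sketch: overrides for request.continue() that turn a request into a
// URL-encoded form POST with a matching Content-Type.
function formPostOverrides(fields, headers = {}) {
  const merged = {};
  for (const [name, value] of Object.entries(headers)) {
    // Drop any existing Content-Type, whatever its casing
    if (name.toLowerCase() !== 'content-type') {
      merged[name] = value;
    }
  }
  merged['content-type'] = 'application/x-www-form-urlencoded';
  return {
    method: 'POST',
    // URLSearchParams encodes spaces as '+' and escapes special characters
    postData: new URLSearchParams(fields).toString(),
    headers: merged,
  };
}
```

In a 'request' handler, for example: if (request.url() === 'https://example.com/search') { request.continue(formPostOverrides({ q: 'shoes' }, request.headers())); } else { request.continue(); } The search URL and field name are placeholders.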
Best Practices and Tips
1. Header Consistency
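Ensure your custom headers are consistent with the browser you're trying to emulate. Puppeteer drives a real Chrome, which already sends matching Accept, Accept-Encoding, and Accept-Language headers, so override only what you need. When you do override them, use values a real Chrome would send alongside that User-Agent:
const chromeHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Connection': 'keep-alive',
  'Upgrade-Insecure-Requests': '1'
};
await page.setExtraHTTPHeaders(chromeHeaders);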
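One way to keep headers consistent across tasks is a single base profile with small per-task overrides. Because header names are case-insensitive, the merge below (mergeHeaders, a hypothetical helper) lowercases them so that an override replaces the base value instead of sitting next to it.

```javascript
// Sketch: merge a base header profile with overrides, comparing names
// case-insensitively; later sources win.
function mergeHeaders(base, overrides) {
  const merged = {};
  for (const source of [base, overrides]) {
    for (const [name, value] of Object.entries(source)) {
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}
```

Usage: await page.setExtraHTTPHeaders(mergeHeaders(chromeHeaders, { 'Accept-Language': 'de-DE,de;q=0.9' }));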
2. Dynamic Header Updates
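Update headers dynamically based on application state. Each call to page.setExtraHTTPHeaders() replaces the extra headers set by the previous call rather than adding to them, so pass the complete set every time:
async function updateAuthHeaders(page, newToken, baseHeaders = {}) {
  await page.setExtraHTTPHeaders({
    // Re-send the other headers too; this call replaces the previous set
    ...baseHeaders,
    'Authorization': `Bearer ${newToken}`,
    // Evaluated once, when this function runs; not refreshed per request
    'X-Timestamp': Date.now().toString()
  });
}
// Usage (getNewAccessToken() is a placeholder for your own token refresh logic)
await updateAuthHeaders(page, await getNewAccessToken());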
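Since each page.setExtraHTTPHeaders() call replaces the previous set of extra headers, a helper that remembers the current set and always re-sends all of it avoids dropping headers by accident. createHeaderStore is a hypothetical name; this is a sketch, not a Puppeteer API.

```javascript
// Sketch: keep the full set of extra headers locally and re-send all of it
// on each update. Names are lowercased so 'Authorization' and
// 'authorization' are treated as one header.
function createHeaderStore(page, initial = {}) {
  const current = {};
  const assign = (headers) => {
    for (const [name, value] of Object.entries(headers)) {
      current[name.toLowerCase()] = value;
    }
  };
  assign(initial);
  return {
    async update(changes) {
      assign(changes);
      // Pass a copy so later updates do not mutate what was already sent
      await page.setExtraHTTPHeaders({ ...current });
      return { ...current };
    },
    snapshot() {
      return { ...current };
    },
  };
}
```

Usage: const headers = createHeaderStore(page, { 'X-API-Key': 'your-api-key' }); await headers.update({ 'Authorization': 'Bearer ' + newToken });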
3. Error Handling
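Always handle errors around header setup and navigation. page.setExtraHTTPHeaders() throws if a header value is not a string (for example, undefined), and page.goto() throws on navigation failures such as DNS errors or timeouts. It does not throw for HTTP error statuses like 404 or 500, so check the response status separately:
try {
  // token, customValue, and url are assumed to be defined elsewhere
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer ' + token,
    'X-Custom-Header': customValue // must be a string
  });
  const response = await page.goto(url);
  if (response && !response.ok()) {
    console.warn(`Server responded with HTTP ${response.status()}`);
  }
} catch (error) {
  console.error('Failed to set headers or navigate:', error);
  // Handle the error appropriately
}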
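Puppeteer rejects header values that are not strings, for example a customValue that is undefined or a number. Checking the object first produces an error that names every offending header. assertStringHeaders is a hypothetical helper; note that it cannot catch 'Bearer ' + token when token is undefined, because that expression is already the string 'Bearer undefined'.

```javascript
// Sketch: verify that every header value is a string before handing the
// object to page.setExtraHTTPHeaders(), reporting all offending names.
function assertStringHeaders(headers) {
  const bad = Object.entries(headers)
    .filter(([, value]) => typeof value !== 'string')
    .map(([name, value]) => `${name} (${value === null ? 'null' : typeof value})`);
  if (bad.length > 0) {
    throw new TypeError(`Header values must be strings: ${bad.join(', ')}`);
  }
  return headers;
}
```

Usage: await page.setExtraHTTPHeaders(assertStringHeaders({ 'X-Custom-Header': customValue }));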
Troubleshooting Common Issues
Headers Not Being Applied
If your headers aren't being applied, ensure you're setting them before navigation:
// Correct order
await page.setExtraHTTPHeaders({ 'Authorization': 'Bearer token' });
await page.goto('https://example.com');
// Incorrect order
await page.goto('https://example.com');
await page.setExtraHTTPHeaders({ 'Authorization': 'Bearer token' }); // Too late for this navigation; only later requests get the header
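To confirm which headers actually reach a server, a throwaway local server that echoes the request headers back can help. This sketch uses only Node's built-in http module; startEchoServer is a hypothetical name, and it binds to 127.0.0.1 on a free port chosen by the operating system.

```javascript
const http = require('http');

// Sketch: a local server that replies with the request headers it received,
// as JSON. Node reports incoming header names in lowercase.
function startEchoServer() {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(req.headers));
  });
  return new Promise((resolve) => {
    // Port 0 asks the operating system for any free port
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      resolve({ server, url: `http://127.0.0.1:${port}/` });
    });
  });
}
```

With Puppeteer: const { server, url } = await startEchoServer(); await page.setExtraHTTPHeaders({ 'X-Custom-Header': 'custom-value' }); const response = await page.goto(url); console.log(await response.json()); server.close();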
Case Sensitivity
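HTTP header names are case-insensitive, although a few non-compliant servers may still react to casing. HTTP/2 sends header names in lowercase on the wire, and Puppeteer normalizes extra header names to lowercase, so the spelling you choose is mostly a matter of readability. Pick one spelling per header; listing the same header twice with different casing leaves only one of the values in effect:
// Either spelling works; use one per header
await page.setExtraHTTPHeaders({ 'User-Agent': 'CustomBot/1.0' }); // Title case
// or
await page.setExtraHTTPHeaders({ 'user-agent': 'CustomBot/1.0' }); // Lowercase
// Avoid: { 'User-Agent': 'A', 'user-agent': 'B' } collapses into a single header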
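Before calling page.setExtraHTTPHeaders(), you can check a header object for names that differ only by case, which would otherwise collapse into a single entry. findCaseConflicts is a hypothetical helper.

```javascript
// Sketch: group header names by their lowercase form and return every
// group that contains more than one spelling.
function findCaseConflicts(headers) {
  const byLower = new Map();
  for (const name of Object.keys(headers)) {
    const key = name.toLowerCase();
    byLower.set(key, [...(byLower.get(key) || []), name]);
  }
  return [...byLower.values()].filter((names) => names.length > 1);
}
```

An empty array means every header name is unique regardless of case.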
Integration with Other Tools
If you need cross-browser coverage, Playwright offers comparable header settings, including page.setExtraHTTPHeaders() and the extraHTTPHeaders option of browser.newContext(). Additionally, understanding how to handle cookies and sessions can complement your header management strategy.
Conclusion
Setting custom headers in Puppeteer is crucial for successful web scraping and automation. Whether you're dealing with authentication, API access, or browser emulation, the methods outlined in this guide provide flexible solutions for various scenarios. Remember to always test your header configurations thoroughly and implement proper error handling for production applications.
The key is to choose the right method based on your specific needs: use setExtraHTTPHeaders() for simple, page-wide header settings, and request interception for more complex, conditional header management.