How to Set Custom Headers in Puppeteer Requests

Setting custom headers in Puppeteer requests is essential for various web scraping scenarios, including authentication, API access, mobile device emulation, and bypassing certain restrictions. This guide provides comprehensive methods to configure custom headers in your Puppeteer applications.

Understanding HTTP Headers in Puppeteer

HTTP headers are key-value pairs sent with HTTP requests that provide additional information about the request or the client. In web scraping, custom headers help you:

  • Authenticate with APIs or protected resources
  • Mimic different browsers or devices
  • Pass additional metadata to servers
  • Bypass basic bot detection mechanisms

Method 1: Setting Headers Using page.setExtraHTTPHeaders()

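The most common way to set custom headers in Puppeteer is the page.setExtraHTTPHeaders() method. It applies to every subsequent request the page initiates, including subresources such as images and scripts and requests to third-party hosts, so be careful with secrets when the page loads content from other origins. Header values must be strings, and Puppeteer does not guarantee the order of headers in outgoing requests.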

Basic Implementation

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set custom headers
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer your-token-here',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/json',
    'X-Custom-Header': 'custom-value'
  });

  await page.goto('https://example.com/api/data');

  // Your scraping logic here
  const content = await page.content();
  console.log(content);

  await browser.close();
})();
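
One way to check that the extra headers are actually attached is to listen for the page's request events and print the headers Puppeteer reports. The sketch below is only an illustration: logSentHeaders is a hypothetical helper name, and it expects a page that was created as in the example above. Puppeteer reports header names in lower case, so the lookup lower-cases the requested names as well.

```javascript
// Hypothetical helper: logs selected headers for every outgoing request.
// `page` is a Puppeteer Page (anything exposing on('request', handler) works).
function logSentHeaders(page, names, log = console.log) {
  const wanted = names.map((name) => name.toLowerCase());

  page.on('request', (request) => {
    const sent = request.headers(); // header names are lower-case here
    const picked = {};
    for (const name of wanted) {
      if (Object.prototype.hasOwnProperty.call(sent, name)) {
        picked[name] = sent[name];
      }
    }
    log(request.url(), picked);
  });
}

// Usage (inside an async function, before page.goto):
//   logSentHeaders(page, ['Authorization', 'X-Custom-Header']);
```

Registering the listener before page.goto() matters, because requests issued earlier are not replayed to new listeners.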

Advanced Example with Multiple Headers

const puppeteer = require('puppeteer');

async function scrapeWithCustomHeaders() {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Set comprehensive custom headers
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'X-Requested-With': 'XMLHttpRequest',
    'X-API-Key': 'your-api-key-here',
    'Referer': 'https://example.com'
  });

  try {
    await page.goto('https://api.example.com/protected-endpoint', {
      waitUntil: 'networkidle2'
    });

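    // Chrome usually renders a raw JSON response inside a <pre>;
    // fall back to the body text if that element is missing
    const data = await page.evaluate(() => {
      const pre = document.querySelector('pre');
      return (pre || document.body).textContent;
    });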

    console.log('API Response:', JSON.parse(data));
  } catch (error) {
    console.error('Error:', error);
  }

  await browser.close();
}

scrapeWithCustomHeaders();

Method 2: Using Request Interception

For more granular control over headers, you can use request interception to modify headers on a per-request basis.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Enable request interception
  await page.setRequestInterception(true);

  page.on('request', (request) => {
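    // request.headers() returns lower-case names, so use lower-case keys
    // to override existing values instead of adding differently cased duplicates
    const headers = Object.assign({}, request.headers(), {
      'authorization': 'Bearer your-dynamic-token',
      'x-custom-header': 'value-for-this-request'
    });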

    request.continue({
      headers: headers
    });
  });

  await page.goto('https://example.com');
  await browser.close();
})();
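
Because the request handler runs for every request the page makes, including third-party scripts and images, a token added there is sent to every host. A minimal sketch of limiting it to a single origin follows. The names headersForRequest and allowedOrigin are illustrative and not part of Puppeteer.

```javascript
// Hypothetical helper: returns the headers to send for one intercepted request.
// The token is added only when the request targets `allowedOrigin`.
function headersForRequest(requestUrl, currentHeaders, allowedOrigin, token) {
  const headers = { ...currentHeaders };
  if (new URL(requestUrl).origin === allowedOrigin) {
    // Lower-case key, matching the names returned by request.headers()
    headers['authorization'] = `Bearer ${token}`;
  }
  return headers;
}

// Usage inside a request handler (interception already enabled):
//   page.on('request', (request) => {
//     request.continue({
//       headers: headersForRequest(request.url(), request.headers(),
//                                  'https://example.com', token),
//     });
//   });
```

Keeping the decision in a plain function makes it easy to test without launching a browser.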

Conditional Header Setting

const puppeteer = require('puppeteer');

async function scrapeWithConditionalHeaders() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
    const url = request.url();
    // Copy the headers; names returned by request.headers() are lower-case,
    // so lower-case keys are used below to replace values rather than duplicate them
    const headers = { ...request.headers() };

    // Set different headers based on URL patterns
    if (url.includes('/api/')) {
      headers['authorization'] = 'Bearer api-token';
      headers['content-type'] = 'application/json';
    } else if (url.includes('/images/')) {
      headers['accept'] = 'image/webp,image/apng,image/*,*/*;q=0.8';
    } else {
      headers['user-agent'] = 'Mozilla/5.0 (compatible; CustomBot/1.0)';
    }

    request.continue({ headers });
  });

  await page.goto('https://example.com');
  await browser.close();
}

scrapeWithConditionalHeaders();
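
As the number of URL patterns grows, the if/else chain can be replaced with a small rule table. The following is a sketch under assumptions: headerRules and applyHeaderRules are made-up names, and the rules match on the URL path prefix rather than url.includes(), which avoids accidental matches such as "/api/" appearing deeper in a path or in a query string.

```javascript
// Illustrative rule table: the first matching rule wins.
const headerRules = [
  {
    test: (url) => url.pathname.startsWith('/api/'),
    headers: { authorization: 'Bearer api-token' },
  },
  {
    test: (url) => url.pathname.startsWith('/images/'),
    headers: { accept: 'image/webp,image/apng,image/*,*/*;q=0.8' },
  },
];

// Returns a new headers object; the input is never mutated.
function applyHeaderRules(requestUrl, currentHeaders, rules = headerRules) {
  const url = new URL(requestUrl);
  const match = rules.find((rule) => rule.test(url));
  return match ? { ...currentHeaders, ...match.headers } : { ...currentHeaders };
}

// Usage inside a request handler:
//   request.continue({ headers: applyHeaderRules(request.url(), request.headers()) });
```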

Method 3: Setting Headers During Browser Launch

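You can also change some browser-wide defaults at launch through Chromium command-line flags. This covers the User-Agent and the browser language only; arbitrary headers such as Authorization cannot be set this way: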

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      '--lang=en-US' // browser locale; Chromium derives its default Accept-Language from it
    ]
  });

  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();
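
Launch flags apply to the whole browser. For a per-page alternative, Puppeteer's page.setUserAgent() changes both the User-Agent header for that page and the value scripts read from navigator.userAgent. The sketch below pairs it with an Accept-Language header. applyIdentity is a hypothetical helper name, and the string form of setUserAgent() is assumed.

```javascript
// Hypothetical helper: applies a User-Agent and language to a single page.
async function applyIdentity(page, { userAgent, acceptLanguage } = {}) {
  if (userAgent) {
    // Affects the request header and navigator.userAgent for this page
    await page.setUserAgent(userAgent);
  }
  if (acceptLanguage) {
    // Note: this call replaces any extra headers set earlier on the page
    await page.setExtraHTTPHeaders({ 'Accept-Language': acceptLanguage });
  }
}

// Usage (inside an async function, before page.goto):
//   await applyIdentity(page, {
//     userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
//     acceptLanguage: 'en-US,en;q=0.9',
//   });
```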

Common Use Cases and Examples

Authentication Headers

// API Key Authentication
await page.setExtraHTTPHeaders({
  'X-API-Key': 'your-api-key',
  'Authorization': 'Bearer ' + process.env.ACCESS_TOKEN
});

// Basic Authentication
const credentials = Buffer.from('username:password').toString('base64');
await page.setExtraHTTPHeaders({
  'Authorization': 'Basic ' + credentials
});
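
The Basic scheme shown above can be wrapped in a small helper so the encoding lives in one place. This is a sketch, and basicAuthHeader is an illustrative name. For servers that answer with an HTTP authentication challenge, Puppeteer also offers page.authenticate({ username, password }), which may be a simpler fit than a hand-built header.

```javascript
// Hypothetical helper: builds an Authorization header for HTTP Basic auth.
function basicAuthHeader(username, password) {
  // The Basic scheme joins the two parts with ":", so the username cannot contain one
  if (username.includes(':')) {
    throw new Error('Basic auth usernames cannot contain ":"');
  }
  const encoded = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return { Authorization: `Basic ${encoded}` };
}

// Usage (inside an async function):
//   await page.setExtraHTTPHeaders(
//     basicAuthHeader(process.env.SCRAPER_USER, process.env.SCRAPER_PASS)
//   );
```

The environment variable names in the usage comment are placeholders; reading credentials from the environment simply keeps them out of source control.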

Device and Browser Emulation

// Mobile device headers (this changes only the request headers;
// the viewport and navigator.userAgent stay as they were)
await page.setExtraHTTPHeaders({
  'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate'
});

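// Chrome browser headers (note: the sec-fetch-* values below describe a top-level
// navigation, so they do not fit the images, scripts, and XHR calls the page also makes)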
await page.setExtraHTTPHeaders({
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'sec-ch-ua': '"Google Chrome";v="91", "Chromium";v="91", ";Not A Brand";v="99"',
  'sec-ch-ua-mobile': '?0',
  'sec-fetch-dest': 'document',
  'sec-fetch-mode': 'navigate',
  'sec-fetch-site': 'none'
});

Content Type and Accept Headers

// JSON API requests
await page.setExtraHTTPHeaders({
  'Content-Type': 'application/json',
  'Accept': 'application/json',
  'Cache-Control': 'no-cache'
});

// Form submission headers
await page.setExtraHTTPHeaders({
  'Content-Type': 'application/x-www-form-urlencoded',
  'Accept': 'text/html,application/xhtml+xml',
  'Origin': 'https://example.com'
});

Best Practices and Tips

1. Header Consistency

Ensure your custom headers are consistent with the browser you're trying to emulate:

const chromeHeaders = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate',
  'Connection': 'keep-alive',
  'Upgrade-Insecure-Requests': '1'
};

await page.setExtraHTTPHeaders(chromeHeaders);

2. Dynamic Header Updates

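Update headers dynamically based on application state. Each call to setExtraHTTPHeaders() replaces the previously set extra headers rather than merging with them, so pass the full set every time: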

async function updateAuthHeaders(page, newToken) {
  await page.setExtraHTTPHeaders({
    'Authorization': `Bearer ${newToken}`,
    'X-Timestamp': Date.now().toString()
  });
}

// Usage
await updateAuthHeaders(page, await getNewAccessToken());
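
Since each setExtraHTTPHeaders() call replaces the earlier set, refreshing only the token would drop any other custom headers. One possible way around this is to keep the full set in a small store and resend it on every change. The sketch below assumes a hypothetical createHeaderStore helper and is not part of Puppeteer.

```javascript
// Hypothetical helper: remembers the full header set and reapplies it on each update.
function createHeaderStore(page, baseHeaders = {}) {
  let current = { ...baseHeaders };

  return {
    // Merges `changes` into the remembered set and sends the whole set to the page
    async update(changes) {
      current = { ...current, ...changes };
      await page.setExtraHTTPHeaders({ ...current });
      return { ...current };
    },
  };
}

// Usage (inside an async function):
//   const headers = createHeaderStore(page, { 'X-API-Key': 'your-api-key' });
//   await headers.update({ Authorization: `Bearer ${newToken}` });
```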

3. Error Handling

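Wrap header setup and navigation in error handling: setExtraHTTPHeaders() rejects header values that are not strings, and page.goto() can fail on timeouts or network errors: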

try {
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer ' + token,
    'X-Custom-Header': customValue
  });

  await page.goto(url);
} catch (error) {
  console.error('Failed to set headers or navigate:', error);
  // Handle the error appropriately
}
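
A common source of errors here is a header value that is undefined or not a string, for example a token that was never loaded. A small validation step before calling setExtraHTTPHeaders() can surface that earlier with a clearer message. toHeaderStrings below is a hypothetical helper, sketched under the assumption that numbers and booleans should be converted to strings rather than rejected.

```javascript
// Hypothetical helper: fails fast on missing values and converts the rest to strings.
function toHeaderStrings(headers) {
  const result = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined || value === null) {
      throw new TypeError(`Header "${name}" has no value`);
    }
    result[name] = String(value);
  }
  return result;
}

// Usage (inside the try block):
//   await page.setExtraHTTPHeaders(toHeaderStrings({
//     'Authorization': 'Bearer ' + token,
//     'X-Retry-Count': 3,
//   }));
```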

Troubleshooting Common Issues

Headers Not Being Applied

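If your headers aren't being applied, make sure you set them before navigating. Headers set after page.goto() affect only later requests, not the navigation that has already completed: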

// Correct order
await page.setExtraHTTPHeaders({ 'Authorization': 'Bearer token' });
await page.goto('https://example.com');

// Incorrect order
await page.goto('https://example.com');
await page.setExtraHTTPHeaders({ 'Authorization': 'Bearer token' }); // Too late!

Case Sensitivity

HTTP header names are case-insensitive, and Puppeteer lower-cases the names you pass to setExtraHTTPHeaders(). List each header once; writing the same header under two casings creates two object keys that collide after lower-casing:

// Pick one casing and use it consistently
await page.setExtraHTTPHeaders({
  'User-Agent': 'CustomBot/1.0'
});
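
When header objects are assembled from several sources, the same header can slip in twice under different casings. A possible guard is to normalize the names yourself and fail on collisions. normalizeHeaderNames is an illustrative name, not a Puppeteer API.

```javascript
// Hypothetical helper: lower-cases header names and rejects case-only duplicates.
function normalizeHeaderNames(headers) {
  const seen = new Set();
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (seen.has(key)) {
      throw new Error(`Duplicate header after lower-casing: ${key}`);
    }
    seen.add(key);
    normalized[key] = value;
  }
  return normalized;
}

// Usage (inside an async function):
//   await page.setExtraHTTPHeaders(normalizeHeaderNames({
//     'User-Agent': 'CustomBot/1.0',
//     'Accept': 'application/json',
//   }));
```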

Integration with Other Tools

When working with headless browser automation, you might also want to explore similar header setting capabilities in Playwright for cross-browser compatibility. Additionally, understanding how to handle cookies and sessions can complement your header management strategy.

Conclusion

Setting custom headers in Puppeteer is crucial for successful web scraping and automation. Whether you're dealing with authentication, API access, or browser emulation, the methods outlined in this guide provide flexible solutions for various scenarios. Remember to always test your header configurations thoroughly and implement proper error handling for production applications.

The key is to choose the right method based on your specific needs: use setExtraHTTPHeaders() for simple, page-wide header settings, and request interception for more complex, conditional header management.
