How to Intercept and Modify HTTP requests in Puppeteer

Intercepting and modifying HTTP requests in Puppeteer lets you control network traffic: you can modify request headers, block certain resources, rewrite requests to different endpoints, or answer them with mocked responses. This is useful for web scraping, testing, and automation scenarios where you need fine-grained control over network interactions.

Understanding Request Interception

Request interception is enabled per page with page.setRequestInterception(true). From then on, every request the page makes is paused and emitted as a 'request' event, and it stays paused until your handler resolves it with request.continue(), request.abort(), or request.respond(). This lets you examine, modify, block, or answer each request before it reaches the server.
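Since each paused request has to be resolved exactly once, one way to keep handlers manageable is to separate the decision (what should happen to this request?) from the side effect (calling continue(), abort(), or respond()). The sketch below is a minimal illustration of that idea; createDispatcher and blockImages are hypothetical names, not Puppeteer APIs, and the only request methods it relies on are the three resolution methods plus resourceType(). Because it touches nothing else, it can also be exercised with a stub object instead of a real browser.

```javascript
// Hypothetical helper (not a Puppeteer API): turns a pure decision function
// into a 'request' listener that resolves each request exactly once.
function createDispatcher(decide) {
  return (request) => {
    // decide() returns { action: 'continue' | 'abort' | 'respond', ... };
    // returning nothing means "continue unchanged".
    const decision = decide(request) || { action: 'continue' };
    switch (decision.action) {
      case 'abort':
        return request.abort(decision.errorCode);
      case 'respond':
        return request.respond(decision.response);
      case 'continue':
      default:
        return request.continue(decision.overrides);
    }
  };
}

// Example decision function: block images, pass everything else through.
function blockImages(request) {
  if (request.resourceType() === 'image') {
    return { action: 'abort' };
  }
  return { action: 'continue' };
}

// Usage with Puppeteer (after page.setRequestInterception(true)):
// page.on('request', createDispatcher(blockImages));
```

Keeping decide() free of side effects means the blocking rules can be tested as plain functions, while the dispatcher guarantees a single resolution call per request.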

Basic Request Interception Setup

To start intercepting requests, you need to enable request interception on the page:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Enable request interception
  await page.setRequestInterception(true);

  // Listen for requests
  page.on('request', (request) => {
    console.log('Request URL:', request.url());
    console.log('Request method:', request.method());
    console.log('Request headers:', request.headers());

    // Continue with the original request
    request.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();

Modifying Request Headers

You can modify request headers to add authentication tokens, change user agents, or add custom headers:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
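    // request.headers() reports names in lower case, so use lower-case keys
    // to override existing values rather than add differently-cased copies.
    // Note: this sends the token with every request, including third-party ones.
    const headers = {
      ...request.headers(),
      'authorization': 'Bearer your-token-here',
      'x-custom-header': 'custom-value',
      'user-agent': 'CustomBot/1.0'
    };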

    request.continue({ headers });
  });

  await page.goto('https://api.example.com');
  await browser.close();
})();
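Puppeteer documents that request.headers() returns all header names in lower case. Spreading those headers and then adding 'User-Agent' therefore leaves the object with both a user-agent and a User-Agent key. A small merge helper that lowercases the names of your additions avoids that. mergeHeaders is a hypothetical helper; since HTTP header names are case-insensitive, lowercasing them does not change their meaning.

```javascript
// Hypothetical helper: merge extra headers into the originals, normalizing
// every name to lower case so an override replaces the existing value
// instead of creating a second, differently-cased key.
function mergeHeaders(original, extra) {
  const merged = {};
  for (const [name, value] of Object.entries(original)) {
    merged[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(extra)) {
    // Header values passed to request.continue() should be strings.
    merged[name.toLowerCase()] = String(value);
  }
  return merged;
}

// Usage inside a 'request' listener:
// request.continue({
//   headers: mergeHeaders(request.headers(), { 'User-Agent': 'CustomBot/1.0' }),
// });
```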

Blocking Specific Resources

Block unnecessary resources like images, stylesheets, or ads to improve performance:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
    const resourceType = request.resourceType();
    const url = request.url();

    // Block images, stylesheets, and fonts
    if (['image', 'stylesheet', 'font'].includes(resourceType)) {
      request.abort();
      return;
    }

    // Block specific domains (e.g., ads, analytics)
    if (url.includes('google-analytics.com') || 
        url.includes('doubleclick.net') || 
        url.includes('facebook.com/tr')) {
      request.abort();
      return;
    }

    request.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();
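Substring checks such as url.includes('doubleclick.net') also match URLs that merely mention a tracker in their query string. The sketch below shows a slightly stricter approach, assuming you are happy to describe blocked URLs as regular expressions anchored on the host. createBlocker and shouldBlock are hypothetical names, and the pattern list simply mirrors the domains used above.

```javascript
// Hypothetical helper: build a predicate from a list of resource types and
// URL regular expressions, so the blocking rules live in one place.
function createBlocker({ resourceTypes = [], urlPatterns = [] }) {
  const types = new Set(resourceTypes);
  return (request) =>
    types.has(request.resourceType()) ||
    urlPatterns.some((pattern) => pattern.test(request.url()));
}

const shouldBlock = createBlocker({
  resourceTypes: ['image', 'stylesheet', 'font'],
  // Anchored on the host part (optionally with subdomains), so a tracker
  // name inside a query string (e.g. ?ref=doubleclick.net) does not match.
  urlPatterns: [
    /^https?:\/\/([^/]+\.)?google-analytics\.com\//,
    /^https?:\/\/([^/]+\.)?doubleclick\.net\//,
    /^https?:\/\/([^/]+\.)?facebook\.com\/tr/,
  ],
});

// Usage:
// page.on('request', (request) =>
//   shouldBlock(request) ? request.abort() : request.continue());
```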

Modifying Request URLs and Methods

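You can rewrite a request's URL or change its HTTP method before it is sent. Note that a URL override is not an HTTP redirect: the request is silently forwarded, and the page still sees the original URL.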

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
    const url = request.url();

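    // Send API calls to a local mock server instead. The rewritten URL keeps
    // the original scheme (https://), so the server on port 3000 must speak HTTPS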
    if (url.includes('api.example.com')) {
      const newUrl = url.replace('api.example.com', 'localhost:3000');
      request.continue({ url: newUrl });
      return;
    }

    // Change GET requests to POST for specific endpoints
    if (url.includes('/search') && request.method() === 'GET') {
      request.continue({
        method: 'POST',
        headers: {
          ...request.headers(),
          'Content-Type': 'application/json'
        },
        postData: JSON.stringify({ query: 'modified search' })
      });
      return;
    }

    request.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();
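If you route traffic to a mock server often, parsing the URL is more reliable than string replacement, because it lets you change the scheme, host and port independently while keeping the path and query string. rewriteHost below is a hypothetical helper, and http://localhost:3000 is only an assumed mock server address. Whether Chromium accepts a URL override that changes the scheme (https to http) may depend on the browser version; if it does not, serve the mock over HTTPS or answer the request with request.respond() instead.

```javascript
// Hypothetical helper: rewrite a URL so that requests for one host go to a
// mock server. Path, query string and hash are preserved; only the scheme,
// hostname and port are replaced. Returns null for other hosts.
function rewriteHost(originalUrl, fromHost, toOrigin) {
  const url = new URL(originalUrl);
  if (url.hostname !== fromHost) {
    return null; // not a request we want to rewrite
  }
  const target = new URL(toOrigin);
  url.protocol = target.protocol;
  url.hostname = target.hostname;
  url.port = target.port; // empty string clears any existing port
  return url.toString();
}

// Usage inside a 'request' listener:
// const mockUrl = rewriteHost(request.url(), 'api.example.com', 'http://localhost:3000');
// if (mockUrl) return request.continue({ url: mockUrl });
// request.continue();
```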

Modifying POST Data

Intercept and modify POST request data:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
    if (request.method() === 'POST' && request.url().includes('/api/login')) {
      const postData = request.postData();

      if (postData) {
        try {
          const data = JSON.parse(postData);
          // Modify the login data
          data.username = 'modified_username';
          data.additional_field = 'injected_value';

          request.continue({
            postData: JSON.stringify(data),
            headers: {
              ...request.headers(),
              'content-type': 'application/json' // lower case, matching request.headers()
            }
          });
          return;
        } catch (e) {
          console.error('Error parsing POST data:', e);
        }
      }
    }

    request.continue();
  });

  await page.goto('https://example.com/login');
  await browser.close();
})();
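The parse, patch, and re-serialize steps above can be pulled into a pure function that is easy to test. patchJsonBody is a hypothetical helper. It only handles JSON bodies; many login forms submit application/x-www-form-urlencoded data instead, and those would need URLSearchParams rather than JSON.parse.

```javascript
// Hypothetical helper: apply a patch to a JSON request body. Returns the new
// body string, or null when the body is missing, not valid JSON, or not a
// plain object, so the caller can fall back to request.continue().
function patchJsonBody(postData, patch) {
  if (!postData) return null;
  let data;
  try {
    data = JSON.parse(postData);
  } catch {
    return null;
  }
  // Only patch plain objects; arrays and primitives are left untouched.
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    return null;
  }
  return JSON.stringify({ ...data, ...patch });
}

// Usage inside a 'request' listener:
// const body = patchJsonBody(request.postData(), { additional_field: 'injected_value' });
// body ? request.continue({ postData: body }) : request.continue();
```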

Advanced Request Interception with Response Mocking

You can also mock responses by intercepting requests and providing custom responses:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', (request) => {
    if (request.url().includes('/api/data')) {
      // Mock the API response
      request.respond({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({
          success: true,
          data: [
            { id: 1, name: 'Mocked Item 1' },
            { id: 2, name: 'Mocked Item 2' }
          ]
        })
      });
      return;
    }

    request.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();
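When several endpoints need mocking, a small fixture table keeps the handler short. The sketch below assumes the mocked URLs can be identified by a path fragment; jsonResponse, fixtures and findFixture are hypothetical names. The access-control-allow-origin header is an assumption that only matters when the page calls the mocked endpoint from another origin, and a CORS preflight (OPTIONS) request would need its own handling.

```javascript
// Hypothetical helper: build the object passed to request.respond() for a
// JSON fixture. The permissive CORS header is an assumption (see above).
function jsonResponse(body, status = 200) {
  return {
    status,
    contentType: 'application/json',
    headers: { 'access-control-allow-origin': '*' },
    body: JSON.stringify(body),
  };
}

// Hypothetical fixture table: URL fragment -> response payload.
const fixtures = [
  ['/api/data', jsonResponse({ success: true, data: [] })],
  ['/api/broken', jsonResponse({ error: 'Simulated failure' }, 500)],
];

// Return the first fixture whose fragment appears in the URL, or null.
function findFixture(url) {
  const match = fixtures.find(([fragment]) => url.includes(fragment));
  return match ? match[1] : null;
}

// Usage inside a 'request' listener:
// const fixture = findFixture(request.url());
// fixture ? request.respond(fixture) : request.continue();
```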

Logging and Debugging Requests

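Record every request and response for debugging. The 'request' and 'response' events fire even without interception, so if you only need to observe traffic you can drop setRequestInterception(true) along with the request.continue() call (calling continue() while interception is disabled throws an error):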

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  const requestLog = [];

  page.on('request', (request) => {
    const requestData = {
      url: request.url(),
      method: request.method(),
      headers: request.headers(),
      postData: request.postData(),
      timestamp: new Date().toISOString()
    };

    requestLog.push(requestData);
    console.log(`${request.method()} ${request.url()}`);

    request.continue();
  });

  page.on('response', (response) => {
    console.log(`Response: ${response.status()} ${response.url()}`);
  });

  await page.goto('https://example.com');

  // Save request log to file
  fs.writeFileSync('request_log.json', JSON.stringify(requestLog, null, 2));

  await browser.close();
})();
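To see how long each request took and what status it returned, you can pair each request with its response. In Puppeteer, response.request() returns the HTTPRequest the response belongs to, which should be the same object passed to the 'request' listener, so it can serve as a map key. createRequestLogger is a hypothetical helper; the injectable clock exists only to make it testable, and with interception enabled the measured time also includes how long the request sat paused in your handler.

```javascript
// Hypothetical helper: one log entry per request; the status and elapsed time
// are filled in when the matching response arrives. Requests that never get a
// response (for example aborted ones) keep status and durationMs as null.
function createRequestLogger(now = () => Date.now()) {
  const pending = new Map(); // request object -> log entry
  const entries = [];
  return {
    entries,
    onRequest(request) {
      const entry = {
        url: request.url(),
        method: request.method(),
        startedAt: now(),
        status: null,
        durationMs: null,
      };
      pending.set(request, entry);
      entries.push(entry);
    },
    onResponse(response) {
      const request = response.request();
      const entry = pending.get(request);
      if (!entry) return; // response for a request we never saw
      entry.status = response.status();
      entry.durationMs = now() - entry.startedAt;
      pending.delete(request);
    },
  };
}

// Usage:
// const logger = createRequestLogger();
// page.on('request', (request) => logger.onRequest(request));
// page.on('response', (response) => logger.onResponse(response));
```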

Handling Authentication and Sessions

Intercept requests to add authentication tokens or manage sessions:

const puppeteer = require('puppeteer');

class AuthenticatedScraper {
  constructor() {
    this.authToken = null;
  }

  async login(page, username, password) {
    // Perform login and extract token
    await page.goto('https://example.com/login');
    // ... login logic
    this.authToken = 'extracted-token';
  }

  async setupInterception(page) {
    await page.setRequestInterception(true);

    page.on('request', (request) => {
      const url = request.url();

      // Add authentication to API requests
      if (url.includes('/api/') && this.authToken) {
        const headers = {
          ...request.headers(),
          'Authorization': `Bearer ${this.authToken}`
        };

        request.continue({ headers });
        return;
      }

      request.continue();
    });
  }

  async scrape() {
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();

      await this.login(page, 'username', 'password');
      await this.setupInterception(page);

      // Now all API requests will include authentication
      await page.goto('https://example.com/protected-page');
    } finally {
      // Close the browser even if login or navigation throws
      await browser.close();
    }
  }
}

const scraper = new AuthenticatedScraper();
scraper.scrape().catch(console.error);
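A check like url.includes('/api/') would also attach the token to third-party requests whose paths happen to contain /api/, such as a CDN or analytics script. A stricter check compares the origin as well. shouldAttachAuth is a hypothetical helper, and the API origin you pass in is an assumption about where your protected endpoints live.

```javascript
// Hypothetical helper: decide whether a request should carry the bearer token.
// The token is only attached to same-origin API calls, so it is not sent to
// third-party scripts, CDNs or analytics endpoints loaded by the page.
function shouldAttachAuth(requestUrl, apiOrigin, pathPrefix = '/api/') {
  let url;
  try {
    url = new URL(requestUrl);
  } catch {
    return false; // unparseable URLs never get credentials
  }
  return url.origin === apiOrigin && url.pathname.startsWith(pathPrefix);
}

// Usage inside setupInterception() (the origin value is an assumption):
// if (this.authToken && shouldAttachAuth(request.url(), 'https://example.com')) {
//   request.continue({ headers: { ...request.headers(), authorization: `Bearer ${this.authToken}` } });
//   return;
// }
```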

Performance Optimization

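The handler runs for every request, and each request stays paused until the handler resolves it, so keep the checks cheap. Here the hostname is parsed once and looked up in a Set instead of scanning the full URL for many substrings: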

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  // Create a set of blocked domains for faster lookup
  const blockedDomains = new Set([
    'google-analytics.com',
    'googletagmanager.com',
    'doubleclick.net',
    'facebook.com',
    'twitter.com'
  ]);

  page.on('request', (request) => {
    const url = new URL(request.url());

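    // Check the hostname and each parent domain, so that subdomains such as
    // www.google-analytics.com or static.doubleclick.net are blocked too
    const labels = url.hostname.split('.');
    const isBlocked = labels.some((_, i) =>
      blockedDomains.has(labels.slice(i).join('.'))
    );
    if (isBlocked) {
      request.abort();
      return;
    }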

    // Block non-essential resources
    const resourceType = request.resourceType();
    if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
      request.abort();
      return;
    }

    request.continue();
  });

  await page.goto('https://example.com');
  await browser.close();
})();

Python Implementation with Pyppeteer

For Python developers, here's how to implement request interception using Pyppeteer:

import asyncio
import json
from pyppeteer import launch

async def intercept_requests():
    browser = await launch()
    page = await browser.newPage()

    # Enable request interception
    await page.setRequestInterception(True)

    async def handle_request(request):
        # Log request details
        print(f"Request: {request.method} {request.url}")

        # Block images and stylesheets before doing any other work
        if request.resourceType in ['image', 'stylesheet']:
            await request.abort()
            return

        # Modify headers
        headers = request.headers.copy()
        headers['user-agent'] = 'Python-Scraper/1.0'
        headers['x-custom-header'] = 'modified-by-python'

        # Continue with modified headers
        await request.continue_({'headers': headers})

    # Schedule the async handler as a task so it runs on the event loop
    page.on('request', lambda req: asyncio.ensure_future(handle_request(req)))

    await page.goto('https://example.com')
    await browser.close()

# Run the async function
asyncio.run(intercept_requests())

Error Handling and Fallbacks

Wrap the handler body in try/catch so that an unexpected error still leaves the request resolved instead of stalled:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.setRequestInterception(true);

  page.on('request', async (request) => {
    try {
      const url = request.url();

      // Attempt to modify request
      if (url.includes('/api/')) {
        const headers = {
          ...request.headers(),
          'X-API-Key': 'your-api-key'
        };

        await request.continue({ headers });
      } else {
        await request.continue();
      }
    } catch (error) {
      console.error('Request interception error:', error);

      // Fallback: continue with original request
      try {
        await request.continue();
      } catch (fallbackError) {
        console.error('Fallback continue failed:', fallbackError);
      }
    }
  });

  await page.goto('https://example.com');
  await browser.close();
})();
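One gap in a fallback like this: if the handler already resolved the request and then threw, calling continue() again fails with another error, because Puppeteer refuses to handle a request twice. A wrapper can track whether the handler called continue(), abort(), or respond() and only fall back when it did not. withFallback is a hypothetical helper; it uses a Proxy so that the real request's methods are still invoked on the original object.

```javascript
// Hypothetical wrapper: run a (possibly async) request handler and, if it
// throws before resolving the request, continue the request unchanged.
function withFallback(handler) {
  return async (request) => {
    let resolved = false;
    const resolving = new Set(['continue', 'abort', 'respond']);
    // Proxy that forwards everything to the real request but records
    // whether one of the resolution methods has been called.
    const guarded = new Proxy(request, {
      get(target, prop) {
        if (resolving.has(prop)) {
          return (...args) => {
            resolved = true;
            return target[prop](...args);
          };
        }
        const value = Reflect.get(target, prop, target);
        return typeof value === 'function' ? value.bind(target) : value;
      },
    });
    try {
      await handler(guarded);
    } catch (error) {
      console.error('Request handler failed:', error.message);
      if (!resolved) {
        try {
          await request.continue();
        } catch (fallbackError) {
          console.error('Fallback continue failed:', fallbackError.message);
        }
      }
    }
  };
}

// Usage:
// page.on('request', withFallback(async (request) => { /* your logic */ }));
```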

Testing Request Interception

Write an automated test that checks your interception logic end to end. This example needs network access, since it loads https://httpbin.org/get:

const puppeteer = require('puppeteer');
const assert = require('assert');

async function testRequestInterception() {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);

    const interceptedRequests = [];

    page.on('request', (request) => {
      interceptedRequests.push({
        url: request.url(),
        method: request.method(),
        headers: request.headers()
      });

      // Add custom header
      const headers = {
        ...request.headers(),
        'x-test-header': 'test-value'
      };

      request.continue({ headers });
    });

    const response = await page.goto('https://httpbin.org/get');

    // Verify that requests were intercepted
    assert(interceptedRequests.length > 0, 'No requests were intercepted');

    // Verify that the main request was intercepted
    const mainRequest = interceptedRequests.find(req =>
      req.url.includes('httpbin.org/get')
    );
    assert(mainRequest, 'Main request not found');

    // Verify that the modified header reached the server:
    // httpbin.org/get echoes the headers it received back as JSON
    const body = await response.json();
    assert.strictEqual(body.headers['X-Test-Header'], 'test-value');

    console.log('Test passed: Request interception working correctly');
  } finally {
    // Always close the browser, otherwise a failed assertion leaves Node running
    await browser.close();
  }
}

testRequestInterception().catch((error) => {
  console.error(error);
  process.exitCode = 1; // make a failed assertion fail the test run
});

Best Practices

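  1. Always handle requests: once interception is enabled, every request must be resolved exactly once with continue(), abort(), or respond(). An unresolved request stays paused, and resolving the same request twice throws an error.

  2. Keep filtering cheap: the handler runs for every request, so prefer Set or hostname lookups over long chains of substring checks.

  3. Handle errors gracefully: Wrap request modifications in try-catch blocks.

  4. Monitor performance: every paused request needs a round trip between Puppeteer and the browser before it can proceed, which can slow page loading, so measure and optimize accordingly.

  5. Clean up resources: close browsers (ideally in a finally block) and remove listeners you no longer need with page.off().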

  6. Test thoroughly: Create comprehensive tests for your request interception logic.

Integration with Web Scraping APIs

For complex web scraping scenarios, consider using specialized APIs that handle request interception and modification at scale. Services like WebScraping.AI provide infrastructure for handling complex request patterns, while Playwright offers similar interception capabilities for cross-browser automation through page.route().

Conclusion

Request interception in Puppeteer provides powerful capabilities for controlling network traffic, modifying requests, and creating sophisticated automation scenarios. Whether you're building web scrapers, testing applications, or creating development tools, mastering request interception will significantly enhance your ability to interact with web applications programmatically.

The key to successful request interception is understanding the request lifecycle, implementing efficient filtering logic, and handling edge cases gracefully. With these techniques, you can create robust and efficient web automation solutions that can handle complex real-world scenarios.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl -G "https://api.webscraping.ai/ai/question" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "question=What is the main topic?" \
  --data-urlencode "api_key=YOUR_API_KEY"

Extract structured data:

curl -G "https://api.webscraping.ai/ai/fields" \
  --data-urlencode "url=https://example.com" \
  --data-urlencode "fields[title]=Page title" \
  --data-urlencode "fields[price]=Product price" \
  --data-urlencode "api_key=YOUR_API_KEY"
