How to Intercept and Modify HTTP Requests in Puppeteer
Intercepting and modifying HTTP requests in Puppeteer is a powerful feature that allows you to control network traffic, modify request headers, block certain resources, or redirect requests to different endpoints. This capability is essential for web scraping, testing, and automation scenarios where you need fine-grained control over network interactions.
Understanding Request Interception
Request interception in Puppeteer is enabled per page by calling page.setRequestInterception(true). From then on, every outgoing HTTP request from that page is paused before it is sent to the server. Once a request is intercepted, you can examine it, modify it, or block it entirely.
Basic Request Interception Setup
To start intercepting requests, you need to enable request interception on the page:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
// Enable request interception
await page.setRequestInterception(true);
// Listen for requests
page.on('request', (request) => {
console.log('Request URL:', request.url());
console.log('Request method:', request.method());
console.log('Request headers:', request.headers());
// Continue with the original request
request.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
Modifying Request Headers
You can modify request headers to add authentication tokens, change user agents, or add custom headers:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
const headers = {
...request.headers(),
'Authorization': 'Bearer your-token-here',
'X-Custom-Header': 'custom-value',
'User-Agent': 'CustomBot/1.0'
};
request.continue({ headers });
});
await page.goto('https://api.example.com');
await browser.close();
})();
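One detail worth keeping in mind: Puppeteer documents that request.headers() returns header names in lower case, so spreading it and then adding 'User-Agent' leaves both a user-agent and a User-Agent key in the object. How the browser resolves such a pair is not something to rely on. The sketch below shows one way to merge overrides without duplicates; mergeHeaders is a hypothetical helper name, not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): merge override headers into the
// intercepted request's headers. Header names are case-insensitive, so every
// key is lower-cased to avoid ending up with two entries for the same header.
function mergeHeaders(original, overrides) {
  const merged = {};
  for (const [name, value] of Object.entries(original)) {
    merged[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(overrides)) {
    merged[name.toLowerCase()] = value; // overrides win over original values
  }
  return merged;
}

// Usage inside a request handler:
// request.continue({
//   headers: mergeHeaders(request.headers(), { 'User-Agent': 'CustomBot/1.0' })
// });
```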
Blocking Specific Resources
Block unnecessary resources like images, stylesheets, or ads to improve performance:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
const resourceType = request.resourceType();
const url = request.url();
// Block images, stylesheets, and fonts
if (['image', 'stylesheet', 'font'].includes(resourceType)) {
request.abort();
return;
}
// Block specific domains (e.g., ads, analytics)
if (url.includes('google-analytics.com') ||
url.includes('doubleclick.net') ||
url.includes('facebook.com/tr')) {
request.abort();
return;
}
request.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
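The blocking rules above can also be pulled out of the handler into a plain function, which keeps the policy in one place and makes it checkable without launching a browser. This is a minimal sketch; the names BLOCKED_TYPES, BLOCKED_URL_PARTS, and shouldBlock are illustrative assumptions.

```javascript
// Hypothetical policy data: resource types and URL fragments to block.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);
const BLOCKED_URL_PARTS = ['google-analytics.com', 'doubleclick.net', 'facebook.com/tr'];

// Returns true when a request with this resource type and URL should be aborted.
function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return BLOCKED_URL_PARTS.some((part) => url.includes(part));
}

// Usage inside a request handler:
// if (shouldBlock(request.resourceType(), request.url())) {
//   request.abort();
// } else {
//   request.continue();
// }
```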
Modifying Request URLs and Methods
You can redirect requests to different URLs or change HTTP methods:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
const url = request.url();
// Redirect API calls to a local mock server
if (url.includes('api.example.com')) {
// Swap the origin; this assumes the local mock server speaks plain HTTP
const newUrl = url.replace('https://api.example.com', 'http://localhost:3000');
request.continue({ url: newUrl });
return;
}
// Change GET requests to POST for specific endpoints
if (url.includes('/search') && request.method() === 'GET') {
request.continue({
method: 'POST',
headers: {
...request.headers(),
'Content-Type': 'application/json'
},
postData: JSON.stringify({ query: 'modified search' })
});
return;
}
request.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
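String replacement works for simple cases, but parsing the URL makes the host check exact and lets the scheme and port change together. The following sketch assumes a mock server at http://localhost:3000; toMockUrl is a hypothetical helper name. Note that a URL passed to request.continue() changes where the request goes without the page seeing a redirect, so mixed-content or CORS rules may still apply.

```javascript
// Hypothetical helper: point a request for apiHost at a local mock server,
// keeping the path, query string, and fragment unchanged.
// Returns null when the URL is not an API call.
function toMockUrl(originalUrl, apiHost = 'api.example.com', mockOrigin = 'http://localhost:3000') {
  const url = new URL(originalUrl);
  if (url.hostname !== apiHost) return null;
  const mock = new URL(mockOrigin);
  url.protocol = mock.protocol; // e.g. https: -> http:
  url.host = mock.host;         // host includes the port
  return url.toString();
}

// Usage inside a request handler:
// const mockUrl = toMockUrl(request.url());
// if (mockUrl) {
//   request.continue({ url: mockUrl });
// } else {
//   request.continue();
// }
```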
Modifying POST Data
Intercept and modify POST request data:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
if (request.method() === 'POST' && request.url().includes('/api/login')) {
const postData = request.postData();
if (postData) {
try {
const data = JSON.parse(postData);
// Modify the login data
data.username = 'modified_username';
data.additional_field = 'injected_value';
request.continue({
postData: JSON.stringify(data),
headers: {
...request.headers(),
'Content-Type': 'application/json'
}
});
return;
} catch (e) {
console.error('Error parsing POST data:', e);
}
}
}
request.continue();
});
await page.goto('https://example.com/login');
await browser.close();
})();
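The parse, modify, and re-serialize steps can be isolated in a small function so the handler only has to decide whether to pass the result to request.continue(). A minimal sketch, assuming the body is a JSON object; patchJsonBody is a hypothetical name.

```javascript
// Hypothetical helper: apply a shallow patch to a JSON request body.
// Returns the new body string, or null when the body is missing
// or is not a JSON object (for example, form-encoded data).
function patchJsonBody(postData, patch) {
  if (!postData) return null;
  let data;
  try {
    data = JSON.parse(postData);
  } catch {
    return null;
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) return null;
  return JSON.stringify({ ...data, ...patch });
}

// Usage inside a request handler:
// const body = patchJsonBody(request.postData(), { username: 'modified_username' });
// if (body !== null) {
//   request.continue({ postData: body });
// } else {
//   request.continue();
// }
```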
Advanced Request Interception with Response Mocking
You can also mock responses by intercepting requests and providing custom responses:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
if (request.url().includes('/api/data')) {
// Mock the API response
request.respond({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
success: true,
data: [
{ id: 1, name: 'Mocked Item 1' },
{ id: 2, name: 'Mocked Item 2' }
]
})
});
return;
}
request.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
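When more than one endpoint needs a mock, a lookup table keyed by URL path keeps the handler short. The sketch below builds the argument for request.respond() from such a table; MOCK_ROUTES, the /api/user route, and mockResponseFor are illustrative assumptions rather than anything Puppeteer provides.

```javascript
// Hypothetical route table: URL path -> JSON payload served instead of the real response.
const MOCK_ROUTES = {
  '/api/data': { success: true, data: [{ id: 1, name: 'Mocked Item 1' }] },
  '/api/user': { success: true, data: { id: 42, name: 'Mock User' } },
};

// Returns the argument for request.respond(), or null when the URL has no mock.
function mockResponseFor(requestUrl, routes = MOCK_ROUTES) {
  const { pathname } = new URL(requestUrl);
  if (!Object.prototype.hasOwnProperty.call(routes, pathname)) return null;
  return {
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify(routes[pathname]),
  };
}

// Usage inside a request handler:
// const mock = mockResponseFor(request.url());
// if (mock) {
//   request.respond(mock);
// } else {
//   request.continue();
// }
```

Matching on the exact pathname rather than a substring avoids accidentally mocking unrelated URLs such as /api/data-export.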
Logging and Debugging Requests
Create a comprehensive logging system for requests:
const puppeteer = require('puppeteer');
const fs = require('fs');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
const requestLog = [];
page.on('request', (request) => {
const requestData = {
url: request.url(),
method: request.method(),
headers: request.headers(),
postData: request.postData(),
timestamp: new Date().toISOString()
};
requestLog.push(requestData);
console.log(`${request.method()} ${request.url()}`);
request.continue();
});
page.on('response', (response) => {
console.log(`Response: ${response.status()} ${response.url()}`);
});
await page.goto('https://example.com');
// Save request log to file
fs.writeFileSync('request_log.json', JSON.stringify(requestLog, null, 2));
await browser.close();
})();
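Because the log above is written to disk with full headers, credentials such as Authorization or Cookie values end up in the file. One possible safeguard is to mask them before logging. This is a sketch only; the list of sensitive header names is an assumption and redactHeaders is a hypothetical helper.

```javascript
// Header names treated as sensitive here are an assumption; adjust for your application.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'x-api-key']);

// Hypothetical helper: copy the headers, masking the values of sensitive ones.
function redactHeaders(headers) {
  const safe = {};
  for (const [name, value] of Object.entries(headers)) {
    safe[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? '[REDACTED]' : value;
  }
  return safe;
}

// Usage when building a log entry:
// const requestData = { url: request.url(), headers: redactHeaders(request.headers()) };
```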
Handling Authentication and Sessions
Intercept requests to add authentication tokens or manage sessions:
const puppeteer = require('puppeteer');
class AuthenticatedScraper {
constructor() {
this.authToken = null;
}
async login(page, username, password) {
// Perform login and extract token
await page.goto('https://example.com/login');
// ... login logic
this.authToken = 'extracted-token';
}
async setupInterception(page) {
await page.setRequestInterception(true);
page.on('request', (request) => {
const url = request.url();
// Add authentication to API requests
if (url.includes('/api/') && this.authToken) {
const headers = {
...request.headers(),
'Authorization': `Bearer ${this.authToken}`
};
request.continue({ headers });
return;
}
request.continue();
});
}
async scrape() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await this.login(page, 'username', 'password');
await this.setupInterception(page);
// Now all API requests will include authentication
await page.goto('https://example.com/protected-page');
await browser.close();
}
}
const scraper = new AuthenticatedScraper();
scraper.scrape().catch(console.error);
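A substring check such as url.includes('/api/') also matches third-party URLs that happen to contain /api/, which would send the token to other hosts. A stricter check compares the origin as well. The sketch below assumes https://example.com is the only trusted origin; shouldAttachToken is a hypothetical helper.

```javascript
// Hypothetical helper: attach the token only to API paths on a trusted origin.
function shouldAttachToken(requestUrl, trustedOrigin = 'https://example.com') {
  let url;
  try {
    url = new URL(requestUrl);
  } catch {
    return false; // unparsable URL: do not attach credentials
  }
  return url.origin === trustedOrigin && url.pathname.startsWith('/api/');
}

// Usage inside the request handler:
// if (this.authToken && shouldAttachToken(request.url())) {
//   request.continue({
//     headers: { ...request.headers(), authorization: `Bearer ${this.authToken}` }
//   });
// } else {
//   request.continue();
// }
```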
Performance Optimization
Optimize request interception for better performance:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
// Create a set of blocked domains for faster lookup
const blockedDomains = new Set([
'google-analytics.com',
'googletagmanager.com',
'doubleclick.net',
'facebook.com',
'twitter.com'
]);
page.on('request', (request) => {
const url = new URL(request.url());
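// Quick domain check: test the hostname and each parent domain,
// so www.google-analytics.com also matches google-analytics.com
const labels = url.hostname.split('.');
if (labels.some((_, i) => blockedDomains.has(labels.slice(i).join('.')))) {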
request.abort();
return;
}
// Block non-essential resources
const resourceType = request.resourceType();
if (['image', 'stylesheet', 'font', 'media'].includes(resourceType)) {
request.abort();
return;
}
request.continue();
});
await page.goto('https://example.com');
await browser.close();
})();
Python Implementation with Pyppeteer
For Python developers, here's how to implement request interception using Pyppeteer:
import asyncio
from pyppeteer import launch

async def intercept_requests():
    browser = await launch()
    page = await browser.newPage()
    # Enable request interception
    await page.setRequestInterception(True)

    async def handle_request(request):
        # Log request details
        print(f"Request: {request.method} {request.url}")
        # Block images and stylesheets
        if request.resourceType in ['image', 'stylesheet']:
            await request.abort()
            return
        # Modify headers
        headers = request.headers.copy()
        headers['User-Agent'] = 'Python-Scraper/1.0'
        headers['X-Custom-Header'] = 'modified-by-python'
        # Continue with modified headers
        await request.continue_({'headers': headers})

    # The handler is a coroutine, so schedule it on the event loop for each request
    page.on('request', lambda req: asyncio.ensure_future(handle_request(req)))
    await page.goto('https://example.com')
    await browser.close()

# Run the async function
asyncio.run(intercept_requests())
Error Handling and Fallbacks
Implement robust error handling for request interception:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', async (request) => {
try {
const url = request.url();
// Attempt to modify request
if (url.includes('/api/')) {
const headers = {
...request.headers(),
'X-API-Key': 'your-api-key'
};
await request.continue({ headers });
} else {
await request.continue();
}
} catch (error) {
console.error('Request interception error:', error);
// Fallback: continue with original request
try {
await request.continue();
} catch (fallbackError) {
console.error('Fallback continue failed:', fallbackError);
}
}
});
await page.goto('https://example.com');
await browser.close();
})();
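A fallback continue() fails if the request was already resolved, which is why the example above wraps it in a second try-catch. Recent Puppeteer versions expose request.isInterceptResolutionHandled(), which allows checking first. The following is a sketch of a wrapper built on that method; resolveOnce is a hypothetical name, and the method may be missing in older releases.

```javascript
// Hypothetical wrapper: resolve an intercepted request at most once and never throw.
// `action` is a function that calls request.continue(), request.abort(),
// or request.respond() and returns the resulting promise.
async function resolveOnce(request, action) {
  // Skip requests that another handler has already resolved.
  if (request.isInterceptResolutionHandled()) return false;
  try {
    await action(request);
    return true;
  } catch (error) {
    console.error('Could not resolve request:', error.message);
    return false;
  }
}

// Usage inside a request handler:
// page.on('request', (request) => resolveOnce(request, (r) => r.continue()));
```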
Testing Request Interception
Create unit tests for your request interception logic:
const puppeteer = require('puppeteer');
const assert = require('assert');
async function testRequestInterception() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterception(true);
const interceptedRequests = [];
page.on('request', (request) => {
interceptedRequests.push({
url: request.url(),
method: request.method(),
headers: request.headers()
});
// Add custom header
const headers = {
...request.headers(),
'X-Test-Header': 'test-value'
};
request.continue({ headers });
});
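const response = await page.goto('https://httpbin.org/get');
// Verify that requests were intercepted
assert(interceptedRequests.length > 0, 'No requests were intercepted');
// Verify that the main request was intercepted
const mainRequest = interceptedRequests.find(req =>
req.url.includes('httpbin.org/get')
);
assert(mainRequest, 'Main request not found');
// httpbin echoes the request headers back, so check that the custom header was sent
const echoed = await response.json();
const sent = Object.entries(echoed.headers).find(([name]) => name.toLowerCase() === 'x-test-header');
assert(sent && sent[1] === 'test-value', 'Custom header was not sent');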
console.log('Test passed: Request interception working correctly');
await browser.close();
}
testRequestInterception().catch(console.error);
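The test above needs a browser and network access. Handler logic can also be checked in isolation by passing it a stub object that mimics the few request methods it calls. This is a sketch under that assumption; addTestHeader and createStubRequest are hypothetical names, and the stub covers only url(), headers(), continue(), and abort().

```javascript
// Handler under test: adds a custom header to every request.
function addTestHeader(request) {
  return request.continue({
    headers: { ...request.headers(), 'x-test-header': 'test-value' },
  });
}

// Hypothetical stub that mimics the request methods the handler uses
// and records how the request was resolved.
function createStubRequest(url, headers = {}) {
  const calls = [];
  return {
    calls,
    url: () => url,
    headers: () => ({ ...headers }),
    continue: (overrides) => { calls.push({ type: 'continue', overrides }); },
    abort: () => { calls.push({ type: 'abort' }); },
  };
}

// Exercise the handler without a browser.
const stub = createStubRequest('https://httpbin.org/get', { accept: '*/*' });
addTestHeader(stub);
console.log(stub.calls[0].type, stub.calls[0].overrides.headers);
```

Each stub records its calls, so a test can assert both that the request was resolved exactly once and that the overrides contain the expected header.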
Best Practices
Always handle requests: Every intercepted request must be handled with continue(), abort(), or respond().
Use efficient filtering: Implement fast filtering logic to avoid performance issues.
Handle errors gracefully: Wrap request modifications in try-catch blocks.
Monitor performance: Request interception can slow down page loading, so monitor and optimize accordingly.
Clean up resources: Always close browsers and clean up event listeners.
Test thoroughly: Create comprehensive tests for your request interception logic.
Integration with Web Scraping APIs
For complex web scraping scenarios, consider using specialized APIs that handle request interception and modification at scale. Services like WebScraping.AI provide robust infrastructure for handling complex request patterns, while Playwright offers similar capabilities for cross-browser automation.
Conclusion
Request interception in Puppeteer provides powerful capabilities for controlling network traffic, modifying requests, and creating sophisticated automation scenarios. Whether you're building web scrapers, testing applications, or creating development tools, mastering request interception will significantly enhance your ability to interact with web applications programmatically.
The key to successful request interception is understanding the request lifecycle, implementing efficient filtering logic, and handling edge cases gracefully. With these techniques, you can create robust and efficient web automation solutions that can handle complex real-world scenarios.