How do I set up proxy configuration in Playwright?
Setting up proxy configuration in Playwright is essential for web scraping projects that require IP rotation, bypassing geo-restrictions, or routing traffic through specific servers. Playwright provides flexible proxy support for HTTP, HTTPS, and SOCKS proxies at both browser and context levels.
Basic Proxy Configuration
Browser-Level Proxy Setup
The most common approach is to configure the proxy when launching the browser. This applies the proxy settings to all contexts and pages within that browser instance.
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: 'http://proxy-server.com:8080'
    }
  });
  const context = await browser.newContext();
  const page = await context.newPage();

  // All requests from this browser now go through the proxy
  await page.goto('https://httpbin.org/ip');
  await browser.close();
})();

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            'server': 'http://proxy-server.com:8080'
        }
    )
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://httpbin.org/ip')
    browser.close()
Context-Level Proxy Configuration
For more granular control, you can configure proxies at the context level, allowing different contexts to use different proxies.
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();

  // Only this context is routed through the proxy
  const context = await browser.newContext({
    proxy: {
      server: 'http://proxy-server.com:8080'
    }
  });
  const page = await context.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context(
        proxy={
            'server': 'http://proxy-server.com:8080'
        }
    )
    page = context.new_page()
    page.goto('https://example.com')
    browser.close()
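To make the "different contexts, different proxies" idea concrete, here is a minimal sketch that opens one context per proxy on an already-launched browser. The helper name `openContextsPerProxy` is hypothetical and the proxy servers in the usage note are placeholders; the only Playwright call it relies on is `browser.newContext({ proxy })`.

```javascript
// Hypothetical helper: one isolated context per proxy, all sharing one browser.
// `browser` is expected to be a Playwright Browser (e.g. from chromium.launch()).
async function openContextsPerProxy(browser, proxies) {
  const contexts = [];
  try {
    for (const proxy of proxies) {
      // Each context keeps its own cookies, storage, and proxy settings.
      contexts.push(await browser.newContext({ proxy }));
    }
  } catch (error) {
    // Close anything already opened so no context is leaked.
    await Promise.all(contexts.map((context) => context.close()));
    throw error;
  }
  return contexts;
}
```

Inside an async function this could be used as `const [first, second] = await openContextsPerProxy(browser, [{ server: 'http://proxy1.com:8080' }, { server: 'http://proxy2.com:8080' }]);`, after which each context can open its own pages.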
Proxy Authentication
Many proxy services require authentication. Playwright's proxy option accepts a username and password for HTTP proxy authentication; it has no dedicated option for token-based schemes.
Username and Password Authentication
const browser = await chromium.launch({
  proxy: {
    server: 'http://proxy-server.com:8080',
    username: 'your-username',
    password: 'your-password'
  }
});

# Python equivalent
browser = p.chromium.launch(
    proxy={
        'server': 'http://proxy-server.com:8080',
        'username': 'your-username',
        'password': 'your-password'
    }
)
Advanced Authentication Headers
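Some providers expect a token in a Proxy-Authorization header instead of a username and password. Playwright has no proxy-specific header option, so the usual workaround is extraHTTPHeaders, which adds headers to the requests the pages themselves make. Treat this as something to verify against your provider rather than a supported mechanism: for HTTPS targets these headers travel inside the encrypted tunnel and are delivered to the target site rather than the proxy.
const context = await browser.newContext({
  proxy: {
    server: 'http://proxy-server.com:8080'
  },
  // Sent with every request made by pages in this context
  extraHTTPHeaders: {
    'Proxy-Authorization': 'Bearer your-token-here'
  }
});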
Different Proxy Types
HTTP/HTTPS Proxies
// HTTP proxy
const httpBrowser = await chromium.launch({
  proxy: {
    server: 'http://proxy-server.com:8080'
  }
});

// HTTPS proxy
const httpsBrowser = await chromium.launch({
  proxy: {
    server: 'https://secure-proxy.com:8080'
  }
});
SOCKS Proxies
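Playwright accepts both SOCKS4 and SOCKS5 proxy URLs. SOCKS proxies are generally used without the username and password options: SOCKS4 has no password authentication, and Chromium does not support SOCKS5 proxy authentication.
// SOCKS5 proxy
const socks5Browser = await chromium.launch({
  proxy: {
    server: 'socks5://proxy-server.com:1080'
  }
});

// SOCKS4 proxy
const socks4Browser = await chromium.launch({
  proxy: {
    server: 'socks4://proxy-server.com:1080'
  }
});

# SOCKS5 proxy (Python)
browser = p.chromium.launch(
    proxy={
        'server': 'socks5://proxy-server.com:1080'
    }
)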
Proxy Bypass Configuration
You can configure Playwright to bypass the proxy for specific hosts by passing a comma-separated list in the bypass option:
const browser = await chromium.launch({
  proxy: {
    server: 'http://proxy-server.com:8080',
    bypass: 'localhost,127.0.0.1,*.internal.com'
  }
});

# Python equivalent
browser = p.chromium.launch(
    proxy={
        'server': 'http://proxy-server.com:8080',
        'bypass': 'localhost,127.0.0.1,*.internal.com'
    }
)
Multiple Proxy Configuration
For advanced web scraping scenarios, you might need to rotate between multiple proxies. Here's how to implement proxy rotation:
const { chromium } = require('playwright');

const proxies = [
  { server: 'http://proxy1.com:8080', username: 'user1', password: 'pass1' },
  { server: 'http://proxy2.com:8080', username: 'user2', password: 'pass2' },
  { server: 'http://proxy3.com:8080', username: 'user3', password: 'pass3' }
];

async function scrapeWithProxyRotation(urls) {
  for (let i = 0; i < urls.length; i++) {
    // Pick proxies round-robin and launch a fresh browser for each URL
    const proxy = proxies[i % proxies.length];
    const browser = await chromium.launch({ proxy });
    const context = await browser.newContext();
    const page = await context.newPage();
    try {
      await page.goto(urls[i]);
      // Process the page
      const content = await page.content();
      console.log(`Scraped ${urls[i]} via ${proxy.server}`);
    } catch (error) {
      console.error(`Failed to scrape ${urls[i]}:`, error.message);
    } finally {
      await browser.close();
    }
  }
}
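The loop above keeps using a proxy even after it starts failing. One possible refinement, sketched here under the assumption that a few consecutive failures mean a proxy should be skipped, is a small round-robin pool. `ProxyPool`, `reportFailure`, `reportSuccess`, and the `maxFailures` threshold are all hypothetical names for illustration, not part of Playwright.

```javascript
// Hypothetical round-robin pool that skips proxies after repeated failures.
class ProxyPool {
  constructor(proxies, maxFailures = 3) {
    this.entries = proxies.map((proxy) => ({ proxy, failures: 0 }));
    this.maxFailures = maxFailures;
    this.cursor = 0;
  }

  // Return the next proxy still considered usable, or null if none are left.
  next() {
    for (let i = 0; i < this.entries.length; i++) {
      const entry = this.entries[this.cursor];
      this.cursor = (this.cursor + 1) % this.entries.length;
      if (entry.failures < this.maxFailures) return entry.proxy;
    }
    return null;
  }

  // Record a failed request; the proxy is skipped once it reaches maxFailures.
  reportFailure(proxy) {
    const entry = this.entries.find((e) => e.proxy === proxy);
    if (entry) entry.failures += 1;
  }

  // Record a success and forgive earlier failures.
  reportSuccess(proxy) {
    const entry = this.entries.find((e) => e.proxy === proxy);
    if (entry) entry.failures = 0;
  }
}
```

In a scraping loop, `pool.next()` would replace `proxies[i % proxies.length]`, with `reportFailure` called in the `catch` branch and `reportSuccess` after a successful `page.goto`.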
Proxy Health Checking
It's important to verify that your proxy is working correctly. Here's a utility function to test proxy connectivity:
const { chromium } = require('playwright');

async function testProxy(proxyConfig) {
  let browser;
  try {
    browser = await chromium.launch({ proxy: proxyConfig });
    const context = await browser.newContext();
    const page = await context.newPage();

    // Test the proxy by checking which IP the target site sees
    const response = await page.goto('https://httpbin.org/ip', { timeout: 10000 });
    const ipData = await response.json();
    console.log('Proxy working. Current IP:', ipData.origin);
    return true;
  } catch (error) {
    console.error('Proxy test failed:', error.message);
    return false;
  } finally {
    // Close the browser even when the check fails
    if (browser) await browser.close();
  }
}

// Usage
const proxyConfig = {
  server: 'http://proxy-server.com:8080',
  username: 'your-username',
  password: 'your-password'
};

testProxy(proxyConfig).then((ok) => console.log('Proxy healthy:', ok));
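When there is a whole list of proxies to vet, the same check can be applied to each one and the failures dropped. The following is a sketch only: `filterHealthyProxies` is a hypothetical helper, and it takes the check function as a parameter so that `testProxy` (or any other async function that resolves to true or false) can be plugged in.

```javascript
// Hypothetical helper: run a health check against every proxy in parallel
// and keep only the proxies whose check resolves to a truthy value.
async function filterHealthyProxies(proxies, checkFn) {
  const results = await Promise.all(
    proxies.map(async (proxy) => {
      try {
        return { proxy, healthy: Boolean(await checkFn(proxy)) };
      } catch {
        // A check that throws counts as an unhealthy proxy.
        return { proxy, healthy: false };
      }
    })
  );
  return results.filter((r) => r.healthy).map((r) => r.proxy);
}
```

A call such as `const usable = await filterHealthyProxies(proxies, testProxy);` launches one browser per proxy at the same time, so for long lists it may be worth checking in smaller batches.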
Environment-Based Proxy Configuration
For production applications, it's best practice to store proxy configuration in environment variables:
const proxyConfig = {
  server: process.env.PROXY_SERVER,
  username: process.env.PROXY_USERNAME,
  password: process.env.PROXY_PASSWORD
};

// Only use the proxy if a server is configured
const browser = await chromium.launch({
  proxy: proxyConfig.server ? proxyConfig : undefined
});

import os

proxy_config = {
    'server': os.getenv('PROXY_SERVER'),
    'username': os.getenv('PROXY_USERNAME'),
    'password': os.getenv('PROXY_PASSWORD')
}

# Only use proxy if server is configured
proxy_settings = proxy_config if proxy_config['server'] else None
browser = p.chromium.launch(proxy=proxy_settings)
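The environment lookup can also be wrapped in a small function that leaves out any field that is not set, which keeps unset credentials out of the proxy object. This is a sketch: `proxyFromEnv` is a hypothetical helper, and `PROXY_BYPASS` is an assumed variable name that does not appear in the examples above.

```javascript
// Hypothetical helper: build a Playwright proxy object from environment
// variables, returning undefined when no proxy server is configured.
function proxyFromEnv(env = process.env) {
  const server = env.PROXY_SERVER;
  if (!server) return undefined;

  const proxy = { server };
  // Only include optional fields that are actually set.
  if (env.PROXY_USERNAME) proxy.username = env.PROXY_USERNAME;
  if (env.PROXY_PASSWORD) proxy.password = env.PROXY_PASSWORD;
  if (env.PROXY_BYPASS) proxy.bypass = env.PROXY_BYPASS; // assumed variable name
  return proxy;
}
```

The result can be passed straight through, as in `chromium.launch({ proxy: proxyFromEnv() })`, since an undefined proxy simply means a direct connection.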
Troubleshooting Common Issues
Connection Timeouts
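Proxies add latency, so if navigations time out, raise the timeout above Playwright's 30-second default. It can also help to wait for 'domcontentloaded' instead of 'networkidle', since 'networkidle' only resolves once the network has been quiet for a while and slow proxied connections can delay that:
const page = await context.newPage();
await page.goto('https://example.com', {
  timeout: 60000, // 60 seconds (the default is 30 seconds)
  waitUntil: 'domcontentloaded'
});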
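A longer timeout does not help when a request fails outright, so transient proxy errors are often handled by retrying. Here is a minimal sketch of a retry wrapper with exponential backoff; `withRetries` and its `retries` and `baseDelayMs` options are hypothetical names, and the defaults are arbitrary starting points.

```javascript
// Hypothetical retry wrapper: re-run an async task with exponential backoff.
async function withRetries(task, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // With the defaults this waits 500 ms, then 1 s, then 2 s.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

A navigation could then be wrapped as `await withRetries(() => page.goto('https://example.com', { timeout: 60000 }));`.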
Proxy Authentication Errors
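For authentication issues, first verify your credentials, then make sure they are passed through the username and password fields. Credentials embedded in the server URL are not a documented option and may be ignored:
const configs = [
  // Preferred: credentials in the dedicated fields
  {
    server: 'http://proxy-server.com:8080',
    username: 'user',
    password: 'pass'
  },
  // Credentials embedded in the URL: not documented, may be ignored
  {
    server: 'http://user:pass@proxy-server.com:8080'
  }
];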
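Many providers hand out proxies as a single URL with the credentials embedded. One way to convert that form into separate fields is to parse it with the standard `URL` class, as in the sketch below. `parseProxyUrl` is a hypothetical helper name, and the example URL is a placeholder.

```javascript
// Hypothetical helper: split "scheme://user:pass@host:port" into the
// { server, username, password } shape used by Playwright's proxy option.
function parseProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  const proxy = { server: `${url.protocol}//${url.host}` };
  // URL keeps credentials percent-encoded, so decode them before use.
  if (url.username) proxy.username = decodeURIComponent(url.username);
  if (url.password) proxy.password = decodeURIComponent(url.password);
  return proxy;
}
```

For example, `parseProxyUrl('http://user:pass@proxy-server.com:8080')` yields an object with `server` set to `http://proxy-server.com:8080` plus separate `username` and `password` fields, which can be passed as the `proxy` option.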
SSL Certificate Issues
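If pages fail with certificate errors while going through a proxy (for example, a proxy that intercepts TLS with its own certificate), you can tell the context to ignore HTTPS errors. This turns off certificate validation for every page in the context, so use it only with proxies you trust:
const context = await browser.newContext({
  proxy: {
    server: 'https://proxy-server.com:8080'
  },
  ignoreHTTPSErrors: true
});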
Best Practices
- Test Proxy Configuration: Always test your proxy setup before running production scraping tasks
- Handle Failures Gracefully: Implement retry logic and fallback mechanisms
- Monitor Proxy Performance: Track response times and success rates
- Rotate Proxies: Use multiple proxies to distribute load and avoid rate limiting
- Secure Credentials: Store proxy credentials securely using environment variables or secret management systems
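The monitoring advice above can start out very simply. The sketch below keeps per-proxy counters in memory and reports a success rate and average response time; `ProxyStats`, `record`, and `summary` are hypothetical names, and a production setup would more likely send these numbers to a metrics system.

```javascript
// Hypothetical in-memory tracker for per-proxy success rate and latency.
class ProxyStats {
  constructor() {
    this.stats = new Map(); // server -> { successes, failures, totalMs }
  }

  // Record one request outcome and how long it took in milliseconds.
  record(server, ok, elapsedMs) {
    const s = this.stats.get(server) ?? { successes: 0, failures: 0, totalMs: 0 };
    if (ok) s.successes += 1;
    else s.failures += 1;
    s.totalMs += elapsedMs;
    this.stats.set(server, s);
  }

  // Return aggregate numbers for one proxy, or null if it was never used.
  summary(server) {
    const s = this.stats.get(server);
    if (!s) return null;
    const requests = s.successes + s.failures;
    return {
      requests,
      successRate: s.successes / requests,
      avgMs: s.totalMs / requests
    };
  }
}
```

Timing a request is a matter of taking `Date.now()` before and after `page.goto` and calling `record` with the difference in both the success and failure paths.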
Integration with Web Scraping Workflows
When building robust web scraping solutions, proper proxy configuration is crucial for avoiding detection and maintaining consistent access to target websites. Similar to how you might handle browser sessions in Puppeteer, managing proxy connections requires careful planning and error handling.
For complex scraping scenarios involving multiple pages or extensive data extraction, consider implementing proxy rotation strategies alongside other anti-detection measures. This approach works particularly well when monitoring network requests in Puppeteer to understand traffic patterns and optimize your scraping strategy.
By properly configuring proxies in Playwright, you can create more resilient web scraping applications that can handle various network conditions and access restrictions while maintaining the reliability and performance your projects require.