How can I configure custom user agents in Playwright?
Configuring custom user agents in Playwright matters for web scraping projects that need to simulate different browsers or devices, or to avoid basic detection. Playwright lets you set the user agent at several levels, from the browser context down to individual pages. This guide covers those methods and the best practices that go with them.
What is a User Agent?
A user agent is a string that identifies the browser, operating system, and device making the request to a web server. Websites often use user agents to serve different content based on the client's capabilities or to block automated requests. When web scraping, customizing user agents helps you:
- Simulate real browser behavior
- Access mobile or desktop-specific content
- Bypass basic bot detection
- Test how websites respond to different browsers
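To make the server's side of this concrete, the sketch below shows one crude way a site might branch on the User-Agent header. The function name classify_user_agent and its substring rules are illustrative assumptions, not how any particular site or library works; real detection is usually far more elaborate.

```python
def classify_user_agent(user_agent: str) -> str:
    """Crudely classify a User-Agent string the way a simple server rule might."""
    ua = user_agent.lower()
    # Self-identified automation is checked first (hypothetical rule).
    if "bot" in ua or "spider" in ua or "crawler" in ua:
        return "bot"
    # Phones and tablets usually advertise one of these tokens.
    if "mobile" in ua or "android" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


samples = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1",
    "MyBot/1.0 (Web Scraper)",
]

for sample in samples:
    print(classify_user_agent(sample), "<-", sample[:40])
```

A rule this simple is exactly what a custom user agent gets past, which is why the guide describes it as bypassing only basic bot detection.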
Setting User Agent at Browser Context Level
The most common approach is to set the user agent when creating a browser context. This applies the user agent to all pages within that context:
JavaScript/Node.js Example
const { chromium } = require('playwright');

async function scrapeWithCustomUserAgent() {
  const browser = await chromium.launch();
  // Set custom user agent for the entire context
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
  });
  const page = await context.newPage();
  await page.goto('https://httpbin.org/user-agent');
  // The page will use the custom user agent
  const userAgent = await page.textContent('body');
  console.log('User Agent:', userAgent);
  await browser.close();
}

scrapeWithCustomUserAgent();
Python Example
from playwright.sync_api import sync_playwright

def scrape_with_custom_user_agent():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Set custom user agent at context level
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        )
        page = context.new_page()
        page.goto("https://httpbin.org/user-agent")
        # Extract and print the user agent
        user_agent_text = page.text_content("body")
        print(f"User Agent: {user_agent_text}")
        browser.close()

scrape_with_custom_user_agent()
Setting User Agent at Page Level
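You can also override the User-Agent header for an individual page using the setExtraHTTPHeaders method. This changes only the header sent with HTTP requests; navigator.userAgent inside the page still reports the browser's default, so prefer the context-level option when both need to match: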
JavaScript Example
const { chromium } = require('playwright');

async function setPageUserAgent() {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  // Set the User-Agent header for this specific page
  await page.setExtraHTTPHeaders({
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1'
  });
  await page.goto('https://httpbin.org/headers');
  const headers = await page.textContent('pre');
  console.log('Headers:', headers);
  await browser.close();
}

setPageUserAgent();
Python Example
from playwright.sync_api import sync_playwright

def set_page_user_agent():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        page = context.new_page()
        # Set the User-Agent header for this specific page
        page.set_extra_http_headers({
            "User-Agent": "Mozilla/5.0 (iPad; CPU OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1"
        })
        page.goto("https://httpbin.org/headers")
        headers = page.text_content("pre")
        print(f"Headers: {headers}")
        browser.close()

set_page_user_agent()
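One caveat worth checking with the page-level approach: extra HTTP headers affect only outgoing requests, so the value a script reads from navigator.userAgent may not match what the server receives. The sketch below compares the two. The helper names compare_user_agents and check_page_level_override are hypothetical, and it assumes httpbin.org is reachable and echoes the header as JSON under the "user-agent" key.

```python
def compare_user_agents(header_ua: str, js_ua: str) -> str:
    """Summarize whether the HTTP header and navigator.userAgent agree."""
    if header_ua == js_ua:
        return "consistent"
    return "mismatch: header and navigator.userAgent differ"


def check_page_level_override(custom_ua: str) -> str:
    """Set a page-level User-Agent header, then compare it with navigator.userAgent."""
    # Imported lazily so compare_user_agents() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_extra_http_headers({"User-Agent": custom_ua})
        response = page.goto("https://httpbin.org/user-agent")
        # What the server saw in the request header.
        header_ua = response.json()["user-agent"]
        # What scripts running in the page see.
        js_ua = page.evaluate("navigator.userAgent")
        browser.close()
    return compare_user_agents(header_ua, js_ua)
```

If check_page_level_override reports a mismatch, a site that compares the header against navigator.userAgent could notice it; setting user_agent when the context is created avoids that gap.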
Using Predefined Device User Agents
Playwright provides predefined device configurations that include appropriate user agents. This is particularly useful for mobile emulation:
JavaScript Example
const { chromium, devices } = require('playwright');

async function useDeviceUserAgent() {
  const browser = await chromium.launch();
  // Use iPhone 12 configuration (includes user agent)
  const iPhone12 = devices['iPhone 12'];
  const context = await browser.newContext({
    ...iPhone12,
  });
  const page = await context.newPage();
  await page.goto('https://httpbin.org/user-agent');
  const userAgent = await page.textContent('body');
  console.log('iPhone 12 User Agent:', userAgent);
  await browser.close();
}

useDeviceUserAgent();
Python Example
from playwright.sync_api import sync_playwright

def use_device_user_agent():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Use iPhone 12 Pro configuration
        iphone_12_pro = p.devices['iPhone 12 Pro']
        context = browser.new_context(**iphone_12_pro)
        page = context.new_page()
        page.goto("https://httpbin.org/user-agent")
        user_agent = page.text_content("body")
        print(f"iPhone 12 Pro User Agent: {user_agent}")
        browser.close()

use_device_user_agent()
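A device descriptor is a plain dictionary of context options, so it can be adjusted before it is passed to new_context. The following is a minimal sketch of that idea; customize_device and open_localized_iphone are hypothetical helper names, and the choice of overriding locale is only an example.

```python
from typing import Any, Dict


def customize_device(descriptor: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a copy of a device descriptor with selected options replaced or added."""
    options = dict(descriptor)  # shallow copy so the original descriptor is untouched
    options.update(overrides)
    return options


def open_localized_iphone(locale: str = "de-DE") -> str:
    """Launch an iPhone 12 profile with a different locale and return its user agent."""
    # Imported lazily so customize_device() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        options = customize_device(p.devices["iPhone 12"], locale=locale)
        context = browser.new_context(**options)
        page = context.new_page()
        user_agent = page.evaluate("navigator.userAgent")
        browser.close()
    return user_agent
```

Copying the descriptor keeps the user agent, viewport, and touch settings together, so a single override does not accidentally break the rest of the device profile.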
Dynamic User Agent Rotation
For advanced web scraping scenarios, you might want to rotate user agents to avoid detection. Here's how to implement user agent rotation:
JavaScript Example
const { chromium } = require('playwright');

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0'
];

async function rotateUserAgents() {
  const browser = await chromium.launch();
  for (let i = 0; i < userAgents.length; i++) {
    // A fresh context per request keeps cookies and storage separate too
    const context = await browser.newContext({
      userAgent: userAgents[i]
    });
    const page = await context.newPage();
    await page.goto('https://httpbin.org/user-agent');
    const userAgent = await page.textContent('body');
    console.log(`Request ${i + 1} User Agent:`, userAgent);
    await context.close();
  }
  await browser.close();
}

rotateUserAgents();
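The same rotation idea can be sketched in Python when there are more URLs than user agents. This version assigns agents round-robin and opens a fresh context per URL; assign_user_agents and fetch_titles_with_rotation are hypothetical names, and round-robin is just one possible policy (random selection would work equally well).

```python
from itertools import cycle
from typing import Dict, Iterable, List, Tuple


def assign_user_agents(urls: Iterable[str], user_agents: List[str]) -> List[Tuple[str, str]]:
    """Pair each URL with a user agent, cycling through the pool round-robin."""
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    # zip stops when the URLs run out; cycle repeats the pool as often as needed.
    return list(zip(urls, cycle(user_agents)))


def fetch_titles_with_rotation(urls: List[str], user_agents: List[str]) -> Dict[str, str]:
    """Visit each URL in its own context, using the user agent assigned to it."""
    # Imported lazily so assign_user_agents() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    titles = {}
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for url, user_agent in assign_user_agents(urls, user_agents):
            context = browser.new_context(user_agent=user_agent)
            try:
                page = context.new_page()
                page.goto(url)
                titles[url] = page.title()
            finally:
                context.close()  # release the context even if navigation fails
        browser.close()
    return titles
```

Keeping the pairing logic separate from the browser code makes the rotation policy easy to inspect or swap out without touching the scraping loop.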
User Agent Best Practices
1. Use Realistic User Agents
Always use real, current user agent strings from actual browsers. Avoid outdated or obviously fake user agents:
// Good - follows the format of a real Chrome user agent (Chrome/91 is only an example; keep the version current)
const goodUA = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';
// Bad - obviously not a browser
const badUA = 'MyBot/1.0 (Web Scraper)';
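One way to keep the string current without maintaining a list by hand is to start from the user agent the installed browser reports and adjust it. The sketch below assumes a headless Chromium build whose default user agent contains the token "HeadlessChrome", which is not true of every build or mode; strip_headless_marker and current_chromium_user_agent are hypothetical helper names.

```python
def strip_headless_marker(user_agent: str) -> str:
    """Replace the 'HeadlessChrome' product token with plain 'Chrome', if present."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def current_chromium_user_agent() -> str:
    """Read the default user agent from the installed Chromium and clean it up."""
    # Imported lazily so strip_headless_marker() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        default_ua = page.evaluate("navigator.userAgent")
        browser.close()
    return strip_headless_marker(default_ua)
```

The returned string can then be passed as user_agent when creating the real scraping context, so the advertised version tracks the browser that is actually installed.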
2. Match User Agent with Other Headers
When setting custom user agents, ensure other headers are consistent. For example, when using a mobile user agent, also set appropriate viewport and headers:
const context = await browser.newContext({
  // Full iPhone user agent string, matching the viewport below
  userAgent: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1',
  viewport: { width: 375, height: 667 },
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br'
  }
});
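A quick sanity check on the options before creating a context can catch obvious contradictions, such as a phone user agent paired with desktop settings. The following sketch uses Playwright's Python option names (user_agent, viewport, is_mobile, has_touch); the function find_profile_mismatches, its marker list, and the width thresholds are illustrative assumptions rather than established rules.

```python
from typing import Any, Dict, List

# Tokens that usually indicate a phone or tablet user agent (assumed list).
MOBILE_MARKERS = ("iPhone", "iPad", "Android", "Mobile")


def find_profile_mismatches(options: Dict[str, Any]) -> List[str]:
    """Return warnings for context options that contradict the user agent."""
    user_agent = options.get("user_agent", "")
    looks_mobile = any(marker in user_agent for marker in MOBILE_MARKERS)
    warnings = []
    if looks_mobile and not options.get("is_mobile", False):
        warnings.append("mobile user agent without is_mobile=True")
    if looks_mobile and not options.get("has_touch", False):
        warnings.append("mobile user agent without has_touch=True")
    width = (options.get("viewport") or {}).get("width")
    if width is not None:
        # Thresholds are rough guesses, not a standard.
        if looks_mobile and width > 1024:
            warnings.append("mobile user agent with a desktop-sized viewport")
        if not looks_mobile and width < 600:
            warnings.append("desktop user agent with a phone-sized viewport")
    return warnings
```

An empty list means no contradiction was found by these particular rules; it does not guarantee the profile would pass a real fingerprinting check.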
3. Test User Agent Detection
Always verify that your custom user agent is being sent correctly:
// Check if user agent is properly set
const userAgent = await page.evaluate(() => navigator.userAgent);
console.log('Browser User Agent:', userAgent);
// Also check server-side detection
await page.goto('https://httpbin.org/user-agent');
const serverUA = await page.textContent('body');
console.log('Server-detected User Agent:', serverUA);
Common User Agent Strings
Here are some commonly used user agent strings for different browsers and devices:
Desktop Browsers
const desktopUserAgents = {
  chrome: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  firefox: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0',
  safari: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15',
  edge: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59'
};
Mobile Browsers
const mobileUserAgents = {
  iphone: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1',
  android: 'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Mobile Safari/537.36',
  ipad: 'Mozilla/5.0 (iPad; CPU OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1'
};
Combining User Agents with Browser Context Options
To present a more consistent browser fingerprint, combine user agent settings with other context options. Leave the Sec-Fetch-* headers to the browser: extraHTTPHeaders are attached to every request the context makes, so hard-coding navigation values there would mislabel image, script, and XHR requests:
const { chromium } = require('playwright');

async function comprehensiveBrowserEmulation() {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    viewport: { width: 1366, height: 768 },
    locale: 'en-US',
    timezoneId: 'America/New_York',
    // Sent with every request from this context, not only navigations
    extraHTTPHeaders: {
      'Accept-Language': 'en-US,en;q=0.9',
      'Accept-Encoding': 'gzip, deflate, br'
    }
  });
  const page = await context.newPage();
  await page.goto('https://httpbin.org/headers');
  const headers = await page.textContent('pre');
  console.log('Complete Headers:', headers);
  await browser.close();
}

comprehensiveBrowserEmulation();
Troubleshooting User Agent Issues
Issue 1: User Agent Not Being Applied
If your custom user agent isn't working, check the following:
- The user agent is set when the context is created (or, for page-level headers, before navigating to the page)
- The user agent string is properly formatted
- The website is not detecting other browser fingerprints
Issue 2: Inconsistent Behavior
Some websites check multiple factors beyond user agents. Consider also setting:
- Viewport size
- Accept headers
- Accept-Language headers
- Platform-specific features
Issue 3: Mobile User Agent Detection
When using mobile user agents, also configure:
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1',
  viewport: { width: 375, height: 667 },
  hasTouch: true,
  isMobile: true
});
Using User Agents with Proxy Servers
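When combining user agents with proxy servers, keep the rest of the browser profile consistent with the proxy, for example by choosing a locale and timezone that match the proxy's region: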
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  proxy: {
    // Placeholder values: replace with your proxy host, port, and credentials
    server: 'http://proxy-server:port',
    username: 'username',
    password: 'password'
  }
});
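Proxy credentials are often supplied as a single URL, while Playwright's proxy option takes separate server, username, and password fields. A minimal Python sketch of that conversion follows; proxy_settings_from_url and new_context_with_proxy are hypothetical helpers, and proxy.example.com in the usage note is a placeholder host.

```python
from typing import Any, Dict
from urllib.parse import unquote, urlsplit


def proxy_settings_from_url(proxy_url: str) -> Dict[str, str]:
    """Split a proxy URL into the dictionary shape Playwright's proxy option expects."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        raise ValueError("proxy URL must include a host and a port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; percent-encoded characters are decoded.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


def new_context_with_proxy(browser: Any, proxy_url: str, user_agent: str) -> Any:
    """Create a context on an already-launched Playwright browser with both options set."""
    return browser.new_context(
        user_agent=user_agent,
        proxy=proxy_settings_from_url(proxy_url),
    )
```

For example, proxy_settings_from_url("http://user:secret@proxy.example.com:8080") yields a server of "http://proxy.example.com:8080" plus the two credential fields, which keeps secrets out of the server string.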
Integration with Web Scraping APIs
When using web scraping services, you can often specify custom user agents. Similar to how you might handle browser sessions in Puppeteer, many scraping APIs allow user agent customization through parameters.
For more advanced scenarios involving handling authentication in Puppeteer, custom user agents can be combined with other headers and session management techniques.
Conclusion
Configuring custom user agents in Playwright is straightforward and essential for effective web scraping. Whether you're setting them at the browser context level, page level, or using predefined device configurations, the key is to use realistic user agent strings that match your scraping requirements.
Remember to:
- Use current, realistic user agent strings
- Match user agents with appropriate headers and viewport settings
- Test your configuration to ensure it works as expected
- Consider rotating user agents for large-scale scraping operations
- Combine user agents with other browser context options for comprehensive emulation
- Respect website terms of service and robots.txt files
By following these practices, you'll be able to effectively simulate different browsers and devices in your Playwright automation scripts, leading to more successful web scraping outcomes.