How do I set up user agent strings in Headless Chromium?
Setting the user agent string in Headless Chromium matters for web scraping projects that need to mimic different browsers or devices, or to avoid detection. The user agent string identifies your browser to web servers and can significantly affect how websites respond to your requests.
What is a User Agent String?
A user agent string is the value of the User-Agent HTTP request header. It identifies the client making the request, including the browser type and version, the operating system, and sometimes the device. Websites often use this information to serve different content or to block automated requests.
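To make that structure concrete, the following sketch splits a user agent string into its product tokens (such as Chrome/120.0.0.0) and its parenthesized comments (such as the platform details). The `parseUserAgent` helper is a hypothetical name introduced here for illustration; it is a rough tokenizer, not a complete parser for every string seen in the wild.

```javascript
// Hypothetical helper: split a user agent string into "Name/Version"
// product tokens and parenthesized comment groups.
function parseUserAgent(ua) {
  const products = [];
  const comments = [];
  // Match either a "(...)" comment or a "Name/Version" product token.
  const pattern = /\(([^)]*)\)|([^\s\/()]+)\/([^\s()]+)/g;
  let match;
  while ((match = pattern.exec(ua)) !== null) {
    if (match[1] !== undefined) {
      // Comment fields are separated by semicolons.
      comments.push(match[1].split(';').map((part) => part.trim()));
    } else {
      products.push({ name: match[2], version: match[3] });
    }
  }
  return { products, comments };
}

const parsed = parseUserAgent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
);
console.log(parsed.products.map((p) => `${p.name} ${p.version}`));
console.log(parsed.comments[0]);
```

The first comment group carries the operating system and architecture, while the product tokens carry the engine and browser versions that servers usually inspect.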
Setting User Agent in Different Frameworks
Using Puppeteer (Node.js)
Puppeteer's page.setUserAgent() sets the user agent for a single page. Call it before navigating:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();

  // Set a custom user agent
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
  await page.goto('https://httpbin.org/user-agent');

  const userAgent = await page.evaluate(() => {
    return document.body.textContent;
  });
  console.log('Current user agent:', userAgent);

  await browser.close();
})();
Setting User Agent at Browser Launch
You can also set the user agent globally for all pages when launching the browser:
const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  ]
});
Using Playwright
Playwright sets the user agent per browser context, so every page created from that context inherits it:
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    headless: true
  });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });
  const page = await context.newPage();

  await page.goto('https://httpbin.org/user-agent');
  const content = await page.textContent('body');
  console.log('User agent:', content);

  await browser.close();
})();
Using Selenium with ChromeDriver
For Selenium users, you can set the user agent through Chrome options:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://httpbin.org/user-agent")
# Print only the response body text rather than the full HTML wrapper
print("User agent:", driver.find_element(By.TAG_NAME, "body").text)
driver.quit()
Popular User Agent Strings
Desktop Browsers
const desktopUserAgents = {
  chrome_windows: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  chrome_mac: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  firefox_windows: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
  safari_mac: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15'
};
Mobile User Agents
const mobileUserAgents = {
  iphone: 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1',
  android_chrome: 'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36',
  ipad: 'Mozilla/5.0 (iPad; CPU OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1'
};
Dynamic User Agent Rotation
For advanced web scraping scenarios, you might want to rotate user agents to avoid detection:
const puppeteer = require('puppeteer');

// Caution: a non-Chrome string (such as the Firefox entry) running on a
// Chromium engine can be fingerprinted as inconsistent by stricter sites.
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0'
];

async function scrapeWithRandomUserAgent(url) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();

    // Select random user agent
    const randomUserAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
    await page.setUserAgent(randomUserAgent);
    console.log(`Using user agent: ${randomUserAgent}`);

    await page.goto(url);

    // Your scraping logic here
    const title = await page.title();
    console.log(`Page title: ${title}`);
  } finally {
    // Close the browser even if navigation or scraping throws
    await browser.close();
  }
}

// Usage
scrapeWithRandomUserAgent('https://example.com').catch(console.error);
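Purely random selection can pick the same string several times in a row. As one possible alternative, the sketch below cycles through the pool in shuffled order so that every user agent is used once before any repeats. `createUserAgentRotator` is a hypothetical helper name, and the optional `random` parameter is an assumption added only to make the behaviour testable.

```javascript
// Hypothetical helper: hand out user agents in shuffled order, refilling
// the queue only after every entry in the pool has been used once.
function createUserAgentRotator(pool, random = Math.random) {
  if (!Array.isArray(pool) || pool.length === 0) {
    throw new Error('User agent pool must be a non-empty array');
  }
  let queue = [];
  return function nextFromPool() {
    if (queue.length === 0) {
      // Fisher-Yates shuffle of a copy, so the caller's array is untouched.
      queue = [...pool];
      for (let i = queue.length - 1; i > 0; i--) {
        const j = Math.floor(random() * (i + 1));
        [queue[i], queue[j]] = [queue[j], queue[i]];
      }
    }
    return queue.pop();
  };
}

const nextUserAgent = createUserAgentRotator([
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
]);
console.log(nextUserAgent());
console.log(nextUserAgent());
```

Inside a scraping function, the value returned by `nextUserAgent()` would be passed to `page.setUserAgent()` in place of the random pick.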
Advanced Configuration with Additional Headers
Combining user agent strings with other headers can make your requests more realistic:
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

// Extra headers are sent with every request the page makes, so only add
// values that are valid for all of them. Chromium manages its own Accept,
// Accept-Encoding, and Connection headers.
await page.setExtraHTTPHeaders({
  'Accept-Language': 'en-US,en;q=0.9',
  'DNT': '1',
  'Upgrade-Insecure-Requests': '1'
});
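If the language header needs to vary with the locale you are emulating, it can be generated instead of hard-coded. The sketch below builds an Accept-Language value from an ordered list of language tags, with quality values that step down by 0.1 from 0.9. That step pattern is an assumption modelled on what Chrome typically sends, and `buildAcceptLanguage` is a hypothetical helper name.

```javascript
// Hypothetical helper: build an Accept-Language header value from an
// ordered list of language tags (most preferred first).
function buildAcceptLanguage(languages) {
  return languages
    .map((lang, index) => {
      // The first language carries an implicit q=1 and needs no suffix.
      if (index === 0) return lang;
      // Later languages step down by 0.1, with a floor of 0.1.
      const q = Math.max(0.1, 1 - index * 0.1);
      return `${lang};q=${q.toFixed(1)}`;
    })
    .join(',');
}

console.log(buildAcceptLanguage(['en-US', 'en']));
console.log(buildAcceptLanguage(['de-DE', 'de', 'en']));
```

The resulting string can be used as the 'Accept-Language' value passed to `page.setExtraHTTPHeaders()`.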
Verifying Your User Agent
Always verify that your user agent is being set correctly:
async function verifyUserAgent(page) {
  const userAgent = await page.evaluate(() => navigator.userAgent);
  console.log('Browser user agent:', userAgent);

  // Also check what the server sees
  await page.goto('https://httpbin.org/headers');
  const headers = await page.evaluate(() => {
    return JSON.parse(document.body.textContent);
  });
  console.log('Server sees user agent:', headers.headers['User-Agent']);
}
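Logging the two values still leaves the comparison to the reader. A small check such as the following sketch can turn that into a pass/fail result. It assumes the response body has the shape returned by httpbin.org/headers (a JSON object with a `headers` map), and `checkUserAgent` is a hypothetical helper name.

```javascript
// Hypothetical helper: compare the expected user agent with what the page
// reports (navigator.userAgent) and what the server received.
function checkUserAgent(expected, navigatorUserAgent, httpbinBody) {
  const serverUserAgent = JSON.parse(httpbinBody).headers['User-Agent'];
  return {
    navigatorMatches: navigatorUserAgent === expected,
    serverMatches: serverUserAgent === expected,
    // A leftover "HeadlessChrome" token suggests the override did not apply.
    hasHeadlessToken: /HeadlessChrome/.test(serverUserAgent || '')
  };
}

const expectedUserAgent =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
// Sample body in the shape httpbin.org/headers responds with.
const sampleBody = JSON.stringify({ headers: { 'User-Agent': expectedUserAgent } });
console.log(checkUserAgent(expectedUserAgent, expectedUserAgent, sampleBody));
```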
Best Practices for User Agent Management
1. Use Recent and Common User Agents
Always use current, widely-used user agent strings. Outdated or unusual user agents can trigger anti-bot measures.
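One way to catch outdated strings is to compare the Chrome major version in each user agent against a version you consider current. In the sketch below, `chromeMajorVersion` and `isStaleChromeUserAgent` are hypothetical helper names, and both the `currentMajor` argument and the default allowed lag of four major versions are assumptions you would set yourself.

```javascript
// Hypothetical helper: extract the Chrome major version, or null when the
// string has no Chrome token (for example a Firefox user agent).
function chromeMajorVersion(userAgent) {
  const match = /Chrome\/(\d+)\./.exec(userAgent);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: flag user agents that lag too far behind the
// version the caller treats as current.
function isStaleChromeUserAgent(userAgent, currentMajor, maxLag = 4) {
  const major = chromeMajorVersion(userAgent);
  if (major === null) return false; // Not a Chrome string; nothing to compare.
  return currentMajor - major > maxLag;
}

const candidate =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
console.log(chromeMajorVersion(candidate));
console.log(isStaleChromeUserAgent(candidate, 121));
```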
2. Match User Agent with Viewport
When setting viewport dimensions, ensure they match your user agent:
// For mobile user agent
await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15...');
await page.setViewport({ width: 375, height: 667, isMobile: true });
// For desktop user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...');
await page.setViewport({ width: 1920, height: 1080 });
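To keep the two settings from drifting apart, the viewport can be derived from the user agent rather than set by hand. The following sketch uses the phone and desktop dimensions from the example above; the tablet size, the `hasTouch` values, and the `viewportForUserAgent` name are assumptions added for illustration.

```javascript
// Hypothetical helper: pick viewport options that are plausible for the
// device class a user agent string claims to be.
function viewportForUserAgent(userAgent) {
  if (/iPad/.test(userAgent)) {
    // Tablet dimensions are an assumed example value.
    return { width: 768, height: 1024, isMobile: true, hasTouch: true };
  }
  if (/iPhone|Android.*Mobile/.test(userAgent)) {
    return { width: 375, height: 667, isMobile: true, hasTouch: true };
  }
  return { width: 1920, height: 1080, isMobile: false, hasTouch: false };
}

const phoneUserAgent =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1';
console.log(viewportForUserAgent(phoneUserAgent));
```

The returned object can be passed directly to `page.setViewport()` right after `page.setUserAgent()`.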
3. Consider Geographic Consistency
Use user agents that match your target region's common browsers and operating systems.
4. Test User Agent Effectiveness
Some websites perform additional checks beyond the user agent string:
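async function testBrowserFingerprint(page) {
  await page.goto('https://bot.sannysoft.com/');

  // Give the page's checks time to run. A plain timer is used because
  // page.waitForTimeout() is no longer available in recent Puppeteer releases.
  await new Promise((resolve) => setTimeout(resolve, 5000));

  const results = await page.evaluate(() => {
    // Adjust these selectors to the page's actual markup
    const tests = document.querySelectorAll('.test-result');
    return Array.from(tests).map(test => ({
      name: test.querySelector('.test-name')?.textContent,
      result: test.querySelector('.test-status')?.textContent
    }));
  });
  console.log('Bot detection results:', results);
}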
Handling User Agent in Different Scenarios
For API Testing
When testing APIs that serve different content based on user agents:
curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15" \
https://api.example.com/mobile-endpoint
For Load Testing
When handling browser sessions across multiple pages, maintain consistent user agents:
// Puppeteer v22+; earlier versions name this createIncognitoBrowserContext()
const context = await browser.createBrowserContext();
await context.overridePermissions('https://example.com', ['geolocation']);

const page1 = await context.newPage();
const page2 = await context.newPage();

// Apply the same user agent to every page in the session
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...';
await page1.setUserAgent(userAgent);
await page2.setUserAgent(userAgent);
Troubleshooting Common Issues
User Agent Not Applied
If your user agent isn't being set, check the timing:
// ❌ Wrong - setting after navigation
await page.goto('https://example.com');
await page.setUserAgent('...');
// ✅ Correct - setting before navigation
await page.setUserAgent('...');
await page.goto('https://example.com');
Detection Despite Correct User Agent
Modern anti-bot systems check more than just user agents. Consider additional factors like:
- JavaScript execution patterns
- Mouse movement and timing
- WebRTC fingerprinting
- Canvas fingerprinting
Performance Considerations
Setting user agents has minimal performance impact, but consider these optimizations:
// Playwright: reuse a browser context for better performance
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...'
});

// Multiple pages will inherit the user agent
const page1 = await context.newPage();
const page2 = await context.newPage();
Using Command Line Arguments
You can set user agents directly when launching Headless Chromium from the command line:
# Launch with custom user agent
chromium --headless --disable-gpu --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" --dump-dom https://httpbin.org/user-agent
# For mobile user agent
chromium --headless --disable-gpu --user-agent="Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1" --dump-dom https://httpbin.org/user-agent
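When the same command is launched from a script, the user agent's spaces, parentheses, and semicolons make shell quoting easy to get wrong. One way around this, sketched below, is to build the arguments as an array and pass them to Node's `child_process.spawn`, which does not go through a shell by default. `buildHeadlessArgs` is a hypothetical helper name, and the `chromium` binary name is taken from the commands above; it may differ on your system.

```javascript
// Hypothetical helper: assemble the same flags as the commands above as an
// argument array, so no shell quoting of the user agent is needed.
function buildHeadlessArgs(userAgent, url) {
  return [
    '--headless',
    '--disable-gpu',
    `--user-agent=${userAgent}`,
    '--dump-dom',
    url
  ];
}

const args = buildHeadlessArgs(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'https://httpbin.org/user-agent'
);
console.log(args);

// To actually launch the browser (assumes a `chromium` binary on PATH):
// const { spawn } = require('child_process');
// spawn('chromium', args, { stdio: 'inherit' });
```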
Integration with WebScraping.AI
When using the WebScraping.AI API, you can set custom user agents through the API parameters:
import requests
api_key = "YOUR_API_KEY"
url = "https://example.com"
response = requests.get(
    "https://api.webscraping.ai/html",
    params={
        "api_key": api_key,
        "url": url,
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "device": "desktop"  # or "mobile" for mobile user agents
    }
)
print(response.text)
Conclusion
Setting up user agent strings in Headless Chromium is straightforward but requires attention to detail for effective web scraping. Choose appropriate user agents for your target websites, combine them with realistic headers and viewport settings, and always test your configuration. When dealing with sophisticated anti-bot systems, consider using specialized tools or services that handle browser fingerprinting comprehensively.
Remember to respect websites' robots.txt files and terms of service while implementing user agent strategies in your web scraping projects. For production use, consider using managed services that handle user agent rotation and anti-detection measures automatically.