How do I set up user agent strings in Headless Chromium?

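Setting a custom user agent string in Headless Chromium is essential for web scraping projects that need to mimic specific browsers or devices, or to avoid detection. Out of the box, headless Chromium typically advertises itself with a "HeadlessChrome" token in its user agent, which many sites treat as a bot signal. Because the user agent string identifies your browser to web servers, it can significantly change how websites respond to your requests.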

What is a User Agent String?

A user agent string is the value of the User-Agent HTTP request header (also exposed to page scripts as navigator.userAgent). It identifies the client making the request, including the browser type and version, the operating system, and device information. Websites often use it to serve different content or to block automated requests.
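To make those components concrete, here is a deliberately simplified parser. It is a sketch, not a production parser: parseUserAgent is a hypothetical helper that only recognizes the Chrome, Firefox, and Safari formats used in this article, and real user agent parsing has many more edge cases.

```javascript
// Simplified user agent parser (illustrative only). It recognizes the
// Chrome, Firefox, and Safari formats used in this article and nothing else.
function parseUserAgent(ua) {
  // Order matters: "HeadlessChrome/" contains "Chrome/", and Chrome user
  // agents also contain "Safari/", so the more specific patterns come first.
  const browsers = [
    { name: 'HeadlessChrome', re: /HeadlessChrome\/([\d.]+)/ },
    { name: 'Chrome', re: /Chrome\/([\d.]+)/ },
    { name: 'Firefox', re: /Firefox\/([\d.]+)/ },
    { name: 'Safari', re: /Version\/([\d.]+).*Safari\// }
  ];
  // Order matters here too: Android strings also contain "Linux".
  const systems = [
    { name: 'iOS', re: /iPhone|iPad/ },
    { name: 'Android', re: /Android/ },
    { name: 'Windows', re: /Windows NT/ },
    { name: 'macOS', re: /Macintosh/ },
    { name: 'Linux', re: /Linux/ }
  ];

  const browser = browsers.find((b) => b.re.test(ua));
  const os = systems.find((s) => s.re.test(ua));
  return {
    browser: browser ? browser.name : 'unknown',
    version: browser ? ua.match(browser.re)[1] : null,
    os: os ? os.name : 'unknown',
    // Mobile browsers usually include a "Mobile" token
    mobile: /Mobile/.test(ua)
  };
}

// Logs the browser name, version, operating system, and mobile flag
console.log(parseUserAgent(
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
));
```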

Setting User Agent in Different Frameworks

Using Puppeteer (Node.js)

In Puppeteer, page.setUserAgent() overrides the user agent for a single page. Call it before navigating:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true
  });

  const page = await browser.newPage();

  // Set a custom user agent
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

  await page.goto('https://httpbin.org/user-agent');

  const userAgent = await page.evaluate(() => {
    return document.body.textContent;
  });

  console.log('Current user agent:', userAgent);

  await browser.close();
})();

Setting User Agent at Browser Launch

You can also set one user agent for every page by passing Chromium's --user-agent flag at launch (like the example above, this must run inside an async function):

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  ]
});

Using Playwright

In Playwright, the user agent is an option of the browser context, so every page created from that context uses it:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    headless: true
  });

  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  });

  const page = await context.newPage();
  await page.goto('https://httpbin.org/user-agent');

  const content = await page.textContent('body');
  console.log('User agent:', content);

  await browser.close();
})();

Using Selenium with ChromeDriver

For Selenium users, you can set the user agent through Chrome options:

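from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get("https://httpbin.org/user-agent")
    # httpbin echoes the User-Agent header it received as JSON in the page body
    print("Server saw:", driver.find_element(By.TAG_NAME, "body").text)
    # navigator.userAgent is the value that scripts on the page see
    print("navigator.userAgent:", driver.execute_script("return navigator.userAgent"))
finally:
    driver.quit()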

Popular User Agent Strings

Desktop Browsers

const desktopUserAgents = {
  chrome_windows: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  chrome_mac: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  firefox_windows: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/121.0',
  safari_mac: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15'
};

Mobile User Agents

const mobileUserAgents = {
  iphone: 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1',
  android_chrome: 'Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36',
  ipad: 'Mozilla/5.0 (iPad; CPU OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1'
};
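Version numbers in these strings age quickly, and outdated versions can trigger anti-bot measures. Since Chrome's reduced user agent format reports only the major version (the remaining fields are 0.0.0), one option is to keep a template and swap in a current major version. The helper below is hypothetical and assumes the Chrome/<major>.0.0.0 pattern used throughout this article; 131 is just an example value, so check the current stable release yourself.

```javascript
// Replace the Chrome major version in a reduced-format user agent string.
// Assumes the "Chrome/<major>.0.0.0" token used throughout this article.
function withChromeVersion(userAgent, major) {
  if (!Number.isInteger(major) || major <= 0) {
    throw new RangeError(`Invalid Chrome major version: ${major}`);
  }
  // \b stops the pattern from matching inside "HeadlessChrome/"
  const pattern = /\bChrome\/\d+\.\d+\.\d+\.\d+/;
  if (!pattern.test(userAgent)) {
    throw new Error('No "Chrome/x.x.x.x" token found in user agent');
  }
  return userAgent.replace(pattern, `Chrome/${major}.0.0.0`);
}

const desktopTemplate =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
console.log(withChromeVersion(desktopTemplate, 131));
// Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
```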

Dynamic User Agent Rotation

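For larger scraping jobs, you might rotate user agents between sessions so your traffic looks less uniform. Note that the pool below includes a Firefox string even though the browser is Chromium; Chromium-only JavaScript APIs and other engine-level signals can contradict a Firefox user agent, so Chrome-family strings are usually the safer choice: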

const puppeteer = require('puppeteer');

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/121.0'
];

async function scrapeWithRandomUserAgent(url) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();

    // Select a random user agent and apply it before navigating
    const randomUserAgent = userAgents[Math.floor(Math.random() * userAgents.length)];
    await page.setUserAgent(randomUserAgent);

    console.log(`Using user agent: ${randomUserAgent}`);

    await page.goto(url);

    // Your scraping logic here
    const title = await page.title();
    console.log(`Page title: ${title}`);
  } finally {
    // Close the browser even if navigation or scraping throws
    await browser.close();
  }
}

// Usage
scrapeWithRandomUserAgent('https://example.com').catch(console.error);
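The example above picks a user agent uniformly at random for each launch, so the same string can be chosen several times in a row. If you want an even spread, a shuffle-based rotator is one option. This is a sketch: createUserAgentRotator is a hypothetical helper, not a Puppeteer API, and its sample pool deliberately contains only Chrome strings so the claimed browser matches the Chromium engine actually running.

```javascript
// Shuffle-based rotation: every user agent in the pool is used once per
// cycle before any of them repeats. Hypothetical helper, not a Puppeteer API.
function createUserAgentRotator(pool, random = Math.random) {
  if (pool.length === 0) throw new Error('User agent pool is empty');
  let queue = [];

  // Fisher-Yates shuffle of a copy, so the caller's array is never mutated
  function shuffled() {
    const copy = [...pool];
    for (let i = copy.length - 1; i > 0; i--) {
      const j = Math.floor(random() * (i + 1));
      [copy[i], copy[j]] = [copy[j], copy[i]];
    }
    return copy;
  }

  return function next() {
    // Start a new shuffled cycle once the current one is used up
    if (queue.length === 0) queue = shuffled();
    return queue.pop();
  };
}

const pool = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
];
const nextUserAgent = createUserAgentRotator(pool);
for (let i = 0; i < 4; i++) {
  // In a scraper: await page.setUserAgent(nextUserAgent());
  console.log(nextUserAgent());
}
```

Passing a custom random function also makes the rotation reproducible in tests.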

Advanced Configuration with Additional Headers

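Combining the user agent with other headers can make requests look more realistic, provided the headers match what the claimed browser actually sends. This snippet assumes a browser object from an earlier puppeteer.launch() call:

const page = await browser.newPage();

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

// setExtraHTTPHeaders adds these headers to every request the page makes
// (images and scripts too), not only the main document. The values mirror
// what desktop Chrome sends for a page navigation; Chrome does not send DNT
// by default, so it is omitted here.
await page.setExtraHTTPHeaders({
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Connection': 'keep-alive',
  'Upgrade-Insecure-Requests': '1'
});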

Verifying Your User Agent

Always verify that your user agent is being set correctly:

async function verifyUserAgent(page) {
  const userAgent = await page.evaluate(() => navigator.userAgent);
  console.log('Browser user agent:', userAgent);

  // Also check what the server sees
  await page.goto('https://httpbin.org/headers');
  const headers = await page.evaluate(() => {
    return JSON.parse(document.body.textContent);
  });

  console.log('Server sees user agent:', headers.headers['User-Agent']);
}
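When the two values verifyUserAgent logs disagree, or still show the default headless token, it helps to check them systematically. The sketch below is a hypothetical helper that compares both observations with the user agent you intended to set. It assumes the default headless user agent contains "HeadlessChrome", which is typical but worth confirming on your Chromium version.

```javascript
// Compare the intended user agent with what page scripts (navigator.userAgent)
// and the server (User-Agent header) actually saw. Hypothetical helper.
function diagnoseUserAgent(expected, fromNavigator, fromServer) {
  const problems = [];
  const observations = [
    ['navigator.userAgent', fromNavigator],
    ['User-Agent header', fromServer]
  ];
  for (const [label, value] of observations) {
    if (!value) {
      problems.push(`${label} is empty`);
    } else if (value.includes('HeadlessChrome')) {
      // The override was not applied (or was applied too late)
      problems.push(`${label} still contains the default "HeadlessChrome" token`);
    } else if (value !== expected) {
      problems.push(`${label} differs from the expected user agent`);
    }
  }
  return problems;
}

const expected =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
console.log(diagnoseUserAgent(expected, expected, expected)); // []
```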

Best Practices for User Agent Management

1. Use Recent and Common User Agents

Always use current, widely used user agent strings. Outdated or unusual ones can trigger anti-bot measures.

2. Match User Agent with Viewport

When setting viewport dimensions, ensure they match your user agent:

// For a mobile user agent: a phone-sized viewport with touch enabled
await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15...');
await page.setViewport({ width: 375, height: 667, deviceScaleFactor: 2, isMobile: true, hasTouch: true });

// For a desktop user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...');
await page.setViewport({ width: 1920, height: 1080 });
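Rather than pairing user agents and viewports by hand, you could derive the viewport from the user agent. This is a sketch: viewportForUserAgent is hypothetical, its device detection is a simple regex check, and the sizes (an iPhone SE-sized phone, a 1024x1366 tablet, a 1920x1080 desktop) are illustrative choices. The returned object uses the option names Puppeteer's page.setViewport() accepts.

```javascript
// Derive viewport options from a user agent so the two stay consistent.
// The sizes are illustrative assumptions, not values any library mandates.
function viewportForUserAgent(userAgent) {
  // Tablets: iPad, or Android without the "Mobile" token
  if (/iPad|Android(?!.*Mobile)/.test(userAgent)) {
    return { width: 1024, height: 1366, deviceScaleFactor: 2, isMobile: true, hasTouch: true };
  }
  // Phones: iPhone, or Android with the "Mobile" token
  if (/iPhone|Android.*Mobile/.test(userAgent)) {
    return { width: 375, height: 667, deviceScaleFactor: 2, isMobile: true, hasTouch: true };
  }
  // Everything else is treated as a desktop browser
  return { width: 1920, height: 1080, deviceScaleFactor: 1, isMobile: false, hasTouch: false };
}

const phoneUserAgent =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1';
console.log(viewportForUserAgent(phoneUserAgent).isMobile); // true
// With Puppeteer: await page.setViewport(viewportForUserAgent(phoneUserAgent));
```

Recent Puppeteer versions also ship predefined device descriptors (KnownDevices, applied with page.emulate()) that set a matching user agent and viewport together; they may be simpler than a hand-rolled mapping when your target device is on the list.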

3. Consider Geographic Consistency

Use user agents that match the browsers and operating systems common in your target region, and keep related signals, such as the Accept-Language header, consistent with that region.
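For the Accept-Language header, Chrome lists the preferred languages with q-values that drop by 0.1 for each additional entry. A small builder, a hypothetical helper sketched under that assumption, keeps the header in the same shape for whichever locales your target region uses. Note that this only changes the header: what page scripts read from navigator.language is configured separately (for example with Chromium's --lang flag or Playwright's locale context option).

```javascript
// Build an Accept-Language value in the shape Chrome sends: the first locale
// has no q-value, and each following locale gets a q-value 0.1 lower.
function buildAcceptLanguage(locales) {
  return locales
    .map((locale, i) => {
      if (i === 0) return locale;
      // Floor at 0.1 so long lists never produce q=0 ("not acceptable")
      const q = Math.max(0.1, 1 - i * 0.1);
      return `${locale};q=${q.toFixed(1)}`;
    })
    .join(',');
}

// Hypothetical target region: Germany
const headers = { 'Accept-Language': buildAcceptLanguage(['de-DE', 'de', 'en-US', 'en']) };
console.log(headers['Accept-Language']); // de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7
// With Puppeteer: await page.setExtraHTTPHeaders(headers);
```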

4. Test User Agent Effectiveness

Many websites check signals beyond the user agent string. A fingerprinting test page such as bot.sannysoft.com shows what those checks can see:

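async function testBrowserFingerprint(page) {
  await page.goto('https://bot.sannysoft.com/', { waitUntil: 'networkidle2' });
  // page.waitForTimeout() was removed in recent Puppeteer versions; a plain delay works everywhere
  await new Promise((resolve) => setTimeout(resolve, 5000));

  // A full-page screenshot is the simplest way to review every check on the page
  await page.screenshot({ path: 'fingerprint.png', fullPage: true });

  // These selectors are placeholders: inspect the page's actual markup and
  // adjust them before relying on the extracted results
  const results = await page.evaluate(() => {
    const tests = document.querySelectorAll('.test-result');
    return Array.from(tests).map(test => ({
      name: test.querySelector('.test-name')?.textContent,
      result: test.querySelector('.test-status')?.textContent
    }));
  });

  console.log('Bot detection results:', results);
}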

Handling User Agent in Different Scenarios

For API Testing

When testing APIs that serve different content based on user agents:

curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15" \
     https://api.example.com/mobile-endpoint

For Load Testing

When a load test or scraping session spans several pages, keep the user agent consistent across them. Puppeteer browser contexts don't take a user agent option, so set it on each page. In recent Puppeteer versions (v22 and later), createBrowserContext() replaces the older createIncognitoBrowserContext():

const context = await browser.createBrowserContext();
await context.overridePermissions('https://example.com', ['geolocation']);

const page1 = await context.newPage();
const page2 = await context.newPage();

const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...';
await page1.setUserAgent(userAgent);
await page2.setUserAgent(userAgent);

Troubleshooting Common Issues

User Agent Not Applied

If your user agent isn't applied, check the timing. setUserAgent() only affects requests made after the call, so it must run before page.goto():

// ❌ Wrong - setting after navigation
await page.goto('https://example.com');
await page.setUserAgent('...');

// ✅ Correct - setting before navigation
await page.setUserAgent('...');
await page.goto('https://example.com');

Detection Despite Correct User Agent

Modern anti-bot systems check much more than the user agent. Other factors include:
- JavaScript execution patterns
- Mouse movement and timing
- WebRTC fingerprinting
- Canvas fingerprinting
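One common giveaway is a user agent that contradicts other properties the page can read. The sketch below compares the operating system claimed in the user agent with navigator.platform, checks touch support, and flags navigator.webdriver. The helper name and the platform patterns are assumptions for illustration, and real anti-bot systems combine far more signals than this.

```javascript
// Flag obvious contradictions between a user agent string and a few
// navigator properties. In a real run you would collect these values with
// page.evaluate(() => ({ platform: navigator.platform,
//   maxTouchPoints: navigator.maxTouchPoints, webdriver: navigator.webdriver })).
function findFingerprintMismatches(userAgent, nav) {
  const mismatches = [];
  // Expected navigator.platform prefixes for each OS token (illustrative)
  const expectations = [
    { token: /iPhone/, platform: /^iPhone/ },
    { token: /iPad/, platform: /^(iPad|MacIntel)/ },
    { token: /Android/, platform: /^Linux/ },
    { token: /Windows NT/, platform: /^Win/ },
    { token: /Macintosh/, platform: /^MacIntel/ }
  ];
  const rule = expectations.find((e) => e.token.test(userAgent));
  if (rule && !rule.platform.test(nav.platform)) {
    mismatches.push(`navigator.platform "${nav.platform}" contradicts the user agent`);
  }
  // A phone or tablet user agent with no touch points is suspicious
  if (/iPhone|iPad|Android/.test(userAgent) && nav.maxTouchPoints === 0) {
    mismatches.push('user agent claims a touch device but maxTouchPoints is 0');
  }
  // Automation-controlled browsers commonly report navigator.webdriver === true
  if (nav.webdriver === true) {
    mismatches.push('navigator.webdriver is true');
  }
  return mismatches;
}

// A Windows user agent set on headless Chromium running on Linux
const ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
// Reports the platform contradiction and the webdriver flag
console.log(findFingerprintMismatches(ua, { platform: 'Linux x86_64', maxTouchPoints: 0, webdriver: true }));
```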

Performance Considerations

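Setting a user agent has negligible performance cost. The bigger win is reusing browser contexts rather than launching a new browser for every page. In Playwright, a user agent set on a context applies to every page created from it: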

// Reuse browser contexts for better performance
const context = await browser.newContext({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...'
});

// Multiple pages will inherit the user agent
const page1 = await context.newPage();
const page2 = await context.newPage();

Using Command Line Arguments

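You can also set the user agent when launching Headless Chromium directly from the command line. The executable name varies by platform and package (for example chromium, chromium-browser, or google-chrome):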

# Launch with custom user agent
chromium --headless --disable-gpu --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" --dump-dom https://httpbin.org/user-agent

# For mobile user agent
chromium --headless --disable-gpu --user-agent="Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1" --dump-dom https://httpbin.org/user-agent
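If you run these commands from a Node.js script instead of a shell, passing the arguments as an array (for example to child_process.execFile) sends the user agent as a single argument, so its spaces and parentheses need no extra quoting. A minimal sketch, where chromiumDumpDomArgs is a hypothetical helper and the executable name is an assumption:

```javascript
// Build the argument list for a one-off headless Chromium run that prints
// the rendered DOM. Hypothetical helper; the flags match the commands above.
function chromiumDumpDomArgs(userAgent, url) {
  // Each array element becomes exactly one argv entry, spaces included
  return ['--headless', '--disable-gpu', `--user-agent=${userAgent}`, '--dump-dom', url];
}

const args = chromiumDumpDomArgs(
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1',
  'https://httpbin.org/user-agent'
);
console.log(args.length); // 5

// To actually run it (the binary may be chromium-browser or google-chrome on your system):
// const { execFile } = require('node:child_process');
// execFile('chromium', args, (err, stdout) => { if (err) throw err; console.log(stdout); });
```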

Integration with WebScraping.AI

When using WebScraping.AI API, you can set custom user agents through the API parameters:

import requests

api_key = "YOUR_API_KEY"
url = "https://example.com"

response = requests.get(
    "https://api.webscraping.ai/html",
    params={
        "api_key": api_key,
        "url": url,
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "device": "desktop"  # or "mobile" for mobile user agents
    }
)

print(response.text)

Conclusion

Setting up user agent strings in Headless Chromium is straightforward but requires attention to detail for effective web scraping. Choose appropriate user agents for your target websites, combine them with realistic headers and viewport settings, and always test your configuration. When dealing with sophisticated anti-bot systems, consider using specialized tools or services that handle browser fingerprinting comprehensively.

Remember to respect websites' robots.txt files and terms of service while implementing user agent strategies in your web scraping projects. For production use, consider using managed services that handle user agent rotation and anti-detection measures automatically.
