How do I prevent detection of Headless Chromium by websites?

Websites increasingly use sophisticated detection methods to identify and block headless browsers. Modern bot detection systems analyze browser fingerprints, behavioral patterns, and JavaScript properties to distinguish automated scripts from real users.

This guide covers proven techniques to make your headless Chromium browser appear more human-like and bypass common detection mechanisms.

Understanding Detection Methods

Before implementing evasion techniques, it's important to understand how websites detect headless browsers:

  • Navigator properties (e.g., navigator.webdriver = true)
  • Missing browser features (plugins, WebGL, canvas)
  • Behavioral patterns (no mouse movements, consistent timing)
  • HTTP fingerprinting (headers, TLS signatures)
  • Canvas and WebGL fingerprinting
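To make the first two signals concrete, here is a minimal sketch of how a server-side check might flag obvious headless markers from request headers alone. The token list and function name are illustrative assumptions; real detection systems combine dozens of signals, not just these.

```python
# Hypothetical sketch of a simplistic server-side headless check.
# Real bot-detection systems score many signals together.

HEADLESS_UA_TOKENS = ("HeadlessChrome", "PhantomJS", "Electron")

def looks_headless(headers: dict) -> bool:
    """Return True if the request shows obvious headless/automation markers."""
    ua = headers.get("User-Agent", "")
    # Default headless Chromium advertises itself as "HeadlessChrome/..."
    if any(token in ua for token in HEADLESS_UA_TOKENS):
        return True
    # Real browsers always send Accept-Language; bare automation often omits it
    if "Accept-Language" not in headers:
        return True
    return False
```

This is why the techniques below start with the User-Agent string and request headers: they are the cheapest signals for a site to check.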

Core Evasion Techniques

1. Disable Automation Indicators

The most effective first step is disabling Chromium's automation control features:

# Python with Selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # Use new headless mode
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=options)

# Execute script to hide webdriver property
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

// Node.js with Puppeteer
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
    headless: 'new',
    args: [
        '--disable-blink-features=AutomationControlled',
        '--disable-dev-shm-usage',
        '--no-sandbox'
    ],
    ignoreDefaultArgs: ["--enable-automation"]
});

const page = await browser.newPage();

// Hide webdriver property
await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
        get: () => undefined,
    });
});

2. User Agent and Headers Management

Set realistic user agents and HTTP headers that match real browsers:

# Python - Dynamic user agent rotation
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
]

options.add_argument(f"--user-agent={random.choice(user_agents)}")

# Add realistic viewport (--start-maximized has no effect in headless mode,
# so set an explicit window size instead)
options.add_argument("--window-size=1366,768")

// JavaScript - Set headers and viewport
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

await page.setViewport({
    width: 1366,
    height: 768,
    deviceScaleFactor: 1,
    hasTouch: false,
    isLandscape: true,
    isMobile: false
});

// Set additional headers
await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
});
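A common mistake is rotating user agents and headers independently, which produces mismatched combinations (for example a Windows UA with headers that never co-occur with it) that detection systems catch. The sketch below keeps each User-Agent paired with consistent companion values; the profile entries are illustrative assumptions, not a vetted list.

```python
import random

# Sketch (assumed values): pair each User-Agent with matching headers and a
# plausible viewport so every request presents one coherent browser profile.
BROWSER_PROFILES = [
    {
        "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "accept_language": "en-US,en;q=0.9",
        "viewport": (1920, 1080),
    },
    {
        "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "accept_language": "en-US,en;q=0.9",
        "viewport": (1440, 900),
    },
]

def pick_profile(rng=random):
    """Choose one profile; all headers come from the same entry."""
    profile = rng.choice(BROWSER_PROFILES)
    headers = {
        "User-Agent": profile["user_agent"],
        "Accept-Language": profile["accept_language"],
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    }
    return headers, profile["viewport"]
```

Feed the returned headers and viewport to whichever driver you use, so the UA, headers, and window size always agree.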

3. Advanced Stealth with Puppeteer Extra

For Node.js projects, puppeteer-extra-plugin-stealth provides comprehensive evasion:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Configure stealth plugin
puppeteer.use(StealthPlugin({
    // Enable all evasion techniques
    enabledEvasions: new Set([
        'chrome.app',
        'chrome.csi',
        'chrome.loadTimes',
        'chrome.runtime',
        'defaultArgs',
        'iframe.contentWindow',
        'media.codecs',
        'navigator.hardwareConcurrency',
        'navigator.languages',
        'navigator.permissions',
        'navigator.plugins',
        'navigator.webdriver',
        'sourceurl',
        'user-agent-override',
        'webgl.vendor',
        'window.outerdimensions'
    ])
}));

const browser = await puppeteer.launch({
    headless: 'new',
    args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-infobars',
        '--window-position=0,0',
        '--ignore-certificate-errors',
        '--ignore-certificate-errors-spki-list',
        '--ignore-ssl-errors'
    ]
});

4. Behavioral Simulation

Make your automation behave more like a human user:

# Python - Human-like interactions
import time
import random
from selenium.webdriver.common.action_chains import ActionChains

def human_like_delay():
    time.sleep(random.uniform(1.5, 4.0))

def simulate_human_behavior(driver):
    # Random mouse movements (move_by_offset is relative to the current
    # cursor position, so keep the offsets small -- large accumulated
    # offsets move outside the viewport and raise
    # MoveTargetOutOfBoundsException)
    actions = ActionChains(driver)

    for _ in range(random.randint(2, 5)):
        dx = random.randint(-50, 50)
        dy = random.randint(-50, 50)
        actions.move_by_offset(dx, dy)
        actions.pause(random.uniform(0.5, 1.5))

    actions.perform()
    human_like_delay()

# Usage
driver.get("https://example.com")
simulate_human_behavior(driver)

// JavaScript - Mouse movements and scrolling
async function humanLikeInteraction(page) {
    // Random mouse movements
    for (let i = 0; i < Math.floor(Math.random() * 5) + 2; i++) {
        await page.mouse.move(
            Math.random() * 1200,
            Math.random() * 700
        );
        await page.waitForTimeout(Math.random() * 1000 + 500);
    }

    // Random scrolling
    await page.evaluate(() => {
        const scrollHeight = document.body.scrollHeight;
        const scrollStep = Math.random() * 500 + 200;
        window.scrollTo(0, scrollStep);
    });

    await page.waitForTimeout(Math.random() * 2000 + 1000);
}
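Straight-line cursor movement at constant speed is itself a behavioral signal. One way to soften it is to interpolate a curved, jittered path between two points before feeding the steps to `page.mouse.move` or `ActionChains`. The sketch below uses a quadratic Bezier curve with per-step noise; the parameters are illustrative assumptions.

```python
import random

def human_mouse_path(start, end, steps=20, jitter=15, rng=random):
    """Sketch: generate a curved, jittered point path between two coordinates,
    loosely mimicking how a hand drifts off the straight line."""
    (x0, y0), (x1, y1) = start, end
    # A random control point bends the path away from the straight line
    cx = (x0 + x1) / 2 + rng.uniform(-100, 100)
    cy = (y0 + y1) / 2 + rng.uniform(-100, 100)
    path = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier interpolation between start, control, and end
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # Small per-step jitter, except at the exact endpoints
        if 0 < i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        path.append((round(x), round(y)))
    return path
```

Iterate over consecutive points with a short randomized pause between moves, rather than jumping the cursor directly to the target.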

5. Fingerprint Randomization

Randomize browser fingerprints to avoid pattern detection:

// JavaScript - Canvas and WebGL fingerprint evasion
await page.evaluateOnNewDocument(() => {
    // Canvas fingerprint randomization
    const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
    HTMLCanvasElement.prototype.toDataURL = function(type) {
        // 280x60 is a canvas size commonly used by fingerprinting scripts
        if (type === 'image/png' && this.width === 280 && this.height === 60) {
            // Add slight per-pixel noise so the canvas hash changes
            const context = this.getContext('2d');
            const imageData = context.getImageData(0, 0, this.width, this.height);
            for (let i = 0; i < imageData.data.length; i += 4) {
                imageData.data[i] += Math.floor(Math.random() * 3) - 1;
            }
            context.putImageData(imageData, 0, 0);
        }
        return originalToDataURL.apply(this, arguments);
    };

    // WebGL fingerprint evasion
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function(parameter) {
        // 37445 = UNMASKED_VENDOR_WEBGL, 37446 = UNMASKED_RENDERER_WEBGL
        if (parameter === 37445) {
            return 'Intel Inc.'; // Generic GPU vendor
        }
        if (parameter === 37446) {
            return 'Intel(R) HD Graphics'; // Generic GPU renderer
        }
        return getParameter.apply(this, arguments);
    };
});
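The per-pixel noise idea above is easier to reason about outside the browser. This plain-Python sketch shows the same math: nudging each RGBA color byte by at most ±1 changes the canvas hash while staying visually invisible, provided values are clamped to the 0-255 byte range (which `Uint8ClampedArray` does automatically in the browser).

```python
import random

def add_canvas_noise(pixels, rng=random):
    """Sketch: return a copy of a flat RGBA byte list with +/-1 noise on
    the R, G, B channels; alpha is left untouched."""
    noisy = list(pixels)
    for i in range(0, len(noisy), 4):      # step over 4-byte RGBA pixels
        for c in range(3):                 # perturb R, G, B only
            delta = rng.randint(-1, 1)
            noisy[i + c] = max(0, min(255, noisy[i + c] + delta))
    return noisy
```

Because the noise is re-rolled on every call, two reads of the "same" canvas produce different hashes, which defeats naive hash-matching (though sophisticated detectors can flag the inconsistency itself).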

6. Proxy and Network Management

Implement proper proxy rotation and network patterns:

# Python - Proxy rotation with realistic timing
import itertools
import random
import time

class ProxyRotator:
    def __init__(self, proxy_list):
        self.proxies = itertools.cycle(proxy_list)
        self.current_proxy = None

    def get_next_proxy(self):
        self.current_proxy = next(self.proxies)
        return {
            'http': f'http://{self.current_proxy}',
            'https': f'http://{self.current_proxy}'
        }

    def configure_selenium(self, options):
        if self.current_proxy:
            options.add_argument(f'--proxy-server=http://{self.current_proxy}')

# Usage
proxy_list = ['proxy1:port', 'proxy2:port', 'proxy3:port']
rotator = ProxyRotator(proxy_list)

for url in urls_to_scrape:
    proxy_config = rotator.get_next_proxy()

    # Configure new browser instance with proxy
    options = Options()
    rotator.configure_selenium(options)

    driver = webdriver.Chrome(options=options)
    # ... scraping logic ...
    driver.quit()

    # Human-like delay between requests
    time.sleep(random.uniform(10, 30))
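A refinement on the rotator above is to pace each proxy individually, so no single exit IP hits the target in a burst even when the pool is small. The class below is a sketch of that idea under assumed parameters; the names and defaults are illustrative, not from any library.

```python
import itertools
import random
import time

class PacedProxyRotator:
    """Sketch: cycle through proxies while enforcing a minimum delay per
    proxy, so each exit IP is reused at a human-plausible rate."""

    def __init__(self, proxy_list, min_gap=10.0):
        self.proxies = itertools.cycle(proxy_list)
        self.min_gap = min_gap
        self.last_used = {}                 # proxy -> monotonic timestamp

    def acquire(self):
        proxy = next(self.proxies)
        now = time.monotonic()
        # Sleep only if this proxy was used less than min_gap seconds ago
        wait = self.min_gap - (now - self.last_used.get(proxy, -self.min_gap))
        if wait > 0:
            time.sleep(wait + random.uniform(0, 2))   # add jitter
        self.last_used[proxy] = time.monotonic()
        return proxy
```

With a pool of N proxies and a 10-second per-proxy gap, overall throughput can still reach roughly N requests per 10 seconds while each individual IP stays slow.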

Complete Stealth Configuration Example

Here's a comprehensive example combining all techniques:

# Python - Complete stealth setup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import random
import time

def create_stealth_driver():
    options = Options()

    # Basic stealth options
    options.add_argument("--headless=new")
    options.add_argument("--disable-blink-features=AutomationControlled")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    # Stability options for containerized environments
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-gpu")

    # Random viewport
    viewports = [(1366, 768), (1920, 1080), (1440, 900), (1280, 720)]
    width, height = random.choice(viewports)
    options.add_argument(f"--window-size={width},{height}")

    # Random user agent
    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ]
    options.add_argument(f"--user-agent={random.choice(user_agents)}")

    driver = webdriver.Chrome(options=options)

    # Inject stealth scripts before any page script runs (execute_script
    # would only patch the page that is currently loaded, so use CDP)
    stealth_js = """
        Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
        Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});
        Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
        window.chrome = {runtime: {}};
    """
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": stealth_js})

    return driver

# Usage
driver = create_stealth_driver()
driver.get("https://bot-detection-test.com")

Detection Testing and Validation

Test your stealth configuration against bot detection services:

// Test against common detection services
const testUrls = [
    'https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html',
    'https://arh.antoinevastel.com/bots/areyouheadless',
    'https://bot.sannysoft.com/'
];

for (const url of testUrls) {
    console.log(`Testing: ${url}`);
    await page.goto(url, { waitUntil: 'networkidle2' });

    // Take screenshot to verify results
    await page.screenshot({
        path: `test-${url.split('/').pop()}.png`,
        fullPage: true
    });

    await page.waitForTimeout(5000);
}

Best Practices and Considerations

Compliance and Ethics

  • Always respect robots.txt and website terms of service
  • Implement rate limiting to avoid overwhelming servers
  • Use official APIs when available instead of scraping
  • Consider legal implications in your jurisdiction

Performance Optimization

  • Reuse browser instances when possible to reduce overhead
  • Implement connection pooling for better resource management
  • Monitor memory usage to prevent crashes during long sessions

Monitoring and Maintenance

  • Log detection events to identify pattern failures
  • Update user agents regularly to match current browser versions
  • Test configurations periodically as detection methods evolve
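The monitoring bullets above can be sketched as a small tracker that counts block or CAPTCHA events per site and flags a configuration once failures cluster. Everything here is an illustrative assumption: the class name, the one-hour window, and the threshold are placeholders to adapt to your own logging.

```python
import collections
import time

class DetectionMonitor:
    """Sketch: record block/CAPTCHA events per site and flag configurations
    whose failure rate suggests the fingerprint has gone stale."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.events = collections.defaultdict(list)   # site -> timestamps

    def record_block(self, site):
        self.events[site].append(time.time())

    def needs_rotation(self, site, window=3600):
        """True if `threshold` or more blocks occurred within `window` seconds."""
        cutoff = time.time() - window
        recent = [t for t in self.events[site] if t >= cutoff]
        self.events[site] = recent          # prune old entries
        return len(recent) >= self.threshold
```

When `needs_rotation` fires, rotate the user agent, proxy pool, or full browser profile for that site rather than retrying with the burned configuration.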

Alternative Solutions

For large-scale or mission-critical scraping, consider:

  • Residential proxy services with automatic rotation
  • Browser automation services with built-in stealth features
  • Web scraping APIs that handle detection evasion automatically
  • CAPTCHA solving services for interactive challenges

Remember that detection techniques continuously evolve, so no evasion method is permanently effective. The key is combining multiple techniques and staying updated with the latest developments in both bot detection and evasion methods.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

