How do I configure proxy settings in Headless Chromium?

Configuring proxy settings in Headless Chromium is essential for web scraping scenarios where you need to route traffic through proxy servers for anonymity, geographic location changes, or bypassing rate limits. This guide covers various methods to configure proxy settings using different programming languages and tools.

Understanding Proxy Types

Before diving into configuration, it's important to understand the different types of proxies you can use with Headless Chromium:

  • HTTP Proxy: Routes HTTP traffic through a proxy server
  • HTTPS Proxy: Routes HTTPS traffic through a proxy server
  • SOCKS Proxy: Routes all traffic through a SOCKS proxy server (SOCKS4 or SOCKS5)
  • PAC (Proxy Auto-Configuration): Uses a script to determine proxy settings
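Each of these types is selected through the same `--proxy-server` flag, with only the scheme prefix changing; PAC scripts use the separate `--proxy-pac-url` flag. As an illustrative sketch (`proxyServerArg` is a helper written for this guide, not part of Puppeteer or Chromium; the PAC URL is a placeholder):

```javascript
// Build the Chromium --proxy-server argument for a given proxy type.
function proxyServerArg(scheme, host, port) {
  return `--proxy-server=${scheme}://${host}:${port}`;
}

const httpArg = proxyServerArg('http', 'proxy.example.com', 8080);  // HTTP proxy
const socksArg = proxyServerArg('socks5', '127.0.0.1', 1080);       // SOCKS5 proxy

// PAC scripts are configured with a different flag, not --proxy-server:
const pacArg = '--proxy-pac-url=https://example.com/proxy.pac';

console.log(httpArg, socksArg, pacArg);
```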

Configuring Proxies with Puppeteer (Node.js)

Basic HTTP Proxy Configuration

const puppeteer = require('puppeteer');

async function launchBrowserWithProxy() {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--proxy-server=http://proxy-server.com:8080'
    ]
  });

  const page = await browser.newPage();

  // Optional: Set proxy authentication
  await page.authenticate({
    username: 'your-username',
    password: 'your-password'
  });

  await page.goto('https://httpbin.org/ip');
  const content = await page.content();
  console.log(content);

  await browser.close();
}

launchBrowserWithProxy();

SOCKS Proxy Configuration

const puppeteer = require('puppeteer');

async function launchWithSOCKSProxy() {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--proxy-server=socks5://127.0.0.1:1080'
    ]
  });

  const page = await browser.newPage();
  await page.goto('https://httpbin.org/ip');

  // Check if proxy is working
  const response = await page.evaluate(() => {
    return document.body.innerText;
  });

  console.log('Response:', response);
  await browser.close();
}

launchWithSOCKSProxy();

Advanced Proxy Configuration with Multiple Protocols

const puppeteer = require('puppeteer');

async function launchWithAdvancedProxy() {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--proxy-server=http=proxy1.com:8080;https=proxy2.com:8080;ftp=proxy3.com:8080',
      '--proxy-bypass-list=localhost,127.0.0.1'
    ]
  });

  const page = await browser.newPage();

  // Handle proxy authentication if required
  await page.authenticate({
    username: 'username',
    password: 'password'
  });

  try {
    await page.goto('https://example.com', { waitUntil: 'networkidle2' });
    console.log('Successfully loaded page through proxy');
  } catch (error) {
    console.error('Failed to load page:', error.message);
  }

  await browser.close();
}

launchWithAdvancedProxy();

Python Implementation with Selenium

Basic Proxy Setup with Selenium

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_with_proxy(proxy_host, proxy_port, username=None, password=None):
    chrome_options = Options()
    chrome_options.add_argument('--headless=new')  # extensions only load in the new headless mode
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')

    # Configure proxy
    chrome_options.add_argument(f'--proxy-server=http://{proxy_host}:{proxy_port}')

    # If authentication is required, you need to use a proxy extension
    if username and password:
        proxy_auth_extension = create_proxy_auth_extension(
            proxy_host, proxy_port, username, password
        )
        chrome_options.add_extension(proxy_auth_extension)

    driver = webdriver.Chrome(options=chrome_options)
    return driver

def create_proxy_auth_extension(proxy_host, proxy_port, username, password):
    import zipfile

    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        }
    }
    """

    background_js = f"""
    var config = {{
        mode: "fixed_servers",
        rules: {{
            singleProxy: {{
                scheme: "http",
                host: "{proxy_host}",
                port: {proxy_port}
            }},
            bypassList: ["localhost"]
        }}
    }};

    chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});

    function callbackFn(details) {{
        return {{
            authCredentials: {{
                username: "{username}",
                password: "{password}"
            }}
        }};
    }}

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {{urls: ["<all_urls>"]}},
        ['blocking']
    );
    """

    extension_path = '/tmp/proxy_auth_extension.zip'
    with zipfile.ZipFile(extension_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)

    return extension_path

# Usage example
driver = create_driver_with_proxy('proxy.example.com', 8080, 'username', 'password')
driver.get('https://httpbin.org/ip')
print(driver.page_source)
driver.quit()

SOCKS Proxy with Python

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_with_socks_proxy(proxy_host, proxy_port):
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument(f'--proxy-server=socks5://{proxy_host}:{proxy_port}')

    # Additional arguments for stability
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-gpu')

    driver = webdriver.Chrome(options=chrome_options)
    return driver

# Usage
driver = create_driver_with_socks_proxy('127.0.0.1', 1080)
driver.get('https://httpbin.org/ip')
print(driver.page_source)
driver.quit()

Command Line Configuration

Direct Chrome/Chromium Launch

You can also launch Headless Chromium directly from the command line with proxy settings:

# HTTP Proxy
google-chrome --headless --disable-gpu --proxy-server=http://proxy.example.com:8080 --dump-dom https://httpbin.org/ip

# SOCKS Proxy
google-chrome --headless --disable-gpu --proxy-server=socks5://127.0.0.1:1080 --dump-dom https://httpbin.org/ip

# Multiple proxy types
google-chrome --headless --disable-gpu --proxy-server="http=proxy1.com:8080;https=proxy2.com:8080" --dump-dom https://example.com

Using with Docker

FROM node:16-alpine

RUN apk add --no-cache \
    chromium \
    nss \
    freetype \
    freetype-dev \
    harfbuzz \
    ca-certificates \
    ttf-freefont

ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

WORKDIR /app
COPY package*.json ./
RUN npm install

COPY . .

# Launch with proxy; script.js must read this flag and forward it to puppeteer.launch
CMD ["node", "script.js", "--proxy-server=http://proxy.example.com:8080"]

Proxy Authentication Handling

Handling Authentication with Puppeteer

const puppeteer = require('puppeteer');

async function handleProxyAuth() {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--proxy-server=http://proxy.example.com:8080']
  });

  const page = await browser.newPage();

  // Set up authentication
  await page.authenticate({
    username: 'your-username',
    password: 'your-password'
  });

  // Monitor network requests for debugging
  page.on('response', response => {
    console.log(`Response: ${response.status()} ${response.url()}`);
  });

  page.on('requestfailed', request => {
    console.error(`Failed request: ${request.url()} ${request.failure().errorText}`);
  });

  await page.goto('https://httpbin.org/ip');
  await browser.close();
}

handleProxyAuth();

Testing Proxy Configuration

Verification Script

const puppeteer = require('puppeteer');

async function testProxyConfiguration(proxyUrl) {
  console.log(`Testing proxy: ${proxyUrl}`);

  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${proxyUrl}`]
  });

  const page = await browser.newPage();

  try {
    // Test IP detection
    await page.goto('https://httpbin.org/ip', { timeout: 30000 });
    const ipResponse = await page.evaluate(() => document.body.innerText);
    console.log('IP Response:', ipResponse);

    // Test headers
    await page.goto('https://httpbin.org/headers', { timeout: 30000 });
    const headersResponse = await page.evaluate(() => document.body.innerText);
    console.log('Headers Response:', headersResponse);

    console.log('Proxy test successful!');
  } catch (error) {
    console.error('Proxy test failed:', error.message);
  } finally {
    await browser.close();
  }
}

// Test different proxy types
testProxyConfiguration('http://proxy.example.com:8080');
testProxyConfiguration('socks5://127.0.0.1:1080');

Common Proxy Configuration Issues

Troubleshooting Connection Problems

  1. Proxy Authentication Failures: Ensure credentials are correctly set using page.authenticate()
  2. Timeout Issues: Increase timeout values when working with slow proxies
  3. SSL Certificate Errors: Use --ignore-certificate-errors flag for testing (not recommended for production)
  4. DNS Resolution: Some proxies may require specific DNS settings
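For timeout-prone proxies, wrapping navigation in a retry helper often resolves intermittent failures. The generic helper below is an illustration written for this guide, not a Puppeteer API:

```javascript
// Re-run an async operation (e.g. a page.goto through a flaky proxy)
// up to `attempts` times, rethrowing the last error if all attempts fail.
async function withRetries(operation, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await operation(i);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Usage with Puppeteer:
// await withRetries(() => page.goto(url, { timeout: 60000 }), 3);
```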

Error Handling Best Practices

const puppeteer = require('puppeteer');

async function robustProxyConnection(proxyUrl) {
  let browser;

  try {
    browser = await puppeteer.launch({
      headless: true,
      args: [
        `--proxy-server=${proxyUrl}`,
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-web-security',      // for testing only; avoid in production
        '--ignore-certificate-errors'  // for testing only; avoid in production
      ],
      timeout: 30000
    });

    const page = await browser.newPage();

    // Set longer timeouts for proxy connections
    page.setDefaultTimeout(60000);
    page.setDefaultNavigationTimeout(60000);

    // Handle authentication if needed
    await page.authenticate({
      username: process.env.PROXY_USERNAME,
      password: process.env.PROXY_PASSWORD
    });

    await page.goto('https://example.com', { 
      waitUntil: 'networkidle2',
      timeout: 60000 
    });

    return page;
  } catch (error) {
    console.error('Failed to establish proxy connection:', error);
    if (browser) await browser.close();
    throw error;
  }
}

Integration with Web Scraping Workflows

When implementing proxy settings in your web scraping projects, consider integrating with browser session management techniques to maintain consistent proxy connections across multiple requests. Additionally, you may want to combine proxy configuration with error handling strategies to gracefully manage proxy failures and connection timeouts.

Best Practices

  1. Proxy Rotation: Implement proxy rotation to avoid rate limiting
  2. Connection Pooling: Reuse browser instances when possible to reduce overhead
  3. Timeout Management: Set appropriate timeouts for proxy connections
  4. Error Handling: Implement retry logic for failed proxy connections
  5. Security: Never hardcode proxy credentials in your source code
  6. Testing: Always test proxy configurations before deploying to production
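For the first practice, proxy rotation can be as simple as cycling through a pool and launching each browser with the next proxy. The helper and proxy URLs below are illustrative, not part of any library:

```javascript
// Round-robin rotator over a pool of proxy URLs; each call returns the
// next proxy to pass as --proxy-server when launching a fresh browser.
function createProxyRotator(proxies) {
  let index = 0;
  return () => {
    const proxy = proxies[index % proxies.length];
    index += 1;
    return proxy;
  };
}

const nextProxy = createProxyRotator([
  'http://proxy1.example.com:8080',
  'http://proxy2.example.com:8080',
  'socks5://127.0.0.1:1080',
]);

// Launch each scraping job with the next proxy in the pool:
// const browser = await puppeteer.launch({
//   headless: true,
//   args: [`--proxy-server=${nextProxy()}`]
// });
```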

Configuring proxy settings in Headless Chromium provides powerful capabilities for web scraping while maintaining anonymity and bypassing geographical restrictions. Choose the appropriate method based on your specific requirements and programming environment.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
