What are the best practices for error handling in Headless Chromium scripts?

Error handling is crucial for building robust and reliable Headless Chromium scripts. Whether you're scraping the web, automating tests, or performing other browser automation tasks, proper error handling ensures your scripts can gracefully handle unexpected situations and recover from failures.

Understanding Common Headless Chromium Errors

Before diving into best practices, it's important to understand the types of errors you'll encounter:

  • Network errors: Connection timeouts, DNS failures, HTTP errors
  • Navigation errors: Page load failures, redirect issues
  • Element interaction errors: Missing elements, stale element references
  • JavaScript errors: Runtime exceptions in injected scripts
  • Resource errors: Memory exhaustion, browser crashes
  • Timeout errors: Page loads or operations taking too long
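
A quick way to make these categories actionable is to map them to the exception classes your driver raises. Here is a minimal sketch (Python/Selenium; the class names are real Selenium exceptions, but the mapping is a simplification, not an exhaustive taxonomy):

from selenium.common.exceptions import (
    TimeoutException,                # timeouts: page loads or waits taking too long
    NoSuchElementException,          # element interaction: missing elements
    StaleElementReferenceException,  # element interaction: stale references
    JavascriptException,             # JavaScript: runtime exceptions in injected scripts
    WebDriverException,              # base class: network, navigation, browser crashes
)

def classify_error(exc):
    """Return a coarse category for a caught Selenium exception."""
    # Check subclasses before WebDriverException, which is the base class
    if isinstance(exc, TimeoutException):
        return "timeout"
    if isinstance(exc, (NoSuchElementException, StaleElementReferenceException)):
        return "element"
    if isinstance(exc, JavascriptException):
        return "javascript"
    if isinstance(exc, WebDriverException):
        return "network/navigation/browser"
    return "unknown"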

1. Implement Comprehensive Try-Catch Blocks

Always wrap your Headless Chromium operations in try-catch blocks to handle exceptions gracefully:

Python (Selenium) Example

from selenium import webdriver
from selenium.common.exceptions import TimeoutException, WebDriverException
from selenium.webdriver.chrome.options import Options
import logging

def setup_headless_browser():
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')

    try:
        driver = webdriver.Chrome(options=options)
        return driver
    except WebDriverException as e:
        logging.error(f"Failed to initialize browser: {e}")
        raise

def scrape_page_safely(url):
    driver = None
    try:
        driver = setup_headless_browser()
        driver.get(url)

        # Your scraping logic here
        title = driver.title
        return title

    except TimeoutException:
        logging.error(f"Timeout while loading {url}")
        return None
    except WebDriverException as e:
        logging.error(f"WebDriver error: {e}")
        return None
    except Exception as e:
        logging.error(f"Unexpected error: {e}")
        return None
    finally:
        if driver:
            try:
                driver.quit()
            except Exception:
                pass  # ignore cleanup errors; the session is going away anyway

JavaScript (Puppeteer) Example

const puppeteer = require('puppeteer');

async function scrapePageSafely(url) {
    let browser;
    let page;

    try {
        browser = await puppeteer.launch({
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
        });

        page = await browser.newPage();

        // Set timeout for navigation
        await page.goto(url, { 
            waitUntil: 'networkidle2',
            timeout: 30000 
        });

        const title = await page.title();
        return title;

    } catch (error) {
        if (error.name === 'TimeoutError') {
            console.error(`Timeout while loading ${url}`);
        } else if (error.message.includes('net::ERR_')) {
            console.error(`Network error: ${error.message}`);
        } else {
            console.error(`Unexpected error: ${error.message}`);
        }
        return null;
    } finally {
        if (page) {
            try {
                await page.close();
            } catch (e) {
                console.warn('Error closing page:', e.message);
            }
        }
        if (browser) {
            try {
                await browser.close();
            } catch (e) {
                console.warn('Error closing browser:', e.message);
            }
        }
    }
}

2. Configure Appropriate Timeouts

Setting proper timeouts prevents your scripts from hanging indefinitely. Configure timeouts at multiple levels:

Page Load Timeouts

# Python/Selenium
driver.set_page_load_timeout(30)  # 30 seconds for page loads
driver.implicitly_wait(10)        # 10 seconds for element searches

// JavaScript/Puppeteer
page.setDefaultTimeout(30000); // 30 seconds default
page.setDefaultNavigationTimeout(45000); // 45 seconds for navigation

Element Interaction Timeouts

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

def wait_for_element_safely(driver, selector, timeout=10):
    try:
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return element
    except TimeoutException:
        logging.error(f"Element {selector} not found within {timeout} seconds")
        return None

Understanding how to handle timeouts effectively is essential for building reliable automation scripts.

3. Implement Retry Logic with Exponential Backoff

Network issues and temporary failures are common. Implement retry logic with exponential backoff:

# Python/Selenium
import time
import random
import logging

def retry_with_backoff(func, max_retries=3, base_delay=1):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if attempt == max_retries - 1:
                raise e

            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            logging.warning(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.2f}s")
            time.sleep(delay)

# Usage
def scrape_with_retry(url):
    return retry_with_backoff(lambda: scrape_page_safely(url))

// JavaScript/Puppeteer
async function retryWithBackoff(fn, maxRetries = 3, baseDelay = 1000) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return await fn();
        } catch (error) {
            if (attempt === maxRetries - 1) {
                throw error;
            }

            const delay = baseDelay * Math.pow(2, attempt) + Math.random() * 1000;
            console.warn(`Attempt ${attempt + 1} failed: ${error.message}. Retrying in ${delay}ms`);
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
}

// Usage
const result = await retryWithBackoff(() => scrapePageSafely(url));

4. Handle Network and Navigation Errors

Network issues are particularly common in web automation. Implement specific handling for different types of network errors:

async function handleNavigationSafely(page, url) {
    try {
        const response = await page.goto(url, {
            waitUntil: 'networkidle2',
            timeout: 30000
        });

        // goto can resolve with null (e.g., navigation to a hash on the same page)
        if (!response) {
            throw new Error(`No response received for ${url}`);
        }

        if (!response.ok()) {
            throw new Error(`HTTP ${response.status()}: ${response.statusText()}`);
        }

        return response;
    } catch (error) {
        if (error.message.includes('net::ERR_INTERNET_DISCONNECTED')) {
            throw new Error('No internet connection available');
        } else if (error.message.includes('net::ERR_NAME_NOT_RESOLVED')) {
            throw new Error(`DNS resolution failed for ${url}`);
        } else if (error.message.includes('net::ERR_CONNECTION_REFUSED')) {
            throw new Error(`Connection refused by ${url}`);
        } else if (error.name === 'TimeoutError') {
            throw new Error(`Page load timeout for ${url}`);
        } else {
            throw error;
        }
    }
}
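
The same discrimination is possible with Selenium. A minimal sketch, assuming chromedriver surfaces Chromium's net:: error codes in the WebDriverException message (it usually does, but the exact text varies by driver version, so treat the string matching as a heuristic):

from selenium.common.exceptions import TimeoutException, WebDriverException

def navigate_safely(driver, url):
    try:
        driver.get(url)
    except TimeoutException:
        raise RuntimeError(f"Page load timeout for {url}")
    except WebDriverException as e:
        message = str(e)
        if 'net::ERR_INTERNET_DISCONNECTED' in message:
            raise RuntimeError('No internet connection available')
        if 'net::ERR_NAME_NOT_RESOLVED' in message:
            raise RuntimeError(f"DNS resolution failed for {url}")
        if 'net::ERR_CONNECTION_REFUSED' in message:
            raise RuntimeError(f"Connection refused by {url}")
        raise  # unrecognized error: let it propagate unchanged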

For more complex navigation scenarios, learn about handling page redirections effectively.

5. Implement Robust Element Interaction

Element interactions often fail due to timing issues or dynamic content. Always verify element existence and state:

# Python/Selenium
import time
import logging
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

def click_element_safely(driver, selector):
    try:
        # Wait for element to be clickable
        element = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
        )

        # Scroll element into view
        driver.execute_script("arguments[0].scrollIntoView(true);", element)

        # Add small delay for scroll to complete
        time.sleep(0.5)

        element.click()
        return True

    except TimeoutException:
        logging.error(f"Element {selector} not clickable within timeout")
        return False
    except Exception as e:
        logging.error(f"Failed to click element {selector}: {e}")
        return False

// JavaScript/Puppeteer
async function clickElementSafely(page, selector) {
    try {
        // Wait for element to be available
        await page.waitForSelector(selector, { timeout: 10000 });

        // Check if element is visible
        const isVisible = await page.evaluate((sel) => {
            const element = document.querySelector(sel);
            return element && element.offsetParent !== null;
        }, selector);

        if (!isVisible) {
            throw new Error(`Element ${selector} is not visible`);
        }

        // Scroll to element and click
        await page.evaluate((sel) => {
            document.querySelector(sel).scrollIntoView();
        }, selector);

        await page.click(selector);
        return true;

    } catch (error) {
        console.error(`Failed to click element ${selector}:`, error.message);
        return false;
    }
}

6. Monitor and Log Browser Console Errors

Browser console errors can provide valuable debugging information:

async function setupConsoleMonitoring(page) {
    page.on('console', msg => {
        const type = msg.type();
        if (type === 'error') {
            console.error('Browser console error:', msg.text());
        } else if (type === 'warning') {
            console.warn('Browser console warning:', msg.text());
        }
    });

    page.on('pageerror', error => {
        console.error('Page error:', error.message);
    });

    page.on('requestfailed', request => {
        console.error('Request failed:', request.url(), request.failure()?.errorText);
    });
}
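
Selenium can capture similar signals through Chrome's logging capability. A minimal sketch, assuming your Chrome/chromedriver combination supports the goog:loggingPrefs capability (standard for Chrome, though not every driver exposes it):

import logging
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_monitored_browser():
    options = Options()
    options.add_argument('--headless')
    # Ask Chrome to record console output at all levels
    options.set_capability('goog:loggingPrefs', {'browser': 'ALL'})
    return webdriver.Chrome(options=options)

def report_console_errors(driver):
    # get_log returns (and clears) the entries collected since the last call
    for entry in driver.get_log('browser'):
        if entry['level'] == 'SEVERE':
            logging.error(f"Browser console error: {entry['message']}")
        elif entry['level'] == 'WARNING':
            logging.warning(f"Browser console warning: {entry['message']}")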

7. Implement Circuit Breaker Pattern

For production systems, implement a circuit breaker to prevent cascading failures:

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open" 
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func()
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
circuit_breaker = CircuitBreaker(failure_threshold=3, timeout=30)

def protected_scraping(url):
    return circuit_breaker.call(lambda: scrape_page_safely(url))

8. Resource Cleanup and Memory Management

Always ensure proper cleanup of browser resources:

class HeadlessBrowserManager:
    def __init__(self):
        self.driver = None

    def __enter__(self):
        try:
            options = Options()
            options.add_argument('--headless')
            options.add_argument('--no-sandbox')
            self.driver = webdriver.Chrome(options=options)
            return self.driver
        except Exception as e:
            logging.error(f"Failed to create browser: {e}")
            raise

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.driver:
            try:
                self.driver.quit()
            except Exception as e:
                logging.warning(f"Error during cleanup: {e}")

# Usage
with HeadlessBrowserManager() as driver:
    driver.get("https://example.com")
    # Your scraping logic here

9. Health Checks and Monitoring

Implement health checks to monitor browser instance health:

class BrowserHealthMonitor {
    constructor(browser) {
        this.browser = browser;
        this.isHealthy = true;
        this.startMonitoring();
    }

    async startMonitoring() {
        setInterval(async () => {
            try {
                const pages = await this.browser.pages();
                if (pages.length === 0) {
                    await this.browser.newPage();
                }
                this.isHealthy = true;
            } catch (error) {
                console.error('Browser health check failed:', error.message);
                this.isHealthy = false;
            }
        }, 30000); // Check every 30 seconds
    }

    async ensureHealthy() {
        if (!this.isHealthy) {
            throw new Error('Browser is not healthy');
        }
    }
}
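
A lighter-weight equivalent for Selenium is to probe the session before each unit of work; any cheap round-trip to the browser will do. A minimal sketch:

import logging

def is_driver_healthy(driver):
    try:
        _ = driver.title  # raises if the session or browser process is gone
        return True
    except Exception as e:
        logging.error(f"Browser health check failed: {e}")
        return False

def ensure_healthy(driver):
    if not is_driver_healthy(driver):
        raise RuntimeError("Browser is not healthy; recreate the session")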

Conclusion

Effective error handling in Headless Chromium scripts requires a multi-layered approach combining proper exception handling, timeout management, retry logic, and monitoring. By implementing these best practices, you'll create more robust and reliable automation scripts that can handle the unpredictable nature of web environments.

Remember to always test your error handling logic under various failure conditions and continuously monitor your scripts in production to identify and address new error patterns as they emerge. For specific error handling scenarios, also consider how to handle errors in Puppeteer for additional insights.

Regular monitoring, logging, and alerting will help you maintain healthy automation scripts and quickly respond to issues when they occur.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
