How do I debug issues with Headless Chromium automation scripts?

Debugging Headless Chromium automation scripts is harder than debugging a regular browser session because there is no visible window to watch. The techniques below cover visual inspection, console and network logging, element checks, timing, error handling, performance monitoring, and DevTools access.

Visual Debugging Techniques

1. Enable Non-Headless Mode

The easiest way to debug is to temporarily disable headless mode to see what's actually happening in the browser:

# Python with Selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Comment out headless mode for debugging
# options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

driver = webdriver.Chrome(options=options)

// JavaScript with Puppeteer
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: false, // Set to false for debugging
  slowMo: 250,     // Slow down operations for visibility
  devtools: true   // Open DevTools automatically
});
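
Commenting the headless flag in and out by hand is easy to forget before a deploy. A minimal sketch of an alternative, assuming a hypothetical HEADLESS environment variable (the variable name and both helper functions are illustrative, not part of Selenium), is to build the argument list from the environment:

```python
import os


def headless_enabled(env=None):
    """Return False only when HEADLESS is explicitly set to a falsy value."""
    env = os.environ if env is None else env
    value = env.get("HEADLESS", "1").strip().lower()
    return value not in {"0", "false", "no", "off"}


def chrome_arguments(env=None):
    """Build the Chrome argument list used in the Selenium example above."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless_enabled(env):
        # Headless stays the default so production runs are unaffected
        args.insert(0, "--headless")
    return args


# Usage with Selenium (not executed here):
#   options = Options()
#   for arg in chrome_arguments():
#       options.add_argument(arg)
```

Running the script with HEADLESS=0 then opens a visible window, and leaving the variable unset keeps the production behavior.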

2. Take Screenshots at Key Points

Screenshots help you understand the page state at different stages of your automation:

# Python with Selenium
from selenium.webdriver.common.by import By

driver.get('https://example.com')
driver.save_screenshot('step1_initial_load.png')

# Perform some actions
element = driver.find_element(By.ID, 'submit-button')
element.click()
driver.save_screenshot('step2_after_click.png')

// JavaScript with Puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'step1_initial_load.png' });

await page.click('#submit-button');
await page.screenshot({ path: 'step2_after_click.png' });
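
Hand-numbered file names get out of order once steps are added or removed. The sketch below is one possible way to keep screenshots ordered and timestamped. The ScreenshotLog class and the debug_screenshots directory are hypothetical names, and the only thing assumed about the driver is the save_screenshot method used above.

```python
import re
from datetime import datetime
from pathlib import Path


class ScreenshotLog:
    """Save numbered, timestamped screenshots for one debugging run."""

    def __init__(self, driver, directory="debug_screenshots"):
        self.driver = driver
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.counter = 0

    def capture(self, label):
        """Take a screenshot and return the path it was written to."""
        self.counter += 1
        # Keep file names portable: letters, digits, dash and underscore only
        safe_label = re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_")
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        path = self.directory / f"{self.counter:02d}_{safe_label}_{stamp}.png"
        self.driver.save_screenshot(str(path))
        return path


# Usage with Selenium (not executed here):
#   shots = ScreenshotLog(driver)
#   shots.capture("initial load")
#   shots.capture("after click")
```

The counter prefix keeps files sorted in execution order even when two steps land in the same second.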

3. Record Videos

For complex workflows, video recording can provide better insights:

// Puppeteer with puppeteer-screen-recorder
const { PuppeteerScreenRecorder } = require('puppeteer-screen-recorder');

const recorder = new PuppeteerScreenRecorder(page);
await recorder.start('./debug_session.mp4');

// Your automation code here

await recorder.stop();

Console and Network Debugging

1. Monitor Console Messages

Capture JavaScript console messages to identify client-side errors:

# Python with Selenium 4 (the desired_capabilities argument was removed)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
# Ask ChromeDriver to collect every browser console message
options.set_capability('goog:loggingPrefs', {'browser': 'ALL'})
driver = webdriver.Chrome(options=options)

# After navigation, get console logs
logs = driver.get_log('browser')
for log in logs:
    print(f"[{log['level']}] {log['message']}")

// Puppeteer console monitoring
page.on('console', msg => {
  console.log(`PAGE LOG [${msg.type()}]: ${msg.text()}`);
});

page.on('pageerror', error => {
  console.log(`PAGE ERROR: ${error.message}`);
});
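
On busy pages the browser log is mostly noise. As a rough sketch, the entries returned by driver.get_log('browser') can be filtered by severity before printing. This assumes each entry is a dict with 'level', 'message', and a millisecond 'timestamp', and that levels use the names in LEVEL_ORDER. The function name and the level ranking are illustrative.

```python
from datetime import datetime, timezone

# Assumed severity names, lowest to highest
LEVEL_ORDER = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "SEVERE": 3}


def filter_console_entries(entries, min_level="WARNING"):
    """Return formatted lines for entries at or above min_level."""
    threshold = LEVEL_ORDER[min_level]
    lines = []
    for entry in entries:
        # Unknown levels are ranked lowest so they are only shown at DEBUG
        if LEVEL_ORDER.get(entry.get("level"), 0) < threshold:
            continue
        ts = datetime.fromtimestamp(entry["timestamp"] / 1000, tz=timezone.utc)
        lines.append(f"{ts:%H:%M:%S} [{entry['level']}] {entry['message']}")
    return lines


# Usage with Selenium (not executed here):
#   for line in filter_console_entries(driver.get_log('browser')):
#       print(line)
```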

2. Network Request Monitoring

Track network requests to identify API failures or slow responses:

// Puppeteer network monitoring
page.on('request', request => {
  console.log(`Request: ${request.method()} ${request.url()}`);
});

page.on('response', response => {
  console.log(`Response: ${response.status()} ${response.url()}`);
  if (response.status() >= 400) {
    console.log(`Error response: ${response.status()} for ${response.url()}`);
  }
});
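
A similar check is possible from Selenium through ChromeDriver's performance log. The sketch below assumes the log was enabled with options.set_capability('goog:loggingPrefs', {'performance': 'ALL'}) and that each entry's 'message' field is a JSON string wrapping a DevTools event under a "message" key. The function name is hypothetical, and the entry layout should be verified against your ChromeDriver version.

```python
import json


def failed_responses(performance_entries, min_status=400):
    """Return (status, url) pairs for responses at or above min_status."""
    failures = []
    for entry in performance_entries:
        try:
            event = json.loads(entry["message"])["message"]
        except (KeyError, ValueError, TypeError):
            continue  # skip entries that do not match the assumed layout
        if not isinstance(event, dict):
            continue
        if event.get("method") != "Network.responseReceived":
            continue
        response = event.get("params", {}).get("response", {})
        status = response.get("status", 0)
        if status >= min_status:
            failures.append((status, response.get("url", "")))
    return failures


# Usage with Selenium (not executed here):
#   for status, url in failed_responses(driver.get_log('performance')):
#       print(f"Error response: {status} for {url}")
```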

When working with complex navigation scenarios, understanding how to handle browser sessions in Puppeteer can be crucial for maintaining state during debugging.

Element Debugging Strategies

1. Verify Element Existence and Visibility

Before interacting with elements, confirm they exist and are visible:

# Python with Selenium
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "my-element"))
    )
    print(f"Element found: {element.tag_name}")
    print(f"Element visible: {element.is_displayed()}")
    print(f"Element enabled: {element.is_enabled()}")
except TimeoutException:
    print("Element not found within timeout period")

// Puppeteer element verification
try {
  // waitForSelector resolves to the element handle, so no second lookup is needed
  const element = await page.waitForSelector('#my-element', { timeout: 10000 });

  const inViewport = await element.isIntersectingViewport();
  // boundingBox() resolves to null when the element is not rendered
  const boundingBox = await element.boundingBox();

  console.log(`Element in viewport: ${inViewport}`);
  console.log(`Element position:`, boundingBox);
} catch (error) {
  console.log('Element not found:', error.message);
}

2. Inspect Element Properties

Gather detailed information about elements that aren't behaving as expected:

// Puppeteer element inspection
const elementInfo = await page.evaluate((selector) => {
  const element = document.querySelector(selector);
  if (!element) return null;

  return {
    tagName: element.tagName,
    id: element.id,
    className: element.className,
    textContent: element.textContent,
    innerHTML: element.innerHTML,
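    // A CSSStyleDeclaration does not serialize cleanly out of page.evaluate,
    // so copy only the properties of interest into a plain object
    computedStyle: (({ display, visibility, opacity }) =>
      ({ display, visibility, opacity }))(window.getComputedStyle(element)),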
    attributes: Array.from(element.attributes).map(attr => ({
      name: attr.name,
      value: attr.value
    }))
  };
}, '#my-element');

console.log('Element details:', elementInfo);

Timing and Synchronization Issues

1. Dynamic Content Loading

Debug issues with dynamically loaded content by implementing proper wait strategies:

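# Python - Wait for specific conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_ajax_complete(driver):
    # Only meaningful on pages that load jQuery; otherwise treat as complete
    WebDriverWait(driver, 30).until(
        lambda d: d.execute_script(
            "return window.jQuery ? jQuery.active === 0 : true"
        )
    )

def wait_for_element_text_change(driver, selector, initial_text):
    WebDriverWait(driver, 30).until(
        lambda d: d.find_element(By.CSS_SELECTOR, selector).text != initial_text
    )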

For handling complex AJAX scenarios, refer to our guide on how to handle AJAX requests using Puppeteer.

2. Custom Wait Functions

Create specific wait conditions for your application's behavior:

// Puppeteer custom wait functions
async function waitForNetworkIdle(page, timeout = 30000) {
  // page.waitForLoadState() is a Playwright method; Puppeteer provides waitForNetworkIdle()
  return page.waitForNetworkIdle({ idleTime: 500, timeout });
}

async function waitForCustomCondition(page, condition, timeout = 30000) {
  return page.waitForFunction(condition, { timeout });
}

// Usage example
await waitForCustomCondition(page, () => {
  return document.querySelector('.loading-spinner') === null;
});
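
The same idea can be expressed in Python as a small polling loop. This is a simplified sketch of what a wait helper does internally, not Selenium's implementation. The wait_until name and its clock and sleep parameters are hypothetical and exist mainly to make the helper easy to test.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns a truthy value or timeout elapses."""
    deadline = clock() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:
            # Treat errors (for example, element not yet present) as "not ready"
            last_error = exc
        if clock() >= deadline:
            raise TimeoutError(
                f"Condition not met within {timeout}s"
            ) from last_error
        sleep(interval)


# Usage with Selenium (not executed here):
#   wait_until(lambda: not driver.find_elements(By.CSS_SELECTOR, '.loading-spinner'))
```

Chaining the last exception onto the TimeoutError keeps the underlying cause visible in the traceback.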

Error Handling and Recovery

1. Comprehensive Exception Handling

Implement robust error handling with detailed logging:

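# Python error handling
import logging
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

url = 'https://example.com'  # page under test

try: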
    driver.get(url)
    logger.info(f"Successfully loaded page: {url}")

    element = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "submit-btn"))
    )
    element.click()
    logger.info("Button clicked successfully")

except TimeoutException as e:
    logger.error(f"Timeout waiting for element: {e}")
    driver.save_screenshot('timeout_error.png')

except NoSuchElementException as e:
    logger.error(f"Element not found: {e}")
    logger.info(f"Current page source length: {len(driver.page_source)}")

except Exception as e:
    logger.error(f"Unexpected error: {e}")
    driver.save_screenshot('unexpected_error.png')
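
Repeating the screenshot call in every except branch gets tedious. One possible refactoring, sketched below, is a context manager that saves a screenshot and the page source whenever the wrapped block raises. The capture_on_failure name and the debug_artifacts directory are hypothetical, and the driver only needs the save_screenshot method and page_source attribute used above.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def capture_on_failure(driver, name, directory="debug_artifacts"):
    """Save a screenshot and the page source if the wrapped block raises."""
    try:
        yield
    except Exception:
        out = Path(directory)
        out.mkdir(parents=True, exist_ok=True)
        driver.save_screenshot(str(out / f"{name}.png"))
        (out / f"{name}.html").write_text(driver.page_source, encoding="utf-8")
        raise  # re-raise so the caller still sees the original error


# Usage with Selenium (not executed here):
#   with capture_on_failure(driver, "submit_form"):
#       driver.find_element(By.ID, "submit-btn").click()
```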

2. Retry Mechanisms

Implement retry logic for flaky operations:

// JavaScript retry function
async function retryOperation(operation, maxRetries = 3, delay = 1000) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      console.log(`Attempt ${attempt} failed: ${error.message}`);

      if (attempt === maxRetries) {
        throw error;
      }

      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}

// Usage
await retryOperation(async () => {
  await page.click('#unreliable-button');
  await page.waitForSelector('.success-message');
});
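
For Selenium scripts, a Python counterpart might look like the sketch below. It adds exponential backoff, which the JavaScript version above does not have. The retry name, the backoff factor, and the injectable sleep parameter are illustrative choices.

```python
import time


def retry(operation, max_retries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_retries attempts have failed."""
    wait = delay
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            print(f"Attempt {attempt} failed: {exc}")
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(wait)
            wait *= backoff  # wait longer before each further attempt


# Usage with Selenium (not executed here):
#   retry(lambda: driver.find_element(By.ID, "unreliable-button").click())
```

Retries hide flakiness rather than fix it, so the per-attempt log line is worth keeping even after the script stabilizes.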

Performance Debugging

1. Monitor Resource Usage

Track memory and CPU usage to identify performance bottlenecks:

// Puppeteer performance monitoring
const performanceMetrics = await page.metrics();
console.log('Performance metrics:', performanceMetrics);

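// Monitor memory usage over time (keep the handle so the timer can be stopped)
const memoryTimer = setInterval(async () => {
  const metrics = await page.metrics();
  console.log(`Memory: ${Math.round(metrics.JSHeapUsedSize / 1024 / 1024)}MB`);
}, 5000);
// Call clearInterval(memoryTimer) before closing the page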

2. Page Load Analysis

Analyze page loading performance:

// Measure page load times
const start = Date.now();
await page.goto('https://example.com');
const loadTime = Date.now() - start;
console.log(`Page loaded in ${loadTime}ms`);

// Get detailed timing information
// (uses Navigation Timing Level 2; performance.timing is deprecated)
const timingInfo = await page.evaluate(() => {
  const [nav] = performance.getEntriesByType('navigation');
  return {
    // Values are milliseconds relative to the start of navigation
    domContentLoaded: nav.domContentLoadedEventEnd,
    loadComplete: nav.loadEventEnd,
    firstPaint: performance.getEntriesByType('paint')[0]?.startTime
  };
});
console.log('Timing details:', timingInfo);
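
Page load is only one step in a longer script. To see where the rest of the time goes, each step can be wrapped in a timer. The StepTimer class below is a hypothetical helper built on the standard library, shown as a sketch rather than a profiling tool.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Record wall-clock durations for named steps of an automation run."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the step raised
            self.durations[name] = time.perf_counter() - start

    def slowest(self):
        """Return the name of the step that took longest."""
        return max(self.durations, key=self.durations.get)


# Usage with Selenium (not executed here):
#   timer = StepTimer()
#   with timer.step("load"):
#       driver.get("https://example.com")
#   print(timer.durations, timer.slowest())
```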

DevTools Integration

1. Remote Debugging

Enable remote debugging to inspect the browser state:

# Launch Chrome with remote debugging
google-chrome --remote-debugging-port=9222 --no-first-run --no-default-browser-check

# Python: connect to the remote debugging instance
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)

2. Protocol-Level Debugging

Access Chrome DevTools Protocol directly:

// Puppeteer CDP access
const client = await page.target().createCDPSession();

// Enable runtime domain
await client.send('Runtime.enable');

// Listen to console API calls
client.on('Runtime.consoleAPICalled', (event) => {
  console.log('Console API called:', event.args);
});
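
Selenium's Chromium drivers expose the protocol through execute_cdp_cmd. As a small sketch, the Performance domain can be queried for the same kind of metrics that page.metrics() reports in Puppeteer. The two helper names are hypothetical, and the code assumes Performance.getMetrics returns a "metrics" list of name/value pairs that includes JSHeapUsedSize.

```python
def cdp_metrics(driver):
    """Return Chrome's Performance.getMetrics result as a name -> value dict."""
    # The Performance domain has to be enabled before metrics are collected
    driver.execute_cdp_cmd("Performance.enable", {})
    result = driver.execute_cdp_cmd("Performance.getMetrics", {})
    return {metric["name"]: metric["value"] for metric in result["metrics"]}


def heap_used_mb(driver):
    """Return the used JavaScript heap size in megabytes."""
    return cdp_metrics(driver)["JSHeapUsedSize"] / (1024 * 1024)


# Usage with Selenium (not executed here):
#   print(f"Memory: {heap_used_mb(driver):.1f}MB")
```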

Understanding how to handle timeouts in Puppeteer is essential when debugging timing-related issues in your automation scripts.

Common Debugging Scenarios

1. Element Not Found Issues

# Debug missing elements
from selenium.webdriver.common.by import By

def debug_missing_element(driver, selector):
    print(f"Looking for element: {selector}")
    print(f"Current URL: {driver.current_url}")
    print(f"Page title: {driver.title}")

    # Check if page is fully loaded
    ready_state = driver.execute_script("return document.readyState")
    print(f"Document ready state: {ready_state}")

    # Check for similar elements with a substring match on id and class.
    # Assumes a simple "#id" or ".class" selector; a trailing "*" is not valid CSS.
    fragment = selector.lstrip('#.')
    similar_elements = driver.find_elements(
        By.CSS_SELECTOR, f"[id*='{fragment}'], [class*='{fragment}']"
    )
    print(f"Found {len(similar_elements)} similar elements")

    # Save page source for inspection
    with open('debug_page_source.html', 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

2. Authentication Flow Debugging

// Debug login flows
async function debugLogin(page, username, password) {
  console.log('Starting login process...');

  await page.goto('https://example.com/login');
  await page.screenshot({ path: 'login_page.png' });

  await page.type('#username', username);
  await page.type('#password', password);
  await page.screenshot({ path: 'filled_form.png' });

  // Start waiting for navigation before clicking to avoid a race
  try {
    await Promise.all([
      page.waitForNavigation({ timeout: 5000 }),
      page.click('#login-button')
    ]);
    console.log('Login successful - redirected to:', page.url());
  } catch (error) {
    await page.screenshot({ path: 'login_error.png' });
    // $eval throws when nothing matches, so look the element up first
    const errorEl = await page.$('.error-message');
    const errorMsg = errorEl
      ? await errorEl.evaluate(el => el.textContent)
      : error.message;
    console.log('Login failed:', errorMsg);
  }
}

Best Practices for Debugging

  1. Start Simple: Begin with basic functionality and gradually add complexity
  2. Use Meaningful Names: Name your screenshots and logs descriptively
  3. Log Everything: Include timestamps and context in your debug output
  4. Test Incrementally: Debug one feature at a time
  5. Use Version Control: Track changes to identify when issues were introduced
  6. Document Issues: Keep a record of common problems and their solutions
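
For the logging advice in particular, Python's standard logging module can add timestamps and context to every line. The sketch below shows one possible setup; the build_debug_logger name and the "automation" logger name are hypothetical.

```python
import logging


def build_debug_logger(name="automation", stream=None):
    """Return a logger whose lines carry a timestamp, level, and logger name."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:   # do not stack handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Usage (not executed here):
#   log = build_debug_logger("checkout_flow")
#   log.info("clicked %s on %s", "#submit-button", driver.current_url)
```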

Conclusion

Effective debugging of Headless Chromium automation scripts requires a systematic approach combining visual inspection, logging, performance monitoring, and error handling. By implementing these techniques and tools, you can significantly reduce the time spent troubleshooting issues and improve the reliability of your automation scripts. Remember to remove debugging code and restore headless mode before deploying to production environments.
