How do I manage browser contexts in Headless Chromium?

Browser contexts let you run parallel sessions inside a single Headless Chromium instance, each with its own isolated cookies, local storage, and cache. This is useful for testing multi-session scenarios, automating several user interactions without interference, or scraping websites without shared state.

Puppeteer, a Node.js library maintained by Google's Chrome team, provides a high-level API for controlling headless Chrome or Chromium. For Python, the options include Pyppeteer (an unofficial port of Puppeteer) and playwright-python (the Python version of Microsoft's Playwright, which supports Chromium, Firefox, and WebKit).

JavaScript (Puppeteer)

Here's an example of how to manage browser contexts using Puppeteer in JavaScript:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });

  try {
    // Create a new isolated (incognito-like) browser context.
    // Puppeteer v22+ names this createBrowserContext();
    // earlier versions use createIncognitoBrowserContext().
    const context = await browser.createBrowserContext();

    // Create a new page in the pristine context
    const page = await context.newPage();

    await page.goto('https://example.com');
    // ... do some actions in the page ...

    // Close the context (this also closes its pages)
    await context.close();

    // You can create multiple contexts, each with its own isolated environment
    const context2 = await browser.createBrowserContext();
    const page2 = await context2.newPage();
    await page2.goto('https://example.com');
    // ... do some actions in the page ...

    // Close the second context
    await context2.close();
  } finally {
    // Always close the browser, even if an action above throws
    await browser.close();
  }
})();

Python (Pyppeteer)

For Python, you can achieve similar functionality using Pyppeteer:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=True)

    # Create a new incognito browser context
    context = await browser.createIncognitoBrowserContext()

    # Create a new page in a pristine context.
    page = await context.newPage()

    await page.goto('https://example.com')
    # ... do some actions in the page ...

    # Close the context
    await context.close()

    # You can create multiple contexts, each with its own isolated environment
    context2 = await browser.createIncognitoBrowserContext()
    page2 = await context2.newPage()
    await page2.goto('https://example.com')
    # ... do some actions in the page ...

    # Close the second context
    await context2.close()

    # Finally, close the browser
    await browser.close()

asyncio.run(main())

Python (playwright-python)

Alternatively, you can use Playwright in Python to manage browser contexts:

from playwright.sync_api import sync_playwright

def run(playwright):
    browser = playwright.chromium.launch(headless=True)

    # Create a new isolated (incognito-like) browser context
    context = browser.new_context()

    # Create a new page inside context
    page = context.new_page()

    page.goto('https://example.com')
    # ... do some actions in the page ...

    # Close the context
    context.close()

    # You can create multiple contexts, each with its own isolated environment
    context2 = browser.new_context()
    page2 = context2.new_page()
    page2.goto('https://example.com')
    # ... do some actions in the page ...

    # Close the second context
    context2.close()

    # Finally, close the browser
    browser.close()

with sync_playwright() as playwright:
    run(playwright)

Make sure to handle exceptions and clean up properly (close every context and the browser, for example in a finally block) so you don't leave zombie browser processes behind. Also check each library's documentation for the latest API changes and best practices.
