Yes, it is possible to control multiple headless Chromium instances in parallel. This can be done using various tools and libraries in different programming languages. Below, I will show you how to do this in Python and JavaScript (Node.js), which are popular languages for web scraping and automation tasks.
Python with Selenium
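In Python, you can use the Selenium WebDriver with ChromeDriver to control headless Chrome instances. You'll need Selenium installed, along with a ChromeDriver that matches the version of Chrome you're using (Selenium 4.6 and later can locate a matching driver automatically through Selenium Manager if you omit the explicit driver path).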
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from threading import Thread


def run_chrome_instance(instance_number):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    service = Service('/path/to/chromedriver')
    driver = webdriver.Chrome(service=service, options=chrome_options)
    try:
        driver.get('http://example.com')
        print(f"Instance {instance_number}: Page title is {driver.title}")
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()


threads = []
number_of_instances = 5  # For example, create 5 headless Chrome instances.

# Start multiple Chrome instances in separate threads.
for i in range(number_of_instances):
    thread = Thread(target=run_chrome_instance, args=(i,))
    threads.append(thread)
    thread.start()

# Wait for all threads to complete.
for thread in threads:
    thread.join()
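The threads above only print their results. If the caller needs the titles back, one option is `concurrent.futures.ThreadPoolExecutor`, which hands back each worker's return value or exception. The following is a minimal sketch, not a definitive implementation: `fetch_title` and `collect_titles` are hypothetical helper names, and the sketch assumes Selenium 4.6+ so that Selenium Manager can locate the driver without an explicit path.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_title(url):
    """Hypothetical helper: open `url` in headless Chrome and return its title."""
    # Imported lazily so the pooling logic below can be exercised without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Assumption: Selenium 4.6+ resolves the driver binary on its own.
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()


def collect_titles(urls, fetch=fetch_title, max_workers=5):
    """Run `fetch` on every URL in parallel; return (titles, errors) keyed by URL."""
    titles, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                titles[url] = future.result()
            except Exception as exc:
                # One failed instance should not hide the results of the others.
                errors[url] = exc
    return titles, errors
```

Calling `collect_titles(['http://example.com'] )` would start one browser per URL, up to `max_workers` at a time. Because results are keyed by URL, duplicate URLs in the input collapse into a single entry.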
JavaScript (Node.js) with Puppeteer
In Node.js, Puppeteer is a popular library to control headless Chrome. To run multiple instances, you can use Promise.all to handle parallel execution.
First, you need to install Puppeteer:
npm install puppeteer
Then, you can control multiple instances as follows:
const puppeteer = require('puppeteer');

(async () => {
  const number_of_instances = 5; // For example, create 5 headless Chrome instances.

  // Start all launches at once; each call returns a promise for a browser.
  const browsers = [];
  for (let i = 0; i < number_of_instances; i++) {
    browsers.push(puppeteer.launch({ headless: true }));
  }
  const browserInstances = await Promise.all(browsers);

  try {
    await Promise.all(browserInstances.map(async (browser, index) => {
      const page = await browser.newPage();
      await page.goto('http://example.com');
      const title = await page.title();
      console.log(`Instance ${index}: Page title is ${title}`);
    }));
  } catch (error) {
    console.error('Error running headless instances:', error);
  } finally {
    // Close every browser, even if one of the page loads failed.
    await Promise.all(browserInstances.map((browser) => browser.close()));
  }
})();
In both examples, we're launching several instances of headless Chrome and visiting a web page in parallel. The Python example uses threading to achieve concurrency, while the Node.js example uses Promise.all for parallel execution of async functions.
It's important to be aware that running multiple instances of a headless browser can be resource-intensive, so the number of instances you can run in parallel may be limited by the capabilities of your machine. If you plan to run a large number of instances, you might need to manage resources carefully or use a more sophisticated approach, such as distributing the workload across multiple servers.
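One way to manage resources on a single machine is to cap the number of live browsers and have each one work through a shared queue of URLs, rather than launching a browser per URL. The sketch below illustrates that pattern under stated assumptions: `make_headless_chrome` and `crawl` are hypothetical names, `num_browsers=2` is an arbitrary default, and the driver is again resolved by Selenium Manager (Selenium 4.6+).

```python
import queue
import threading


def make_headless_chrome():
    """Hypothetical factory: start one headless Chrome through Selenium."""
    # Imported lazily so the queueing logic can be tried with a stand-in driver.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def crawl(urls, num_browsers=2, make_driver=make_headless_chrome):
    """Visit every URL with at most `num_browsers` browsers alive at once.

    Returns a dict mapping each URL to its page title, or to the exception
    raised while loading it.
    """
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)
    results = {}
    lock = threading.Lock()

    def worker():
        driver = make_driver()  # one browser per worker, reused across URLs
        try:
            while True:
                try:
                    url = tasks.get_nowait()
                except queue.Empty:
                    return  # queue drained, so this worker exits
                try:
                    driver.get(url)
                    outcome = driver.title
                except Exception as exc:
                    outcome = exc
                with lock:
                    results[url] = outcome
        finally:
            driver.quit()

    threads = [threading.Thread(target=worker) for _ in range(num_browsers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Reusing a browser across URLs avoids paying the launch cost for every page, at the price of shared state (cookies, cache) between visits handled by the same worker. Whether that trade-off is acceptable depends on the task.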