How to use selectors in Puppeteer?

Puppeteer is a Node.js library that provides a high-level API to control Google Chrome or Chromium over the DevTools Protocol. It runs headless by default but can be configured to run full (non-headless) Chrome or Chromium. In Puppeteer, selectors identify the DOM elements on a web page that you want to query or interact with.

Here is how you can use selectors in Puppeteer:

1. Using page.$(selector):

This method returns a promise that resolves to an ElementHandle for the first element in the document that matches the specified selector, or to null if no element matches. It is the Puppeteer counterpart of document.querySelector.

Here is an example of using this method to get an element with the id 'example':

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://example.com');

  // Resolves to an ElementHandle, or to null if no element has the id 'example'.
  const exampleElement = await page.$('#example');

  await browser.close();
})();
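
Because page.$ resolves to null when nothing matches, code that uses the result usually needs a guard. The following is a minimal sketch, assuming a helper named textOrNull (a hypothetical name, not part of Puppeteer) that receives a page that is already open:

```javascript
// Hypothetical helper, not part of Puppeteer: returns the text content of the
// first element matching `selector`, or null when nothing matches.
async function textOrNull(page, selector) {
  const handle = await page.$(selector); // ElementHandle or null
  if (handle === null) {
    return null;
  }
  try {
    // evaluate() runs the callback in the page, with the element as its argument.
    return await handle.evaluate(el => el.textContent);
  } finally {
    // Release the handle once it is no longer needed.
    await handle.dispose();
  }
}
```

It could be called as `await textOrNull(page, '#example')` after `page.goto(...)` in the example above.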

2. Using page.$$(selector):

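This method returns a promise that resolves to an array of ElementHandles, one for each element in the document that matches the specified selector. If no element matches, the array is empty. It is the Puppeteer counterpart of document.querySelectorAll.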

Here is an example of using this method to get all div elements:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://example.com');

  // Resolves to an array of ElementHandles (empty if the page has no div elements).
  const divs = await page.$$('div');

  await browser.close();
})();
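
The handles returned by page.$$ can themselves be queried, since an ElementHandle also has a $ method that searches only inside that element. The sketch below assumes a hypothetical helper named childTexts and illustrative selectors; it is one possible way to combine the two calls, not a prescribed pattern:

```javascript
// Hypothetical helper, not part of Puppeteer: for every element matching
// `parentSelector`, reads the text of the first descendant matching
// `childSelector`, or records null when that parent has no such descendant.
async function childTexts(page, parentSelector, childSelector) {
  const parents = await page.$$(parentSelector); // [] when nothing matches
  const texts = [];
  for (const parent of parents) {
    // ElementHandle.$ is scoped to this parent element.
    const child = await parent.$(childSelector);
    if (child === null) {
      texts.push(null);
    } else {
      texts.push(await child.evaluate(el => el.textContent));
    }
  }
  return texts;
}
```

For example, `await childTexts(page, 'div', 'a')` would collect the text of the first link inside each div.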

3. Using page.$eval(selector, pageFunction[, ...args]):

This method runs document.querySelector within the page and passes the matched element as the first argument to pageFunction. It returns a promise that resolves to the return value of pageFunction. If no element matches the selector, the method throws an error.

Here is an example of using this method to get the text content of an element with the id 'example':

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://example.com');

  // Throws if no element has the id 'example'; otherwise resolves to its text.
  const exampleText = await page.$eval('#example', el => el.textContent);

  console.log(exampleText);

  await browser.close();
})();
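
The optional ...args in the signature are forwarded to pageFunction after the element, which is how values from Node.js reach code that runs in the page. The sketch below assumes a hypothetical helper named attributeOrDefault; for brevity it treats any error from page.$eval as "no match", which a real script may want to narrow down:

```javascript
// Hypothetical helper, not part of Puppeteer: reads attribute `name` from the
// first element matching `selector`, or returns `fallback` if page.$eval throws.
async function attributeOrDefault(page, selector, name, fallback = null) {
  try {
    // `name` is passed as an extra argument and arrives in the page as `attr`.
    return await page.$eval(selector, (el, attr) => el.getAttribute(attr), name);
  } catch (err) {
    // Assumption: any failure is treated like a missing element.
    return fallback;
  }
}
```

A call such as `await attributeOrDefault(page, 'a', 'href', '')` would return the first link's href attribute, or an empty string when the page has no links.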

4. Using page.$$eval(selector, pageFunction[, ...args]):

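This method runs document.querySelectorAll within the page and passes the matched elements, as an array, as the first argument to pageFunction. It returns a promise that resolves to the return value of pageFunction. If no element matches, pageFunction receives an empty array.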

Here is an example of using this method to get the text content of all div elements:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://example.com');

  // The callback runs in the page; only its serializable result (an array of strings) comes back.
  const divTexts = await page.$$eval('div', divs => divs.map(div => div.textContent));

  console.log(divTexts);

  await browser.close();
})();
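
The pageFunction does not have to be an inline arrow function. As a sketch, the hypothetical extractLinks function below is written separately so it can be read and tested on its own; because Puppeteer runs it inside the page, it must not rely on variables from the surrounding Node.js scope:

```javascript
// Hypothetical pageFunction: maps an array of anchor elements to plain,
// serializable objects. It uses only its argument, so it is safe to run
// in the browser context.
function extractLinks(anchors) {
  return anchors.map(a => ({ text: a.textContent.trim(), href: a.href }));
}

// Hypothetical helper that applies extractLinks to every link on the page.
async function linksOf(page) {
  return page.$$eval('a', extractLinks);
}
```

Calling `await linksOf(page)` resolves to an array of `{ text, href }` objects, or to an empty array when the page contains no links.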

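Please note that only page.$ and page.$$ return handles (ElementHandle objects) to DOM elements. page.$eval and page.$$eval return whatever pageFunction returns, so that value must be serializable. To retrieve a property of an element from its handle, call the handle's getProperty method and then jsonValue on the result, or use the handle's evaluate method.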
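
As an illustration of reading a property through a handle, the sketch below assumes a hypothetical helper named propertyOf. It uses getProperty, which resolves to a JSHandle for the property, and jsonValue, which converts that handle to a plain value:

```javascript
// Hypothetical helper, not part of Puppeteer: reads the DOM property `name`
// from the first element matching `selector`, or returns undefined when
// nothing matches.
async function propertyOf(page, selector, name) {
  const handle = await page.$(selector); // ElementHandle or null
  if (handle === null) {
    return undefined;
  }
  const property = await handle.getProperty(name); // JSHandle for the property
  return property.jsonValue(); // plain, serializable value
}
```

For instance, `await propertyOf(page, 'input', 'value')` would return the current value of the first input on the page.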
