How to use selectors in Puppeteer?

Selectors are how Puppeteer scripts target specific DOM elements on a page. Puppeteer supports both CSS selectors and XPath expressions, giving you precise control over which elements you read from or interact with during web scraping and automation tasks.

Selector Methods Overview

Puppeteer provides four main methods for working with selectors:

  • page.$(selector) - Returns the first matching element
  • page.$$(selector) - Returns all matching elements
  • page.$eval(selector, pageFunction) - Executes a function on the first matching element
  • page.$$eval(selector, pageFunction) - Executes a function on all matching elements

1. Using page.$(selector)

Returns an ElementHandle for the first element that matches the selector, or null if no element matches, similar to document.querySelector().

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Select by ID
  const titleElement = await page.$('#main-title');

  // Select by class
  const buttonElement = await page.$('.submit-button');

  // Select by attribute
  const linkElement = await page.$('a[href="/about"]');

  // Check if element exists
  const element = await page.$('.non-existent');
  if (element) {
    console.log('Element found');
  } else {
    console.log('Element not found');
  }

  await browser.close();
})();
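
The ElementHandle returned by page.$() supports direct interaction and evaluation. A minimal sketch of working with the handle, reusing the .submit-button selector assumed above:

// Work with the ElementHandle returned by page.$()
const button = await page.$('.submit-button');
if (button) {
  // Read a property through the handle
  const label = await button.evaluate(el => el.textContent.trim());
  console.log('Button label:', label);

  // Handles can also be clicked directly
  await button.click();

  // Release the handle when it is no longer needed
  await button.dispose();
}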

2. Using page.$$(selector)

Returns an array of ElementHandles for all elements that match the selector (an empty array if none match), similar to document.querySelectorAll().

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Get all links
  const allLinks = await page.$$('a');
  console.log(`Found ${allLinks.length} links`);

  // Get all list items
  const listItems = await page.$$('ul li');

  // Get all elements with specific class
  const cards = await page.$$('.card');

  await browser.close();
})();
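
To extract data from the handles, iterate and evaluate each one. A sketch; note that for bulk extraction, page.$$eval() (covered below) is usually faster because it avoids one round trip per element:

// Iterate over the ElementHandles and read text from each
const cardHandles = await page.$$('.card');
for (const card of cardHandles) {
  const text = await card.evaluate(el => el.textContent.trim());
  console.log('Card:', text);
}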

3. Using page.$eval(selector, pageFunction)

Executes a function within the page context on the first matching element. Throws an error if no element is found.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Get text content
  const titleText = await page.$eval('h1', el => el.textContent.trim());
  console.log('Title:', titleText);

  // Get attribute value
  const linkHref = await page.$eval('a', el => el.href);
  console.log('Link URL:', linkHref);

  // Get element properties
  const elementInfo = await page.$eval('#main-content', el => ({
    tagName: el.tagName,
    className: el.className,
    innerHTML: el.innerHTML,
    clientHeight: el.clientHeight
  }));

  console.log('Element info:', elementInfo);

  await browser.close();
})();
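
page.$eval() also accepts extra arguments after the page function; they are serialized and passed into the browser context. A short sketch (the data-id attribute is a hypothetical example, not something on example.com):

// Pass an extra argument into the page function
const attrName = 'data-id'; // hypothetical attribute for illustration
const attrValue = await page.$eval(
  '#main-content',
  (el, attr) => el.getAttribute(attr),
  attrName
);
console.log(`${attrName}:`, attrValue);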

4. Using page.$$eval(selector, pageFunction)

Executes a function within the page context, passing it an array of all matching elements. Unlike page.$eval(), it does not throw when nothing matches; the function simply receives an empty array.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Get text from all matching elements
  const allHeadings = await page.$$eval('h1, h2, h3', 
    elements => elements.map(el => el.textContent.trim())
  );
  console.log('Headings:', allHeadings);

  // Get links and their URLs
  const linksData = await page.$$eval('a', links => 
    links.map(link => ({
      text: link.textContent.trim(),
      url: link.href,
      target: link.target
    }))
  );

  console.log('Links data:', linksData);

  await browser.close();
})();
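
Because the callback receives the full array inside the browser, it can filter and aggregate before anything is serialized back to Node.js. A sketch, with the host name passed in as an extra argument:

// Filter in the browser context: keep only links to other hosts
const externalUrls = await page.$$eval(
  'a',
  (links, host) =>
    links
      .filter(link => link.hostname && link.hostname !== host)
      .map(link => link.href),
  'example.com'
);
console.log('External links:', externalUrls);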

XPath Selectors

Puppeteer also supports XPath expressions. In versions before v22, this is done with page.$x(), which returns an array of matching ElementHandles:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Find element by text content
  const elements = await page.$x("//button[contains(text(), 'Submit')]");

  // Find element by attribute
  const linkElements = await page.$x("//a[@class='external-link']");

  // Complex XPath expression
  const specificElements = await page.$x(
    "//div[@class='container']//p[position()=2]"
  );

  if (elements.length > 0) {
    // Get text content from XPath result
    const text = await page.evaluate(el => el.textContent, elements[0]);
    console.log('Button text:', text);
  }

  await browser.close();
})();
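
Note: page.$x() was deprecated and then removed in recent Puppeteer releases (v22 and later). In current versions the same queries run through the standard selector methods, either with the xpath/ prefix or the ::-p-xpath() pseudo-selector. A sketch of the equivalent calls; verify the syntax against the Puppeteer version you use:

// Modern Puppeteer: XPath through the regular selector methods
const submitButtons = await page.$$('xpath///button[contains(text(), "Submit")]');

// The same style of query using the -p-xpath pseudo-selector
const externalLinks = await page.$$('::-p-xpath(//a[@class="external-link"])');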

Advanced Selector Techniques

Waiting for Elements

// Wait for element to appear
await page.waitForSelector('.dynamic-content');

// Wait for element with timeout
await page.waitForSelector('.slow-loading', { timeout: 5000 });

// Wait for element to be hidden
await page.waitForSelector('.loading-spinner', { hidden: true });
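
waitForSelector() resolves to the matching ElementHandle (or null when used with hidden: true), so waiting and extraction can be combined in one step. A minimal sketch, assuming a .dynamic-content element eventually appears:

// Wait for the element, then read directly from the resolved handle
const dynamicEl = await page.waitForSelector('.dynamic-content');
const dynamicText = await dynamicEl.evaluate(el => el.textContent.trim());
console.log('Dynamic content:', dynamicText);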

Element Interaction

// Click using selector
await page.click('.submit-button');

// Type into input field
await page.type('#email-input', 'user@example.com');

// Select option from dropdown
await page.select('#country-select', 'US');
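
These calls compose into complete flows. A sketch of a hypothetical form submission (all selectors here are assumptions for illustration):

// Hypothetical login flow: fill the form, submit, wait for the next page
await page.type('#email-input', 'user@example.com');
await page.type('#password-input', 'not-a-real-password');
await Promise.all([
  page.waitForNavigation(), // resolves once the post-submit page has loaded
  page.click('.submit-button'), // triggers the navigation
]);
console.log('Landed on:', page.url());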

Error Handling

try {
  const element = await page.$eval('.required-element', el => el.textContent);
  console.log('Element text:', element);
} catch (error) {
  console.log('Element not found or other error:', error.message);
}

// Safe element selection
const element = await page.$('.optional-element');
if (element) {
  const text = await element.evaluate(el => el.textContent);
  console.log('Optional element text:', text);
}
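
Both patterns can be folded into a small helper so the null checks are written once. A sketch (getTextOrNull is a hypothetical helper, not part of the Puppeteer API):

// Hypothetical helper: element text, or null when the element is missing
async function getTextOrNull(page, selector) {
  const handle = await page.$(selector);
  if (!handle) return null;
  try {
    return await handle.evaluate(el => el.textContent.trim());
  } finally {
    await handle.dispose(); // always release the handle
  }
}

const maybeText = await getTextOrNull(page, '.optional-element');
console.log(maybeText ?? 'Element not found');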

Best Practices

  1. Use specific selectors: Prefer IDs and unique classes over generic tag names
  2. Handle missing elements: Always check if elements exist before using them
  3. Use waitForSelector: Wait for dynamic content to load before selecting
  4. Prefer $eval over $ + evaluate: It's more efficient for simple data extraction
  5. Use try-catch blocks: Handle cases where elements might not exist (see the combined sketch after this list)
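
The practices above combine naturally. A sketch of one robust extraction pattern, assuming a dynamically rendered .price element:

// Wait for dynamic content, then extract defensively
try {
  await page.waitForSelector('.price', { timeout: 5000 });
  const price = await page.$eval('.price', el => el.textContent.trim());
  console.log('Price:', price);
} catch (error) {
  // Thrown if the element never appears within the timeout
  console.log('Price not available:', error.message);
}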

These selector methods provide comprehensive tools for targeting and manipulating DOM elements in Puppeteer, making web scraping and automation tasks more reliable and efficient.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data (the -g flag stops curl from treating the square brackets as glob patterns):

curl -g "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page%20title&fields[price]=Product%20price&api_key=YOUR_API_KEY"
