Selectors in Puppeteer are essential for targeting specific DOM elements on web pages. Puppeteer supports both CSS selectors and XPath expressions, providing powerful tools for element selection during web scraping and automation tasks.
Selector Methods Overview
Puppeteer provides four main methods for working with selectors:
- page.$(selector): Returns the first matching element
- page.$$(selector): Returns all matching elements
- page.$eval(selector, pageFunction): Executes a function on the first matching element
- page.$$eval(selector, pageFunction): Executes a function on all matching elements
1. Using page.$(selector)
Returns the first element that matches the specified selector, similar to document.querySelector(). The call resolves to an ElementHandle, or to null when nothing matches.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Select by ID
  const titleElement = await page.$('#main-title');
  // Select by class
  const buttonElement = await page.$('.submit-button');
  // Select by attribute
  const linkElement = await page.$('a[href="/about"]');
  // Check if element exists (page.$ resolves to null when nothing matches)
  const element = await page.$('.non-existent');
  if (element) {
    console.log('Element found');
  } else {
    console.log('Element not found');
  }
  await browser.close();
})();
2. Using page.$$(selector)
Returns all elements that match the specified selector, similar to document.querySelectorAll(). The call resolves to an array of ElementHandle objects, which is empty when nothing matches.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Get all links
  const allLinks = await page.$$('a');
  console.log(`Found ${allLinks.length} links`);
  // Get all list items
  const listItems = await page.$$('ul li');
  // Get all elements with specific class
  const cards = await page.$$('.card');
  await browser.close();
})();
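The handles returned by page.$$() are ElementHandle objects held in Node, not DOM nodes, so reading data from them takes one evaluate call per handle. The following is a minimal sketch of that loop; the helper name textsFromHandles is hypothetical, and page is assumed to be an already-open Puppeteer page.

```javascript
// Hypothetical helper: collect trimmed text from every element matching `selector`.
// `page` is assumed to be a Puppeteer Page (or any object with a compatible $$ method).
async function textsFromHandles(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // evaluate() runs the callback in the browser against the underlying DOM node.
    const text = await handle.evaluate(el => (el.textContent || '').trim());
    texts.push(text);
    // Release the handle once we are done with it.
    await handle.dispose();
  }
  return texts;
}

// Usage (inside an async function with an open page):
//   const items = await textsFromHandles(page, 'ul li');
```

Keeping the handles is mostly worthwhile when the same elements will later be clicked or typed into; for plain data extraction, page.$$eval() is usually the shorter route.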
3. Using page.$eval(selector, pageFunction)
Executes a function within the page context on the first matching element. Throws an error if no element is found.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Get text content
  const titleText = await page.$eval('h1', el => el.textContent.trim());
  console.log('Title:', titleText);
  // Get attribute value
  const linkHref = await page.$eval('a', el => el.href);
  console.log('Link URL:', linkHref);
  // Get element properties
  const elementInfo = await page.$eval('#main-content', el => ({
    tagName: el.tagName,
    className: el.className,
    innerHTML: el.innerHTML,
    clientHeight: el.clientHeight
  }));
  console.log('Element info:', elementInfo);
  await browser.close();
})();
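Because the page function runs inside the browser, it cannot see Node variables through a closure; extra values have to be passed as additional arguments after the function. A small sketch of that pattern follows. The helper name readAttribute and the selector in the usage comment are hypothetical.

```javascript
// Hypothetical helper: read one attribute from the first element matching `selector`.
// Arguments listed after the page function are serialized and handed to it in the browser.
async function readAttribute(page, selector, attributeName) {
  return page.$eval(
    selector,
    (el, name) => el.getAttribute(name), // `name` arrives as the second parameter
    attributeName
  );
}

// Usage (inside an async function with an open page):
//   const href = await readAttribute(page, 'a.download', 'href');
```

Like any $eval call, this rejects when the selector matches nothing, so it suits elements that are expected to exist.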
4. Using page.$$eval(selector, pageFunction)
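Executes a function within the page context on all matching elements, which are passed to the function as an array. Unlike $eval, it does not throw when nothing matches; the function simply receives an empty array.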
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Get text from all matching elements
  const allHeadings = await page.$$eval('h1, h2, h3',
    elements => elements.map(el => el.textContent.trim())
  );
  console.log('Headings:', allHeadings);
  // Get links and their URLs
  const linksData = await page.$$eval('a', links =>
    links.map(link => ({
      text: link.textContent.trim(),
      url: link.href,
      target: link.target
    }))
  );
  console.log('Links data:', linksData);
  await browser.close();
})();
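Since the page function receives real DOM elements, it can run further DOM queries on each one before returning. The sketch below uses that to flatten table rows into arrays of cell text; the helper name tableRows and the selector in the usage comment are hypothetical.

```javascript
// Hypothetical helper: turn each matching table row into an array of cell strings.
// The mapping runs inside the page; only the resulting strings are returned to Node.
async function tableRows(page, rowSelector) {
  return page.$$eval(rowSelector, rows =>
    rows.map(row =>
      Array.from(row.querySelectorAll('td, th'), cell => cell.textContent.trim())
    )
  );
}

// Usage (inside an async function with an open page):
//   const rows = await tableRows(page, 'table#prices tr');
```

When no rows match, the callback maps over an empty array, so the helper resolves to an empty array rather than throwing.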
XPath Selectors
Puppeteer also supports XPath expressions. Older releases expose a dedicated page.$x() method, but it was deprecated and then removed in Puppeteer v22. In current releases, pass the expression to page.$$() with the xpath/ prefix:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Find element by text content (the xpath/ prefix is followed by the expression itself)
  const elements = await page.$$('xpath/' + "//button[contains(text(), 'Submit')]");
  // Find element by attribute
  const linkElements = await page.$$('xpath/' + "//a[@class='external-link']");
  // Complex XPath expression
  const specificElements = await page.$$(
    'xpath/' + "//div[@class='container']//p[position()=2]"
  );
  if (elements.length > 0) {
    // Get text content from XPath result
    const text = await page.evaluate(el => el.textContent, elements[0]);
    console.log('Button text:', text);
  }
  await browser.close();
})();
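Scripts that must run against several Puppeteer versions can feature-detect which XPath entry point is available. This is a rough sketch only: the helper name queryXPath is hypothetical, and it assumes that page.$x() is absent from v22 onward and that page.$$() accepts the xpath/ prefix on those releases.

```javascript
// Hypothetical helper: run an XPath query on both older and newer Puppeteer releases.
async function queryXPath(page, expression) {
  if (typeof page.$x === 'function') {
    // Older releases: dedicated XPath method.
    return page.$x(expression);
  }
  // Newer releases (assumption: v22+): XPath goes through $$ with the "xpath/" prefix.
  return page.$$(`xpath/${expression}`);
}

// Usage (inside an async function with an open page):
//   const buttons = await queryXPath(page, "//button[contains(text(), 'Submit')]");
```

Either branch resolves to an array of element handles, so calling code does not need to know which one ran.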
Advanced Selector Techniques
Waiting for Elements
// Wait for element to appear
await page.waitForSelector('.dynamic-content');
// Wait for element with timeout
await page.waitForSelector('.slow-loading', { timeout: 5000 });
// Wait for element to be hidden
await page.waitForSelector('.loading-spinner', { hidden: true });
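waitForSelector rejects when the timeout elapses, which is awkward for content that may legitimately never appear. One possible wrapper, sketched below, turns a timeout into null and lets every other error through. The name waitForOptional is hypothetical, and the check on error.name assumes Puppeteer's timeout errors are named "TimeoutError".

```javascript
// Hypothetical helper: wait for an element, but resolve to null instead of throwing on timeout.
async function waitForOptional(page, selector, timeoutMs = 5000) {
  try {
    return await page.waitForSelector(selector, { timeout: timeoutMs });
  } catch (error) {
    // Assumption: Puppeteer's timeout errors carry the name "TimeoutError".
    if (error && error.name === 'TimeoutError') {
      return null;
    }
    throw error; // anything else (for example a closed page) is a real failure
  }
}

// Usage (inside an async function with an open page):
//   const banner = await waitForOptional(page, '.cookie-banner', 2000);
//   if (banner) await banner.click();
```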
Element Interaction
// Click using selector
await page.click('.submit-button');
// Type into input field
await page.type('#email-input', 'user@example.com');
// Select option from dropdown
await page.select('#country-select', 'US');
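page.click() and page.type() reject when the selector matches nothing, so on dynamic pages they are commonly paired with waitForSelector. The sketch below chains the two for a single-field form; the helper name fillAndSubmit and its option names are hypothetical.

```javascript
// Hypothetical helper: fill one field and submit, waiting for each element to be visible first.
async function fillAndSubmit(page, { inputSelector, value, submitSelector }) {
  await page.waitForSelector(inputSelector, { visible: true });
  await page.type(inputSelector, value);
  await page.waitForSelector(submitSelector, { visible: true });
  await page.click(submitSelector);
}

// Usage (inside an async function with an open page):
//   await fillAndSubmit(page, {
//     inputSelector: '#email-input',
//     value: 'user@example.com',
//     submitSelector: '.submit-button',
//   });
```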
Error Handling
try {
  const element = await page.$eval('.required-element', el => el.textContent);
  console.log('Element text:', element);
} catch (error) {
  console.log('Element not found or other error:', error.message);
}
// Safe element selection
const element = await page.$('.optional-element');
if (element) {
  const text = await element.evaluate(el => el.textContent);
  console.log('Optional element text:', text);
}
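A try-catch around $eval cannot tell a missing element apart from a bug inside the page function, because both surface as a rejection. Checking the handle first keeps the two cases separate. The following sketch wraps that check in a reusable function; the name evalOrDefault and its fallback parameter are hypothetical.

```javascript
// Hypothetical helper: like page.$eval, but resolves to `fallback` when nothing matches.
// Errors thrown by `pageFunction` itself still propagate to the caller.
async function evalOrDefault(page, selector, pageFunction, fallback = null) {
  const handle = await page.$(selector);
  if (!handle) {
    return fallback;
  }
  try {
    return await handle.evaluate(pageFunction);
  } finally {
    // Release the handle whether or not the evaluation succeeded.
    await handle.dispose();
  }
}

// Usage (inside an async function with an open page):
//   const note = await evalOrDefault(page, '.optional-element', el => el.textContent, '');
```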
Best Practices
- Use specific selectors: Prefer IDs and unique classes over generic tag names
- Handle missing elements: Always check if elements exist before using them
- Use waitForSelector: Wait for dynamic content to load before selecting
- Prefer $eval over $ + evaluate: It's more concise for simple data extraction and leaves no element handle to clean up
- Use try-catch blocks: Handle cases where elements might not exist
These selector methods provide comprehensive tools for targeting and manipulating DOM elements in Puppeteer, making web scraping and automation tasks more reliable and efficient.