How do you filter elements by their text content in Cheerio?

In Cheerio, you can filter elements by their text content with the .filter() method. You pass it a function that is called once for each element in the selection; elements for which the function returns a truthy value are kept in the resulting collection.

Here is the general outline of how to use the .filter() method for text content:

const cheerio = require('cheerio');

// Load your HTML into Cheerio
const $ = cheerio.load('<html>...your HTML...</html>');

// Use the .filter() method to filter elements by their text content
const filteredElements = $('selector').filter(function() {
  // `this` refers to the current element in the iteration
  // Use $(this).text() to get the text content of the element
  return $(this).text().trim() === 'Your desired text';
});

// Do something with the filtered elements
filteredElements.each(function() {
  console.log($(this).html());
});

Here's an example where you have a list of items and you want to select only those list items (<li>) that have the text "Item 2":

HTML Example:

<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 2</li>
</ul>

Cheerio Code:

const cheerio = require('cheerio');
const html = `
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 2</li>
</ul>
`;

// Load the HTML
const $ = cheerio.load(html);

// Filter <li> elements that have text "Item 2"
const itemsWithTextItem2 = $('li').filter(function() {
  return $(this).text().trim() === 'Item 2';
});

// Output the HTML of elements that match the filter
itemsWithTextItem2.each(function() {
  console.log($(this).html()); // Logs "Item 2" twice, once per matching <li>
});

In this code, we used the .filter() method to iterate over all <li> elements and compare their text content, after trimming white spaces, to the string "Item 2". Only those elements with matching text are included in itemsWithTextItem2.

Remember that .text() returns the combined text of the element and all of its descendants. If an element contains child elements and you want to match only its own direct text, or only part of the text, you need to adjust the comparison, for example by reading just the element's text nodes or by using a substring or regular-expression test.
