How do you get the inner HTML content of an element using Cheerio?

Cheerio is a fast, flexible, and lean implementation of core jQuery designed for server-side use in Node.js. To get the inner HTML of an element, load the HTML into Cheerio, select the element, and call the .html() method on the selection.

Here's how you can do it:

  1. Install Cheerio if you haven't already by running npm install cheerio.

  2. Load your HTML content into Cheerio.

  3. Use the .html() method to get the inner HTML of an element.

Here's an example in Node.js:

const cheerio = require('cheerio');

// Sample HTML content
const html = `
<html>
  <body>
    <div id="content">
      <p>This is a paragraph inside a div.</p>
      <ul>
        <li>List item 1</li>
        <li>List item 2</li>
      </ul>
    </div>
  </body>
</html>
`;

// Load the HTML content into Cheerio
const $ = cheerio.load(html);

// Select the element and get its inner HTML
const innerHTML = $('#content').html();

console.log(innerHTML);
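// Output (the string also keeps the surrounding newlines and the original
// indentation from the source markup; shown trimmed here for readability):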
// <p>This is a paragraph inside a div.</p>
// <ul>
//   <li>List item 1</li>
//   <li>List item 2</li>
// </ul>

In the example above, we used the #content selector to select the element with id="content" and then retrieved its inner HTML content, which includes all the HTML inside the <div> tag.

If you only want the text content without any HTML tags, you can use the .text() method instead:

const textContent = $('#content').text();
console.log(textContent);
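// Output (the string also keeps the indentation and the whitespace-only
// lines from the source markup; shown trimmed here for readability):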
// This is a paragraph inside a div.
// List item 1
// List item 2

Remember that Cheerio provides a jQuery-like API, so most of the traversal and manipulation methods you know from jQuery are available in Cheerio as well.
