How do you get the inner HTML content of an element using Cheerio?

To get the inner HTML content of an element using Cheerio, call the .html() method on your selected element. It returns the markup between the element's opening and closing tags as a string. Cheerio is a server-side implementation of core jQuery, so DOM manipulation in Node.js follows the same familiar API.

Basic Usage

First, install Cheerio:

npm install cheerio

Then use the .html() method to extract inner HTML:

const cheerio = require('cheerio');

const html = `
<html>
  <body>
    <div id="content">
      <p>This is a paragraph inside a div.</p>
      <ul>
        <li>List item 1</li>
        <li>List item 2</li>
      </ul>
    </div>
  </body>
</html>
`;

// Load the HTML content into Cheerio
const $ = cheerio.load(html);

// Get the inner HTML of the element
const innerHTML = $('#content').html();

console.log(innerHTML);
// Output (surrounding whitespace and indentation trimmed here for readability):
// <p>This is a paragraph inside a div.</p>
// <ul>
//   <li>List item 1</li>
//   <li>List item 2</li>
// </ul>

Working with Different Selectors

You can use any CSS selector to target elements:

// Select by ID
const idContent = $('#content').html();

// Select by class
const classContent = $('.container').html();

// Select by tag
const firstParagraph = $('p').first().html();

// Select by attribute
const linkContent = $('a[href="/home"]').html();

// Select nested elements
const listItem = $('#content ul li').first().html();

Handling Multiple Elements

When multiple elements match your selector, .html() returns the inner HTML of the first matched element:

const html = `
<div class="card">Card 1 content</div>
<div class="card">Card 2 content</div>
<div class="card">Card 3 content</div>
`;

const $ = cheerio.load(html);

// Gets inner HTML of first matching element only
const firstCard = $('.card').html(); // "Card 1 content"

// To get all elements, iterate through them
$('.card').each((index, element) => {
  console.log(`Card ${index + 1}:`, $(element).html());
});

Setting Inner HTML

You can also set inner HTML by passing content to the .html() method:

// Set new inner HTML
$('#content').html('<p>New content</p>');

// Append to existing content without re-parsing what is already there
$('#content').append('<p>Additional content</p>');

HTML vs Text Content

Compare .html() with .text() to understand the difference:

const html = `<div id="example"><strong>Bold text</strong> and <em>italic text</em></div>`;
const $ = cheerio.load(html);

console.log($('#example').html()); 
// Output: <strong>Bold text</strong> and <em>italic text</em>

console.log($('#example').text()); 
// Output: Bold text and italic text

Error Handling

.html() returns null when no element matches the selector, so check that the element exists before using the result:

const element = $('#nonexistent');

if (element.length > 0) {
  const innerHTML = element.html();
  console.log(innerHTML);
} else {
  console.log('Element not found');
}

// Or fall back to a default with nullish coalescing (Node.js 14+),
// since .html() returns null when nothing matches
const innerHTML = $('#content').html() ?? 'Element not found';

Real-World Web Scraping Example

const cheerio = require('cheerio');
const axios = require('axios');

async function scrapeArticleContent(url) {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);

    // Extract article content
    const articleHTML = $('.article-content').html();

    if (articleHTML) {
      console.log('Article HTML:', articleHTML);
      return articleHTML;
    } else {
      console.log('Article content not found');
      return null;
    }
  } catch (error) {
    console.error('Error scraping:', error.message);
    return null;
  }
}

The .html() method extracts an element's content with its HTML structure intact, which makes it well suited to content extraction, template generation, and DOM manipulation tasks in Node.js applications.
