To get the inner HTML content of an element using Cheerio, call the .html() method on your selection. Cheerio is a server-side implementation of jQuery core, which makes DOM manipulation in Node.js straightforward and familiar.
Basic Usage
First, install Cheerio:
npm install cheerio
Then use the .html() method to extract inner HTML:
const cheerio = require('cheerio');
const html = `
<html>
<body>
<div id="content">
<p>This is a paragraph inside a div.</p>
<ul>
<li>List item 1</li>
<li>List item 2</li>
</ul>
</div>
</body>
</html>
`;
// Load the HTML content into Cheerio
const $ = cheerio.load(html);
// Get the inner HTML of the element
const innerHTML = $('#content').html();
console.log(innerHTML);
// Output:
// <p>This is a paragraph inside a div.</p>
// <ul>
// <li>List item 1</li>
// <li>List item 2</li>
// </ul>
Working with Different Selectors
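Cheerio accepts standard CSS selectors, just as jQuery does. The snippets below assume $ was created with cheerio.load(); a selector that matches nothing (such as .container against the sample HTML above) makes .html() return null: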
// Select by ID
const idContent = $('#content').html();
// Select by class
const classContent = $('.container').html();
// Select by tag
const firstParagraph = $('p').first().html();
// Select by attribute
const linkContent = $('a[href="/home"]').html();
// Select nested elements
const listItem = $('#content ul li').first().html();
Handling Multiple Elements
When multiple elements match your selector, .html() returns the inner HTML of the first matched element only:
const cheerio = require('cheerio');
const html = `
<div class="card">Card 1 content</div>
<div class="card">Card 2 content</div>
<div class="card">Card 3 content</div>
`;
const $ = cheerio.load(html);
// Gets inner HTML of first matching element only
const firstCard = $('.card').html(); // "Card 1 content"
// To get all elements, iterate through them
$('.card').each((index, element) => {
console.log(`Card ${index + 1}:`, $(element).html());
});
Setting Inner HTML
You can also set inner HTML by passing content to the .html() method:
// Set new inner HTML
$('#content').html('<p>New content</p>');
// Append to existing content
const currentHTML = $('#content').html();
$('#content').html(currentHTML + '<p>Additional content</p>');
HTML vs Text Content
Compare .html() with .text() to understand the difference:
const cheerio = require('cheerio');
const html = `<div id="example"><strong>Bold text</strong> and <em>italic text</em></div>`;
const $ = cheerio.load(html);
console.log($('#example').html());
// Output: <strong>Bold text</strong> and <em>italic text</em>
console.log($('#example').text());
// Output: Bold text and italic text
Error Handling
Calling .html() on an empty selection does not throw; it returns null. Check that the selection matched an element before using the result:
const element = $('#nonexistent');
if (element.length > 0) {
const innerHTML = element.html();
console.log(innerHTML);
} else {
console.log('Element not found');
}
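// Or fall back to a default with nullish coalescing (Node.js 14+);
// ?? only replaces null, so an element that exists but is empty still yields ''
const innerHTML = $('#content').html() ?? 'Element not found';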
Real-World Web Scraping Example
const cheerio = require('cheerio');
const axios = require('axios'); // npm install axios
async function scrapeArticleContent(url) {
try {
const response = await axios.get(url);
const $ = cheerio.load(response.data);
// Extract article content
const articleHTML = $('.article-content').html();
if (articleHTML) {
console.log('Article HTML:', articleHTML);
return articleHTML;
} else {
console.log('Article content not found');
return null;
}
} catch (error) {
console.error('Error scraping:', error.message);
return null;
}
}
Because .html() preserves the markup inside an element, it is the method to reach for when you need that structure intact, for example to store an article body, re-render a fragment, or modify the DOM in Node.js. When you only need the readable text, use .text() instead.