Cheerio provides several methods for removing elements from the DOM, with .remove()
being the most commonly used. Here's a comprehensive guide to different removal techniques.
Basic Element Removal
The .remove() method permanently removes the selected elements from the DOM:
const cheerio = require('cheerio');
const html = `
<div class="container">
<p>Keep this paragraph</p>
<div class="unwanted">Remove this div</div>
<span class="unwanted">Remove this span</span>
<p>Keep this paragraph too</p>
</div>`;
const $ = cheerio.load(html);
// Remove all elements with class 'unwanted'
$('.unwanted').remove();
console.log($.html());
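Output (blank lines trimmed; the <html>, <head>, and <body> wrapper tags that cheerio.load() adds by default are omitted):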
<div class="container">
<p>Keep this paragraph</p>
<p>Keep this paragraph too</p>
</div>
Advanced Removal Techniques
Remove by Tag Name
// Remove all script tags
$('script').remove();
// Remove all style tags
$('style').remove();
// Remove all comments (requires special handling)
$('*').contents().filter(function() {
return this.type === 'comment';
}).remove();
Remove by Attribute
// Remove elements with specific data attributes
$('[data-remove="true"]').remove();
// Remove elements with empty href attributes
$('a[href=""]').remove();
// Remove images that have no alt attribute (alt="" is not matched)
$('img:not([alt])').remove();
Conditional Removal
// Remove elements containing specific text
$('p').filter(function() {
return $(this).text().includes('advertisement');
}).remove();
// Remove divs with no text content (this also removes divs that hold only images or other non-text children)
$('div').filter(function() {
return $(this).text().trim() === '';
}).remove();
Alternative Removal Methods
Using .empty()
Removes all child nodes (elements and text) but keeps the parent:
const html = '<div class="container"><p>Child content</p><span>More content</span></div>';
const $ = cheerio.load(html);
$('.container').empty();
// cheerio.load() wraps the fragment in <html> and <body>, so print the body's contents
console.log($('body').html()); // <div class="container"></div>
Using .unwrap()
Removes the parent element but keeps children:
const html = '<div class="wrapper"><p>Keep this</p><span>And this</span></div>';
const $ = cheerio.load(html);
$('.wrapper').children().unwrap();
// Print the body's contents to leave out the <html> and <body> wrapper tags
console.log($('body').html()); // <p>Keep this</p><span>And this</span>
Practical Example: Cleaning Web Content
Here's a real-world example of removing unwanted elements from scraped content:
const cheerio = require('cheerio');
function cleanContent(html) {
const $ = cheerio.load(html);
// Remove common unwanted elements
$('script, style, noscript').remove();
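// Remove ads and tracking elements
// (a bare [class*="ad"] would also match classes such as "header" or "shadow",
// so match the "ad-" prefix at the start of a class name instead)
$('[class^="ad-"], [class*=" ad-"], [id^="ad-"], [class*="tracking"]').remove();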
// Remove empty paragraphs
$('p').filter(function() {
return $(this).text().trim() === '';
}).remove();
// Remove social media widgets
$('.social-widget, .share-buttons').remove();
// Remove comments
$('*').contents().filter(function() {
return this.type === 'comment';
}).remove();
return $.html();
}
// Usage
const dirtyHtml = `
<article>
<h1>Article Title</h1>
<p>Great content here</p>
<script>trackingCode();</script>
<div class="ad-banner">Advertisement</div>
<p></p>
<p>More content</p>
</article>`;
const cleanHtml = cleanContent(dirtyHtml);
console.log(cleanHtml);
Performance Considerations
- Batch operations: Remove multiple element types with one combined selector when possible
- Selector efficiency: Use specific selectors to avoid unnecessary DOM traversal
- Memory management: Remove elements early in your processing pipeline to reduce memory usage
// Preferred: one combined selector and a single .remove() call
$('script, style, .ads, .tracking').remove();
// Less efficient: Multiple separate calls
$('script').remove();
$('style').remove();
$('.ads').remove();
$('.tracking').remove();
Important Notes
- .remove() permanently deletes elements from the Cheerio document
- Removed elements cannot be recovered unless you keep the value returned by .remove() or reload the original HTML
- Changes only affect the Cheerio DOM representation, not the original source
- Use $.html() to get the modified HTML string after removal