Cheerio provides several methods for removing elements from the DOM, with .remove() being the most commonly used. Here's a comprehensive guide to different removal techniques.
Basic Element Removal
The .remove() method permanently removes selected elements from the DOM:
const cheerio = require('cheerio');
const html = `
<div class="container">
  <p>Keep this paragraph</p>
  <div class="unwanted">Remove this div</div>
  <span class="unwanted">Remove this span</span>
  <p>Keep this paragraph too</p>
</div>`;
const $ = cheerio.load(html);
// Remove all elements with class 'unwanted'
$('.unwanted').remove();
console.log($.html());
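Output (shown without the <html>, <head>, and <body> wrapper that cheerio.load() adds by default in Cheerio 1.x, and without the blank lines left behind by the removed elements):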
<div class="container">
  <p>Keep this paragraph</p>
  <p>Keep this paragraph too</p>
</div>
Advanced Removal Techniques
Remove by Tag Name
// Remove all script tags
$('script').remove();
// Remove all style tags
$('style').remove();
// Remove all comments (selectors only match elements, so filter child nodes by type)
$('*').contents().filter(function() {
  return this.type === 'comment';
}).remove();
Remove by Attribute
// Remove elements with specific data attributes
$('[data-remove="true"]').remove();
// Remove elements with empty href attributes
$('a[href=""]').remove();
// Remove images without alt text
$('img:not([alt])').remove();
Conditional Removal
// Remove elements containing specific text
$('p').filter(function() {
  return $(this).text().includes('advertisement');
}).remove();
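// Remove divs that have no text and no child elements
// (checking text alone would also remove a div that only wraps an image)
$('div').filter(function() {
  return $(this).children().length === 0 && $(this).text().trim() === '';
}).remove();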
Alternative Removal Methods
Using .empty()
Removes all child nodes (elements and text) but keeps the selected element itself:
const html = '<div class="container"><p>Child content</p><span>More content</span></div>';
const $ = cheerio.load(html);
$('.container').empty();
// cheerio.load() wraps the input in <html>/<body> by default, so print only the body
console.log($('body').html()); // <div class="container"></div>
Using .unwrap()
Removes the parent of each selected element and leaves the selected elements in its place:
const html = '<div class="wrapper"><p>Keep this</p><span>And this</span></div>';
const $ = cheerio.load(html);
$('.wrapper').children().unwrap();
// cheerio.load() wraps the input in <html>/<body> by default, so print only the body
console.log($('body').html()); // <p>Keep this</p><span>And this</span>
Practical Example: Cleaning Web Content
Here's a real-world example of removing unwanted elements from scraped content:
const cheerio = require('cheerio');
function cleanContent(html) {
  const $ = cheerio.load(html);
  // Remove common unwanted elements
  $('script, style, noscript').remove();
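  // Remove ads and tracking elements
  // (a bare [class*="ad"] would also match "header", "shadow", "loading", ...)
  $('[class~="ad"], [class^="ad-"], [class*=" ad-"], [id^="ad-"], [class*="tracking"]').remove();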
  // Remove empty paragraphs
  $('p').filter(function() {
    return $(this).text().trim() === '';
  }).remove();
  // Remove social media widgets
  $('.social-widget, .share-buttons').remove();
  // Remove comments
  $('*').contents().filter(function() {
    return this.type === 'comment';
  }).remove();
  return $.html();
}
// Usage
const dirtyHtml = `
<article>
  <h1>Article Title</h1>
  <p>Great content here</p>
  <script>trackingCode();</script>
  <div class="ad-banner">Advertisement</div>
  <p></p>
  <p>More content</p>
</article>`;
const cleanHtml = cleanContent(dirtyHtml);
console.log(cleanHtml);
Performance Considerations
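- Batch operations: Combine several selectors into one comma-separated selector and call .remove() once
- Selector efficiency: Prefer specific selectors (a tag, class, or ID) over broad ones such as '*' to avoid unnecessary DOM traversal
- Memory management: Remove large unwanted subtrees (scripts, styles, ad containers) early in your processing pipeline so they can be garbage-collected and later selectors traverse fewer nodes
// Efficient: one combined selector, one .remove() call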
$('script, style, .ads, .tracking').remove();
// Less efficient: Multiple separate calls
$('script').remove();
$('style').remove();
$('.ads').remove();
$('.tracking').remove();
Important Notes
- .remove() permanently deletes elements from the Cheerio instance
- Removed elements are gone from the document; to get them back, either keep a reference to the set returned by .remove() and re-insert it, or reload the original HTML
- Changes only affect the Cheerio DOM representation, not the original source
- Use $.html() to get the modified HTML string after removal