How do I remove elements from the DOM using Simple HTML DOM?

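Simple HTML DOM removes elements through two properties. Setting outertext to an empty string removes an element completely, including its tags. Setting innertext to an empty string removes only the element's content and keeps its opening and closing tags. Either assignment changes what the node outputs when the document is converted back to HTML (with echo $html or $html->save()). The node itself stays in the parsed tree, so later find() calls can still return it.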

Basic Element Removal

Remove Single Element

<?php
require 'simple_html_dom.php';

// Sample HTML content
$html_content = '
<html>
<body>
    <div id="content">
        <h1>Title</h1>
        <p class="description">Important content</p>
        <span class="ads">Advertisement</span>
        <p>More content here</p>
    </div>
</body>
</html>';

$html = str_get_html($html_content);

// Find and remove the advertisement span
$ad_element = $html->find('span.ads', 0);
if ($ad_element) {
    $ad_element->outertext = '';  // Completely removes the element
}

echo $html;
// Output: HTML without the <span class="ads"> element
?>

Remove Multiple Elements

<?php
// Assumes $html was created with str_get_html(), as in the previous example
// Remove all elements with a specific class
$unwanted_elements = $html->find('.remove-me');
foreach ($unwanted_elements as $element) {
    $element->outertext = '';
}

// Remove elements matching several different selectors
$selectors = ['.ads', '.popup', '.banner', 'script'];
foreach ($selectors as $selector) {
    foreach ($html->find($selector) as $element) {
        $element->outertext = '';
    }
}
?>

Different Removal Methods

Complete Element Removal (outertext)

<?php
$html = str_get_html('<div><p class="unwanted">Remove this</p><p>Keep this</p></div>');

// Remove entire element including tags
$element = $html->find('p.unwanted', 0);
if ($element) {
    $element->outertext = '';  // Removes <p class="unwanted">Remove this</p>
}

echo $html;  // Output: <div><p>Keep this</p></div>
?>

Content-Only Removal (innertext)

<?php
$html = str_get_html('<div><p class="clear-content">Remove content</p></div>');

// Remove only content, keep the tag
$element = $html->find('p.clear-content', 0);
if ($element) {
    $element->innertext = '';  // Removes content but keeps <p class="clear-content"></p>
}

echo $html;  // Output: <div><p class="clear-content"></p></div>
?>

Practical Examples

Clean HTML by Removing Unwanted Elements

<?php
function cleanHtml($html_content) {
    $html = str_get_html($html_content);

    if (!$html) {
        return false;
    }

    // Define unwanted elements
    $unwanted_selectors = [
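        'script',           // Remove all <script> elements (JavaScript)
        'style',            // Remove <style> blocks (embedded CSS)
        '.advertisement',   // Remove ads
        '.popup',           // Remove popups
        '[style*="display: none"]',  // Hidden elements; matches only this exact spelling, not "display:none"
        'iframe[src*="ads"]'         // Iframes whose src contains "ads" (may also catch unrelated URLs)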
    ];

    // Remove unwanted elements
    foreach ($unwanted_selectors as $selector) {
        foreach ($html->find($selector) as $element) {
            $element->outertext = '';
        }
    }

    $result = $html->save();
    $html->clear();  // Free memory held by the parser
    return $result;
}

// Usage
$dirty_html = file_get_contents('webpage.html');
$clean_html = cleanHtml($dirty_html);
if ($clean_html !== false) {
    echo $clean_html;
}
?>

Remove Elements Based on Content

<?php
$html = str_get_html($html_content);

// Remove paragraphs containing specific text
foreach ($html->find('p') as $paragraph) {
    if (stripos($paragraph->plaintext, 'advertisement') !== false) {
        $paragraph->outertext = '';
    }
}

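// Re-parse so the removals above are reflected in the tree:
// outertext = '' only changes the output, and a parent's plaintext
// may still include the text of children removed that way
$html = str_get_html($html->save());

// Remove divs that are now empty
// Note: divs containing only images or other non-text elements also have empty plaintext
foreach ($html->find('div') as $div) {
    if (trim($div->plaintext) === '') {
        $div->outertext = '';
    }
}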
?>

Conditional Element Removal

<?php
function removeElementsConditionally($html_content, $conditions) {
    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    foreach ($conditions as $condition) {
        $elements = $html->find($condition['selector']);

        foreach ($elements as $element) {
            $checks = [];

            // Collect the result of each condition that was specified
            if (isset($condition['contains_text'])) {
                $checks[] = stripos($element->plaintext, $condition['contains_text']) !== false;
            }

            if (isset($condition['attribute'])) {
                $attr_value = $element->getAttribute($condition['attribute']['name']);
                $checks[] = $attr_value === $condition['attribute']['value'];
            }

            // Remove only if at least one condition was given and all of them match
            if ($checks && !in_array(false, $checks, true)) {
                $element->outertext = '';
            }
        }
    }

    $result = $html->save();
    $html->clear();  // Free memory held by the parser
    return $result;
}

// Usage example
$conditions = [
    [
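        // plaintext includes descendant text, so a wrapper div around
        // sponsored content also matches and is removed with everything inside it
        'selector' => 'div',
        'contains_text' => 'sponsored'
    ],
    [
        // Exact comparison: class="tracking-pixel small" would not match
        'selector' => 'img',
        'attribute' => ['name' => 'class', 'value' => 'tracking-pixel']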
    ]
];

$cleaned_html = removeElementsConditionally($html_content, $conditions);
?>

Best Practices

Error Handling and Validation

<?php
function safeRemoveElements($html_content, $selectors) {
    // Validate input
    if (empty($html_content) || !is_array($selectors)) {
        return false;
    }

    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    $removed_count = 0;

    foreach ($selectors as $selector) {
        $elements = $html->find($selector);

        foreach ($elements as $element) {
            if ($element) {
                $element->outertext = '';
                $removed_count++;
            }
        }
    }

    // Clean up and return
    $result = $html->save();
    $html->clear();  // Free memory

    return [
        'html' => $result,
        'removed_count' => $removed_count
    ];
}
?>

Memory Management

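Simple HTML DOM objects reference each other, and PHP may not free them promptly. When processing large documents, or many documents in a loop, call clear() once you have saved the result. Also check the return value of str_get_html(): it returns false for empty input and for input longer than the MAX_FILE_SIZE constant defined in simple_html_dom.php (600000 bytes in commonly used versions).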

<?php
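$html = str_get_html($large_html_content);
if (!$html) {
    throw new RuntimeException('Could not parse the document (empty or larger than MAX_FILE_SIZE)');
}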

// Perform removals
foreach ($html->find('.unwanted') as $element) {
    $element->outertext = '';
}

$result = $html->save();
$html->clear();  // Important: free memory

echo $result;
?>

Key Points

  • outertext = '': Removes the entire element including its tags
  • innertext = '': Removes only the content, keeping the empty tag
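  • find() with an index (e.g. find('span.ads', 0)) returns null when nothing matches, so check the result before removing it
  • Use $html->clear() to free memory when working with large documents
  • Simple HTML DOM is a PHP library that runs on the server; to remove elements in the browser, use JavaScript (for example, element.remove())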
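Assigning outertext = '' hides an element in Simple HTML DOM's output, but the node stays in the parsed tree. If you need elements actually detached, so that later queries no longer see them, PHP's bundled DOM extension (DOMDocument and DOMXPath, which are separate from Simple HTML DOM) can do that with removeChild(). The following is a comparison sketch under a few assumptions: the dom extension is enabled (it is in most PHP builds), the input fragment has a single root element, and removeElementsByXPath() is a hypothetical helper name, not part of either library.

```php
<?php
// Comparison sketch using PHP's bundled DOM extension, not Simple HTML DOM.
// removeElementsByXPath() is a hypothetical helper name for this example.
function removeElementsByXPath(string $html, string $xpathQuery): string
{
    // Collect libxml parse warnings instead of printing them (real-world HTML is often imperfect)
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Don't add implied <html>/<body> wrappers or a doctype; assumes a single root element
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array first, then detach each node from its parent
    foreach (iterator_to_array($xpath->query($xpathQuery)) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}

$input = '<div><p class="ads">Advertisement</p><p>Keep this</p></div>';
echo removeElementsByXPath($input, '//p[@class="ads"]');
```

Unlike the outertext approach, this sketch takes XPath expressions rather than CSS selectors. It also changes the tree itself, not just the serialized output.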

These techniques cover common HTML cleanup tasks in PHP. Examples include stripping scripts, ads, and tracking elements from scraped pages before you extract their content.
