How do I remove elements from the DOM using Simple HTML DOM?

In Simple HTML DOM for PHP, you remove elements by assigning to their properties. Set an element's outertext to an empty string to remove it completely, or set its innertext to an empty string to clear the content while keeping the tag itself.

Basic Element Removal

Remove Single Element

<?php
require 'simple_html_dom.php';

// Sample HTML content
$html_content = '
<html>
<body>
    <div id="content">
        <h1>Title</h1>
        <p class="description">Important content</p>
        <span class="ads">Advertisement</span>
        <p>More content here</p>
    </div>
</body>
</html>';

$html = str_get_html($html_content);

// Find and remove the advertisement span
$ad_element = $html->find('span.ads', 0);
if ($ad_element) {
    $ad_element->outertext = '';  // Completely removes the element
}

echo $html;
// Output: HTML without the <span class="ads"> element
?>

Remove Multiple Elements

<?php
// Continues from the previous example: $html holds the parsed document

// Remove all elements with a specific class
$unwanted_elements = $html->find('.remove-me');
foreach ($unwanted_elements as $element) {
    $element->outertext = '';
}

// Remove multiple different selectors
$selectors = ['.ads', '.popup', '.banner', 'script'];
foreach ($selectors as $selector) {
    foreach ($html->find($selector) as $element) {
        $element->outertext = '';
    }
}
?>
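
find() also accepts a comma-separated list of selectors, so the nested loop above can often be collapsed into a single query. The following is a minimal sketch, assuming simple_html_dom.php is on the include path; the sample markup and class names are made up for illustration.

```php
<?php
require_once 'simple_html_dom.php';

// Hypothetical sample markup, for illustration only
$html = str_get_html(
    '<div><p>Keep me</p><span class="ads">Ad</span>'
    . '<div class="popup">Popup</div><script>track();</script></div>'
);

// One query covering several selectors, separated by commas
foreach ($html->find('.ads, .popup, script') as $element) {
    $element->outertext = '';
}

// The paragraph is kept; the span, the popup div, and the script are dropped
$cleaned = $html->save();
echo $cleaned;
```

The array-of-selectors loop is still the better fit when the list is built at runtime or when you want to count removals per selector.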

Different Removal Methods

Complete Element Removal (outertext)

<?php
$html = str_get_html('<div><p class="unwanted">Remove this</p><p>Keep this</p></div>');

// Remove entire element including tags
$element = $html->find('p.unwanted', 0);
if ($element) {
    $element->outertext = '';  // Removes <p class="unwanted">Remove this</p>
}

echo $html;  // Output: <div><p>Keep this</p></div>
?>

Content-Only Removal (innertext)

<?php
$html = str_get_html('<div><p class="clear-content">Remove content</p></div>');

// Remove only content, keep the tag
$element = $html->find('p.clear-content', 0);
if ($element) {
    $element->innertext = '';  // Removes content but keeps <p class="clear-content"></p>
}

echo $html;  // Output: <div><p class="clear-content"></p></div>
?>
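
A third variant, not covered by the two examples above, removes only the tag and keeps what was inside it. Because outertext accepts any string, assigning the element's own innertext to it should have that effect. This is a minimal sketch with made-up markup, assuming simple_html_dom.php is on the include path.

```php
<?php
require_once 'simple_html_dom.php';

// Hypothetical sample markup, for illustration only
$html = str_get_html('<p>Read the <span class="highlight">full guide</span> here</p>');

// Replace the element with its own inner HTML: the tag goes, the content stays
$span = $html->find('span.highlight', 0);
if ($span) {
    $span->outertext = $span->innertext;
}

$unwrapped = $html->save();
echo $unwrapped;
```

This is handy for stripping presentational wrappers such as span or font tags without losing the text they contain.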

Practical Examples

Clean HTML by Removing Unwanted Elements

<?php
function cleanHtml($html_content) {
    $html = str_get_html($html_content);

    if (!$html) {
        return false;
    }

    // Define unwanted elements
    $unwanted_selectors = [
        'script',           // Remove all JavaScript
        'style',            // Remove embedded <style> blocks
        '.advertisement',   // Remove ads
        '.popup',          // Remove popups
        '[style*="display: none"]',  // Remove hidden elements
        'iframe[src*="ads"]'         // Remove ad iframes
    ];

    // Remove unwanted elements
    foreach ($unwanted_selectors as $selector) {
        foreach ($html->find($selector) as $element) {
            $element->outertext = '';
        }
    }

    return $html->save();
}

// Usage
$dirty_html = file_get_contents('webpage.html');
$clean_html = cleanHtml($dirty_html);
if ($clean_html !== false) {
    echo $clean_html;
}
?>

Remove Elements Based on Content

<?php
$html = str_get_html($html_content);

// Remove paragraphs containing specific text
foreach ($html->find('p') as $paragraph) {
    if (stripos($paragraph->plaintext, 'advertisement') !== false) {
        $paragraph->outertext = '';
    }
}

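// Re-parse first: assigning outertext does not update the parsed tree,
// so plaintext would still include the paragraphs removed above
$reparsed = str_get_html($html->save());
if ($reparsed) {
    $html = $reparsed;
}

// Remove divs left without any text
// (this also drops divs that contain only images or other non-text content)
foreach ($html->find('div') as $div) {
    if (trim($div->plaintext) === '') {
        $div->outertext = '';
    }
}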
?>
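
Assigning to outertext only changes what gets serialized, so the parsed node tree can still contain nodes that no longer appear in the output. The sketch below illustrates the effect and the common workaround of saving and re-parsing before running further queries. It assumes the behavior of the 1.x releases, so it is worth verifying against the version you use; the markup is made up for illustration.

```php
<?php
require_once 'simple_html_dom.php';

// Hypothetical sample markup, for illustration only
$html = str_get_html('<div id="box"><span class="ads">Ad</span></div>');
$html->find('span.ads', 0)->outertext = '';

// Query again without re-parsing: the node is typically still found here
$before = count($html->find('span.ads'));

// Serialize and parse again so the tree matches the cleaned markup
$cleaned = $html->save();
$html->clear();
$html = str_get_html($cleaned);

$after = count($html->find('span.ads'));
echo "before re-parse: $before, after re-parse: $after\n";
```

Re-parsing costs a second pass over the document, so do it once after a batch of removals rather than after each one.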

Conditional Element Removal

<?php
function removeElementsConditionally($html_content, $conditions) {
    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    foreach ($conditions as $condition) {
        $has_text_check = isset($condition['contains_text']);
        $has_attr_check = isset($condition['attribute']);

        // A condition with no checks never removes anything
        if (!$has_text_check && !$has_attr_check) {
            continue;
        }

        foreach ($html->find($condition['selector']) as $element) {
            // Every check that is present must match
            $should_remove = true;

            if ($has_text_check) {
                $should_remove = stripos($element->plaintext, $condition['contains_text']) !== false;
            }

            if ($should_remove && $has_attr_check) {
                $attr_value = $element->getAttribute($condition['attribute']['name']);
                $should_remove = $attr_value === $condition['attribute']['value'];
            }

            if ($should_remove) {
                $element->outertext = '';
            }
        }
    }

    return $html->save();
}

// Usage example
$conditions = [
    [
        'selector' => 'div',
        'contains_text' => 'sponsored'
    ],
    [
        'selector' => 'img',
        'attribute' => ['name' => 'class', 'value' => 'tracking-pixel']
    ]
];

$cleaned_html = removeElementsConditionally($html_content, $conditions);
?>

Best Practices

Error Handling and Validation

<?php
function safeRemoveElements($html_content, $selectors) {
    // Validate input
    if (empty($html_content) || !is_array($selectors)) {
        return false;
    }

    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    $removed_count = 0;

    foreach ($selectors as $selector) {
        $elements = $html->find($selector);

        foreach ($elements as $element) {
            if ($element) {
                $element->outertext = '';
                $removed_count++;
            }
        }
    }

    // Clean up and return
    $result = $html->save();
    $html->clear();  // Free memory

    return [
        'html' => $result,
        'removed_count' => $removed_count
    ];
}
?>

Memory Management

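Simple HTML DOM objects hold circular references between nodes, so PHP may not release them promptly on its own. When working with large HTML documents, call clear() once you are done: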

<?php
$html = str_get_html($large_html_content);

// Perform removals
foreach ($html->find('.unwanted') as $element) {
    $element->outertext = '';
}

$result = $html->save();
$html->clear();  // Important: free memory

echo $result;
?>
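
If the processing step can throw, the clear() call is easy to skip by accident. One possible pattern is a small wrapper that uses try/finally so cleanup always runs. The helper name withParsedHtml is hypothetical and not part of Simple HTML DOM; the sketch assumes simple_html_dom.php is on the include path.

```php
<?php
require_once 'simple_html_dom.php';

// Hypothetical helper (not part of the library): parse, run a callback, always clean up
function withParsedHtml(string $content, callable $callback)
{
    $dom = str_get_html($content);
    if (!$dom) {
        return false; // empty input or parse failure
    }

    try {
        // The return value is computed before the finally block runs
        return $callback($dom);
    } finally {
        $dom->clear(); // runs even if the callback throws
        unset($dom);
    }
}

$result = withParsedHtml('<div><p class="unwanted">x</p><p>Keep</p></div>', function ($dom) {
    foreach ($dom->find('.unwanted') as $element) {
        $element->outertext = '';
    }
    return $dom->save();
});

echo $result;
```

The callback should return plain strings or arrays rather than node objects, since the nodes are cleared as soon as the helper returns.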

Key Points

  • outertext = '': Removes the entire element including its tags
  • innertext = '': Removes only the content, keeping the empty tag
  • find() with an index returns null when nothing matches, so check the result before assigning to it
  • Use $html->clear() to free memory when working with large documents
  • Simple HTML DOM works server-side only; for client-side removal, use JavaScript

Together, these techniques cover most cleanup tasks when post-processing scraped HTML in PHP.
