Simple HTML DOM removes elements from a parsed document in PHP by assigning to element properties rather than calling a dedicated method. Setting outertext to an empty string removes the element completely, tags included; setting innertext to an empty string removes only the content and preserves the empty tag.
Basic Element Removal
Remove Single Element
<?php
require 'simple_html_dom.php';

// Sample HTML content
$html_content = '
<html>
<body>
    <div id="content">
        <h1>Title</h1>
        <p class="description">Important content</p>
        <span class="ads">Advertisement</span>
        <p>More content here</p>
    </div>
</body>
</html>';

$html = str_get_html($html_content);

// Find and remove the advertisement span
$ad_element = $html->find('span.ads', 0);
if ($ad_element) {
    $ad_element->outertext = ''; // Completely removes the element
}

echo $html;
// Output: HTML without the <span class="ads"> element
?>
Remove Multiple Elements
<?php
// Assumes $html was created with str_get_html() as in the previous example

// Remove all elements with specific class
$unwanted_elements = $html->find('.remove-me');
foreach ($unwanted_elements as $element) {
    $element->outertext = '';
}

// Remove multiple different selectors
$selectors = ['.ads', '.popup', '.banner', 'script'];
foreach ($selectors as $selector) {
    foreach ($html->find($selector) as $element) {
        $element->outertext = '';
    }
}
?>
Different Removal Methods
Complete Element Removal (outertext)
<?php
$html = str_get_html('<div><p class="unwanted">Remove this</p><p>Keep this</p></div>');

// Remove entire element including tags
$element = $html->find('p.unwanted', 0);
if ($element) {
    $element->outertext = ''; // Removes <p class="unwanted">Remove this</p>
}

echo $html; // Output: <div><p>Keep this</p></div>
?>
Content-Only Removal (innertext)
<?php
$html = str_get_html('<div><p class="clear-content">Remove content</p></div>');

// Remove only content, keep the tag
$element = $html->find('p.clear-content', 0);
if ($element) {
    $element->innertext = ''; // Removes content but keeps <p class="clear-content"></p>
}

echo $html; // Output: <div><p class="clear-content"></p></div>
?>
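A third variant follows from the same two properties: assigning an element's innertext to its own outertext drops the tags but keeps what was inside them. The sketch below is a minimal illustration of that idea; the sample markup, the unwrap class name, and the $result variable are made up for the example, and simple_html_dom.php is assumed to sit next to the script.

```php
<?php
require 'simple_html_dom.php'; // assumed to sit next to this script

// Hypothetical sample markup for illustration
$html = str_get_html('<div><p>Keep <b class="unwrap">this</b> text</p></div>');

// Replace the element with its own inner HTML: the <b> tags go, the text stays
$element = $html->find('b.unwrap', 0);
if ($element) {
    $element->outertext = $element->innertext;
}

$result = $html->save();
$html->clear(); // Free memory
echo $result; // Output: <div><p>Keep this text</p></div>
?>
```

This is useful when the wrapper is unwanted (for example a tracking link or a styling tag) but the text inside it should survive.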
Practical Examples
Clean HTML by Removing Unwanted Elements
<?php
function cleanHtml($html_content) {
    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    // Define unwanted elements
    $unwanted_selectors = [
        'script',                   // Remove all JavaScript
        'style',                    // Remove embedded <style> blocks
        '.advertisement',           // Remove ads
        '.popup',                   // Remove popups
        '[style*="display: none"]', // Remove elements hidden inline (this exact spelling only)
        'iframe[src*="ads"]'        // Remove iframes whose src contains "ads"
    ];

    // Remove unwanted elements
    foreach ($unwanted_selectors as $selector) {
        foreach ($html->find($selector) as $element) {
            $element->outertext = '';
        }
    }

    $result = $html->save();
    $html->clear(); // Free memory
    return $result;
}

// Usage
$dirty_html = file_get_contents('webpage.html');
if ($dirty_html !== false) { // file_get_contents() returns false on failure
    $clean_html = cleanHtml($dirty_html);
    echo $clean_html;
}
?>
Remove Elements Based on Content
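<?php
$html = str_get_html($html_content);

// Remove paragraphs containing specific text
foreach ($html->find('p') as $paragraph) {
    if (stripos($paragraph->plaintext, 'advertisement') !== false) {
        $paragraph->outertext = '';
    }
}

// Re-parse the saved output so the next pass works on the document
// without the paragraphs removed above
$html = str_get_html($html->save());

// Remove divs left with no text (this also drops divs that hold only images)
if ($html) {
    foreach ($html->find('div') as $div) {
        if (trim($div->plaintext) === '') {
            $div->outertext = '';
        }
    }
}
?>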
Conditional Element Removal
<?php
function removeElementsConditionally($html_content, $conditions) {
    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    foreach ($conditions as $condition) {
        $elements = $html->find($condition['selector']);
        foreach ($elements as $element) {
            // Collect the result of every check this condition specifies
            $checks = [];

            if (isset($condition['contains_text'])) {
                $checks[] = stripos($element->plaintext, $condition['contains_text']) !== false;
            }

            if (isset($condition['attribute'])) {
                // Exact match against the whole attribute value
                $attr_value = $element->getAttribute($condition['attribute']['name']);
                $checks[] = $attr_value === $condition['attribute']['value'];
            }

            // Remove only when at least one check was given and all of them passed
            if ($checks && !in_array(false, $checks, true)) {
                $element->outertext = '';
            }
        }
    }

    return $html->save();
}

// Usage example
$conditions = [
    [
        'selector' => 'div',
        'contains_text' => 'sponsored'
    ],
    [
        'selector' => 'img',
        'attribute' => ['name' => 'class', 'value' => 'tracking-pixel']
    ]
];

$cleaned_html = removeElementsConditionally($html_content, $conditions);
?>
Best Practices
Error Handling and Validation
<?php
function safeRemoveElements($html_content, $selectors) {
    // Validate input
    if (empty($html_content) || !is_array($selectors)) {
        return false;
    }

    $html = str_get_html($html_content);
    if (!$html) {
        return false;
    }

    $removed_count = 0;
    foreach ($selectors as $selector) {
        // find() without an index returns an array, so no null check is needed here
        foreach ($html->find($selector) as $element) {
            $element->outertext = '';
            $removed_count++;
        }
    }

    // Clean up and return
    $result = $html->save();
    $html->clear(); // Free memory

    return [
        'html' => $result,
        'removed_count' => $removed_count
    ];
}
?>
Memory Management
When working with large HTML documents, remember to clean up:
<?php
$html = str_get_html($large_html_content);

// Perform removals
foreach ($html->find('.unwanted') as $element) {
    $element->outertext = '';
}

$result = $html->save();
$html->clear(); // Important: free memory
echo $result;
?>
Key Points
- outertext = '': Removes the entire element, including its tags
- innertext = '': Removes only the content, keeping the empty tag
- Always check if elements exist before attempting removal
- Use $html->clear() to free memory when working with large documents
- Simple HTML DOM works server-side only; for client-side removal, use JavaScript
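One caveat to keep in mind with the points above: in the classic 1.x releases, assigning to outertext appears to change only what save() emits, while the node itself stays in the parsed tree, so a later find() can still return it. If later steps need a tree that matches the cleaned output, one option is to re-parse the saved string. The sketch below shows that pattern under those assumptions; the sample markup and the $ad, $saved, and $remaining names are made up for illustration, and simple_html_dom.php is assumed to sit next to the script.

```php
<?php
require 'simple_html_dom.php'; // assumed to sit next to this script

// Hypothetical sample markup for illustration
$html = str_get_html('<div><span class="ads">Ad</span><p>Text</p></div>');

$ad = $html->find('span.ads', 0);
if ($ad) {
    $ad->outertext = '';
}

// The serialized output no longer contains the span
$saved = $html->save();
$html->clear(); // Free the old tree

// Parse the saved output again so find() works on the cleaned document
$html = str_get_html($saved);
$remaining = count($html->find('span.ads'));
echo $remaining; // Output: 0
$html->clear();
?>
```

Re-parsing costs a second pass over the document, so it is only worth doing when later lookups depend on the removals.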
This approach gives you direct control over which elements end up in the output HTML, which makes it well suited to web scraping cleanup and content processing tasks in PHP.