How do I install Simple HTML DOM?

Simple HTML DOM is a lightweight PHP library that provides an easy way to manipulate HTML documents. It's particularly popular for web scraping tasks because it offers jQuery-like CSS selectors and a simple API for parsing HTML content.

Installation Methods

Method 1: Composer Installation (Recommended)

The easiest and most reliable way to install Simple HTML DOM is through Composer:

composer require voku/simple_html_dom

Note: The original simple-html-dom/simple-html-dom package is no longer maintained. Use voku/simple_html_dom for the actively maintained fork with PHP 8+ support.

After installation, you can use the library with autoloading:

<?php
require_once 'vendor/autoload.php';

use voku\helper\HtmlDomParser;

// Parse HTML from string
$html = HtmlDomParser::str_get_html('<html><body><h1>Hello World</h1></body></html>');

// Find and output the h1 element
$h1 = $html->findOne('h1');
echo $h1->text(); // Outputs: Hello World

// Parse HTML from URL
$html = HtmlDomParser::file_get_html('https://example.com');

// Find all links
foreach ($html->find('a') as $link) {
    echo $link->href . "\n";
}

Method 2: Manual Installation

For projects not using Composer, you can install manually:

1. Download the library:
   - Visit: https://github.com/voku/simple_html_dom
   - Download the latest release or clone the repository
2. Include the required files:

<?php
// Include the main library file
require_once 'path/to/simple_html_dom/src/voku/helper/HtmlDomParser.php';

// You may need to include additional dependencies manually
// Check the composer.json for required packages

use voku\helper\HtmlDomParser;

$html = HtmlDomParser::str_get_html('<div>Hello</div>');
echo $html->find('div')[0]->text();
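Including one file is rarely enough for a manual install, because the parser class refers to other classes in the same package and in its dependencies. A minimal sketch of a PSR-4-style autoloader follows. The helper name `vendoredClassPath`, the `lib/` directories, and the `symfony/css-selector` entry are assumptions for illustration; check the library's composer.json for the real dependency list and adjust the paths to wherever you unpacked the code.

```php
<?php
// Hypothetical helper: map a class name to a file path using PSR-4-style prefixes.
// Returns null when the class does not fall under any known prefix.
function vendoredClassPath($class, array $prefixes)
{
    foreach ($prefixes as $prefix => $baseDir) {
        if (strncmp($class, $prefix, strlen($prefix)) === 0) {
            $relative = substr($class, strlen($prefix));
            return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
        }
    }
    return null;
}

// Assumed layout: both directories are placeholders for your own paths.
$prefixes = [
    'voku\\helper\\' => __DIR__ . '/lib/simple_html_dom/src/voku/helper',
    'Symfony\\Component\\CssSelector\\' => __DIR__ . '/lib/css-selector',
];

// Load classes on demand instead of listing every file by hand.
spl_autoload_register(function ($class) use ($prefixes) {
    $path = vendoredClassPath($class, $prefixes);
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

With the autoloader registered, `use voku\helper\HtmlDomParser;` should resolve without further `require_once` calls, provided every dependency has a prefix entry.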

Legacy Simple HTML DOM

If you need to use the original Simple HTML DOM library (not recommended for new projects):

# For legacy projects only
composer require simple-html-dom/simple-html-dom

<?php
require_once 'vendor/autoload.php';

// Create DOM object from string
$html = str_get_html('<html><body>Hello!</body></html>');

// Find elements
$body = $html->find('body', 0);
echo $body->innertext;

// Clean up memory
$html->clear();
unset($html);

System Requirements

PHP: 7.0+ (8.0+ recommended for voku/simple_html_dom)
Extensions:
- dom extension (usually enabled by default)
- libxml extension
- mbstring extension (recommended)

Check your PHP version and extensions:

php -v
php -m | grep -E "(dom|libxml|mbstring)"
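
The same check can be run from PHP itself, which helps when the web server uses a different configuration than the command line. This is a small sketch; the function name `missingExtensions` is made up for this example, and the extension list mirrors the requirements above.

```php
<?php
// Return the names of the given extensions that are not loaded.
function missingExtensions(array $required)
{
    return array_values(array_filter($required, function ($name) {
        return !extension_loaded($name);
    }));
}

$missing = missingExtensions(['dom', 'libxml', 'mbstring']);

// Warn when the running PHP is older than the stated minimum.
if (version_compare(PHP_VERSION, '7.0.0', '<')) {
    echo 'PHP ' . PHP_VERSION . " is older than 7.0\n";
}

echo $missing === []
    ? "All listed extensions are loaded\n"
    : 'Missing extensions: ' . implode(', ', $missing) . "\n";
```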

Common Usage Examples

Basic HTML Parsing

<?php
use voku\helper\HtmlDomParser;

$html = HtmlDomParser::str_get_html('
    <div class="container">
        <h1 id="title">Welcome</h1>
        <p class="text">This is a paragraph.</p>
        <ul>
            <li>Item 1</li>
            <li>Item 2</li>
        </ul>
    </div>
');

// Find by ID
$title = $html->findOne('#title');
echo $title->text(); // "Welcome"

// Find by class
$paragraph = $html->findOne('.text');
echo $paragraph->text(); // "This is a paragraph."

// Find multiple elements
$items = $html->find('li');
foreach ($items as $item) {
    echo $item->text() . "\n";
}

Web Scraping Example

<?php
use voku\helper\HtmlDomParser;

// Scrape a website
$html = HtmlDomParser::file_get_html('https://news.ycombinator.com');

// Extract article titles and URLs
$articles = $html->find('.titleline > a');

foreach ($articles as $article) {
    $title = $article->text();
    $url = $article->href;

    echo "Title: {$title}\n";
    echo "URL: {$url}\n\n";
}

Troubleshooting

Common Issues and Solutions

1. Composer Installation Fails

# Clear Composer cache
composer clear-cache

# Update Composer
composer self-update

# Try installing with verbose output
composer require voku/simple_html_dom -v

2. Memory Issues with Large HTML

// Set memory limit for large documents
ini_set('memory_limit', '256M');

// Always clean up
$html->clear();
unset($html);

3. SSL Certificate Issues

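// TLS options belong under the 'ssl' key of a stream context, not 'http'.
// Disabling verification exposes the request to interception; use it only
// for local debugging.
$context = stream_context_create([
    'ssl' => [
        'verify_peer' => false,
        'verify_peer_name' => false,
    ]
]);

// file_get_html() in voku/simple_html_dom does not appear to accept a stream
// context, so fetch the page first and parse the returned string
$body = file_get_contents('https://example.com', false, $context);
if ($body !== false) {
    $html = HtmlDomParser::str_get_html($body);
}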
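
Certificate errors are often caused by PHP not finding a CA bundle, in which case pointing the context at one is safer than switching verification off. The sketch below assumes a PEM bundle exists at the path you pass in; `buildVerifiedContext` and the example path are illustrative names, not part of the library.

```php
<?php
// Build a stream context that keeps certificate verification enabled.
// $caFile is an assumed path to a PEM bundle of trusted CA certificates.
function buildVerifiedContext($caFile)
{
    return stream_context_create([
        'ssl' => [
            'verify_peer'      => true,
            'verify_peer_name' => true,
            'cafile'           => $caFile,
        ],
    ]);
}

// Hypothetical path; point it at the CA bundle on your system.
$context = buildVerifiedContext('/etc/ssl/certs/ca-certificates.crt');

// The context can then be passed to PHP's own file_get_contents():
// $body = file_get_contents('https://example.com', false, $context);
```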

4. Character Encoding Problems

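// Convert to UTF-8 before parsing; str_get_html() takes no encoding argument.
// 'ISO-8859-1' is an assumed source encoding; use the page's actual one.
$htmlString = mb_convert_encoding($htmlString, 'UTF-8', 'ISO-8859-1');
$html = HtmlDomParser::str_get_html($htmlString);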

Best Practices

- Always check if elements exist before accessing their properties
- Clean up DOM objects to free memory: $html->clear(); unset($html);
- Use appropriate selectors for better performance
- Respect robots.txt and website terms of service
- Implement rate limiting for web scraping
- Handle errors gracefully with try-catch blocks

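<?php
use voku\helper\HtmlDomParser;

try {
    $html = HtmlDomParser::file_get_html('https://example.com');

    // Depending on the library version, a failed load may throw or
    // return false, so handle both
    if ($html === false) {
        throw new Exception('Failed to load HTML');
    }

    // findOneOrFalse() returns false when nothing matches; findOne()
    // returns a blank placeholder object, which is always truthy
    $title = $html->findOneOrFalse('title');
    if ($title !== false) {
        echo $title->text();
    }

} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
} finally {
    if (isset($html) && $html !== false) {
        $html->clear();
    }
}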
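
Rate limiting from the list above can be as simple as enforcing a minimum gap between requests. The following is a sketch under that assumption. `secondsToWait` and `fetchPolitely` are hypothetical helpers, and the fetch step is passed in as a callable so the pacing logic stays independent of how pages are downloaded.

```php
<?php
// How long to pause so that two requests are at least $minInterval seconds apart.
function secondsToWait($lastRequestAt, $now, $minInterval)
{
    return max(0.0, $minInterval - ($now - $lastRequestAt));
}

// Call $fetch for each URL in turn, sleeping between calls when needed.
function fetchPolitely(array $urls, callable $fetch, $minInterval = 1.0)
{
    $results = [];
    $lastRequestAt = null;

    foreach ($urls as $url) {
        if ($lastRequestAt !== null) {
            $delay = secondsToWait($lastRequestAt, microtime(true), $minInterval);
            if ($delay > 0) {
                usleep((int) round($delay * 1000000));
            }
        }
        $lastRequestAt = microtime(true);
        $results[$url] = $fetch($url);
    }

    return $results;
}
```

A call such as `fetchPolitely($urls, function ($url) { return HtmlDomParser::file_get_html($url); }, 2.0)` would then space requests about two seconds apart. A fixed interval is the simplest policy; sites that publish a crawl delay or return HTTP 429 responses call for something more adaptive.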
