How do I install Simple HTML DOM?

Simple HTML DOM is a lightweight PHP library that provides an easy way to manipulate HTML documents. It's particularly popular for web scraping tasks because it offers jQuery-like CSS selectors and a simple API for parsing HTML content.

Installation Methods

Method 1: Composer Installation (Recommended)

The easiest and most reliable way to install Simple HTML DOM is through Composer:

composer require voku/simple_html_dom

Note: The original simple-html-dom/simple-html-dom package is no longer maintained. Use voku/simple_html_dom for the actively maintained fork with PHP 8+ support.

After installation, you can use the library with autoloading:

<?php
require_once 'vendor/autoload.php';

use voku\helper\HtmlDomParser;

// Parse HTML from string
$html = HtmlDomParser::str_get_html('<html><body><h1>Hello World</h1></body></html>');

// Find and output the h1 element
$h1 = $html->findOne('h1');
echo $h1->text(); // Outputs: Hello World

// Parse HTML from URL
$html = HtmlDomParser::file_get_html('https://example.com');

// Find all links
foreach ($html->find('a') as $link) {
    echo $link->href . "\n";
}

Method 2: Manual Installation

For projects not using Composer, you can install manually:

  1. Download the library:

    • Visit: https://github.com/voku/simple_html_dom
    • Download the latest release or clone the repository
  2. Include the required files:

<?php
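// Without Composer's autoloader, this file cannot be loaded on its own.
// Load the library's other classes (src/voku/helper/) and the packages
// listed under "require" in composer.json first, or register an autoloader.

// Include the main library file
require_once 'path/to/simple_html_dom/src/voku/helper/HtmlDomParser.php';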

use voku\helper\HtmlDomParser;

$html = HtmlDomParser::str_get_html('<div>Hello</div>');
echo $html->find('div')[0]->text();
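
Loading every class file by hand gets tedious, so a manual install is usually paired with a small autoloader. The sketch below registers a minimal PSR-4 style loader with spl_autoload_register(). The function name registerManualAutoloader and the lib/ paths are hypothetical, and the namespace-to-directory mapping is an assumption that should be checked against the "autoload" section of each package's composer.json.

```php
<?php
declare(strict_types=1);

/**
 * Register a minimal PSR-4 style autoloader.
 *
 * @param array<string, string> $prefixes Namespace prefix => base directory
 */
function registerManualAutoloader(array $prefixes): void
{
    spl_autoload_register(static function (string $class) use ($prefixes): void {
        foreach ($prefixes as $prefix => $baseDir) {
            // Skip prefixes that do not match the requested class
            if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
                continue;
            }
            // Map the rest of the class name to a file path
            $relative = substr($class, strlen($prefix));
            $file = rtrim($baseDir, '/\\') . '/' . str_replace('\\', '/', $relative) . '.php';
            if (is_file($file)) {
                require_once $file;
                return;
            }
        }
    });
}

// Hypothetical paths: adjust to where you unpacked each package
registerManualAutoloader([
    'voku\\helper\\' => __DIR__ . '/lib/simple_html_dom/src/voku/helper',
    'Symfony\\Component\\CssSelector\\' => __DIR__ . '/lib/css-selector',
]);
```

With the loader registered, classes are loaded on first use, so no further require_once calls are needed for the mapped namespaces.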

Legacy Simple HTML DOM

If you need to use the original Simple HTML DOM library (not recommended for new projects):

# For legacy projects only
composer require simple-html-dom/simple-html-dom

<?php
require_once 'vendor/autoload.php';

// Create DOM object from string
$html = str_get_html('<html><body>Hello!</body></html>');

// Find elements
$body = $html->find('body', 0);
echo $body->innertext;

// Clean up memory
$html->clear();
unset($html);

System Requirements

  • PHP: 7.0+ (8.0+ recommended for voku/simple_html_dom)
  • Extensions:
    • dom extension (usually enabled by default)
    • libxml extension
    • mbstring extension (recommended)

Check your PHP version and extensions:

php -v
php -m | grep -E "(dom|libxml|mbstring)"
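
The same check can be run from PHP itself, which helps when the web server uses a different PHP build than the command line. This is a minimal sketch: the function name checkRequirements is hypothetical, and the version and extension list are simply the ones given above.

```php
<?php
declare(strict_types=1);

/**
 * Return a list of human-readable problems; an empty list means all checks passed.
 *
 * @param string   $minPhp     Minimum PHP version
 * @param string[] $extensions Extension names to look for
 * @return string[]
 */
function checkRequirements(string $minPhp, array $extensions): array
{
    $problems = [];
    if (version_compare(PHP_VERSION, $minPhp, '<')) {
        $problems[] = sprintf('PHP %s or newer is required, found %s', $minPhp, PHP_VERSION);
    }
    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $problems[] = "Missing extension: {$extension}";
        }
    }
    return $problems;
}

$problems = checkRequirements('7.0.0', ['dom', 'libxml', 'mbstring']);
echo $problems === [] ? "All requirements met\n" : implode("\n", $problems) . "\n";
```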

Common Usage Examples

Basic HTML Parsing

<?php
require_once 'vendor/autoload.php';

use voku\helper\HtmlDomParser;

$html = HtmlDomParser::str_get_html('
    <div class="container">
        <h1 id="title">Welcome</h1>
        <p class="text">This is a paragraph.</p>
        <ul>
            <li>Item 1</li>
            <li>Item 2</li>
        </ul>
    </div>
');

// Find by ID
$title = $html->findOne('#title');
echo $title->text(); // "Welcome"

// Find by class
$paragraph = $html->findOne('.text');
echo $paragraph->text(); // "This is a paragraph."

// Find multiple elements
$items = $html->find('li');
foreach ($items as $item) {
    echo $item->text() . "\n";
}

Web Scraping Example

<?php
require_once 'vendor/autoload.php';

use voku\helper\HtmlDomParser;

// Scrape a website
$html = HtmlDomParser::file_get_html('https://news.ycombinator.com');

// Extract article titles and URLs
$articles = $html->find('.titleline > a');

foreach ($articles as $article) {
    $title = $article->text();
    $url = $article->href;

    echo "Title: {$title}\n";
    echo "URL: {$url}\n\n";
}

Troubleshooting

Common Issues and Solutions

1. Composer Installation Fails

# Clear Composer cache
composer clear-cache

# Update Composer
composer self-update

# Try installing with verbose output
composer require voku/simple_html_dom -v

2. Memory Issues with Large HTML

// Set memory limit for large documents
ini_set('memory_limit', '256M');

// Always clean up
$html->clear();
unset($html);

3. SSL Certificate Issues

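// file_get_html() in voku/simple_html_dom does not accept a stream context
// (its second argument is for libxml options), so fetch the page yourself.
// TLS settings belong under the 'ssl' key, not 'http'.
$context = stream_context_create([
    'ssl' => [
        // Disabling verification is insecure: use it only for local testing.
        // For production, keep verification on and set 'cafile' to a CA bundle.
        'verify_peer' => false,
        'verify_peer_name' => false,
    ],
]);

$content = file_get_contents('https://example.com', false, $context);
if ($content === false) {
    throw new RuntimeException('Failed to fetch the page');
}
$html = HtmlDomParser::str_get_html($content);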

4. Character Encoding Problems

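// str_get_html() has no encoding argument, so convert to UTF-8 before parsing.
// 'ISO-8859-1' is a placeholder for the page's actual charset.
$htmlString = mb_convert_encoding($htmlString, 'UTF-8', 'ISO-8859-1');
$html = HtmlDomParser::str_get_html($htmlString);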
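
When the source charset varies from page to page, one option is to convert only the input that is not already valid UTF-8. The helper below is a sketch under that assumption. The name toUtf8 and the ISO-8859-1 fallback are illustrative choices, and in practice the fallback should come from the Content-Type header or the page's meta charset tag.

```php
<?php
declare(strict_types=1);

/**
 * Return $markup as UTF-8.
 *
 * Input that is already valid UTF-8 is returned unchanged; anything else is
 * treated as $fallbackCharset and converted.
 */
function toUtf8(string $markup, string $fallbackCharset = 'ISO-8859-1'): string
{
    if (mb_check_encoding($markup, 'UTF-8')) {
        return $markup;
    }
    return mb_convert_encoding($markup, 'UTF-8', $fallbackCharset);
}

// "caf\xE9" is "café" encoded as ISO-8859-1, which is not valid UTF-8
$clean = toUtf8("<p>caf\xE9</p>");
```

The result can then be passed to the parser as a normal UTF-8 string.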

Best Practices

  • Always check if elements exist before accessing their properties
  • Clean up DOM objects to free memory: $html->clear(); unset($html);
  • Use appropriate selectors for better performance
  • Respect robots.txt and website terms of service
  • Implement rate limiting for web scraping
  • Handle errors gracefully with try-catch blocks, for example:

<?php
require_once 'vendor/autoload.php';

use voku\helper\HtmlDomParser;

try {
    $html = HtmlDomParser::file_get_html('https://example.com');

    if ($html === false) {
        throw new Exception('Failed to load HTML');
    }

    $title = $html->findOne('title');
    if ($title) {
        echo $title->text();
    }

} catch (Exception $e) {
    echo "Error: " . $e->getMessage();
} finally {
    if (isset($html)) {
        $html->clear();
    }
}
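
The rate-limiting advice in the list above can be sketched as a small helper that enforces a minimum delay between requests. The class name RateLimiter, the one-second interval, and the example URLs are all hypothetical. A real scraper might also honor a site's Crawl-delay or back off after errors.

```php
<?php
declare(strict_types=1);

/**
 * Enforces a minimum delay between consecutive requests.
 */
final class RateLimiter
{
    /** @var float Minimum seconds between calls */
    private $minInterval;

    /** @var float|null Timestamp of the previous call */
    private $lastCall = null;

    public function __construct(float $minInterval)
    {
        $this->minInterval = $minInterval;
    }

    /** Block until at least $minInterval seconds have passed since the last call. */
    public function wait(): void
    {
        if ($this->lastCall !== null) {
            $remaining = $this->minInterval - (microtime(true) - $this->lastCall);
            if ($remaining > 0) {
                usleep((int) ceil($remaining * 1000000));
            }
        }
        $this->lastCall = microtime(true);
    }
}

// Usage: at most one request per second (the interval is an arbitrary example)
$limiter = new RateLimiter(1.0);
foreach (['https://example.com/a', 'https://example.com/b'] as $url) {
    $limiter->wait();
    // fetch and parse $url here
}
```

The first call returns immediately. Later calls sleep only for the part of the interval that has not yet elapsed, so time spent parsing counts toward the delay.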
