Html Agility Pack (HAP) is a popular .NET library for parsing and manipulating HTML documents, but several alternatives offer different features, better performance, or modern APIs. Here's a comprehensive overview of the best alternatives across different programming languages.
.NET Alternatives
1. AngleSharp ⭐ (Recommended)
Best for: Modern .NET applications requiring HTML5/CSS3 support
AngleSharp is the most popular modern alternative to HAP, offering full HTML5 and CSS3 compliance with a clean, async-first API.
Features:
- Full HTML5 and CSS3 support
- Async/await support
- CSS selector queries
- DOM manipulation
- Better performance than HAP
using AngleSharp;
using AngleSharp.Html.Dom;
using System.Linq;
// Create configuration and context
var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);
// Load document from URL
var document = await context.OpenAsync("https://example.com");
// Query elements using CSS selectors
var titles = document.QuerySelectorAll("h1, h2, h3");
var links = document.QuerySelectorAll("a[href]")
    .Cast<IHtmlAnchorElement>()
    .Select(link => new { Text = link.TextContent, Url = link.Href });
// Extract specific data
var pageTitle = document.Title;
var metaDescription = document.QuerySelector("meta[name='description']")?.GetAttribute("content");
Installation: dotnet add package AngleSharp
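If the markup is already in memory (for example, fetched with your own HttpClient call), AngleSharp can also parse a string directly instead of opening a URL; a minimal sketch, with illustrative markup:
using AngleSharp.Html.Parser;
// Parse an in-memory HTML string without a browsing context
var parser = new HtmlParser();
var doc = parser.ParseDocument("<ul><li class='item'>First</li><li class='item'>Second</li></ul>");
foreach (var li in doc.QuerySelectorAll("li.item"))
    Console.WriteLine(li.TextContent);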
2. CsQuery
Best for: Developers familiar with jQuery syntax
using CsQuery;
using System.Linq;
// Load from URL or HTML string
CQ dom = CQ.CreateFromUrl("https://example.com");
// or: CQ dom = CQ.Create(htmlString);
// jQuery-like syntax
var titles = dom["h1, h2, h3"];
var firstParagraph = dom["p"].First().Text();
var links = dom["a[href]"].Select(link => new {
Text = dom[link].Text(),
Url = dom[link].Attr("href")
});
Note: CsQuery is no longer actively maintained. Consider AngleSharp for new projects.
3. Fizzler (HAP Extension)
Best for: Existing HAP projects that need CSS selector support
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;
using System.Linq;
var web = new HtmlWeb();
var document = web.Load("https://example.com");
// Use CSS selectors with HAP
var products = document.DocumentNode.QuerySelectorAll(".product-item");
var prices = document.DocumentNode.QuerySelectorAll(".price").Select(node => node.InnerText);
var images = document.DocumentNode.QuerySelectorAll("img[src]")
    .Select(img => img.GetAttributeValue("src", ""));
Installation: dotnet add package Fizzler.Systems.HtmlAgilityPack
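To see why the migration cost is minimal, here is the same lookup written both ways; the span.price markup is a hypothetical example:
// Stock HAP XPath (SelectNodes returns null when nothing matches)
var xpathPrices = document.DocumentNode.SelectNodes("//span[@class='price']");
// Fizzler CSS selector equivalent (returns an empty sequence instead)
var cssPrices = document.DocumentNode.QuerySelectorAll("span.price");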
4. System.Text.Json + Regular Expressions
Best for: Simple, well-defined extraction tasks or performance-critical scenarios; regular expressions are brittle against arbitrary HTML, so reserve them for narrow, predictable patterns.
using System.Text.Json;
using System.Text.RegularExpressions;
// For simple extraction tasks ('html' below is the raw markup string, e.g. downloaded with HttpClient)
var titlePattern = @"<title>(.*?)</title>";
var title = Regex.Match(html, titlePattern, RegexOptions.IgnoreCase).Groups[1].Value;
// For JSON-LD structured data
var jsonLdPattern = @"<script[^>]*type=[""']application/ld\+json[""'][^>]*>(.*?)</script>";
var jsonLdMatch = Regex.Match(html, jsonLdPattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
if (jsonLdMatch.Success)
{
    var structuredData = JsonSerializer.Deserialize<JsonElement>(jsonLdMatch.Groups[1].Value);
}
Python Alternatives
1. BeautifulSoup
Best for: Beginner-friendly HTML parsing with excellent documentation
from bs4 import BeautifulSoup
import requests
# Fetch and parse
response = requests.get('https://example.com')
soup = BeautifulSoup(response.content, 'html.parser')
# Extract data
titles = [title.get_text() for title in soup.find_all(['h1', 'h2', 'h3'])]
links = [{'text': a.get_text(), 'url': a.get('href')}
         for a in soup.find_all('a', href=True)]
# CSS selectors
products = soup.select('.product-item')
prices = [price.get_text() for price in soup.select('.price')]
Installation: pip install beautifulsoup4 requests
2. lxml
Best for: High-performance parsing of large documents
from lxml import html, etree
import requests
# Parse HTML
response = requests.get('https://example.com')
tree = html.fromstring(response.content)
# XPath queries (more powerful than CSS selectors)
titles = tree.xpath('//h1/text() | //h2/text() | //h3/text()')
product_data = tree.xpath('//div[@class="product"]')
# Extract complex data structures
products = []
for product in product_data:
    name = product.xpath('.//h3/text()')[0] if product.xpath('.//h3/text()') else ''
    price = product.xpath('.//*[@class="price"]/text()')[0] if product.xpath('.//*[@class="price"]/text()') else ''
    products.append({'name': name, 'price': price})
Installation: pip install lxml requests
JavaScript/Node.js Alternatives
1. Cheerio
Best for: Server-side HTML parsing with jQuery-like syntax
const cheerio = require('cheerio');
const axios = require('axios');
// Fetch and parse (wrapped in an async IIFE so await works in CommonJS)
(async () => {
  const response = await axios.get('https://example.com');
  const $ = cheerio.load(response.data);

  // jQuery-like syntax
  const titles = $('h1, h2, h3').map((i, el) => $(el).text()).get();
  const links = $('a[href]').map((i, el) => ({
    text: $(el).text(),
    url: $(el).attr('href')
  })).get();

  // Extract product data
  const products = $('.product-item').map((i, el) => ({
    name: $(el).find('.product-name').text(),
    price: $(el).find('.price').text(),
    image: $(el).find('img').attr('src')
  })).get();
})();
Installation: npm install cheerio axios
2. Puppeteer/Playwright
Best for: JavaScript-rendered content and complex interactions
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Wait for dynamic content
  await page.waitForSelector('.product-list');

  // Extract data after JavaScript execution
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.product-item')).map(item => ({
      name: item.querySelector('.product-name')?.textContent,
      price: item.querySelector('.price')?.textContent,
      availability: item.querySelector('.stock-status')?.textContent
    }));
  });

  await browser.close();
})();
Installation: npm install puppeteer (or npm install playwright)
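Playwright also ships official .NET bindings (the Microsoft.Playwright NuGet package), which may be the more natural fit for a team replacing HAP; a minimal sketch, with the URL and selectors as placeholder assumptions:
using Microsoft.Playwright;
// One-time setup: install the Microsoft.Playwright package and its browsers
using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync();
var page = await browser.NewPageAsync();
await page.GotoAsync("https://example.com");
// Wait for client-side rendering before reading the DOM
await page.WaitForSelectorAsync(".product-list");
var names = await page.Locator(".product-item .product-name").AllTextContentsAsync();
await browser.CloseAsync();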
Choosing the Right Alternative
| Use Case | Recommended Alternative | Reason |
|----------|------------------------|--------|
| Modern .NET applications | AngleSharp | HTML5 support, async API, active development |
| Existing HAP projects | Fizzler | Minimal migration, CSS selectors |
| Python web scraping | BeautifulSoup | Beginner-friendly, excellent documentation |
| High-performance Python | lxml | Fastest parsing, XPath support |
| Node.js applications | Cheerio | jQuery syntax, lightweight |
| JavaScript-heavy sites | Puppeteer/Playwright | Full browser rendering |
| Simple text extraction | Regular expressions | Minimal dependencies, fastest |
Migration Tips from HAP
When migrating from Html Agility Pack:
- AngleSharp: Replace HtmlWeb.Load() / HtmlDocument.Load() with context.OpenAsync() (see the sketch below)
- Update selectors: Convert XPath to CSS selectors where possible
- Handle async: Most modern alternatives use async/await patterns
- Test thoroughly: Different parsers may handle malformed HTML differently
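As a rough before/after illustration of these tips (the URL and selector are placeholders, not taken from any particular project):
using AngleSharp;
using HtmlAgilityPack;
// Before: Html Agility Pack (synchronous, XPath)
var web = new HtmlWeb();
var hapDoc = web.Load("https://example.com");
var hapHeading = hapDoc.DocumentNode.SelectSingleNode("//h1")?.InnerText;
// After: AngleSharp (async, CSS selectors)
var context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
var angleDoc = await context.OpenAsync("https://example.com");
var angleHeading = angleDoc.QuerySelector("h1")?.TextContent;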
Choose your alternative based on your specific requirements: performance needs, language ecosystem, team expertise, and maintenance considerations.