HtmlAgilityPack and AngleSharp are the two most popular HTML parsing libraries for C# web scraping. While both serve similar purposes, they have distinct architectural differences, performance characteristics, and use cases that make each suitable for different scenarios.
Overview and Installation
HtmlAgilityPack
// Install via NuGet (Package Manager Console)
Install-Package HtmlAgilityPack
// Basic usage
using HtmlAgilityPack;
var web = new HtmlWeb();
var doc = web.Load("https://example.com");
AngleSharp
// Install via NuGet (Package Manager Console)
Install-Package AngleSharp
// Basic usage (OpenAsync must be awaited from an async method)
using AngleSharp;
var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com");
Key Differences
1. Parsing Engine and HTML Handling
HtmlAgilityPack:
- Uses a forgiving parser designed for "real-world" broken HTML
- Handles malformed HTML gracefully without requiring well-formed XML
- Battle-tested with over 15 years of development
- More lenient with invalid markup
// HtmlAgilityPack handles broken HTML well
var html = "<div><p>Unclosed paragraph<div>Nested incorrectly</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var nodes = doc.DocumentNode.SelectNodes("//p");
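To see what "forgiving" means in practice, it can help to list the problems the parser recovered from. The following is a minimal sketch, assuming the HtmlAgilityPack package is installed; the class and method names (ParseErrorDemo, Load) are illustrative and not part of the library.

```csharp
using System;
using HtmlAgilityPack;

public static class ParseErrorDemo
{
    // Parses markup without throwing, even when tags are left unclosed.
    public static HtmlDocument Load(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    public static void Main()
    {
        // Same broken markup as above: the <p> and the outer <div> are never closed.
        var doc = Load("<div><p>Unclosed paragraph<div>Nested incorrectly</div>");

        // ParseErrors describes the issues the parser noticed while recovering.
        foreach (var error in doc.ParseErrors)
        {
            Console.WriteLine(
                $"{error.Code} at line {error.Line}, column {error.LinePosition}: {error.Reason}");
        }

        // The document remains queryable despite the errors.
        var paragraph = doc.DocumentNode.SelectSingleNode("//p");
        Console.WriteLine(paragraph?.InnerText);
    }
}
```

Logging ParseErrors during development is a cheap way to find out how far a target site strays from valid markup before you commit to a set of selectors.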
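AngleSharp:
- HTML5-compliant parser that mimics browser behavior
- Follows the HTML5 parsing specification, including its error-recovery rules, so malformed markup is repaired the way a browser would repair it
- Provides a more accurate DOM representation
- Better suited when the result must match what a browser builds from the same page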
// AngleSharp creates a browser-like DOM
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var paragraphs = document.QuerySelectorAll("p");
2. Querying and Selection Methods
HtmlAgilityPack:
- Primary strength: XPath expressions
- CSS selectors available via external libraries (Fizzler)
- Node navigation through properties
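// XPath querying (HtmlAgilityPack's strength)
// SelectNodes returns null when nothing matches, so check before enumerating
var titleNodes = doc.DocumentNode.SelectNodes("//title");
var links = doc.DocumentNode.SelectNodes("//a[@href]");
// CSS selectors require the Fizzler.Systems.HtmlAgilityPack package
// and a "using Fizzler.Systems.HtmlAgilityPack;" directive
var cssNodes = doc.DocumentNode.QuerySelectorAll("div.content p");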
AngleSharp:
- Native CSS selector support
- LINQ-friendly API
- jQuery-like syntax
// Native CSS selectors
var titles = document.QuerySelectorAll("title");
var links = document.QuerySelectorAll("a[href]");
var content = document.QuerySelectorAll("div.content p");
// LINQ integration
var linkTexts = document.QuerySelectorAll("a")
    .Where(a => a.GetAttribute("href") != null)
    .Select(a => a.TextContent);
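The LINQ query above works on generic elements. As a hedged sketch of a typed variant, the example below filters to IHtmlAnchorElement and reads Href, which AngleSharp resolves against the document address. The class name, method name, and base address are illustrative assumptions, and the namespaces assume a recent AngleSharp version (0.10 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Html.Dom;

public static class LinkDemo
{
    // Returns the resolved URL of every anchor that has an href attribute.
    public static async Task<List<string>> GetLinksAsync(string html, string baseAddress)
    {
        var context = BrowsingContext.New(Configuration.Default);

        // The address gives relative hrefs a base to resolve against.
        var document = await context.OpenAsync(req => req.Content(html).Address(baseAddress));

        return document.QuerySelectorAll("a[href]")
            .OfType<IHtmlAnchorElement>()
            .Select(a => a.Href)
            .ToList();
    }

    public static async Task Main()
    {
        var html = "<a href=\"/about\">About</a><a>No target</a>" +
                   "<a href=\"https://example.org/\">External</a>";

        foreach (var url in await GetLinksAsync(html, "https://example.com/"))
        {
            Console.WriteLine(url);
        }
    }
}
```

Reading Href instead of GetAttribute("href") saves a manual step when the scraped links will be followed later, since relative paths arrive already combined with the page address.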
3. Performance Comparison
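Memory Usage:
- HtmlAgilityPack: Generally a lower memory footprint
- AngleSharp: Generally higher memory usage due to its complete DOM implementation
Parsing Speed:
- HtmlAgilityPack: Often faster for simple parsing tasks
- AngleSharp: Often slower, in exchange for browser-accurate parsing
These are general tendencies rather than guarantees; results vary with library version and input, so measure with your own documents.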
// Performance test example (requires System.Diagnostics and AngleSharp.Html.Parser)
// A single cold run includes JIT and first-use costs, so treat the numbers as indicative only
var stopwatch = Stopwatch.StartNew();
// HtmlAgilityPack
var doc = new HtmlDocument();
doc.LoadHtml(largeHtmlContent);
var hapTime = stopwatch.ElapsedMilliseconds;
stopwatch.Restart();
// AngleSharp
var parser = new HtmlParser();
var document = parser.ParseDocument(largeHtmlContent);
var angleTime = stopwatch.ElapsedMilliseconds;
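A single timed run is easily skewed by JIT compilation and first-use initialization. The sketch below is one possible way to get steadier numbers, using a few untimed warm-up runs and the median of several timed runs. The helper name (MedianMilliseconds), the iteration counts, and the synthetic input are all assumptions chosen for illustration; for serious comparisons a dedicated benchmarking tool is a better fit.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class ParserBenchmark
{
    // Runs the action untimed a few times, then returns the median of the timed runs in ms.
    public static double MedianMilliseconds(Action action, int warmup = 3, int iterations = 15)
    {
        for (var i = 0; i < warmup; i++)
        {
            action();
        }

        var samples = new double[iterations];
        for (var i = 0; i < iterations; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[iterations / 2]; // middle sample (upper middle for even counts)
    }

    public static void Main()
    {
        // Synthetic input; replace it with pages representative of your workload.
        var largeHtmlContent = "<html><body>" +
            string.Concat(Enumerable.Repeat("<div class=\"item\"><p>text</p></div>", 20000)) +
            "</body></html>";

        var hapMs = MedianMilliseconds(() =>
        {
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(largeHtmlContent);
        });

        var parser = new AngleSharp.Html.Parser.HtmlParser();
        var angleMs = MedianMilliseconds(() => parser.ParseDocument(largeHtmlContent));

        Console.WriteLine($"HtmlAgilityPack: {hapMs:F1} ms, AngleSharp: {angleMs:F1} ms");
    }
}
```

The median is used instead of the mean so that one slow run, for example during a garbage collection, does not dominate the result.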
4. Advanced Features
HtmlAgilityPack:
- Simple and focused on HTML parsing
- No built-in CSS parsing
- No JavaScript execution capabilities
- Excellent for basic scraping tasks
// Simple data extraction
// SelectNodes returns null (not an empty collection) when nothing matches
var prices = doc.DocumentNode
    .SelectNodes("//span[@class='price']")
    ?.Select(node => node.InnerText.Trim())
    .ToList() ?? new List<string>();
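One detail of the XPath above is easy to miss: @class='price' compares the whole attribute value, so an element with class="price sale" does not match, whereas the CSS selector span.price would match it. A common XPath idiom for matching a single class token is sketched below; the helper name HasClass is hypothetical and not part of HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathClassDemo
{
    // Builds an XPath predicate that matches one class token among several.
    public static string HasClass(string className)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')";
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<span class=\"price\">$10</span><span class=\"price sale\">$8</span>");

        // Exact comparison: only class="price" matches.
        var exact = doc.DocumentNode.SelectNodes("//span[@class='price']");

        // Token comparison: both spans carry the "price" class.
        var token = doc.DocumentNode.SelectNodes("//span[" + HasClass("price") + "]");

        Console.WriteLine(exact?.Count ?? 0); // 1
        Console.WriteLine(token?.Count ?? 0); // 2
    }
}
```

The helper concatenates the class name directly into the expression, so it is only suitable for trusted, hard-coded class names.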
AngleSharp:
- CSS parsing and manipulation through the AngleSharp.Css package, which provides WithCss()
- Experimental JavaScript execution through the AngleSharp.Js package
- Form handling and submission
- Cookie management
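// Advanced features (WithCss() comes from the AngleSharp.Css package)
var config = Configuration.Default
    .WithDefaultLoader()
    .WithCss();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com");
// CSS inspection: only CSS style sheets expose Rules
var stylesheet = document.StyleSheets.OfType<ICssStyleSheet>().FirstOrDefault();
var rules = stylesheet?.Rules;
// Form handling: QuerySelector returns null when the page has no form
var form = document.QuerySelector("form") as IHtmlFormElement;
if (form != null)
{
    var result = await form.SubmitAsync();
}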
5. Async/Await Support
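HtmlAgilityPack:
- Parsing (LoadHtml) and HtmlWeb.Load are synchronous
- HtmlWeb.LoadFromWebAsync provides asynchronous loading on modern .NET targets, so a manual Task.Run wrapper is usually unnecessary
// Async loading without a manual Task.Run wrapper
public async Task<HtmlDocument> LoadAsync(string url)
{
    var web = new HtmlWeb();
    return await web.LoadFromWebAsync(url);
}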
AngleSharp:
- Built-in async support
- Non-blocking operations
// Native async support
public async Task<IDocument> LoadPageAsync(string url)
{
    var config = Configuration.Default.WithDefaultLoader();
    var context = BrowsingContext.New(config);
    return await context.OpenAsync(url);
}
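Both loaders above let the parsing library perform the HTTP request. An alternative that works with either library is to download the markup yourself and hand the string to the parser, which keeps timeouts, headers, and cancellation under your control. The following is a sketch under that assumption; the class name HttpClientLoader and the 30-second timeout are illustrative choices.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using HtmlAgilityPack;

public class HttpClientLoader
{
    // One HttpClient for the whole process, so connections are pooled and reused.
    private static readonly HttpClient Http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Downloads the page asynchronously, then parses the markup in memory.
    public async Task<HtmlDocument> LoadAsync(string url, CancellationToken cancellationToken = default)
    {
        using (var response = await Http.GetAsync(url, cancellationToken))
        {
            response.EnsureSuccessStatusCode();
            var html = await response.Content.ReadAsStringAsync();
            return Parse(html);
        }
    }

    // Parsing is CPU-bound and synchronous; no network access happens here.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

The same downloaded string could instead be passed to AngleSharp's HtmlParser.ParseDocument, so the download code does not tie you to one parser.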
Practical Examples
Scraping Product Information
HtmlAgilityPack approach:
public class ProductScraper
{
    public List<Product> ScrapeProducts(string url)
    {
        var web = new HtmlWeb();
        var doc = web.Load(url);
        // SelectNodes returns null when no product nodes are found
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='product']");
        if (nodes == null)
        {
            return new List<Product>();
        }
        return nodes
            .Select(node => new Product
            {
                Name = node.SelectSingleNode(".//h3")?.InnerText,
                Price = node.SelectSingleNode(".//span[@class='price']")?.InnerText,
                Image = node.SelectSingleNode(".//img")?.GetAttributeValue("src", "")
            })
            .ToList();
    }
}
AngleSharp approach:
public class ProductScraper
{
    public async Task<List<Product>> ScrapeProductsAsync(string url)
    {
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var document = await context.OpenAsync(url);
        return document.QuerySelectorAll("div.product")
            .Select(element => new Product
            {
                Name = element.QuerySelector("h3")?.TextContent,
                Price = element.QuerySelector("span.price")?.TextContent,
                Image = element.QuerySelector("img")?.GetAttribute("src")
            })
            .ToList();
    }
}
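Both scrapers above populate a Product type that the article does not define. A minimal shape consistent with how it is used might look like the sketch below; the ToString override is added purely for convenient printing and is an assumption, not something the scrapers require.

```csharp
using System;

// Hypothetical data holder matching the properties set by both scrapers.
public class Product
{
    public string Name { get; set; }
    public string Price { get; set; }  // raw text as scraped, e.g. "$19.99"
    public string Image { get; set; }  // value of the img src attribute

    public override string ToString()
    {
        return $"{Name} ({Price}) [{Image}]";
    }
}
```

Keeping Price as a string mirrors the scrapers, which store the scraped text unchanged; converting it to a numeric type is a separate step that depends on the site's currency format.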
When to Choose Each Library
Choose HtmlAgilityPack when:
- Working with legacy or poorly-formed HTML
- Performance is critical for simple parsing tasks
- Your team is comfortable with XPath
- You need a lightweight, stable solution
- Building desktop applications or services with limited resources
Choose AngleSharp when:
- Working with modern web applications
- You need CSS parsing capabilities
- Browser-like behavior is important
- Your team prefers CSS selectors over XPath
- You require form handling or (experimental) JavaScript execution
- Building web applications that need DOM manipulation
Performance Recommendations
// For high-volume scraping with HtmlAgilityPack
public class OptimizedScraper
{
    public async Task<List<string>> ScrapeMultiplePages(IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            // HtmlWeb is not documented as thread-safe, so each request gets its own instance.
            // LoadFromWebAsync avoids tying up thread-pool threads the way Task.Run + Load would.
            var web = new HtmlWeb();
            var doc = await web.LoadFromWebAsync(url);
            return doc.DocumentNode.SelectSingleNode("//title")?.InnerText;
        });
        return (await Task.WhenAll(tasks)).ToList();
    }
}
// For AngleSharp with a shared configuration
public class OptimizedAngleSharpScraper
{
    private readonly IConfiguration config;
    public OptimizedAngleSharpScraper()
    {
        config = Configuration.Default
            .WithDefaultLoader()
            .WithDefaultCookies();
    }
    public async Task<List<string>> ScrapeMultiplePages(IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            // A browsing context behaves like a single browser tab, so concurrent
            // navigations should not share one; create a context per request instead.
            var context = BrowsingContext.New(config);
            var document = await context.OpenAsync(url);
            return document.Title;
        });
        return (await Task.WhenAll(tasks)).ToList();
    }
}
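Both classes above start every request at once, which can overwhelm the target site or trigger rate limiting when the URL list is long. One way to cap the number of requests in flight is a SemaphoreSlim gate, sketched below with only base class library types. The helper name Throttle.RunAsync and the default limit of 4 are illustrative assumptions; the fetch delegate stands in for whichever loading method you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs fetch for every URL with at most maxConcurrency calls in flight.
    // Results are returned in the same order as the input URLs.
    public static async Task<List<T>> RunAsync<T>(
        IEnumerable<string> urls,
        Func<string, Task<T>> fetch,
        int maxConcurrency = 4)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    return await fetch(url);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // materialize so each task is created exactly once

            return (await Task.WhenAll(tasks)).ToList();
        }
    }
}
```

A call might look like `var titles = await Throttle.RunAsync(urls, url => LoadTitleAsync(url), 4);`, where LoadTitleAsync is a hypothetical method of your own that returns a page title.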
Conclusion
Both libraries excel in different scenarios. HtmlAgilityPack remains the go-to choice for straightforward HTML parsing tasks, especially when dealing with malformed HTML or when performance is paramount. AngleSharp shines in modern web development scenarios where standards compliance, CSS parsing, and browser-like behavior are essential.
Consider your specific requirements, team expertise, and the complexity of your scraping tasks when making your choice. For simple data extraction from static pages, HtmlAgilityPack is often sufficient. For pages where you need a browser-accurate DOM, CSS parsing, or form submission, AngleSharp provides a more comprehensive solution. Neither library is a full browser, so content that is rendered by heavy client-side JavaScript may remain out of reach for both.