What are the differences between HtmlAgilityPack and AngleSharp for C# web scraping?

HtmlAgilityPack and AngleSharp are the two most popular HTML parsing libraries for C# web scraping. While both serve similar purposes, they have distinct architectural differences, performance characteristics, and use cases that make each suitable for different scenarios.

Overview and Installation

HtmlAgilityPack

// Install via NuGet (Package Manager Console)
Install-Package HtmlAgilityPack

// Basic usage (requires "using HtmlAgilityPack;")
var web = new HtmlWeb();
var doc = web.Load("https://example.com");

AngleSharp

// Install via NuGet (Package Manager Console)
Install-Package AngleSharp

// Basic usage (requires "using AngleSharp;" and an async context)
var config = Configuration.Default.WithDefaultLoader();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://example.com");

Key Differences

1. Parsing Engine and HTML Handling

HtmlAgilityPack:

  • Uses a forgiving parser designed for "real-world" broken HTML
  • Handles malformed HTML gracefully without requiring well-formed XML
  • Battle-tested with over 15 years of development
  • More lenient with invalid markup

// HtmlAgilityPack handles broken HTML well
var html = "<div><p>Unclosed paragraph<div>Nested incorrectly</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Note: SelectNodes returns null (not an empty collection) when nothing matches
var nodes = doc.DocumentNode.SelectNodes("//p");

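AngleSharp:

  • HTML5-compliant parser that mimics browser behavior
  • Follows the W3C/WHATWG HTML parsing specification
  • Provides a DOM that closely matches what a browser would build
  • Repairs malformed markup using the specification's error-recovery rules, so the result matches browsers rather than a library-specific heuristic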

// AngleSharp creates a browser-like DOM
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(html));
var paragraphs = document.QuerySelectorAll("p");

2. Querying and Selection Methods

HtmlAgilityPack:

  • Primary strength: XPath expressions
  • CSS selectors available via external libraries (Fizzler)
  • Node navigation through properties

// XPath querying (HtmlAgilityPack's strength)
var titleNodes = doc.DocumentNode.SelectNodes("//title");
var links = doc.DocumentNode.SelectNodes("//a[@href]");

// CSS selectors require the Fizzler.Systems.HtmlAgilityPack package
// and a "using Fizzler.Systems.HtmlAgilityPack;" directive
var cssNodes = doc.DocumentNode.QuerySelectorAll("div.content p");
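
To make the XPath versus Fizzler comparison concrete, here is a minimal self-contained sketch that runs the same query both ways against an in-memory snippet. The class name SelectorComparison and the sample markup are made up for illustration; the sketch assumes the HtmlAgilityPack and Fizzler.Systems.HtmlAgilityPack NuGet packages are installed.

```csharp
using System;
using System.Linq;
using Fizzler.Systems.HtmlAgilityPack; // adds QuerySelector/QuerySelectorAll to HtmlNode
using HtmlAgilityPack;

public static class SelectorComparison
{
    // Hypothetical sample markup: two links inside div.content, one outside.
    private const string Sample =
        "<div class=\"content\"><a href=\"/a\">First</a><a href=\"/b\">Second</a></div>" +
        "<a href=\"/c\">Outside</a>";

    // XPath: links with an href that are descendants of div.content.
    public static string[] WithXPath()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Sample);

        // SelectNodes returns null when nothing matches, so guard before using LINQ.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='content']//a[@href]");
        return nodes == null
            ? Array.Empty<string>()
            : nodes.Select(n => n.GetAttributeValue("href", "")).ToArray();
    }

    // CSS: the same query through Fizzler's extension method.
    public static string[] WithCss()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Sample);

        return doc.DocumentNode
            .QuerySelectorAll("div.content a[href]")
            .Select(n => n.GetAttributeValue("href", ""))
            .ToArray();
    }
}
```

Both methods should return the two hrefs inside the div. One difference worth keeping in mind: the XPath test @class='content' only matches an exact class attribute value, while the CSS selector div.content also matches elements that carry additional classes.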

AngleSharp:

  • Native CSS selector support
  • LINQ-friendly API
  • jQuery-like syntax

// Native CSS selectors
var titles = document.QuerySelectorAll("title");
var links = document.QuerySelectorAll("a[href]");
var content = document.QuerySelectorAll("div.content p");

// LINQ integration
var linkTexts = document.QuerySelectorAll("a")
    .Where(a => a.GetAttribute("href") != null)
    .Select(a => a.TextContent);

3. Performance Comparison

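Memory Usage:

  • HtmlAgilityPack: Generally lower memory footprint
  • AngleSharp: Generally higher memory usage due to its complete DOM implementation

Parsing Speed:

  • HtmlAgilityPack: Often faster for simple parsing tasks
  • AngleSharp: Can be slower on some inputs, but builds a specification-accurate DOM

These are tendencies rather than guarantees. Results depend on the library versions and on the documents you parse, so measure with your own data.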

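// Performance test example (requires System.Diagnostics and AngleSharp.Html.Parser)
// A single run includes JIT warm-up, so treat the numbers as indicative only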
var stopwatch = Stopwatch.StartNew();

// HtmlAgilityPack
var doc = new HtmlDocument();
doc.LoadHtml(largeHtmlContent);
var hapTime = stopwatch.ElapsedMilliseconds;

stopwatch.Restart();

// AngleSharp
var parser = new HtmlParser();
var document = parser.ParseDocument(largeHtmlContent);
var angleTime = stopwatch.ElapsedMilliseconds;
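
Because a single timed run is noisy, a slightly more careful measurement warms up first and reports the median of several runs. The following is a rough sketch of that idea, not a rigorous benchmark; the ParserTimings class, the default iteration counts, and the synthetic input are assumptions made for illustration. A dedicated tool such as BenchmarkDotNet is the better choice when the numbers matter.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using AngleSharp.Html.Parser;
using HtmlAgilityPack;

public static class ParserTimings
{
    // Runs `parse` untimed a few times, then returns the median of the timed runs in milliseconds.
    public static double MedianMilliseconds(Action parse, int warmup = 3, int iterations = 15)
    {
        for (var i = 0; i < warmup; i++) parse();

        var samples = new double[iterations];
        for (var i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            parse();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples[iterations / 2];
    }

    public static void Main()
    {
        // Synthetic input; replace it with pages that represent your real workload.
        var html = "<html><body>" +
            string.Concat(Enumerable.Repeat(
                "<div class=\"product\"><h3>Item</h3><span class=\"price\">$1</span></div>", 5000)) +
            "</body></html>";

        var hap = MedianMilliseconds(() =>
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
        });

        var parser = new HtmlParser();
        var angle = MedianMilliseconds(() => parser.ParseDocument(html).Dispose());

        Console.WriteLine($"HtmlAgilityPack median: {hap:F2} ms");
        Console.WriteLine($"AngleSharp median:      {angle:F2} ms");
    }
}
```

The median is used instead of the mean so that an occasional garbage-collection pause does not dominate the result.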

4. Advanced Features

HtmlAgilityPack:

  • Simple and focused on HTML parsing
  • No built-in CSS parsing
  • No JavaScript execution capabilities
  • Excellent for basic scraping tasks

// Simple data extraction (SelectNodes returns null when nothing matches)
var prices = doc.DocumentNode
    .SelectNodes("//span[@class='price']")
    ?.Select(node => node.InnerText.Trim())
    .ToList() ?? new List<string>();

AngleSharp:

  • CSS parsing and manipulation via the AngleSharp.Css package
  • JavaScript execution via the AngleSharp.Js package (limited compared with a real browser engine)
  • Form handling and submission
  • Cookie management

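// Advanced features (WithCss() comes from the AngleSharp.Css package)
var config = Configuration.Default
    .WithDefaultLoader()
    .WithCss();
var context = BrowsingContext.New(config);

var document = await context.OpenAsync("https://example.com");

// CSS inspection (ICssStyleSheet lives in AngleSharp.Css.Dom)
var stylesheet = document.StyleSheets.OfType<ICssStyleSheet>().FirstOrDefault();
var rules = stylesheet?.Rules;

// Form handling (skip if the page has no form)
if (document.QuerySelector("form") is IHtmlFormElement form)
{
    var resultDocument = await form.SubmitAsync();
}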

5. Async/Await Support

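HtmlAgilityPack:

  • Parsing itself is synchronous
  • HtmlWeb provides LoadFromWebAsync for asynchronous downloads
  • Alternatively, download with HttpClient and pass the markup to LoadHtml

// Async loading with HtmlWeb (no Task.Run wrapper needed)
public async Task<HtmlDocument> LoadAsync(string url)
{
    var web = new HtmlWeb();
    return await web.LoadFromWebAsync(url);
}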
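
Another way to get asynchronous loading with HtmlAgilityPack is to keep the download and the parsing separate: fetch the markup with HttpClient, then hand the string to LoadHtml. The sketch below assumes that approach; the HapAsyncLoader class and its method names are hypothetical and not part of either library.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class HapAsyncLoader
{
    // One shared HttpClient for the whole process; creating one per request can exhaust sockets.
    private static readonly HttpClient Http = new HttpClient();

    // Parses markup that is already in memory. No network access is involved.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Downloads the page without blocking a thread, then parses it synchronously.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        var html = await Http.GetStringAsync(url);
        return Parse(html);
    }
}
```

Splitting the two steps also makes the parsing logic easy to unit test, since Parse can be called with a fixed string instead of a live URL.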

AngleSharp:

  • Built-in async support
  • Non-blocking operations

// Native async support
public async Task<IDocument> LoadPageAsync(string url)
{
    var config = Configuration.Default.WithDefaultLoader();
    var context = BrowsingContext.New(config);
    return await context.OpenAsync(url);
}

Practical Examples

Scraping Product Information

HtmlAgilityPack approach:

public class ProductScraper
{
    public List<Product> ScrapeProducts(string url)
    {
        var web = new HtmlWeb();
        var doc = web.Load(url);

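        // SelectNodes returns null when nothing matches, so guard before using LINQ
        var productNodes = doc.DocumentNode.SelectNodes("//div[@class='product']");
        if (productNodes == null)
        {
            return new List<Product>();
        }

        return productNodes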
            .Select(node => new Product
            {
                Name = node.SelectSingleNode(".//h3")?.InnerText,
                Price = node.SelectSingleNode(".//span[@class='price']")?.InnerText,
                Image = node.SelectSingleNode(".//img")?.GetAttributeValue("src", "")
            })
            .ToList();
    }
}

AngleSharp approach:

public class ProductScraper
{
    public async Task<List<Product>> ScrapeProductsAsync(string url)
    {
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var document = await context.OpenAsync(url);

        return document.QuerySelectorAll("div.product")
            .Select(element => new Product
            {
                Name = element.QuerySelector("h3")?.TextContent,
                Price = element.QuerySelector("span.price")?.TextContent,
                Image = element.QuerySelector("img")?.GetAttribute("src")
            })
            .ToList();
    }
}

When to Choose Each Library

Choose HtmlAgilityPack when:

  • Working with legacy or poorly-formed HTML
  • Performance is critical for simple parsing tasks
  • Your team is comfortable with XPath
  • You need a lightweight, stable solution
  • Building desktop applications or services with limited resources

Choose AngleSharp when:

  • Working with modern web applications
  • You need CSS parsing capabilities
  • Browser-like behavior is important
  • Your team prefers CSS selectors over XPath
  • You require form handling, cookies, or limited JavaScript execution (via AngleSharp.Js)
  • Building web applications that need DOM manipulation

Performance Recommendations

// For high-volume scraping with HtmlAgilityPack
public class OptimizedScraper
{
    // HttpClient is designed to be shared and is safe for concurrent requests
    private static readonly HttpClient http = new HttpClient();

    public async Task<List<string>> ScrapeMultiplePages(IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            // Download asynchronously instead of blocking a thread-pool thread
            var html = await http.GetStringAsync(url);
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc.DocumentNode.SelectSingleNode("//title")?.InnerText;
        });

        return (await Task.WhenAll(tasks)).ToList();
    }
}

// For AngleSharp with a shared configuration
public class OptimizedAngleSharpScraper
{
    private readonly IConfiguration config;

    public OptimizedAngleSharpScraper()
    {
        config = Configuration.Default
            .WithDefaultLoader()
            .WithDefaultCookies();
    }

    public async Task<List<string>> ScrapeMultiplePages(IEnumerable<string> urls)
    {
        var tasks = urls.Select(async url =>
        {
            // A browsing context models a single tab, so each concurrent request gets its own
            var context = BrowsingContext.New(config);
            using (var document = await context.OpenAsync(url))
            {
                return document.Title;
            }
        });

        return (await Task.WhenAll(tasks)).ToList();
    }
}
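
Both scrapers above start every request at once, which can overwhelm the target site or trigger rate limiting when the URL list is long. One common way to cap concurrency is a SemaphoreSlim gate. The helper below is a minimal sketch of that idea using only the base class library; the Throttle class and the RunAsync name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item with at most `maxConcurrency` calls in flight.
    // Results come back in the same order as the input.
    public static async Task<List<TResult>> RunAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();
            try
            {
                return await work(item);
            }
            finally
            {
                gate.Release();
            }
        }).ToList(); // materialize so each task is started exactly once

        return (await Task.WhenAll(tasks)).ToList();
    }
}
```

A call such as Throttle.RunAsync(urls, 4, url => LoadTitleAsync(url)) would then fetch at most four pages at a time, where LoadTitleAsync stands in for whichever per-URL scraping method you use. The right limit depends on the target site and your network, so treat 4 as a placeholder.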

Conclusion

Both libraries excel in different scenarios. HtmlAgilityPack remains the go-to choice for straightforward HTML parsing tasks, especially when dealing with malformed HTML or when performance is paramount. AngleSharp shines in modern web development scenarios where standards compliance, CSS parsing, and browser-like behavior are essential.

Consider your specific requirements, team expertise, and the complexity of your scraping tasks when making your choice. For simple data extraction from static pages, HtmlAgilityPack is often sufficient. When you need a browser-like DOM, CSS parsing, form submission, or cookie handling, AngleSharp provides a more comprehensive solution. Neither library renders JavaScript-heavy pages on its own, and AngleSharp.Js covers only part of what a real browser does.
