The Html Agility Pack (HAP) is a .NET library for parsing HTML, including the malformed markup often found on real websites. It builds a DOM that you can query with XPath or LINQ, which makes it well suited to web scraping, document manipulation, and HTML processing tasks.
Installation Methods
1. Visual Studio Package Manager UI
The most straightforward method for Visual Studio users:
- Open Your Project: Launch Visual Studio and open your .NET project
- Access NuGet Package Manager:
  - Right-click on your project in Solution Explorer
  - Select "Manage NuGet Packages..."
- Search and Install:
  - Click the "Browse" tab
  - Search for "HtmlAgilityPack"
  - Select "HtmlAgilityPack" by Simon Mourier
  - Click "Install" and accept any license agreements
2. .NET CLI (Recommended)
For command-line interface users or CI/CD environments:
# Navigate to your project directory
cd YourProjectFolder
# Install Html Agility Pack
dotnet add package HtmlAgilityPack
# Install specific version (optional)
dotnet add package HtmlAgilityPack --version 1.11.54
3. Package Manager Console
Within Visual Studio's Package Manager Console:
- Open Tools → NuGet Package Manager → Package Manager Console
- Run the installation command:
Install-Package HtmlAgilityPack
# For specific version
Install-Package HtmlAgilityPack -Version 1.11.54
4. PackageReference (Manual)
Add the reference directly to your .csproj file:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="HtmlAgilityPack" Version="1.11.54" />
  </ItemGroup>
</Project>
Then restore packages:
dotnet restore
Verification and Basic Usage
After installation, verify that the package resolves by adding the using directive and confirming that the project still builds:
using HtmlAgilityPack;
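For a check that goes one step beyond compiling, a small program can report which Html Agility Pack assembly was actually loaded. This is a minimal sketch: the class name HapInstallCheck and the method LoadedVersion are illustrative, not part of the library, and the assembly version printed may not be formatted exactly like the NuGet package version string.

```csharp
using System;
using HtmlAgilityPack;

static class HapInstallCheck
{
    // Hypothetical helper: reads the version of the assembly that defines HtmlDocument.
    public static Version LoadedVersion()
    {
        return typeof(HtmlDocument).Assembly.GetName().Version;
    }

    static void Main()
    {
        // If the package is missing, this file does not compile at all.
        Console.WriteLine($"HtmlAgilityPack assembly version: {LoadedVersion()}");
    }
}
```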
Simple Example
Here's a basic example demonstrating HTML parsing:
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Create HtmlDocument instance
        var doc = new HtmlDocument();

        // Load HTML content
        string html = @"
            <html>
              <body>
                <div class='content'>
                  <h1>Main Title</h1>
                  <p>First paragraph</p>
                  <p>Second paragraph</p>
                </div>
              </body>
            </html>";
        doc.LoadHtml(html);

        // Extract title
        var title = doc.DocumentNode
            .SelectSingleNode("//h1")?.InnerText;
        Console.WriteLine($"Title: {title}");

        // Extract all paragraphs
        var paragraphs = doc.DocumentNode
            .SelectNodes("//p");
        if (paragraphs != null)
        {
            foreach (var p in paragraphs)
            {
                Console.WriteLine($"Paragraph: {p.InnerText}");
            }
        }
    }
}
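The same paragraphs can also be collected with LINQ instead of XPath. The sketch below uses the Descendants method on HtmlNode, which returns an empty sequence rather than null when nothing matches. The class and method names (LinqQueryExample, ParagraphTexts) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class LinqQueryExample
{
    // Hypothetical helper: returns the trimmed text of every <p> element in the markup.
    public static List<string> ParagraphTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("p")                  // enumerates all <p> descendants
            .Select(p => p.InnerText.Trim())
            .ToList();
    }

    static void Main()
    {
        string html = "<div class='content'><p>First paragraph</p><p>Second paragraph</p></div>";
        foreach (string text in ParagraphTexts(html))
        {
            Console.WriteLine($"Paragraph: {text}");
        }
    }
}
```

Because Descendants never returns null, no null check is needed before the loop. XPath remains the more compact choice for structural queries such as matching on attributes or position.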
Web Scraping Example
A more realistic scenario fetches a page over HTTP and lists every link it contains:
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class WebScrapingExample
{
    static async Task Main()
    {
        try
        {
            using var client = new HttpClient();

            // Fetch webpage
            string url = "https://example.com";
            string html = await client.GetStringAsync(url);

            // Parse with Html Agility Pack
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Extract specific data
            var links = doc.DocumentNode
                .SelectNodes("//a[@href]");
            if (links != null)
            {
                foreach (var link in links)
                {
                    string href = link.GetAttributeValue("href", "");
                    string text = link.InnerText.Trim();
                    Console.WriteLine($"Link: {text} -> {href}");
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}
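Scraped href values are often relative (for example "/about" or "page.html"), so a scraper usually needs to resolve them against the page address before following them. The sketch below shows one way to do that with System.Uri. The LinkResolver class, the AbsoluteLinks method, and the choice to keep only http and https links are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkResolver
{
    // Hypothetical helper: returns every http(s) link in the markup as an absolute URI.
    public static List<Uri> AbsoluteLinks(string html, Uri baseUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<Uri>();
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return result; // no links on the page
        }

        foreach (var anchor in anchors)
        {
            string href = anchor.GetAttributeValue("href", "").Trim();

            // Resolves relative values against baseUri; absolute values pass through.
            if (Uri.TryCreate(baseUri, href, out Uri absolute)
                && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                result.Add(absolute);
            }
        }
        return result;
    }

    static void Main()
    {
        string html = "<a href='/about'>About</a> <a href='page.html'>Page</a> <a href='mailto:a@example.com'>Mail</a>";
        foreach (Uri link in AbsoluteLinks(html, new Uri("https://example.com/docs/")))
        {
            Console.WriteLine(link.AbsoluteUri);
        }
    }
}
```

Filtering by scheme drops mailto: and javascript: links, which a crawler generally cannot fetch.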
Common Patterns and Best Practices
Error Handling
Always check for null results when selecting nodes:
// Safe node selection
var node = doc.DocumentNode.SelectSingleNode("//div[@class='content']");
if (node != null)
{
    string content = node.InnerText;
    // Process content
}

// Safe collection handling
var nodes = doc.DocumentNode.SelectNodes("//p");
if (nodes?.Count > 0)
{
    foreach (var p in nodes)
    {
        // Process each paragraph
    }
}
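When the same null check appears in many places, it can be folded into a small extension method. The following is a sketch of that idea: SelectNodesOrEmpty is a hypothetical helper defined here, not an Html Agility Pack API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class HtmlNodeExtensions
{
    // Hypothetical helper (not part of Html Agility Pack): turns a null
    // SelectNodes result into an empty sequence so callers can iterate directly.
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode node, string xpath)
    {
        return node.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();
    }
}

static class NullSafeSelectionDemo
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<div class='content'><p>Only paragraph</p></div>");

        // There are no <li> elements, so this loop body never runs and nothing throws.
        foreach (var item in doc.DocumentNode.SelectNodesOrEmpty("//li"))
        {
            Console.WriteLine(item.InnerText);
        }

        foreach (var p in doc.DocumentNode.SelectNodesOrEmpty("//p"))
        {
            Console.WriteLine(p.InnerText);
        }
    }
}
```

The trade-off is that an empty result and a mistyped XPath expression look the same to the caller, so explicit null checks are still worth keeping where a missing node signals a real problem.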
CSS Selectors (Alternative)
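Html Agility Pack has no built-in CSS selector engine. The separate HtmlAgilityPack.CssSelectors.NetCore NuGet package adds QuerySelector and QuerySelectorAll extension methods, so install it alongside HtmlAgilityPack before using them: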
using HtmlAgilityPack.CssSelectors.NetCore;

// CSS selector usage (doc is an already loaded HtmlDocument)
var elements = doc.QuerySelectorAll("div.content p");
foreach (var element in elements)
{
    Console.WriteLine(element.InnerText);
}
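For comparison, the selector div.content p can be approximated in plain XPath. The simple test @class='content' only matches when the attribute is exactly "content", whereas a CSS class selector matches one token among several. The sketch below builds a token-aware predicate; ClassTokenXPath, HasClass, and ParagraphsInContentDivs are illustrative names, and the class name is assumed to contain no quote characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class ClassTokenXPath
{
    // Hypothetical helper: XPath 1.0 predicate that matches one whitespace-separated
    // class token. className is assumed to be a plain identifier without quotes.
    public static string HasClass(string className)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')";
    }

    // Rough XPath counterpart of the CSS selector "div.content p".
    public static List<string> ParagraphsInContentDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string xpath = "//div[" + HasClass("content") + "]//p";
        var nodes = doc.DocumentNode.SelectNodes(xpath);

        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText).ToList();
    }

    static void Main()
    {
        string html =
            "<div class='main content'><p>Inside content</p></div>" +
            "<div class='contents'><p>Different class</p></div>";

        foreach (string text in ParagraphsInContentDivs(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Padding the attribute value with spaces before the contains test is what keeps a class such as "contents" from matching.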
Troubleshooting
Common Issues
- Package not found: Ensure you're using the correct package name "HtmlAgilityPack"
- Version conflicts: Check your target framework compatibility
- Namespace errors: Verify that the using HtmlAgilityPack; directive is included
Framework Compatibility
Html Agility Pack supports:
- .NET Framework 2.0+
- .NET Core 1.0+
- .NET 5.0+
- .NET Standard 1.3+
Choose the appropriate version based on your project's target framework.
With Html Agility Pack installed, you're ready to parse HTML documents, scrape web content, and manipulate DOM structures efficiently in your .NET applications.