Yes, Html Agility Pack can parse XML documents, with some caveats. The Html Agility Pack (HAP) is a .NET library for reading, manipulating, and writing HTML. It is particularly useful when you need to handle web content that is not well-formed, because its parser is very tolerant of non-standard and broken markup. That same tolerance lets it load XML, although it parses the input as HTML and does not validate it as XML.
Although HAP is mostly associated with HTML because of its name and common use cases, it can also load XML. The library does not provide a separate XML document type: the same HtmlDocument class is used for both HTML and XML input, and its HtmlNode tree lets you navigate and manipulate the document. (XmlDocument is part of the System.Xml namespace in .NET, not of HAP.)
Here's an example of how you can use Html Agility Pack to parse an XML document in C#:
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var xml = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<catalog>
  <book id=""bk101"">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications with XML.</description>
  </book>
  <!-- More books here -->
</catalog>";

        // HAP has a single document class; it is used here for XML input
        HtmlDocument xmlDoc = new HtmlDocument();
        xmlDoc.LoadHtml(xml); // or use the Load method to load from a file

        // Select a single node by attribute value
        HtmlNode bookNode = xmlDoc.DocumentNode.SelectSingleNode("//book[@id='bk101']");
        if (bookNode != null)
        {
            Console.WriteLine("Book Found:");
            // ?. guards against a missing child element
            Console.WriteLine($"Author: {bookNode.SelectSingleNode("author")?.InnerText}");
            Console.WriteLine($"Title: {bookNode.SelectSingleNode("title")?.InnerText}");
        }
        else
        {
            Console.WriteLine("Book not found.");
        }

        // Iterate over nodes; SelectNodes returns null (not an empty
        // collection) when nothing matches, so check before looping
        HtmlNodeCollection bookNodes = xmlDoc.DocumentNode.SelectNodes("//book");
        if (bookNodes != null)
        {
            foreach (HtmlNode book in bookNodes)
            {
                Console.WriteLine(book.SelectSingleNode("title")?.InnerText);
            }
        }
    }
}
In the example above, an XML string is loaded into an HtmlDocument, and XPath is used to query the document. The LoadHtml method loads content from a string, while Load loads it from a file. The HtmlNode class is used to navigate and query parts of the XML. Note that SelectSingleNode and SelectNodes return null when nothing matches, so check the result before using it.
Keep in mind that although HAP can be used to parse XML, if you are working with well-formed XML it is usually more appropriate to use the System.Xml.Linq or System.Xml namespaces in .NET, which are designed specifically for XML processing. These namespaces offer LINQ to XML (XDocument, XElement, etc.) and other XML classes (XmlDocument, XmlNode, etc.), which provide more XML-centric features and might be more efficient in XML-only scenarios.
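As a rough illustration of the XML-specific route, the sketch below reads a shortened version of the same catalog with LINQ to XML. The CatalogReader class and its FindTitle method are hypothetical names made up for this example; XDocument.Parse, Descendants, Attribute, and Element are standard System.Xml.Linq members. One practical difference from HAP is that XDocument.Parse throws an XmlException when the input is not well-formed, instead of silently repairing it.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class, named only for this sketch.
public static class CatalogReader
{
    // Returns the title of the book with the given id,
    // or null if no such book (or no title element) exists.
    public static string FindTitle(string xml, string bookId)
    {
        // Strict parsing: malformed input raises XmlException.
        XDocument doc = XDocument.Parse(xml);

        XElement book = doc.Descendants("book")
            .FirstOrDefault(b => (string)b.Attribute("id") == bookId);

        // The explicit string conversion yields null for a null element.
        return (string)book?.Element("title");
    }

    public static void Main()
    {
        const string xml = @"<catalog>
  <book id=""bk101"">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
</catalog>";

        Console.WriteLine(FindTitle(xml, "bk101") ?? "Book not found.");

        try
        {
            // The <title> element is never closed, so parsing fails.
            FindTitle("<catalog><book id=\"bk101\"><title>Oops</book></catalog>", "bk101");
        }
        catch (XmlException ex)
        {
            Console.WriteLine($"Malformed XML rejected: {ex.Message}");
        }
    }
}
```

If the input may be broken or only loosely XML-like, the tolerant HAP approach shown earlier is the better fit; if it is supposed to be valid XML, failing early like this is usually what you want.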