Navigating through child nodes with the Html Agility Pack is straightforward once you have loaded the HTML document you want to work with. Html Agility Pack is a .NET library that parses HTML, including malformed real-world markup, into a DOM tree that you can query with XPath or LINQ. CSS selectors are not built in; they require a separate extension package.
Here's how you can navigate through child nodes using the Html Agility Pack:
- Load the HTML document.
- Select the parent node.
- Iterate through the ChildNodes collection.
Below is a step-by-step example in C#:
using HtmlAgilityPack;
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Create an instance of HtmlDocument
        var htmlDoc = new HtmlDocument();

        // Load the HTML content (you can also load from a file or URL)
        htmlDoc.LoadHtml(@"
            <html>
                <body>
                    <div id='parent'>
                        <p>First child</p>
                        <p>Second child</p>
                        <span>Third child</span>
                    </div>
                </body>
            </html>");

        // Select the parent node using XPath
        var parentNode = htmlDoc.DocumentNode.SelectSingleNode("//div[@id='parent']");

        // Check if the node exists
        if (parentNode != null)
        {
            // Iterate through the child nodes
            foreach (var childNode in parentNode.ChildNodes)
            {
                // You can filter element types if needed
                if (childNode.NodeType == HtmlNodeType.Element)
                {
                    Console.WriteLine(childNode.Name + ": " + childNode.InnerText);
                }
            }
        }
    }
}
In this example, the code does the following:
- Loads the HTML content into an HtmlDocument.
- Selects the <div> with the id "parent" as the parent node.
- Iterates through the ChildNodes collection of the selected parent node.
- Checks the NodeType to ensure it's an element node (ignoring text nodes, comments, etc.).
- Outputs the name and inner text of each child element to the console.
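The NodeType check matters because ChildNodes also contains the whitespace text nodes that sit between tags. As a minimal sketch of the same filtering done with LINQ, the helpers below (ChildNodeQueries, LoadParent, ElementChildren, and ChildrenNamed are hypothetical names chosen for illustration, not part of the library) return only element children, or only direct children with a given tag name via HtmlNode.Elements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ChildNodeQueries
{
    // Hypothetical helper: parses a small sample document and returns the parent <div>.
    public static HtmlNode LoadParent()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div id='parent'>\n" +
            "  <p>First child</p>\n" +
            "  <p>Second child</p>\n" +
            "  <span>Third child</span>\n" +
            "</div>");
        return doc.DocumentNode.SelectSingleNode("//div[@id='parent']");
    }

    // Hypothetical helper: keeps only element children, dropping text and comment nodes.
    public static List<HtmlNode> ElementChildren(HtmlNode parent) =>
        parent.ChildNodes
              .Where(n => n.NodeType == HtmlNodeType.Element)
              .ToList();

    // Hypothetical helper: direct children whose tag name matches, via HtmlNode.Elements.
    public static List<HtmlNode> ChildrenNamed(HtmlNode parent, string name) =>
        parent.Elements(name).ToList();

    public static void Main()
    {
        var parent = LoadParent();

        Console.WriteLine("All child nodes: " + parent.ChildNodes.Count);

        foreach (var child in ElementChildren(parent))
        {
            Console.WriteLine(child.Name + ": " + child.InnerText);
        }

        foreach (var p in ChildrenNamed(parent, "p"))
        {
            Console.WriteLine("Paragraph child: " + p.InnerText);
        }
    }
}
```

Comparing the total ChildNodes count with the number of element children is a quick way to see how many text nodes the parser kept for the whitespace in the markup.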
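To run this code, install the Html Agility Pack from NuGet, for example from the Visual Studio Package Manager Console (the .NET CLI equivalent is dotnet add package HtmlAgilityPack):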
Install-Package HtmlAgilityPack
If you need more targeted navigation, you can pass an XPath expression to SelectNodes to retrieve only the nodes that match a query. Here's an example that uses XPath to get the <p> elements under the parent node:
// Select all <p> descendants of the parent node using XPath
// (use "./p" instead to match direct children only)
var paragraphNodes = parentNode.SelectNodes(".//p");
if (paragraphNodes != null)
{
    foreach (var pNode in paragraphNodes)
    {
        Console.WriteLine("Paragraph: " + pNode.InnerText);
    }
}
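In this snippet, .//p is an XPath expression where . indicates the current node (parentNode) and //p selects all <p> elements that are descendants of that node, at any depth. In the sample document every <p> is a direct child, so the result is the same either way, but to match direct children only you would use ./p instead. By default, SelectNodes returns null rather than an empty collection when nothing matches, which is why the snippet checks for null before iterating.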
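To make the scope of an XPath query concrete, here is a minimal sketch that runs two queries against markup containing a nested paragraph. The class XPathScopeDemo, the SelectTexts helper, and the sample HTML are assumptions made up for this illustration; only SelectSingleNode, SelectNodes, and InnerText come from the library.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Sample markup (hypothetical): one <p> directly under the parent, one nested deeper.
    public const string SampleHtml =
        "<div id='parent'><p>Direct child</p><div><p>Nested descendant</p></div></div>";

    // Hypothetical helper: evaluates an XPath relative to the parent <div>
    // and returns the inner text of every matching node.
    public static string[] SelectTexts(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        var parent = doc.DocumentNode.SelectSingleNode("//div[@id='parent']");

        // SelectNodes yields null by default when nothing matches, so guard against it.
        var nodes = parent.SelectNodes(xpath);
        return nodes == null
            ? new string[0]
            : nodes.Select(n => n.InnerText).ToArray();
    }

    public static void Main()
    {
        // "./p" is limited to direct children of the current node.
        Console.WriteLine("./p  -> " + string.Join(", ", SelectTexts("./p")));

        // ".//p" matches <p> elements at any depth below the current node.
        Console.WriteLine(".//p -> " + string.Join(", ", SelectTexts(".//p")));
    }
}
```

With this sample markup, the first query should match only the direct child, while the second should also pick up the nested paragraph.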