Yes, Nokogiri can parse both XML and HTML documents. It is a Ruby gem that provides an easy-to-use interface for parsing, querying, and manipulating XML and HTML content. On CRuby, Nokogiri wraps the libxml2 C library, a fast and standards-compliant parser; on JRuby it relies on Java-based parsers instead.
Here's how you can use Nokogiri to parse an XML document in Ruby:
require 'nokogiri'
# Sample XML content
xml_content = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element attribute="value">Content</element>
</root>
XML
# Parse the XML content with Nokogiri
doc = Nokogiri::XML(xml_content)
# Select the first <element> node with XPath (CSS selectors via doc.css work too)
element = doc.xpath('//element').first
puts element.text # => Content
# You can also access attributes
puts element['attribute'] # => value
Similarly, you can parse an HTML document using Nokogiri as follows:
require 'nokogiri'
# Sample HTML content
html_content = <<-HTML
<!DOCTYPE html>
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<h1>Hello, Nokogiri!</h1>
<p class="description">This is a sample paragraph.</p>
</body>
</html>
HTML
# Parse the HTML content with Nokogiri
doc = Nokogiri::HTML(html_content)
# Select the first <h1> node with a CSS selector (XPath via doc.xpath works too)
heading = doc.css('h1').first
puts heading.text # => Hello, Nokogiri!
# Get the class attribute of the paragraph
paragraph = doc.css('p.description').first
puts paragraph['class'] # => description
Nokogiri's ability to parse both XML and HTML with a consistent API makes it a popular choice for web scraping and data extraction tasks in Ruby. The library's comprehensive documentation provides detailed information on how to handle various parsing and querying scenarios.