Nokogiri is a popular Ruby library for parsing HTML and XML, with an easy-to-use interface for navigating and manipulating documents. XML documents often use namespaces to avoid element name conflicts. A namespaced element is identified by its namespace URI together with its local name, not by the prefix written in the source, and this affects how such documents are queried.
Here's how Nokogiri handles XML namespaces when querying documents:
Dealing with Namespaces
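When Nokogiri parses an XML document, it keeps track of the namespaces that are declared. To query an element that is in a namespace, your XPath or CSS selector has to identify that namespace, normally through a prefix bound to the namespace URI. The prefix in the query does not have to match the prefix used in the document; only the URI has to match.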
Using XPath
When using XPath to query namespaced elements, you can bind a prefix to a namespace URI by passing a hash to the Nokogiri::XML::Document#xpath method and then use that prefix in your XPath expressions.
Here's an example:
require 'nokogiri'
xml_str = <<-XML
<root xmlns:foo="http://example.com/foo">
<foo:bar>Hello World</foo:bar>
</root>
XML
doc = Nokogiri::XML(xml_str)
# Bind the prefix 'f' to the namespace URI for this query
doc.xpath('//f:bar', 'f' => 'http://example.com/foo').each do |node|
  puts node.content
end
This will output:
Hello World
Using CSS
When using CSS selectors, you can query namespaced elements by using the | (pipe) symbol to separate the namespace prefix from the element name. As with XPath, the prefix has to be bound to a namespace URI, which you can do by passing a hash to the Nokogiri::XML::Document#css method.
Here's an example:
require 'nokogiri'
xml_str = <<-XML
<root xmlns:foo="http://example.com/foo">
<foo:bar>Baz</foo:bar>
</root>
XML
doc = Nokogiri::XML(xml_str)
# Nokogiri allows CSS selectors on XML documents; here the prefix-to-URI mapping is passed explicitly
doc.css('foo|bar', 'foo' => 'http://example.com/foo').each do |node|
  puts node.content
end
This will output:
Baz
Ignoring Namespaces
Sometimes you might want to ignore namespaces and query elements by their local name alone. The standard XPath function local-name() makes this possible, and it works in Nokogiri's XPath queries.
Here's an example:
require 'nokogiri'
xml_str = <<-XML
<root xmlns:foo="http://example.com/foo">
<foo:bar>Qux</foo:bar>
</root>
XML
doc = Nokogiri::XML(xml_str)
# Ignore the namespace and select all 'bar' elements
doc.xpath('//*[local-name()="bar"]').each do |node|
  puts node.content
end
This will output:
Qux
Default Namespaces
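If an XML document declares a default namespace (an xmlns attribute without a prefix), its unprefixed elements still belong to that namespace. Querying them can be a bit tricky because XPath 1.0 has no notion of a default namespace: an unprefixed name in an XPath expression only matches elements that are in no namespace, so //bar finds nothing in the document below. You will need to bind a prefix to the namespace URI and use that prefix in your XPath queries.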
Here's an example:
require 'nokogiri'
xml_str = <<-XML
<root xmlns="http://example.com/default">
<bar>Default Namespace</bar>
</root>
XML
doc = Nokogiri::XML(xml_str)
# Assign a prefix 'd' to the default namespace and use it in XPath
doc.xpath('//d:bar', 'd' => 'http://example.com/default').each do |node|
  puts node.content
end
This will output:
Default Namespace
In summary, Nokogiri provides flexible ways to handle XML namespaces. You can bind prefixes to namespace URIs when using XPath or CSS selectors, or you can ignore namespaces and query by local name. When dealing with default namespaces, XPath queries need a prefix bound to the namespace URI, because an unprefixed XPath name never matches a namespaced element.