Nokogiri provides methods for moving between sibling and parent nodes in a parsed HTML or XML document. This guide covers the direct navigation methods, CSS sibling combinators, and XPath axes, followed by practical examples and error handling.
Selecting Sibling Elements
Direct Sibling Navigation Methods
Nokogiri offers several methods to navigate between sibling elements:
require 'nokogiri'
html = <<-HTML
<div class="container">
<h1>Title</h1>
<p class="target">Target paragraph</p>
<span>Next element</span>
<div>Another div</div>
</div>
HTML
doc = Nokogiri::HTML(html)
target = doc.at_css('.target')
# Navigate to next/previous siblings
next_sibling = target.next_element # <span>Next element</span>
previous_sibling = target.previous_element # <h1>Title</h1>
# Include text nodes (whitespace, line breaks)
next_node = target.next_sibling # May include text nodes
previous_node = target.previous_sibling # May include text nodes
Finding All Siblings
# Get all following siblings
following_siblings = []
current = target.next_element
while current
  following_siblings << current
  current = current.next_element
end
# Get all preceding siblings (kept in document order)
preceding_siblings = []
current = target.previous_element
while current
  preceding_siblings.unshift(current)
  current = current.previous_element
end
# All element siblings (excluding the target element itself)
all_siblings = target.parent.element_children.reject { |child| child == target }
CSS Selector Combinations for Siblings
# Select the next sibling only if it is a <div>
candidate = target.next_element
next_div = candidate if candidate&.name == 'div' # nil here: the next element is a <span>
# Using CSS combinators (general sibling selector)
# Note: doc.css evaluates the selector against the whole document, so it matches
# siblings of every .target element, not only the node held in `target`
following_divs = doc.css('.target ~ div')
# Adjacent sibling selector
adjacent_span = doc.css('.target + span')
XPath for Advanced Sibling Selection
# Following siblings with specific conditions
next_span = doc.at_xpath('.//p[@class="target"]/following-sibling::span[1]')
all_following_divs = doc.xpath('.//p[@class="target"]/following-sibling::div')
# Preceding siblings
previous_h1 = doc.at_xpath('.//p[@class="target"]/preceding-sibling::h1[1]')
all_preceding = doc.xpath('.//p[@class="target"]/preceding-sibling::*')
# Siblings with specific attributes
next_with_class = doc.at_xpath('.//p[@class="target"]/following-sibling::*[@class][1]')
Selecting Parent Elements
Direct Parent Access
html = <<-HTML
<article class="post">
<header>
<h1>Post Title</h1>
</header>
<section class="content">
<p class="highlight">Important text</p>
<div class="nested">
<span class="deep">Deep element</span>
</div>
</section>
</article>
HTML
doc = Nokogiri::HTML(html)
deep_element = doc.at_css('.deep')
# Direct parent
immediate_parent = deep_element.parent # <div class="nested">
# Navigate up multiple levels
section_parent = deep_element.parent.parent # <section class="content">
article_ancestor = deep_element.parent.parent.parent # <article class="post">
Finding Specific Ancestors
# Find first ancestor with specific tag
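def find_ancestor_by_tag(element, tag_name)
  current = element.parent
  # Stop at the document node: it is the root element's parent and has no #parent
  while current.respond_to?(:parent) && current.name != tag_name
    current = current.parent
  end
  # Return nil instead of the document when no ancestor matched
  current if current.respond_to?(:parent)
end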
section_ancestor = find_ancestor_by_tag(deep_element, 'section')
# Find first ancestor with specific class
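def find_ancestor_by_class(element, class_name)
  current = element.parent
  # The walk ends at the document node, which has no #parent
  while current.respond_to?(:parent)
    # Compare whole class tokens so 'post' does not match 'postscript'
    return current if current['class'].to_s.split.include?(class_name)
    current = current.parent
  end
  nil
end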
post_ancestor = find_ancestor_by_class(deep_element, 'post')
XPath for Parent Selection
# Direct parent
parent = doc.at_xpath('.//span[@class="deep"]/..')
# Specific ancestor by tag name
section = doc.at_xpath('.//span[@class="deep"]/ancestor::section')
# First ancestor with specific attribute
article = doc.at_xpath('.//span[@class="deep"]/ancestor::*[@class="post"]')
# All ancestors
ancestors = doc.xpath('.//span[@class="deep"]/ancestor::*')
Practical Examples
Navigating Table Structures
html = <<-HTML
<table>
<tr>
<th>Name</th>
<th>Age</th>
<th>City</th>
</tr>
<tr>
<td class="name">John</td>
<td>25</td>
<td>New York</td>
</tr>
<tr>
<td class="name">Jane</td>
<td>30</td>
<td>London</td>
</tr>
</table>
HTML
doc = Nokogiri::HTML(html)
# Get age for a specific person
john_cell = doc.at_css('td:contains("John")')
john_age = john_cell.next_element.text # "25"
# Get all data for a row
jane_row = doc.at_css('td:contains("Jane")').parent
jane_data = jane_row.css('td').map(&:text) # ["Jane", "30", "London"]
# Get column headers
first_data_cell = doc.at_css('td')
table = first_data_cell.parent.parent # <td> -> <tr> -> <table>; Nokogiri::HTML adds no <tbody>, but Nokogiri::HTML5 would
headers = table.at_css('tr').css('th').map(&:text) # ["Name", "Age", "City"]
Form Field Navigation
html = <<-HTML
<form>
<div class="field-group">
<label for="email">Email:</label>
<input type="email" id="email" name="email">
<span class="error">Invalid email</span>
</div>
<div class="field-group">
<label for="password">Password:</label>
<input type="password" id="password" name="password">
</div>
</form>
HTML
doc = Nokogiri::HTML(html)
# Find error message for a specific input
email_input = doc.at_css('#email')
error_message = email_input.next_element # <span class="error">Invalid email</span>
# Find label for an input
label = email_input.previous_element # <label for="email">Email:</label>
# Get all related elements in a field group
field_group = email_input.parent
related_elements = field_group.children.reject(&:text?)
Error Handling and Best Practices
Safe Navigation
# Always check if elements exist before navigating
target = doc.at_css('.target')
if target
next_elem = target.next_element
puts next_elem.text if next_elem
end
# Using safe navigation operator (Ruby 2.3+)
next_text = doc.at_css('.target')&.next_element&.text
# Handle cases where navigation might go beyond document bounds
def safe_next_elements(element, count = 1)
  elements = []
  current = element
  count.times do
    current = current&.next_element
    break unless current
    elements << current
  end
  elements
end
Performance Considerations
# Cache frequently accessed elements
container = doc.at_css('.container')
children = container.children.reject(&:text?)
# Prefer navigation when you already hold the node: next_element steps to the
# adjacent element without searching the document
target = doc.at_css('.item')
next_item = target.next_element
# A selector is convenient when you do not hold the node yet, but it is
# evaluated against the whole document
next_item = doc.at_css('.item + *')
# For complex navigation, consider using XPath
specific_element = doc.at_xpath('.//div[@class="start"]/following-sibling::div[@class="target"][1]')
In summary, use next_element, previous_element, and parent for single steps from a node you already hold, and reach for CSS combinators or XPath axes when the relationship carries conditions that would otherwise need a loop. Choose the method that best fits your use case and document structure.