How do I select sibling or parent elements using Nokogiri?

Nokogiri lets you move between sibling and parent elements in an HTML/XML document through node methods (such as next_element, previous_element, and parent), CSS sibling combinators, and XPath axes. This guide walks through each approach with examples.

Selecting Sibling Elements

Direct Sibling Navigation Methods

Nokogiri offers several methods to navigate between sibling elements:

require 'nokogiri'

html = <<-HTML
<div class="container">
  <h1>Title</h1>
  <p class="target">Target paragraph</p>
  <span>Next element</span>
  <div>Another div</div>
</div>
HTML

doc = Nokogiri::HTML(html)
target = doc.at_css('.target')

# Navigate to next/previous siblings
next_sibling = target.next_element        # <span>Next element</span>
previous_sibling = target.previous_element # <h1>Title</h1>

# next_sibling/previous_sibling return the adjacent node of any type,
# which is often a whitespace-only text node between tags
next_node = target.next_sibling         # Text node here, not <span>
previous_node = target.previous_sibling # Text node here, not <h1>

Finding All Siblings

# Get all following siblings
following_siblings = []
current = target.next_element
while current
  following_siblings << current
  current = current.next_element
end

# Get all preceding siblings
preceding_siblings = []
current = target.previous_element
while current
  preceding_siblings.unshift(current)
  current = current.previous_element
end

# All siblings (excluding the target element itself)
# element_children skips text, comment, and other non-element nodes
all_siblings = target.parent.element_children.reject { |child| child == target }

CSS Selector Combinations for Siblings

# Take the next sibling only if it is a <div>
# (nil for the sample markup, where the next element is a <span>)
candidate = target.next_element
next_div = candidate if candidate&.name == 'div'

# Using CSS combinators (general sibling selector)
# Note: This selects from document root, not relative to current element
following_divs = doc.css('.target ~ div')

# Adjacent sibling selector (at_css returns the first match, css returns all matches)
adjacent_span = doc.at_css('.target + span')

XPath for Advanced Sibling Selection

# Following siblings with specific conditions
next_span = doc.at_xpath('.//p[@class="target"]/following-sibling::span[1]')
all_following_divs = doc.xpath('.//p[@class="target"]/following-sibling::div')

# Preceding siblings
previous_h1 = doc.at_xpath('.//p[@class="target"]/preceding-sibling::h1[1]')
all_preceding = doc.xpath('.//p[@class="target"]/preceding-sibling::*')

# Siblings with specific attributes
next_with_class = doc.at_xpath('.//p[@class="target"]/following-sibling::*[@class][1]')

Selecting Parent Elements

Direct Parent Access

html = <<-HTML
<article class="post">
  <header>
    <h1>Post Title</h1>
  </header>
  <section class="content">
    <p class="highlight">Important text</p>
    <div class="nested">
      <span class="deep">Deep element</span>
    </div>
  </section>
</article>
HTML

doc = Nokogiri::HTML(html)
deep_element = doc.at_css('.deep')

# Direct parent
immediate_parent = deep_element.parent # <div class="nested">

# Navigate up multiple levels
section_parent = deep_element.parent.parent # <section class="content">
article_ancestor = deep_element.parent.parent.parent # <article class="post">

Finding Specific Ancestors

# Find the nearest ancestor with a specific tag.
# The walk stops at the document node, which does not respond to #parent.
def find_ancestor_by_tag(element, tag_name)
  current = element.parent
  while current&.element?
    return current if current.name == tag_name
    current = current.parent
  end
  nil
end

section_ancestor = find_ancestor_by_tag(deep_element, 'section')

# Find the nearest ancestor carrying a specific class.
# Splitting the attribute avoids substring matches such as 'post' in 'postscript'.
def find_ancestor_by_class(element, class_name)
  current = element.parent
  while current&.element?
    return current if current['class'].to_s.split.include?(class_name)
    current = current.parent
  end
  nil
end

post_ancestor = find_ancestor_by_class(deep_element, 'post')

XPath for Parent Selection

# Direct parent
parent = doc.at_xpath('.//span[@class="deep"]/..')

# Nearest ancestor by tag name ([1] on a reverse axis means "closest to the node")
section = doc.at_xpath('.//span[@class="deep"]/ancestor::section[1]')

# Nearest ancestor with a specific attribute value
article = doc.at_xpath('.//span[@class="deep"]/ancestor::*[@class="post"][1]')

# All ancestors
ancestors = doc.xpath('.//span[@class="deep"]/ancestor::*')

Practical Examples

Navigating Table Structures

html = <<-HTML
<table>
  <tr>
    <th>Name</th>
    <th>Age</th>
    <th>City</th>
  </tr>
  <tr>
    <td class="name">John</td>
    <td>25</td>
    <td>New York</td>
  </tr>
  <tr>
    <td class="name">Jane</td>
    <td>30</td>
    <td>London</td>
  </tr>
</table>
HTML

doc = Nokogiri::HTML(html)

# Get age for a specific person
john_cell = doc.at_css('td:contains("John")')
john_age = john_cell.next_element.text # "25"

# Get all data for a row
jane_row = doc.at_css('td:contains("Jane")').parent
jane_data = jane_row.css('td').map(&:text) # ["Jane", "30", "London"]

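# Get column headers
first_data_cell = doc.at_css('td')
# ancestors('table') works whether or not the parser inserted a <tbody>
# (HTML5 parsers such as Nokogiri::HTML5 do), unlike a fixed parent.parent chain
table = first_data_cell.ancestors('table').first
headers = table.css('th').map(&:text) # ["Name", "Age", "City"]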

Form Field Navigation

html = <<-HTML
<form>
  <div class="field-group">
    <label for="email">Email:</label>
    <input type="email" id="email" name="email">
    <span class="error">Invalid email</span>
  </div>
  <div class="field-group">
    <label for="password">Password:</label>
    <input type="password" id="password" name="password">
  </div>
</form>
HTML

doc = Nokogiri::HTML(html)

# Find error message for a specific input
email_input = doc.at_css('#email')
error_message = email_input.next_element # <span class="error">Invalid email</span>

# Find label for an input
label = email_input.previous_element # <label for="email">Email:</label>

# Get all related elements in a field group
field_group = email_input.parent
related_elements = field_group.children.reject(&:text?)

Error Handling and Best Practices

Safe Navigation

# Always check if elements exist before navigating
target = doc.at_css('.target')
if target
  next_elem = target.next_element
  puts next_elem.text if next_elem
end

# Using safe navigation operator (Ruby 2.3+)
next_text = doc.at_css('.target')&.next_element&.text

# Handle cases where navigation might go beyond document bounds
def safe_next_elements(element, count = 1)
  elements = []
  current = element
  count.times do
    current = current&.next_element
    break unless current
    elements << current
  end
  elements
end

Performance Considerations

# Cache frequently accessed elements
container = doc.at_css('.container')
children = container.element_children

# If you already hold the node, sibling navigation is a direct hop to its neighbor
target = doc.at_css('.item')
next_item = target.next_element

# If you only need the sibling, a single selector avoids the intermediate lookup,
# but it still searches the whole document
next_item = doc.at_css('.item + *')

# For complex navigation, consider using XPath
specific_element = doc.at_xpath('.//div[@class="start"]/following-sibling::div[@class="target"][1]')

These navigation techniques allow you to efficiently traverse HTML/XML documents and access related elements programmatically. Choose the method that best fits your specific use case and document structure.
