How to Use XPath Axes Like Following-Sibling and Preceding-Sibling
XPath axes are powerful navigation tools that allow you to traverse HTML documents in different directions from a context node. The following-sibling and preceding-sibling axes are particularly useful for web scraping scenarios where you need to navigate horizontally between elements at the same hierarchical level.
Understanding XPath Sibling Axes
XPath sibling axes operate on elements that share the same parent node. These axes are essential when you need to:
- Extract data from table rows or columns
- Navigate between form fields
- Process lists or menu items
- Handle dynamic content where element relationships matter more than absolute positions
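Before diving into each axis, here is a minimal sketch of the same-parent rule, using lxml and a made-up fragment (the markup and ids are hypothetical): elements only count as siblings when they share a parent.
from lxml import html

doc = html.fromstring("""
<div>
  <h2>Menu</h2>
  <ul>
    <li id="first">Home</li>
    <li>About</li>
    <li>Contact</li>
  </ul>
</div>
""")

# Siblings share a parent: both remaining <li> items are returned
print(doc.xpath("//li[@id='first']/following-sibling::li/text()"))
# ['About', 'Contact']

# The <h2> has a different parent, so it is NOT a sibling of the <li> items
print(doc.xpath("//li[@id='first']/preceding-sibling::h2"))
# []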
Following-Sibling Axis
The following-sibling axis selects all siblings that appear after the current node in document order.
Syntax: following-sibling::node-test[predicate]
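Reading left to right, the parts of that syntax combine like this (illustrative expressions, evaluated from an arbitrary context node):
following-sibling::*       # every element sibling after the context node
following-sibling::p       # node test narrowed to <p> siblings only
following-sibling::p[1]    # predicate added: just the first following <p>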
Preceding-Sibling Axis
The preceding-sibling axis selects all siblings that appear before the current node in document order.
Syntax: preceding-sibling::node-test[predicate]
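One subtlety worth flagging early: preceding-sibling is a reverse axis, so positional predicates count backwards from the context node. A minimal sketch, assuming the made-up markup shown in the string:
from lxml import html

doc = html.fromstring("""
<div>
  <p>first</p>
  <p>second</p>
  <span id="here">context</span>
</div>
""")

# [1] on a reverse axis means "nearest preceding", not "first in the document"
print(doc.xpath("//span[@id='here']/preceding-sibling::p[1]/text()"))      # ['second']
# last() walks all the way back to the earliest preceding sibling
print(doc.xpath("//span[@id='here']/preceding-sibling::p[last()]/text()")) # ['first']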
Practical Examples with Code
HTML Structure for Examples
Let's work with this sample HTML structure, in which every element inside the product-info container is a sibling of the others:
<div class="product-info">
  <h2>Product Title</h2>
  <p class="price">$29.99</p>
  <p class="description">Product description here</p>
  <div class="rating">4.5 stars</div>
  <button class="add-to-cart">Add to Cart</button>
  <span class="availability">In Stock</span>
</div>
Python Examples with Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Setup Chrome driver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

try:
    driver.get("https://example.com")

    # Find all elements following the price element
    following_elements = driver.find_elements(
        By.XPATH,
        "//p[@class='price']/following-sibling::*"
    )
    print("Elements following the price:")
    for element in following_elements:
        print(f"Tag: {element.tag_name}, Text: {element.text}")

    # Find the first paragraph following the title
    next_paragraph = driver.find_element(
        By.XPATH,
        "//h2[text()='Product Title']/following-sibling::p[1]"
    )
    print(f"First paragraph after title: {next_paragraph.text}")

    # Find all elements preceding the rating
    preceding_elements = driver.find_elements(
        By.XPATH,
        "//div[@class='rating']/preceding-sibling::*"
    )
    print("Elements preceding the rating:")
    for element in preceding_elements:
        print(f"Tag: {element.tag_name}, Text: {element.text}")

    # Find the last element before the button
    last_before_button = driver.find_element(
        By.XPATH,
        "//button[@class='add-to-cart']/preceding-sibling::*[1]"
    )
    print(f"Element just before button: {last_before_button.text}")
finally:
    driver.quit()
Python with lxml
from lxml import html
import requests

# Fetch and parse HTML
response = requests.get("https://example.com")
tree = html.fromstring(response.content)

# Find following siblings of price element
following_siblings = tree.xpath("//p[@class='price']/following-sibling::*")
print("Following siblings of price:")
for sibling in following_siblings:
    print(f"Tag: {sibling.tag}, Text: {sibling.text_content().strip()}")

# Find preceding siblings of rating
preceding_siblings = tree.xpath("//div[@class='rating']/preceding-sibling::*")
print("Preceding siblings of rating:")
for sibling in preceding_siblings:
    print(f"Tag: {sibling.tag}, Text: {sibling.text_content().strip()}")
# More specific queries
next_two_siblings = tree.xpath("//p[@class='price']/following-sibling::*[position() <= 2]")
# preceding-sibling is a reverse axis, so [1] is the nearest preceding <p>
previous_paragraph = tree.xpath("//div[@class='rating']/preceding-sibling::p[1]")
JavaScript Examples
// Using XPath in browser console or with libraries like Puppeteer
// Function to evaluate XPath
function getElementByXPath(xpath, contextNode = document) {
  return document.evaluate(
    xpath,
    contextNode,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE,
    null
  ).singleNodeValue;
}

function getElementsByXPath(xpath, contextNode = document) {
  const result = [];
  const query = document.evaluate(
    xpath,
    contextNode,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  for (let i = 0; i < query.snapshotLength; i++) {
    result.push(query.snapshotItem(i));
  }
  return result;
}
// Find following siblings
const followingSiblings = getElementsByXPath("//p[@class='price']/following-sibling::*");
console.log("Following siblings:", followingSiblings);
// Find preceding siblings
const precedingSiblings = getElementsByXPath("//div[@class='rating']/preceding-sibling::*");
console.log("Preceding siblings:", precedingSiblings);
// Find specific sibling by position
const secondFollowing = getElementByXPath("//p[@class='price']/following-sibling::*[2]");
console.log("Second following sibling:", secondFollowing);
Advanced Usage Patterns
Working with Tables
# Extract data from table rows using sibling axes
table_xpath_queries = [
    # Get all cells in the same row after finding a specific cell
    "//td[text()='Product A']/following-sibling::td",
    # Get the previous row's data
    "//tr[td[text()='Current Row']]/preceding-sibling::tr[1]/td",
    # Get specific column data from following rows
    "//tr[td[text()='Header']]/following-sibling::tr/td[2]"
]

for xpath in table_xpath_queries:
    elements = driver.find_elements(By.XPATH, xpath)
    print(f"XPath: {xpath}")
    for elem in elements:
        print(f"  Text: {elem.text}")
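The loop above assumes a live Selenium driver session. For a self-contained illustration of the first query, here is a hypothetical table processed with lxml alone:
from lxml import html

table = html.fromstring("""
<table>
  <tr><td>Product A</td><td>$10</td><td>In Stock</td></tr>
  <tr><td>Product B</td><td>$15</td><td>Sold Out</td></tr>
</table>
""")

# All cells in the same row after the "Product A" cell
cells = table.xpath("//td[text()='Product A']/following-sibling::td/text()")
print(cells)  # ['$10', 'In Stock']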
Form Field Navigation
# Navigate between form fields
form_navigation_examples = [
    # Find the label text for an input field
    "//input[@name='email']/preceding-sibling::label",
    # Find error message after an input
    "//input[@name='password']/following-sibling::span[@class='error']",
    # Get all form fields after a specific field
    "//input[@name='firstname']/following-sibling::input"
]
Advanced Techniques and Best Practices
Combining Axes with Predicates
# Complex XPath expressions combining axes and predicates
advanced_examples = [
    # Find the second paragraph following an h2 with specific text
    "//h2[contains(text(), 'Features')]/following-sibling::p[2]",
    # Find the earliest preceding form-group div
    # ([last()] on a reverse axis reaches furthest back in the document)
    "//button[@class='submit']/preceding-sibling::div[@class='form-group'][last()]",
    # Find following sibling that contains specific text
    "//span[@class='label']/following-sibling::*[contains(text(), 'Available')]",
    # Get all following siblings that are not h3 headings
    # (note: this does NOT stop at the next h3; see the sketch below)
    "//h3[@class='section-title']/following-sibling::*[not(self::h3)]"
]
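As noted in the last comment, [not(self::h3)] filters out h3 siblings but keeps collecting past the next heading. For a genuine "collect until the next h3" grouping, a plain Python loop over lxml's itersiblings() is easier to read than the equivalent XPath 1.0 expression; a sketch with made-up section markup:
from lxml import html

page = html.fromstring("""
<div>
  <h3 class="section-title">Specs</h3>
  <p>Weight: 1kg</p>
  <p>Color: red</p>
  <h3>Reviews</h3>
  <p>Great product!</p>
</div>
""")

heading = page.xpath("//h3[@class='section-title']")[0]
section = []
for sibling in heading.itersiblings():
    if sibling.tag == "h3":  # stop at the next section heading
        break
    section.append(sibling.text_content())
print(section)  # ['Weight: 1kg', 'Color: red']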
Performance Optimization
When using sibling axes, consider these performance tips:
- Be specific with node tests: Use specific element names instead of * when possible
- Limit scope with predicates: Use position predicates to limit results
- Cache context nodes: Store frequently used context nodes in variables
# Optimized approach
price_element = driver.find_element(By.XPATH, "//p[@class='price']")

# Reuse the context element for multiple queries by searching from it
# (the leading "./" anchors the XPath at the context element)
description = price_element.find_element(By.XPATH, "./following-sibling::p[@class='description']")
rating = price_element.find_element(By.XPATH, "./following-sibling::div[@class='rating']")
Common Use Cases in Web Scraping
E-commerce Product Pages
def scrape_product_details(driver):
    """Extract product information using sibling navigation"""
    # Find product title and get related information
    title_element = driver.find_element(By.XPATH, "//h1[@class='product-title']")

    # Get price (usually follows title)
    price = driver.find_element(
        By.XPATH,
        "//h1[@class='product-title']/following-sibling::*//span[@class='price']"
    ).text

    # Get description (often in next sibling paragraph)
    description = driver.find_element(
        By.XPATH,
        "//h1[@class='product-title']/following-sibling::p[1]"
    ).text

    # Get availability status
    availability = driver.find_element(
        By.XPATH,
        "//span[@class='price']/following-sibling::span[@class='stock-status']"
    ).text

    return {
        'title': title_element.text,
        'price': price,
        'description': description,
        'availability': availability
    }
News Article Processing
When working with dynamic content loading, you might need to handle AJAX requests using Puppeteer or wait for elements to load with functions like Puppeteer's waitForSelector before navigating between siblings.
def extract_article_metadata(driver):
    """Extract article metadata using sibling relationships"""
    # Find author and publication date that typically follow the title
    author = driver.find_element(
        By.XPATH,
        "//h1[@class='article-title']/following-sibling::div[@class='byline']//span[@class='author']"
    ).text

    date = driver.find_element(
        By.XPATH,
        "//span[@class='author']/following-sibling::time"
    ).get_attribute('datetime')

    # Get article tags that usually precede or follow content
    tags = [tag.text for tag in driver.find_elements(
        By.XPATH,
        "//div[@class='article-content']/following-sibling::div[@class='tags']//a"
    )]

    return {
        'author': author,
        'date': date,
        'tags': tags
    }
Troubleshooting Common Issues
Element Not Found Errors
def safe_sibling_extraction(driver, xpath):
    """Safely extract sibling elements with error handling"""
    try:
        elements = driver.find_elements(By.XPATH, xpath)
        if elements:
            return [elem.text for elem in elements]
        else:
            print(f"No elements found for XPath: {xpath}")
            return []
    except Exception as e:
        print(f"Error extracting elements: {e}")
        return []

# Usage
following_data = safe_sibling_extraction(
    driver,
    "//p[@class='price']/following-sibling::*"
)
Dynamic Content Handling
For pages with dynamic content, consider waiting for elements to load:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for sibling elements to be present
wait = WebDriverWait(driver, 10)
sibling_elements = wait.until(
    EC.presence_of_all_elements_located(
        (By.XPATH, "//div[@class='loaded-content']/following-sibling::div")
    )
)
Integration with Web Scraping APIs
When building scalable scraping solutions, you might need to combine XPath sibling navigation with robust scraping infrastructure. The WebScraping.AI API provides powerful XPath support for complex element selection and data extraction workflows.
import requests

# Example using WebScraping.AI API with XPath
api_url = "https://api.webscraping.ai/html"
params = {
    'api_key': 'YOUR_API_KEY',
    'url': 'https://example.com',
    'selector': '//p[@class="price"]/following-sibling::*',
    'selector_type': 'xpath'
}

response = requests.get(api_url, params=params)
selected_elements = response.json()
Conclusion
XPath sibling axes are powerful tools for navigating HTML documents horizontally, enabling precise element selection based on structural relationships. The following-sibling and preceding-sibling axes are particularly valuable in web scraping scenarios where you need to extract related data points or navigate between form elements.
Key takeaways:
- Use sibling axes when element position relationships are more reliable than absolute paths
- Combine axes with predicates for precise element selection
- Consider performance implications and optimize XPath expressions
- Implement proper error handling for robust scraping applications
- Practice with different HTML structures to master these navigation techniques
By mastering XPath sibling navigation, you'll be able to create more flexible and maintainable web scraping solutions that can adapt to various HTML structures and layout changes.