What methods does Scraper (Rust) provide for selecting elements?

scraper is a Rust crate for parsing HTML documents and querying elements within them, similar to BeautifulSoup in Python or Nokogiri in Ruby. It is built on the html5ever and selectors crates, both of which come from the Servo project. The crate provides a simple yet powerful interface for selecting and inspecting HTML elements using CSS selectors.

Here are some of the methods that scraper provides for selecting elements:

  1. Selecting elements with select: The select method is called on an Html document or an ElementRef and takes a reference to a parsed Selector. It returns an iterator over all descendant elements that match the CSS selector.
use scraper::{Html, Selector};

fn main() {
    let html = r#"<div><p>Foo</p><p>Bar</p></div>"#;
    let document = Html::parse_document(html);
    let selector = Selector::parse("p").unwrap();

    for element in document.select(&selector) {
        println!("{}", element.inner_html());
    }
}
  2. Selecting a single element with select and next: To get only the first matching element, call next on the iterator returned by select. It yields an Option<ElementRef>, since there may be no matching element.
// Assuming document and selector are defined as in the previous example
let first_p = document.select(&selector).next();

if let Some(element) = first_p {
    println!("The first <p> element is: {}", element.inner_html());
}
  3. Getting an element's text with text: To get the text content of an element, you can use the text method, which returns an iterator yielding one &str for each text node among the element's descendants.
// Assuming element is an ElementRef as obtained in previous examples
for text_node in element.text() {
    println!("{}", text_node);
}
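  4. Navigating with parent, next_sibling, and prev_sibling: To navigate the HTML tree, you can use the parent, next_sibling, and prev_sibling methods. ElementRef exposes them through the underlying ego_tree NodeRef, so they return generic nodes (which may be text or comments) rather than ElementRef values. ElementRef::wrap converts such a node back into an ElementRef when it is an element.
// Assuming element is an ElementRef as obtained in previous examples,
// and that ElementRef is imported with `use scraper::ElementRef;`
if let Some(parent) = element.parent().and_then(ElementRef::wrap) {
    println!("Parent HTML: {}", parent.html());
}

// wrap returns None when the sibling is not an element (e.g. whitespace text)
if let Some(next_sibling) = element.next_sibling().and_then(ElementRef::wrap) {
    println!("Next sibling HTML: {}", next_sibling.html());
}

if let Some(prev_sibling) = element.prev_sibling().and_then(ElementRef::wrap) {
    println!("Previous sibling HTML: {}", prev_sibling.html());
}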
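  5. Accessing element attributes with value: The value method on an ElementRef returns the underlying Element. Call attr on it to look up an attribute by name; the result is an Option<&str>, which is None when the attribute is absent.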
// Assuming element is an ElementRef as obtained in previous examples
if let Some(class_attr) = element.value().attr("class") {
    println!("Class attribute: {}", class_attr);
}

These are some of the core methods provided by the scraper crate for selecting and navigating HTML elements in Rust. The crate can also parse HTML fragments with Html::parse_fragment and serialize elements back to HTML strings with html and inner_html, among other utilities.
