To extract attributes from HTML elements using the Scraper library in Rust, you first need to parse the HTML document with the `Html` struct and then select elements using the `Selector` struct. Once you have a reference to the desired element, you can use the `attr` method to get the value of a specific attribute.
Here's a step-by-step example of how to do this:
- Add the Scraper library to your `Cargo.toml` file:
```toml
[dependencies]
scraper = "0.12.0"
```
- Use the following Rust code to parse an HTML document and extract attributes:
```rust
use scraper::{Html, Selector};

fn main() {
    // The HTML content to be scraped
    let html_content = r#"
        <html>
            <body>
                <a href="https://example.com" id="example-link">Example Link</a>
            </body>
        </html>
    "#;

    // Parse the HTML document
    let document = Html::parse_document(html_content);

    // Create a Selector to find the "a" elements
    let selector = Selector::parse("a").unwrap();

    // Iterate over elements matching the selector
    for element in document.select(&selector) {
        // Try to get the "href" attribute
        if let Some(href) = element.value().attr("href") {
            println!("The href attribute is: {}", href);
        }

        // Try to get the "id" attribute
        if let Some(id) = element.value().attr("id") {
            println!("The id attribute is: {}", id);
        }
    }
}
```
When you run this code, it will output:
```text
The href attribute is: https://example.com
The id attribute is: example-link
```
The key steps in this example are:
- Parsing the HTML content into an `Html` object using `Html::parse_document`.
- Creating a `Selector` that defines which elements you're interested in. In this case, it's any `<a>` element.
- Using the `select` method on the `Html` object with the `Selector` to get an iterator over the matching elements.
- For each element found, calling `value()` to get the underlying element and then its `attr` method, passing the attribute name as a string, to retrieve that attribute's value.
Remember to handle errors appropriately in a real-world scenario. `Selector::parse` returns a `Result` that is an error for an invalid selector (the example above simply calls `unwrap`). The `attr` method returns an `Option<&str>` that is `None` when the attribute does not exist on an element.