Scraper is an HTML parsing and CSS-selector library for Rust. It does not download pages itself. To extract all links from a webpage, first send a GET request to retrieve the page's HTML (here with `reqwest`). Then use Scraper to parse that HTML and read the `href` attribute of every `<a>` tag.
Here are the steps to extract all links from a webpage using Scraper in Rust:
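- Add the dependencies to your `Cargo.toml`. The code below uses `#[tokio::main]`, so `tokio` is needed as well, with the features that macro requires:

```toml
[dependencies]
scraper = "0.12"
reqwest = "0.11"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```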
- Write the Rust code to perform the web scraping:
```rust
use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The URL of the webpage you want to scrape
    let url = "https://example.com";

    // Send a GET request to the URL and read the response body as text
    let html = reqwest::get(url).await?.text().await?;

    // Parse the HTML document
    let document = Html::parse_document(&html);

    // Create a Selector that matches all <a> elements
    let selector = Selector::parse("a").unwrap();

    // Iterate over the elements matching the selector
    for element in document.select(&selector) {
        // Print the href attribute if the element has one
        if let Some(href) = element.value().attr("href") {
            println!("Link found: {}", href);
        }
    }

    Ok(())
}
```
- Run the code using Cargo:

```
cargo run
```
This code snippet uses:
- `reqwest` for making HTTP requests.
- `scraper` for parsing HTML and querying elements.
- An async `main` function, run on the `tokio` runtime through `#[tokio::main]`, since `reqwest`'s methods are asynchronous.
Here's what each part does:
- `reqwest::get(url).await?` sends the GET request to the given `url`. `.text().await?` then reads the response body, which is the page's HTML.
- `Html::parse_document(&html)` parses the HTML content into a document that you can query.
- `Selector::parse("a")` creates a selector that matches all `<a>` elements.
- `document.select(&selector)` iterates over all matching elements.
- `element.value().attr("href")` gets the `href` attribute from each `<a>` element. It returns `None` for anchors that have no `href`.
Remember to replace `"https://example.com"` with the actual URL of the webpage you want to scrape. Also, check the website's `robots.txt` and terms of service to make sure you're allowed to scrape it. Web scraping can have legal and ethical implications, so always scrape responsibly.