Reqwest is an HTTP client library for Rust that is commonly used for making network requests. Combined with an HTML parsing library such as `scraper` or `select`, it can be used for web scraping. Here's how to integrate Reqwest with each of them:
Integrating Reqwest with scraper
The `scraper` library parses HTML with Servo's html5ever and lets you query the document with CSS selectors, giving a simple API comparable to Python's BeautifulSoup.
First, add the dependencies to your `Cargo.toml`:
```toml
[dependencies]
reqwest = { version = "0.11", features = ["blocking"] }
scraper = "0.12"
```
Here's an example of how to use Reqwest with `scraper`:
```rust
use scraper::{Html, Selector};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Make a GET request; error_for_status() turns 4xx/5xx responses into errors
    let body = reqwest::blocking::get("https://www.example.com")?
        .error_for_status()?
        .text()?;

    // Parse the HTML
    let document = Html::parse_document(&body);

    // Create a selector for all <a> elements (a constant, so it cannot fail at runtime)
    let selector = Selector::parse("a").expect("valid CSS selector");

    // Iterate over elements matching the selector
    for element in document.select(&selector) {
        // Print the href attribute, if the element has one
        if let Some(href) = element.value().attr("href") {
            println!("Found link: {}", href);
        }
    }

    Ok(())
}
```
Integrating Reqwest with select
The `select` library is another HTML parsing library that can be used with Reqwest for web scraping. Instead of CSS selector strings, it queries documents with composable predicates such as `Name`.
Add the dependencies to your `Cargo.toml`:
```toml
[dependencies]
reqwest = { version = "0.11", features = ["blocking"] }
select = "0.5"
```
Here's an example using Reqwest with `select`:
```rust
use select::document::Document;
use select::predicate::Name;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Make a GET request and fail on 4xx/5xx status codes
    let res = reqwest::blocking::get("https://www.example.com")?.error_for_status()?;
    let body = res.text()?;

    // Parse the HTML
    let document = Document::from(body.as_str());

    // Iterate over all <a> elements
    for node in document.find(Name("a")) {
        // Print the href attribute, if the node has one
        if let Some(href) = node.attr("href") {
            println!("Found link: {}", href);
        }
    }

    Ok(())
}
```
In both examples, we use Reqwest's `blocking` feature, which provides a simple synchronous API. Reqwest's default API is asynchronous: to use it, call `reqwest::get(url).await` (and `.text().await`) inside an `async` function driven by a runtime such as Tokio. The `blocking` feature is then no longer needed.
Remember to handle errors properly in real-world applications. Note that Reqwest does not treat HTTP error statuses (4xx/5xx) as errors by default; call `error_for_status()` on the response if they should fail the request. Also respect the `robots.txt` rules of the websites you scrape, be aware of the legal and ethical implications of web scraping, and make sure you comply with any relevant laws and terms of service.
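To make the `robots.txt` advice concrete, here is a deliberately simplified sketch in plain Rust (no extra crates). The function `is_allowed` and the crawler token `mybot` are hypothetical. It only checks `Disallow` prefixes from groups whose `User-agent` is `*` or exactly matches your crawler's token, and it ignores `Allow` lines, wildcards, and the precedence rules of the full Robots Exclusion Protocol (RFC 9309).

```rust
/// Hypothetical, simplified robots.txt check: returns false if `path` starts
/// with a non-empty Disallow prefix from a group that applies to `user_agent`
/// (exact, case-insensitive token match) or to "*". Rules from both kinds of
/// group are merged, which is stricter than RFC 9309.
fn is_allowed(robots_txt: &str, user_agent: &str, path: &str) -> bool {
    let agent = user_agent.to_ascii_lowercase();
    let mut applies = false; // does the current group apply to us?
    let mut last_was_agent = false; // consecutive User-agent lines share one group
    let mut disallowed: Vec<String> = Vec::new();

    for raw_line in robots_txt.lines() {
        // Strip comments and surrounding whitespace
        let line = raw_line.split('#').next().unwrap_or("").trim();
        let (key, value) = match line.split_once(':') {
            Some((k, v)) => (k.trim().to_ascii_lowercase(), v.trim()),
            None => continue, // blank or malformed line
        };
        match key.as_str() {
            "user-agent" => {
                if !last_was_agent {
                    applies = false; // a User-agent after rules starts a new group
                }
                let token = value.to_ascii_lowercase();
                if token == "*" || token == agent {
                    applies = true;
                }
                last_was_agent = true;
            }
            "disallow" => {
                last_was_agent = false;
                // An empty Disallow means "allow everything", so it adds no rule
                if applies && !value.is_empty() {
                    disallowed.push(value.to_string());
                }
            }
            _ => last_was_agent = false,
        }
    }

    !disallowed.iter().any(|prefix| path.starts_with(prefix.as_str()))
}
```

In practice you would fetch `https://www.example.com/robots.txt` once with Reqwest, cache the text, and call `is_allowed(&robots, "mybot", "/some/path")` before each request; a complete parser is the safer choice for production crawlers.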