Scraper is an HTML parsing library for Rust, designed to make it easy to parse and navigate HTML documents with CSS selectors. It does not make HTTP requests itself. It is commonly paired with the reqwest library, which performs the HTTP requests and handles SSL/TLS and HTTPS.
Here are some of the capabilities that reqwest, the HTTP client typically used alongside Scraper, offers in terms of SSL and HTTPS:
SSL/TLS Support: reqwest uses either native-tls or rustls as its TLS backend, selected through Cargo features, to provide secure transport over HTTPS. All HTTPS traffic is encrypted using TLS.
HTTPS Requests: Making HTTPS requests is transparent and requires no additional setup as long as a TLS backend feature is enabled, which is the case with the default features. reqwest handles HTTPS URLs automatically, and Scraper only parses the HTML that comes back.
SSL Certificate Verification: By default, reqwest verifies the server's certificate. The root store depends on the backend and enabled features: native-tls uses the operating system's trust store, while rustls can use the bundled webpki-roots or the platform's native roots. This helps prevent Man-In-The-Middle (MITM) attacks.
Custom Certificate Authorities (CAs): If you need to trust a custom CA or a self-signed certificate, reqwest lets you add it to the set of trusted roots with ClientBuilder::add_root_certificate.
Client Certificates: For mutual TLS (mTLS), where the client also presents a certificate to the server, reqwest can be configured with a client certificate through ClientBuilder::identity.
Disabling SSL Verification: While not recommended for production use, certificate verification can be turned off in reqwest with ClientBuilder::danger_accept_invalid_certs, which can be useful for debugging or for certain test environments.
Here's a basic example of how you might use reqwest together with Scraper in Rust to fetch a page over HTTPS and parse it:
use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an HTTP client; TLS is configured with reqwest's defaults
    let client = reqwest::Client::new();

    // Perform a GET request to an HTTPS endpoint and turn
    // 4xx/5xx responses into errors instead of panicking
    let res = client
        .get("https://example.com")
        .send()
        .await?
        .error_for_status()?;

    // Parse the body text as HTML
    let body = res.text().await?;
    let document = Html::parse_document(&body);

    // Use a CSS selector to target elements
    let selector = Selector::parse("h1").unwrap();

    // Print the text content of each element matching the selector
    for element in document.select(&selector) {
        println!("{}", element.text().collect::<String>());
    }

    Ok(())
}
In this example, the reqwest client performs the HTTPS request, so SSL/TLS is handled automatically, and Scraper only parses the returned HTML. If you need to customize the SSL/TLS behavior (such as adding custom certificates or disabling verification), configure a reqwest::ClientBuilder before building the client instance.
Remember that handling SSL/TLS correctly is critical for the security of web scraping operations, so it's best to leave SSL verification enabled unless you have a specific reason to disable it and understand the risks involved.