Can I use Scraper (Rust) with Rust's async/await syntax?

Yes, you can use the scraper crate with Rust's async/await syntax, but you will need an asynchronous HTTP client, because scraper itself does not perform any network operations. It only parses and queries HTML, usually with CSS-style selectors.

To fetch HTML content asynchronously, you can use an async HTTP client such as reqwest, which supports async/await. Here is an example of how you might combine scraper with reqwest to scrape a page asynchronously:

First, add the necessary dependencies to your Cargo.toml:

[dependencies]
scraper = "0.12"
reqwest = { version = "0.11", features = ["json"] }
tokio = { version = "1", features = ["full"] }

Now you can write an async main function that fetches an HTML document and uses scraper to parse it:

use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The URL you want to scrape
    let url = "http://example.com";

    // Fetch the HTML content using reqwest; error_for_status() turns a
    // 4xx/5xx response into an error returned by `?` instead of panicking
    let resp = reqwest::get(url).await?.error_for_status()?;

    let body = resp.text().await?;

    // Parse the HTML using scraper
    let document = Html::parse_document(&body);

    // Create a Selector for <h1> elements (unwrap only fails on invalid CSS)
    let selector = Selector::parse("h1").unwrap();

    // Iterate over elements matching the selector
    for element in document.select(&selector) {
        // text() yields the element's text nodes; join them into one String
        let text = element.text().collect::<String>();
        println!("{}", text.trim());
    }

    Ok(())
}

In this example:

  • We are using the tokio runtime to execute the async code.
  • The reqwest crate is used to perform an async GET request to fetch the HTML page.
  • We parse the response body with scraper by creating an Html document.
  • We create a Selector to find h1 tags in the HTML.
  • We iterate over the elements matched by the selector and print their text content.

The version numbers in the Cargo.toml above are only examples. Check crates.io for the latest mutually compatible releases of scraper, reqwest, and tokio so that you get current features and fixes.
