What are some common user-agent strings to use with Scraper (Rust)?

When web scraping with any tool, including the Scraper crate in Rust, it's important to send a user-agent string that resembles a real browser. Many websites block or throttle requests that arrive with non-browser user agents, so a realistic string helps you avoid being flagged as a bot. Here are some common user-agent strings you can use for your Scraper-based web scraping tasks (note that browser version numbers change frequently, so you may want to refresh these periodically):

Desktop Browsers

  • Google Chrome on Windows 10: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36

  • Mozilla Firefox on Windows 10: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0

  • Microsoft Edge on Windows 10: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36 Edg/88.0.705.50

  • Apple Safari on macOS: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15

Mobile Browsers

  • Google Chrome on Android: Mozilla/5.0 (Linux; Android 11; Pixel 4 XL) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Mobile Safari/537.36

  • Mozilla Firefox on Android: Mozilla/5.0 (Android 11; Mobile; rv:86.0) Gecko/86.0 Firefox/86.0

  • Apple Safari on iOS: Mozilla/5.0 (iPhone; CPU iPhone OS 14_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Mobile/15E148 Safari/604.1

Bots

  • Googlebot: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

  • Bingbot: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
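Rather than hard-coding a single string, you might rotate through several of them so that repeated requests don't all present the same user agent. A minimal round-robin sketch in Rust (the list simply reuses example strings from above; a real scraper might load its own list from a config file):

```rust
// Example user-agent strings taken from the lists above.
const USER_AGENTS: &[&str] = &[
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
];

/// Returns the user agent to use for the n-th request (round-robin).
fn user_agent_for(request_index: usize) -> &'static str {
    USER_AGENTS[request_index % USER_AGENTS.len()]
}

fn main() {
    // Each request index cycles through the list in order.
    for i in 0..4 {
        println!("request {} -> {}", i, user_agent_for(i));
    }
}
```

You could then pass `user_agent_for(i)` to your HTTP client when building the i-th request.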

How to Set a User-Agent in Scraper (Rust)

Scraper itself only parses HTML; to fetch pages you'll typically pair it with an HTTP client crate such as reqwest. Here's an example of how to set a user-agent string in a Rust program:

use scraper::{Html, Selector};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a reqwest client that sends a Chrome-on-Windows user agent
    // with every request it makes.
    let client = reqwest::Client::builder()
        .user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36")
        .build()?;

    // Fetch the page and read the response body as a string.
    let response = client.get("https://example.com")
        .send()
        .await?;
    let body = response.text().await?;

    // Parse the HTML with Scraper and print the href of every link.
    let document = Html::parse_document(&body);
    let selector = Selector::parse("a").unwrap();

    for element in document.select(&selector) {
        println!("{:?}", element.value().attr("href"));
    }

    Ok(())
}

In this example, we're using the reqwest crate to perform HTTP requests, specifying a user-agent string that represents Google Chrome on Windows 10. The response body is then parsed using Scraper.

Make sure to check the terms of service of the website you are scraping, as using a fake user-agent might be against their rules. Additionally, the legality of web scraping varies by jurisdiction, and it's important to scrape responsibly and ethically.
