Web scrapers often use custom or predefined user-agent strings to identify themselves when making HTTP requests to web servers. A user-agent string describes the client initiating the request: typically the browser type, operating system, and device. Using the user-agent string of a popular browser can help a scraper blend in with regular traffic, but this should be done responsibly and in accordance with the website's terms of service and robots.txt file.
Here are some common user-agent strings that Rust web scrapers may use:
Google Chrome on Windows 10:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36
Mozilla Firefox on Windows 10:
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0
Safari on macOS:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15
Google Chrome on Android:
Mozilla/5.0 (Linux; Android 11; Pixel 4 XL) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.58 Mobile Safari/537.36
Safari on iOS (iPhone):
Mozilla/5.0 (iPhone; CPU iPhone OS 15_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1
When developing a Rust web scraper, you can set the user-agent string in your HTTP client. For example, if you are using the popular reqwest crate, you could set the user-agent like this:
use reqwest::header::{HeaderMap, HeaderValue, USER_AGENT};

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Build a header map containing the desired user-agent string.
    let mut headers = HeaderMap::new();
    headers.insert(
        USER_AGENT,
        HeaderValue::from_static("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"),
    );

    // Apply the headers to every request made by this client.
    // (reqwest's ClientBuilder also offers a user_agent() convenience method.)
    let client = reqwest::Client::builder()
        .default_headers(headers)
        .build()?;

    let res = client.get("https://www.example.com").send().await?;
    println!("Status: {}", res.status());
    Ok(())
}
Remember to use the user-agent string responsibly and ensure that your scraping activities are in compliance with the target website's scraping policy. Changing the user-agent regularly can help avoid being blocked by the server, but this should be done judiciously and ethically.
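One simple way to vary the user-agent is round-robin rotation over a fixed pool of strings. Below is a minimal sketch using only the Rust standard library; the `UserAgentPool` type and the pool contents are illustrative, not part of any crate. The string returned by `next_agent` could then be passed to the HTTP client for each request.

```rust
// Round-robin rotation over a fixed pool of user-agent strings.
// The pool below reuses a few of the strings listed earlier.
const USER_AGENTS: &[&str] = &[
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
];

/// Hands out user-agent strings in round-robin order.
struct UserAgentPool {
    next: usize,
}

impl UserAgentPool {
    fn new() -> Self {
        Self { next: 0 }
    }

    /// Returns the next user-agent, wrapping around at the end of the pool.
    fn next_agent(&mut self) -> &'static str {
        let ua = USER_AGENTS[self.next];
        self.next = (self.next + 1) % USER_AGENTS.len();
        ua
    }
}

fn main() {
    let mut pool = UserAgentPool::new();
    // Each request would use the next string in the rotation.
    for _ in 0..4 {
        println!("{}", pool.next_agent());
    }
}
```

A random selection (e.g. via the rand crate) works just as well; the round-robin version is shown here because it needs no external dependencies.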
For a more extensive list of user-agent strings, you can visit websites like https://developers.whatismybrowser.com/useragents/explore/, which provide a database of user-agent strings used by various devices and browsers.