HTTParty is a Ruby gem that makes it easy to send HTTP requests. When web scraping, setting a user-agent can be critical: the User-Agent header identifies the client software originating the request, and some websites block requests that do not carry a legitimate user-agent string.
Here is how you can set a user-agent in an HTTParty request:
First, ensure you have the HTTParty gem installed. If not, you can install it with the following command:
gem install httparty
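If your project uses Bundler instead, you can add the gem to your Gemfile and run bundle install:

gem 'httparty'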
Once installed, you can use HTTParty in your Ruby script like so:
require 'httparty'

url = 'https://example.com'
# A user-agent string copied from a desktop Chrome browser
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"

# Pass the User-Agent header through the :headers option
response = HTTParty.get(url, headers: { "User-Agent" => user_agent })
puts response.body
In the code above, we set the User-Agent header by passing a hash to the headers option of the HTTParty.get method. This hash can include any number of headers, but here we specifically set User-Agent to a string that identifies a popular web browser.
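Because the hash accepts arbitrary headers, you can send several at once. Here is a minimal sketch; the Accept and Accept-Language values are illustrative and should be tuned to your target site:

require 'httparty'

# Combine the user-agent with other standard HTTP headers in one hash
headers = {
  "User-Agent"      => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
  "Accept"          => "text/html,application/xhtml+xml", # illustrative value
  "Accept-Language" => "en-US,en;q=0.9"                   # illustrative value
}

response = HTTParty.get('https://example.com', headers: headers)
puts response.code # HTTP status code, e.g. 200

If every request in a script should carry the same user-agent, HTTParty also lets you declare default headers once at the class level. A short sketch using its base_uri and headers class methods:

require 'httparty'

class Scraper
  include HTTParty
  base_uri 'https://example.com'
  # These headers are sent with every request made through this class
  headers "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
end

response = Scraper.get('/')
puts response.body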
It's important to use a realistic user-agent string to mimic a real web browser, as some sites may block known bots or scripts. You can find a list of user-agent strings from different browsers and devices on various websites, or you can use the user-agent string from your own web browser by inspecting the network requests it makes.
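If you collect a few such strings, you can also vary the user-agent from request to request instead of reusing a single one. A minimal sketch using Ruby's built-in Array#sample; both strings below are illustrative examples and should be replaced with strings copied from real browsers:

require 'httparty'

# A small pool of example user-agent strings (illustrative; substitute your own)
USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15"
]

# Pick a random user-agent for each request
response = HTTParty.get('https://example.com',
                        headers: { "User-Agent" => USER_AGENTS.sample })
puts response.code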
Always ensure that your web scraping activities are in compliance with the website's terms of service and the relevant laws and regulations, as setting a user-agent to mimic a browser could be seen as an attempt to circumvent access controls.