In Ruby, several libraries are available for making HTTP requests, a common task in web scraping. Below are some of the most popular options:
Net::HTTP: This is a built-in Ruby library that allows you to perform HTTP requests.
```ruby
require 'net/http'
require 'uri'

uri = URI('http://www.example.com/index.html')
response = Net::HTTP.get(uri)
puts response
```
For a more complex example with `Net::HTTP`, where you need to set headers or use other HTTP methods:

```ruby
require 'net/http'
require 'uri'

uri = URI('http://www.example.com/index.html')
http = Net::HTTP.new(uri.host, uri.port)

# Build the request object explicitly so headers can be set.
request = Net::HTTP::Get.new(uri.request_uri)
request['User-Agent'] = 'Ruby'

response = http.request(request)
puts response.body
```
Open-URI: This is a simpler wrapper around `Net::HTTP`, `Net::HTTPS`, and `Net::FTP`. It's part of the standard library and can be used to easily fetch the content of a URL. Note that since Ruby 3.0 you must call `URI.open`; the bare `Kernel#open` no longer accepts URLs:

```ruby
require 'open-uri'

# URI.open is required on Ruby 3.0+.
content = URI.open('http://www.example.com/index.html').read
puts content
```
HTTParty: This is a gem that provides a nice interface to make HTTP requests. It's very popular in the Ruby community for its simplicity.
To use HTTParty, first install the gem:

```
gem install httparty
```
Then, you can use it as follows:
```ruby
require 'httparty'

response = HTTParty.get('http://www.example.com/index.html')
puts response.body
```
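Part of HTTParty's appeal is its response object, which parses JSON and XML bodies automatically based on the Content-Type. A small sketch (the JSON endpoint here is hypothetical):

```ruby
require 'httparty'

# Hypothetical endpoint returning JSON; HTTParty parses the body for you.
response = HTTParty.get('http://www.example.com/api/items.json')

puts response.code             # Integer status code
puts response.parsed_response  # Hash/Array for JSON bodies
```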
Faraday: This is a flexible HTTP client library that provides a uniform API over different adapters. You can switch between `Net::HTTP`, `EM-HTTP-Request`, `Excon`, and many others.

To use Faraday, first install the gem:

```
gem install faraday
```
Example usage:
```ruby
require 'faraday'

conn = Faraday.new(url: 'http://www.example.com')
response = conn.get('/index.html')
puts response.body
```
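To illustrate the adapter switching mentioned above, here is a minimal sketch. It assumes the default `net_http` adapter that ships with Faraday; the commented alternatives require their own adapter gems:

```ruby
require 'faraday'

conn = Faraday.new(url: 'http://www.example.com') do |f|
  # Swap the underlying HTTP implementation without changing the rest of the code.
  f.adapter :net_http # e.g. :excon or :em_http, if those adapter gems are installed
end

response = conn.get('/index.html')
puts response.status
```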
Mechanize: This library is particularly useful for web scraping as it simulates a web browser, handling cookies, sessions, and following redirects.
To use Mechanize, first install the gem:

```
gem install mechanize
```
Example usage:
```ruby
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.com/index.html')
puts page.body
```
Mechanize can also handle forms and links on web pages, making it very powerful for interactive scraping tasks.
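As a rough sketch of form handling (the URL and the field name `q` are hypothetical; adapt them to the page you are scraping):

```ruby
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.com/search.html') # hypothetical page with a search form

# Fill in and submit the first form on the page (assumes a text field named 'q').
form = page.forms.first
form['q'] = 'ruby web scraping'
results = agent.submit(form)

# Links on the resulting page can be inspected and followed like in a browser.
puts results.links.map(&:text)
```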
When using any of these methods, be sure to respect the terms of service of the website you are scraping, throttle your request rate to avoid overloading the server, and handle errors and exceptions that may occur during the request.
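A minimal sketch of error handling and crude rate limiting with `Net::HTTP` (the retry count and sleep interval are arbitrary illustrative choices):

```ruby
require 'net/http'
require 'uri'

uri = URI('http://www.example.com/index.html')
attempts = 0

begin
  attempts += 1
  response = Net::HTTP.get_response(uri)
  # Treat non-2xx responses as failures so they go through the same retry path.
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  puts response.body
rescue StandardError => e
  if attempts < 3
    sleep 2 # crude rate limiting / backoff; tune for the site you are scraping
    retry
  end
  warn "Request failed after #{attempts} attempts: #{e.message}"
end
```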