HTTParty is a Ruby library that makes HTTP requests such as GET, POST, PUT, and DELETE much simpler to perform. It is commonly used for consuming RESTful APIs or making general HTTP requests.
However, HTTParty alone does not provide built-in support for parsing HTML or XML with XPath or CSS selectors. To extract data with XPath or CSS selectors, you need to pair HTTParty with an additional library such as Nokogiri, a Ruby gem for parsing HTML and XML.
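For context, here is a minimal sketch of HTTParty on its own, before any HTML parsing is involved; the URLs, query parameters, and request body below are placeholders rather than a real API:
require 'httparty'

# GET request with query parameters (placeholder URL and params)
response = HTTParty.get('https://example.com/api/items', query: { page: 1 })
puts response.code                        # HTTP status code as an integer
puts response.headers['content-type']     # response headers behave like a hash
puts response.body                        # raw response body as a string

# POST request with a form-encoded body (placeholder URL and fields)
created = HTTParty.post('https://example.com/api/items',
                        body: { name: 'example', value: 42 })
puts created.code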
Here's how you can combine HTTParty with Nokogiri to extract data using XPath or CSS selectors:
- Install the required gems:
gem install httparty
gem install nokogiri
- Use HTTParty to fetch the web page you want to scrape, then parse it with Nokogiri:
require 'httparty'
require 'nokogiri'

# Make a GET request to the desired URL
response = HTTParty.get('https://example.com')

# Check if the request was successful
if response.code == 200
  # Parse the response body with Nokogiri
  document = Nokogiri::HTML(response.body)

  # Extract data using CSS selectors
  css_elements = document.css('div.example')
  css_elements.each do |element|
    puts element.text.strip
  end

  # Extract data using XPath
  xpath_elements = document.xpath('//div[@class="example"]')
  xpath_elements.each do |element|
    puts element.text.strip
  end
else
  puts "Failed to retrieve web page: #{response.code}"
end
In this example, HTTParty.get is used to fetch the web page's HTML, and Nokogiri::HTML is used to parse it. After that, you can use either document.css with a CSS selector or document.xpath with an XPath expression to select the elements you're interested in and extract their data.
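Nokogiri elements also expose attributes and support nested lookups, which is often what you need when scraping links or list items. Here is a small sketch along those lines; the selectors and URL are assumptions about the page structure, not something the example page is guaranteed to contain:
require 'httparty'
require 'nokogiri'

response = HTTParty.get('https://example.com')
document = Nokogiri::HTML(response.body)

# Read an attribute from each matched element (selector is an assumption)
document.css('a.example-link').each do |link|
  puts link['href']   # access an attribute by name
end

# Chain selectors: find the first matching container, then search within it
container = document.at_css('div.example')   # at_css returns the first match or nil
unless container.nil?
  container.css('li').each { |item| puts item.text.strip }
end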
Keep in mind that web scraping should always be performed responsibly and in compliance with the terms of service of the website you're scraping. Some websites explicitly prohibit scraping in their terms of service, and others may implement measures to prevent it. Always check the robots.txt file and the terms of service of any website before scraping it.
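As a rough illustration, you can fetch robots.txt with HTTParty itself before scraping. This minimal sketch only prints the file and does a naive check for a Disallow line mentioning a path; the site and path are placeholders, and a real check should also respect user-agent sections:
require 'httparty'

base_url = 'https://example.com'   # placeholder site
robots = HTTParty.get("#{base_url}/robots.txt")

if robots.code == 200
  puts robots.body
  # Naive check: warn if any Disallow line mentions the path we plan to scrape
  target_path = '/example'         # placeholder path
  disallowed = robots.body.each_line.any? do |line|
    line.strip.start_with?('Disallow:') && line.include?(target_path)
  end
  puts "Warning: #{target_path} appears to be disallowed" if disallowed
else
  puts "No robots.txt found (HTTP #{robots.code})"
end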