Can Nokogiri parse and extract information from RSS or Atom feeds?

Yes. Nokogiri is a popular Ruby library for parsing HTML and XML, and because RSS and Atom are both XML-based formats, it is well suited to reading feeds.

Here's an example of how you could use Nokogiri to parse an RSS feed and extract some basic information from it:

require 'nokogiri'
require 'open-uri'

# URL of the RSS or Atom feed
feed_url = 'http://example.com/feed.xml'

# Open and read the feed (use URI.open; on Ruby 3.0+ plain `open` no longer fetches URLs)
xml_content = URI.open(feed_url).read

# Parse the feed content with Nokogiri
doc = Nokogiri::XML(xml_content)

# Extract information from each RSS <item> (Atom feeds use <entry> elements instead)
doc.xpath('//item').each do |item|
  title = item.xpath('title').text
  link = item.xpath('link').text
  description = item.xpath('description').text

  puts "Title: #{title}"
  puts "Link: #{link}"
  puts "Description: #{description}"
  puts "---"
end

In this example, we're using open-uri to fetch the feed and Nokogiri::XML to parse it. We then navigate through the XML structure using XPath queries to extract the title, link, and description of each item in the feed.

Please replace 'http://example.com/feed.xml' with the actual URL of the RSS or Atom feed you want to parse.

Remember to handle exceptions and errors that might occur when fetching or parsing the feed, such as network errors or invalid XML content.

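Also, keep in mind that RSS and Atom differ in structure, so the XPath expressions above only match RSS. RSS 2.0 wraps each entry in an <item> element with <title>, <link>, and <description> children. Atom uses <entry> elements in the http://www.w3.org/2005/Atom namespace, puts the URL in the href attribute of <link>, and uses <summary> or <content> instead of <description>. Adjust the XPath expressions to the format of the feed you're working with.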
