How do I select elements by class or ID using Nokogiri in Ruby?

Nokogiri is a popular Ruby gem for parsing HTML and XML. To select elements by class or ID, you can use either CSS selectors (via the css method) or XPath expressions (via the xpath method).

Here's how to do it:

Selecting Elements by Class

To select elements by class, pass the CSS selector .class-name to the css method, where class-name is the class you want to match. The method returns a NodeSet containing every matching element.

require 'nokogiri'

# Example HTML content
html_content = <<-HTML
<div class="user-profile">
  <p>Username: JohnDoe</p>
</div>
<div class="user-profile">
  <p>Username: JaneDoe</p>
</div>
HTML

# Parse the HTML content
doc = Nokogiri::HTML(html_content)

# Select elements with the class "user-profile"
user_profiles = doc.css('.user-profile')

# Iterate through the elements and print the content
user_profiles.each do |profile|
  puts profile.text.strip
end

Selecting Elements by ID

To select an element by its ID, use the CSS selector #id, where id is the ID of the element. Note that css always returns a NodeSet, even when only one element matches.

# Example HTML content with an ID
html_content = <<-HTML
<div id="header">
  <h1>Welcome to My Website</h1>
</div>
HTML

# Parse the HTML content
doc = Nokogiri::HTML(html_content)

# Select the element with the ID "header"
header = doc.css('#header')

# Print the content of the header
puts header.text.strip

Using XPath

Nokogiri also supports XPath expressions for selecting nodes. Here's how you can select elements by class or ID using XPath:

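# Parse a document that contains both the class and the ID examples above
html_content = <<-HTML
<div id="header">
  <h1>Welcome to My Website</h1>
</div>
<div class="user-profile"><p>Username: JohnDoe</p></div>
<div class="user-profile"><p>Username: JaneDoe</p></div>
HTML
doc = Nokogiri::HTML(html_content)

# Select elements by class using XPath
user_profiles = doc.xpath('//div[@class="user-profile"]')

# Iterate through the elements and print the content
user_profiles.each do |profile|
  puts profile.text.strip
end

# Select element by ID using XPath
header = doc.xpath('//div[@id="header"]')

# Print the content of the header
puts header.text.strip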

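When using XPath, you specify the attribute to match in square brackets: [@class="class-name"] for classes and [@id="id"] for IDs. Be aware that [@class="class-name"] compares the entire class attribute, so it does not match an element that carries additional classes (for example class="class-name active"), whereas the CSS selector .class-name does.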

Conclusion

With Nokogiri, you have the flexibility to use either CSS selectors or XPath expressions to select elements by class or ID. The choice between CSS and XPath may come down to personal preference or specific use cases where one might be more suitable than the other. Both methods are powerful and can be used to navigate and manipulate parsed HTML/XML content effectively.
