Nokogiri is a powerful Ruby library for parsing HTML and XML documents. To search for elements with a specific CSS class, you can use the .css method, which accepts standard CSS selectors and returns a NodeSet of matching elements.
Basic Setup
First, install and require Nokogiri:
# Install the gem
gem install nokogiri
# Or add to your Gemfile
gem 'nokogiri'
# Then load it in your Ruby script
require 'nokogiri'
Finding Elements by CSS Class
Simple Class Selection
Use the .css method with a class selector (dot notation):
# Parse HTML document
html = '<div class="highlight">Important content</div>'
doc = Nokogiri::HTML(html)
# Find elements with specific class
elements = doc.css('.highlight')
puts elements.first.text # Output: "Important content"
Multiple Classes
Search for elements that have multiple classes:
html = <<-HTML
<div class="card featured">Featured Card</div>
<div class="card">Regular Card</div>
<div class="featured">Featured Item</div>
HTML
doc = Nokogiri::HTML(html)
# Elements with both 'card' and 'featured' classes
elements = doc.css('.card.featured')
puts elements.first.text # Output: "Featured Card"
Class with Element Type
Combine element type with class selection:
html = <<-HTML
<p class="warning">Paragraph warning</p>
<div class="warning">Div warning</div>
HTML
doc = Nokogiri::HTML(html)
# Only paragraph elements with 'warning' class
paragraphs = doc.css('p.warning')
puts paragraphs.first.text # Output: "Paragraph warning"
Advanced CSS Selectors
Descendant Selectors
Find elements with a class inside other elements:
html = <<-HTML
<article class="post">
<h2 class="title">Post Title</h2>
<p class="content">Post content here</p>
</article>
HTML
doc = Nokogiri::HTML(html)
# Find .title elements inside .post elements
titles = doc.css('.post .title')
puts titles.first.text # Output: "Post Title"
Attribute Selectors with Classes
Combine class selectors with other attributes:
html = <<-HTML
<input class="form-control" type="text" name="username">
<input class="form-control" type="password" name="password">
HTML
doc = Nokogiri::HTML(html)
# Find form-control elements with specific type
text_inputs = doc.css('.form-control[type="text"]')
puts text_inputs.first['name'] # Output: "username"
Practical Web Scraping Example
Here's a comprehensive example that demonstrates searching for elements by class in a real-world scenario:
require 'nokogiri'
# Sample HTML structure similar to an e-commerce site
html_content = <<-HTML
<html>
<body>
<div class="product-grid">
<div class="product-card featured">
<h3 class="product-title">Premium Laptop</h3>
<span class="price">$1299.99</span>
<div class="rating stars-5">★★★★★</div>
</div>
<div class="product-card">
<h3 class="product-title">Budget Phone</h3>
<span class="price sale">$199.99</span>
<div class="rating stars-4">★★★★☆</div>
</div>
<div class="product-card out-of-stock">
<h3 class="product-title">Gaming Console</h3>
<span class="price">$499.99</span>
<div class="rating stars-5">★★★★★</div>
</div>
</div>
</body>
</html>
HTML
doc = Nokogiri::HTML(html_content)
# Find all product cards
products = doc.css('.product-card')
puts "Found #{products.length} products"
# Find only featured products
featured_products = doc.css('.product-card.featured')
puts "Featured products: #{featured_products.length}"
# Extract product information
products.each_with_index do |product, index|
title = product.css('.product-title').text.strip
price = product.css('.price').text.strip
rating_class = product.css('.rating').first['class']
puts "Product #{index + 1}:"
puts " Title: #{title}"
puts " Price: #{price}"
puts " Rating: #{rating_class}"
puts " In Stock: #{!product['class'].to_s.split.include?('out-of-stock')}"
puts
end
# Find products on sale
sale_products = doc.css('.price.sale')
puts "Products on sale: #{sale_products.length}"
Working with NodeSet Results
The .css method returns a Nokogiri::XML::NodeSet, which behaves like an array:
html = '<div class="item">Item 1</div><div class="item">Item 2</div>'
doc = Nokogiri::HTML(html)
items = doc.css('.item')
# Check if any elements were found
puts "Found items: #{items.any?}"
# Get count
puts "Number of items: #{items.length}"
# Access first/last elements
puts "First item: #{items.first.text}"
puts "Last item: #{items.last.text}"
# Convert to array if needed
items_array = items.to_a
# Check if element exists before accessing
if item = items.first
puts "Content: #{item.text}"
puts "HTML: #{item.to_html}"
end
Error Handling and Best Practices
require 'nokogiri'
def safe_css_search(doc, selector)
elements = doc.css(selector)
if elements.empty?
puts "No elements found for selector: #{selector}"
return []
end
elements
rescue => e
puts "Error searching for #{selector}: #{e.message}"
[]
end
# Usage
html = '<div class="content">Hello World</div>'
doc = Nokogiri::HTML(html)
# Safe searching
elements = safe_css_search(doc, '.content')
elements.each { |el| puts el.text } if elements.any?
# Handle malformed HTML gracefully
malformed_html = '<div class="test">Unclosed div'
doc = Nokogiri::HTML(malformed_html)
puts doc.css('.test').first.text # Still works: "Unclosed div"
Alternative: XPath Method
While CSS selectors are more intuitive, you can also use XPath for class searching:
html = '<div class="highlight important">Text</div>'
doc = Nokogiri::HTML(html)
# CSS selector (recommended)
css_result = doc.css('.highlight')
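# XPath: exact match on the whole class attribute (breaks if class order or spacing changes)
xpath_result = doc.xpath("//div[@class='highlight important']")
# XPath: substring match (would also match a class such as 'highlighted')
xpath_contains = doc.xpath("//div[contains(@class, 'highlight')]")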
puts css_result.first.text # "Text"
puts xpath_result.first.text # "Text"
puts xpath_contains.first.text # "Text"
The .css method provides a clean, readable way to search HTML documents using familiar CSS selector syntax, making it ideal for web scraping and document parsing tasks in Ruby.