How can I search for elements with a specific CSS class using Nokogiri?

Nokogiri is a powerful Ruby library for parsing HTML and XML documents. To search for elements with a specific CSS class, you can use the .css method, which accepts standard CSS selectors and returns a NodeSet of matching elements.

Basic Setup

First, install and require Nokogiri:

# Install the gem (run this in your shell)
gem install nokogiri

# Or add it to your Gemfile and run `bundle install`
gem 'nokogiri'

# Then load the library in your Ruby code
require 'nokogiri'

Finding Elements by CSS Class

Simple Class Selection

Use the .css method with a class selector (dot notation):

# Parse HTML document
html = '<div class="highlight">Important content</div>'
doc = Nokogiri::HTML(html)

# Find elements with specific class
elements = doc.css('.highlight')
puts elements.first.text  # Output: "Important content"

Multiple Classes

Search for elements that have multiple classes:

html = <<-HTML
<div class="card featured">Featured Card</div>
<div class="card">Regular Card</div>
<div class="featured">Featured Item</div>
HTML

doc = Nokogiri::HTML(html)

# Elements with both 'card' and 'featured' classes
elements = doc.css('.card.featured')
puts elements.first.text  # Output: "Featured Card"

Class with Element Type

Combine element type with class selection:

html = <<-HTML
<p class="warning">Paragraph warning</p>
<div class="warning">Div warning</div>
HTML

doc = Nokogiri::HTML(html)

# Only paragraph elements with 'warning' class
paragraphs = doc.css('p.warning')
puts paragraphs.first.text  # Output: "Paragraph warning"

Advanced CSS Selectors

Descendant Selectors

Find elements with a class inside other elements:

html = <<-HTML
<article class="post">
  <h2 class="title">Post Title</h2>
  <p class="content">Post content here</p>
</article>
HTML

doc = Nokogiri::HTML(html)

# Find .title elements inside .post elements
titles = doc.css('.post .title')
puts titles.first.text  # Output: "Post Title"

Attribute Selectors with Classes

Combine class selectors with other attributes:

html = <<-HTML
<input class="form-control" type="text" name="username">
<input class="form-control" type="password" name="password">
HTML

doc = Nokogiri::HTML(html)

# Find form-control elements with specific type
text_inputs = doc.css('.form-control[type="text"]')
puts text_inputs.first['name']  # Output: "username"

Practical Web Scraping Example

The following example searches by class in inline sample markup modeled on an e-commerce product listing:

require 'nokogiri'

# Sample HTML structure similar to an e-commerce site
html_content = <<-HTML
<html>
<body>
  <div class="product-grid">
    <div class="product-card featured">
      <h3 class="product-title">Premium Laptop</h3>
      <span class="price">$1299.99</span>
      <div class="rating stars-5">★★★★★</div>
    </div>
    <div class="product-card">
      <h3 class="product-title">Budget Phone</h3>
      <span class="price sale">$199.99</span>
      <div class="rating stars-4">★★★★☆</div>
    </div>
    <div class="product-card out-of-stock">
      <h3 class="product-title">Gaming Console</h3>
      <span class="price">$499.99</span>
      <div class="rating stars-5">★★★★★</div>
    </div>
  </div>
</body>
</html>
HTML

doc = Nokogiri::HTML(html_content)

# Find all product cards
products = doc.css('.product-card')
puts "Found #{products.length} products"

# Find only featured products
featured_products = doc.css('.product-card.featured')
puts "Featured products: #{featured_products.length}"

# Extract product information
products.each_with_index do |product, index|
  title = product.css('.product-title').text.strip
  price = product.css('.price').text.strip
  rating_class = product.css('.rating').first['class']

  puts "Product #{index + 1}:"
  puts "  Title: #{title}"
  puts "  Price: #{price}"
  puts "  Rating: #{rating_class}"
  # Split the class attribute so only a whole class name counts as a match
  puts "  In Stock: #{!product['class'].split.include?('out-of-stock')}"
  puts
end

# Find products on sale
sale_products = doc.css('.price.sale')
puts "Products on sale: #{sale_products.length}"

Working with NodeSet Results

The .css method returns a Nokogiri::XML::NodeSet, which includes Enumerable and supports index access, so it behaves much like an array:

html = '<div class="item">Item 1</div><div class="item">Item 2</div>'
doc = Nokogiri::HTML(html)
items = doc.css('.item')

# Check if any elements were found
puts "Found items: #{items.any?}"

# Get count
puts "Number of items: #{items.length}"

# Access first/last elements
puts "First item: #{items.first.text}"
puts "Last item: #{items.last.text}"

# Convert to array if needed
items_array = items.to_a

# Check if element exists before accessing
if (item = items.first)
  puts "Content: #{item.text}"
  puts "HTML: #{item.to_html}"
end

Error Handling and Best Practices

require 'nokogiri'

def safe_css_search(doc, selector)
  elements = doc.css(selector)

  if elements.empty?
    puts "No elements found for selector: #{selector}"
    return []
  end

  elements
rescue => e
  puts "Error searching for #{selector}: #{e.message}"
  []
end

# Usage
html = '<div class="content">Hello World</div>'
doc = Nokogiri::HTML(html)

# Safe searching
elements = safe_css_search(doc, '.content')
elements.each { |el| puts el.text } if elements.any?

# Handle malformed HTML gracefully
malformed_html = '<div class="test">Unclosed div'
doc = Nokogiri::HTML(malformed_html)
puts doc.css('.test').first.text  # Still works: "Unclosed div"

Alternative: XPath Method

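CSS selectors are usually more readable, but XPath works too. Keep in mind that XPath treats class as an ordinary string attribute: @class='...' must match the entire attribute value exactly, and contains() matches any substring, so it would also match a class such as "highlighted":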

html = '<div class="highlight important">Text</div>'
doc = Nokogiri::HTML(html)

# CSS selector (recommended)
css_result = doc.css('.highlight')

# XPath: exact match on the full attribute value
xpath_result = doc.xpath("//div[@class='highlight important']")
# XPath: substring match (would also match class="highlighted")
xpath_contains = doc.xpath("//div[contains(@class, 'highlight')]")

puts css_result.first.text      # "Text"
puts xpath_result.first.text    # "Text"
puts xpath_contains.first.text  # "Text"

The .css method provides a clean, readable way to search HTML documents using familiar CSS selector syntax, making it ideal for web scraping and document parsing tasks in Ruby.
