How can I search for elements with a specific CSS class using Nokogiri?

Nokogiri is a powerful Ruby library for parsing HTML and XML documents. To search for elements with a specific CSS class, you can use the .css method, which accepts standard CSS selectors and returns a NodeSet of matching elements.

Basic Setup

First, install and require Nokogiri:

# Install the gem (run in your shell)
gem install nokogiri

# Or add it to your Gemfile and run `bundle install`
gem 'nokogiri'

# Then load it in your Ruby code
require 'nokogiri'

Finding Elements by CSS Class

Simple Class Selection

Use the .css method with a class selector (dot notation):

# Parse HTML document
html = '<div class="highlight">Important content</div>'
doc = Nokogiri::HTML(html)

# Find elements with specific class
elements = doc.css('.highlight')
puts elements.first.text  # Output: "Important content"

Multiple Classes

Search for elements that have multiple classes:

html = <<-HTML
<div class="card featured">Featured Card</div>
<div class="card">Regular Card</div>
<div class="featured">Featured Item</div>
HTML

doc = Nokogiri::HTML(html)

# Elements with both 'card' and 'featured' classes
elements = doc.css('.card.featured')
puts elements.first.text  # Output: "Featured Card"

Class with Element Type

Combine element type with class selection:

html = <<-HTML
<p class="warning">Paragraph warning</p>
<div class="warning">Div warning</div>
HTML

doc = Nokogiri::HTML(html)

# Only paragraph elements with 'warning' class
paragraphs = doc.css('p.warning')
puts paragraphs.first.text  # Output: "Paragraph warning"

Advanced CSS Selectors

Descendant Selectors

Find elements with a class inside other elements:

html = <<-HTML
<article class="post">
  <h2 class="title">Post Title</h2>
  <p class="content">Post content here</p>
</article>
HTML

doc = Nokogiri::HTML(html)

# Find .title elements inside .post elements
titles = doc.css('.post .title')
puts titles.first.text  # Output: "Post Title"

Attribute Selectors with Classes

Combine class selectors with other attributes:

html = <<-HTML
<input class="form-control" type="text" name="username">
<input class="form-control" type="password" name="password">
HTML

doc = Nokogiri::HTML(html)

# Find form-control elements with specific type
text_inputs = doc.css('.form-control[type="text"]')
puts text_inputs.first['name']  # Output: "username"

Practical Web Scraping Example

The following example applies these selectors to markup that resembles an e-commerce product listing:

require 'nokogiri'

# Sample HTML structure similar to an e-commerce site
html_content = <<-HTML
<html>
<body>
  <div class="product-grid">
    <div class="product-card featured">
      <h3 class="product-title">Premium Laptop</h3>
      <span class="price">$1299.99</span>
      <div class="rating stars-5">★★★★★</div>
    </div>
    <div class="product-card">
      <h3 class="product-title">Budget Phone</h3>
      <span class="price sale">$199.99</span>
      <div class="rating stars-4">★★★★☆</div>
    </div>
    <div class="product-card out-of-stock">
      <h3 class="product-title">Gaming Console</h3>
      <span class="price">$499.99</span>
      <div class="rating stars-5">★★★★★</div>
    </div>
  </div>
</body>
</html>
HTML

doc = Nokogiri::HTML(html_content)

# Find all product cards
products = doc.css('.product-card')
puts "Found #{products.length} products"

# Find only featured products
featured_products = doc.css('.product-card.featured')
puts "Featured products: #{featured_products.length}"

# Extract product information
products.each_with_index do |product, index|
  title = product.css('.product-title').text.strip
  price = product.css('.price').text.strip
  rating_class = product.css('.rating').first['class']

  puts "Product #{index + 1}:"
  puts "  Title: #{title}"
  puts "  Price: #{price}"
  puts "  Rating: #{rating_class}"
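  # Compare whole class names; a substring check on the attribute could match unrelated classes
  in_stock = !product['class'].split.include?('out-of-stock')
  puts "  In Stock: #{in_stock}"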
  puts
end

# Find products on sale
sale_products = doc.css('.price.sale')
puts "Products on sale: #{sale_products.length}"

Working with NodeSet Results

The .css method returns a Nokogiri::XML::NodeSet, which is Enumerable and supports array-style access:

html = '<div class="item">Item 1</div><div class="item">Item 2</div>'
doc = Nokogiri::HTML(html)
items = doc.css('.item')

# Check if any elements were found
puts "Found items: #{items.any?}"

# Get count
puts "Number of items: #{items.length}"

# Access first/last elements
puts "First item: #{items.first.text}"
puts "Last item: #{items.last.text}"

# Convert to array if needed
items_array = items.to_a

# Check if element exists before accessing
if item = items.first
  puts "Content: #{item.text}"
  puts "HTML: #{item.to_html}"
end

Error Handling and Best Practices

require 'nokogiri'

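def safe_css_search(doc, selector)
  elements = doc.css(selector)
  puts "No elements found for selector: #{selector}" if elements.empty?

  # Return the NodeSet even when it is empty, so callers always get one type
  elements
rescue Nokogiri::SyntaxError => e
  # Raised when the selector itself cannot be parsed
  puts "Error searching for #{selector}: #{e.message}"
  Nokogiri::XML::NodeSet.new(doc.document)
end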

# Usage
html = '<div class="content">Hello World</div>'
doc = Nokogiri::HTML(html)

# Safe searching
elements = safe_css_search(doc, '.content')
elements.each { |el| puts el.text } if elements.any?

# Handle malformed HTML gracefully
malformed_html = '<div class="test">Unclosed div'
doc = Nokogiri::HTML(malformed_html)
puts doc.css('.test').first.text  # Still works: "Unclosed div"

Alternative: XPath Method

While CSS selectors are more intuitive, you can also use XPath for class searching:

html = '<div class="highlight important">Text</div>'
doc = Nokogiri::HTML(html)

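# CSS selector (recommended): matches the class name as a whole token
css_result = doc.css('.highlight')

# XPath exact match: works only when the attribute is exactly this string
xpath_result = doc.xpath("//div[@class='highlight important']")
# XPath substring match: would also match class names such as "highlighted"
xpath_contains = doc.xpath("//div[contains(@class, 'highlight')]")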

puts css_result.first.text      # "Text"
puts xpath_result.first.text    # "Text"
puts xpath_contains.first.text  # "Text"

The .css method provides a clean, readable way to search HTML documents using familiar CSS selector syntax, making it ideal for web scraping and document parsing tasks in Ruby.
