What is the syntax for using CSS selectors in Nokogiri?

Nokogiri is a popular Ruby library for parsing HTML and XML and for navigating and manipulating the resulting documents. To query a document with CSS selectors, call the css method. It accepts the same selector syntax you would write in a stylesheet or pass to document.querySelectorAll in a browser.

Here's the basic syntax for using CSS selectors in Nokogiri:

require 'nokogiri'
require 'open-uri'

# Load the HTML document (with open-uri, call URI.open; Kernel#open no longer accepts URLs as of Ruby 3.0)
html = URI.open('http://example.com/')
doc = Nokogiri::HTML(html)

# Select elements using CSS selectors
elements = doc.css('selector')

# Example: Select all paragraph elements
paragraphs = doc.css('p')

# Example: Select elements with class 'example'
class_elements = doc.css('.example')

# Example: Select elements with id 'header'
id_elements = doc.css('#header')

# Example: Select all links within a list item
links_in_list = doc.css('li a')

The css method can also be called on any Nokogiri element or NodeSet, not just the document, to select only the descendants of that element.

# Select a specific div by id (at_css returns the first match, or nil if nothing matches)
specific_div = doc.at_css('#specific_div')

# Within that div, select elements with the class 'nested'
nested_elements = specific_div.css('.nested')

Nokogiri also supports structural CSS pseudo-classes, which select elements by their position in the document. Because Nokogiri works on a static parsed document, state-based pseudo-classes such as :hover have nothing to match against.

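# Select every paragraph that is the first <p> among its siblings
# (to get only the first paragraph in the whole document, use doc.at_css('p'))
first_paragraph = doc.css('p:first-of-type')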

# Select all even rows in a table
even_rows = doc.css('tr:nth-child(even)')

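Remember that CSS selectors in Nokogiri are case-sensitive. Class names and IDs must match the case used in the HTML document. Element names are lowercased by the HTML parser, so write them in lowercase in your selectors regardless of how they appear in the source markup.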

Keep in mind that Nokogiri's css method returns a NodeSet, an array-like collection containing every element that matches the selector. You can iterate over this NodeSet or access individual elements by index. If nothing matches, the NodeSet is empty rather than nil.
