How do you select elements using Cheerio?

Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server to parse, manipulate, and render HTML. It's the go-to tool for Node.js developers who need to scrape and manipulate HTML documents using familiar jQuery syntax.

Installation

First, install Cheerio via npm or yarn:

npm install cheerio
# or
yarn add cheerio
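
Depending on your module setup, you can then load it with either CommonJS or ES module syntax (the rest of this article uses CommonJS):

// CommonJS (used in the examples below)
const cheerio = require('cheerio');

// ES modules (alternative, if your project uses "type": "module")
// import * as cheerio from 'cheerio';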

Basic Element Selection

Cheerio uses CSS selectors to target elements, just like jQuery. Here's the fundamental syntax:

const cheerio = require('cheerio');

// Sample HTML
const html = `
<html>
<head>
  <title>Web Scraping Tutorial</title>
</head>
<body>
  <h1 id="main-title">Welcome to My Website</h1>
  <div class="content">
    <p class="intro">This is the introduction paragraph.</p>
    <ul class="navigation">
      <li><a href="/home">Home</a></li>
      <li><a href="/about">About</a></li>
      <li><a href="/contact">Contact</a></li>
    </ul>
    <article data-category="tech">
      <h2>Latest Tech News</h2>
      <p>Technology updates here...</p>
    </article>
  </div>
</body>
</html>
`;

// Load HTML into Cheerio
const $ = cheerio.load(html);

Common Selection Methods

1. Basic Selectors

// Select by tag name
const title = $('h1').text();
console.log(title); // "Welcome to My Website"

// Select by ID
const mainTitle = $('#main-title').text();

// Select by class
const intro = $('.intro').text();

// Select by attribute
const techArticle = $('[data-category="tech"]').find('h2').text();
console.log(techArticle); // "Latest Tech News"

2. CSS Combinators

// Direct child selector
const navLinks = $('.navigation > li');

// Descendant selector
const allLinks = $('.content a');

// Adjacent sibling selector
const nextElement = $('h1 + div');

// General sibling selector
const allSiblings = $('h1 ~ div');
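
Each of these selections returns a Cheerio collection. A quick way to check what matched is to loop over it with .each(); here is a minimal sketch using the navLinks selection from above:

// Print the text of each direct <li> child of the navigation list
navLinks.each((index, element) => {
  console.log($(element).text().trim()); // "Home", "About", "Contact"
});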

3. Pseudo-selectors

// First and last elements
const firstLink = $('.navigation li:first-child a').text();
const lastLink = $('.navigation li:last-child a').text();

// Nth child
const secondLink = $('.navigation li:nth-child(2) a').text();

// Contains text
const homeLink = $('a:contains("Home")').attr('href');

DOM Traversal Methods

Cheerio provides powerful methods for navigating the DOM tree:

// Find descendant elements
const links = $('.navigation').find('a');

// Get parent elements
const listParent = $('.navigation li').parent();

// Get children
const navItems = $('.navigation').children('li');

// Get siblings
const allSiblings = $('.intro').siblings();
const nextSibling = $('.intro').next();
const prevSibling = $('.intro').prev();

// Get closest ancestor matching selector
const contentDiv = $('.intro').closest('.content');
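
These traversal methods chain naturally, which is useful when you start from one element and need data from a related one. A small sketch against the sample HTML from earlier:

// Start from the article heading, walk up to the article, and read its category
const category = $('article h2')
  .closest('article')
  .attr('data-category');
console.log(category); // "tech"

// From the "About" link, collect the texts of its sibling list items
const siblingTexts = $('a[href="/about"]')
  .parent()          // the <li> containing the link
  .siblings()        // the other <li> elements
  .map((i, el) => $(el).text().trim())
  .get();
console.log(siblingTexts); // ["Home", "Contact"]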

Practical Web Scraping Examples

Example 1: Extracting All Links

const $ = cheerio.load(html);

const links = [];
$('a').each((index, element) => {
  const link = {
    text: $(element).text().trim(),
    href: $(element).attr('href'),
    title: $(element).attr('title') || null
  };
  links.push(link);
});

console.log(links);
// Output: Array of link objects with text, href, and title

Example 2: Extracting Table Data

const tableHtml = `
<table class="data-table">
  <thead>
    <tr><th>Name</th><th>Age</th><th>City</th></tr>
  </thead>
  <tbody>
    <tr><td>John</td><td>25</td><td>New York</td></tr>
    <tr><td>Jane</td><td>30</td><td>Los Angeles</td></tr>
  </tbody>
</table>
`;

const $ = cheerio.load(tableHtml);

const tableData = [];
$('.data-table tbody tr').each((index, row) => {
  const rowData = {};
  $(row).find('td').each((cellIndex, cell) => {
    const headers = ['name', 'age', 'city'];
    rowData[headers[cellIndex]] = $(cell).text().trim();
  });
  tableData.push(rowData);
});

console.log(tableData);
// Output: [{ name: 'John', age: '25', city: 'New York' }, ...]

Example 3: Complex Selector Combinations

// Select elements with multiple conditions
const specificElements = $('div.content p:not(.intro)');

// Select elements by attribute value
const techArticles = $('article[data-category="tech"]');

// Combine multiple selectors
const importantElements = $('.intro, #main-title, .navigation a');

// Filter results
const externalLinks = $('a').filter((index, element) => {
  const href = $(element).attr('href');
  return href && href.startsWith('http');
});
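
The result of .filter() is still a Cheerio collection; to work with plain values, you can map it into an ordinary array (in the sample HTML above all links are relative, so this particular array would be empty):

// Collect the href values of the filtered links into a plain array
const externalHrefs = externalLinks
  .map((index, element) => $(element).attr('href'))
  .get();
console.log(externalHrefs);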

Error Handling and Best Practices

A selector that matches nothing returns an empty collection rather than an error, so it is worth checking the result's length; the helper below does that and wraps parsing in a try/catch as a defensive measure against unexpected input:

const cheerio = require('cheerio');

function safeSelect(html, selector) {
  try {
    const $ = cheerio.load(html);
    const elements = $(selector);

    if (elements.length === 0) {
      console.warn(`No elements found for selector: ${selector}`);
      return null;
    }

    return elements;
  } catch (error) {
    console.error('Error parsing HTML:', error.message);
    return null;
  }
}

// Usage
const elements = safeSelect(html, '.non-existent-class');

Performance Tips

  1. Use specific selectors: More specific selectors are faster than broad ones
  2. Cache the Cheerio object: Don't reload the same HTML unnecessarily (see the caching sketch after the code below)
  3. Use .get() when needed: Convert Cheerio objects to arrays when working with regular JavaScript methods, for example:

// Efficient way to work with large lists
const $ = cheerio.load(html);
const items = $('.list-item')
  .map((i, el) => $(el).text().trim())
  .get() // Convert to regular array
  .filter(text => text.length > 0);
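
For the caching tip above (point 2), the idea is to call cheerio.load() once per document and reuse both the loaded instance and any selections you query repeatedly. A minimal sketch:

// Load the document once and keep reusing the same instance
const $doc = cheerio.load(html);

// Cache a selection that is queried repeatedly instead of re-selecting it
const nav = $doc('.navigation');
const linkCount = nav.find('a').length;
const firstItem = nav.children('li').first().text();

// Avoid re-parsing the same HTML inside loops:
// const count = cheerio.load(html)('.navigation a').length; // wasteful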

Cheerio's jQuery-like syntax makes it an excellent choice for server-side HTML parsing and web scraping tasks. When combined with HTTP libraries like axios or node-fetch, it provides a powerful foundation for any web scraping project.
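
As a rough end-to-end sketch (assuming axios is installed; the URL here is just a placeholder), fetching a page and parsing it with Cheerio looks like this:

const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeTitle(url) {
  // Download the page HTML over HTTP
  const response = await axios.get(url);

  // Parse the response body and select the <title> element
  const $ = cheerio.load(response.data);
  return $('title').text().trim();
}

scrapeTitle('https://example.com')
  .then(title => console.log(title))
  .catch(error => console.error('Request failed:', error.message));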

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
