How do you use Cheerio with TypeScript?

Cheerio is a powerful server-side HTML parsing library that brings jQuery-like functionality to Node.js environments. When combined with TypeScript, it provides excellent type safety and developer experience for web scraping and HTML manipulation tasks. This guide will show you how to properly set up and use Cheerio with TypeScript.

Installing Cheerio with TypeScript Support

First, install Cheerio and TypeScript:

npm install cheerio
npm install --save-dev typescript

Current versions of Cheerio (1.x) ship their own type definitions, so no separate @types package is needed. The @types/cheerio package only covers the legacy 0.22.x API; add it only if your project is pinned to that older release:

npm install --save-dev @types/cheerio

Basic TypeScript Configuration

Create or update your tsconfig.json file:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "lib": ["ES2020"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules", "dist"]
}

Basic Cheerio Usage with TypeScript

Here's how to get started with Cheerio in TypeScript:

import * as cheerio from 'cheerio';
import axios from 'axios';

interface ScrapedData {
  title: string;
  links: string[];
  paragraphs: string[];
}

async function scrapeWebsite(url: string): Promise<ScrapedData> {
  try {
    // Fetch the HTML content
    const response = await axios.get(url);
    const html: string = response.data;

    // Load HTML into Cheerio
    const $: cheerio.CheerioAPI = cheerio.load(html);

    // Extract data with type safety
    const title: string = $('title').text().trim();

    const links: string[] = [];
    $('a[href]').each((index: number, element: cheerio.Element) => {
      const href = $(element).attr('href');
      if (href) {
        links.push(href);
      }
    });

    const paragraphs: string[] = [];
    $('p').each((index: number, element: cheerio.Element) => {
      paragraphs.push($(element).text().trim());
    });

    return { title, links, paragraphs };
  } catch (error) {
    throw new Error(`Failed to scrape website: ${error}`);
  }
}

// Usage example
scrapeWebsite('https://example.com')
  .then((data: ScrapedData) => {
    console.log('Title:', data.title);
    console.log('Links found:', data.links.length);
    console.log('Paragraphs:', data.paragraphs.length);
  })
  .catch((error: Error) => {
    console.error('Scraping failed:', error.message);
  });

Advanced TypeScript Features with Cheerio

Custom Type Definitions

Create custom interfaces for structured data extraction:

interface Product {
  name: string;
  price: number;
  description: string;
  imageUrl?: string;
  inStock: boolean;
}

interface ProductPage {
  products: Product[];
  totalCount: number;
  currentPage: number;
}

class ProductScraper {
  private $: cheerio.CheerioAPI;

  constructor(html: string) {
    this.$ = cheerio.load(html);
  }

  public extractProducts(): Product[] {
    const products: Product[] = [];

    this.$('.product-item').each((index: number, element: cheerio.Element) => {
      const $product = this.$(element);

      const name: string = $product.find('.product-name').text().trim();
      const priceText: string = $product.find('.price').text().trim();
      const price: number = parseFloat(priceText.replace(/[^0-9.]/g, '')) || 0;
      const description: string = $product.find('.description').text().trim();
      const imageUrl: string | undefined = $product.find('img').attr('src');
      const inStock: boolean = !$product.hasClass('out-of-stock');

      if (name && price > 0) {
        products.push({
          name,
          price,
          description,
          imageUrl,
          inStock
        });
      }
    });

    return products;
  }

  public getPageInfo(): { currentPage: number; totalPages: number } {
    const currentPage: number = parseInt(this.$('.pagination .current').text(), 10) || 1;
    const totalPages: number = parseInt(this.$('.pagination .page').last().text(), 10) || 1;

    return { currentPage, totalPages };
  }
}
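
For illustration, the class above could be used like this; the sample markup is made up to match the selectors in extractProducts:

// Hypothetical usage of ProductScraper with inline sample markup
const sampleHtml = `
  <div class="product-item">
    <span class="product-name">Sample Widget</span>
    <span class="price">$19.99</span>
    <p class="description">A small example product</p>
    <img src="/images/widget.jpg" />
  </div>`;

const productScraper = new ProductScraper(sampleHtml);
const sampleProducts: Product[] = productScraper.extractProducts();
console.log(sampleProducts); // [{ name: 'Sample Widget', price: 19.99, ... }]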

Generic Helper Functions

Create reusable, type-safe helper functions:

function extractTextArray($: cheerio.CheerioAPI, selector: string): string[] {
  const results: string[] = [];
  $(selector).each((index: number, element: cheerio.Element) => {
    const text = $(element).text().trim();
    if (text) {
      results.push(text);
    }
  });
  return results;
}

function extractAttributes<T extends string>(
  $: cheerio.CheerioAPI, 
  selector: string, 
  attribute: T
): string[] {
  const results: string[] = [];
  $(selector).each((index: number, element: cheerio.Element) => {
    const attr = $(element).attr(attribute);
    if (attr) {
      results.push(attr);
    }
  });
  return results;
}

// Usage examples
const $ = cheerio.load('<div><p>Text 1</p><p>Text 2</p><a href="/link1">Link</a></div>');

const paragraphTexts: string[] = extractTextArray($, 'p');
const linkHrefs: string[] = extractAttributes($, 'a', 'href');

Error Handling and Type Safety

Implement robust error handling with TypeScript:

type ScrapeResult<T> = {
  success: true;
  data: T;
} | {
  success: false;
  error: string;
};

async function safeScrape<T>(
  url: string, 
  extractor: (html: string) => T
): Promise<ScrapeResult<T>> {
  try {
    const response = await axios.get(url, {
      timeout: 10000,
      headers: {
        'User-Agent': 'Mozilla/5.0 (compatible; TypeScript-Scraper/1.0)'
      }
    });

    // Note: axios rejects non-2xx responses by default, so this branch only
    // runs if validateStatus is customized; it is kept here as a safety net.
    if (response.status !== 200) {
      return {
        success: false,
        error: `HTTP ${response.status}: ${response.statusText}`
      };
    }

    const data = extractor(response.data);
    return { success: true, data };

  } catch (error) {
    return {
      success: false,
      error: error instanceof Error ? error.message : 'Unknown error occurred'
    };
  }
}

// Usage with type safety (run inside an async function; top-level await is not available with CommonJS modules)
const result = await safeScrape('https://example.com', (html: string) => {
  const $ = cheerio.load(html);
  return {
    title: $('title').text(),
    headings: extractTextArray($, 'h1, h2, h3')
  };
});

if (result.success) {
  console.log('Title:', result.data.title);
  console.log('Headings:', result.data.headings);
} else {
  console.error('Scraping failed:', result.error);
}

Working with Forms and Complex Structures

Handle complex HTML structures with type safety:

interface FormField {
  name: string;
  type: string;
  value?: string;
  required: boolean;
  options?: string[];
}

interface FormData {
  action: string;
  method: string;
  fields: FormField[];
}

function extractFormData($: cheerio.CheerioAPI, formSelector: string): FormData | null {
  const $form = $(formSelector).first();

  if ($form.length === 0) {
    return null;
  }

  const action: string = $form.attr('action') || '';
  const method: string = $form.attr('method')?.toUpperCase() || 'GET';
  const fields: FormField[] = [];

  $form.find('input, select, textarea').each((index: number, element: cheerio.Element) => {
    const $field = $(element);
    const name: string = $field.attr('name') || '';
    const type: string = $field.attr('type') || element.tagName.toLowerCase();
    const value: string | undefined = $field.attr('value') || $field.text();
    const required: boolean = $field.attr('required') !== undefined;

    let options: string[] | undefined;
    if (element.tagName.toLowerCase() === 'select') {
      options = [];
      $field.find('option').each((i: number, opt: cheerio.Element) => {
        const optionText = $(opt).text().trim();
        if (optionText) {
          options!.push(optionText);
        }
      });
    }

    if (name) {
      fields.push({ name, type, value, required, options });
    }
  });

  return { action, method, fields };
}
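
As a quick illustration, the extractor above can be pointed at a small form; the markup here is invented for the example:

const formHtml = `
  <form action="/login" method="post">
    <input type="email" name="email" required />
    <input type="password" name="password" required />
    <select name="remember">
      <option>Yes</option>
      <option>No</option>
    </select>
  </form>`;

const $page = cheerio.load(formHtml);
const loginForm = extractFormData($page, 'form');
// loginForm.action === '/login', loginForm.method === 'POST',
// loginForm.fields includes the email, password and remember fields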

Integration with Modern TypeScript Patterns

Use modern TypeScript features for better code organization:

// Using async/await with proper typing
class WebScraper {
  private readonly baseUrl: string;
  private readonly timeout: number;

  constructor(baseUrl: string, timeout: number = 5000) {
    this.baseUrl = baseUrl;
    this.timeout = timeout;
  }

  async scrapeMultiplePages<T>(
    paths: string[], 
    extractor: (html: string, url: string) => T
  ): Promise<T[]> {
    const promises = paths.map(async (path: string): Promise<T> => {
      const url = new URL(path, this.baseUrl).toString();
      const response = await axios.get(url, { timeout: this.timeout });
      return extractor(response.data, url);
    });

    return Promise.all(promises);
  }
}

// Usage example (run inside an async function)
const scraper = new WebScraper('https://example.com');

const results = await scraper.scrapeMultiplePages(
  ['/page1', '/page2', '/page3'],
  (html: string, url: string) => {
    const $ = cheerio.load(html);
    return {
      url,
      title: $('title').text(),
      contentLength: $.text().length
    };
  }
);

Best Practices for TypeScript and Cheerio

1. Type Your Selectors

// Create constants for commonly used selectors
const SELECTORS = {
  TITLE: 'title',
  LINKS: 'a[href]',
  IMAGES: 'img[src]',
  PARAGRAPHS: 'p'
} as const;

// Use them consistently
const title: string = $(SELECTORS.TITLE).text();

2. Validate Data Types

function parseNumber(text: string): number {
  const num = parseFloat(text.replace(/[^0-9.-]/g, ''));
  return isNaN(num) ? 0 : num;
}

function parseBoolean(text: string): boolean {
  return /^(true|yes|1|on)$/i.test(text.trim());
}
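
For example, these helpers can normalize raw scraped text before it lands in a typed interface (the markup and selectors here are hypothetical):

const $doc = cheerio.load('<span class="price">$1,299.50</span><span class="stock">Yes</span>');
const itemPrice: number = parseNumber($doc('.price').text());   // 1299.5
const available: boolean = parseBoolean($doc('.stock').text()); // true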

3. Use Strict Type Checking

Enable strict mode in your TypeScript configuration and handle null/undefined cases:

function safeText($element: cheerio.Cheerio<cheerio.Element>): string {
  const text = $element.text();
  return text ? text.trim() : '';
}

function safeAttr($element: cheerio.Cheerio<cheerio.Element>, attr: string): string | null {
  return $element.attr(attr) || null;
}
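
A couple of hypothetical calls showing how these wrappers keep null and whitespace handling in one place:

const $article = cheerio.load('<article><h1> Hello </h1><a>No href here</a></article>');
const heading: string = safeText($article('h1').first());                 // 'Hello'
const firstHref: string | null = safeAttr($article('a').first(), 'href'); // null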

Testing Cheerio with TypeScript

Set up proper testing with Jest and TypeScript:

// scraper.test.ts
import * as cheerio from 'cheerio';
import { extractProductData } from './scraper';

describe('Product Scraper', () => {
  const mockHtml = `
    <div class="product">
      <h2 class="name">Test Product</h2>
      <span class="price">$29.99</span>
      <p class="description">Great product</p>
    </div>
  `;

  test('should extract product data correctly', () => {
    const $ = cheerio.load(mockHtml);
    const product = extractProductData($, '.product');

    expect(product).toEqual({
      name: 'Test Product',
      price: 29.99,
      description: 'Great product'
    });
  });

  test('should handle missing elements gracefully', () => {
    const $ = cheerio.load('<div></div>');
    const product = extractProductData($, '.product');

    expect(product).toBeNull();
  });
});
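
The test file imports extractProductData from ./scraper, which is not shown above. A minimal sketch of an implementation that would satisfy these tests (one possible shape, not the only one) could look like this:

// scraper.ts (hypothetical module under test)
import * as cheerio from 'cheerio';

export interface ProductData {
  name: string;
  price: number;
  description: string;
}

export function extractProductData(
  $: cheerio.CheerioAPI,
  selector: string
): ProductData | null {
  const $product = $(selector).first();
  if ($product.length === 0) {
    return null;
  }

  const name = $product.find('.name').text().trim();
  const priceText = $product.find('.price').text().trim();
  const price = parseFloat(priceText.replace(/[^0-9.]/g, '')) || 0;
  const description = $product.find('.description').text().trim();

  return { name, price, description };
}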

Conclusion

Using Cheerio with TypeScript provides excellent type safety and developer experience for server-side HTML parsing and web scraping tasks. The combination allows you to catch errors at compile time, get better IntelliSense support, and write more maintainable code.

Key benefits include:

  • Type Safety: Catch errors before runtime
  • Better IDE Support: Auto-completion and refactoring tools
  • Code Documentation: Interfaces serve as documentation
  • Maintainability: Easier to refactor and extend

For more complex scraping scenarios involving JavaScript-heavy websites, consider combining Cheerio with a browser automation framework so you can handle content that only appears after client-side JavaScript runs.

When working with large-scale scraping projects, you might also want to implement proper error handling patterns and monitoring to ensure reliable data extraction.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
