What Programming Languages Does Crawlee Support?

Crawlee is a modern web scraping and browser automation library that officially supports JavaScript, TypeScript, and Python. Originally developed as a JavaScript/TypeScript library for Node.js, Crawlee has expanded to include a Python implementation, making it accessible to developers across different language ecosystems.

Crawlee for JavaScript and TypeScript (Node.js)

The original and most mature implementation of Crawlee is designed for Node.js environments, supporting both JavaScript and TypeScript. This version is the most feature-complete and actively maintained by the Apify team.

Installation

Install Crawlee for Node.js using npm, yarn, or pnpm:

# Using npm
npm install crawlee

# Using yarn
yarn add crawlee

# Using pnpm
pnpm add crawlee

TypeScript Example

Crawlee provides excellent TypeScript support with full type definitions out of the box:

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        const data = {
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        };

        await Dataset.pushData(data);
        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);

JavaScript Example

The same functionality works in plain JavaScript. Because the example uses top-level await, run it as an ES module (save the file as .mjs or set "type": "module" in package.json):

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        await Dataset.pushData({
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        });

        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);

Available Crawlers in Node.js

The JavaScript/TypeScript version of Crawlee includes several specialized crawlers:

  • CheerioCrawler: Fast HTTP crawler using Cheerio for HTML parsing
  • PlaywrightCrawler: Full browser automation with Playwright
  • PuppeteerCrawler: Full browser automation with Puppeteer
  • JSDOMCrawler: Lightweight DOM parsing with JSDOM
  • HttpCrawler: Basic HTTP requests without HTML parsing
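For instance, CheerioCrawler exposes the parsed page as a Cheerio handle ($) instead of a live browser page, which makes it much faster for static HTML. A minimal sketch (the URL is a placeholder):

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        // $ is a Cheerio handle over the fetched HTML — no browser involved
        await Dataset.pushData({
            url: request.url,
            title: $('title').text(),
        });

        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);
```

Because no browser is launched, this style of crawler typically handles far more pages per second than PlaywrightCrawler, at the cost of not executing client-side JavaScript.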

Crawlee for Python

Crawlee for Python is a growing implementation that brings the power of Crawlee to Python developers. While newer than its JavaScript counterpart, it supports the core features needed for effective web scraping.

Installation

Install Crawlee for Python using pip:

pip install crawlee

For browser automation features, install the Playwright extra, then download the browser binaries:

pip install 'crawlee[playwright]'
playwright install

Python Example with PlaywrightCrawler

import asyncio
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = await context.page.title()
        await context.push_data({
            'url': context.request.url,
            'title': title,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    asyncio.run(main())

Python Example with BeautifulSoupCrawler

For faster, non-browser scraping in Python, use BeautifulSoupCrawler (install the parser extra with pip install 'crawlee[beautifulsoup]'):

import asyncio
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

async def main() -> None:
    crawler = BeautifulSoupCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = context.soup.find('title')
        await context.push_data({
            'url': context.request.url,
            'title': title.string if title else None,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    asyncio.run(main())

Available Crawlers in Python

The Python version currently includes:

  • BeautifulSoupCrawler: HTTP crawler using BeautifulSoup for parsing
  • PlaywrightCrawler: Browser automation with Playwright
  • HttpCrawler: Basic HTTP requests without parsing

Language Feature Comparison

| Feature | JavaScript/TypeScript | Python |
|---------|----------------------|--------|
| HTTP Crawling | ✅ CheerioCrawler, HttpCrawler | ✅ BeautifulSoupCrawler, HttpCrawler |
| Browser Automation | ✅ PlaywrightCrawler, PuppeteerCrawler | ✅ PlaywrightCrawler |
| Request Queue | ✅ Full support | ✅ Full support |
| Dataset Storage | ✅ Full support | ✅ Full support |
| Key-Value Store | ✅ Full support | ✅ Full support |
| Proxy Management | ✅ Full support | ✅ Full support |
| Session Management | ✅ Full support | ✅ Full support |
| TypeScript Support | ✅ Native | ⚠️ Type hints available |
| Maturity | ✅ Production-ready | ⚠️ Actively developing |

Choosing the Right Language

Choose JavaScript/TypeScript When:

  • You're already working in a Node.js ecosystem
  • You need the most mature and feature-complete implementation
  • You want access to both Puppeteer and Playwright crawlers
  • You prefer TypeScript's strong typing for better IDE support
  • Your team has JavaScript/Node.js expertise

Choose Python When:

  • Your existing codebase is in Python
  • You're working with Python data science tools (pandas, numpy, etc.)
  • You prefer Python's syntax and ecosystem
  • You're comfortable with a slightly newer implementation
  • You only need core scraping features

Cross-Language Considerations

Both implementations share the same core concepts and architecture:

Common Features Across Languages

  1. Request Queue Management: Both versions handle request queuing, deduplication, and retry logic
  2. Data Storage: Dataset and key-value store APIs work similarly
  3. Browser Automation: Both support Playwright for handling browser sessions and complex interactions
  4. Proxy Support: Built-in proxy rotation and management
  5. Rate Limiting: Automatic request throttling and concurrency control
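As a sketch of the shared proxy support, in the Node.js version a ProxyConfiguration can be passed to any crawler; the proxy URLs below are placeholders, not real endpoints:

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Hypothetical proxy endpoints — replace with your own.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration, // requests are rotated across the listed proxies
    async requestHandler({ request, log }) {
        log.info(`Fetched ${request.url}`);
    },
});
```

The Python implementation offers an equivalent ProxyConfiguration concept, which is part of why migrating proxy logic between the two versions is straightforward.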

Migration Considerations

If you're considering migrating between languages:

  • API Similarity: The Python API closely mirrors the JavaScript API, making conceptual migration straightforward
  • Code Patterns: Both use async/await patterns extensively
  • Storage Compatibility: Data formats are compatible across implementations
  • Documentation: Both versions have comprehensive documentation
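The storage compatibility follows from both versions writing, by default, to the same local directory layout (a sketch; exact subfolders may vary by version):

```
storage/
├── datasets/default/         # output of Dataset.pushData / context.push_data
├── key_value_stores/default/ # named records such as inputs and screenshots
└── request_queues/default/   # pending and handled requests
```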

Performance Considerations

JavaScript/TypeScript Performance

Node.js provides excellent single-threaded async performance, making it ideal for I/O-bound web scraping tasks. The event loop efficiently handles thousands of concurrent requests.
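Concurrency in the Node.js version is tuned through crawler options rather than manual thread management; a sketch of the main knobs (values are illustrative):

```typescript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    minConcurrency: 5,         // start with at least 5 parallel requests
    maxConcurrency: 50,        // scale up toward 50 as memory and CPU allow
    maxRequestsPerMinute: 120, // throttle overall request rate
    async requestHandler({ request, log }) {
        log.info(`Fetched ${request.url}`);
    },
});
```

Crawlee's autoscaling pool adjusts the actual concurrency between these bounds based on system load, so the event loop stays saturated without exhausting memory.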

Python Performance

Python's async implementation (asyncio) is also highly efficient for I/O-bound operations. While CPython has a GIL (Global Interpreter Lock), it doesn't significantly impact web scraping performance since most time is spent waiting for network responses.

Getting Started with Either Language

JavaScript/TypeScript Quick Start

# Create a new project
mkdir my-crawler
cd my-crawler
npm init -y

# Install Crawlee
npm install crawlee

# Create crawler file
touch crawler.js

# Run your crawler
node crawler.js
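The crawler.js file can hold any of the examples above; a minimal version might look like this (top-level await means the file must be an ES module, e.g. set "type": "module" in package.json or name it crawler.mjs):

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        // Log each page's <title> as it is fetched
        log.info(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
```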

Python Quick Start

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Crawlee with Playwright support and download browser binaries
pip install 'crawlee[playwright]'
playwright install

# Create crawler file
touch crawler.py

# Run your crawler
python crawler.py

Community and Support

Both language implementations are actively maintained by Apify and have strong community support:

  • Documentation: Comprehensive guides for both languages
  • GitHub: Separate repositories for JavaScript and Python versions
  • Discord: Active community for troubleshooting and discussions
  • Examples: Extensive example collections for both languages

Conclusion

Crawlee supports JavaScript, TypeScript, and Python, making it accessible to a wide range of developers. The JavaScript/TypeScript version is the most mature and feature-rich, while the Python implementation provides excellent coverage of core features and is rapidly evolving. Choose the language that best fits your team's expertise and your project's ecosystem.

Both implementations provide powerful web scraping capabilities, including handling AJAX requests, managing browser automation, and efficiently processing large-scale crawls. Regardless of which language you choose, Crawlee offers a robust, production-ready solution for modern web scraping challenges.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
