What Programming Languages Does Crawlee Support?

Crawlee is a modern web scraping and browser automation library that officially supports JavaScript, TypeScript, and Python. Originally developed as a JavaScript/TypeScript library for Node.js, Crawlee has expanded to include a Python implementation, making it accessible to developers across different language ecosystems.

Crawlee for JavaScript and TypeScript (Node.js)

The original and most mature implementation of Crawlee is designed for Node.js environments, supporting both JavaScript and TypeScript. This version is the most feature-complete and actively maintained by the Apify team.

Installation

Install Crawlee for Node.js using npm, yarn, or pnpm:

# Using npm
npm install crawlee

# Using yarn
yarn add crawlee

# Using pnpm
pnpm add crawlee

TypeScript Example

Crawlee provides excellent TypeScript support with full type definitions out of the box:

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        const data = {
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        };

        await Dataset.pushData(data);
        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);

JavaScript Example

The same functionality works in plain JavaScript. Because the example uses top-level await, run it as an ES module (save the file as .mjs or set "type": "module" in package.json):

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        await Dataset.pushData({
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        });

        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);

Available Crawlers in Node.js

The JavaScript/TypeScript version of Crawlee includes several specialized crawlers:

  • CheerioCrawler: Fast HTTP crawler using Cheerio for HTML parsing
  • PlaywrightCrawler: Full browser automation with Playwright
  • PuppeteerCrawler: Full browser automation with Puppeteer
  • JSDOMCrawler: Lightweight DOM parsing with JSDOM
  • HttpCrawler: Basic HTTP requests without HTML parsing
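For instance, CheerioCrawler exposes the parsed page as a Cheerio handle ($) instead of a live browser page, which makes it much faster for static HTML. A minimal sketch (the URL is a placeholder):

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        // $ is a Cheerio handle over the fetched HTML — no browser involved
        await Dataset.pushData({
            url: request.url,
            title: $('title').text(),
        });

        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);
```

Because no browser is launched, this style of crawler typically handles far more pages per second than PlaywrightCrawler, at the cost of not executing client-side JavaScript.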

Crawlee for Python

Crawlee for Python is a growing implementation that brings the power of Crawlee to Python developers. While newer than its JavaScript counterpart, it supports the core features needed for effective web scraping.

Installation

Install Crawlee for Python using pip:

pip install crawlee

For browser automation features, install the Playwright extra, then download the browser binaries:

pip install 'crawlee[playwright]'
playwright install

Python Example with PlaywrightCrawler

import asyncio
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = await context.page.title()
        await context.push_data({
            'url': context.request.url,
            'title': title,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    asyncio.run(main())

Python Example with BeautifulSoupCrawler

For faster, non-browser scraping in Python, use BeautifulSoupCrawler (install the parser extra with pip install 'crawlee[beautifulsoup]'):

import asyncio
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

async def main() -> None:
    crawler = BeautifulSoupCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = context.soup.find('title')
        await context.push_data({
            'url': context.request.url,
            'title': title.string if title else None,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    asyncio.run(main())

Available Crawlers in Python

The Python version currently includes:

  • BeautifulSoupCrawler: HTTP crawler using BeautifulSoup for parsing
  • PlaywrightCrawler: Browser automation with Playwright
  • HttpCrawler: Basic HTTP requests without parsing

Language Feature Comparison

| Feature | JavaScript/TypeScript | Python |
|---------|----------------------|--------|
| HTTP Crawling | ✅ CheerioCrawler, HttpCrawler | ✅ BeautifulSoupCrawler, HttpCrawler |
| Browser Automation | ✅ PlaywrightCrawler, PuppeteerCrawler | ✅ PlaywrightCrawler |
| Request Queue | ✅ Full support | ✅ Full support |
| Dataset Storage | ✅ Full support | ✅ Full support |
| Key-Value Store | ✅ Full support | ✅ Full support |
| Proxy Management | ✅ Full support | ✅ Full support |
| Session Management | ✅ Full support | ✅ Full support |
| TypeScript Support | ✅ Native | ⚠️ Type hints available |
| Maturity | ✅ Production-ready | ⚠️ Actively developing |

Choosing the Right Language

Choose JavaScript/TypeScript When:

  • You're already working in a Node.js ecosystem
  • You need the most mature and feature-complete implementation
  • You want access to both Puppeteer and Playwright crawlers
  • You prefer TypeScript's strong typing for better IDE support
  • Your team has JavaScript/Node.js expertise

Choose Python When:

  • Your existing codebase is in Python
  • You're working with Python data science tools (pandas, numpy, etc.)
  • You prefer Python's syntax and ecosystem
  • You're comfortable with a slightly newer implementation
  • You only need core scraping features

Cross-Language Considerations

Both implementations share the same core concepts and architecture:

Common Features Across Languages

  1. Request Queue Management: Both versions handle request queuing, deduplication, and retry logic
  2. Data Storage: Dataset and key-value store APIs work similarly
  3. Browser Automation: Both support Playwright for handling browser sessions and complex interactions
  4. Proxy Support: Built-in proxy rotation and management
  5. Rate Limiting: Automatic request throttling and concurrency control
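As a sketch of the shared proxy support, in the Node.js version a ProxyConfiguration can be passed to any crawler; the proxy URLs below are placeholders, not real endpoints:

```typescript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

// Hypothetical proxy endpoints — replace with your own.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy-1.example.com:8000',
        'http://proxy-2.example.com:8000',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration, // requests are rotated across the listed proxies
    async requestHandler({ request, log }) {
        log.info(`Fetched ${request.url}`);
    },
});
```

The Python implementation offers an equivalent ProxyConfiguration concept, which is part of why migrating proxy logic between the two versions is straightforward.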

Migration Considerations

If you're considering migrating between languages:

  • API Similarity: The Python API closely mirrors the JavaScript API, making conceptual migration straightforward
  • Code Patterns: Both use async/await patterns extensively
  • Storage Compatibility: Data formats are compatible across implementations
  • Documentation: Both versions have comprehensive documentation
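The storage compatibility follows from both versions writing, by default, to the same local directory layout (a sketch; exact subfolders may vary by version):

```
storage/
├── datasets/default/         # output of Dataset.pushData / context.push_data
├── key_value_stores/default/ # named records such as inputs and screenshots
└── request_queues/default/   # pending and handled requests
```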

Performance Considerations

JavaScript/TypeScript Performance

Node.js provides excellent single-threaded async performance, making it ideal for I/O-bound web scraping tasks. The event loop efficiently handles thousands of concurrent requests.
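Concurrency in the Node.js version is tuned through crawler options rather than manual thread management; a sketch of the main knobs (values are illustrative):

```typescript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    minConcurrency: 5,         // start with at least 5 parallel requests
    maxConcurrency: 50,        // scale up toward 50 as memory and CPU allow
    maxRequestsPerMinute: 120, // throttle overall request rate
    async requestHandler({ request, log }) {
        log.info(`Fetched ${request.url}`);
    },
});
```

Crawlee's autoscaling pool adjusts the actual concurrency between these bounds based on system load, so the event loop stays saturated without exhausting memory.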

Python Performance

Python's async implementation (asyncio) is also highly efficient for I/O-bound operations. While CPython has a GIL (Global Interpreter Lock), it doesn't significantly impact web scraping performance since most time is spent waiting for network responses.

Getting Started with Either Language

JavaScript/TypeScript Quick Start

# Create a new project
mkdir my-crawler
cd my-crawler
npm init -y

# Install Crawlee
npm install crawlee

# Create crawler file
touch crawler.js

# Run your crawler
node crawler.js
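The crawler.js file can hold any of the examples above; a minimal version might look like this (top-level await means the file must be an ES module, e.g. set "type": "module" in package.json or name it crawler.mjs):

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        // Log each page's <title> as it is fetched
        log.info(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
```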

Python Quick Start

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Crawlee with Playwright support and download browser binaries
pip install 'crawlee[playwright]'
playwright install

# Create crawler file
touch crawler.py

# Run your crawler
python crawler.py

Community and Support

Both language implementations are actively maintained by Apify and have strong community support:

  • Documentation: Comprehensive guides for both languages
  • GitHub: Separate repositories for JavaScript and Python versions
  • Discord: Active community for troubleshooting and discussions
  • Examples: Extensive example collections for both languages

Conclusion

Crawlee supports JavaScript, TypeScript, and Python, making it accessible to a wide range of developers. The JavaScript/TypeScript version is the most mature and feature-rich, while the Python implementation provides excellent coverage of core features and is rapidly evolving. Choose the language that best fits your team's expertise and your project's ecosystem.

Both implementations provide powerful web scraping capabilities, including handling AJAX requests, managing browser automation, and efficiently processing large-scale crawls. Regardless of which language you choose, Crawlee offers a robust, production-ready solution for modern web scraping challenges.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
