What Programming Languages Does Crawlee Support?
Crawlee is a modern web scraping and browser automation library that officially supports JavaScript, TypeScript, and Python. Originally developed as a JavaScript/TypeScript library for Node.js, Crawlee has expanded to include a Python implementation, making it accessible to developers across different language ecosystems.
Crawlee for JavaScript and TypeScript (Node.js)
The original and most mature implementation of Crawlee is designed for Node.js environments, supporting both JavaScript and TypeScript. This version is the most feature-complete and actively maintained by the Apify team.
Installation
Install Crawlee for Node.js using npm, yarn, or pnpm:
```bash
# Using npm
npm install crawlee

# Using yarn
yarn add crawlee

# Using pnpm
pnpm add crawlee
```
TypeScript Example
Crawlee provides excellent TypeScript support with full type definitions out of the box:
```typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        const data = {
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        };

        await Dataset.pushData(data);
        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);
```
JavaScript Example
The same functionality works seamlessly in plain JavaScript:
```javascript
const { PlaywrightCrawler, Dataset } = require('crawlee');

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        const title = await page.title();
        await Dataset.pushData({
            url: request.url,
            title,
            timestamp: new Date().toISOString(),
        });

        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

// CommonJS modules have no top-level await, so start the crawl
// from an async wrapper.
(async () => {
    await crawler.run(['https://example.com']);
})();
```
Available Crawlers in Node.js
The JavaScript/TypeScript version of Crawlee includes several specialized crawlers:
- CheerioCrawler: Fast HTTP crawler using Cheerio for HTML parsing
- PlaywrightCrawler: Full browser automation with Playwright
- PuppeteerCrawler: Full browser automation with Puppeteer
- JSDOMCrawler: Lightweight DOM parsing with JSDOM
- HttpCrawler: Basic HTTP requests without HTML parsing
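For example, a minimal CheerioCrawler follows the same shape as the Playwright examples above, but the handler receives the parsed HTML as a Cheerio `$` function instead of a browser page (a sketch based on Crawlee's documented API):

```typescript
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    // No browser is launched: the page is fetched over plain HTTP and
    // parsed with Cheerio, so `$` works like server-side jQuery.
    async requestHandler({ request, $, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);
        await Dataset.pushData({
            url: request.url,
            title: $('title').text(),
        });
        await enqueueLinks();
    },
    maxRequestsPerCrawl: 50,
});

await crawler.run(['https://example.com']);
```

Because it skips browser startup and rendering, this pattern is typically the fastest choice for static pages.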
Crawlee for Python
Crawlee for Python is a growing implementation that brings the power of Crawlee to Python developers. While newer than its JavaScript counterpart, it supports the core features needed for effective web scraping.
Installation
Install Crawlee for Python using pip:
```bash
pip install crawlee
```
For browser automation features, install with Playwright:
```bash
pip install 'crawlee[playwright]'
```
Python Example with PlaywrightCrawler
```python
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = await context.page.title()
        await context.push_data({
            'url': context.request.url,
            'title': title,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())
```
Python Example with BeautifulSoupCrawler
For faster, non-browser scraping in Python, use BeautifulSoupCrawler:
```python
import asyncio

from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main() -> None:
    crawler = BeautifulSoupCrawler(
        max_requests_per_crawl=50,
    )

    @crawler.router.default_handler
    async def request_handler(context: BeautifulSoupCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url} ...')

        title = context.soup.find('title')
        await context.push_data({
            'url': context.request.url,
            'title': title.string if title else None,
        })

        await context.enqueue_links()

    await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())
```
Available Crawlers in Python
The Python version currently includes:
- BeautifulSoupCrawler: HTTP crawler using BeautifulSoup for parsing
- PlaywrightCrawler: Browser automation with Playwright
- HttpCrawler: Basic HTTP requests without parsing
Language Feature Comparison
| Feature | JavaScript/TypeScript | Python |
|---------|----------------------|--------|
| HTTP Crawling | ✅ CheerioCrawler, HttpCrawler | ✅ BeautifulSoupCrawler, HttpCrawler |
| Browser Automation | ✅ PlaywrightCrawler, PuppeteerCrawler | ✅ PlaywrightCrawler |
| Request Queue | ✅ Full support | ✅ Full support |
| Dataset Storage | ✅ Full support | ✅ Full support |
| Key-Value Store | ✅ Full support | ✅ Full support |
| Proxy Management | ✅ Full support | ✅ Full support |
| Session Management | ✅ Full support | ✅ Full support |
| TypeScript Support | ✅ Native | ⚠️ Type hints available |
| Maturity | ✅ Production-ready | ⚠️ Actively developing |
Choosing the Right Language
Choose JavaScript/TypeScript When:
- You're already working in a Node.js ecosystem
- You need the most mature and feature-complete implementation
- You want access to both Puppeteer and Playwright crawlers
- You prefer TypeScript's strong typing for better IDE support
- Your team has JavaScript/Node.js expertise
Choose Python When:
- Your existing codebase is in Python
- You're working with Python data science tools (pandas, numpy, etc.)
- You prefer Python's syntax and ecosystem
- You're comfortable with a slightly newer implementation
- You only need core scraping features
Cross-Language Considerations
Both implementations share the same core concepts and architecture:
Common Features Across Languages
- Request Queue Management: Both versions handle request queuing, deduplication, and retry logic
- Data Storage: Dataset and key-value store APIs work similarly
- Browser Automation: Both support Playwright for handling browser sessions and complex interactions
- Proxy Support: Built-in proxy rotation and management
- Rate Limiting: Automatic request throttling and concurrency control
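Crawlee manages throttling for you, but the idea behind its concurrency control is a simple promise pool: never let more than a fixed number of handlers run at once. A hand-rolled sketch of that pattern in plain TypeScript (a conceptual illustration, not Crawlee's actual implementation):

```typescript
// Run `worker` over `items` with at most `limit` tasks in flight at once,
// mirroring what a maxConcurrency setting does for request handlers.
async function mapWithConcurrency<T, R>(
    items: T[],
    limit: number,
    worker: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = new Array(items.length);
    let next = 0;

    // Each "lane" repeatedly claims the next unprocessed index.
    // There is no await between the bounds check and `next++`,
    // so two lanes can never claim the same item.
    async function lane(): Promise<void> {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index]);
        }
    }

    await Promise.all(
        Array.from({ length: Math.min(limit, items.length) }, lane),
    );
    return results;
}

// Example: process five URLs with at most two handlers running at a time.
(async () => {
    const urls = ['a', 'b', 'c', 'd', 'e'];
    const processed = await mapWithConcurrency(urls, 2, async (url) => `done:${url}`);
    console.log(processed); // ['done:a', 'done:b', 'done:c', 'done:d', 'done:e']
})();
```

Crawlee's real scheduler goes further, scaling the pool up and down based on system load, but the invariant it maintains is the same one this sketch enforces.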
Migration Considerations
If you're considering migrating between languages:
- API Similarity: The Python API closely mirrors the JavaScript API, making conceptual migration straightforward
- Code Patterns: Both use async/await patterns extensively
- Storage Compatibility: Data formats are compatible across implementations
- Documentation: Both versions have comprehensive documentation
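The storage compatibility noted above is easiest to see in the key-value store: a record written with one implementation uses the same JSON-on-disk format the other reads. On the JavaScript side, a sketch using the documented `KeyValueStore` helpers:

```typescript
import { KeyValueStore } from 'crawlee';

// Save a JSON record under the key 'OUTPUT' in the default key-value store.
await KeyValueStore.setValue('OUTPUT', { pagesCrawled: 50, failed: 0 });

// Read it back later; returns null if the key does not exist.
const summary = await KeyValueStore.getValue('OUTPUT');
```

The Python API exposes the same operations with snake_case naming, which is what keeps conceptual migration between the two implementations straightforward.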
Performance Considerations
JavaScript/TypeScript Performance
Node.js provides excellent single-threaded async performance, making it ideal for I/O-bound web scraping tasks. The event loop efficiently handles thousands of concurrent requests.
Python Performance
Python's async implementation (asyncio) is also highly efficient for I/O-bound operations. While CPython has a GIL (Global Interpreter Lock), it doesn't significantly impact web scraping performance since most time is spent waiting for network responses.
Getting Started with Either Language
JavaScript/TypeScript Quick Start
```bash
# Create a new project
mkdir my-crawler
cd my-crawler
npm init -y

# Install Crawlee
npm install crawlee

# Create crawler file
touch crawler.js

# Run your crawler
node crawler.js
```
Python Quick Start
```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Crawlee
pip install 'crawlee[playwright]'

# Create crawler file
touch crawler.py

# Run your crawler
python crawler.py
```
Community and Support
Both language implementations are actively maintained by Apify and have strong community support:
- Documentation: Comprehensive guides for both languages
- GitHub: Separate repositories for JavaScript and Python versions
- Discord: Active community for troubleshooting and discussions
- Examples: Extensive example collections for both languages
Conclusion
Crawlee supports JavaScript, TypeScript, and Python, making it accessible to a wide range of developers. The JavaScript/TypeScript version is the most mature and feature-rich, while the Python implementation provides excellent coverage of core features and is rapidly evolving. Choose the language that best fits your team's expertise and your project's ecosystem.
Both implementations provide powerful web scraping capabilities, including handling AJAX requests, managing browser automation, and efficiently processing large-scale crawls. Regardless of which language you choose, Crawlee offers a robust, production-ready solution for modern web scraping challenges.