Where Can I Find Crawlee Documentation and Examples?
Crawlee is a powerful web scraping and browser automation library that provides comprehensive documentation and numerous examples for developers. Whether you're building a simple web scraper or a complex data extraction pipeline, understanding where to find quality resources is essential for success.
Official Crawlee Documentation
JavaScript Documentation
The primary Crawlee documentation for JavaScript/TypeScript is hosted at crawlee.dev. This official resource provides:
- API Reference: Complete documentation for all classes, methods, and interfaces
- Guides and Tutorials: Step-by-step instructions for common use cases
- Migration Guides: Help transitioning from other scraping tools
- Best Practices: Performance optimization and production deployment tips
The documentation is organized so you can follow along hands-on; start by installing Crawlee:
# Install Crawlee for JavaScript/Node.js
npm install crawlee
# or
yarn add crawlee
Python Documentation
For Python developers, Crawlee's documentation is available at crawlee.dev/python. The Python version includes:
- Installation Instructions: Setup guides for various operating systems
- Quick Start Tutorial: Get up and running in minutes
- API Documentation: Detailed Python-specific API reference
- Examples Repository: Real-world scraping scenarios
# Install Crawlee for Python
pip install crawlee
# or
poetry add crawlee
Key Documentation Sections
1. Getting Started Guide
The getting started section walks you through your first Crawlee scraper. Here's a basic example from the docs:
JavaScript Example:
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        // Extract data from the page
        const data = await page.evaluate(() => {
            return {
                title: document.querySelector('h1')?.textContent,
                description: document.querySelector('meta[name="description"]')?.content,
            };
        });

        // Save the data
        await Dataset.pushData(data);

        // Find and enqueue links
        await enqueueLinks({
            selector: 'a[href]',
            label: 'detail',
        });
    },
});

await crawler.run(['https://example.com']);
Python Example:
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main():
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}...')

        # Extract data from the page
        data = await context.page.evaluate('''() => {
            return {
                title: document.querySelector('h1')?.textContent,
                description: document.querySelector('meta[name="description"]')?.content
            };
        }''')

        # Save the data
        await context.push_data(data)

        # Find and enqueue links
        await context.enqueue_links(selector='a[href]')

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    import asyncio
    asyncio.run(main())
2. API Reference Documentation
The API reference provides exhaustive documentation for every class and method. Key classes include:
- PlaywrightCrawler: For JavaScript-heavy sites that need a real browser, driven by Playwright
- CheerioCrawler: For static HTML parsing (faster and more lightweight; see the sketch after this list)
- PuppeteerCrawler: Browser automation via Puppeteer, useful for projects already built on it
- HttpCrawler: For API scraping and simple HTTP requests
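For static pages, CheerioCrawler is usually the quickest way to start. Below is a minimal sketch of its request-handler pattern; the selector, record fields, and start URL are placeholder assumptions, not values from the official docs:

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        log.info(`Parsing ${request.url}...`);

        // $ is a Cheerio handle over the downloaded HTML (no browser involved)
        const title = $('h1').first().text();

        await Dataset.pushData({ url: request.url, title });
    },
});

// Placeholder start URL
await crawler.run(['https://example.com']);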
3. Examples and Use Cases
The documentation includes detailed examples for:
- E-commerce scraping: Product listings, prices, reviews
- Job board crawling: Structured job posting data
- News aggregation: Article extraction and monitoring
- Real estate data: Property listings and market data
- Social media monitoring: Public profile information
GitHub Repository and Examples
Official Examples Repository
Crawlee maintains an extensive examples repository on GitHub:
- JavaScript Examples: github.com/apify/crawlee/tree/master/packages/crawlee/examples
- Python Examples: github.com/apify/crawlee-python/tree/master/examples
These repositories contain production-ready code samples including:
// Advanced proxy rotation example
import { PlaywrightCrawler, ProxyConfiguration, Dataset } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1.example.com:8000',
        'http://proxy2.example.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ request, page, log }) {
        // Dismiss any dialogs (alerts, confirms) that would block the page
        page.on('dialog', async (dialog) => {
            await dialog.accept();
        });

        // Wait for dynamic content
        await page.waitForSelector('.product-list', { timeout: 30000 });

        const products = await page.$$eval('.product-item', (items) => {
            return items.map((item) => ({
                name: item.querySelector('.name')?.textContent,
                price: item.querySelector('.price')?.textContent,
                url: item.querySelector('a')?.href,
            }));
        });

        await Dataset.pushData(products);
    },
    maxRequestsPerCrawl: 100,
});

await crawler.run(['https://example-shop.com/products']);
Community Resources
Discord Community
Crawlee has an active Discord community where developers share examples and get help:
- Server: discord.gg/jyEM2PRvMU
- Support channels: Ask questions and get real-time help
- Examples sharing: Community members share their scraping solutions
- Announcements: Stay updated on new features and releases
Stack Overflow
Search for questions tagged with crawlee:
# Search on Stack Overflow
[crawlee] your search query
YouTube Tutorials
The Apify YouTube channel features video tutorials covering:
- Introduction to Crawlee
- Building scrapers for specific websites
- Advanced techniques and optimization
- Handling dynamic content and AJAX requests
Apify Platform Integration
Crawlee is developed by Apify, and the Apify Platform provides additional resources:
- Apify Actors: Pre-built scraping solutions using Crawlee
- Templates: Starter projects for common scraping scenarios
- Apify SDK Documentation: Extended functionality for cloud deployment
- Video Courses: Free courses on web scraping with Crawlee
// Deploy a Crawlee scraper to Apify
import { Actor } from 'apify';
import { PlaywrightCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput();

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, log }) {
        // Your scraping logic
        const data = await page.evaluate(() => ({
            title: document.title,
            url: window.location.href,
        }));
        await Dataset.pushData(data);
    },
});

await crawler.run(input.startUrls);

await Actor.exit();
Advanced Documentation Topics
Request Queue Management
Documentation on managing request queues for large-scale scraping:
import asyncio

from crawlee.storages import RequestQueue

async def main() -> None:
    # Open (or create) the default request queue
    queue = await RequestQueue.open()

    # Add multiple URLs
    await queue.add_request('https://example.com/page1')
    await queue.add_request('https://example.com/page2')

    # Fetch the next request to process
    request = await queue.fetch_next_request()

asyncio.run(main())
Storage and Data Export
Learn about Crawlee's storage system for datasets, key-value stores, and request queues (a short usage sketch follows the list below). The documentation covers:
- Dataset exports: JSON, CSV, Excel formats
- Key-value storage: For configuration and state management
- Request queue persistence: Resumable crawls
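As a rough illustration of how these storages fit together, here is a minimal sketch using Crawlee's JavaScript storage classes; the record key, field names, and export target are example assumptions:

import { Dataset, KeyValueStore } from 'crawlee';

// Datasets collect the scraped records (append-only)
await Dataset.pushData({ title: 'Example product', price: 19.99 });

// Key-value stores hold configuration and crawl state
await KeyValueStore.setValue('CRAWL_STATE', { lastRun: new Date().toISOString() });
const state = await KeyValueStore.getValue('CRAWL_STATE');

// Export the default dataset into the default key-value store as CSV
await Dataset.exportToCSV('OUTPUT');

The storage guide in the official docs covers the remaining export formats and the on-disk layout of these storages.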
Session Management and Cookies
Documentation on handling authentication and maintaining sessions:
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // The crawler creates and rotates sessions from its own pool
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: 20,
        sessionOptions: {
            maxAgeSecs: 3600,
            maxUsageCount: 50,
        },
    },
    async requestHandler({ session, page, log }) {
        // A session (and its cookies) is automatically assigned to each request
        log.info(`Using session: ${session.id}`);
    },
});
TypeScript Support
The JavaScript documentation includes comprehensive TypeScript definitions and examples:
import { PlaywrightCrawler, Dataset } from 'crawlee';

interface ProductData {
    name: string;
    price: number;
    url: string;
}

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, log }) {
        const products: ProductData[] = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('.product')).map((el) => ({
                name: el.querySelector('.name')?.textContent ?? '',
                price: parseFloat(el.querySelector('.price')?.textContent ?? '0'),
                url: el.querySelector('a')?.href ?? '',
            }));
        });

        await Dataset.pushData(products);
    },
});
Keeping Up-to-Date
To stay current with Crawlee documentation updates:
- GitHub Releases: Watch the repository for release notes
- Blog Posts: Visit blog.apify.com for announcements
- Newsletter: Subscribe to Apify's developer newsletter
- Twitter/X: Follow @apify for updates
Troubleshooting and FAQ
The documentation includes a comprehensive troubleshooting section covering:
- Memory management for large crawls
- Debugging tips and logging configuration (a small configuration sketch follows this list)
- Common errors and their solutions
- Performance optimization strategies
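As a taste of the debugging tips, here is a sketch of turning on verbose logging and capping a crawl while you test; the numeric limits are arbitrary example values:

import { log, LogLevel, PlaywrightCrawler } from 'crawlee';

// Verbose logging helps when requests fail or selectors do not match
log.setLevel(LogLevel.DEBUG);

const crawler = new PlaywrightCrawler({
    // Keep test crawls small while debugging
    maxRequestsPerCrawl: 50,
    // Limit parallelism to reduce memory pressure
    maxConcurrency: 5,
    async requestHandler({ request, log }) {
        log.debug(`Visited ${request.url}`);
    },
});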
Conclusion
Crawlee provides extensive, well-maintained documentation across multiple platforms. Whether you prefer JavaScript or Python, the official documentation at crawlee.dev offers comprehensive guides, API references, and practical examples. Combined with the active GitHub repository, Discord community, and Apify Platform resources, developers have all the tools needed to build robust web scraping solutions.
Start with the official documentation's getting started guide, explore the examples repository for your specific use case, and leverage the community resources when you need help. The documentation is regularly updated with new features and best practices, making it an invaluable resource for both beginners and experienced developers.