Where Can I Find Crawlee Documentation and Examples?

Crawlee is a powerful web scraping and browser automation library that provides comprehensive documentation and numerous examples for developers. Whether you're building a simple web scraper or a complex data extraction pipeline, understanding where to find quality resources is essential for success.

Official Crawlee Documentation

JavaScript Documentation

The primary Crawlee documentation for JavaScript/TypeScript is hosted at crawlee.dev. This official resource provides:

  • API Reference: Complete documentation for all classes, methods, and interfaces
  • Guides and Tutorials: Step-by-step instructions for common use cases
  • Migration Guides: Help transitioning from other scraping tools
  • Best Practices: Performance optimization and production deployment tips

The documentation is organized into clear sections, starting with installation:

# Install Crawlee for JavaScript/Node.js
npm install crawlee
# or
yarn add crawlee

Python Documentation

For Python developers, Crawlee's documentation is available at crawlee.dev/python. The Python version includes:

  • Installation Instructions: Setup guides for various operating systems
  • Quick Start Tutorial: Get up and running in minutes
  • API Documentation: Detailed Python-specific API reference
  • Examples Repository: Real-world scraping scenarios

# Install Crawlee for Python
pip install crawlee
# or
poetry add crawlee

Key Documentation Sections

1. Getting Started Guide

The getting started section walks you through your first Crawlee scraper. Here's a basic example from the docs:

JavaScript Example:

import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        log.info(`Processing ${request.url}...`);

        // Extract data from the page
        const data = await page.evaluate(() => {
            return {
                title: document.querySelector('h1')?.textContent,
                description: document.querySelector('meta[name="description"]')?.content
            };
        });

        // Save the data
        await Dataset.pushData(data);

        // Find and enqueue links
        await enqueueLinks({
            selector: 'a[href]',
            label: 'detail',
        });
    },
});

await crawler.run(['https://example.com']);

Python Example:

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main():
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}...')

        # Extract data from the page
        data = await context.page.evaluate('''() => {
            return {
                title: document.querySelector('h1')?.textContent,
                description: document.querySelector('meta[name="description"]')?.content
            };
        }''')

        # Save the data
        await context.push_data(data)

        # Find and enqueue links
        await context.enqueue_links(selector='a[href]')

    await crawler.run(['https://example.com'])

if __name__ == '__main__':
    import asyncio
    asyncio.run(main())

2. API Reference Documentation

The API reference provides exhaustive documentation for every class and method. Key classes include (a minimal CheerioCrawler sketch follows the list):

  • PlaywrightCrawler: For JavaScript-heavy sites that need full browser automation via Playwright
  • CheerioCrawler: For static HTML parsing (faster and lighter than a full browser)
  • PuppeteerCrawler: Browser automation via Puppeteer, useful for existing Puppeteer-based projects
  • HttpCrawler: For API scraping and plain HTTP requests
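
For static pages, CheerioCrawler is often all you need. Here is a minimal sketch in the same style as the Playwright example above (the target URL is a placeholder):

import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, log }) {
        log.info(`Processing ${request.url}...`);

        // Cheerio exposes the parsed HTML as a jQuery-like $ object
        await Dataset.pushData({
            url: request.url,
            title: $('title').text(),
        });
    },
});

await crawler.run(['https://example.com']);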

3. Examples and Use Cases

The documentation includes detailed examples for a range of use cases (a short routing sketch follows the list):

  • E-commerce scraping: Product listings, prices, reviews
  • Job board crawling: Structured job posting data
  • News aggregation: Article extraction and monitoring
  • Real estate data: Property listings and market data
  • Social media monitoring: Public profile information
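
For instance, an e-commerce crawl typically separates listing pages from detail pages using labeled handlers. A minimal sketch along those lines (the URLs and selectors are placeholders, not taken from the docs):

import { PlaywrightCrawler, createPlaywrightRouter, Dataset } from 'crawlee';

const router = createPlaywrightRouter();

// Listing pages: enqueue links to product detail pages
router.addDefaultHandler(async ({ enqueueLinks }) => {
    await enqueueLinks({ selector: 'a.product-link', label: 'detail' });
});

// Detail pages: extract the product data
router.addHandler('detail', async ({ request, page }) => {
    await Dataset.pushData({
        url: request.url,
        title: await page.title(),
    });
});

const crawler = new PlaywrightCrawler({ requestHandler: router });
await crawler.run(['https://example-shop.com/catalog']);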

GitHub Repository and Examples

Official Examples Repository

Crawlee maintains extensive examples in its GitHub repositories: github.com/apify/crawlee for JavaScript/TypeScript and github.com/apify/crawlee-python for Python.

These repositories contain production-ready code samples, including:

// Advanced proxy rotation example
import { PlaywrightCrawler, ProxyConfiguration, Dataset } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://proxy1.example.com:8000',
        'http://proxy2.example.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ request, page, log }) {
        // Automatically accept any dialogs (alerts, confirms) the page opens
        page.on('dialog', async dialog => {
            await dialog.accept();
        });

        // Wait for dynamic content
        await page.waitForSelector('.product-list', { timeout: 30000 });

        const products = await page.$$eval('.product-item', items => {
            return items.map(item => ({
                name: item.querySelector('.name')?.textContent,
                price: item.querySelector('.price')?.textContent,
                url: item.querySelector('a')?.href
            }));
        });

        await Dataset.pushData(products);
    },
    maxRequestsPerCrawl: 100,
});

await crawler.run(['https://example-shop.com/products']);

Community Resources

Discord Community

Crawlee has an active Discord community where developers share examples and get help:

  • Server: discord.gg/jyEM2PRvMU
  • Support channels: Ask questions and get real-time help
  • Examples sharing: Community members share their scraping solutions
  • Announcements: Stay updated on new features and releases

Stack Overflow

Search for questions tagged with crawlee:

# Search on Stack Overflow
[crawlee] your search query

YouTube Tutorials

The Apify YouTube channel features video tutorials on Crawlee and web scraping.

Apify Platform Integration

Crawlee is developed by Apify, and the Apify Platform provides additional resources:

  • Apify Actors: Pre-built scraping solutions using Crawlee
  • Templates: Starter projects for common scraping scenarios
  • Apify SDK Documentation: Extended functionality for cloud deployment
  • Video Courses: Free courses on web scraping with Crawlee

For example, deploying a Crawlee scraper as an Apify Actor looks like this:

// Deploy Crawlee scraper to Apify
import { Actor } from 'apify';
import { PlaywrightCrawler, Dataset } from 'crawlee';

await Actor.init();

const input = await Actor.getInput();
const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, log }) {
        // Your scraping logic
        const data = await page.evaluate(() => ({
            title: document.title,
            url: window.location.href
        }));

        await Dataset.pushData(data);
    },
});

await crawler.run(input.startUrls);
await Actor.exit();

Advanced Documentation Topics

Request Queue Management

Documentation on managing request queues for large-scale scraping:

from crawlee.storages import RequestQueue

# Initialize a request queue
queue = await RequestQueue.open()

# Add multiple URLs
await queue.add_request('https://example.com/page1')
await queue.add_request('https://example.com/page2')

# Fetch next request
request = await queue.fetch_next_request()

Storage and Data Export

Learn about Crawlee's storage system for datasets, key-value stores, and request queues. The documentation covers the following (see the sketch after this list):

  • Dataset exports: JSON, CSV, Excel formats
  • Key-value storage: For configuration and state management
  • Request queue persistence: Resumable crawls
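
A rough sketch of how these pieces fit together, assuming Crawlee v3's Dataset and KeyValueStore helpers (exact export methods may differ between versions):

import { Dataset, KeyValueStore } from 'crawlee';
import { writeFile } from 'node:fs/promises';

// Open the default dataset and read everything collected so far
const dataset = await Dataset.open();
const { items } = await dataset.getData();

// Persist crawl state in the key-value store so an interrupted crawl can resume
await KeyValueStore.setValue('CRAWL_STATE', { itemsCollected: items.length });

// Write the collected items out as JSON for downstream processing
await writeFile('output.json', JSON.stringify(items, null, 2));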

Session Management and Cookies

Documentation on handling authentication and maintaining sessions:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    useSessionPool: true,
    persistCookiesPerSession: true,
    sessionPoolOptions: {
        maxPoolSize: 20,
        sessionOptions: {
            maxAgeSecs: 3600,
            maxUsageCount: 50,
        },
    },
    async requestHandler({ session, page }) {
        // Sessions (and their cookies) are created, rotated, and retired automatically
        console.log(`Using session: ${session.id}`);
    },
});

TypeScript Support

The JavaScript documentation includes comprehensive TypeScript definitions and examples:

import { PlaywrightCrawler, Dataset } from 'crawlee';

interface ProductData {
    name: string;
    price: number;
    url: string;
}

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, log }) {
        const products: ProductData[] = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('.product')).map(el => ({
                name: el.querySelector('.name')?.textContent ?? '',
                price: parseFloat(el.querySelector('.price')?.textContent ?? '0'),
                url: el.querySelector('a')?.href ?? ''
            }));
        });

        await Dataset.pushData(products);
    },
});

Keeping Up-to-Date

To stay current with Crawlee documentation updates:

  • GitHub Releases: Watch the repository for release notes
  • Blog Posts: Visit blog.apify.com for announcements
  • Newsletter: Subscribe to Apify's developer newsletter
  • Twitter/X: Follow @apify for updates

Troubleshooting and FAQ

The documentation includes a comprehensive troubleshooting section covering the following (a debugging sketch follows the list):

  • Memory management for large crawls
  • Debugging tips and logging configuration
  • Common errors and their solutions
  • Performance optimization strategies
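
For example, raising the log level and tightening crawl limits is a common first step when debugging; the sketch below uses illustrative values only:

import { CheerioCrawler, log, LogLevel } from 'crawlee';

// Turn on debug-level logging to see what the crawler is doing internally
log.setLevel(LogLevel.DEBUG);

const crawler = new CheerioCrawler({
    maxConcurrency: 5,           // keep parallelism low while debugging
    maxRequestsPerCrawl: 50,     // stop early instead of crawling the whole site
    requestHandlerTimeoutSecs: 60,
    async requestHandler({ request, log }) {
        log.debug(`Handling ${request.url}`);
    },
});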

Conclusion

Crawlee provides extensive, well-maintained documentation across multiple platforms. Whether you prefer JavaScript or Python, the official documentation at crawlee.dev offers comprehensive guides, API references, and practical examples. Combined with the active GitHub repository, Discord community, and Apify Platform resources, developers have all the tools needed to build robust web scraping solutions.

Start with the official documentation's getting started guide, explore the examples repository for your specific use case, and leverage the community resources when you need help. The documentation is regularly updated with new features and best practices, making it an invaluable resource for both beginners and experienced developers.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
