What Proxy Providers Work Best with Crawlee?

When building production-grade web scrapers with Crawlee, choosing the right proxy provider is crucial for avoiding blocks, accessing geo-restricted content, and ensuring reliable data extraction. Crawlee offers excellent proxy integration support, making it compatible with virtually any proxy service that provides HTTP/HTTPS or SOCKS5 proxies.
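All of the providers below use the same credential-in-URL shape, `http://USERNAME:PASSWORD@HOST:PORT`, which is what you hand to Crawlee. Credentials containing characters such as `@` or `:` must be percent-encoded before being embedded in the URL; a small helper (the function name and values are illustrative, not tied to any provider) shows the idea:

```javascript
// Build a proxy URL in the http://USERNAME:PASSWORD@HOST:PORT form that
// Crawlee accepts. Helper name and values are illustrative only.
function buildProxyUrl({ username, password, host, port }) {
    // Percent-encode credentials so characters like @ or : don't break the URL
    return `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${host}:${port}`;
}

console.log(buildProxyUrl({
    username: 'user-country-us',
    password: 's3cret',
    host: 'proxy.example.com',
    port: 8000,
}));
// → http://user-country-us:s3cret@proxy.example.com:8000
```

The resulting string can be placed directly into the `proxyUrls` array of Crawlee's `ProxyConfiguration`, as the provider-specific examples below demonstrate.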

Top Proxy Providers for Crawlee

1. Bright Data (formerly Luminati)

Bright Data is one of the most comprehensive proxy networks available, offering residential, datacenter, and mobile proxies with excellent reliability.

Pros:

  • Largest residential proxy network (72+ million IPs)
  • Built-in proxy configuration for Crawlee
  • Automatic IP rotation
  • High success rates for scraping protected sites
  • Excellent geographic coverage

Cons:

  • Higher pricing compared to competitors
  • Can be complex for beginners

Example configuration:

import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// The proxyConfiguration option expects a ProxyConfiguration instance
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://username-session-random:password@brd.superproxy.io:22225'
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ page, request }) {
        const title = await page.title();
        console.log(`Title: ${title}`);
    },
});

await crawler.run(['https://example.com']);

2. Oxylabs

Oxylabs provides enterprise-grade proxy solutions with excellent performance and customer support.

Pros:

  • Large residential proxy pool (100+ million IPs)
  • Dedicated account managers
  • City- and state-level targeting
  • Strong uptime guarantees
  • Good JavaScript rendering support

Cons:

  • Premium pricing
  • Requires business contact for setup

Configuration example:

import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://customer-USERNAME:PASSWORD@pr.oxylabs.io:7777'
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ $, request }) {
        const title = $('title').text();
        console.log(`Scraped: ${title}`);
    },
});

await crawler.run(['https://example.com']);

3. Smartproxy

Smartproxy offers a good balance between price and performance, making it popular among mid-size scraping operations.

Pros:

  • Affordable pricing plans
  • 40+ million residential IPs
  • Simple authentication
  • Good for handling browser sessions in Puppeteer
  • City-level targeting available

Cons:

  • Smaller pool than top-tier providers
  • Limited enterprise features

Implementation:

import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user-USERNAME:PASSWORD@gate.smartproxy.com:7000'
    ],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    launchContext: {
        launchOptions: {
            headless: true,
        },
    },
    async requestHandler({ page, request }) {
        await page.waitForSelector('h1');
        const heading = await page.$eval('h1', el => el.textContent);
        console.log(`H1: ${heading}`);
    },
});

await crawler.run(['https://example.com']);

4. Apify Proxy

Apify Proxy integrates seamlessly with Crawlee since both are maintained by the Apify team.

Pros:

  • Native Crawlee integration
  • Automatic retry and rotation
  • Simple configuration
  • Pay-as-you-go pricing
  • Built-in datacenter and residential options

Cons:

  • Requires an Apify account
  • Smaller residential pool

Built-in configuration:

import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://auto:apify_proxy_PASSWORD@proxy.apify.com:8000'
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ page, request, proxyInfo }) {
        console.log(`Using proxy: ${proxyInfo.url}`);
        const content = await page.content();
    },
});

await crawler.run(['https://example.com']);

5. ScraperAPI

ScraperAPI is more than a proxy provider—it's a complete scraping API that handles proxies, headers, and AJAX requests automatically.

Pros:

  • Automatic proxy rotation
  • JavaScript rendering included
  • Simple API-based approach
  • Good for beginners
  • Handles CAPTCHAs (premium plans)

Cons:

  • Credit-based pricing
  • Less control over proxy selection

Usage with Crawlee:

import { CheerioCrawler } from 'crawlee';

const SCRAPER_API_KEY = 'your_api_key';

// ScraperAPI works by wrapping the target URL, so wrap each URL
// before handing it to the crawler
const wrapUrl = (url) =>
    `http://api.scraperapi.com/?api_key=${SCRAPER_API_KEY}&url=${encodeURIComponent(url)}`;

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        // $ contains the HTML that ScraperAPI fetched through its proxies
        const title = $('title').text();
        console.log(`Title: ${title}`);
    },
});

await crawler.run([wrapUrl('https://example.com')]);

6. WebScrapingAPI

WebScrapingAPI provides a developer-friendly proxy and scraping solution with built-in JavaScript rendering.

Pros:

  • Simple API integration
  • Automatic proxy rotation
  • JavaScript rendering support
  • Good documentation
  • Competitive pricing

Example integration:

import { CheerioCrawler } from 'crawlee';

const API_KEY = 'your_api_key';

// Route requests through the WebScrapingAPI endpoint by wrapping each URL
const wrapUrl = (url) =>
    `https://api.webscrapingapi.com/v1?api_key=${API_KEY}&url=${encodeURIComponent(url)}`;

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        // The API handles proxy rotation automatically before returning HTML
        const data = $('body').text();
        console.log(data);
    },
});

await crawler.run([wrapUrl('https://example.com')]);

Advanced Proxy Configuration in Crawlee

Using Multiple Proxy Providers

You can configure Crawlee to use multiple proxy providers for redundancy:

import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user1:pass1@provider1.com:8000',
        'http://user2:pass2@provider2.com:8000',
        'http://user3:pass3@provider3.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestRetries: 3,
    async requestHandler({ page, request, proxyInfo }) {
        // Crawlee has already navigated to request.url through the chosen proxy
        console.log(`Request via: ${proxyInfo.url}`);
    },
});
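Recent Crawlee versions also support tiered proxy lists (the `tieredProxyUrls` option of `ProxyConfiguration`), which try a cheaper tier first and escalate to a more expensive one for domains that keep getting blocked. The escalation idea can be sketched in plain JavaScript; this is a toy illustration, not Crawlee's actual implementation:

```javascript
// Toy tiered-proxy picker: start at tier 0 and move up one tier after
// `threshold` consecutive failures. Illustrative only -- Crawlee's
// ProxyConfiguration implements a refined version of this idea.
class TieredProxyPicker {
    constructor(tiers, threshold = 3) {
        this.tiers = tiers;          // e.g. [[...datacenter], [...residential]]
        this.threshold = threshold;
        this.tier = 0;
        this.failures = 0;
        this.cursor = 0;
    }

    // Hand out the next proxy from the current tier, round-robin
    next() {
        const pool = this.tiers[this.tier];
        return pool[this.cursor++ % pool.length];
    }

    // Escalate to the next tier after too many consecutive failures
    reportFailure() {
        if (++this.failures >= this.threshold && this.tier < this.tiers.length - 1) {
            this.tier += 1;
            this.failures = 0;
        }
    }
}

const picker = new TieredProxyPicker([
    ['http://dc1.example.com:8000', 'http://dc2.example.com:8000'],
    ['http://resi1.example.com:8000'],
]);

console.log(picker.next()); // → http://dc1.example.com:8000
picker.reportFailure();
picker.reportFailure();
picker.reportFailure();     // three failures in a row -> escalate
console.log(picker.next()); // → http://resi1.example.com:8000
```

The proxy URLs above are placeholders; in practice each tier would hold real datacenter or residential endpoints from the providers discussed earlier.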

Session-Based Proxy Rotation

For scenarios requiring consistent IP addresses across multiple requests:

import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    // Crawlee does not interpolate placeholders inside proxyUrls, so build
    // a session-sticky URL for each session via newUrlFunction
    newUrlFunction: (sessionId) =>
        `http://user-session-${sessionId}:pass@provider.com:8000`,
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    useSessionPool: true,
    sessionPoolOptions: {
        maxPoolSize: 100,
        sessionOptions: {
            maxUsageCount: 50, // Reuse session up to 50 times
        },
    },
    async requestHandler({ page, request, session }) {
        // Requests sharing a session keep the same proxy IP
        console.log(`Session ID: ${session.id}`);
    },
});

Geographic Targeting

Configure proxies for specific countries or cities:

import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        // Bright Data with US geo-targeting
        'http://user-country-us:pass@brd.superproxy.io:22225',
        // Smartproxy with UK targeting
        'http://user-country-gb:pass@gate.smartproxy.com:7000',
    ],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    async requestHandler({ page, request }) {
        // The page content reflects the proxy's geographic location
        const content = await page.content();
    },
});

Proxy Testing and Monitoring

Always test your proxy configuration before running production crawls:

import { ProxyConfiguration } from 'crawlee';
import { gotScraping } from 'got-scraping';

async function testProxies() {
    const proxyConfiguration = new ProxyConfiguration({
        proxyUrls: [
            'http://user:pass@provider.com:8000',
        ],
    });

    const proxyInfo = await proxyConfiguration.newProxyInfo();
    console.log(`Testing proxy: ${proxyInfo.url}`);

    // Verify the proxy works by checking which IP it exposes
    try {
        const { body } = await gotScraping({
            url: 'https://api.ipify.org?format=json',
            proxyUrl: proxyInfo.url,
        });
        const data = JSON.parse(body);
        console.log(`Proxy IP: ${data.ip}`);
    } catch (error) {
        console.error(`Proxy failed: ${error.message}`);
    }
}

await testProxies();

Best Practices for Proxy Usage with Crawlee

  1. Rotate Proxies Regularly: Use session-based rotation to avoid IP bans
  2. Monitor Success Rates: Track which proxies work best for your targets
  3. Use Residential Proxies for Protected Sites: Datacenter proxies may be blocked by advanced bot detection
  4. Implement Retry Logic: Configure maxRequestRetries to handle proxy failures
  5. Respect Rate Limits: Even with proxies, avoid overwhelming target servers
  6. Test Before Production: Always verify proxy functionality with small-scale tests
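Points 1, 2, and 4 can be combined in practice by tracking per-proxy outcomes and dropping proxies whose success rate falls too low. A minimal standalone tracker sketches the idea (illustrative only; within Crawlee itself, the session pool plays this role):

```javascript
// Minimal per-proxy success-rate tracker. Illustrative sketch only --
// in Crawlee, the SessionPool retires failing sessions for you.
class ProxyStats {
    constructor(minRate = 0.5, minSamples = 10) {
        this.stats = new Map(); // proxyUrl -> { ok, fail }
        this.minRate = minRate;
        this.minSamples = minSamples;
    }

    record(proxyUrl, success) {
        const s = this.stats.get(proxyUrl) ?? { ok: 0, fail: 0 };
        success ? s.ok++ : s.fail++;
        this.stats.set(proxyUrl, s);
    }

    // A proxy counts as healthy until enough samples prove otherwise
    isHealthy(proxyUrl) {
        const s = this.stats.get(proxyUrl);
        if (!s || s.ok + s.fail < this.minSamples) return true;
        return s.ok / (s.ok + s.fail) >= this.minRate;
    }
}

const stats = new ProxyStats();
stats.record('http://proxy-a.example.com:8000', true);
stats.record('http://proxy-a.example.com:8000', false);
console.log(stats.isHealthy('http://proxy-a.example.com:8000')); // → true (too few samples)
```

You would call `record()` from your error and success handlers and consult `isHealthy()` when picking the next proxy; the thresholds here are arbitrary starting points to tune against your targets.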

Cost Considerations

| Provider | Entry Price | Best For |
|----------|-------------|----------|
| Bright Data | ~$500/month | Enterprise scraping |
| Oxylabs | ~$300/month | Large-scale operations |
| Smartproxy | ~$75/month | Small to medium projects |
| Apify Proxy | Pay-as-you-go | Apify platform users |
| ScraperAPI | $49/month | Beginners, simple projects |
| WebScrapingAPI | $39/month | Budget-conscious developers |

Conclusion

The best proxy provider for Crawlee depends on your specific needs, budget, and target websites. For enterprise-level scraping with advanced bot detection, Bright Data or Oxylabs offer the most robust solutions. For smaller projects or those just getting started, Smartproxy or ScraperAPI provide excellent value.

Regardless of which provider you choose, Crawlee's flexible proxy configuration system makes it easy to integrate any HTTP/HTTPS proxy service. Start with a small test to evaluate performance, then scale up based on your success rates and requirements.

For more advanced scraping techniques, explore how to handle authentication in Puppeteer or learn about monitoring network requests to optimize your scraping workflows.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
