# What Proxy Providers Work Best with Crawlee?
When building production-grade web scrapers with Crawlee, choosing the right proxy provider is crucial for avoiding blocks, accessing geo-restricted content, and ensuring reliable data extraction. Crawlee offers excellent proxy integration support, making it compatible with virtually any proxy service that provides HTTP/HTTPS or SOCKS5 proxies.
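Before committing to a provider, it helps to see what Crawlee actually expects: plain proxy URLs. The sketch below shows both URL shapes that can be fed into `ProxyConfiguration`'s `proxyUrls` option, plus a small shape check; the hostnames and credentials are placeholders, not a real provider, and the `isUsableProxyUrl` helper is a hypothetical convenience, not part of Crawlee.

```javascript
// Both URL shapes below are valid inputs for Crawlee's ProxyConfiguration
// (hostnames and credentials are placeholders, not a real provider).
const proxyUrls = [
    'http://user:pass@proxy.example.com:8000',   // HTTP(S) proxy
    'socks5://user:pass@proxy.example.com:1080', // SOCKS5 proxy
];

// A quick shape check before handing the list to Crawlee: require a
// supported scheme and an explicit port.
function isUsableProxyUrl(raw) {
    try {
        const { protocol, port } = new URL(raw);
        return ['http:', 'https:', 'socks5:'].includes(protocol) && port !== '';
    } catch {
        return false;
    }
}

console.log(proxyUrls.every(isUsableProxyUrl)); // true
```

Requiring an explicit port is a deliberately strict choice here; it catches the most common copy-paste mistakes in provider dashboards.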
## Top Proxy Providers for Crawlee

### 1. Bright Data (formerly Luminati)

Bright Data is one of the most comprehensive proxy networks available, offering residential, datacenter, and mobile proxies with excellent reliability.

Pros:

- Largest residential proxy network (72+ million IPs)
- Built-in proxy configuration for Crawlee
- Automatic IP rotation
- High success rates for scraping protected sites
- Excellent geographic coverage

Cons:

- Higher pricing compared to competitors
- Can be complex for beginners
Example configuration:

```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Crawlee expects a ProxyConfiguration instance, not a plain object.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://username-session-random:password@brd.superproxy.io:22225',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ page, request }) {
        const title = await page.title();
        console.log(`Title: ${title}`);
    },
});

await crawler.run(['https://example.com']);
```
### 2. Oxylabs

Oxylabs provides enterprise-grade proxy solutions with excellent performance and customer support.

Pros:

- Large residential proxy pool (100+ million IPs)
- Dedicated account managers
- City- and state-level targeting
- Strong uptime guarantees
- Good JavaScript rendering support

Cons:

- Premium pricing
- Requires business contact for setup
Configuration example:

```javascript
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://customer-USERNAME:PASSWORD@pr.oxylabs.io:7777',
    ],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    async requestHandler({ $, request }) {
        const title = $('title').text();
        console.log(`Scraped: ${title}`);
    },
});

await crawler.run(['https://example.com']);
```
### 3. Smartproxy

Smartproxy offers a good balance between price and performance, making it popular among mid-size scraping operations.

Pros:

- Affordable pricing plans
- 40+ million residential IPs
- Simple authentication
- Good for handling browser sessions in Puppeteer
- City-level targeting available

Cons:

- Smaller pool than top-tier providers
- Limited enterprise features
Implementation:

```javascript
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user-USERNAME:PASSWORD@gate.smartproxy.com:7000',
    ],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    launchContext: {
        launchOptions: {
            headless: true,
        },
    },
    async requestHandler({ page, request }) {
        await page.waitForSelector('h1');
        const heading = await page.$eval('h1', (el) => el.textContent);
        console.log(`H1: ${heading}`);
    },
});

await crawler.run(['https://example.com']);
```
### 4. Apify Proxy

Apify Proxy integrates seamlessly with Crawlee since both are maintained by the Apify team.

Pros:

- Native Crawlee integration
- Automatic retry and rotation
- Simple configuration
- Pay-as-you-go pricing
- Built-in datacenter and residential options

Cons:

- Requires Apify account
- Smaller residential pool
Built-in configuration:

```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://auto:apify_proxy_PASSWORD@proxy.apify.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    async requestHandler({ page, request, proxyInfo }) {
        console.log(`Using proxy: ${proxyInfo.url}`);
        const content = await page.content();
    },
});

await crawler.run(['https://example.com']);
```
### 5. ScraperAPI

ScraperAPI is more than a proxy provider: it's a complete scraping API that handles proxies, headers, and AJAX requests automatically.

Pros:

- Automatic proxy rotation
- JavaScript rendering included
- Simple API-based approach
- Good for beginners
- Handles CAPTCHAs (premium plans)

Cons:

- Credit-based pricing
- Less control over proxy selection
Usage with Crawlee:

```javascript
import { CheerioCrawler } from 'crawlee';

const SCRAPER_API_KEY = 'your_api_key';

// ScraperAPI works by wrapping the target URL, so build the API URL
// before enqueueing it; requests to the wrapped URL are proxied.
const wrapUrl = (url) =>
    `http://api.scraperapi.com/?api_key=${SCRAPER_API_KEY}&url=${encodeURIComponent(url)}`;

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        const title = $('title').text();
        console.log(`Title: ${title}`);
    },
});

await crawler.run([wrapUrl('https://example.com')]);
```
### 6. WebScrapingAPI

WebScrapingAPI provides a developer-friendly proxy and scraping solution with built-in JavaScript rendering.

Pros:

- Simple API integration
- Automatic proxy rotation
- JavaScript rendering support
- Good documentation
- Competitive pricing
Example integration:

```javascript
import { CheerioCrawler } from 'crawlee';

const API_KEY = 'your_api_key';

// The API handles proxy rotation automatically once the target URL
// is wrapped in the WebScrapingAPI endpoint.
const wrapUrl = (url) =>
    `https://api.webscrapingapi.com/v1?api_key=${API_KEY}&url=${encodeURIComponent(url)}`;

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        const data = $('body').text();
        console.log(data);
    },
});

await crawler.run([wrapUrl('https://example.com')]);
```
## Advanced Proxy Configuration in Crawlee

### Using Multiple Proxy Providers
You can configure Crawlee to use multiple proxy providers for redundancy:
```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user1:pass1@provider1.com:8000',
        'http://user2:pass2@provider2.com:8000',
        'http://user3:pass3@provider3.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestRetries: 3,
    async requestHandler({ page, request, proxyInfo }) {
        // Crawlee has already navigated to request.url at this point.
        console.log(`Request via: ${proxyInfo.url}`);
    },
});

await crawler.run(['https://example.com']);
```
### Session-Based Proxy Rotation
For scenarios requiring consistent IP addresses across multiple requests:
```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Many providers encode the session name in the proxy username.
// Crawlee's newUrlFunction builds a proxy URL per session ID, so the
// same session keeps the same exit IP across requests.
const proxyConfiguration = new ProxyConfiguration({
    newUrlFunction: (sessionId) =>
        `http://user-session-${sessionId}:pass@provider.com:8000`,
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    useSessionPool: true,
    sessionPoolOptions: {
        maxPoolSize: 100,
        sessionOptions: {
            maxUsageCount: 50, // Reuse each session up to 50 times
        },
    },
    async requestHandler({ page, request, session }) {
        console.log(`Session ID: ${session.id}`);
    },
});

await crawler.run(['https://example.com']);
```
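Sessions also give you a clean way to react to blocks: Crawlee's `session.retire()` discards a session (and its proxy IP) so later requests get a fresh one. The sketch below uses a hypothetical `shouldRetireSession` helper; the status codes chosen as "blocked" signals are an assumption, not a Crawlee rule.

```javascript
// Hypothetical helper: treat these statuses as a sign the exit IP is blocked.
function shouldRetireSession(statusCode) {
    return [403, 429].includes(statusCode);
}

// Wiring sketch for the session example above (the Playwright
// requestHandler context exposes the navigation response and session):
//
// async requestHandler({ response, session }) {
//     if (response && shouldRetireSession(response.status())) {
//         session.retire(); // drop this session and its proxy IP
//     }
// }

console.log(shouldRetireSession(403)); // true
console.log(shouldRetireSession(200)); // false
```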
### Geographic Targeting
Configure proxies for specific countries or cities:
```javascript
import { PuppeteerCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        // Bright Data with US geo-targeting
        'http://user-country-us:pass@brd.superproxy.io:22225',
        // Smartproxy with UK targeting
        'http://user-country-gb:pass@gate.smartproxy.com:7000',
    ],
});

const crawler = new PuppeteerCrawler({
    proxyConfiguration,
    async requestHandler({ page, request }) {
        // The page content reflects the proxy's geographic location.
        const content = await page.content();
    },
});

await crawler.run(['https://example.com']);
```
### Proxy Testing and Monitoring
Always test your proxy configuration before running production crawls:
```javascript
import { ProxyConfiguration } from 'crawlee';
// got-scraping is the HTTP client that ships with Crawlee; it accepts
// a proxyUrl option directly (the built-in fetch does not take a proxy).
import { gotScraping } from 'got-scraping';

async function testProxies() {
    const proxyConfiguration = new ProxyConfiguration({
        proxyUrls: [
            'http://user:pass@provider.com:8000',
        ],
    });
    const proxyInfo = await proxyConfiguration.newProxyInfo();
    console.log(`Testing proxy: ${proxyInfo.url}`);
    // Make a request through the proxy and check the reported IP.
    try {
        const { body } = await gotScraping({
            url: 'https://api.ipify.org?format=json',
            proxyUrl: proxyInfo.url,
            responseType: 'json',
        });
        console.log(`Proxy IP: ${body.ip}`);
    } catch (error) {
        console.error(`Proxy failed: ${error.message}`);
    }
}

await testProxies();
```
## Best Practices for Proxy Usage with Crawlee

- Rotate Proxies Regularly: Use session-based rotation to avoid IP bans
- Monitor Success Rates: Track which proxies work best for your targets
- Use Residential Proxies for Protected Sites: Datacenter proxies may be blocked by advanced bot detection
- Implement Retry Logic: Configure `maxRequestRetries` to handle proxy failures
- Respect Rate Limits: Even with proxies, avoid overwhelming target servers
- Test Before Production: Always verify proxy functionality with small-scale tests
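Monitoring success rates doesn't need any special tooling: a plain counter keyed by proxy URL, updated from the crawler's handlers, is enough to spot a failing provider. The sketch below is a minimal version of that idea; `recordResult` and `proxyStats` are hypothetical names, and the commented wiring assumes a crawler like the earlier examples.

```javascript
// A minimal per-proxy success tracker (plain object, no external deps).
const proxyStats = {};

function recordResult(stats, proxyUrl, ok) {
    // Create the counter on first sight of this proxy URL.
    const entry = stats[proxyUrl] ?? (stats[proxyUrl] = { ok: 0, failed: 0 });
    ok ? entry.ok++ : entry.failed++;
    return entry;
}

// Wiring sketch (assumes a Crawlee crawler as in the earlier examples;
// proxyInfo is provided in both handlers when a proxy is configured):
//
// const crawler = new CheerioCrawler({
//     proxyConfiguration,
//     async requestHandler({ request, proxyInfo }) {
//         recordResult(proxyStats, proxyInfo.url, true);
//     },
//     failedRequestHandler({ request, proxyInfo }) {
//         recordResult(proxyStats, proxyInfo.url, false);
//     },
// });

recordResult(proxyStats, 'http://provider-a:8000', true);
recordResult(proxyStats, 'http://provider-a:8000', false);
console.log(proxyStats['http://provider-a:8000']); // { ok: 1, failed: 1 }
```

Reviewing these counters after a small test run tells you which provider to scale up before you commit to a monthly plan.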
## Cost Considerations

| Provider | Entry Price | Best For |
|----------|-------------|----------|
| Bright Data | ~$500/month | Enterprise scraping |
| Oxylabs | ~$300/month | Large-scale operations |
| Smartproxy | ~$75/month | Small to medium projects |
| Apify Proxy | Pay-as-you-go | Apify platform users |
| ScraperAPI | $49/month | Beginners, simple projects |
| WebScrapingAPI | $39/month | Budget-conscious developers |
## Conclusion
The best proxy provider for Crawlee depends on your specific needs, budget, and target websites. For enterprise-level scraping with advanced bot detection, Bright Data or Oxylabs offer the most robust solutions. For smaller projects or those just getting started, Smartproxy or ScraperAPI provide excellent value.
Regardless of which provider you choose, Crawlee's flexible proxy configuration system makes it easy to integrate any HTTP/HTTPS proxy service. Start with a small test to evaluate performance, then scale up based on your success rates and requirements.
For more advanced scraping techniques, explore how to handle authentication in Puppeteer or learn about monitoring network requests to optimize your scraping workflows.