How do I configure proxies in n8n for web scraping?
Configuring proxies in n8n is essential for effective web scraping, especially when dealing with rate limits, geo-restrictions, or IP blocking. Proxies allow you to route your requests through different IP addresses, making your scraping activities more resilient and less likely to be detected or blocked. This guide covers multiple approaches to using proxies in n8n workflows.
Why Use Proxies for Web Scraping in n8n?
Before diving into configuration, it's important to understand when and why you need proxies:
- IP Rotation: Distribute requests across multiple IPs to avoid rate limiting
- Geographic Targeting: Access region-specific content by using proxies from specific countries
- Anonymity: Hide your actual IP address from target websites
- Scaling: Handle large-scale scraping operations without triggering anti-bot measures
- Avoiding Bans: Prevent your IP from being permanently blocked
- Bypassing Restrictions: Access content that may be restricted in your region
Method 1: Using Proxies with HTTP Request Node
The simplest way to use proxies in n8n is through the HTTP Request node, which has built-in proxy support.
Basic HTTP Request with Proxy Configuration
// In n8n HTTP Request Node settings:
// URL: https://example.com/data
// Method: GET
// Authentication: None
// Options:
// - Proxy: http://username:password@proxy-server.com:8080
Using Environment Variables for Proxy Configuration
For better security and maintainability, store proxy credentials in environment variables:
# In your .env file or environment configuration
export HTTP_PROXY="http://username:password@proxy-server.com:8080"
export HTTPS_PROXY="http://username:password@proxy-server.com:8080"
export NO_PROXY="localhost,127.0.0.1"
Depending on your deployment, n8n may pick these variables up automatically for outbound HTTP requests, which keeps proxy credentials out of individual workflow nodes. If it does not, you can reference the variables explicitly, as shown below.
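One explicit option: n8n exposes environment variables to Function/Code nodes and expressions via $env (assuming environment access is not blocked on your instance, for example by the N8N_BLOCK_ENV_ACCESS_IN_NODE setting), so a small Function node can hand the proxy URL to the HTTP Request node:
// Function Node: Read the proxy URL from the environment via $env
// (assumes environment access is enabled on your instance)
const proxyUrl = $env.HTTPS_PROXY || $env.HTTP_PROXY;
if (!proxyUrl) {
  throw new Error('No proxy configured in the environment');
}
// Downstream, set the HTTP Request node's Proxy option to {{ $json.proxyUrl }}
return [{ json: { proxyUrl } }];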
Dynamic Proxy Selection with Function Node
For advanced scenarios requiring proxy rotation, use a Function node before your HTTP Request:
// Function Node: Select Random Proxy
const proxies = [
'http://user1:pass1@proxy1.example.com:8080',
'http://user2:pass2@proxy2.example.com:8080',
'http://user3:pass3@proxy3.example.com:8080',
'http://user4:pass4@proxy4.example.com:8080'
];
// Randomly select a proxy
const selectedProxy = proxies[Math.floor(Math.random() * proxies.length)];
// Return the proxy for use in the next node (Function nodes must return an array of items)
return [{
json: {
proxyUrl: selectedProxy,
targetUrl: 'https://example.com/api/data'
}
}];
Then configure the HTTP Request node to use {{ $json.proxyUrl }} as the proxy value.
Method 2: Configuring Proxies with Puppeteer in n8n
When using Puppeteer for web scraping in n8n, proxy configuration requires passing proxy arguments at browser launch. This approach is particularly useful when you use Puppeteer with n8n to scrape dynamic content. Note that on self-hosted instances, Function/Code nodes can only require('puppeteer') if external modules are allowed, typically via the NODE_FUNCTION_ALLOW_EXTERNAL environment variable.
Basic Puppeteer Proxy Setup
// Function Node with Puppeteer
const puppeteer = require('puppeteer');
// Proxy configuration
const proxyServer = 'proxy-server.com:8080';
const proxyUsername = 'your-username';
const proxyPassword = 'your-password';
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
`--proxy-server=${proxyServer}`
]
});
try {
const page = await browser.newPage();
// Authenticate with proxy
await page.authenticate({
username: proxyUsername,
password: proxyPassword
});
// Navigate to target URL
await page.goto('https://example.com', {
waitUntil: 'networkidle2'
});
// Extract data
const data = await page.evaluate(() => {
return {
title: document.title,
content: document.querySelector('.main-content')?.innerText,
ip: document.querySelector('.your-ip')?.innerText // To verify proxy is working
};
});
await browser.close();
return [{ json: data }];
} catch (error) {
await browser.close();
throw new Error(`Scraping failed: ${error.message}`);
}
Advanced Puppeteer Proxy with Rotation
For enterprise-grade scraping, implement proxy rotation with Puppeteer:
// Function Node: Puppeteer with Proxy Rotation
const puppeteer = require('puppeteer');
// Define your proxy pool
const proxyPool = [
{ server: 'proxy1.example.com:8080', username: 'user1', password: 'pass1' },
{ server: 'proxy2.example.com:8080', username: 'user2', password: 'pass2' },
{ server: 'proxy3.example.com:8080', username: 'user3', password: 'pass3' }
];
// Select proxy round-robin, persisting the index in workflow static data
// (in the legacy Function node, call getWorkflowStaticData('global') instead)
// Note: static data persists across production executions, not manual test runs
const staticData = $getWorkflowStaticData('global');
const currentIndex = staticData.proxyIndex || 0;
const proxy = proxyPool[currentIndex % proxyPool.length];
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
`--proxy-server=${proxy.server}`
]
});
try {
const page = await browser.newPage();
// Set proxy authentication
await page.authenticate({
username: proxy.username,
password: proxy.password
});
// Set realistic headers to avoid detection
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
});
// Navigate to target
await page.goto('https://example.com/products', {
waitUntil: 'networkidle2',
timeout: 30000
});
// Wait for content to load
await page.waitForSelector('.product-list', { timeout: 10000 });
// Extract data
const products = await page.evaluate(() => {
return Array.from(document.querySelectorAll('.product-item')).map(item => ({
name: item.querySelector('.product-name')?.innerText,
price: item.querySelector('.product-price')?.innerText,
url: item.querySelector('a')?.href
}));
});
await browser.close();
return [{
json: {
success: true,
proxy: proxy.server,
productsCount: products.length,
products: products
}
}];
} catch (error) {
await browser.close();
return [{
json: {
success: false,
proxy: proxy.server,
error: error.message
}
}];
} finally {
// Advance the index so the next execution uses the next proxy
staticData.proxyIndex = (currentIndex + 1) % proxyPool.length;
}
Handling SOCKS Proxies with Puppeteer
SOCKS proxies tunnel raw TCP rather than just HTTP, which some providers expose for lower latency or non-HTTP traffic. One important caveat: Chromium does not support username/password authentication for SOCKS proxies:
const puppeteer = require('puppeteer');
// SOCKS5 proxy configuration (Chromium ignores credentials embedded in the URL)
const socksProxy = 'socks5://proxy-server.com:1080';
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
`--proxy-server=${socksProxy}`
]
});
const page = await browser.newPage();
// Chromium does not support username/password authentication for SOCKS proxies;
// use a provider with IP allowlisting or route through a local authenticated forwarder
await page.goto('https://example.com', {
waitUntil: 'networkidle2'
});
const data = await page.evaluate(() => ({
title: document.title,
body: document.body.innerText.substring(0, 500)
}));
await browser.close();
return [{ json: data }];
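If your provider only supports username/password authentication, a common workaround is to run a small local forwarding proxy that holds the credentials and point Chromium at it. Here is a minimal sketch using the proxy-chain npm package, with the assumptions that the package is installed and available to the Function node and that your proxy-chain version supports your upstream scheme (recent releases handle both HTTP and SOCKS upstreams):
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');
// Start a local, unauthenticated proxy that forwards to the authenticated upstream
const upstream = 'socks5://username:password@proxy-server.com:1080';
const localProxyUrl = await proxyChain.anonymizeProxy(upstream);
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', `--proxy-server=${localProxyUrl}`]
});
try {
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  const title = await page.title();
  return [{ json: { title } }];
} finally {
  await browser.close();
  // Shut down the local forwarder and drop its open connections
  await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
}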
Method 3: Using Proxies with Playwright in n8n
Playwright is another browser automation tool similar to Puppeteer. When configuring Playwright with n8n, proxy setup follows a similar pattern:
// Function Node with Playwright
const { chromium } = require('playwright');
const browser = await chromium.launch({
headless: true,
proxy: {
server: 'http://proxy-server.com:8080',
username: 'your-username',
password: 'your-password'
}
});
try {
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
});
const page = await context.newPage();
await page.goto('https://example.com', {
waitUntil: 'networkidle'
});
const data = await page.evaluate(() => {
return {
title: document.title,
heading: document.querySelector('h1')?.innerText
};
});
await browser.close();
return [{ json: data }];
} catch (error) {
await browser.close();
throw error;
}
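Unlike Puppeteer, Playwright also accepts a proxy option per browser context, which lets you rotate proxies without relaunching the browser. A minimal sketch (the proxy pool is a placeholder for your provider's endpoints):
const { chromium } = require('playwright');
// Hypothetical proxy pool - replace with your provider's endpoints
const proxyPool = [
  { server: 'http://proxy1.example.com:8080', username: 'user1', password: 'pass1' },
  { server: 'http://proxy2.example.com:8080', username: 'user2', password: 'pass2' }
];
// Chromium requires a (placeholder) global proxy when all contexts override it
const browser = await chromium.launch({
  headless: true,
  proxy: { server: 'http://per-context' }
});
try {
  const results = [];
  for (const proxy of proxyPool) {
    const context = await browser.newContext({ proxy });
    const page = await context.newPage();
    // Each context should report its own proxy's IP
    await page.goto('https://api.ipify.org?format=json', { waitUntil: 'networkidle' });
    results.push(JSON.parse(await page.innerText('body')));
    await context.close();
  }
  return results.map(r => ({ json: r }));
} finally {
  await browser.close();
}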
Method 4: Using Proxy Services with API Integration
Many proxy providers offer API-based solutions that simplify proxy management. Here's an example using a residential proxy service:
// Function Node: Using Proxy Service API
const axios = require('axios');
// Get a proxy from your provider's API
const proxyResponse = await axios.get('https://proxy-provider.com/api/get-proxy', {
headers: {
'Authorization': 'Bearer YOUR_API_KEY'
},
params: {
country: 'US',
type: 'residential'
}
});
const proxyUrl = proxyResponse.data.proxy_url; // Assumed format: "host:port"
// Use the proxy for your request
const [proxyHost, proxyPort] = proxyUrl.split(':');
const response = await axios.get('https://example.com/api/data', {
proxy: {
host: proxyHost,
port: parseInt(proxyPort, 10),
auth: {
username: 'your-username',
password: 'your-password'
}
}
});
return [{ json: response.data }];
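One caveat: axios's built-in proxy option has historically been unreliable for HTTPS targets. A common alternative (assuming the https-proxy-agent package is installed and available to the Function node) is to pass a proxy agent instead:
const axios = require('axios');
const { HttpsProxyAgent } = require('https-proxy-agent');
const proxyUrl = 'http://username:password@proxy-server.com:8080';
const response = await axios.get('https://example.com/api/data', {
  httpsAgent: new HttpsProxyAgent(proxyUrl), // Tunnels HTTPS through the proxy
  proxy: false // Disable axios's own proxy handling so the agent takes over
});
return [{ json: response.data }];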
Verifying Proxy Configuration
Always verify that your proxy is working correctly:
// Function Node: Verify Proxy
const puppeteer = require('puppeteer');
// Proxy under test (placeholder credentials)
const proxy = {
server: 'proxy-server.com:8080',
username: 'your-username',
password: 'your-password'
};
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
`--proxy-server=${proxy.server}`
]
});
const page = await browser.newPage();
await page.authenticate({
username: proxy.username,
password: proxy.password
});
// Visit a site that shows your IP address
await page.goto('https://api.ipify.org?format=json', {
waitUntil: 'networkidle2'
});
const ipData = await page.evaluate(() => {
return JSON.parse(document.body.innerText);
});
await browser.close();
return [{
json: {
detectedIP: ipData.ip,
proxyServer: proxy.server,
proxyWorking: true
}
}];
Best Practices for Using Proxies in n8n
1. Implement Retry Logic with Proxy Rotation
const maxRetries = 3;
let attempt = 0;
let success = false;
let result = null;
while (attempt < maxRetries && !success) {
try {
// Select different proxy for each attempt
const proxy = proxyPool[attempt % proxyPool.length];
// Your scraping code here (scrapeWithProxy is a placeholder for your own routine)
result = await scrapeWithProxy(proxy);
success = true;
} catch (error) {
attempt++;
if (attempt >= maxRetries) {
throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
}
// Wait before retrying
await new Promise(resolve => setTimeout(resolve, 2000));
}
}
return [{ json: result }];
2. Monitor Proxy Health
Track proxy performance and automatically remove failing proxies:
// Store proxy statistics in workflow static data so they persist across executions
// (in the legacy Function node, call getWorkflowStaticData('global') instead)
const staticData = $getWorkflowStaticData('global');
const proxyStats = staticData.proxyStats || {};
if (!proxyStats[proxy.server]) {
proxyStats[proxy.server] = {
requests: 0,
failures: 0,
lastUsed: new Date()
};
}
proxyStats[proxy.server].requests++;
proxyStats[proxy.server].lastUsed = new Date();
// Call this from your catch block to record a failure for the proxy you used
function recordFailure(server) {
proxyStats[server].failures++;
// Flag the proxy as unhealthy if its failure rate exceeds 50% over 10+ requests
const failureRate = proxyStats[server].failures / proxyStats[server].requests;
if (failureRate > 0.5 && proxyStats[server].requests > 10) {
proxyStats[server].healthy = false;
}
}
staticData.proxyStats = proxyStats;
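When choosing a proxy for the next request, skip anything flagged as unhealthy. A short sketch that builds on the proxyStats object above and the proxyPool array from earlier:
// Prefer proxies that have not been flagged as unhealthy
const healthyProxies = proxyPool.filter(p => {
  const stats = proxyStats[p.server];
  return !stats || stats.healthy !== false;
});
// Fall back to the full pool if everything has been flagged
const candidates = healthyProxies.length > 0 ? healthyProxies : proxyPool;
const proxy = candidates[Math.floor(Math.random() * candidates.length)];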
3. Use Appropriate Proxy Types
- Datacenter Proxies: Fast and cheap, but easier to detect
- Residential Proxies: More expensive but appear as real user IPs
- Rotating Proxies: Automatically change IP with each request
- Static Residential: Maintain same IP for session-based scraping (see the sketch below for choosing between pool types)
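To make the choice concrete, here is a minimal sketch (the pools and host list are hypothetical) that routes hard-to-scrape targets through residential proxies and everything else through cheaper datacenter IPs:
// Function Node: Pick a proxy pool based on the target (hypothetical pools)
const proxyPools = {
  datacenter: ['http://user:pass@dc1.example.com:8080', 'http://user:pass@dc2.example.com:8080'],
  residential: ['http://user:pass@res1.example.com:8080']
};
// Hosts known to block datacenter IPs - maintain this list from experience
const sensitiveHosts = ['example-shop.com'];
function pickProxy(targetUrl) {
  const host = new URL(targetUrl).hostname;
  const pool = sensitiveHosts.some(h => host === h || host.endsWith('.' + h))
    ? proxyPools.residential
    : proxyPools.datacenter;
  return pool[Math.floor(Math.random() * pool.length)];
}
return [{ json: { proxyUrl: pickProxy('https://example-shop.com/products') } }];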
4. Respect Rate Limits
Add delays between requests even when using proxies:
// Add a random delay between 1-3 seconds
const delay = Math.floor(Math.random() * 2000) + 1000;
// page.waitForTimeout was removed in recent Puppeteer versions; a plain timer works everywhere
await new Promise(resolve => setTimeout(resolve, delay));
5. Handle Proxy Timeouts
Set appropriate timeouts to avoid stuck workflows:
const browser = await puppeteer.launch({
headless: true,
args: [`--proxy-server=${proxy.server}`],
timeout: 30000 // Browser launch timeout
});
const page = await browser.newPage();
page.setDefaultTimeout(30000); // Page action timeout
try {
await page.goto('https://example.com', {
waitUntil: 'networkidle2',
timeout: 30000 // Navigation timeout
});
} catch (error) {
if (error.name === 'TimeoutError') {
// Close the browser before retrying so it doesn't leak
await browser.close();
// retryWithDifferentProxy is a placeholder for your own retry routine
return await retryWithDifferentProxy();
}
await browser.close();
throw error;
}
Combining Proxies with Other Anti-Detection Techniques
For robust scraping operations, combine proxies with other techniques when handling browser sessions in Puppeteer:
const puppeteer = require('puppeteer');
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-blink-features=AutomationControlled',
`--proxy-server=${proxy.server}`
]
});
const page = await browser.newPage();
// Proxy authentication
await page.authenticate({
username: proxy.username,
password: proxy.password
});
// Hide automation indicators
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => false
});
window.navigator.chrome = {
runtime: {}
};
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
});
// Set realistic viewport
await page.setViewport({
width: 1920,
height: 1080,
deviceScaleFactor: 1
});
// Random user agent
const userAgents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];
await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);
// Add realistic headers
await page.setExtraHTTPHeaders({
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Referer': 'https://www.google.com/'
});
// Continue with scraping...
await page.goto('https://example.com');
const data = await page.evaluate(() => ({
content: document.body.innerText
}));
await browser.close();
return [{ json: data }];
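Patching navigator properties by hand is brittle, and detection scripts evolve. If your instance allows the packages (e.g. via NODE_FUNCTION_ALLOW_EXTERNAL), the puppeteer-extra stealth plugin maintains a broader set of evasions. A minimal sketch assuming puppeteer-extra and puppeteer-extra-plugin-stealth are installed:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
// Registers a bundle of common evasions (webdriver flag, plugins, languages, etc.)
puppeteer.use(StealthPlugin());
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', '--proxy-server=proxy-server.com:8080']
});
try {
  const page = await browser.newPage();
  await page.authenticate({ username: 'your-username', password: 'your-password' });
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });
  const data = await page.evaluate(() => ({ title: document.title }));
  return [{ json: data }];
} finally {
  await browser.close();
}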
Troubleshooting Common Proxy Issues
Issue 1: "Proxy authentication required"
Solution: Ensure credentials are correctly formatted:
// Correct format
await page.authenticate({
username: 'your-username', // Pass raw credentials here - do not URL-encode them
password: 'your-password'
});
// Or embed credentials in the proxy URL (works for the HTTP Request node and axios;
// note that Chromium ignores credentials embedded in --proxy-server)
const proxyUrl = `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@proxy.com:8080`;
Issue 2: "Proxy connection timeout"
Solution: Test proxy connectivity separately and increase timeouts:
// Test proxy first
const testResponse = await axios.get('https://httpbin.org/ip', {
proxy: {
host: proxyHost,
port: proxyPort
},
timeout: 10000
});
if (testResponse.status === 200) {
// Proxy is working, proceed with scraping
}
Issue 3: "Too many requests / IP banned"
Solution: Implement proper proxy rotation and delays:
// Rotate through proxy pool
const proxyIndex = Math.floor(Math.random() * proxyPool.length);
const proxy = proxyPool[proxyIndex];
// Add random delay
const delay = Math.floor(Math.random() * 3000) + 2000;
await new Promise(resolve => setTimeout(resolve, delay));
Using WebScraping.AI as a Proxy Alternative
For developers who want to avoid the complexity of managing proxies, WebScraping.AI provides a scraping API that handles proxy rotation, JavaScript rendering, and anti-bot detection automatically:
// Function Node: Using WebScraping.AI API
const axios = require('axios');
const response = await axios.get('https://api.webscraping.ai/html', {
params: {
api_key: 'YOUR_API_KEY',
url: 'https://example.com',
js: true, // Enable JavaScript rendering
proxy: 'residential' // Automatic proxy selection
}
});
const html = response.data;
// Process HTML as needed
return [{ json: { html } }];
This approach eliminates the need to manage proxy pools, handle authentication, or deal with browser automation complexity, allowing you to focus on extracting and processing data in your n8n workflows.
Conclusion
Configuring proxies in n8n for web scraping provides essential capabilities for building robust, scalable scraping workflows. Whether using the built-in HTTP Request node, Puppeteer, or Playwright, proper proxy configuration helps avoid rate limits, access geo-restricted content, and maintain anonymity.
Key takeaways:
- Use HTTP Request node proxies for simple API scraping
- Implement Puppeteer or Playwright proxies for JavaScript-heavy sites
- Always rotate proxies and implement retry logic
- Monitor proxy health and performance
- Combine proxies with other anti-detection techniques
- Consider managed solutions like WebScraping.AI for production use
By following the examples and best practices in this guide, you can build reliable web scraping workflows that scale with your data extraction needs while respecting target websites and avoiding detection.