How do I configure proxies in n8n for web scraping?

Configuring proxies in n8n is essential for effective web scraping, especially when dealing with rate limits, geo-restrictions, or IP blocking. Proxies allow you to route your requests through different IP addresses, making your scraping activities more resilient and less likely to be detected or blocked. This guide covers multiple approaches to using proxies in n8n workflows.

Why Use Proxies for Web Scraping in n8n?

Before diving into configuration, it's important to understand when and why you need proxies:

  • IP Rotation: Distribute requests across multiple IPs to avoid rate limiting
  • Geographic Targeting: Access region-specific content by using proxies from specific countries
  • Anonymity: Hide your actual IP address from target websites
  • Scaling: Handle large-scale scraping operations without triggering anti-bot measures
  • Avoiding Bans: Prevent your IP from being permanently blocked
  • Bypassing Restrictions: Access content that may be restricted in your region

Method 1: Using Proxies with HTTP Request Node

The simplest way to use proxies in n8n is through the HTTP Request node, which has built-in proxy support.

Basic HTTP Request with Proxy Configuration

// In n8n HTTP Request Node settings:
// URL: https://example.com/data
// Method: GET
// Authentication: None
// Options:
//   - Proxy: http://username:password@proxy-server.com:8080

Using Environment Variables for Proxy Configuration

For better security and maintainability, store proxy credentials in environment variables:

# In your .env file or environment configuration
export HTTP_PROXY="http://username:password@proxy-server.com:8080"
export HTTPS_PROXY="http://username:password@proxy-server.com:8080"
export NO_PROXY="localhost,127.0.0.1"

Whether n8n picks up these variables automatically depends on your n8n version and deployment, so verify before relying on it; you can also reference them explicitly in the HTTP Request node's Proxy option.
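
For example, assuming environment access is allowed in expressions (it can be blocked with the N8N_BLOCK_ENV_ACCESS_IN_NODE setting), you can point the Proxy option at the variable directly:

// In the HTTP Request node settings:
// Options:
//   - Proxy: {{ $env.HTTPS_PROXY }}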

Dynamic Proxy Selection with Function Node

For advanced scenarios requiring proxy rotation, use a Function node before your HTTP Request:

// Function Node: Select Random Proxy
const proxies = [
  'http://user1:pass1@proxy1.example.com:8080',
  'http://user2:pass2@proxy2.example.com:8080',
  'http://user3:pass3@proxy3.example.com:8080',
  'http://user4:pass4@proxy4.example.com:8080'
];

// Randomly select a proxy
const selectedProxy = proxies[Math.floor(Math.random() * proxies.length)];

// Return the proxy for use in the next node
// (Function/Code nodes must return an array of items)
return [{
  json: {
    proxyUrl: selectedProxy,
    targetUrl: 'https://example.com/api/data'
  }
}];

Then configure the HTTP Request node to use {{ $json.proxyUrl }} as the proxy value, for example:
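
// In the downstream HTTP Request node settings:
// URL: {{ $json.targetUrl }}
// Options:
//   - Proxy: {{ $json.proxyUrl }}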

Method 2: Configuring Proxies with Puppeteer in n8n

When using Puppeteer for web scraping in n8n, proxy configuration requires passing proxy arguments at browser launch. This approach is particularly useful when you use Puppeteer with n8n to scrape dynamic, JavaScript-rendered content.

Basic Puppeteer Proxy Setup

// Function Node with Puppeteer
// (puppeteer must be installed and allowed via
// NODE_FUNCTION_ALLOW_EXTERNAL=puppeteer in your n8n environment)
const puppeteer = require('puppeteer');

// Proxy configuration
const proxyServer = 'proxy-server.com:8080';
const proxyUsername = 'your-username';
const proxyPassword = 'your-password';

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    `--proxy-server=${proxyServer}`
  ]
});

try {
  const page = await browser.newPage();

  // Authenticate with proxy
  await page.authenticate({
    username: proxyUsername,
    password: proxyPassword
  });

  // Navigate to target URL
  await page.goto('https://example.com', {
    waitUntil: 'networkidle2'
  });

  // Extract data
  const data = await page.evaluate(() => {
    return {
      title: document.title,
      content: document.querySelector('.main-content')?.innerText,
      ip: document.querySelector('.your-ip')?.innerText // To verify proxy is working
    };
  });

  await browser.close();

  return [{ json: data }];

} catch (error) {
  await browser.close();
  throw new Error(`Scraping failed: ${error.message}`);
}

Advanced Puppeteer Proxy with Rotation

For enterprise-grade scraping, implement proxy rotation with Puppeteer:

// Function Node: Puppeteer with Proxy Rotation
const puppeteer = require('puppeteer');

// Define your proxy pool
const proxyPool = [
  { server: 'proxy1.example.com:8080', username: 'user1', password: 'pass1' },
  { server: 'proxy2.example.com:8080', username: 'user2', password: 'pass2' },
  { server: 'proxy3.example.com:8080', username: 'user3', password: 'pass3' }
];

// Select proxy (round-robin, persisted in workflow static data;
// note that static data is only saved for production executions,
// not manual test runs)
const staticData = $getWorkflowStaticData('global');
const currentIndex = staticData.proxyIndex || 0;
const proxy = proxyPool[currentIndex % proxyPool.length];
staticData.proxyIndex = (currentIndex + 1) % proxyPool.length;

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    `--proxy-server=${proxy.server}`
  ]
});

try {
  const page = await browser.newPage();

  // Set proxy authentication
  await page.authenticate({
    username: proxy.username,
    password: proxy.password
  });

  // Set realistic headers to avoid detection
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');

  await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
  });

  // Navigate to target
  await page.goto('https://example.com/products', {
    waitUntil: 'networkidle2',
    timeout: 30000
  });

  // Wait for content to load
  await page.waitForSelector('.product-list', { timeout: 10000 });

  // Extract data
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.product-item')).map(item => ({
      name: item.querySelector('.product-name')?.innerText,
      price: item.querySelector('.product-price')?.innerText,
      url: item.querySelector('a')?.href
    }));
  });

  await browser.close();

  return [{
    json: {
      success: true,
      proxy: proxy.server,
      productsCount: products.length,
      products: products
    }
  }];

} catch (error) {
  await browser.close();

  return [{
    json: {
      success: false,
      proxy: proxy.server,
      error: error.message
    }
  }];
}

Handling SOCKS Proxies with Puppeteer

SOCKS5 proxies operate at the TCP level and can tunnel arbitrary traffic, which makes them useful when plain HTTP proxies are blocked or unsupported. One important caveat: Chromium ignores credentials embedded in the --proxy-server URL and does not support SOCKS authentication through page.authenticate(), so this setup assumes a proxy that authenticates you by IP allowlist:

const puppeteer = require('puppeteer');

// SOCKS5 proxy configuration (no embedded credentials - Chromium
// would silently ignore them)
const socksProxy = 'socks5://proxy-server.com:1080';

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    `--proxy-server=${socksProxy}`
  ]
});

const page = await browser.newPage();

// SOCKS authentication is not supported by Chromium; if your provider
// requires credentials, route through a local forwarder (see the
// proxy-chain sketch below)
await page.goto('https://example.com', {
  waitUntil: 'networkidle2'
});

const data = await page.evaluate(() => ({
  title: document.title,
  body: document.body.innerText.substring(0, 500)
}));

await browser.close();

return [{ json: data }];
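
If your proxy requires credentials that Chromium can't handle, a common workaround is a small local forwarding proxy that holds the credentials and exposes an unauthenticated endpoint to the browser. Here is a minimal sketch using the proxy-chain npm package (an assumption: the package must be installed and allowed via NODE_FUNCTION_ALLOW_EXTERNAL; check its documentation for current SOCKS upstream support):

// Sketch: local forwarder that strips credentials for Chromium
const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

const upstream = 'http://username:password@proxy-server.com:8080';

// anonymizeProxy starts a local proxy (e.g. http://127.0.0.1:54321)
// that forwards to the authenticated upstream
const localProxyUrl = await proxyChain.anonymizeProxy(upstream);

const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', `--proxy-server=${localProxyUrl}`]
});

const page = await browser.newPage();
// No page.authenticate() needed - the forwarder handles credentials
await page.goto('https://example.com', { waitUntil: 'networkidle2' });

const title = await page.title();

await browser.close();
// Shut down the local forwarder and close any open connections
await proxyChain.closeAnonymizedProxy(localProxyUrl, true);

return [{ json: { title } }];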

Method 3: Using Proxies with Playwright in n8n

Playwright is another browser automation tool similar to Puppeteer. When configuring Playwright with n8n, proxy setup follows a similar pattern:

// Function Node with Playwright
const { chromium } = require('playwright');

const browser = await chromium.launch({
  headless: true,
  proxy: {
    server: 'http://proxy-server.com:8080',
    username: 'your-username',
    password: 'your-password'
  }
});

try {
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
  });

  const page = await context.newPage();

  await page.goto('https://example.com', {
    waitUntil: 'networkidle'
  });

  const data = await page.evaluate(() => {
    return {
      title: document.title,
      heading: document.querySelector('h1')?.innerText
    };
  });

  await browser.close();

  return [{ json: data }];

} catch (error) {
  await browser.close();
  throw error;
}

Method 4: Using Proxy Services with API Integration

Many proxy providers offer API-based solutions that simplify proxy management. Here's an illustrative example using a residential proxy service (the endpoint and response shape are hypothetical; adapt them to your provider's documentation):

// Function Node: Using Proxy Service API
const axios = require('axios');

// Get a proxy from your provider's API
const proxyResponse = await axios.get('https://proxy-provider.com/api/get-proxy', {
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  params: {
    country: 'US',
    type: 'residential'
  }
});

// Assuming the provider returns a full proxy URL such as
// http://proxy-host.example.com:8080 - parse it instead of splitting
// on ':' (which breaks when the URL includes a scheme)
const { hostname, port, protocol } = new URL(proxyResponse.data.proxy_url);

// Use the proxy for your request
const response = await axios.get('https://example.com/api/data', {
  proxy: {
    protocol: protocol.replace(':', ''),
    host: hostname,
    port: parseInt(port, 10),
    auth: {
      username: 'your-username',
      password: 'your-password'
    }
  }
});

return [{ json: response.data }];

Verifying Proxy Configuration

Always verify that your proxy is working correctly:

// Function Node: Verify Proxy
const puppeteer = require('puppeteer');

// The proxy under test (replace with your own values)
const proxy = {
  server: 'proxy-server.com:8080',
  username: 'your-username',
  password: 'your-password'
};

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    `--proxy-server=${proxy.server}`
  ]
});

const page = await browser.newPage();

await page.authenticate({
  username: proxy.username,
  password: proxy.password
});

// Visit a site that shows your IP address
await page.goto('https://api.ipify.org?format=json', {
  waitUntil: 'networkidle2'
});

const ipData = await page.evaluate(() => {
  return JSON.parse(document.body.innerText);
});

await browser.close();

return [{
  json: {
    detectedIP: ipData.ip,
    proxyServer: proxy.server,
    proxyWorking: true
  }
}];

Best Practices for Using Proxies in n8n

1. Implement Retry Logic with Proxy Rotation

const maxRetries = 3;
let attempt = 0;
let success = false;
let result = null;

while (attempt < maxRetries && !success) {
  try {
    // Select different proxy for each attempt
    const proxy = proxyPool[attempt % proxyPool.length];

    // scrapeWithProxy is a placeholder for your own scraping function
    result = await scrapeWithProxy(proxy);
    success = true;

  } catch (error) {
    attempt++;

    if (attempt >= maxRetries) {
      throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
    }

    // Wait before retrying
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
}

return [{ json: result }];

2. Monitor Proxy Health

Track proxy performance and automatically remove failing proxies:

// Store proxy statistics in workflow static data
// (persisted between production executions)
const staticData = $getWorkflowStaticData('global');
const proxyStats = staticData.proxyStats || {};

if (!proxyStats[proxy.server]) {
  proxyStats[proxy.server] = {
    requests: 0,
    failures: 0,
    lastUsed: new Date()
  };
}

proxyStats[proxy.server].requests++;
proxyStats[proxy.server].lastUsed = new Date();

// Run this block inside your catch handler, where `error` is defined
if (error) {
  proxyStats[proxy.server].failures++;

  // Remove proxy if failure rate > 50%
  const failureRate = proxyStats[proxy.server].failures / proxyStats[proxy.server].requests;
  if (failureRate > 0.5 && proxyStats[proxy.server].requests > 10) {
    // Mark proxy as unhealthy
    proxyStats[proxy.server].healthy = false;
  }
}

staticData.proxyStats = proxyStats;
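
When selecting the next proxy, you can then skip any that have been marked unhealthy (a small sketch building on the stats object above):

// Filter out proxies previously marked unhealthy
const healthyProxies = proxyPool.filter(p => {
  const stats = proxyStats[p.server];
  return !stats || stats.healthy !== false;
});

if (healthyProxies.length === 0) {
  throw new Error('No healthy proxies left in the pool');
}

const proxy = healthyProxies[Math.floor(Math.random() * healthyProxies.length)];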

3. Use Appropriate Proxy Types

  • Datacenter Proxies: Fast and cheap, but easier to detect
  • Residential Proxies: More expensive but appear as real user IPs
  • Rotating Proxies: Automatically change IP with each request
  • Static Residential: Maintain same IP for session-based scraping
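
As a hypothetical illustration, you might keep a separate pool per type and choose one based on the target:

// Hypothetical sketch: separate pools per proxy type
const proxyPools = {
  datacenter: ['http://user:pass@dc1.example.com:8080'],
  residential: ['http://user:pass@res1.example.com:8080']
};

// Use residential proxies for heavily protected targets,
// cheaper datacenter proxies everywhere else
const protectedHosts = ['shop.example.com'];
const targetUrl = 'https://shop.example.com/products';
const type = protectedHosts.includes(new URL(targetUrl).hostname)
  ? 'residential'
  : 'datacenter';

const pool = proxyPools[type];
const proxy = pool[Math.floor(Math.random() * pool.length)];

return [{ json: { proxy, type, targetUrl } }];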

4. Respect Rate Limits

Add delays between requests even when using proxies:

// Add a random delay between 1-3 seconds
// (page.waitForTimeout was removed in recent Puppeteer versions,
// so use a plain Promise-based sleep instead)
const delay = Math.floor(Math.random() * 2000) + 1000;
await new Promise(resolve => setTimeout(resolve, delay));

5. Handle Proxy Timeouts

Set appropriate timeouts to avoid stuck workflows:

const browser = await puppeteer.launch({
  headless: true,
  args: [`--proxy-server=${proxy.server}`],
  timeout: 30000 // Browser launch timeout
});

const page = await browser.newPage();
page.setDefaultTimeout(30000); // Page action timeout

try {
  await page.goto('https://example.com', {
    waitUntil: 'networkidle2',
    timeout: 30000 // Navigation timeout
  });
} catch (error) {
  if (error.name === 'TimeoutError') {
    // Handle the timeout - retryWithDifferentProxy is a placeholder
    // for your own retry logic using the next proxy in the pool
    return await retryWithDifferentProxy();
  }
  throw error;
}

Combining Proxies with Other Anti-Detection Techniques

For robust scraping operations, combine proxies with other techniques when handling browser sessions in Puppeteer:

const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-blink-features=AutomationControlled',
    `--proxy-server=${proxy.server}`
  ]
});

const page = await browser.newPage();

// Proxy authentication
await page.authenticate({
  username: proxy.username,
  password: proxy.password
});

// Hide automation indicators
await page.evaluateOnNewDocument(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => false
  });

  window.navigator.chrome = {
    runtime: {}
  };

  Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5]
  });
});

// Set realistic viewport
await page.setViewport({
  width: 1920,
  height: 1080,
  deviceScaleFactor: 1
});

// Random user agent
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];

await page.setUserAgent(userAgents[Math.floor(Math.random() * userAgents.length)]);

// Add realistic headers
await page.setExtraHTTPHeaders({
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Referer': 'https://www.google.com/'
});

// Continue with scraping...
await page.goto('https://example.com');

const data = await page.evaluate(() => ({
  content: document.body.innerText
}));

await browser.close();

return [{ json: data }];

Troubleshooting Common Proxy Issues

Issue 1: "Proxy authentication required"

Solution: Ensure credentials are correctly formatted:

// Correct format
await page.authenticate({
  username: 'your-username', // No URL encoding
  password: 'your-password'
});

// Or include in proxy URL
const proxyUrl = `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@proxy.com:8080`;

Issue 2: "Proxy connection timeout"

Solution: Test proxy connectivity separately and increase timeouts:

// Test the proxy first (proxyHost and proxyPort are your proxy's details)
const testResponse = await axios.get('https://httpbin.org/ip', {
  proxy: {
    host: proxyHost,
    port: proxyPort
  },
  timeout: 10000
});

if (testResponse.status === 200) {
  // Proxy is working - httpbin echoes the IP your request came from
  console.log('Proxy IP:', testResponse.data.origin);
}

Issue 3: "Too many requests / IP banned"

Solution: Implement proper proxy rotation and delays:

// Rotate through proxy pool
const proxyIndex = Math.floor(Math.random() * proxyPool.length);
const proxy = proxyPool[proxyIndex];

// Add random delay
const delay = Math.floor(Math.random() * 3000) + 2000;
await new Promise(resolve => setTimeout(resolve, delay));

Using WebScraping.AI as a Proxy Alternative

For developers who want to avoid the complexity of managing proxies, WebScraping.AI provides a scraping API that handles proxy rotation, JavaScript rendering, and anti-bot detection automatically:

// Function Node: Using WebScraping.AI API
const axios = require('axios');

const response = await axios.get('https://api.webscraping.ai/html', {
  params: {
    api_key: 'YOUR_API_KEY',
    url: 'https://example.com',
    js: true, // Enable JavaScript rendering
    proxy: 'residential' // Automatic proxy selection
  }
});

const html = response.data;

// Process HTML as needed
return [{ json: { html } }];

This approach eliminates the need to manage proxy pools, handle authentication, or deal with browser automation complexity, allowing you to focus on extracting and processing data in your n8n workflows.

Conclusion

Configuring proxies in n8n for web scraping provides essential capabilities for building robust, scalable scraping workflows. Whether using the built-in HTTP Request node, Puppeteer, or Playwright, proper proxy configuration helps avoid rate limits, access geo-restricted content, and maintain anonymity.

Key takeaways:

  • Use HTTP Request node proxies for simple API scraping
  • Implement Puppeteer or Playwright proxies for JavaScript-heavy sites
  • Always rotate proxies and implement retry logic
  • Monitor proxy health and performance
  • Combine proxies with other anti-detection techniques
  • Consider managed solutions like WebScraping.AI for production use

By following the examples and best practices in this guide, you can build reliable web scraping workflows that scale with your data extraction needs while respecting target websites and avoiding detection.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
