What are Common Errors in n8n Scraping and How to Fix Them?

Web scraping with n8n can be incredibly powerful for automating data extraction, but like any automation tool, it comes with its share of challenges. Understanding common errors and their solutions is crucial for building reliable scraping workflows. This comprehensive guide covers the most frequent n8n scraping errors and provides practical solutions to resolve them.

1. Timeout Errors

The Problem

Timeout errors are among the most common issues in n8n scraping workflows. They occur when a request takes longer than the configured timeout period to complete.

Error: Request timed out after 30000ms

Root Causes

  • Slow server response times
  • Heavy JavaScript execution on dynamic pages
  • Network connectivity issues
  • Insufficient timeout configuration

Solutions

Increase Timeout Values

In your HTTP Request node or Puppeteer node, increase the timeout setting:

// In n8n Code Node
const options = {
  timeout: 60000, // Increase to 60 seconds
  waitUntil: 'networkidle2'
};

Implement Retry Logic

Add an error workflow that retries failed requests with exponential backoff:

// Retry configuration in n8n
const maxRetries = 3;
const retryDelay = 2000;

for (let i = 0; i < maxRetries; i++) {
  try {
    // Your scraping code here
    break;
  } catch (error) {
    if (i === maxRetries - 1) throw error;
    // Exponential backoff between attempts (2s, then 4s)
    await new Promise(resolve => setTimeout(resolve, retryDelay * 2 ** i));
  }
}

Optimize Page Load Strategy

When using Puppeteer nodes, adjust the wait strategy to avoid unnecessarily long waits:

await page.goto(url, {
  waitUntil: 'domcontentloaded', // Faster than 'networkidle2'
  timeout: 30000
});

As with handling timeouts in Puppeteer generally, properly configured wait conditions can significantly reduce timeout errors.

2. Element Not Found / Selector Errors

The Problem

This error occurs when n8n cannot locate elements on the page using your CSS or XPath selectors:

Error: Element not found for selector: .product-price

Root Causes

  • Incorrect or outdated selectors
  • Elements loaded dynamically via JavaScript
  • Content rendered inside iframes
  • Changes to website structure

Solutions

Wait for Elements to Load

Ensure elements are present before attempting to extract data:

// In n8n Code Node with Puppeteer
await page.waitForSelector('.product-price', {
  visible: true,
  timeout: 10000
});

const price = await page.$eval('.product-price', el => el.textContent);

Use More Robust Selectors

Prefer stable attributes over dynamic class names:

// Instead of: .css-abc123-product
// Use: [data-testid="product-price"]
// Or: #product-price

const price = await page.$eval('[data-testid="product-price"]', el => el.textContent);

Handle Dynamic Content

For single-page applications and AJAX-loaded content:

// Wait for network to be idle
await page.waitForNetworkIdle({
  timeout: 5000,
  idleTime: 500
});

// Or wait until the target element has been rendered
await page.waitForFunction(() => {
  return document.querySelector('.product-price') !== null;
});

When dealing with content in iframes, you'll need to handle iframes in Puppeteer by first accessing the frame context.
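
A minimal sketch of reading a value from inside an iframe, assuming the frame can be identified by part of its URL (the URL fragment and selector below are placeholders):

// In n8n Code Node with Puppeteer
// Locate the frame by URL, then query inside it
const frame = page.frames().find(f => f.url().includes('/embedded-widget'));

if (frame) {
  await frame.waitForSelector('.product-price', { timeout: 10000 });
  const price = await frame.$eval('.product-price', el => el.textContent);
}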

Add Error Handling

Implement fallback selectors:

// Try multiple selector strategies
let price;
try {
  price = await page.$eval('.product-price', el => el.textContent);
} catch (e) {
  try {
    price = await page.$eval('#price', el => el.textContent);
  } catch (e) {
    price = await page.$eval('[itemprop="price"]', el => el.textContent);
  }
}

3. Authentication and Access Denied Errors

The Problem

Error: 403 Forbidden
Error: 401 Unauthorized
Error: Access Denied

Root Causes

  • Missing or incorrect authentication credentials
  • Required cookies or session tokens
  • IP-based restrictions
  • User-agent blocking

Solutions

Set Proper Headers

Configure realistic browser headers in your HTTP Request node:

const headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  'Accept-Encoding': 'gzip, deflate, br',
  'Referer': 'https://example.com',
  'Connection': 'keep-alive'
};

Handle Authentication

For sites requiring login, properly manage session cookies:

// Login workflow
await page.goto('https://example.com/login');
await page.type('#username', credentials.username);
await page.type('#password', credentials.password);
await page.click('button[type="submit"]');
await page.waitForNavigation();

// Save cookies for subsequent requests
const cookies = await page.cookies();
// Store cookies in n8n credentials or static data

For detailed authentication strategies, refer to handling authentication in Puppeteer.
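
To reuse the saved session in a later run, a rough sketch (assuming the cookies array captured above has been stored, for example in workflow static data, and loaded into savedCookies):

// Restore previously saved cookies before visiting protected pages
// (savedCookies is assumed to hold the array captured after login)
await page.setCookie(...savedCookies);
await page.goto('https://example.com/account', { waitUntil: 'domcontentloaded' });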

Use Proxy Servers

When facing IP-based restrictions, configure proxy settings in your n8n workflow to rotate IP addresses.
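
As an illustration, launching Puppeteer through a proxy can look roughly like this (the proxy address and credentials are placeholders; the HTTP Request node also exposes a proxy option in its settings):

// Launch Chromium through a proxy (placeholder address and credentials)
const browser = await puppeteer.launch({
  args: ['--proxy-server=http://proxy.example.com:8080']
});

const page = await browser.newPage();

// Authenticate against the proxy if it requires credentials
await page.authenticate({
  username: 'proxy-user',
  password: 'proxy-pass'
});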

4. Rate Limiting and Blocking

The Problem

Error: 429 Too Many Requests
Error: Your requests are being blocked

Root Causes

  • Too many requests in a short timeframe
  • Lack of delays between requests
  • Bot detection systems

Solutions

Add Delays Between Requests

Implement wait times to simulate human behavior:

// In n8n Code Node
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Add random delay between 2-5 seconds
const randomDelay = Math.floor(Math.random() * 3000) + 2000;
await delay(randomDelay);

Implement Request Throttling

Use n8n's Loop Over Items (Split in Batches) node with a Wait node between iterations, or control the rate directly in a Code Node:

// Process items with controlled rate
for (const item of items) {
  // Scrape item
  await scrapeItem(item);

  // Wait before next request
  await new Promise(resolve => setTimeout(resolve, 3000));
}

Rotate User Agents

Randomize user agents to appear as different browsers:

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0.4472.124',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) Firefox/89.0'
];

const randomUA = userAgents[Math.floor(Math.random() * userAgents.length)];
await page.setUserAgent(randomUA);

5. Memory and Performance Issues

The Problem

Error: JavaScript heap out of memory
Error: Workflow execution timed out

Root Causes

  • Processing too much data at once
  • Memory leaks in long-running workflows
  • Not closing browser instances properly

Solutions

Process Data in Batches

Split large datasets into smaller chunks:

// Process in batches of 10
const batchSize = 10;
const results = [];

for (let i = 0; i < items.length; i += batchSize) {
  const batch = items.slice(i, i + batchSize);
  const batchResults = await processBatch(batch);
  results.push(...batchResults);

  // Optional: Add delay between batches
  await new Promise(resolve => setTimeout(resolve, 1000));
}

return results;

Close Browser Instances

Always close Puppeteer browser instances to free memory:

// Assumes puppeteer is available to the Code Node
// (external modules must be allowed, e.g. via NODE_FUNCTION_ALLOW_EXTERNAL)
const puppeteer = require('puppeteer');

let browser;
try {
  browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Your scraping code
} finally {
  if (browser) {
    await browser.close();
  }
}

Optimize Data Extraction

Only extract the data you need:

// Instead of returning entire HTML
const data = await page.evaluate(() => {
  return {
    // Optional chaining avoids throwing if an element is missing
    title: document.querySelector('h1')?.textContent.trim(),
    price: document.querySelector('.price')?.textContent.trim()
  };
});

6. JSON Parsing Errors

The Problem

Error: Unexpected token < in JSON at position 0
Error: Cannot parse JSON response

Root Causes

  • Receiving HTML instead of JSON
  • Malformed JSON responses
  • Content-type mismatches

Solutions

Validate Response Type

Check the content type before parsing:

const response = await fetch(url);
const contentType = response.headers.get('content-type');

if (contentType && contentType.includes('application/json')) {
  const data = await response.json();
} else {
  // Handle HTML or other response types
  const text = await response.text();
  console.error('Received non-JSON response:', text.substring(0, 200));
}

Add Try-Catch for Parsing

Gracefully handle parsing errors:

let data;
try {
  data = JSON.parse(responseText);
} catch (e) {
  console.error('JSON parsing failed:', e.message);
  // Try to extract JSON from HTML
  const jsonMatch = responseText.match(/<script[^>]*>({[\s\S]*?})<\/script>/);
  if (jsonMatch) {
    data = JSON.parse(jsonMatch[1]);
  } else {
    throw new Error('Unable to parse response as JSON');
  }
}

7. XPath and CSS Selector Errors

The Problem

Error: Invalid selector
Error: XPath expression is not valid

Solutions

Test Selectors in Browser Console

Before using in n8n, test your selectors:

// CSS Selector test
document.querySelector('.product-name')

// XPath test
$x('//div[@class="product-name"]')

Use Proper Escaping

Escape special characters in selectors:

// For class names with special characters
const priceElement = await page.$('[class*="price-value"]');

// For attribute values containing quotes
const productElement = await page.$('div[data-product="Product \\"Name\\""]');

Validate XPath Syntax

Use correct XPath syntax in n8n:

// XPath in n8n (note: page.$x was removed in newer Puppeteer versions,
// where page.$$('xpath/...') is used instead)
const xpath = '//div[@class="product"]//span[@class="price"]/text()';
const elements = await page.$x(xpath);

Best Practices for Error Prevention

  1. Implement Comprehensive Error Handling: Wrap all scraping operations in try-catch blocks
  2. Use Wait Strategies: Always wait for elements before interaction
  3. Log Detailed Information: Include timestamps, URLs, and error messages in logs (see the logging sketch after this list)
  4. Test Incrementally: Build and test workflows step-by-step
  5. Monitor Workflow Execution: Set up alerts for failed executions
  6. Keep Selectors Updated: Regularly review and update selectors when sites change
  7. Use Webhook Triggers Carefully: Implement proper validation and rate limiting
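
For point 3, a small helper keeps log entries consistent and timestamped (the field names are only an example):

// Minimal structured logging helper for n8n Code Nodes
const logEvent = (step, url, message) => {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    step,
    url,
    message
  }));
};

logEvent('extract-price', 'https://example.com/product/1', 'Selector .product-price not found');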

Debugging Tips

Enable Verbose Logging

console.log('Starting scraping process...');
console.log('URL:', url);
console.log('Current step:', stepName);
console.log('Data extracted:', JSON.stringify(data, null, 2));

Take Screenshots on Errors

try {
  // Your scraping code
} catch (error) {
  await page.screenshot({ path: 'error-screenshot.png' });
  throw error;
}

Use n8n's Error Workflow

Configure an error workflow to catch and handle all workflow errors, sending notifications or logging to external systems.
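
As a rough sketch, a Code Node placed after the Error Trigger node can format the failure details into a notification message. The payload structure varies between n8n versions, so treat the property names below as assumptions to check against your own error data:

// In a Code Node following the Error Trigger node
// (workflow.name, execution.url and execution.error.message are assumed field names)
const errorData = $input.first().json;

const message = [
  `Workflow failed: ${errorData.workflow?.name}`,
  `Execution URL: ${errorData.execution?.url}`,
  `Error: ${errorData.execution?.error?.message}`
].join('\n');

return [{ json: { message } }];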

Conclusion

Understanding and resolving common n8n scraping errors is essential for building robust automation workflows. By implementing proper error handling, using appropriate wait strategies, respecting rate limits, and following best practices, you can create reliable scraping workflows that handle edge cases gracefully.

Remember that web scraping is inherently fragile due to changing website structures and anti-bot measures. Regular maintenance, monitoring, and updates to your n8n workflows will ensure long-term reliability. Start with small, well-tested workflows and gradually expand functionality while maintaining comprehensive error handling at each step.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
