What are the best n8n templates for web scraping?

n8n templates are pre-built workflows that help developers quickly set up web scraping automation without building everything from scratch. These templates provide battle-tested patterns for common scraping scenarios, saving significant development time while following best practices for data extraction, error handling, and workflow orchestration.

Understanding n8n Templates for Web Scraping

n8n templates are JSON-based workflow definitions that can be imported directly into your n8n instance. They typically include multiple nodes configured to work together, handle edge cases, and demonstrate effective patterns for web scraping tasks. Templates range from simple single-page scrapers to complex multi-step workflows with data transformation, storage, and notification systems.
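
At its simplest, a template is a JSON document with a name, a list of nodes, and a connections map; the skeleton below (trigger choice and positions are illustrative) is the shape every template in this guide builds on:

{
  "name": "Template Skeleton",
  "nodes": [
    {
      "parameters": {},
      "name": "Manual Trigger",
      "type": "n8n-nodes-base.manualTrigger",
      "position": [100, 300]
    }
  ],
  "connections": {}
}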

Top n8n Web Scraping Templates

1. Basic HTML Scraping Template

This foundational template extracts data from static HTML pages using CSS selectors.

Template Structure:

- HTTP Request node: Fetches the target webpage
- HTML Extract node: Parses HTML and extracts data
- Set node: Formats extracted data
- Storage node: Saves to database or spreadsheet

Implementation:

{
  "name": "Basic HTML Scraper",
  "nodes": [
    {
      "parameters": {
        "url": "={{$json[\"url\"]}}",
        "options": {
          "timeout": 30000
        }
      },
      "name": "HTTP Request",
      "type": "n8n-nodes-base.httpRequest",
      "position": [250, 300]
    },
    {
      "parameters": {
        "mode": "htmlExtract",
        "dataPropertyName": "data",
        "extractionValues": {
          "values": [
            {
              "key": "title",
              "cssSelector": "h1.product-title",
              "returnValue": "text"
            },
            {
              "key": "price",
              "cssSelector": ".price-value",
              "returnValue": "text"
            },
            {
              "key": "image",
              "cssSelector": "img.product-image",
              "returnValue": "attribute",
              "attribute": "src"
            }
          ]
        }
      },
      "name": "Extract Data",
      "type": "n8n-nodes-base.html",
      "position": [450, 300]
    }
  ],
  "connections": {
    "HTTP Request": {
      "main": [[{"node": "Extract Data", "type": "main", "index": 0}]]
    }
  }
}

Use Cases:

- Product information scraping
- Blog post extraction
- News article collection
- Real estate listings

2. JavaScript-Rendered Content Template

For modern websites using React, Vue, or Angular that require JavaScript execution.

Template Features:

- WebScraping.AI API integration
- JavaScript rendering enabled
- Configurable wait times
- Proxy support

{
  "name": "JS-Rendered Scraper",
  "nodes": [
    {
      "parameters": {
        "method": "GET",
        "url": "https://api.webscraping.ai/html",
        "authentication": "queryAuth",
        "queryParameters": {
          "parameters": [
            {
              "name": "api_key",
              "value": "={{$env.WEBSCRAPING_AI_KEY}}"
            },
            {
              "name": "url",
              "value": "={{$json[\"target_url\"]}}"
            },
            {
              "name": "js",
              "value": "true"
            },
            {
              "name": "js_timeout",
              "value": "5000"
            },
            {
              "name": "proxy",
              "value": "datacenter"
            }
          ]
        },
        "options": {
          "response": {
            "response": {
              "fullResponse": true
            }
          }
        }
      },
      "name": "Fetch with JS",
      "type": "n8n-nodes-base.httpRequest",
      "position": [250, 300]
    },
    {
      "parameters": {
        "jsCode": "const html = items[0].json.body;\nconst cheerio = require('cheerio');\nconst $ = cheerio.load(html);\n\nconst products = [];\n$('.product-card').each((i, el) => {\n  products.push({\n    name: $(el).find('.name').text().trim(),\n    price: $(el).find('.price').text().trim(),\n    rating: $(el).find('.rating').attr('data-rating'),\n    url: $(el).find('a').attr('href')\n  });\n});\n\nreturn products.map(p => ({json: p}));"
      },
      "name": "Parse HTML",
      "type": "n8n-nodes-base.code",
      "position": [450, 300]
    }
  ]
}

Use Cases:

- Single-page applications (SPAs)
- Social media content
- Dynamic pricing pages
- AJAX-loaded content

3. Pagination Scraping Template

This template handles multi-page scraping with automatic pagination detection and processing.

Workflow Logic:

// Pagination sketch for a Code node: fetchPage, extractData, and sleep are
// placeholders for logic implemented via the nodes listed below, and cheerio
// must be allowed via NODE_FUNCTION_ALLOW_EXTERNAL
const cheerio = require('cheerio');

// Initialize pagination
const startPage = 1;
const maxPages = 100;
let currentPage = startPage;
let allResults = [];

// Pagination loop function
function buildPaginationUrl(baseUrl, page) {
  return `${baseUrl}?page=${page}`;
}

// Check if next page exists
function hasNextPage(html) {
  const $ = cheerio.load(html);
  return $('.next-page').length > 0 || $('.pagination .next').length > 0;
}

// Main extraction logic
while (currentPage <= maxPages) {
  const url = buildPaginationUrl($json.base_url, currentPage);

  // Fetch page (implement via HTTP Request node)
  const html = await fetchPage(url);

  // Extract data (implement via HTML Extract or Code node)
  const pageData = extractData(html);
  allResults.push(...pageData);

  // Check for next page
  if (!hasNextPage(html)) break;

  currentPage++;

  // Rate limiting
  await sleep(1000);
}

return allResults.map(item => ({json: item}));

Template Configuration:

1. Loop node: Iterates through pages
2. HTTP Request node: Fetches each page
3. Code node: Extracts data and checks for next page (see the Code-node sketch below)
4. IF node: Determines whether to continue pagination
5. Merge node: Combines results from all pages
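
As referenced in step 3, here is a minimal sketch of the Code-node half of the loop. It assumes the fetched HTML arrives in items[0].json.body, that cheerio is allowed via NODE_FUNCTION_ALLOW_EXTERNAL, and that the selectors are placeholders; the IF node can then branch on the expression {{ $json.hasNext }}:

// Extract this page's items and decide whether the loop should continue
const cheerio = require('cheerio');
const $ = cheerio.load(items[0].json.body);

const pageData = [];
$('.result-item').each((i, el) => {
  pageData.push({ title: $(el).find('.title').text().trim() });
});

const hasNext = $('.pagination .next').length > 0;
const currentPage = items[0].json.page || 1;

return [{
  json: {
    results: pageData,
    hasNext: hasNext,
    page: currentPage,
    nextUrl: hasNext ? `${items[0].json.base_url}?page=${currentPage + 1}` : null
  }
}];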

Use Cases:

- E-commerce product catalogs
- Search result scraping
- Directory listings
- Forum post extraction

4. Scheduled Monitoring Template

Automates regular data collection with change detection and notifications.

Template Structure:

{
  "name": "Price Monitor",
  "nodes": [
    {
      "parameters": {
        "triggerTimes": {
          "item": [
            {
              "mode": "everyHour",
              "hour": 6
            }
          ]
        }
      },
      "name": "Schedule Trigger",
      "type": "n8n-nodes-base.cron",
      "position": [100, 300]
    },
    {
      "parameters": {
        "operation": "executeQuery",
        "query": "SELECT id, url, last_price FROM products WHERE active = true"
      },
      "name": "Load Products",
      "type": "n8n-nodes-base.postgres",
      "position": [300, 300]
    },
    {
      "parameters": {
        "batchSize": 10,
        "options": {}
      },
      "name": "Split In Batches",
      "type": "n8n-nodes-base.splitInBatches",
      "position": [500, 300]
    },
    {
      "parameters": {
        "method": "GET",
        "url": "https://api.webscraping.ai/html",
        "queryParameters": {
          "parameters": [
            {
              "name": "api_key",
              "value": "={{$env.API_KEY}}"
            },
            {
              "name": "url",
              "value": "={{$json[\"url\"]}}"
            },
            {
              "name": "js",
              "value": "true"
            }
          ]
        }
      },
      "name": "Scrape Current Price",
      "type": "n8n-nodes-base.httpRequest",
      "position": [700, 300]
    },
    {
      "parameters": {
        "jsCode": "const html = items[0].json.body;\nconst $ = cheerio.load(html);\nconst currentPrice = parseFloat($('.price').text().replace(/[^0-9.]/g, ''));\nconst lastPrice = items[0].json.last_price;\nconst priceChange = ((currentPrice - lastPrice) / lastPrice * 100).toFixed(2);\n\nreturn [{\n  json: {\n    product_id: items[0].json.id,\n    url: items[0].json.url,\n    current_price: currentPrice,\n    last_price: lastPrice,\n    price_change_percent: priceChange,\n    alert: Math.abs(priceChange) > 10\n  }\n}];"
      },
      "name": "Compare Prices",
      "type": "n8n-nodes-base.code",
      "position": [900, 300]
    }
  ]
}

Use Cases:

- Competitor price monitoring
- Stock availability tracking
- Content change detection
- Job posting alerts

5. Multi-Source Aggregation Template

Collects data from multiple websites simultaneously and aggregates results.

Implementation Pattern:

// Define sources to scrape
const sources = [
  {
    name: 'Source A',
    url: 'https://example-a.com/products',
    selector: '.product-item'
  },
  {
    name: 'Source B',
    url: 'https://example-b.com/listings',
    selector: '.listing-card'
  },
  {
    name: 'Source C',
    url: 'https://example-c.com/items',
    selector: '.item-box'
  }
];

// Process each source in parallel
const results = await Promise.all(
  sources.map(async source => {
    const html = await fetchWithWebScrapingAI(source.url);
    return extractData(html, source.selector, source.name);
  })
);

// Aggregate and normalize
const aggregated = results.flat().map(item => ({
  json: {
    ...item,
    normalized_price: convertToUSD(item.price, item.currency),
    scraped_at: new Date().toISOString()
  }
}));

return aggregated;

Template Nodes:

1. Split In Batches: Divides sources into groups
2. HTTP Request: Fetches from each source
3. Code: Extracts and normalizes data
4. Merge: Combines all results
5. Dedupe: Removes duplicates
6. Sort: Orders by relevance or price

Use Cases:

- Price comparison engines
- Job board aggregation
- News aggregation
- Market research

6. API-First Scraping Template

Leverages WebScraping.AI's advanced features for production-grade scraping.

Advanced Configuration:

{
  "name": "API-First Scraper",
  "nodes": [
    {
      "parameters": {
        "method": "GET",
        "url": "https://api.webscraping.ai/selected",
        "queryParameters": {
          "parameters": [
            {
              "name": "api_key",
              "value": "={{$env.WEBSCRAPING_AI_KEY}}"
            },
            {
              "name": "url",
              "value": "={{$json[\"target_url\"]}}"
            },
            {
              "name": "selector",
              "value": "={{$json[\"css_selector\"]}}"
            },
            {
              "name": "js",
              "value": "true"
            },
            {
              "name": "proxy",
              "value": "residential"
            },
            {
              "name": "headers",
              "value": "={{JSON.stringify({'User-Agent': 'Custom-Agent'})}}
            },
            {
              "name": "timeout",
              "value": "20000"
            }
          ]
        },
        "options": {
          "response": {
            "response": {
              "neverError": true
            }
          }
        }
      },
      "name": "WebScraping.AI Selected",
      "type": "n8n-nodes-base.httpRequest",
      "position": [250, 300]
    }
  ]
}

Features:

- Automatic proxy rotation
- JavaScript rendering
- Custom headers support
- Timeout handling for reliable execution
- Residential proxy support

7. AI-Powered Data Extraction Template

Uses WebScraping.AI's question-answering endpoint for intelligent data extraction.

Template Example:

{
  "name": "AI Extraction",
  "nodes": [
    {
      "parameters": {
        "method": "GET",
        "url": "https://api.webscraping.ai/question",
        "queryParameters": {
          "parameters": [
            {
              "name": "api_key",
              "value": "={{$env.API_KEY}}"
            },
            {
              "name": "url",
              "value": "={{$json[\"product_url\"]}}"
            },
            {
              "name": "question",
              "value": "Extract the product name, price in USD, brand name, availability status, shipping cost, and customer rating. Return as JSON."
            }
          ]
        }
      },
      "name": "AI Extract",
      "type": "n8n-nodes-base.httpRequest",
      "position": [250, 300]
    },
    {
      "parameters": {
        "jsCode": "const answer = JSON.parse(items[0].json.body);\n\nreturn [{\n  json: {\n    product_name: answer.product_name,\n    price_usd: parseFloat(answer.price),\n    brand: answer.brand,\n    in_stock: answer.availability === 'in stock',\n    shipping_cost: parseFloat(answer.shipping || 0),\n    rating: parseFloat(answer.rating),\n    extracted_at: new Date().toISOString()\n  }\n}];"
      },
      "name": "Parse AI Response",
      "type": "n8n-nodes-base.code",
      "position": [450, 300]
    }
  ]
}

Use Cases:

- Unstructured data extraction
- Multi-format content parsing
- Complex data relationships
- Schema-less scraping

8. Error Handling and Retry Template

Robust template with comprehensive error handling, retries, and fallback mechanisms.

Error Workflow Structure:

// Retry configuration (fetchUrl and sleep are placeholders implemented
// via HTTP Request / Wait nodes or helper functions)
const MAX_RETRIES = 3;
const BACKOFF_MULTIPLIER = 2;

async function scrapeWithRetry(url) {
  let retryCount = 0;
  let lastError = null;

  while (retryCount < MAX_RETRIES) {
    try {
      const response = await fetchUrl(url);

      if (response.statusCode === 200) {
        return response.body;
      } else if (response.statusCode === 429) {
        // Rate limited - back off exponentially
        const waitTime = Math.pow(BACKOFF_MULTIPLIER, retryCount) * 1000;
        await sleep(waitTime);
        retryCount++;
        continue;
      } else if (response.statusCode >= 500) {
        // Server error - retry after a fixed delay
        retryCount++;
        await sleep(2000);
        continue;
      } else {
        // Other client errors - don't retry
        throw new Error(`HTTP ${response.statusCode}: ${response.statusMessage}`);
      }
    } catch (error) {
      lastError = error;
      retryCount++;

      if (retryCount >= MAX_RETRIES) break;

      // Linear backoff between retries
      await sleep(1000 * retryCount);
    }
  }

  // All retries exhausted - return a structured failure item
  return {
    json: {
      success: false,
      url: url,
      error: lastError ? lastError.message : 'Max retries exceeded',
      retries: retryCount,
      timestamp: new Date().toISOString()
    }
  };
}

Template Components:

- Error trigger node (a minimal notification sketch follows this list)
- Retry logic with exponential backoff
- Error logging to database
- Slack/Email notifications
- Fallback data sources
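
A minimal sketch of the error-notification piece: a separate workflow that starts with an Error Trigger node and posts to Slack, then gets assigned as the main workflow's error workflow in its settings. The channel name is a placeholder, and the payload fields follow the Error Trigger's standard output (verify parameter names against your n8n version):

{
  "name": "Scraper Error Handler",
  "nodes": [
    {
      "parameters": {},
      "name": "Error Trigger",
      "type": "n8n-nodes-base.errorTrigger",
      "position": [100, 300]
    },
    {
      "parameters": {
        "channel": "#scraper-alerts",
        "text": "=Workflow {{$json[\"workflow\"][\"name\"]}} failed: {{$json[\"execution\"][\"error\"][\"message\"]}}"
      },
      "name": "Notify Slack",
      "type": "n8n-nodes-base.slack",
      "position": [300, 300]
    }
  ],
  "connections": {
    "Error Trigger": {
      "main": [[{"node": "Notify Slack", "type": "main", "index": 0}]]
    }
  }
}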

Use Cases:

- Production scraping workflows
- High-reliability data collection
- Mission-critical monitoring

9. Data Validation and Cleaning Template

Ensures data quality with validation, cleaning, and normalization steps.

Validation Logic:

function validateAndCleanProduct(item) {
  const errors = [];
  const cleaned = {};

  // Validate and clean name
  if (!item.name || item.name.trim().length === 0) {
    errors.push('Missing product name');
  } else {
    cleaned.name = item.name.trim()
      .replace(/\s+/g, ' ')
      .substring(0, 255);
  }

  // Validate and clean price (note: a lone comma is treated as a decimal
  // separator; adjust for locales that use commas as thousands separators)
  const priceMatch = item.price?.match(/[\d.,]+/);
  if (priceMatch) {
    cleaned.price = parseFloat(priceMatch[0].replace(',', '.'));
    if (cleaned.price <= 0 || cleaned.price > 1000000) {
      errors.push('Invalid price range');
    }
  } else {
    errors.push('Invalid price format');
  }

  // Validate URL
  try {
    cleaned.url = new URL(item.url).href;
  } catch (e) {
    errors.push('Invalid URL format');
  }

  // Validate email if present
  if (item.email) {
    const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    if (emailRegex.test(item.email)) {
      cleaned.email = item.email.toLowerCase();
    } else {
      errors.push('Invalid email format');
    }
  }

  // Clean and validate phone
  if (item.phone) {
    cleaned.phone = item.phone.replace(/\D/g, '');
    if (cleaned.phone.length < 10) {
      errors.push('Invalid phone number');
    }
  }

  return {
    json: {
      ...cleaned,
      is_valid: errors.length === 0,
      validation_errors: errors,
      validated_at: new Date().toISOString()
    }
  };
}

// Apply validation to all items
return items.map(item => validateAndCleanProduct(item.json));

Template Features:

- Schema validation
- Data type conversion
- Format normalization
- Duplicate detection (see the dedupe sketch below)
- Quality scoring
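
The duplicate-detection step can be a small Code node keyed on a normalized URL; a minimal sketch (the normalization rules are illustrative):

// Deduplicate items by normalized URL, keeping the first occurrence
const seen = new Set();
const unique = [];

for (const item of items) {
  const key = item.json.url
    ? item.json.url.toLowerCase().replace(/\/$/, '')
    : JSON.stringify(item.json);
  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}

return unique;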

10. Webhook-Triggered Scraping Template

On-demand scraping triggered by external systems or user requests.

Webhook Configuration:

{
  "name": "Webhook Scraper",
  "nodes": [
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "scrape-product",
        "authentication": "headerAuth",
        "options": {
          "rawBody": true
        }
      },
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "webhookId": "scrape-webhook",
      "position": [100, 300]
    },
    {
      "parameters": {
        "jsCode": "const payload = items[0].json.body;\n\n// Validate request\nif (!payload.url) {\n  throw new Error('URL is required');\n}\n\n// Parse options\nreturn [{\n  json: {\n    target_url: payload.url,\n    selector: payload.selector || 'body',\n    js_render: payload.js_render !== false,\n    proxy_type: payload.proxy || 'datacenter',\n    webhook_id: payload.request_id || Date.now()\n  }\n}];"
      },
      "name": "Parse Request",
      "type": "n8n-nodes-base.code",
      "position": [300, 300]
    }
  ]
}
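
The callers below expect a JSON response, so the workflow should end with a Respond to Webhook node; the Webhook node above sets responseMode to responseNode for exactly this reason. A minimal sketch of that final node (respondWith values vary by node version):

{
  "parameters": {
    "respondWith": "firstIncomingItem"
  },
  "name": "Respond to Webhook",
  "type": "n8n-nodes-base.respondToWebhook",
  "position": [700, 300]
}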

API Usage:

# Trigger scraping via webhook
curl -X POST https://your-n8n.com/webhook/scrape-product \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/12345",
    "selector": ".product-details",
    "js_render": true,
    "proxy": "residential",
    "request_id": "req_abc123"
  }'

Python Integration:

import requests

def trigger_scrape(url, selector=None):
    webhook_url = "https://your-n8n.com/webhook/scrape-product"
    headers = {
        "Authorization": "Bearer YOUR_TOKEN",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "selector": selector or "body",
        "js_render": True
    }

    response = requests.post(webhook_url, json=payload, headers=headers)
    return response.json()

# Use the function
result = trigger_scrape("https://example.com/product/123", ".product-info")
print(result)

JavaScript Integration:

async function triggerScrape(url, options = {}) {
  const webhookUrl = 'https://your-n8n.com/webhook/scrape-product';

  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_TOKEN',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: url,
      selector: options.selector || 'body',
      js_render: options.jsRender !== false,
      proxy: options.proxy || 'datacenter'
    })
  });

  return await response.json();
}

// Usage
const result = await triggerScrape('https://example.com/product', {
  selector: '.product-details',
  jsRender: true,
  proxy: 'residential'
});

Template Customization Best Practices

Environment Variables

Store sensitive data securely:

// Access environment variables
const apiKey = $env.WEBSCRAPING_AI_KEY;
const dbHost = $env.DATABASE_HOST;
const slackWebhook = $env.SLACK_WEBHOOK_URL;

// Use in nodes
{
  "parameters": {
    "queryParameters": {
      "parameters": [
        {
          "name": "api_key",
          "value": "={{$env.WEBSCRAPING_AI_KEY}}"
        }
      ]
    }
  }
}

Dynamic Configuration

Make templates reusable with dynamic inputs:

// Load configuration from an external source
// (global fetch assumes n8n running on Node 18+)
const config = await fetch('https://api.example.com/scrape-config')
  .then(r => r.json());

return [{
  json: {
    urls: config.target_urls,
    selectors: config.css_selectors,
    schedule: config.cron_schedule,
    notifications: config.notification_settings
  }
}];

Modular Sub-Workflows

Break complex templates into reusable components, then chain them with the Execute Workflow node (sketched after this list):

  1. Data Fetching Module: HTTP requests and error handling
  2. Data Extraction Module: Parsing and extraction logic
  3. Data Validation Module: Validation and cleaning
  4. Storage Module: Database/file storage operations
  5. Notification Module: Alerts and reporting
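
One way to chain these modules is the Execute Workflow node, which runs a saved sub-workflow and passes the current items into it; a minimal sketch (the workflow ID is a placeholder):

{
  "parameters": {
    "source": "database",
    "workflowId": "42"
  },
  "name": "Run Extraction Module",
  "type": "n8n-nodes-base.executeWorkflow",
  "position": [300, 300]
}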

Importing and Using Templates

Import from JSON

# Export workflow to JSON
n8n export:workflow --id=123 --output=my-template.json

# Import workflow from JSON
n8n import:workflow --input=my-template.json

Share Templates

# Create shareable template URL
# Go to n8n UI -> Workflow -> Share
# Copy the template URL
# Share: https://n8n.io/workflows/1234

Version Control

# Initialize git repository
git init
git add workflows/*.json
git commit -m "Add scraping templates"

# Push to repository
git remote add origin https://github.com/yourname/n8n-templates
git push -u origin main

Performance Optimization Tips

Batch Processing

Process items in batches to improve efficiency:

// Process 50 items at a time
{
  "parameters": {
    "batchSize": 50,
    "options": {
      "reset": false
    }
  },
  "name": "Split In Batches",
  "type": "n8n-nodes-base.splitInBatches"
}

Parallel Execution

Within a single execution, n8n runs nodes one at a time, so structure independent branches to avoid blocking and fan heavy work out across separate executions (for example, via sub-workflows). The workflow-level executionOrder setting controls how branches are ordered:

{
  "settings": {
    "executionOrder": "v1"
  },
  "nodes": [
    // With "v1", branches execute in a deterministic order
  ]
}

Caching

Implement caching to reduce redundant requests:

// Check the cache before scraping (assumes a configured Redis client);
// the key is bucketed by hour so entries roll over naturally
const cacheKey = `scrape_${url}_${Date.now() / 1000 / 3600 | 0}`;
const cached = await redis.get(cacheKey);

if (cached) {
  return JSON.parse(cached);
}

// Scrape and cache
const data = await scrapeUrl(url);
await redis.setex(cacheKey, 3600, JSON.stringify(data));

return data;
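
If Redis isn't available, n8n's workflow static data can serve as a lightweight cache; it persists between production executions (but is not saved for manual test runs). A sketch using the Code node's $getWorkflowStaticData helper, with an illustrative one-hour TTL:

// Lightweight cache stored in workflow static data
const staticData = $getWorkflowStaticData('global');
staticData.cache = staticData.cache || {};

const url = items[0].json.url;
const entry = staticData.cache[url];
const ONE_HOUR_MS = 60 * 60 * 1000;

if (entry && Date.now() - entry.storedAt < ONE_HOUR_MS) {
  // Cache hit: return the stored result without scraping
  return [{ json: entry.data }];
}

// Cache miss: pass the item through to the scraping nodes; a later
// Code node should store the result back, e.g.:
//   staticData.cache[url] = { data: result, storedAt: Date.now() };
return items;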

Advanced Template Features

Dynamic Waiting

Just as when handling authentication in Puppeteer, proper wait strategies are crucial:

// Wait for specific elements
{
  "parameters": {
    "queryParameters": {
      "parameters": [
        {
          "name": "wait_for",
          "value": ".product-loaded"
        },
        {
          "name": "js_timeout",
          "value": "10000"
        }
      ]
    }
  }
}

Custom Headers and Authentication

// Configure custom headers
const headers = {
  'User-Agent': 'Mozilla/5.0 (Custom Bot)',
  'Accept-Language': 'en-US,en;q=0.9',
  'Accept-Encoding': 'gzip, deflate, br',
  'Referer': 'https://example.com'
};

// Map the headers object into the HTTP Request node's headerParameters shape
{
  "parameters": {
    "headerParameters": {
      "parameters": Object.entries(headers).map(([name, value]) => ({
        name, value
      }))
    }
  }
}

Proxy Rotation

// Rotate between different proxy types
const proxyTypes = ['datacenter', 'residential'];
const proxyCountries = ['us', 'gb', 'de'];

const proxy = proxyTypes[Math.floor(Math.random() * proxyTypes.length)];
const country = proxyCountries[Math.floor(Math.random() * proxyCountries.length)];

// Pass the chosen values into the node's query parameters
{
  "parameters": {
    "queryParameters": {
      "parameters": [
        {"name": "proxy", "value": proxy},
        {"name": "country", "value": country}
      ]
    }
  }
}

Monitoring and Debugging Templates

Execution Logging

// Add detailed logging
const executionLog = {
  workflow_id: $workflow.id,
  workflow_name: $workflow.name,
  execution_id: $execution.id,
  started_at: $execution.startedAt,
  node_name: $node.name,
  items_count: items.length,
  timestamp: new Date().toISOString()
};

console.log('Execution Log:', JSON.stringify(executionLog, null, 2));

// Continue processing
return items;

Performance Metrics

// Track performance metrics
const startTime = Date.now();

// ... scraping logic ...

const endTime = Date.now();

// Count successes (assumes failed items carry an `error` field)
const successCount = items.filter(item => !item.json.error).length;

const metrics = {
  duration_ms: endTime - startTime,
  items_processed: items.length,
  items_per_second: items.length / ((endTime - startTime) / 1000),
  success_rate: successCount / items.length * 100
};

return [{ json: metrics }];

Conclusion

The best n8n templates for web scraping provide flexible, production-ready solutions for various data extraction scenarios. Whether you need simple HTML scraping, JavaScript-rendered content extraction, pagination handling, or complex multi-source aggregation, these templates offer proven patterns that save development time while ensuring reliability and maintainability.

By combining these templates with WebScraping.AI's robust API, you can build sophisticated scraping workflows that handle proxies, JavaScript rendering, and anti-bot measures automatically. Start with a template that matches your use case, customize it for your specific needs, and scale your data extraction operations efficiently.

For more complex scenarios involving browser automation, consider exploring how to monitor network requests in Puppeteer to enhance your debugging capabilities.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
