What are the best n8n templates for web scraping?
n8n templates are pre-built workflows that help developers quickly set up web scraping automation without building everything from scratch. These templates provide battle-tested patterns for common scraping scenarios, saving significant development time while following best practices for data extraction, error handling, and workflow orchestration.
Understanding n8n Templates for Web Scraping
n8n templates are JSON-based workflow definitions that can be imported directly into your n8n instance. They typically include multiple nodes configured to work together, handle edge cases, and demonstrate effective patterns for web scraping tasks. Templates range from simple single-page scrapers to complex multi-step workflows with data transformation, storage, and notification systems.
Top n8n Web Scraping Templates
1. Basic HTML Scraping Template
This foundational template extracts data from static HTML pages using CSS selectors.
Template Structure:
- HTTP Request node: Fetches the target webpage
- HTML Extract node: Parses HTML and extracts data
- Set node: Formats extracted data
- Storage node: Saves to database or spreadsheet
Implementation:
{
"name": "Basic HTML Scraper",
"nodes": [
{
"parameters": {
"url": "={{$json[\"url\"]}}",
"options": {
"timeout": 30000
}
},
"name": "HTTP Request",
"type": "n8n-nodes-base.httpRequest",
"position": [250, 300]
},
{
"parameters": {
"mode": "htmlExtract",
"dataPropertyName": "data",
"extractionValues": {
"values": [
{
"key": "title",
"cssSelector": "h1.product-title",
"returnValue": "text"
},
{
"key": "price",
"cssSelector": ".price-value",
"returnValue": "text"
},
{
"key": "image",
"cssSelector": "img.product-image",
"returnValue": "attribute",
"attribute": "src"
}
]
}
},
"name": "Extract Data",
"type": "n8n-nodes-base.html",
"position": [450, 300]
}
],
"connections": {
"HTTP Request": {
"main": [[{"node": "Extract Data", "type": "main", "index": 0}]]
}
}
}
Use Cases:
- Product information scraping
- Blog post extraction
- News article collection
- Real estate listings
2. JavaScript-Rendered Content Template
For modern websites using React, Vue, or Angular that require JavaScript execution.
Template Features:
- WebScraping.AI API integration
- JavaScript rendering enabled
- Configurable wait times
- Proxy support
{
"name": "JS-Rendered Scraper",
"nodes": [
{
"parameters": {
"method": "GET",
"url": "https://api.webscraping.ai/html",
"authentication": "queryAuth",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "={{$env.WEBSCRAPING_AI_KEY}}"
},
{
"name": "url",
"value": "={{$json[\"target_url\"]}}"
},
{
"name": "js",
"value": "true"
},
{
"name": "js_timeout",
"value": "5000"
},
{
"name": "proxy",
"value": "datacenter"
}
]
},
"options": {
"response": {
"response": {
"fullResponse": true
}
}
}
},
"name": "Fetch with JS",
"type": "n8n-nodes-base.httpRequest",
"position": [250, 300]
},
{
"parameters": {
"jsCode": "const html = items[0].json.body;\nconst cheerio = require('cheerio');\nconst $ = cheerio.load(html);\n\nconst products = [];\n$('.product-card').each((i, el) => {\n products.push({\n name: $(el).find('.name').text().trim(),\n price: $(el).find('.price').text().trim(),\n rating: $(el).find('.rating').attr('data-rating'),\n url: $(el).find('a').attr('href')\n });\n});\n\nreturn products.map(p => ({json: p}));"
},
"name": "Parse HTML",
"type": "n8n-nodes-base.code",
"position": [450, 300]
}
]
}
Use Cases:
- Single-page applications (SPAs)
- Social media content
- Dynamic pricing pages
- AJAX-loaded content
3. Pagination Scraping Template
This template handles multi-page scraping with automatic pagination detection and processing.
Workflow Logic:
// Illustrative Code-node logic; fetchPage, extractData, and sleep are placeholders
// for the HTTP Request, extraction, and rate-limiting steps described below
const cheerio = require('cheerio');

// Initialize pagination
const startPage = 1;
const maxPages = 100;
let currentPage = startPage;
let allResults = [];
// Pagination loop function
function buildPaginationUrl(baseUrl, page) {
return `${baseUrl}?page=${page}`;
}
// Check if next page exists
function hasNextPage(html) {
const $ = cheerio.load(html);
return $('.next-page').length > 0 || $('.pagination .next').length > 0;
}
// Main extraction logic
while (currentPage <= maxPages) {
const url = buildPaginationUrl($json.base_url, currentPage);
// Fetch page (implement via HTTP Request node)
const html = await fetchPage(url);
// Extract data (implement via HTML Extract or Code node)
const pageData = extractData(html);
allResults.push(...pageData);
// Check for next page
if (!hasNextPage(html)) break;
currentPage++;
// Rate limiting
await sleep(1000);
}
return allResults.map(item => ({json: item}));
Template Configuration:
1. Loop node: Iterates through pages
2. HTTP Request node: Fetches each page
3. Code node: Extracts data and checks for next page
4. IF node: Determines whether to continue pagination
5. Merge node: Combines results from all pages
Use Cases:
- E-commerce product catalogs
- Search result scraping
- Directory listings
- Forum post extraction
4. Scheduled Monitoring Template
Automates regular data collection with change detection and notifications.
Template Structure:
{
"name": "Price Monitor",
"nodes": [
{
"parameters": {
"triggerTimes": {
"item": [
{
"mode": "everyHour",
"hour": 6
}
]
}
},
"name": "Schedule Trigger",
"type": "n8n-nodes-base.cron",
"position": [100, 300]
},
{
"parameters": {
"operation": "executeQuery",
"query": "SELECT id, url, last_price FROM products WHERE active = true"
},
"name": "Load Products",
"type": "n8n-nodes-base.postgres",
"position": [300, 300]
},
{
"parameters": {
"batchSize": 10,
"options": {}
},
"name": "Split In Batches",
"type": "n8n-nodes-base.splitInBatches",
"position": [500, 300]
},
{
"parameters": {
"method": "GET",
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "={{$env.API_KEY}}"
},
{
"name": "url",
"value": "={{$json[\"url\"]}}"
},
{
"name": "js",
"value": "true"
}
]
}
},
"name": "Scrape Current Price",
"type": "n8n-nodes-base.httpRequest",
"position": [700, 300]
},
{
"parameters": {
"jsCode": "const html = items[0].json.body;\nconst $ = cheerio.load(html);\nconst currentPrice = parseFloat($('.price').text().replace(/[^0-9.]/g, ''));\nconst lastPrice = items[0].json.last_price;\nconst priceChange = ((currentPrice - lastPrice) / lastPrice * 100).toFixed(2);\n\nreturn [{\n json: {\n product_id: items[0].json.id,\n url: items[0].json.url,\n current_price: currentPrice,\n last_price: lastPrice,\n price_change_percent: priceChange,\n alert: Math.abs(priceChange) > 10\n }\n}];"
},
"name": "Compare Prices",
"type": "n8n-nodes-base.code",
"position": [900, 300]
}
]
}
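The Compare Prices node flags items whose price moved more than 10%. A notification step can follow it; below is a minimal Code-node sketch for posting those alerts to a Slack incoming webhook (the SLACK_WEBHOOK_URL environment variable and Node 18+ global fetch are assumptions, not part of the template above):
// Hypothetical follow-up node: notify Slack for items flagged by "Compare Prices"
// Assumes $env.SLACK_WEBHOOK_URL holds a Slack incoming-webhook URL and fetch is available (Node 18+)
const alerts = items.filter(item => item.json.alert);

for (const item of alerts) {
  const { product_id, last_price, current_price, price_change_percent, url } = item.json;
  await fetch($env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `Price change of ${price_change_percent}% for product ${product_id}: ${last_price} -> ${current_price} (${url})`
    })
  });
}

return alerts;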
Use Cases:
- Competitor price monitoring
- Stock availability tracking
- Content change detection
- Job posting alerts
5. Multi-Source Aggregation Template
Collects data from multiple websites simultaneously and aggregates results.
Implementation Pattern:
// Define sources to scrape
const sources = [
{
name: 'Source A',
url: 'https://example-a.com/products',
selector: '.product-item'
},
{
name: 'Source B',
url: 'https://example-b.com/listings',
selector: '.listing-card'
},
{
name: 'Source C',
url: 'https://example-c.com/items',
selector: '.item-box'
}
];
// Process each source in parallel
const results = await Promise.all(
sources.map(async source => {
const html = await fetchWithWebScrapingAI(source.url);
return extractData(html, source.selector, source.name);
})
);
// Aggregate and normalize
const aggregated = results.flat().map(item => ({
json: {
...item,
normalized_price: convertToUSD(item.price, item.currency),
scraped_at: new Date().toISOString()
}
}));
return aggregated;
Template Nodes:
1. Split In Batches: Divides sources into groups
2. HTTP Request: Fetches from each source
3. Code: Extracts and normalizes data
4. Merge: Combines all results
5. Dedupe: Removes duplicates (see the sketch below)
6. Sort: Orders by relevance or price
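For the deduplication step, a minimal Code-node sketch could key on the item URL (the url field is an assumption about the normalized schema):
// Hypothetical dedup step: keep the first occurrence of each URL
const seen = new Set();
const unique = [];

for (const item of items) {
  const key = item.json.url; // assumes each normalized item carries a url field
  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}

return unique;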
Use Cases:
- Price comparison engines
- Job board aggregation
- News aggregation
- Market research
6. API-First Scraping Template
Leverages WebScraping.AI's advanced features for production-grade scraping.
Advanced Configuration:
{
"name": "API-First Scraper",
"nodes": [
{
"parameters": {
"method": "GET",
"url": "https://api.webscraping.ai/selected",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "={{$env.WEBSCRAPING_AI_KEY}}"
},
{
"name": "url",
"value": "={{$json[\"target_url\"]}}"
},
{
"name": "selector",
"value": "={{$json[\"css_selector\"]}}"
},
{
"name": "js",
"value": "true"
},
{
"name": "proxy",
"value": "residential"
},
{
"name": "headers",
"value": "={{JSON.stringify({'User-Agent': 'Custom-Agent'})}}
},
{
"name": "timeout",
"value": "20000"
}
]
},
"options": {
"response": {
"response": {
"neverError": true
}
}
}
},
"name": "WebScraping.AI Selected",
"type": "n8n-nodes-base.httpRequest",
"position": [250, 300]
}
]
}
Features:
- Automatic proxy rotation
- JavaScript rendering
- Custom headers support
- Timeout handling for reliable execution
- Residential proxy support
7. AI-Powered Data Extraction Template
Uses WebScraping.AI's question-answering endpoint for intelligent data extraction.
Template Example:
{
"name": "AI Extraction",
"nodes": [
{
"parameters": {
"method": "GET",
"url": "https://api.webscraping.ai/question",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "={{$env.API_KEY}}"
},
{
"name": "url",
"value": "={{$json[\"product_url\"]}}"
},
{
"name": "question",
"value": "Extract the product name, price in USD, brand name, availability status, shipping cost, and customer rating. Return as JSON."
}
]
}
},
"name": "AI Extract",
"type": "n8n-nodes-base.httpRequest",
"position": [250, 300]
},
{
"parameters": {
"jsCode": "const answer = JSON.parse(items[0].json.body);\n\nreturn [{\n json: {\n product_name: answer.product_name,\n price_usd: parseFloat(answer.price),\n brand: answer.brand,\n in_stock: answer.availability === 'in stock',\n shipping_cost: parseFloat(answer.shipping || 0),\n rating: parseFloat(answer.rating),\n extracted_at: new Date().toISOString()\n }\n}];"
},
"name": "Parse AI Response",
"type": "n8n-nodes-base.code",
"position": [450, 300]
}
]
}
Use Cases:
- Unstructured data extraction
- Multi-format content parsing
- Complex data relationships
- Schema-less scraping
8. Error Handling and Retry Template
Robust template with comprehensive error handling, retries, and fallback mechanisms.
Error Workflow Structure:
// Retry configuration
const MAX_RETRIES = 3;
const BACKOFF_MULTIPLIER = 2;

// fetchUrl and sleep are placeholder helpers (e.g. implemented via HTTP Request and Wait nodes)
async function scrapeWithRetry(url) {
  let retryCount = 0;
  let lastError = null;

  while (retryCount < MAX_RETRIES) {
    try {
      const response = await fetchUrl(url);

      if (response.statusCode === 200) {
        return response.body;
      } else if (response.statusCode === 429) {
        // Rate limit - exponential backoff
        const waitTime = Math.pow(BACKOFF_MULTIPLIER, retryCount) * 1000;
        await sleep(waitTime);
        retryCount++;
        continue;
      } else if (response.statusCode >= 500) {
        // Server error - retry after a fixed delay
        retryCount++;
        await sleep(2000);
        continue;
      } else {
        // Client error - don't retry
        throw new Error(`HTTP ${response.statusCode}: ${response.statusMessage}`);
      }
    } catch (error) {
      lastError = error;
      retryCount++;
      await sleep(1000 * retryCount);
    }
  }

  // All retries exhausted - return a structured error item
  return {
    json: {
      success: false,
      url: url,
      error: lastError ? lastError.message : 'Max retries exceeded',
      retries: retryCount,
      timestamp: new Date().toISOString()
    }
  };
}
Template Components:
- Error trigger node
- Retry logic with exponential backoff
- Error logging to database
- Slack/Email notifications
- Fallback data sources
Use Cases:
- Production scraping workflows
- High-reliability data collection
- Mission-critical monitoring
9. Data Validation and Cleaning Template
Ensures data quality with validation, cleaning, and normalization steps.
Validation Logic:
function validateAndCleanProduct(item) {
const errors = [];
const cleaned = {};
// Validate and clean name
if (!item.name || item.name.trim().length === 0) {
errors.push('Missing product name');
} else {
cleaned.name = item.name.trim()
.replace(/\s+/g, ' ')
.substring(0, 255);
}
// Validate and clean price
// Assumes a comma decimal separator (e.g. "12,99"); adjust for prices that use commas as thousands separators
const priceMatch = item.price?.match(/[\d.,]+/);
if (priceMatch) {
cleaned.price = parseFloat(priceMatch[0].replace(',', '.'));
if (cleaned.price <= 0 || cleaned.price > 1000000) {
errors.push('Invalid price range');
}
} else {
errors.push('Invalid price format');
}
// Validate URL
try {
cleaned.url = new URL(item.url).href;
} catch (e) {
errors.push('Invalid URL format');
}
// Validate email if present
if (item.email) {
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
if (emailRegex.test(item.email)) {
cleaned.email = item.email.toLowerCase();
} else {
errors.push('Invalid email format');
}
}
// Clean and validate phone
if (item.phone) {
cleaned.phone = item.phone.replace(/\D/g, '');
if (cleaned.phone.length < 10) {
errors.push('Invalid phone number');
}
}
return {
json: {
...cleaned,
is_valid: errors.length === 0,
validation_errors: errors,
validated_at: new Date().toISOString()
}
};
}
// Apply validation to all items
return items.map(item => validateAndCleanProduct(item.json));
Template Features:
- Schema validation
- Data type conversion
- Format normalization
- Duplicate detection
- Quality scoring (see the sketch below)
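Quality scoring is not shown in the validation code above; a sketch that assigns a simple completeness score (the expected field names are assumptions based on the cleaned schema) might look like this:
// Hypothetical quality-scoring step: score items by how many expected fields survived validation
const EXPECTED_FIELDS = ['name', 'price', 'url', 'email', 'phone'];

return items.map(item => {
  const data = item.json;
  const present = EXPECTED_FIELDS.filter(field => data[field] !== undefined && data[field] !== '').length;
  return {
    json: {
      ...data,
      quality_score: Math.round((present / EXPECTED_FIELDS.length) * 100)
    }
  };
});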
10. Webhook-Triggered Scraping Template
On-demand scraping triggered by external systems or user requests.
Webhook Configuration:
{
"name": "Webhook Scraper",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "scrape-product",
"authentication": "headerAuth",
"options": {
"rawBody": true
}
},
"name": "Webhook",
"type": "n8n-nodes-base.webhook",
"webhookId": "scrape-webhook",
"position": [100, 300]
},
{
"parameters": {
"jsCode": "const payload = items[0].json.body;\n\n// Validate request\nif (!payload.url) {\n throw new Error('URL is required');\n}\n\n// Parse options\nreturn [{\n json: {\n target_url: payload.url,\n selector: payload.selector || 'body',\n js_render: payload.js_render !== false,\n proxy_type: payload.proxy || 'datacenter',\n webhook_id: payload.request_id || Date.now()\n }\n}];"
},
"name": "Parse Request",
"type": "n8n-nodes-base.code",
"position": [300, 300]
}
]
}
API Usage:
# Trigger scraping via webhook
curl -X POST https://your-n8n.com/webhook/scrape-product \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/12345",
"selector": ".product-details",
"js_render": true,
"proxy": "residential",
"request_id": "req_abc123"
}'
Python Integration:
import requests
def trigger_scrape(url, selector=None):
webhook_url = "https://your-n8n.com/webhook/scrape-product"
headers = {
"Authorization": "Bearer YOUR_TOKEN",
"Content-Type": "application/json"
}
payload = {
"url": url,
"selector": selector or "body",
"js_render": True
}
response = requests.post(webhook_url, json=payload, headers=headers)
return response.json()
# Use the function
result = trigger_scrape("https://example.com/product/123", ".product-info")
print(result)
JavaScript Integration:
async function triggerScrape(url, options = {}) {
const webhookUrl = 'https://your-n8n.com/webhook/scrape-product';
const response = await fetch(webhookUrl, {
method: 'POST',
headers: {
'Authorization': 'Bearer YOUR_TOKEN',
'Content-Type': 'application/json'
},
body: JSON.stringify({
url: url,
selector: options.selector || 'body',
js_render: options.jsRender !== false,
proxy: options.proxy || 'datacenter'
})
});
return await response.json();
}
// Usage
const result = await triggerScrape('https://example.com/product', {
selector: '.product-details',
jsRender: true,
proxy: 'residential'
});
Template Customization Best Practices
Environment Variables
Store sensitive data securely:
// Access environment variables
const apiKey = $env.WEBSCRAPING_AI_KEY;
const dbHost = $env.DATABASE_HOST;
const slackWebhook = $env.SLACK_WEBHOOK_URL;
// Use in nodes
{
"parameters": {
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "={{$env.WEBSCRAPING_AI_KEY}}"
}
]
}
}
}
Dynamic Configuration
Make templates reusable with dynamic inputs:
// Load configuration from external source
const config = await fetch('https://api.example.com/scrape-config')
.then(r => r.json());
return [{
json: {
urls: config.target_urls,
selectors: config.css_selectors,
schedule: config.cron_schedule,
notifications: config.notification_settings
}
}];
Modular Sub-Workflows
Break complex templates into reusable components:
- Data Fetching Module: HTTP requests and error handling
- Data Extraction Module: Parsing and extraction logic
- Data Validation Module: Validation and cleaning
- Storage Module: Database/file storage operations
- Notification Module: Alerts and reporting
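A reusable sub-workflow typically starts with an Execute Workflow Trigger, and a small Code node right after it can guard the inputs it receives from the parent. A minimal sketch (the expected url field is an assumption about what the parent passes in):
// Hypothetical input guard at the start of a reusable sub-workflow
return items.map(item => {
  if (!item.json.url) {
    throw new Error('Sub-workflow expects each item to include a "url" field');
  }
  return item;
});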
Importing and Using Templates
Import from JSON
# Export workflow to JSON
n8n export:workflow --id=123 --output=my-template.json
# Import workflow from JSON
n8n import:workflow --input=my-template.json
Share Templates
# Create shareable template URL
# Go to n8n UI -> Workflow -> Share
# Copy the template URL
# Share: https://n8n.io/workflows/1234
Version Control
# Initialize git repository
git init
git add workflows/*.json
git commit -m "Add scraping templates"
# Push to repository
git remote add origin https://github.com/yourname/n8n-templates
git push -u origin main
Performance Optimization Tips
Batch Processing
Process items in batches to improve efficiency:
// Process 50 items at a time
{
"parameters": {
"batchSize": 50,
"options": {
"reset": false
}
},
"name": "Split In Batches",
"type": "n8n-nodes-base.splitInBatches"
}
Parallel Execution
Enable parallel processing for independent tasks:
{
"settings": {
"executionOrder": "v1"
},
"nodes": [
// Nodes will execute in parallel when possible
]
}
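Independent requests can also be parallelized inside a single Code node; here is a sketch assuming Node 18+ global fetch and an incoming item that carries a urls array:
// Hypothetical parallel fetch inside one Code node (assumes Node 18+ global fetch)
const urls = items[0].json.urls || [];

const results = await Promise.allSettled(
  urls.map(url => fetch(url).then(res => res.text()))
);

return results.map((result, i) => ({
  json: {
    url: urls[i],
    ok: result.status === 'fulfilled',
    html: result.status === 'fulfilled' ? result.value : null,
    error: result.status === 'rejected' ? String(result.reason) : null
  }
}));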
Caching
Implement caching to reduce redundant requests:
// Check cache before scraping (assumes a pre-configured Redis client such as ioredis,
// made available to the Code node via NODE_FUNCTION_ALLOW_EXTERNAL)
const cacheKey = `scrape_${url}_${Date.now() / 1000 / 3600 | 0}`; // bucket key changes every hour
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Scrape and cache
const data = await scrapeUrl(url);
await redis.setex(cacheKey, 3600, JSON.stringify(data));
return data;
Advanced Template Features
Dynamic Waiting
Similar to handling authentication in Puppeteer, proper wait strategies are crucial:
// Wait for specific elements
{
"parameters": {
"queryParameters": {
"parameters": [
{
"name": "wait_for",
"value": ".product-loaded"
},
{
"name": "js_timeout",
"value": "10000"
}
]
}
}
}
Custom Headers and Authentication
// Configure custom headers
const headers = {
'User-Agent': 'Mozilla/5.0 (Custom Bot)',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://example.com'
};
{
"parameters": {
"headerParameters": {
"parameters": Object.entries(headers).map(([name, value]) => ({
name, value
}))
}
}
}
Proxy Rotation
// Rotate between different proxy types
const proxyTypes = ['datacenter', 'residential'];
const proxyCountries = ['us', 'gb', 'de'];
const proxy = proxyTypes[Math.floor(Math.random() * proxyTypes.length)];
const country = proxyCountries[Math.floor(Math.random() * proxyCountries.length)];
{
"parameters": {
"queryParameters": {
"parameters": [
{"name": "proxy", "value": proxy},
{"name": "country", "value": country}
]
}
}
}
Monitoring and Debugging Templates
Execution Logging
// Add detailed logging
const executionLog = {
workflow_id: $workflow.id,
workflow_name: $workflow.name,
execution_id: $execution.id,
started_at: $execution.startedAt,
node_name: $node.name,
items_count: items.length,
timestamp: new Date().toISOString()
};
console.log('Execution Log:', JSON.stringify(executionLog, null, 2));
// Continue processing
return items;
Performance Metrics
// Track performance metrics
const startTime = Date.now();
let successCount = 0;
// ... scraping logic (increment successCount for each item scraped successfully) ...
const endTime = Date.now();
const metrics = {
duration_ms: endTime - startTime,
items_processed: items.length,
items_per_second: items.length / ((endTime - startTime) / 1000),
success_rate: successCount / items.length * 100
};
return [{ json: metrics }];
Conclusion
The best n8n templates for web scraping provide flexible, production-ready solutions for various data extraction scenarios. Whether you need simple HTML scraping, JavaScript-rendered content extraction, pagination handling, or complex multi-source aggregation, these templates offer proven patterns that save development time while ensuring reliability and maintainability.
By combining these templates with WebScraping.AI's robust API, you can build sophisticated scraping workflows that handle proxies, JavaScript rendering, and anti-bot measures automatically. Start with a template that matches your use case, customize it for your specific needs, and scale your data extraction operations efficiently.
For more complex scenarios involving browser automation, consider exploring how to monitor network requests in Puppeteer to enhance your debugging capabilities.