What is the n8n Code Node and How Do I Use It for Scraping?
The n8n Code node is a powerful built-in component that allows you to execute custom JavaScript code within your n8n workflows. It's particularly useful for web scraping tasks where you need fine-grained control over data extraction, transformation, and processing logic that goes beyond what pre-built nodes can offer.
Understanding the n8n Code Node
The Code node in n8n provides a JavaScript runtime environment where you can write custom logic to manipulate data, make HTTP requests, parse HTML, and perform complex transformations. It's available in two variants:
- Code node: the current node for running custom JavaScript (including async/await) against workflow items
- Function node (legacy): the older Function and Function Item nodes, with similar capabilities but different syntax
As of n8n version 1.0, the Code node is the recommended approach, offering two execution modes ("Run Once for All Items" and "Run Once for Each Item"), improved error handling, and access to modern JavaScript features.
Basic Structure of the Code Node
When you add a Code node to your workflow, it provides access to incoming data through the $input object. Here's the basic structure in the default "Run Once for All Items" mode:
// Access input items from previous nodes
const items = $input.all();
// Process each item
for (const item of items) {
  // Your custom logic here
  const data = item.json;
  // Transform or extract data
  item.json.processedData = data.someField.toUpperCase();
}
// Return the modified items
return items;
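If you switch the node to "Run Once for Each Item" mode, n8n runs your code separately for every incoming item and exposes the current item's data as $json. A minimal sketch of the same transformation (someField is still a placeholder field name):
// "Run Once for Each Item" mode: $json holds the current item's data
// someField is a hypothetical field on the incoming item
$json.processedData = ($json.someField || '').toUpperCase();
// Return a single item
return { json: $json };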
Web Scraping with the Code Node
Method 1: Using Built-in HTTP Functionality
The Code node can make HTTP requests using n8n's built-in this.helpers.httpRequest helper:
// Using n8n's built-in HTTP helper
const items = $input.all();
const results = [];
for (const item of items) {
  const url = item.json.url;
  // Make the HTTP request; returnFullResponse exposes the status code along with the body
  const response = await this.helpers.httpRequest({
    method: 'GET',
    url: url,
    returnFullResponse: true
  });
  results.push({
    json: {
      url: url,
      html: response.body,
      statusCode: response.statusCode
    }
  });
}
return results;
Method 2: Parsing HTML with Cheerio
For HTML parsing tasks, you can use the Cheerio library in the Code node. It isn't bundled by default: on self-hosted instances, install the package and allow it with the NODE_FUNCTION_ALLOW_EXTERNAL=cheerio environment variable (external modules aren't available on n8n Cloud). Cheerio provides a jQuery-like syntax for traversing and manipulating HTML:
const cheerio = require('cheerio');
const items = $input.all();
const results = [];
for (const item of items) {
  const html = item.json.html;
  const $ = cheerio.load(html);
  // Extract data using CSS selectors
  const products = [];
  $('.product-item').each((index, element) => {
    const product = {
      title: $(element).find('.product-title').text().trim(),
      price: $(element).find('.product-price').text().trim(),
      image: $(element).find('img').attr('src'),
      link: $(element).find('a').attr('href')
    };
    products.push(product);
  });
  results.push({
    json: {
      url: item.json.url,
      products: products,
      count: products.length
    }
  });
}
return results;
Method 3: Advanced Scraping with API Requests
When working with APIs or JSON responses, the Code node excels at handling complex data structures:
const items = $input.all();
const results = [];
for (const item of items) {
  const apiUrl = `https://api.example.com/search?q=${encodeURIComponent(item.json.query)}`;
  // Make the API request with custom headers
  const body = await this.helpers.httpRequest({
    method: 'GET',
    url: apiUrl,
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept': 'application/json'
    },
    timeout: 10000
  });
  // JSON responses are usually parsed automatically; fall back to JSON.parse for raw strings
  const data = typeof body === 'string' ? JSON.parse(body) : body;
  results.push({
    json: {
      query: item.json.query,
      results: data.items.map((result) => ({
        id: result.id,
        title: result.title,
        description: result.description
      })),
      totalResults: data.total
    }
  });
}
return results;
Practical Web Scraping Examples
Example 1: Scraping Product Details from Multiple Pages
This example demonstrates how to scrape product information from an e-commerce site:
const cheerio = require('cheerio');
const items = $input.all();
const allProducts = [];
for (const item of items) {
  const pageUrl = item.json.url;
  try {
    // Fetch the page HTML
    const html = await this.helpers.httpRequest({ method: 'GET', url: pageUrl });
    const $ = cheerio.load(html);
    // Extract product information
    $('.product-card').each((i, elem) => {
      const $elem = $(elem);
      const product = {
        name: $elem.find('h3.product-name').text().trim(),
        price: parseFloat($elem.find('.price').text().replace(/[^0-9.]/g, '')),
        rating: parseFloat($elem.find('.rating').attr('data-rating')),
        inStock: $elem.find('.stock-status').text().includes('In Stock'),
        imageUrl: $elem.find('img').attr('src'),
        productUrl: $elem.find('a.product-link').attr('href'),
        description: $elem.find('.product-desc').text().trim()
      };
      allProducts.push(product);
    });
  } catch (error) {
    console.error(`Error scraping ${pageUrl}:`, error.message);
  }
}
return [{
  json: {
    products: allProducts,
    totalCount: allProducts.length,
    scrapedAt: new Date().toISOString()
  }
}];
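Image and product links scraped this way are often relative paths. A small follow-up using the standard URL constructor resolves them against the page they came from before you store them (the field names match the hypothetical markup above):
// Resolve a possibly-relative href/src against the page it was scraped from
function absoluteUrl(href, pageUrl) {
  if (!href) return null;
  try {
    return new URL(href, pageUrl).toString();
  } catch (e) {
    return href; // leave malformed values untouched
  }
}
// Example usage inside the .each() loop above:
// product.imageUrl = absoluteUrl(product.imageUrl, pageUrl);
// product.productUrl = absoluteUrl(product.productUrl, pageUrl);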
Example 2: Extracting Data from Dynamic Content
When dealing with websites that load content dynamically, you might need to combine the Code node with other tools. The Code node itself doesn't execute JavaScript on the page, so pair it with a browser automation step (such as the community Puppeteer node or a headless-browser API) and parse the rendered HTML afterwards:
// This code assumes HTML is already fetched via HTTP Request or Puppeteer node
const cheerio = require('cheerio');
const items = $input.all();
const results = [];
for (const item of items) {
  const html = item.json.html;
  const $ = cheerio.load(html);
  // Extract data from JSON-LD structured data
  const jsonLdScript = $('script[type="application/ld+json"]').html();
  if (jsonLdScript) {
    try {
      const structuredData = JSON.parse(jsonLdScript);
      results.push({
        json: {
          type: structuredData['@type'],
          name: structuredData.name,
          description: structuredData.description,
          price: structuredData.offers?.price,
          currency: structuredData.offers?.priceCurrency,
          availability: structuredData.offers?.availability
        }
      });
    } catch (e) {
      console.error('Failed to parse JSON-LD:', e);
    }
  }
}
return results;
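Many pages embed several JSON-LD blocks, and some wrap entities in an @graph array. A variant that walks every matching script tag looks like this (it assumes, as above, that the incoming item carries the page HTML in an html field):
const cheerio = require('cheerio');
const $ = cheerio.load($input.first().json.html);
const entities = [];
// Walk every JSON-LD script tag instead of only the first one
$('script[type="application/ld+json"]').each((i, el) => {
  try {
    const parsed = JSON.parse($(el).html());
    // Some sites nest entities under @graph; normalize both shapes to an array
    const nodes = parsed['@graph'] || (Array.isArray(parsed) ? parsed : [parsed]);
    entities.push(...nodes);
  } catch (e) {
    // Skip malformed blocks rather than failing the whole item
  }
});
return entities.map((entity) => ({ json: entity }));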
Example 3: Pagination Handling
Handle multi-page scraping with automatic pagination:
const cheerio = require('cheerio');
const baseUrl = 'https://example.com/products';
const maxPages = 5;
const allItems = [];
let pagesScraped = 0;
for (let page = 1; page <= maxPages; page++) {
  const url = `${baseUrl}?page=${page}`;
  try {
    const html = await this.helpers.httpRequest({ method: 'GET', url: url });
    const $ = cheerio.load(html);
    // Check if the page has content
    const items = $('.item');
    if (items.length === 0) {
      break; // No more items, stop pagination
    }
    items.each((i, elem) => {
      allItems.push({
        title: $(elem).find('.title').text().trim(),
        url: $(elem).find('a').attr('href'),
        page: page
      });
    });
    pagesScraped = page;
    // Check for a next-page link
    const hasNextPage = $('.pagination .next').length > 0;
    if (!hasNextPage) {
      break;
    }
  } catch (error) {
    console.error(`Error on page ${page}:`, error.message);
    break;
  }
}
return [{
  json: {
    items: allItems,
    pagesScraped: pagesScraped,
    totalItems: allItems.length
  }
}];
Advanced Techniques
Error Handling and Retry Logic
Implement robust error handling for reliable scraping:
// Arrow function so `this` (and this.helpers) still refers to the Code node context
const fetchWithRetry = async (url, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await this.helpers.httpRequest({ method: 'GET', url: url });
    } catch (error) {
      if (attempt === maxRetries) {
        throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
      }
      // Wait before retrying (exponential backoff: 1s, 2s, 4s, ...)
      await new Promise(resolve => setTimeout(resolve, 1000 * 2 ** (attempt - 1)));
    }
  }
};
const items = $input.all();
const results = [];
for (const item of items) {
  try {
    const body = await fetchWithRetry(item.json.url);
    results.push({
      json: {
        url: item.json.url,
        success: true,
        data: body
      }
    });
  } catch (error) {
    results.push({
      json: {
        url: item.json.url,
        success: false,
        error: error.message
      }
    });
  }
}
return results;
Data Cleaning and Transformation
Clean and normalize scraped data within the Code node:
const items = $input.all();
function cleanPrice(priceStr) {
  // Remove currency symbols and convert to number
  return parseFloat(priceStr.replace(/[^0-9.]/g, '')) || 0;
}
function cleanText(text) {
  // Remove extra whitespace and normalize
  return text.replace(/\s+/g, ' ').trim();
}
const results = items.map(item => {
  const data = item.json;
  return {
    json: {
      title: cleanText(data.title),
      price: cleanPrice(data.price),
      description: cleanText(data.description),
      category: data.category?.toLowerCase().trim(),
      tags: data.tags?.map(tag => tag.toLowerCase().trim()).filter(Boolean),
      publishedDate: new Date(data.date).toISOString(),
      slug: cleanText(data.title).toLowerCase().replace(/[^a-z0-9]+/g, '-')
    }
  };
});
return results;
Best Practices for Code Node Scraping
1. Rate Limiting and Delays
Implement delays to avoid overwhelming target servers:
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
const items = $input.all();
const results = [];
for (const item of items) {
  const body = await this.helpers.httpRequest({ method: 'GET', url: item.json.url });
  results.push({ json: { url: item.json.url, data: body } });
  // Wait 2 seconds between requests
  await delay(2000);
}
return results;
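A fixed interval is easy for rate limiters to fingerprint, so adding a little random jitter to each pause is a common refinement. A small sketch that builds on the delay helper above:
// Sleep for a base interval plus up to jitterMs of random extra time
function randomDelay(baseMs, jitterMs) {
  const ms = baseMs + Math.floor(Math.random() * jitterMs);
  return new Promise(resolve => setTimeout(resolve, ms));
}
// Example: wait between 2 and 3 seconds between requests
// await randomDelay(2000, 1000);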
2. User-Agent Rotation
Set appropriate user agents to avoid detection:
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];
const items = $input.all();
const results = [];
for (let i = 0; i < items.length; i++) {
  const userAgent = userAgents[i % userAgents.length];
  const body = await this.helpers.httpRequest({
    method: 'GET',
    url: items[i].json.url,
    headers: { 'User-Agent': userAgent }
  });
  results.push({ json: { url: items[i].json.url, data: body } });
}
return results;
3. Memory Management
For large-scale scraping, process URLs in small concurrent batches to limit memory use and the number of simultaneous requests:
const items = $input.all();
const BATCH_SIZE = 10;
const results = [];
for (let i = 0; i < items.length; i += BATCH_SIZE) {
  const batch = items.slice(i, i + BATCH_SIZE);
  const batchResults = await Promise.all(
    batch.map(async (item) => {
      try {
        const body = await this.helpers.httpRequest({ method: 'GET', url: item.json.url });
        return { json: { url: item.json.url, success: true, data: body } };
      } catch (error) {
        return { json: { url: item.json.url, success: false, error: error.message } };
      }
    })
  );
  results.push(...batchResults);
}
return results;
Integrating Code Node with Other n8n Nodes
The Code node works seamlessly with other n8n nodes for comprehensive scraping workflows:
- HTTP Request node → Code node: Fetch HTML, then parse with Cheerio (see the sketch after this list)
- Webhook node → Code node: Process incoming scraping requests
- Code node → Spreadsheet node: Export scraped data to Google Sheets
- Schedule Trigger → Code node: Automated periodic scraping
- Code node → Database node: Store results in PostgreSQL/MySQL
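For the first pattern, a minimal Code node placed after an HTTP Request node might look like the sketch below. It assumes the HTTP Request node returns the raw page body in a data field (adjust the field name to match your response settings) and that Cheerio is allowed as an external module:
const cheerio = require('cheerio');
const results = [];
for (const item of $input.all()) {
  // The upstream HTTP Request node put the page body in item.json.data
  const $ = cheerio.load(item.json.data);
  // Pull out the page title and headings as a simple example
  results.push({
    json: {
      title: $('title').text().trim(),
      headings: $('h1, h2').map((i, el) => $(el).text().trim()).get()
    }
  });
}
return results;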
When to Use Code Node vs. Other Scraping Solutions
Use the Code node when:
- You need custom logic for data extraction and transformation
- Working with complex HTML structures or nested JSON
- Implementing custom retry logic or error handling
- Processing data requires JavaScript-specific libraries
Consider alternatives when:
- Simple HTTP requests are sufficient (use the HTTP Request node)
- Browser automation is required for JavaScript-heavy sites (use the community Puppeteer or Playwright nodes)
- Visual workflow building is preferred over coding (use the HTML node's extraction operations)
- You need professional-grade scraping with proxy rotation and anti-bot bypass (use the WebScraping.AI API)
Conclusion
The n8n Code node is a versatile tool for web scraping that bridges the gap between no-code automation and custom programming. It provides developers with the flexibility to implement sophisticated scraping logic while staying within the n8n ecosystem. By combining the Code node with other n8n components and following best practices for rate limiting, error handling, and data processing, you can build robust and maintainable web scraping workflows.
Whether you're extracting product data, monitoring competitor prices, aggregating content, or building data pipelines, the Code node gives you the power to customize every aspect of your scraping process while benefiting from n8n's workflow orchestration capabilities.