What is the n8n Code Node and How Do I Use It for Scraping?

The n8n Code node is a powerful built-in component that allows you to execute custom JavaScript code within your n8n workflows. It's particularly useful for web scraping tasks where you need fine-grained control over data extraction, transformation, and processing logic that goes beyond what pre-built nodes can offer.

Understanding the n8n Code Node

The Code node in n8n provides a JavaScript runtime environment where you can write custom logic to manipulate data, make HTTP requests, parse HTML, and perform complex transformations. It's available in two variants:

  • Code node: Runs custom JavaScript (including async/await) with access to workflow items
  • Function node (legacy): The older version with similar functionality but different syntax

As of n8n version 1.0, the Code node is the recommended approach, replacing the now-deprecated Function and Function Item nodes; it offers a better editor experience with autocomplete, improved error handling, and access to modern JavaScript features.
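For readers migrating old workflows, here is a minimal side-by-side sketch of the two syntaxes (the legacy Function node exposed a global items array, while the Code node goes through $input):

// Legacy Function node: a global `items` array was available
// return items.map(item => { item.json.flagged = true; return item; });

// Code node equivalent: access items via the $input helper
const items = $input.all();
for (const item of items) {
  item.json.flagged = true;
}
return items;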

Basic Structure of the Code Node

When you add a Code node to your workflow, it provides access to incoming data through the $input object. Here's the basic structure:

// Access input items from previous nodes
const items = $input.all();

// Process each item
for (const item of items) {
  // Your custom logic here
  const data = item.json;

  // Transform or extract data
  item.json.processedData = (data.someField ?? '').toUpperCase(); // guard against missing fields
}

// Return the modified items
return items;
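The Code node also has two execution modes. The example above assumes the default "Run Once for All Items" mode; in "Run Once for Each Item" mode the node runs once per incoming item, $json points at the current item's data, and you return a single item instead of an array:

// "Run Once for Each Item" mode: $json is the current item's data.
// The ?? '' guard keeps toUpperCase from throwing when the field is missing.
return {
  json: {
    ...$json,
    processedData: ($json.someField ?? '').toUpperCase(),
  },
};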

Web Scraping with the Code Node

Method 1: Using Built-in HTTP Functionality

The Code node can make HTTP requests using the built-in this.helpers.httpRequest helper:

// Using n8n's built-in HTTP helper
const items = $input.all();
const results = [];

for (const item of items) {
  const url = item.json.url;

  // Make the request; returnFullResponse also exposes status code and headers
  const response = await this.helpers.httpRequest({
    method: 'GET',
    url,
    returnFullResponse: true
  });

  results.push({
    json: {
      url: url,
      html: response.body,
      statusCode: response.statusCode
    }
  });
}

return results;

Method 2: Parsing HTML with Cheerio

For HTML parsing tasks, you can use the Cheerio library. Note that Cheerio is not bundled with n8n by default: on self-hosted instances you must allow it with the NODE_FUNCTION_ALLOW_EXTERNAL=cheerio environment variable, and external npm modules are not available on n8n Cloud. Once enabled, Cheerio provides a jQuery-like syntax for traversing and manipulating HTML:

const cheerio = require('cheerio');
const items = $input.all();
const results = [];

for (const item of items) {
  const html = item.json.html;
  const $ = cheerio.load(html);

  // Extract data using CSS selectors
  const products = [];

  $('.product-item').each((index, element) => {
    const product = {
      title: $(element).find('.product-title').text().trim(),
      price: $(element).find('.product-price').text().trim(),
      image: $(element).find('img').attr('src'),
      link: $(element).find('a').attr('href')
    };
    products.push(product);
  });

  results.push({
    json: {
      url: item.json.url,
      products: products,
      count: products.length
    }
  });
}

return results;
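One practical wrinkle: many sites lazy-load images, so the real URL lives in a data-src (or similar) attribute rather than src. A small helper with a hedged fallback handles both cases (the attribute name varies by site):

// Resolve an image URL from a cheerio element, preferring the
// lazy-load attribute when present (attribute names are site-specific)
function getImageUrl($el) {
  const img = $el.find('img');
  return img.attr('data-src') || img.attr('src') || null;
}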

Method 3: Advanced Scraping with API Requests

When working with APIs or JSON responses, the Code node excels at handling complex data structures:

const items = $input.all();
const results = [];

for (const item of items) {
  const apiUrl = `https://api.example.com/search?q=${encodeURIComponent(item.json.query)}`;

  // Make API request with custom headers
  const response = await this.helpers.httpRequest({
    method: 'GET',
    url: apiUrl,
    headers: {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept': 'application/json'
    },
    timeout: 10000
  });

  // httpRequest parses JSON responses automatically based on content type;
  // fall back to JSON.parse in case a raw string comes back
  const data = typeof response === 'string' ? JSON.parse(response) : response;

  results.push({
    json: {
      query: item.json.query,
      results: data.items.map(result => ({
        id: result.id,
        title: result.title,
        description: result.description
      })),
      totalResults: data.total
    }
  });
}

return results;

Practical Web Scraping Examples

Example 1: Scraping Product Details from Multiple Pages

This example demonstrates how to scrape product information from an e-commerce site:

const cheerio = require('cheerio');
const items = $input.all();
const allProducts = [];

for (const item of items) {
  const pageUrl = item.json.url;

  try {
    // Fetch the page (httpRequest returns the response body by default)
    const html = await this.helpers.httpRequest({ url: pageUrl });
    const $ = cheerio.load(html);

    // Extract product information
    $('.product-card').each((i, elem) => {
      const $elem = $(elem);

      const product = {
        name: $elem.find('h3.product-name').text().trim(),
        price: parseFloat($elem.find('.price').text().replace(/[^0-9.]/g, '')),
        rating: parseFloat($elem.find('.rating').attr('data-rating')),
        inStock: $elem.find('.stock-status').text().includes('In Stock'),
        imageUrl: $elem.find('img').attr('src'),
        productUrl: $elem.find('a.product-link').attr('href'),
        description: $elem.find('.product-desc').text().trim()
      };

      allProducts.push(product);
    });
  } catch (error) {
    console.error(`Error scraping ${pageUrl}:`, error.message);
  }
}

return [{
  json: {
    products: allProducts,
    totalCount: allProducts.length,
    scrapedAt: new Date().toISOString()
  }
}];
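Note that href and src values scraped this way are frequently relative (/p/123 rather than a full URL). A small helper using the standard URL API normalizes them against the page URL before you store them:

// Resolve possibly-relative links against the page they came from
function absolutize(href, base) {
  if (!href) return null;
  try {
    return new URL(href, base).href;
  } catch {
    return null; // malformed input
  }
}

// absolutize('/p/123', 'https://example.com/products')
// => 'https://example.com/p/123'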

Example 2: Extracting Data from Dynamic Content

When dealing with websites that load content dynamically, you might need to combine the Code node with other tools. The Code node itself doesn't execute JavaScript on the page, so you can pair it with a headless-browser step (for example, the community n8n-nodes-puppeteer package) or a rendering API, then parse the resulting HTML:

// This code assumes HTML is already fetched via HTTP Request or Puppeteer node
const cheerio = require('cheerio');
const items = $input.all();
const results = [];

for (const item of items) {
  const html = item.json.html;
  const $ = cheerio.load(html);

  // Extract data from JSON-LD structured data
  const jsonLdScript = $('script[type="application/ld+json"]').html();

  if (jsonLdScript) {
    try {
      const structuredData = JSON.parse(jsonLdScript);

      results.push({
        json: {
          type: structuredData['@type'],
          name: structuredData.name,
          description: structuredData.description,
          price: structuredData.offers?.price,
          currency: structuredData.offers?.priceCurrency,
          availability: structuredData.offers?.availability
        }
      });
    } catch (e) {
      console.error('Failed to parse JSON-LD:', e);
    }
  }
}

return results;
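In practice, JSON-LD payloads are often arrays or wrapped in an @graph property, and a page may carry several script tags. A small hedged helper makes the lookup more robust before reading fields like offers.price:

// Find the first JSON-LD node of a given @type, handling the
// bare-object, array, and @graph layouts
function findJsonLdNode(parsed, type) {
  const nodes = Array.isArray(parsed) ? parsed : (parsed['@graph'] ?? [parsed]);
  return nodes.find(node => node && node['@type'] === type) ?? null;
}

// const product = findJsonLdNode(structuredData, 'Product');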

Example 3: Pagination Handling

Handle multi-page scraping with automatic pagination:

const cheerio = require('cheerio');
const baseUrl = 'https://example.com/products';
const maxPages = 5;
const allItems = [];
let pagesScraped = 0;

for (let page = 1; page <= maxPages; page++) {
  const url = `${baseUrl}?page=${page}`;

  try {
    const html = await this.helpers.httpRequest({ url });
    const $ = cheerio.load(html);

    // Check if page has content
    const items = $('.item');
    if (items.length === 0) {
      break; // No more items, stop pagination
    }

    pagesScraped = page;

    items.each((i, elem) => {
      allItems.push({
        title: $(elem).find('.title').text().trim(),
        url: $(elem).find('a').attr('href'),
        page: page
      });
    });

    // Check for next page link
    const hasNextPage = $('.pagination .next').length > 0;
    if (!hasNextPage) {
      break;
    }

  } catch (error) {
    console.error(`Error on page ${page}:`, error.message);
    break;
  }
}

return [{
  json: {
    items: allItems,
    totalPages: pagesScraped,
    totalItems: allItems.length
  }
}];
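When the site exposes a rel="next" link, following it directly is often more reliable than guessing page numbers. Here is a sketch of that variant (the .item selector and the 50-page safety cap are assumptions):

const cheerio = require('cheerio');

const allItems = [];
const seen = new Set();
let url = 'https://example.com/products';

// Follow rel="next" links until they run out, with a safety cap
while (url && !seen.has(url) && seen.size < 50) {
  seen.add(url);

  const html = await this.helpers.httpRequest({ url });
  const $ = cheerio.load(html);

  $('.item').each((i, elem) => {
    allItems.push({
      title: $(elem).find('.title').text().trim(),
      url: $(elem).find('a').attr('href'),
    });
  });

  const next = $('a[rel="next"]').attr('href');
  url = next ? new URL(next, url).href : null; // resolve relative hrefs
}

return [{ json: { items: allItems, totalItems: allItems.length } }];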

Advanced Techniques

Error Handling and Retry Logic

Implement robust error handling for reliable scraping:

// Arrow function so `this` (and this.helpers) is inherited from the node context
const fetchWithRetry = async (url, maxRetries = 3) => {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await this.helpers.httpRequest({ url });
    } catch (error) {
      if (attempt === maxRetries) {
        throw new Error(`Failed after ${maxRetries} attempts: ${error.message}`);
      }
      // Wait before retry (exponential backoff)
      await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
    }
  }
};

const items = $input.all();
const results = [];

for (const item of items) {
  try {
    const data = await fetchWithRetry(item.json.url);
    results.push({
      json: {
        url: item.json.url,
        success: true,
        data
      }
    });
  } catch (error) {
    results.push({
      json: {
        url: item.json.url,
        success: false,
        error: error.message
      }
    });
  }
}

return results;

Data Cleaning and Transformation

Clean and normalize scraped data within the Code node:

const items = $input.all();

function cleanPrice(priceStr) {
  // Remove currency symbols and convert to number
  return parseFloat((priceStr || '').replace(/[^0-9.]/g, '')) || 0;
}

function cleanText(text) {
  // Collapse whitespace and trim
  return (text || '').replace(/\s+/g, ' ').trim();
}

function toIsoDate(value) {
  // Guard against invalid dates, which would make toISOString() throw
  const d = new Date(value);
  return isNaN(d.getTime()) ? null : d.toISOString();
}

const results = items.map(item => {
  const data = item.json;

  return {
    json: {
      title: cleanText(data.title),
      price: cleanPrice(data.price),
      description: cleanText(data.description),
      category: data.category?.toLowerCase().trim(),
      tags: data.tags?.map(tag => tag.toLowerCase().trim()).filter(Boolean),
      publishedDate: toIsoDate(data.date),
      slug: cleanText(data.title).toLowerCase().replace(/[^a-z0-9]+/g, '-')
    }
  };
});

return results;

Best Practices for Code Node Scraping

1. Rate Limiting and Delays

Implement delays to avoid overwhelming target servers:

function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

const items = $input.all();
const results = [];

for (const item of items) {
  const data = await this.helpers.httpRequest({ url: item.json.url });
  results.push({ json: { url: item.json.url, data } });

  // Wait 2 seconds between requests
  await delay(2000);
}

return results;
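A fixed interval is predictable; if you want requests to look less mechanical, randomized delays (jitter) are a common refinement. A minimal sketch:

// Randomized delay: wait somewhere between minMs and maxMs
function randomDelay(minMs, maxMs) {
  const ms = minMs + Math.random() * (maxMs - minMs);
  return new Promise(resolve => setTimeout(resolve, ms));
}

// e.g. await randomDelay(1500, 3500); // 1.5-3.5s between requests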

2. User-Agent Rotation

Set appropriate user agents to avoid detection:

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
];

const items = $input.all();
const results = [];

for (let i = 0; i < items.length; i++) {
  const userAgent = userAgents[i % userAgents.length];

  const response = await this.helpers.httpRequest({
    method: 'GET',
    url: items[i].json.url,
    headers: { 'User-Agent': userAgent }
  });

  results.push({ json: { data: response } });
}

return results;

3. Memory Management

For large-scale scraping, process data in batches to manage memory:

const items = $input.all();
const BATCH_SIZE = 10;
const results = [];

for (let i = 0; i < items.length; i += BATCH_SIZE) {
  const batch = items.slice(i, i + BATCH_SIZE);

  const batchResults = await Promise.all(
    batch.map(async (item) => {
      try {
        const response = await this.helpers.httpRequest({ url: item.json.url });
        return { json: { url: item.json.url, success: true, data: response } };
      } catch (error) {
        return { json: { url: item.json.url, success: false, error: error.message } };
      }
    })
  );

  results.push(...batchResults);
}

return results;

Integrating Code Node with Other n8n Nodes

The Code node works seamlessly with other n8n nodes for comprehensive scraping workflows:

  1. HTTP Request node → Code node: Fetch HTML, then parse with Cheerio
  2. Webhook node → Code node: Process incoming scraping requests
  3. Code node → Spreadsheet node: Export scraped data to Google Sheets (see the sketch after this list)
  4. Schedule Trigger → Code node: Automated periodic scraping
  5. Code node → Database node: Store results in PostgreSQL/MySQL
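As an example of the third combination, here is a minimal sketch of a Code node that flattens the products array produced by the earlier Cheerio examples into one n8n item per product, ready for a Google Sheets node to append as rows (the field names are illustrative):

// One n8n item per product so a spreadsheet node can append one row each
const rows = [];

for (const item of $input.all()) {
  for (const product of item.json.products ?? []) {
    rows.push({
      json: {
        title: product.title,
        price: product.price,
        link: product.link,
        scrapedAt: new Date().toISOString(),
      },
    });
  }
}

return rows;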

When to Use Code Node vs. Other Scraping Solutions

Use the Code node when:

  • You need custom logic for data extraction and transformation
  • You're working with complex HTML structures or nested JSON
  • You're implementing custom retry logic or error handling
  • Processing the data requires JavaScript-specific libraries

Consider alternatives when:

  • Simple HTTP requests are sufficient (use the HTTP Request node)
  • Browser automation is required for JavaScript-heavy sites (use a community Puppeteer or Playwright node)
  • Visual workflow building is preferred over coding (use the HTML Extract node)
  • You need professional-grade scraping with proxy rotation and anti-bot bypass (use the WebScraping.AI API)

Conclusion

The n8n Code node is a versatile tool for web scraping that bridges the gap between no-code automation and custom programming. It provides developers with the flexibility to implement sophisticated scraping logic while staying within the n8n ecosystem. By combining the Code node with other n8n components and following best practices for rate limiting, error handling, and data processing, you can build robust and maintainable web scraping workflows.

Whether you're extracting product data, monitoring competitor prices, aggregating content, or building data pipelines, the Code node gives you the power to customize every aspect of your scraping process while benefiting from n8n's workflow orchestration capabilities.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
