What is the n8n API and how can I use it for scraping?
The n8n API is a RESTful API that allows developers to programmatically interact with n8n workflows, enabling automated creation, execution, and management of web scraping pipelines. Unlike the visual workflow editor, the API provides code-based control over your automation workflows, making it ideal for integrating n8n scraping capabilities into existing applications, creating dynamic workflows, or managing large-scale scraping operations.
Understanding the n8n API
n8n offers two distinct APIs for different purposes:
- n8n REST API: Manages workflows, executions, and credentials programmatically
- n8n Webhook API: Triggers workflows via HTTP requests with custom payloads
The REST API is particularly powerful for web scraping use cases because it lets you do the following (a short sketch follows the list):
- Create and update workflows dynamically based on scraping requirements
- Trigger scraping workflows programmatically from your applications
- Monitor execution status and retrieve scraped data
- Manage credentials and API keys for scraping services
- Schedule and orchestrate multiple concurrent scraping tasks
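For example, checking on scraping runs comes down to polling the executions endpoint. Here is a minimal sketch, assuming the Basic Auth setup described below and an n8n version that lists executions at /api/v1/executions:
import requests
from requests.auth import HTTPBasicAuth

API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("your_username", "your_password")

# Fetch the ten most recent executions to see how scraping runs are doing
response = requests.get(f"{API_URL}/executions", params={"limit": 10}, auth=auth)
for execution in response.json().get("data", []):
    print(execution["id"], execution.get("finished"), execution.get("workflowId"))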
Setting Up n8n API Access
Enabling API Access
First, enable API access on your n8n instance by adding the following to your environment variables or .env file:
# Make sure the n8n public API is enabled (it is by default)
N8N_PUBLIC_API_DISABLED=false
# Set basic auth for API access (recommended for production)
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=your_username
N8N_BASIC_AUTH_PASSWORD=your_secure_password
# API host and port
N8N_HOST=localhost
N8N_PORT=5678
N8N_PROTOCOL=http
Restart your n8n instance after configuring these settings:
# Using Docker
docker restart n8n
# Using npm
pkill n8n && n8n start
# Using systemd
sudo systemctl restart n8n
API Authentication
With the basic auth settings above enabled, the n8n API expects Basic Authentication credentials; include them with every request:
Using curl:
curl -X GET https://your-n8n-instance.com/api/v1/workflows \
-u username:password \
-H "Content-Type: application/json"
Using Python:
import requests
from requests.auth import HTTPBasicAuth
API_URL = "https://your-n8n-instance.com/api/v1"
USERNAME = "your_username"
PASSWORD = "your_password"
auth = HTTPBasicAuth(USERNAME, PASSWORD)
response = requests.get(
f"{API_URL}/workflows",
auth=auth
)
print(response.json())
Using JavaScript/Node.js:
const axios = require('axios');
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
async function getWorkflows() {
const response = await axios.get(`${API_URL}/workflows`, { auth });
return response.data;
}
getWorkflows()
.then(data => console.log(data))
.catch(error => console.error(error));
Creating Scraping Workflows via API
Basic Workflow Structure
Create a web scraping workflow programmatically using the n8n API. Here's a complete example that scrapes product data:
Python Example:
import requests
from requests.auth import HTTPBasicAuth
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
# Define workflow for web scraping
workflow = {
"name": "Product Price Scraper",
"nodes": [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "YOUR_API_KEY"
},
{
"name": "url",
"value": "https://example.com/products"
},
{
"name": "js",
"value": "true"
}
]
},
"method": "GET"
},
"name": "Scrape Website",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
},
{
"parameters": {
"functionCode": """
const html = items[0].json.html;
const cheerio = require('cheerio');
const $ = cheerio.load(html);
const products = [];
$('.product').each((i, elem) => {
products.push({
name: $(elem).find('.product-name').text().trim(),
price: $(elem).find('.price').text().trim(),
url: $(elem).find('a').attr('href')
});
});
return products.map(p => ({json: p}));
"""
},
"name": "Parse HTML",
"type": "n8n-nodes-base.function",
"typeVersion": 1,
"position": [650, 300]
}
],
"connections": {
"Start": {
"main": [[{"node": "Scrape Website", "type": "main", "index": 0}]]
},
"Scrape Website": {
"main": [[{"node": "Parse HTML", "type": "main", "index": 0}]]
}
},
"active": False,
"settings": {}
}
# Create workflow
response = requests.post(
f"{API_URL}/workflows",
json=workflow,
auth=auth
)
workflow_id = response.json()['id']
print(f"Workflow created with ID: {workflow_id}")
JavaScript Example:
const axios = require('axios');
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
const workflow = {
name: 'E-commerce Data Scraper',
nodes: [
{
parameters: {},
name: 'Webhook',
type: 'n8n-nodes-base.webhook',
typeVersion: 1,
position: [250, 300],
webhookId: 'scraping-webhook'
},
{
parameters: {
url: 'https://api.webscraping.ai/html',
queryParameters: {
parameters: [
{ name: 'api_key', value: 'YOUR_API_KEY' },
{ name: 'url', value: '={{$json["target_url"]}}' },
{ name: 'js', value: 'true' },
{ name: 'proxy', value: 'datacenter' }
]
},
method: 'GET'
},
name: 'Fetch Page',
type: 'n8n-nodes-base.httpRequest',
typeVersion: 3,
position: [450, 300]
}
],
connections: {
'Webhook': {
main: [[{ node: 'Fetch Page', type: 'main', index: 0 }]]
}
},
active: false
};
async function createScrapingWorkflow() {
try {
const response = await axios.post(
`${API_URL}/workflows`,
workflow,
{ auth }
);
console.log('Workflow created:', response.data.id);
return response.data;
} catch (error) {
console.error('Error creating workflow:', error.response.data);
}
}
createScrapingWorkflow();
Executing Workflows Programmatically
Triggering Workflow Execution
Once you've created a workflow, execute it via the API:
Python:
import requests
from requests.auth import HTTPBasicAuth
def execute_scraping_workflow(workflow_id, target_url):
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
# Execute workflow with input data
response = requests.post(
f"{API_URL}/workflows/{workflow_id}/execute",
json={
"data": {
"target_url": target_url,
"js_rendering": True,
"timeout": 30000
}
},
auth=auth
)
execution_id = response.json()['executionId']
return execution_id
# Execute scraping for multiple URLs
urls = [
"https://example.com/product/1",
"https://example.com/product/2",
"https://example.com/product/3"
]
for url in urls:
execution_id = execute_scraping_workflow("workflow_id_here", url)
print(f"Started scraping {url}: {execution_id}")
JavaScript:
const axios = require('axios');
async function executeWorkflow(workflowId, payload) {
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
try {
const response = await axios.post(
`${API_URL}/workflows/${workflowId}/execute`,
{ data: payload },
{ auth }
);
return response.data;
} catch (error) {
console.error('Execution failed:', error.response.data);
throw error;
}
}
// Execute scraping workflow
const payload = {
urls: [
'https://example.com/page1',
'https://example.com/page2'
],
selectors: {
title: 'h1.product-title',
price: '.price-value',
description: '.product-description'
}
};
executeWorkflow('workflow_123', payload)
.then(result => console.log('Execution started:', result.executionId))
.catch(error => console.error('Error:', error));
Using Webhooks for Scraping Triggers
Webhooks provide the simplest way to trigger scraping workflows. Similar to handling AJAX requests using Puppeteer, webhooks enable event-driven scraping.
Setting Up a Webhook-Triggered Scraper
Create a webhook workflow:
import requests
from requests.auth import HTTPBasicAuth
def create_webhook_scraper():
workflow = {
"name": "Webhook-Triggered Scraper",
"nodes": [
{
"parameters": {
"path": "scrape-data",
"responseMode": "lastNode",
"options": {}
},
"name": "Webhook",
"type": "n8n-nodes-base.webhook",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": "YOUR_API_KEY"},
{"name": "url", "value": "={{$json['url']}}"},
{"name": "js", "value": "={{$json['js'] || 'true'}}"}
]
},
"method": "GET"
},
"name": "Scrape URL",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
}
],
"connections": {
"Webhook": {
"main": [[{"node": "Scrape URL", "type": "main", "index": 0}]]
}
},
"active": True
}
response = requests.post(
"https://your-n8n-instance.com/api/v1/workflows",
json=workflow,
auth=HTTPBasicAuth("username", "password")
)
return response.json()
# Create and activate webhook
workflow_data = create_webhook_scraper()
webhook_url = "https://your-n8n-instance.com/webhook/scrape-data"
print(f"Webhook URL: {webhook_url}")
Trigger the webhook:
curl -X POST https://your-n8n-instance.com/webhook/scrape-data \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"js": true,
"selectors": {
"title": "h1",
"content": ".main-content"
}
}'
Python webhook trigger:
import requests
def trigger_scraping(target_url, selectors):
webhook_url = "https://your-n8n-instance.com/webhook/scrape-data"
payload = {
"url": target_url,
"js": True,
"selectors": selectors,
"timeout": 30000
}
response = requests.post(webhook_url, json=payload)
return response.json()
# Trigger scraping
result = trigger_scraping(
"https://example.com/product",
{"title": ".product-title", "price": ".price"}
)
print(result)
Monitoring Execution Status
Check the status of your scraping jobs and retrieve results:
Python monitoring script:
import requests
import time
from requests.auth import HTTPBasicAuth
def monitor_execution(execution_id, timeout=300):
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
start_time = time.time()
while time.time() - start_time < timeout:
response = requests.get(
f"{API_URL}/executions/{execution_id}",
auth=auth
)
execution = response.json()
        if execution.get('finished'):
            # 'error' is only present when the run failed, so use .get() to avoid a KeyError
            error = execution['data']['resultData'].get('error')
            if error:
                print(f"Execution failed: {error}")
                return None
            print("Execution completed successfully")
            return execution['data']['resultData']['runData']
print(f"Execution status: Running... ({int(time.time() - start_time)}s)")
time.sleep(5)
print("Execution timeout")
return None
# Monitor scraping job
execution_id = "execution_123"
results = monitor_execution(execution_id)
if results:
# Process scraped data
for node_name, node_data in results.items():
print(f"Node: {node_name}")
for run in node_data:
for item in run['data']['main'][0]:
print(item['json'])
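Beyond printing the items, you will usually want to validate and persist them. Here is a minimal sketch that builds on the results returned above; the output file name and required field names are illustrative assumptions:
import json

def save_results(run_data, output_path="scraped_items.json", required_fields=("name", "price")):
    """Keep the last node's output, drop incomplete rows, and write them to a JSON file."""
    last_node = list(run_data.keys())[-1]
    items = [item['json'] for item in run_data[last_node][0]['data']['main'][0]]
    # Basic validation: keep only rows where every required field is non-empty
    valid = [row for row in items if all(row.get(field) for field in required_fields)]
    with open(output_path, "w") as f:
        json.dump(valid, f, indent=2)
    return valid

if results:
    saved = save_results(results)
    print(f"Saved {len(saved)} validated items")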
JavaScript monitoring:
const axios = require('axios');
async function waitForExecution(executionId, maxWaitTime = 300000) {
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
const startTime = Date.now();
while (Date.now() - startTime < maxWaitTime) {
try {
const response = await axios.get(
`${API_URL}/executions/${executionId}`,
{ auth }
);
const execution = response.data;
if (execution.finished) {
if (execution.data.resultData.error) {
throw new Error(`Execution failed: ${execution.data.resultData.error.message}`);
}
return execution.data.resultData.runData;
}
console.log(`Waiting for execution... (${Math.floor((Date.now() - startTime) / 1000)}s)`);
await new Promise(resolve => setTimeout(resolve, 5000));
} catch (error) {
console.error('Error checking execution:', error.message);
throw error;
}
}
throw new Error('Execution timeout');
}
// Usage
waitForExecution('execution_123')
.then(results => {
console.log('Scraping completed:', results);
})
.catch(error => {
console.error('Scraping failed:', error.message);
});
Advanced Scraping Patterns with n8n API
Batch Processing Multiple URLs
Process large lists of URLs efficiently:
Python batch scraper:
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.auth import HTTPBasicAuth
class N8nScraperAPI:
def __init__(self, api_url, username, password):
self.api_url = api_url
self.auth = HTTPBasicAuth(username, password)
def create_scraping_job(self, url, workflow_id):
"""Create a scraping job for a single URL"""
response = requests.post(
f"{self.api_url}/workflows/{workflow_id}/execute",
json={"data": {"url": url}},
auth=self.auth
)
return response.json()['executionId']
def get_execution_result(self, execution_id):
"""Retrieve results from completed execution"""
response = requests.get(
f"{self.api_url}/executions/{execution_id}",
auth=self.auth
)
execution = response.json()
if not execution['finished']:
return None
# Extract scraped data from last node
node_data = execution['data']['resultData']['runData']
last_node = list(node_data.keys())[-1]
items = node_data[last_node][0]['data']['main'][0]
return [item['json'] for item in items]
def scrape_urls_parallel(self, urls, workflow_id, max_workers=5):
"""Scrape multiple URLs in parallel"""
execution_ids = []
# Start all executions
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {
executor.submit(self.create_scraping_job, url, workflow_id): url
for url in urls
}
for future in as_completed(futures):
url = futures[future]
try:
execution_id = future.result()
execution_ids.append((url, execution_id))
print(f"Started scraping: {url}")
except Exception as e:
print(f"Failed to start scraping {url}: {e}")
# Wait for all executions to complete
results = {}
pending = set(execution_ids)
while pending:
for url, execution_id in list(pending):
try:
result = self.get_execution_result(execution_id)
if result is not None:
results[url] = result
pending.remove((url, execution_id))
print(f"Completed: {url}")
except Exception as e:
print(f"Error retrieving results for {url}: {e}")
pending.remove((url, execution_id))
if pending:
time.sleep(5)
return results
# Usage
scraper = N8nScraperAPI(
"https://your-n8n-instance.com/api/v1",
"username",
"password"
)
urls_to_scrape = [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
# ... add more URLs
]
results = scraper.scrape_urls_parallel(urls_to_scrape, "workflow_id", max_workers=10)
# Process results
for url, data in results.items():
print(f"\nResults from {url}:")
print(data)
Dynamic Workflow Generation
Create workflows dynamically based on scraping requirements, similar to how you would handle browser sessions in Puppeteer:
import requests
from requests.auth import HTTPBasicAuth

def generate_scraping_workflow(config):
"""Generate a custom workflow based on scraping configuration"""
nodes = [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
}
]
connections = {}
x_position = 450
last_node = "Start"
# Add scraping node
scraping_node = {
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": config['api_key']},
{"name": "url", "value": config['target_url']},
{"name": "js", "value": str(config.get('js', True)).lower()},
{"name": "timeout", "value": str(config.get('timeout', 30000))}
]
},
"method": "GET"
},
"name": "Scrape Page",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [x_position, 300]
}
nodes.append(scraping_node)
connections[last_node] = {"main": [[{"node": "Scrape Page", "type": "main", "index": 0}]]}
last_node = "Scrape Page"
x_position += 200
# Add parsing node if selectors provided
if 'selectors' in config:
parse_code = "const html = items[0].json.html;\n"
parse_code += "const cheerio = require('cheerio');\n"
parse_code += "const $ = cheerio.load(html);\n\n"
parse_code += "const result = {\n"
for field, selector in config['selectors'].items():
parse_code += f" {field}: $('{selector}').text().trim(),\n"
parse_code += "};\n\nreturn [{json: result}];"
parse_node = {
"parameters": {
"functionCode": parse_code
},
"name": "Parse Data",
"type": "n8n-nodes-base.function",
"typeVersion": 1,
"position": [x_position, 300]
}
nodes.append(parse_node)
connections[last_node] = {"main": [[{"node": "Parse Data", "type": "main", "index": 0}]]}
last_node = "Parse Data"
x_position += 200
# Add storage node if configured
if config.get('storage_type') == 'postgres':
storage_node = {
"parameters": {
"operation": "insert",
"table": config['storage_table'],
"columns": ",".join(config['selectors'].keys())
},
"name": "Store Results",
"type": "n8n-nodes-base.postgres",
"typeVersion": 1,
"position": [x_position, 300]
}
nodes.append(storage_node)
connections[last_node] = {"main": [[{"node": "Store Results", "type": "main", "index": 0}]]}
workflow = {
"name": config.get('workflow_name', 'Dynamic Scraper'),
"nodes": nodes,
"connections": connections,
"active": False
}
return workflow
# Create a custom scraping workflow
config = {
"workflow_name": "Product Details Scraper",
"api_key": "YOUR_WEBSCRAPING_AI_API_KEY",
"target_url": "https://example.com/products",
"js": True,
"timeout": 30000,
"selectors": {
"product_name": ".product-title",
"price": ".price-value",
"rating": ".rating-score",
"availability": ".stock-status"
},
"storage_type": "postgres",
"storage_table": "scraped_products"
}
workflow = generate_scraping_workflow(config)
# Create workflow via API
response = requests.post(
"https://your-n8n-instance.com/api/v1/workflows",
json=workflow,
auth=HTTPBasicAuth("username", "password")
)
print(f"Created workflow: {response.json()['id']}")
Error Handling and Retry Logic
Implement robust error handling for production scraping:
const axios = require('axios');
class N8nScrapingClient {
constructor(apiUrl, auth) {
this.apiUrl = apiUrl;
this.auth = auth;
this.maxRetries = 3;
}
async executeWithRetry(workflowId, data, retries = 0) {
try {
const response = await axios.post(
`${this.apiUrl}/workflows/${workflowId}/execute`,
{ data },
{ auth: this.auth }
);
const executionId = response.data.executionId;
const result = await this.waitForCompletion(executionId);
return result;
} catch (error) {
if (retries < this.maxRetries) {
const delay = Math.pow(2, retries) * 1000; // Exponential backoff
console.log(`Retry ${retries + 1}/${this.maxRetries} after ${delay}ms`);
await new Promise(resolve => setTimeout(resolve, delay));
return this.executeWithRetry(workflowId, data, retries + 1);
}
throw error;
}
}
async waitForCompletion(executionId, timeout = 300000) {
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
const response = await axios.get(
`${this.apiUrl}/executions/${executionId}`,
{ auth: this.auth }
);
const execution = response.data;
if (execution.finished) {
if (execution.data.resultData.error) {
throw new Error(`Execution failed: ${execution.data.resultData.error.message}`);
}
return this.extractResults(execution);
}
await new Promise(resolve => setTimeout(resolve, 5000));
}
throw new Error('Execution timeout');
}
extractResults(execution) {
const nodeData = execution.data.resultData.runData;
const lastNode = Object.keys(nodeData).pop();
const items = nodeData[lastNode][0].data.main[0];
return items.map(item => item.json);
}
}
// Usage
const client = new N8nScrapingClient(
'https://your-n8n-instance.com/api/v1',
{ username: 'your_username', password: 'your_password' }
);
client.executeWithRetry('workflow_123', { url: 'https://example.com' })
.then(results => console.log('Scraping successful:', results))
.catch(error => console.error('Scraping failed after retries:', error));
Best Practices for n8n API Scraping
- Rate Limiting: Implement client-side rate limiting to avoid overwhelming your n8n instance (a short sketch follows this list)
- Error Handling: Always implement retry logic with exponential backoff for failed requests
- Monitoring: Track execution status and set up alerts for failed scraping jobs
- Resource Management: Limit concurrent executions to prevent resource exhaustion
- Data Validation: Validate scraped data before storage to ensure quality
- Security: Use environment variables for API keys and credentials, never hardcode them
- Timeout Configuration: Set appropriate timeouts based on expected page load times, similar to handling timeouts in Puppeteer
- Logging: Implement comprehensive logging for debugging and auditing
- Webhook Security: Add authentication to webhook endpoints to prevent unauthorized access
- Version Control: Store workflow definitions in version control for reproducibility
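As a concrete example of the rate-limiting point above, here is a minimal client-side throttle around the webhook trigger from earlier. The shared-secret header is an assumption; it only adds protection if the Webhook node is configured to require a matching header:
import time
import requests

WEBHOOK_URL = "https://your-n8n-instance.com/webhook/scrape-data"
WEBHOOK_SECRET = "your_shared_secret"  # assumed: the Webhook node checks this header
MIN_INTERVAL = 1.0  # allow at most one trigger per second

_last_call = 0.0

def trigger_throttled(target_url):
    """Trigger the scraping webhook, sleeping if the previous call was too recent."""
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    response = requests.post(
        WEBHOOK_URL,
        json={"url": target_url, "js": True},
        headers={"X-Webhook-Secret": WEBHOOK_SECRET},
        timeout=60
    )
    _last_call = time.time()
    response.raise_for_status()
    return response.json()

for url in ["https://example.com/page1", "https://example.com/page2"]:
    print(trigger_throttled(url))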
Integrating with WebScraping.AI API
For production-grade scraping that handles JavaScript rendering, proxies, and anti-bot protection, integrate WebScraping.AI with your n8n API workflows:
import requests
def create_webscraping_ai_workflow():
"""Create n8n workflow that uses WebScraping.AI for robust scraping"""
workflow = {
"name": "WebScraping.AI Integration",
"nodes": [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": "YOUR_API_KEY"},
{"name": "url", "value": "={{$json['target_url']}}"},
{"name": "js", "value": "true"},
{"name": "proxy", "value": "residential"},
{"name": "device", "value": "desktop"},
{"name": "timeout", "value": "30000"},
{"name": "js_timeout", "value": "5000"}
]
},
"method": "GET",
"options": {
"timeout": 60000
}
},
"name": "WebScraping.AI",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
}
],
"connections": {
"Start": {
"main": [[{"node": "WebScraping.AI", "type": "main", "index": 0}]]
}
},
"active": False
}
return workflow
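To use this helper, register the generated workflow through the same workflows endpoint as the earlier examples; the credentials and instance URL are placeholders:
from requests.auth import HTTPBasicAuth

workflow = create_webscraping_ai_workflow()
response = requests.post(
    "https://your-n8n-instance.com/api/v1/workflows",
    json=workflow,
    auth=HTTPBasicAuth("username", "password")
)
print(f"Created WebScraping.AI workflow: {response.json()['id']}")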
Conclusion
The n8n API provides powerful programmatic control over web scraping workflows, enabling developers to build sophisticated, scalable data extraction systems. By combining the n8n API with robust scraping services like WebScraping.AI, you can create production-ready scraping pipelines that handle authentication, JavaScript rendering, proxy rotation, and anti-bot protection automatically. Whether you're building a monitoring system, data aggregation platform, or research tool, the n8n API offers the flexibility and control needed for enterprise-grade web scraping operations.