What is the n8n API and how can I use it for scraping?
The n8n API is a RESTful API that allows developers to programmatically interact with n8n workflows, enabling automated creation, execution, and management of web scraping pipelines. Unlike the visual workflow editor, the API provides code-based control over your automation workflows, making it ideal for integrating n8n scraping capabilities into existing applications, creating dynamic workflows, or managing large-scale scraping operations.
Understanding the n8n API
n8n offers two distinct APIs for different purposes:
- n8n REST API: Manages workflows, executions, and credentials programmatically
- n8n Webhook API: Triggers workflows via HTTP requests with custom payloads
The REST API is particularly powerful for web scraping use cases because it lets you do the following (a short sketch follows the list):
- Create and update workflows dynamically based on scraping requirements
- Trigger scraping workflows programmatically from your applications
- Monitor execution status and retrieve scraped data
- Manage credentials and API keys for scraping services
- Schedule and orchestrate multiple concurrent scraping tasks
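For example, checking on scraping runs comes down to polling the executions endpoint. Here is a minimal sketch, assuming the Basic Auth setup described below and an n8n version that lists executions at /api/v1/executions:
import requests
from requests.auth import HTTPBasicAuth

API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("your_username", "your_password")

# Fetch the ten most recent executions to see how scraping runs are doing
response = requests.get(f"{API_URL}/executions", params={"limit": 10}, auth=auth)
for execution in response.json().get("data", []):
    print(execution["id"], execution.get("finished"), execution.get("workflowId"))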
Setting Up n8n API Access
Enabling API Access
First, enable API access on your n8n instance by adding the following to your environment variables or .env file:
# Make sure the n8n public API is enabled (it is by default)
N8N_PUBLIC_API_DISABLED=false
# Set basic auth for API access (recommended for production)
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=your_username
N8N_BASIC_AUTH_PASSWORD=your_secure_password
# API host and port
N8N_HOST=localhost
N8N_PORT=5678
N8N_PROTOCOL=http
Restart your n8n instance after configuring these settings:
# Using Docker
docker restart n8n
# Using npm
pkill n8n && n8n start
# Using systemd
sudo systemctl restart n8n
API Authentication
With the basic auth settings above enabled, the n8n API expects Basic Authentication credentials; include them with every request:
Using curl:
curl -X GET https://your-n8n-instance.com/api/v1/workflows \
-u username:password \
-H "Content-Type: application/json"
Using Python:
import requests
from requests.auth import HTTPBasicAuth
API_URL = "https://your-n8n-instance.com/api/v1"
USERNAME = "your_username"
PASSWORD = "your_password"
auth = HTTPBasicAuth(USERNAME, PASSWORD)
response = requests.get(
f"{API_URL}/workflows",
auth=auth
)
print(response.json())
Using JavaScript/Node.js:
const axios = require('axios');
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
async function getWorkflows() {
const response = await axios.get(`${API_URL}/workflows`, { auth });
return response.data;
}
getWorkflows()
.then(data => console.log(data))
.catch(error => console.error(error));
Creating Scraping Workflows via API
Basic Workflow Structure
Create a web scraping workflow programmatically using the n8n API. Here's a complete example that scrapes product data:
Python Example:
import requests
from requests.auth import HTTPBasicAuth
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
# Define workflow for web scraping
workflow = {
"name": "Product Price Scraper",
"nodes": [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{
"name": "api_key",
"value": "YOUR_API_KEY"
},
{
"name": "url",
"value": "https://example.com/products"
},
{
"name": "js",
"value": "true"
}
]
},
"method": "GET"
},
"name": "Scrape Website",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
},
{
"parameters": {
"functionCode": """
const html = items[0].json.html;
const cheerio = require('cheerio');
const $ = cheerio.load(html);
const products = [];
$('.product').each((i, elem) => {
products.push({
name: $(elem).find('.product-name').text().trim(),
price: $(elem).find('.price').text().trim(),
url: $(elem).find('a').attr('href')
});
});
return products.map(p => ({json: p}));
"""
},
"name": "Parse HTML",
"type": "n8n-nodes-base.function",
"typeVersion": 1,
"position": [650, 300]
}
],
"connections": {
"Start": {
"main": [[{"node": "Scrape Website", "type": "main", "index": 0}]]
},
"Scrape Website": {
"main": [[{"node": "Parse HTML", "type": "main", "index": 0}]]
}
},
"active": False,
"settings": {}
}
# Create workflow
response = requests.post(
f"{API_URL}/workflows",
json=workflow,
auth=auth
)
workflow_id = response.json()['id']
print(f"Workflow created with ID: {workflow_id}")
JavaScript Example:
const axios = require('axios');
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
const workflow = {
name: 'E-commerce Data Scraper',
nodes: [
{
parameters: {},
name: 'Webhook',
type: 'n8n-nodes-base.webhook',
typeVersion: 1,
position: [250, 300],
webhookId: 'scraping-webhook'
},
{
parameters: {
url: 'https://api.webscraping.ai/html',
queryParameters: {
parameters: [
{ name: 'api_key', value: 'YOUR_API_KEY' },
{ name: 'url', value: '={{$json["target_url"]}}' },
{ name: 'js', value: 'true' },
{ name: 'proxy', value: 'datacenter' }
]
},
method: 'GET'
},
name: 'Fetch Page',
type: 'n8n-nodes-base.httpRequest',
typeVersion: 3,
position: [450, 300]
}
],
connections: {
'Webhook': {
main: [[{ node: 'Fetch Page', type: 'main', index: 0 }]]
}
},
active: false
};
async function createScrapingWorkflow() {
try {
const response = await axios.post(
`${API_URL}/workflows`,
workflow,
{ auth }
);
console.log('Workflow created:', response.data.id);
return response.data;
} catch (error) {
console.error('Error creating workflow:', error.response.data);
}
}
createScrapingWorkflow();
Executing Workflows Programmatically
Triggering Workflow Execution
Once you've created a workflow, execute it via the API:
Python:
import requests
from requests.auth import HTTPBasicAuth
def execute_scraping_workflow(workflow_id, target_url):
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
# Execute workflow with input data
response = requests.post(
f"{API_URL}/workflows/{workflow_id}/execute",
json={
"data": {
"target_url": target_url,
"js_rendering": True,
"timeout": 30000
}
},
auth=auth
)
execution_id = response.json()['executionId']
return execution_id
# Execute scraping for multiple URLs
urls = [
"https://example.com/product/1",
"https://example.com/product/2",
"https://example.com/product/3"
]
for url in urls:
execution_id = execute_scraping_workflow("workflow_id_here", url)
print(f"Started scraping {url}: {execution_id}")
JavaScript:
const axios = require('axios');
async function executeWorkflow(workflowId, payload) {
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
try {
const response = await axios.post(
`${API_URL}/workflows/${workflowId}/execute`,
{ data: payload },
{ auth }
);
return response.data;
} catch (error) {
console.error('Execution failed:', error.response.data);
throw error;
}
}
// Execute scraping workflow
const payload = {
urls: [
'https://example.com/page1',
'https://example.com/page2'
],
selectors: {
title: 'h1.product-title',
price: '.price-value',
description: '.product-description'
}
};
executeWorkflow('workflow_123', payload)
.then(result => console.log('Execution started:', result.executionId))
.catch(error => console.error('Error:', error));
Using Webhooks for Scraping Triggers
Webhooks provide the simplest way to trigger scraping workflows. Similar to handling AJAX requests using Puppeteer, webhooks enable event-driven scraping.
Setting Up a Webhook-Triggered Scraper
Create a webhook workflow:
import requests
from requests.auth import HTTPBasicAuth
def create_webhook_scraper():
workflow = {
"name": "Webhook-Triggered Scraper",
"nodes": [
{
"parameters": {
"path": "scrape-data",
"responseMode": "lastNode",
"options": {}
},
"name": "Webhook",
"type": "n8n-nodes-base.webhook",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": "YOUR_API_KEY"},
{"name": "url", "value": "={{$json['url']}}"},
{"name": "js", "value": "={{$json['js'] || 'true'}}"}
]
},
"method": "GET"
},
"name": "Scrape URL",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
}
],
"connections": {
"Webhook": {
"main": [[{"node": "Scrape URL", "type": "main", "index": 0}]]
}
},
"active": True
}
response = requests.post(
"https://your-n8n-instance.com/api/v1/workflows",
json=workflow,
auth=HTTPBasicAuth("username", "password")
)
return response.json()
# Create and activate webhook
workflow_data = create_webhook_scraper()
webhook_url = "https://your-n8n-instance.com/webhook/scrape-data"
print(f"Webhook URL: {webhook_url}")
Trigger the webhook:
curl -X POST https://your-n8n-instance.com/webhook/scrape-data \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"js": true,
"selectors": {
"title": "h1",
"content": ".main-content"
}
}'
Python webhook trigger:
import requests
def trigger_scraping(target_url, selectors):
webhook_url = "https://your-n8n-instance.com/webhook/scrape-data"
payload = {
"url": target_url,
"js": True,
"selectors": selectors,
"timeout": 30000
}
response = requests.post(webhook_url, json=payload)
return response.json()
# Trigger scraping
result = trigger_scraping(
"https://example.com/product",
{"title": ".product-title", "price": ".price"}
)
print(result)
Monitoring Execution Status
Check the status of your scraping jobs and retrieve results:
Python monitoring script:
import requests
import time
from requests.auth import HTTPBasicAuth
def monitor_execution(execution_id, timeout=300):
API_URL = "https://your-n8n-instance.com/api/v1"
auth = HTTPBasicAuth("username", "password")
start_time = time.time()
while time.time() - start_time < timeout:
response = requests.get(
f"{API_URL}/executions/{execution_id}",
auth=auth
)
execution = response.json()
        if execution.get('finished'):
            # 'error' is only present when the run failed, so use .get() to avoid a KeyError
            error = execution['data']['resultData'].get('error')
            if error:
                print(f"Execution failed: {error}")
                return None
            print("Execution completed successfully")
            return execution['data']['resultData']['runData']
print(f"Execution status: Running... ({int(time.time() - start_time)}s)")
time.sleep(5)
print("Execution timeout")
return None
# Monitor scraping job
execution_id = "execution_123"
results = monitor_execution(execution_id)
if results:
# Process scraped data
for node_name, node_data in results.items():
print(f"Node: {node_name}")
for run in node_data:
for item in run['data']['main'][0]:
print(item['json'])
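Beyond printing the items, you will usually want to validate and persist them. Here is a minimal sketch that builds on the results returned above; the output file name and required field names are illustrative assumptions:
import json

def save_results(run_data, output_path="scraped_items.json", required_fields=("name", "price")):
    """Keep the last node's output, drop incomplete rows, and write them to a JSON file."""
    last_node = list(run_data.keys())[-1]
    items = [item['json'] for item in run_data[last_node][0]['data']['main'][0]]
    # Basic validation: keep only rows where every required field is non-empty
    valid = [row for row in items if all(row.get(field) for field in required_fields)]
    with open(output_path, "w") as f:
        json.dump(valid, f, indent=2)
    return valid

if results:
    saved = save_results(results)
    print(f"Saved {len(saved)} validated items")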
JavaScript monitoring:
const axios = require('axios');
async function waitForExecution(executionId, maxWaitTime = 300000) {
const API_URL = 'https://your-n8n-instance.com/api/v1';
const auth = {
username: 'your_username',
password: 'your_password'
};
const startTime = Date.now();
while (Date.now() - startTime < maxWaitTime) {
try {
const response = await axios.get(
`${API_URL}/executions/${executionId}`,
{ auth }
);
const execution = response.data;
if (execution.finished) {
if (execution.data.resultData.error) {
throw new Error(`Execution failed: ${execution.data.resultData.error.message}`);
}
return execution.data.resultData.runData;
}
console.log(`Waiting for execution... (${Math.floor((Date.now() - startTime) / 1000)}s)`);
await new Promise(resolve => setTimeout(resolve, 5000));
} catch (error) {
console.error('Error checking execution:', error.message);
throw error;
}
}
throw new Error('Execution timeout');
}
// Usage
waitForExecution('execution_123')
.then(results => {
console.log('Scraping completed:', results);
})
.catch(error => {
console.error('Scraping failed:', error.message);
});
Advanced Scraping Patterns with n8n API
Batch Processing Multiple URLs
Process large lists of URLs efficiently:
Python batch scraper:
import time
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from requests.auth import HTTPBasicAuth
class N8nScraperAPI:
def __init__(self, api_url, username, password):
self.api_url = api_url
self.auth = HTTPBasicAuth(username, password)
def create_scraping_job(self, url, workflow_id):
"""Create a scraping job for a single URL"""
response = requests.post(
f"{self.api_url}/workflows/{workflow_id}/execute",
json={"data": {"url": url}},
auth=self.auth
)
return response.json()['executionId']
def get_execution_result(self, execution_id):
"""Retrieve results from completed execution"""
response = requests.get(
f"{self.api_url}/executions/{execution_id}",
auth=self.auth
)
execution = response.json()
if not execution['finished']:
return None
# Extract scraped data from last node
node_data = execution['data']['resultData']['runData']
last_node = list(node_data.keys())[-1]
items = node_data[last_node][0]['data']['main'][0]
return [item['json'] for item in items]
def scrape_urls_parallel(self, urls, workflow_id, max_workers=5):
"""Scrape multiple URLs in parallel"""
execution_ids = []
# Start all executions
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {
executor.submit(self.create_scraping_job, url, workflow_id): url
for url in urls
}
for future in as_completed(futures):
url = futures[future]
try:
execution_id = future.result()
execution_ids.append((url, execution_id))
print(f"Started scraping: {url}")
except Exception as e:
print(f"Failed to start scraping {url}: {e}")
# Wait for all executions to complete
results = {}
pending = set(execution_ids)
while pending:
for url, execution_id in list(pending):
try:
result = self.get_execution_result(execution_id)
if result is not None:
results[url] = result
pending.remove((url, execution_id))
print(f"Completed: {url}")
except Exception as e:
print(f"Error retrieving results for {url}: {e}")
pending.remove((url, execution_id))
if pending:
time.sleep(5)
return results
# Usage
scraper = N8nScraperAPI(
"https://your-n8n-instance.com/api/v1",
"username",
"password"
)
urls_to_scrape = [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3",
# ... add more URLs
]
results = scraper.scrape_urls_parallel(urls_to_scrape, "workflow_id", max_workers=10)
# Process results
for url, data in results.items():
print(f"\nResults from {url}:")
print(data)
Dynamic Workflow Generation
Create workflows dynamically based on scraping requirements, similar to how you would handle browser sessions in Puppeteer:
import requests
from requests.auth import HTTPBasicAuth

def generate_scraping_workflow(config):
"""Generate a custom workflow based on scraping configuration"""
nodes = [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
}
]
connections = {}
x_position = 450
last_node = "Start"
# Add scraping node
scraping_node = {
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": config['api_key']},
{"name": "url", "value": config['target_url']},
{"name": "js", "value": str(config.get('js', True)).lower()},
{"name": "timeout", "value": str(config.get('timeout', 30000))}
]
},
"method": "GET"
},
"name": "Scrape Page",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [x_position, 300]
}
nodes.append(scraping_node)
connections[last_node] = {"main": [[{"node": "Scrape Page", "type": "main", "index": 0}]]}
last_node = "Scrape Page"
x_position += 200
# Add parsing node if selectors provided
if 'selectors' in config:
parse_code = "const html = items[0].json.html;\n"
parse_code += "const cheerio = require('cheerio');\n"
parse_code += "const $ = cheerio.load(html);\n\n"
parse_code += "const result = {\n"
for field, selector in config['selectors'].items():
parse_code += f" {field}: $('{selector}').text().trim(),\n"
parse_code += "};\n\nreturn [{json: result}];"
parse_node = {
"parameters": {
"functionCode": parse_code
},
"name": "Parse Data",
"type": "n8n-nodes-base.function",
"typeVersion": 1,
"position": [x_position, 300]
}
nodes.append(parse_node)
connections[last_node] = {"main": [[{"node": "Parse Data", "type": "main", "index": 0}]]}
last_node = "Parse Data"
x_position += 200
# Add storage node if configured
if config.get('storage_type') == 'postgres':
storage_node = {
"parameters": {
"operation": "insert",
"table": config['storage_table'],
"columns": ",".join(config['selectors'].keys())
},
"name": "Store Results",
"type": "n8n-nodes-base.postgres",
"typeVersion": 1,
"position": [x_position, 300]
}
nodes.append(storage_node)
connections[last_node] = {"main": [[{"node": "Store Results", "type": "main", "index": 0}]]}
workflow = {
"name": config.get('workflow_name', 'Dynamic Scraper'),
"nodes": nodes,
"connections": connections,
"active": False
}
return workflow
# Create a custom scraping workflow
config = {
"workflow_name": "Product Details Scraper",
"api_key": "YOUR_WEBSCRAPING_AI_API_KEY",
"target_url": "https://example.com/products",
"js": True,
"timeout": 30000,
"selectors": {
"product_name": ".product-title",
"price": ".price-value",
"rating": ".rating-score",
"availability": ".stock-status"
},
"storage_type": "postgres",
"storage_table": "scraped_products"
}
workflow = generate_scraping_workflow(config)
# Create workflow via API
response = requests.post(
"https://your-n8n-instance.com/api/v1/workflows",
json=workflow,
auth=HTTPBasicAuth("username", "password")
)
print(f"Created workflow: {response.json()['id']}")
Error Handling and Retry Logic
Implement robust error handling for production scraping:
const axios = require('axios');
class N8nScrapingClient {
constructor(apiUrl, auth) {
this.apiUrl = apiUrl;
this.auth = auth;
this.maxRetries = 3;
}
async executeWithRetry(workflowId, data, retries = 0) {
try {
const response = await axios.post(
`${this.apiUrl}/workflows/${workflowId}/execute`,
{ data },
{ auth: this.auth }
);
const executionId = response.data.executionId;
const result = await this.waitForCompletion(executionId);
return result;
} catch (error) {
if (retries < this.maxRetries) {
const delay = Math.pow(2, retries) * 1000; // Exponential backoff
console.log(`Retry ${retries + 1}/${this.maxRetries} after ${delay}ms`);
await new Promise(resolve => setTimeout(resolve, delay));
return this.executeWithRetry(workflowId, data, retries + 1);
}
throw error;
}
}
async waitForCompletion(executionId, timeout = 300000) {
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
const response = await axios.get(
`${this.apiUrl}/executions/${executionId}`,
{ auth: this.auth }
);
const execution = response.data;
if (execution.finished) {
if (execution.data.resultData.error) {
throw new Error(`Execution failed: ${execution.data.resultData.error.message}`);
}
return this.extractResults(execution);
}
await new Promise(resolve => setTimeout(resolve, 5000));
}
throw new Error('Execution timeout');
}
extractResults(execution) {
const nodeData = execution.data.resultData.runData;
const lastNode = Object.keys(nodeData).pop();
const items = nodeData[lastNode][0].data.main[0];
return items.map(item => item.json);
}
}
// Usage
const client = new N8nScrapingClient(
'https://your-n8n-instance.com/api/v1',
{ username: 'your_username', password: 'your_password' }
);
client.executeWithRetry('workflow_123', { url: 'https://example.com' })
.then(results => console.log('Scraping successful:', results))
.catch(error => console.error('Scraping failed after retries:', error));
Best Practices for n8n API Scraping
- Rate Limiting: Implement client-side rate limiting to avoid overwhelming your n8n instance (a short sketch follows this list)
- Error Handling: Always implement retry logic with exponential backoff for failed requests
- Monitoring: Track execution status and set up alerts for failed scraping jobs
- Resource Management: Limit concurrent executions to prevent resource exhaustion
- Data Validation: Validate scraped data before storage to ensure quality
- Security: Use environment variables for API keys and credentials, never hardcode them
- Timeout Configuration: Set appropriate timeouts based on expected page load times, similar to handling timeouts in Puppeteer
- Logging: Implement comprehensive logging for debugging and auditing
- Webhook Security: Add authentication to webhook endpoints to prevent unauthorized access
- Version Control: Store workflow definitions in version control for reproducibility
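As a concrete example of the rate-limiting point above, here is a minimal client-side throttle around the webhook trigger from earlier. The shared-secret header is an assumption; it only adds protection if the Webhook node is configured to require a matching header:
import time
import requests

WEBHOOK_URL = "https://your-n8n-instance.com/webhook/scrape-data"
WEBHOOK_SECRET = "your_shared_secret"  # assumed: the Webhook node checks this header
MIN_INTERVAL = 1.0  # allow at most one trigger per second

_last_call = 0.0

def trigger_throttled(target_url):
    """Trigger the scraping webhook, sleeping if the previous call was too recent."""
    global _last_call
    wait = MIN_INTERVAL - (time.time() - _last_call)
    if wait > 0:
        time.sleep(wait)
    response = requests.post(
        WEBHOOK_URL,
        json={"url": target_url, "js": True},
        headers={"X-Webhook-Secret": WEBHOOK_SECRET},
        timeout=60
    )
    _last_call = time.time()
    response.raise_for_status()
    return response.json()

for url in ["https://example.com/page1", "https://example.com/page2"]:
    print(trigger_throttled(url))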
Integrating with WebScraping.AI API
For production-grade scraping that handles JavaScript rendering, proxies, and anti-bot protection, integrate WebScraping.AI with your n8n API workflows:
import requests
def create_webscraping_ai_workflow():
"""Create n8n workflow that uses WebScraping.AI for robust scraping"""
workflow = {
"name": "WebScraping.AI Integration",
"nodes": [
{
"parameters": {},
"name": "Start",
"type": "n8n-nodes-base.start",
"typeVersion": 1,
"position": [250, 300]
},
{
"parameters": {
"url": "https://api.webscraping.ai/html",
"queryParameters": {
"parameters": [
{"name": "api_key", "value": "YOUR_API_KEY"},
{"name": "url", "value": "={{$json['target_url']}}"},
{"name": "js", "value": "true"},
{"name": "proxy", "value": "residential"},
{"name": "device", "value": "desktop"},
{"name": "timeout", "value": "30000"},
{"name": "js_timeout", "value": "5000"}
]
},
"method": "GET",
"options": {
"timeout": 60000
}
},
"name": "WebScraping.AI",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 3,
"position": [450, 300]
}
],
"connections": {
"Start": {
"main": [[{"node": "WebScraping.AI", "type": "main", "index": 0}]]
}
},
"active": False
}
return workflow
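To use this helper, register the generated workflow through the same workflows endpoint as the earlier examples; the credentials and instance URL are placeholders:
from requests.auth import HTTPBasicAuth

workflow = create_webscraping_ai_workflow()
response = requests.post(
    "https://your-n8n-instance.com/api/v1/workflows",
    json=workflow,
    auth=HTTPBasicAuth("username", "password")
)
print(f"Created WebScraping.AI workflow: {response.json()['id']}")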
Conclusion
The n8n API provides powerful programmatic control over web scraping workflows, enabling developers to build sophisticated, scalable data extraction systems. By combining the n8n API with robust scraping services like WebScraping.AI, you can create production-ready scraping pipelines that handle authentication, JavaScript rendering, proxy rotation, and anti-bot protection automatically. Whether you're building a monitoring system, data aggregation platform, or research tool, the n8n API offers the flexibility and control needed for enterprise-grade web scraping operations.