How do I Integrate Firecrawl with n8n for Workflow Automation?
Integrating Firecrawl with n8n allows you to build powerful, automated web scraping workflows without writing extensive code. n8n is an open-source workflow automation tool that connects various services through a visual interface, making it ideal for developers who want to orchestrate complex data extraction pipelines.
This guide will walk you through setting up Firecrawl with n8n, from basic API calls to advanced multi-step workflows.
What is Firecrawl and n8n?
Firecrawl is a web scraping API that converts websites into clean, LLM-ready markdown or structured data. It handles JavaScript rendering, bypasses anti-bot measures, and provides features like crawling entire websites and extracting specific data fields.
n8n is a fair-code workflow automation platform that lets you connect APIs, databases, and services through a node-based interface. It's perfect for automating repetitive tasks like data collection, processing, and storage.
Together, they create a powerful combination for automated web scraping workflows.
Prerequisites
Before integrating Firecrawl with n8n, you'll need:
- Firecrawl API Key: Sign up at firecrawl.dev to get your API key
- n8n Instance: Either self-hosted or using n8n Cloud
- Basic Understanding: Familiarity with REST APIs and workflow concepts
Setting Up n8n
Installing n8n Locally
If you don't have n8n installed, you can set it up using npm or Docker:
# Using npm
npm install -g n8n
# Start n8n
n8n start
# Using Docker
docker run -it --rm \
--name n8n \
-p 5678:5678 \
-v ~/.n8n:/home/node/.n8n \
n8nio/n8n
Once running, access n8n at http://localhost:5678.
n8n Cloud Alternative
For a managed solution, sign up at n8n.cloud to get started without installation.
Basic Firecrawl Integration with n8n
Method 1: Using HTTP Request Node
The most flexible way to integrate Firecrawl is through n8n's HTTP Request node, which gives you full control over API parameters.
Step 1: Create a New Workflow
- Open n8n and click "Add Workflow"
- Name your workflow (e.g., "Firecrawl Web Scraping")
Step 2: Add HTTP Request Node
- Click the "+" button to add a node
- Search for "HTTP Request" and select it
- Configure the node with these settings:
For Scraping a Single Page:
Method: POST
URL: https://api.firecrawl.dev/v1/scrape
Authentication: Header Auth
Header Name: Authorization
Header Value: Bearer YOUR_FIRECRAWL_API_KEY
Body Content Type: JSON
Body:
{
"url": "https://example.com",
"formats": ["markdown", "html"],
"onlyMainContent": true
}
Step 3: Test the Configuration
Click "Execute Node" to test your configuration. You should receive a response containing the scraped content in markdown and HTML formats.
Method 2: Creating a Firecrawl Credentials Type
For reusability across workflows, create a custom credential:
- Go to Settings → Credentials → New
- Create a generic credential type with your Firecrawl API key
- Reference this credential in your HTTP Request nodes
Advanced Firecrawl Workflows in n8n
Crawling Entire Websites
To crawl multiple pages from a website, use Firecrawl's /crawl endpoint:
{
  "url": "https://example.com",
  "limit": 100,
  "maxDepth": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
This initiates a crawl job. You'll need to:
- Store the job ID from the response
- Set up a polling mechanism using n8n's Wait and Loop nodes
- Check crawl status with GET requests to /v1/crawl/{id}
- Process results when the crawl completes
Here's a complete workflow structure:
Node 1: Start Crawl (HTTP Request)
- POST to /v1/crawl
- Store the job ID from the response in a workflow variable
Node 2: Wait Node
- Wait 10 seconds between status checks
Node 3: Check Status (HTTP Request)
- GET to /v1/crawl/{{$node["Start Crawl"].json["id"]}}
- Check whether the status is "completed"
Node 4: IF Node
- If the status is "completed", proceed to processing
- If the crawl is still "scraping", loop back to the Wait node
Node 5: Process Results
- Extract and transform the scraped data
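If you prefer to make the routing decision in code rather than in an IF node, a Code node along these lines works as a sketch. It assumes the v1 crawl status response has a status field plus a data array of scraped pages; verify the shape against your own output.
// Code node (sketch): route based on the crawl status response.
// Assumes the status response looks like { status: "scraping" | "completed", data: [...] }.
const result = $input.first().json;

if (result.status === 'completed') {
  // One n8n item per scraped page for downstream processing.
  return result.data.map(page => ({ json: page }));
}

// Not finished yet: emit a marker item so the IF node loops back to the Wait node.
return [{ json: { done: false, status: result.status } }];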
Extracting Structured Data
Firecrawl can extract structured data using its /scrape endpoint with a JSON schema:
{
"url": "https://example.com/products",
"formats": ["extract"],
"extract": {
"schema": {
"type": "object",
"properties": {
"products": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
}
}
}
}
}
}
}
This approach is particularly useful when you need consistent data structures for downstream processing, similar to how you would handle AJAX requests using Puppeteer for dynamic content.
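Downstream nodes can then read the extracted fields directly. As a rough sketch (assuming the structured output is returned under data.extract, as in Firecrawl's v1 responses), a Code node could fan the products out into individual n8n items:
// Code node (sketch): turn the extracted product list into one item per product.
// Assumes the scrape response nests the structured output under data.extract.
const extract = $input.first().json.data.extract;

return (extract.products || []).map(product => ({ json: product }));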
Real-World Workflow Examples
Example 1: Daily News Scraping
Workflow Overview:
1. Schedule Trigger: Runs daily at 9 AM
2. Firecrawl Scrape: Fetches news articles
3. Filter: Removes duplicates
4. Database: Stores articles in PostgreSQL
5. Slack Notification: Alerts team of new articles
Configuration:
{
"nodes": [
{
"name": "Daily Trigger",
"type": "n8n-nodes-base.cron",
"parameters": {
"triggerTimes": {
"hour": 9,
"minute": 0
}
}
},
{
"name": "Scrape News",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"method": "POST",
"url": "https://api.firecrawl.dev/v0/scrape",
"authentication": "headerAuth",
"bodyParametersJson": "={\"url\": \"https://news-site.com\", \"formats\": [\"markdown\"]}"
}
}
]
}
Example 2: Competitive Price Monitoring
Monitor competitor prices and get alerts when prices change:
Workflow:
1. Webhook Trigger: Activated by an external system
2. Firecrawl Extract: Scrapes product prices with a structured schema
3. Compare Prices: Checks against the database
4. Condition Node: Triggers an alert if the price drops by more than 10% (a sketch of this comparison follows below)
5. Email Alert: Notifies the pricing team
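The comparison step can be a small Code node. The field names below (currentPrice, previousPrice) are illustrative placeholders; map them to whatever your extraction schema and database query actually return.
// Code node (sketch): keep only products whose price dropped by more than 10%.
// currentPrice and previousPrice are hypothetical fields from the previous nodes.
const items = $input.all();

const alerts = items.filter(item => {
  const { currentPrice, previousPrice } = item.json;
  if (!previousPrice || !currentPrice) return false;
  return (previousPrice - currentPrice) / previousPrice > 0.1;
});

return alerts; // Only significant drops continue to the Email Alert node.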
Example 3: Content Aggregation Pipeline
Aggregate content from multiple sources for AI processing:
- CSV File: Contains list of URLs
- Split in Batches: Process 10 URLs at a time
- Firecrawl Scrape: Converts to markdown
- Data Transformation: Cleans and formats content
- Vector Database: Stores in Pinecone/Weaviate for RAG
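Before loading scraped markdown into a vector store, you typically split it into chunks. A minimal Code node sketch follows (fixed-size character chunks, no overlap handling); it assumes a prior step has placed the page markdown on item.json.markdown and the source URL on item.json.url.
// Code node (sketch): split scraped markdown into ~1,000-character chunks for embedding.
const chunkSize = 1000;
const results = [];

for (const item of $input.all()) {
  const text = item.json.markdown || '';
  for (let i = 0; i < text.length; i += chunkSize) {
    results.push({ json: { chunk: text.slice(i, i + chunkSize), sourceUrl: item.json.url } });
  }
}

return results;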
Handling Authentication and Complex Scenarios
Scraping Protected Content
When dealing with websites requiring authentication, similar to handling authentication in Puppeteer, you can pass custom headers:
{
"url": "https://protected-site.com",
"formats": ["markdown"],
"headers": {
"Authorization": "Bearer USER_TOKEN",
"Cookie": "session=abc123"
}
}
Waiting for JavaScript Content
For heavily dynamic sites, configure wait times:
{
"url": "https://spa-website.com",
"formats": ["markdown"],
"waitFor": 5000,
"timeout": 30000
}
This is particularly useful when crawling single page applications (SPAs) that load content asynchronously.
Error Handling in n8n Workflows
Implement robust error handling to ensure workflow reliability:
Using Error Trigger Nodes
- Add an Error Trigger node to catch failed executions
- Configure retry logic with exponential backoff
- Send error notifications to monitoring systems
{
"name": "Error Handler",
"type": "n8n-nodes-base.errorTrigger",
"parameters": {
"errorType": "all"
}
}
Implementing Retry Logic
Use the Set node to track retry attempts:
// In a Function node
const maxRetries = 3;
const currentRetry = $node["Set"].json.retryCount || 0;
if (currentRetry < maxRetries) {
return {
retry: true,
retryCount: currentRetry + 1
};
}
return { retry: false };
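To add the exponential backoff mentioned above, you can compute the wait time from the retry count and feed it into a Wait node via an expression such as {{ $json.waitSeconds }}. This sketch mirrors the retry-tracking snippet above and assumes the same Set node holds the retry counter.
// Function node (sketch): compute an exponential backoff delay from the retry count.
const retryCount = $node["Set"].json.retryCount || 0;
const waitSeconds = Math.min(60, Math.pow(2, retryCount + 1)); // 2s, 4s, 8s... capped at 60s

return {
  waitSeconds: waitSeconds,
  retryCount: retryCount
};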
Optimizing Performance
Batch Processing
When scraping multiple URLs, use batch processing to avoid rate limits:
// Function node to create batches of items (one incoming item per URL)
const items = $input.all();
const batchSize = 5;
const batches = [];
for (let i = 0; i < items.length; i += batchSize) {
  batches.push(items.slice(i, i + batchSize));
}
// Emit one item per batch so downstream nodes can process each group in turn
return batches.map(batch => ({ json: { urls: batch } }));
Caching Results
Implement caching to reduce API calls:
- Use n8n's Redis node to store results
- Check cache before making Firecrawl requests
- Set appropriate TTL for cached data
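One rough pattern is to build a deterministic cache key in a Code node, look it up with a Redis "Get" node, and let an IF node skip the Firecrawl request when a cached value comes back. The sketch below assumes incoming items carry a url field.
// Code node (sketch): build a deterministic cache key for each URL to scrape.
// A Redis "Get" node can look this key up before the Firecrawl request, and an
// IF node can skip the scrape when a cached value is present.
return $input.all().map(item => {
  const url = item.json.url; // assumes incoming items carry a "url" field
  return { json: { url, cacheKey: `firecrawl:${encodeURIComponent(url)}` } };
});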
Data Storage and Export Options
Storing Scraped Data
n8n offers multiple storage options:
PostgreSQL:
-- Insert scraped data
INSERT INTO articles (title, content, url, scraped_at)
VALUES ($1, $2, $3, NOW())
Google Sheets:
- Use the Google Sheets node to append rows
- Ideal for non-technical team members to access data
Airtable:
- Structured storage with relationships
- Built-in views and filtering
S3/Cloud Storage:
- Store raw HTML or markdown files
- Best for large-scale archiving
Exporting Data
Create automated export workflows:
- Schedule Node: Daily export trigger
- Database Query: Fetch day's scraped data
- Convert to CSV: Format data
- Send Email: Attach CSV or upload to cloud storage
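The CSV step can be handled with n8n's built-in spreadsheet and file nodes, or with a small Code node like this sketch. The column names (title, url, scraped_at) are assumptions about your table; match them to your actual database query.
// Code node (sketch): turn the day's rows into a single CSV string.
const rows = $input.all().map(item => item.json);
const header = 'title,url,scraped_at';
const lines = rows.map(row =>
  [row.title, row.url, row.scraped_at]
    .map(value => `"${String(value ?? '').replace(/"/g, '""')}"`)
    .join(',')
);

return [{ json: { csv: [header, ...lines].join('\n') } }];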
Monitoring and Debugging
Execution Logs
n8n provides detailed execution logs for each workflow run:
- View input/output data for each node
- Check execution time and errors
- Export logs for analysis
Setting Up Alerts
Create monitoring workflows:
Cron Trigger (every hour)
→ Check Failed Executions
→ IF failures > threshold
→ Send Alert (Slack/Email)
Performance Metrics
Track key metrics:
- Scraping success rate (see the sketch below)
- Average execution time
- API credit usage
- Error frequency
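As a rough illustration, if a preceding node (for example one that lists recent executions) provides one item per execution with a status field, a Code node can reduce that into a success rate. The status values here are assumptions; adapt them to whatever your source actually returns.
// Code node (sketch): reduce execution items into a success rate.
// Assumes each incoming item has json.status set to "success" or "error".
const executions = $input.all();
const successes = executions.filter(item => item.json.status === 'success').length;
const successRate = executions.length ? successes / executions.length : 0;

return [{ json: { total: executions.length, successes, successRate } }];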
Security Best Practices
- API Key Management: Store Firecrawl API keys in n8n credentials, never in workflow JSON
- Environment Variables: Use n8n's environment variables for sensitive data
- Access Control: Limit workflow access to authorized users
- Data Encryption: Enable encryption for stored credentials
- Audit Logs: Review execution logs regularly
Cost Optimization
Minimizing API Calls
- Deduplication: Check if URL was recently scraped before making API call
- Conditional Scraping: Only scrape when content changes detected
- Smart Scheduling: Scrape during off-peak hours when possible
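Deduplication can be handled without an external store by using n8n's workflow static data, which persists between executions of an active workflow. A sketch with a 24-hour window follows; it assumes incoming items carry a url field.
// Code node (sketch): drop URLs scraped within the last 24 hours.
// Uses n8n's workflow static data, which persists across executions of an active workflow.
const staticData = $getWorkflowStaticData('global');
staticData.seen = staticData.seen || {};

const dayMs = 24 * 60 * 60 * 1000;
const now = Date.now();

const fresh = $input.all().filter(item => {
  const url = item.json.url; // assumes incoming items carry a "url" field
  const last = staticData.seen[url];
  if (last && now - last < dayMs) return false; // recently scraped, skip
  staticData.seen[url] = now;
  return true;
});

return fresh;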
Credit Monitoring
Create a workflow to track Firecrawl API usage:
// Function node to calculate daily usage
const executions = $input.all();
const totalCredits = executions.reduce((sum, exec) => {
return sum + (exec.json.creditsUsed || 0);
}, 0);
return { json: { dailyCredits: totalCredits } };
Python Integration Example
While n8n provides a visual interface, you can also trigger n8n workflows from Python scripts:
import requests
# Trigger n8n workflow via webhook
webhook_url = "https://your-n8n-instance.com/webhook/firecrawl-scrape"
payload = {
"url": "https://example.com",
"formats": ["markdown"]
}
response = requests.post(webhook_url, json=payload)
result = response.json()
print(f"Scraping initiated: {result}")
Conclusion
Integrating Firecrawl with n8n creates a powerful automation platform for web scraping workflows. The visual interface of n8n combined with Firecrawl's robust scraping capabilities enables developers to build sophisticated data pipelines without extensive coding.
Key takeaways:
- Use HTTP Request nodes for flexible Firecrawl integration
- Implement proper error handling and retry logic
- Leverage n8n's ecosystem for data storage and notifications
- Monitor performance and optimize API usage
- Follow security best practices for credential management
Whether you're building a content aggregation system, competitive intelligence platform, or data enrichment pipeline, the Firecrawl-n8n combination provides the scalability and reliability needed for production workflows.
Start small with simple scraping tasks, then gradually build more complex workflows as you become familiar with both platforms. The investment in workflow automation pays dividends in saved development time and operational efficiency.