How do I Integrate Firecrawl with n8n for Workflow Automation?

Integrating Firecrawl with n8n allows you to build powerful, automated web scraping workflows without writing extensive code. n8n is an open-source workflow automation tool that connects various services through a visual interface, making it ideal for developers who want to orchestrate complex data extraction pipelines.

This guide will walk you through setting up Firecrawl with n8n, from basic API calls to advanced multi-step workflows.

What is Firecrawl and n8n?

Firecrawl is a web scraping API that converts websites into clean, LLM-ready markdown or structured data. It handles JavaScript rendering, bypasses anti-bot measures, and provides features like crawling entire websites and extracting specific data fields.

n8n is a fair-code workflow automation platform that lets you connect APIs, databases, and services through a node-based interface. It's perfect for automating repetitive tasks like data collection, processing, and storage.

Together, they create a powerful combination for automated web scraping workflows.

Prerequisites

Before integrating Firecrawl with n8n, you'll need:

  1. Firecrawl API Key: Sign up at firecrawl.dev to get your API key
  2. n8n Instance: Either self-hosted or using n8n Cloud
  3. Basic Understanding: Familiarity with REST APIs and workflow concepts

Setting Up n8n

Installing n8n Locally

If you don't have n8n installed, you can set it up using npm or Docker:

# Using npm
npm install -g n8n

# Start n8n
n8n start

# Using Docker
docker run -it --rm \
  --name n8n \
  -p 5678:5678 \
  -v ~/.n8n:/home/node/.n8n \
  n8nio/n8n

Once running, access n8n at http://localhost:5678.

n8n Cloud Alternative

For a managed solution, sign up at n8n.cloud to get started without installation.

Basic Firecrawl Integration with n8n

Method 1: Using HTTP Request Node

The most flexible way to integrate Firecrawl is through n8n's HTTP Request node, which gives you full control over API parameters.

Step 1: Create a New Workflow

  1. Open n8n and click "Add Workflow"
  2. Name your workflow (e.g., "Firecrawl Web Scraping")

Step 2: Add HTTP Request Node

  1. Click the "+" button to add a node
  2. Search for "HTTP Request" and select it
  3. Configure the node with these settings:

For Scraping a Single Page:

Method: POST
URL: https://api.firecrawl.dev/v1/scrape
Authentication: Header Auth
  Header Name: Authorization
  Header Value: Bearer YOUR_FIRECRAWL_API_KEY

Body Content Type: JSON
Body:
{
  "url": "https://example.com",
  "formats": ["markdown", "html"],
  "onlyMainContent": true
}

Step 3: Test the Configuration

Click "Execute Node" to test your configuration. You should receive a response containing the scraped content in markdown and HTML formats.

Method 2: Creating a Firecrawl Credentials Type

For reusability across workflows, create a custom credential:

  1. Go to Settings → Credentials → New
  2. Create a generic credential type with your Firecrawl API key
  3. Reference this credential in your HTTP Request nodes

Advanced Firecrawl Workflows in n8n

Crawling Entire Websites

To crawl multiple pages from a website, use Firecrawl's /v1/crawl endpoint:

{
  "url": "https://example.com",
  "maxDepth": 3,
  "limit": 100,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}

This initiates a crawl job. You'll need to:

  1. Store the job ID from the response
  2. Set up a polling mechanism using n8n's Wait and Loop nodes
  3. Check crawl status with GET requests to /v1/crawl/{id}
  4. Process results when the crawl completes

Here's a complete workflow structure:

Node 1: Start Crawl (HTTP Request)
  • POST to /v1/crawl
  • Store the job ID from the response

Node 2: Wait Node
  • Wait 10 seconds between status checks

Node 3: Check Status (HTTP Request)
  • GET to /v1/crawl/{{$node["Start Crawl"].json["id"]}}
  • Check whether the status is "completed"

Node 4: IF Node
  • If the status is "completed", proceed to processing
  • If the status is still "scraping", loop back to the Wait node

Node 5: Process Results
  • Extract and transform the scraped data
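
The same loop is easier to reason about in plain code before you model it with Wait and IF nodes. A standalone Node.js sketch (Node 18+; the v1 endpoints and the "completed"/"failed" status values are assumptions based on Firecrawl's v1 API):

// Standalone Node.js sketch of the crawl-and-poll loop (Node 18+ for fetch).
const apiKey = "YOUR_FIRECRAWL_API_KEY";

async function crawlAndWait(url) {
  // 1. Start the crawl job
  const startResponse = await fetch("https://api.firecrawl.dev/v1/crawl", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      url,
      maxDepth: 3,
      limit: 100,
      scrapeOptions: { formats: ["markdown"] },
    }),
  });
  const { id } = await startResponse.json();

  // 2. Poll the status endpoint every 10 seconds until the job finishes
  while (true) {
    const statusResponse = await fetch(`https://api.firecrawl.dev/v1/crawl/${id}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    const job = await statusResponse.json();

    if (job.status === "completed") return job.data; // array of scraped pages
    if (job.status === "failed") throw new Error("Crawl failed");

    await new Promise((resolve) => setTimeout(resolve, 10000));
  }
}

crawlAndWait("https://example.com").then((pages) =>
  console.log(`${pages.length} pages crawled`)
);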

Extracting Structured Data

Firecrawl can extract structured data using its /scrape endpoint with a schema:

{
  "url": "https://example.com/products",
  "formats": ["extract"],
  "extract": {
    "schema": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string" },
              "price": { "type": "number" },
              "description": { "type": "string" }
            }
          }
        }
      }
    }
  }
}

This approach is particularly useful when you need consistent data structures for downstream processing, similar to how you would handle AJAX requests using Puppeteer for dynamic content.
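
Inside n8n, a small Code node can then fan the extracted products out into one item per product for downstream nodes. A minimal sketch, assuming the v1 response nests the result under data.extract:

// n8n Code node: emit one item per extracted product.
// Assumes a Firecrawl v1 response shaped like { data: { extract: { products: [...] } } }.
const products = $input.first().json.data?.extract?.products ?? [];

return products.map((product) => ({
  json: {
    name: product.name,
    price: product.price,
    description: product.description,
  },
}));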

Real-World Workflow Examples

Example 1: Daily News Scraping

Workflow Overview:

  1. Schedule Trigger: Runs daily at 9 AM
  2. Firecrawl Scrape: Fetches news articles
  3. Filter: Removes duplicates (a sketch of this step follows the configuration below)
  4. Database: Stores articles in PostgreSQL
  5. Slack Notification: Alerts team of new articles

Configuration:

{
  "nodes": [
    {
      "name": "Daily Trigger",
      "type": "n8n-nodes-base.cron",
      "parameters": {
        "triggerTimes": {
          "hour": 9,
          "minute": 0
        }
      }
    },
    {
      "name": "Scrape News",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "https://api.firecrawl.dev/v0/scrape",
        "authentication": "headerAuth",
        "bodyParametersJson": "={\"url\": \"https://news-site.com\", \"formats\": [\"markdown\"]}"
      }
    }
  ]
}
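
The de-duplication step (step 3 above) can be a small Code node keyed on the article URL. A sketch, where the url field name is an assumption to adjust to your own data:

// n8n Code node: drop articles whose URL has already been seen in this run.
// The `url` field name is an assumption; match it to your scraped data.
const seen = new Set();
const unique = [];

for (const item of $input.all()) {
  const url = item.json.url;
  if (url && !seen.has(url)) {
    seen.add(url);
    unique.push(item);
  }
}

return unique;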

Example 2: Competitive Price Monitoring

Monitor competitor prices and get alerts when prices change:

Workflow:

  1. Webhook Trigger: Activated by external system
  2. Firecrawl Extract: Scrapes product prices with structured schema
  3. Compare Prices: Checks against database
  4. Condition Node: Triggers alert if price drops >10% (see the sketch below)
  5. Email Alert: Notifies pricing team
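
The comparison in step 4 can live in a Code node between the database lookup and the alert. A sketch, where currentPrice (the stored price) and price (the freshly scraped price) are assumed field names:

// n8n Code node: keep only items whose new price dropped more than 10%.
// `currentPrice` (from the database) and `price` (from Firecrawl) are assumed field names.
const threshold = 0.1;

return $input.all()
  .filter((item) => {
    const oldPrice = item.json.currentPrice;
    const newPrice = item.json.price;
    return oldPrice > 0 && (oldPrice - newPrice) / oldPrice > threshold;
  })
  .map((item) => ({ json: { ...item.json, alert: "price_drop" } }));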

Example 3: Content Aggregation Pipeline

Aggregate content from multiple sources for AI processing:

  1. CSV File: Contains list of URLs
  2. Split in Batches: Process 10 URLs at a time
  3. Firecrawl Scrape: Converts to markdown
  4. Data Transformation: Cleans and formats content
  5. Vector Database: Stores in Pinecone/Weaviate for RAG
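
The transformation in step 4 typically chunks the scraped markdown into records the vector-store node can upsert. A hedged sketch, where the 1,000-character chunk size and the field names are illustrative assumptions:

// n8n Code node: split scraped markdown into fixed-size chunks for a vector store.
// Chunk size and output field names are illustrative assumptions.
const chunkSize = 1000;
const records = [];

for (const item of $input.all()) {
  const text = item.json.markdown ?? "";
  const sourceUrl = item.json.url ?? "unknown";

  for (let start = 0; start < text.length; start += chunkSize) {
    records.push({
      json: {
        id: `${sourceUrl}#${start}`,
        text: text.slice(start, start + chunkSize),
        metadata: { source: sourceUrl },
      },
    });
  }
}

return records;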

Handling Authentication and Complex Scenarios

Scraping Protected Content

When dealing with websites requiring authentication, similar to handling authentication in Puppeteer, you can pass custom headers:

{
  "url": "https://protected-site.com",
  "formats": ["markdown"],
  "headers": {
    "Authorization": "Bearer USER_TOKEN",
    "Cookie": "session=abc123"
  }
}

Waiting for JavaScript Content

For heavily dynamic sites, configure wait times:

{
  "url": "https://spa-website.com",
  "formats": ["markdown"],
  "waitFor": 5000,
  "timeout": 30000
}

This is particularly useful when crawling single page applications (SPAs) that load content asynchronously.

Error Handling in n8n Workflows

Implement robust error handling to ensure workflow reliability:

Using Error Trigger Nodes

  1. Add an Error Trigger node to catch failed executions
  2. Configure retry logic with exponential backoff
  3. Send error notifications to monitoring systems

A minimal Error Trigger node configuration looks like this:

{
  "name": "Error Handler",
  "type": "n8n-nodes-base.errorTrigger",
  "parameters": {
    "errorType": "all"
  }
}

Implementing Retry Logic

Use the Set node to track retry attempts:

// In a Function node
const maxRetries = 3;
const currentRetry = $node["Set"].json.retryCount || 0;

if (currentRetry < maxRetries) {
  // Return an items array, as n8n Function/Code nodes expect
  return [{ json: { retry: true, retryCount: currentRetry + 1 } }];
}

return [{ json: { retry: false } }];
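
To get the exponential backoff mentioned earlier, the retry counter can also drive the duration of a Wait node. A sketch of the delay calculation, with an assumed base delay and cap:

// n8n Code node: compute an exponential backoff delay from the retry counter.
// Feed `waitSeconds` into a Wait node expression; base delay and cap are assumptions.
const item = $input.first().json;
const retryCount = item.retryCount ?? 0;
const baseDelaySeconds = 5;
const maxDelaySeconds = 300;

const waitSeconds = Math.min(baseDelaySeconds * 2 ** retryCount, maxDelaySeconds);

return [{ json: { ...item, waitSeconds } }];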

Optimizing Performance

Batch Processing

When scraping multiple URLs, use batch processing to avoid rate limits:

// Code node to create batches of URLs
const items = $input.all();
const batchSize = 5;
const batches = [];

for (let i = 0; i < items.length; i += batchSize) {
  // Collect the URLs for this batch (assumes each incoming item has a `url` field)
  batches.push(items.slice(i, i + batchSize).map((item) => item.json.url));
}

return batches.map((batch) => ({ json: { urls: batch } }));

Caching Results

Implement caching to reduce API calls:

  1. Use n8n's Redis node to store results
  2. Check cache before making Firecrawl requests
  3. Set appropriate TTL for cached data
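
The cache check (step 2) can be a Code node that decides whether to call Firecrawl at all. A sketch, assuming a preceding Redis "get" node named "Check Cache"; the Redis node's output field can vary by version, so adjust cachedValue accordingly:

// n8n Code node: reuse a cached result when the Redis lookup returned one.
// Assumes a prior Redis "get" node named "Check Cache"; its output field may differ by version.
const url = $input.first().json.url;
const cachedValue = $("Check Cache").first().json.value;

if (cachedValue) {
  // Cache hit: pass the stored markdown through and skip scraping downstream.
  return [{ json: { url, markdown: cachedValue, fromCache: true } }];
}

// Cache miss: forward the URL so the Firecrawl HTTP Request node scrapes it.
return [{ json: { url, fromCache: false } }];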

Data Storage and Export Options

Storing Scraped Data

n8n offers multiple storage options:

PostgreSQL:

-- Insert scraped data
INSERT INTO articles (title, content, url, scraped_at)
VALUES ($1, $2, $3, NOW())

Google Sheets:

  • Use the Google Sheets node to append rows
  • Ideal for non-technical team members to access data

Airtable:

  • Structured storage with relationships
  • Built-in views and filtering

S3/Cloud Storage:

  • Store raw HTML or markdown files
  • Best for large-scale archiving

Exporting Data

Create automated export workflows:

  1. Schedule Node: Daily export trigger
  2. Database Query: Fetch day's scraped data
  3. Convert to CSV: Format data
  4. Send Email: Attach CSV or upload to cloud storage
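
Step 3 can use n8n's spreadsheet/file nodes, or a short Code node that builds the CSV string itself. A minimal sketch, with assumed column names:

// n8n Code node: build a CSV string from the day's scraped rows.
// Column names are assumptions; values are quoted to survive embedded commas.
const columns = ["title", "url", "scraped_at"];
const escape = (value) => `"${String(value ?? "").replace(/"/g, '""')}"`;

const rows = $input.all().map((item) =>
  columns.map((column) => escape(item.json[column])).join(",")
);

const csv = [columns.join(","), ...rows].join("\n");

return [{ json: { csv } }];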

Monitoring and Debugging

Execution Logs

n8n provides detailed execution logs for each workflow run:

  • View input/output data for each node
  • Check execution time and errors
  • Export logs for analysis

Setting Up Alerts

Create monitoring workflows:

Cron Trigger (every hour)
  → Check Failed Executions
    → IF failures > threshold
      → Send Alert (Slack/Email)

Performance Metrics

Track key metrics:

  • Scraping success rate
  • Average execution time
  • API credit usage
  • Error frequency

Security Best Practices

  1. API Key Management: Store Firecrawl API keys in n8n credentials, never in workflow JSON
  2. Environment Variables: Use n8n's environment variables for sensitive data
  3. Access Control: Limit workflow access to authorized users
  4. Data Encryption: Enable encryption for stored credentials
  5. Audit Logs: Review execution logs regularly

Cost Optimization

Minimizing API Calls

  1. Deduplication: Check if URL was recently scraped before making API call
  2. Conditional Scraping: Only scrape when content changes detected
  3. Smart Scheduling: Scrape during off-peak hours when possible
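
Conditional scraping (point 2) usually means hashing the newly fetched content and only continuing when the hash differs from the last stored one. A sketch, assuming self-hosted n8n with the built-in crypto module allowed and assumed markdown / lastHash field names:

// n8n Code node: pass items through only when the content hash changed.
// Requires the built-in crypto module to be allowed (NODE_FUNCTION_ALLOW_BUILTIN=crypto).
// `markdown` (fresh content) and `lastHash` (previously stored hash) are assumed fields.
const crypto = require("crypto");

const changed = $input.all().filter((item) => {
  const newHash = crypto
    .createHash("sha256")
    .update(item.json.markdown ?? "")
    .digest("hex");
  item.json.contentHash = newHash; // keep for the next comparison
  return newHash !== item.json.lastHash;
});

return changed;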

Credit Monitoring

Create a workflow to track Firecrawl API usage:

// Code node to calculate daily usage
const executions = $input.all();
const totalCredits = executions.reduce((sum, exec) => {
  return sum + (exec.json.creditsUsed || 0);
}, 0);

return [{ json: { dailyCredits: totalCredits } }];

Python Integration Example

While n8n provides a visual interface, you can also trigger n8n workflows from Python scripts:

import requests

# Trigger n8n workflow via webhook
webhook_url = "https://your-n8n-instance.com/webhook/firecrawl-scrape"

payload = {
    "url": "https://example.com",
    "formats": ["markdown"]
}

response = requests.post(webhook_url, json=payload)
result = response.json()

print(f"Scraping initiated: {result}")

Conclusion

Integrating Firecrawl with n8n creates a powerful automation platform for web scraping workflows. The visual interface of n8n combined with Firecrawl's robust scraping capabilities enables developers to build sophisticated data pipelines without extensive coding.

Key takeaways:

  • Use HTTP Request nodes for flexible Firecrawl integration
  • Implement proper error handling and retry logic
  • Leverage n8n's ecosystem for data storage and notifications
  • Monitor performance and optimize API usage
  • Follow security best practices for credential management

Whether you're building a content aggregation system, competitive intelligence platform, or data enrichment pipeline, the Firecrawl-n8n combination provides the scalability and reliability needed for production workflows.

Start small with simple scraping tasks, then gradually build more complex workflows as you become familiar with both platforms. The investment in workflow automation pays dividends in saved development time and operational efficiency.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
