What is the Firecrawl API reference documentation?
The Firecrawl API reference documentation is a comprehensive guide that outlines all available endpoints, parameters, request/response formats, and authentication methods for the Firecrawl web scraping and crawling service. It serves as the primary technical resource for developers integrating Firecrawl into their applications, providing detailed specifications for scraping web pages, crawling entire websites, and extracting structured data.
Overview of Firecrawl API Documentation
Firecrawl provides a RESTful API that allows developers to programmatically scrape and crawl websites, including pages that require JavaScript rendering. The API documentation is hosted at docs.firecrawl.dev and covers the following core components:
- Authentication: API key-based authentication system
- Endpoints: the /scrape, /crawl, /map, and /extract endpoints
- Request formats: JSON-based request bodies with various configuration options
- Response structures: Standardized JSON responses with metadata and content
- Rate limits: Usage quotas and throttling policies
- Error handling: HTTP status codes and error messages
Core API Endpoints
Scrape Endpoint
The /scrape endpoint is used to extract content from a single web page. It returns the page content in various formats, including HTML, Markdown, and structured data.
Python Example:
```python
import requests

API_KEY = 'your_api_key_here'
url = 'https://api.firecrawl.dev/v1/scrape'

headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}

payload = {
    'url': 'https://example.com',
    'formats': ['markdown', 'html'],  # output formats to return
    'onlyMainContent': True,          # strip navigation, footers, and other boilerplate
    'waitFor': 1000                   # wait 1 second for JavaScript to load
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()
print(data['data']['markdown'])
```
JavaScript Example:
```javascript
const FIRECRAWL_API_KEY = 'your_api_key_here';

async function scrapePage(targetUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: targetUrl,
      formats: ['markdown', 'html'],
      onlyMainContent: true,
      includeTags: ['article', 'main'],
      excludeTags: ['nav', 'footer']
    })
  });

  const data = await response.json();
  return data.data;
}

scrapePage('https://example.com')
  .then(result => console.log(result.markdown))
  .catch(error => console.error('Error:', error));
```
Crawl Endpoint
The /crawl endpoint crawls multiple pages from a website, following links and extracting content according to the rules you specify. It is particularly useful when you need to scrape entire sections of a site, including pages whose content loads via AJAX.
Python Example:
```python
import requests
import time

API_KEY = 'your_api_key_here'

def start_crawl(start_url, max_pages=10):
    url = 'https://api.firecrawl.dev/v1/crawl'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }
    payload = {
        'url': start_url,
        'limit': max_pages,
        'scrapeOptions': {
            'formats': ['markdown'],
            'onlyMainContent': True
        },
        'crawlerOptions': {
            'maxDepth': 3,
            'allowBackwardLinks': False,
            'allowExternalLinks': False
        }
    }
    response = requests.post(url, json=payload, headers=headers)
    result = response.json()
    return result['id']

def check_crawl_status(crawl_id):
    url = f'https://api.firecrawl.dev/v1/crawl/{crawl_id}'
    headers = {
        'Authorization': f'Bearer {API_KEY}'
    }
    response = requests.get(url, headers=headers)
    return response.json()

# Start a crawl job
crawl_id = start_crawl('https://example.com', max_pages=50)
print(f'Crawl started with ID: {crawl_id}')

# Poll for completion
while True:
    status = check_crawl_status(crawl_id)
    if status['status'] == 'completed':
        print(f'Crawl completed! Found {len(status["data"])} pages')
        for page in status['data']:
            print(f'URL: {page["metadata"]["sourceURL"]}')
        break
    elif status['status'] == 'failed':
        print('Crawl failed:', status.get('error'))
        break
    time.sleep(5)
```
Map Endpoint
The /map endpoint discovers all accessible URLs on a website without scraping their content, which is useful for understanding a site's structure before crawling.
JavaScript Example:
```javascript
async function mapWebsite(siteUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/map', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: siteUrl,
      search: 'documentation', // Optional: filter URLs
      ignoreSitemap: false,
      includeSubdomains: false,
      limit: 1000
    })
  });

  const data = await response.json();
  return data.links;
}

mapWebsite('https://docs.example.com')
  .then(links => {
    console.log(`Found ${links.length} URLs`);
    links.forEach(link => console.log(link));
  });
```
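A common follow-up is to feed the URLs returned by /map into /scrape so you only fetch the pages you care about. Below is a minimal Python sketch of that map-then-scrape workflow; the keyword filter and the cap on the number of pages are illustrative choices, not part of the API.

```python
import os
import requests

API_KEY = os.environ.get('FIRECRAWL_API_KEY', 'your_api_key_here')
HEADERS = {'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}

def map_then_scrape(site_url, keyword, max_pages=5):
    # Discover URLs on the site (no page content is fetched at this stage)
    mapped = requests.post('https://api.firecrawl.dev/v1/map',
                           json={'url': site_url, 'limit': 1000},
                           headers=HEADERS).json()

    # Keep only URLs containing the keyword -- a simple local filter
    targets = [u for u in mapped.get('links', []) if keyword in u][:max_pages]

    pages = []
    for target in targets:
        scraped = requests.post('https://api.firecrawl.dev/v1/scrape',
                                json={'url': target, 'formats': ['markdown']},
                                headers=HEADERS).json()
        pages.append(scraped.get('data', {}))
    return pages

for page in map_then_scrape('https://docs.example.com', 'guide'):
    print(page.get('metadata', {}).get('sourceURL'))
```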
Extract Endpoint
The /extract endpoint uses AI to extract structured data from web pages based on a JSON schema you define, letting you capture specific fields (such as product names or prices) rather than the full page content.
Python Example:
```python
import requests

API_KEY = 'your_api_key_here'

def extract_structured_data(url, schema):
    api_url = 'https://api.firecrawl.dev/v1/extract'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }
    payload = {
        'url': url,
        'schema': schema
    }
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()

# Define a schema for product data
product_schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'price': {'type': 'number'},
        'currency': {'type': 'string'},
        'description': {'type': 'string'},
        'availability': {'type': 'string'},
        'rating': {'type': 'number'},
        'reviewCount': {'type': 'integer'}
    },
    'required': ['name', 'price']
}

result = extract_structured_data(
    'https://example.com/product/12345',
    product_schema
)
print(result['data'])
```
Authentication
Firecrawl uses Bearer token authentication. You need to include your API key in the Authorization header of every request:
```bash
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
```
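To avoid hard-coding the key in source files, you can read it from an environment variable before building the header. A small Python sketch, assuming the key has been exported as FIRECRAWL_API_KEY (the variable name is a convention, not something the API requires):

```python
import os
import requests

# Read the key from the environment rather than embedding it in source code
api_key = os.environ['FIRECRAWL_API_KEY']

response = requests.post(
    'https://api.firecrawl.dev/v1/scrape',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={'url': 'https://example.com', 'formats': ['markdown']}
)
print(response.json()['data']['markdown'])
```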
Advanced Configuration Options
JavaScript Rendering Options
Firecrawl supports advanced JavaScript rendering configurations, which are particularly useful when working with single-page applications:
```python
scrape_options = {
    'url': 'https://spa-example.com',
    'formats': ['markdown'],
    'waitFor': 5000,      # Wait 5 seconds for JS to load
    'screenshot': True,   # Capture screenshot
    'headers': {
        'User-Agent': 'Custom User Agent'
    },
    'actions': [
        {
            'type': 'click',
            'selector': '#load-more-button'
        },
        {
            'type': 'wait',
            'milliseconds': 2000
        },
        {
            'type': 'scroll',
            'direction': 'down'
        }
    ]
}
```
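These options are sent as the body of an ordinary /scrape request. A short usage sketch that reuses the scrape_options dictionary defined above (the placeholder API key is an assumption, as in the earlier examples):

```python
import requests

API_KEY = 'your_api_key_here'
headers = {'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}

# Post the rendering options above as the request body of a normal /scrape call
response = requests.post('https://api.firecrawl.dev/v1/scrape',
                         json=scrape_options, headers=headers)
print(response.json()['data']['markdown'])
```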
Proxy Configuration
```javascript
const scrapeConfig = {
  url: 'https://example.com',
  formats: ['markdown'],
  location: {
    country: 'US',          // route the request through a US-based location
    languages: ['en-US']    // preferred languages for the request
  },
  mobile: false,            // emulate a desktop rather than a mobile browser
  skipTlsVerification: false,
  timeout: 30000            // abort the request after 30 seconds
};
```
Response Structure
Standard Firecrawl API responses follow this structure:
```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "error": null
    },
    "links": ["https://example.com/page1", "https://example.com/page2"]
  }
}
```
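In practice it is worth checking the success flag before reading data, since failed requests still return JSON. A short Python sketch of that defensive handling, using only the fields shown above:

```python
import requests

def get_markdown(api_key, target_url):
    response = requests.post(
        'https://api.firecrawl.dev/v1/scrape',
        headers={'Authorization': f'Bearer {api_key}',
                 'Content-Type': 'application/json'},
        json={'url': target_url, 'formats': ['markdown']}
    )
    body = response.json()
    if not body.get('success'):
        # Failed requests still return JSON; surface the error instead of crashing
        raise RuntimeError(f'Scrape failed: {body}')

    data = body['data']
    print('Title:', data['metadata'].get('title'))
    print('Source:', data['metadata'].get('sourceURL'))
    return data.get('markdown', '')
```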
Error Handling
Firecrawl uses standard HTTP status codes and provides detailed error messages:
```python
import requests

API_KEY = 'your_api_key_here'

def scrape_with_error_handling(url):
    try:
        response = requests.post(
            'https://api.firecrawl.dev/v1/scrape',
            headers={
                'Authorization': f'Bearer {API_KEY}',
                'Content-Type': 'application/json'
            },
            json={'url': url}
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        status_code = e.response.status_code
        if status_code == 401:
            print('Authentication failed: Invalid API key')
        elif status_code == 402:
            print('Payment required: Insufficient credits')
        elif status_code == 429:
            print('Rate limit exceeded: Too many requests')
        elif status_code == 500:
            print('Server error: Try again later')
        else:
            print(f'HTTP error occurred: {e}')
    except Exception as e:
        print(f'An error occurred: {e}')
```
Rate Limits and Quotas
Firecrawl implements rate limiting based on your subscription tier:
- Free tier: 500 requests/month
- Starter tier: 10,000 requests/month, 10 requests/second
- Growth tier: 100,000 requests/month, 50 requests/second
- Enterprise tier: Custom limits
Monitor your usage through response headers:
```python
response = requests.post(url, json=payload, headers=headers)
print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
```
SDK Support
Firecrawl provides official SDKs for multiple languages:
Python SDK Installation
```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key='your_api_key')

# Scrape a single page
scrape_result = app.scrape_url('https://example.com',
                               params={'formats': ['markdown']})

# Crawl a website
crawl_result = app.crawl_url('https://example.com',
                             params={'limit': 100})
```
Node.js SDK Installation
```bash
npm install @mendable/firecrawl-js
```

```javascript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: 'your_api_key' });

// Scrape a page
const scrapeResult = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'html']
});

// Crawl a website
const crawlResult = await app.crawlUrl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown']
  }
});
```
Best Practices
- Use appropriate formats: Request only the formats you need (markdown, html, etc.) to reduce response size
- Implement retry logic: Handle rate limits and temporary failures with exponential backoff
- Set reasonable timeouts: Configure timeout values based on expected page load times
- Filter content effectively: Use onlyMainContent, includeTags, and excludeTags to reduce noise
- Monitor usage: Track your API usage to avoid unexpected quota exhaustion
- Cache responses: Store results locally when re-scraping the same URLs
- Use webhooks for crawls: For large crawl jobs, use webhook callbacks instead of polling (see the sketch below)
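For the last point, the idea is to pass a callback URL when starting the crawl so that Firecrawl pushes progress and completion events to your server rather than you polling the status endpoint. A minimal Python sketch, assuming the crawl request accepts a webhook field pointing at your endpoint; the receiver URL is illustrative, and the exact parameter shape should be confirmed in the API reference:

```python
import requests

API_KEY = 'your_api_key_here'

# Start a crawl and ask Firecrawl to notify us instead of being polled.
# The 'webhook' field is assumed here based on the docs' webhook recommendation.
payload = {
    'url': 'https://example.com',
    'limit': 500,
    'scrapeOptions': {'formats': ['markdown']},
    'webhook': 'https://your-server.example.com/firecrawl-events'
}

response = requests.post(
    'https://api.firecrawl.dev/v1/crawl',
    json=payload,
    headers={'Authorization': f'Bearer {API_KEY}',
             'Content-Type': 'application/json'}
)
print('Crawl job started:', response.json().get('id'))
```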
Conclusion
The Firecrawl API reference documentation provides a complete technical specification for integrating web scraping and crawling capabilities into your applications. By understanding the available endpoints, configuration options, and best practices, you can efficiently extract data from websites at scale. Whether you're scraping single pages, crawling entire sites, or extracting structured data with AI, the API is accessible through simple HTTP requests or through official SDKs in your preferred programming language.
For developers needing even more control over browser automation, consider exploring how to programmatically interact with web pages using headless browser tools alongside Firecrawl's API capabilities.