What is the Firecrawl API reference documentation?

The Firecrawl API reference documentation is a comprehensive guide that outlines all available endpoints, parameters, request/response formats, and authentication methods for the Firecrawl web scraping and crawling service. It serves as the primary technical resource for developers integrating Firecrawl into their applications, providing detailed specifications for scraping web pages, crawling entire websites, and extracting structured data.

Overview of Firecrawl API Documentation

Firecrawl provides a RESTful API that allows developers to programmatically scrape and crawl websites with JavaScript rendering capabilities. The API documentation is typically hosted at docs.firecrawl.dev and covers the following core components:

  • Authentication: API key-based authentication system
  • Endpoints: /scrape, /crawl, /map, and /extract endpoints
  • Request formats: JSON-based request bodies with various configuration options
  • Response structures: Standardized JSON responses with metadata and content
  • Rate limits: Usage quotas and throttling policies
  • Error handling: HTTP status codes and error messages

Core API Endpoints

Scrape Endpoint

The /scrape endpoint is used to extract content from a single web page. It returns the page content in various formats including HTML, Markdown, and structured data.

Python Example:

import requests

API_KEY = 'your_api_key_here'
url = 'https://api.firecrawl.dev/v1/scrape'

headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}

payload = {
    'url': 'https://example.com',
    'formats': ['markdown', 'html'],
    'onlyMainContent': True,
    'waitFor': 1000
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()

print(data['data']['markdown'])

JavaScript Example:

const FIRECRAWL_API_KEY = 'your_api_key_here';

async function scrapePage(targetUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: targetUrl,
      formats: ['markdown', 'html'],
      onlyMainContent: true,
      includeTags: ['article', 'main'],
      excludeTags: ['nav', 'footer']
    })
  });

  const data = await response.json();
  return data.data;
}

scrapePage('https://example.com')
  .then(result => console.log(result.markdown))
  .catch(error => console.error('Error:', error));

Crawl Endpoint

The /crawl endpoint enables crawling multiple pages from a website, following links and extracting content according to specified rules. This is particularly useful for scraping entire site sections, such as documentation sites or blogs, where the content you need is spread across many linked pages.

Python Example:

import requests
import time

API_KEY = 'your_api_key_here'

def start_crawl(start_url, max_pages=10):
    url = 'https://api.firecrawl.dev/v1/crawl'

    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }

    payload = {
        'url': start_url,
        'limit': max_pages,
        'maxDepth': 3,
        'allowBackwardLinks': False,
        'allowExternalLinks': False,
        'scrapeOptions': {
            'formats': ['markdown'],
            'onlyMainContent': True
        }
    }

    response = requests.post(url, json=payload, headers=headers)
    result = response.json()

    return result['id']

def check_crawl_status(crawl_id):
    url = f'https://api.firecrawl.dev/v1/crawl/{crawl_id}'

    headers = {
        'Authorization': f'Bearer {API_KEY}'
    }

    response = requests.get(url, headers=headers)
    return response.json()

# Start a crawl job
crawl_id = start_crawl('https://example.com', max_pages=50)
print(f'Crawl started with ID: {crawl_id}')

# Poll for completion
while True:
    status = check_crawl_status(crawl_id)

    if status['status'] == 'completed':
        print(f'Crawl completed! Found {len(status["data"])} pages')
        for page in status['data']:
            print(f'URL: {page["metadata"]["sourceURL"]}')
        break
    elif status['status'] == 'failed':
        print('Crawl failed:', status.get('error'))
        break

    time.sleep(5)
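
For long crawl jobs, polling like this works but wastes requests. The crawl request also accepts a webhook parameter so Firecrawl can notify your server as the job progresses instead. A minimal sketch, assuming the receiving endpoint on your-server.example.com is something you host yourself:

import requests

API_KEY = 'your_api_key_here'

payload = {
    'url': 'https://example.com',
    'limit': 50,
    'scrapeOptions': {'formats': ['markdown']},
    # Firecrawl will POST crawl events to this URL, so no polling loop is needed
    'webhook': 'https://your-server.example.com/firecrawl-webhook'
}

response = requests.post(
    'https://api.firecrawl.dev/v1/crawl',
    json=payload,
    headers={'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}
)
print('Crawl started:', response.json()['id'])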

Map Endpoint

The /map endpoint discovers all accessible URLs on a website without scraping their content, useful for understanding site structure before crawling.

JavaScript Example:

async function mapWebsite(siteUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/map', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: siteUrl,
      search: 'documentation', // Optional: filter URLs
      ignoreSitemap: false,
      includeSubdomains: false,
      limit: 1000
    })
  });

  const data = await response.json();
  return data.links;
}

mapWebsite('https://docs.example.com')
  .then(links => {
    console.log(`Found ${links.length} URLs`);
    links.forEach(link => console.log(link));
  });

Extract Endpoint

The /extract endpoint uses AI to extract structured data from web pages based on a schema you define, which is useful when you need specific fields such as product names and prices rather than full page content.

Python Example:

import requests

API_KEY = 'your_api_key_here'

def extract_structured_data(url, schema):
    api_url = 'https://api.firecrawl.dev/v1/extract'

    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }

    payload = {
        'url': url,
        'schema': schema
    }

    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()

# Define a schema for product data
product_schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'price': {'type': 'number'},
        'currency': {'type': 'string'},
        'description': {'type': 'string'},
        'availability': {'type': 'string'},
        'rating': {'type': 'number'},
        'reviewCount': {'type': 'integer'}
    },
    'required': ['name', 'price']
}

result = extract_structured_data(
    'https://example.com/product/12345',
    product_schema
)

print(result['data'])

Authentication

Firecrawl uses Bearer token authentication. You need to include your API key in the Authorization header of every request:

curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
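
In application code you would typically read the key from an environment variable rather than hard-coding it. A minimal Python sketch of that pattern (the FIRECRAWL_API_KEY variable name is just a convention, not something the API requires):

import os

import requests

# Read the key from the environment so it never ends up in source control
API_KEY = os.environ['FIRECRAWL_API_KEY']

headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}

response = requests.post(
    'https://api.firecrawl.dev/v1/scrape',
    json={'url': 'https://example.com', 'formats': ['markdown']},
    headers=headers
)
print(response.status_code)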

Advanced Configuration Options

JavaScript Rendering Options

Firecrawl supports advanced JavaScript rendering configurations, which are particularly useful when scraping single-page applications and other JavaScript-heavy sites:

scrape_options = {
    'url': 'https://spa-example.com',
    'formats': ['markdown'],
    'waitFor': 5000,  # Wait 5 seconds for JS to load
    'screenshot': True,  # Capture screenshot
    'headers': {
        'User-Agent': 'Custom User Agent'
    },
    'actions': [
        {
            'type': 'click',
            'selector': '#load-more-button'
        },
        {
            'type': 'wait',
            'milliseconds': 2000
        },
        {
            'type': 'scroll',
            'direction': 'down'
        }
    ]
}

Location and Proxy Options

The location setting routes the request through a proxy in the specified country when one is available and emulates the matching language settings; the remaining fields control device emulation, TLS verification, and the request timeout:

const scrapeConfig = {
  url: 'https://example.com',
  formats: ['markdown'],
  location: {
    country: 'US',
    languages: ['en-US']
  },
  mobile: false,
  skipTlsVerification: false,
  timeout: 30000
};

Response Structure

Standard Firecrawl API responses follow this structure:

{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "error": null
    },
    "links": ["https://example.com/page1", "https://example.com/page2"]
  }
}
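
Since every response carries a top-level success flag, it is worth checking it before reading the content. A short Python sketch of that pattern, using the scrape endpoint from earlier:

import requests

API_KEY = 'your_api_key_here'

response = requests.post(
    'https://api.firecrawl.dev/v1/scrape',
    headers={'Authorization': f'Bearer {API_KEY}'},
    json={'url': 'https://example.com', 'formats': ['markdown']}
)
data = response.json()

# success reports whether the scrape worked; metadata.statusCode is the
# HTTP status of the scraped page itself.
if data.get('success'):
    page = data['data']
    print(page['metadata']['title'])
    print(page['markdown'][:200])
else:
    print('Scrape failed:', data.get('error'))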

Error Handling

Firecrawl uses standard HTTP status codes and provides detailed error messages:

import requests

API_KEY = 'your_api_key_here'

def scrape_with_error_handling(url):
    try:
        response = requests.post(
            'https://api.firecrawl.dev/v1/scrape',
            headers={
                'Authorization': f'Bearer {API_KEY}',
                'Content-Type': 'application/json'
            },
            json={'url': url}
        )

        response.raise_for_status()
        return response.json()

    except requests.exceptions.HTTPError as e:
        if response.status_code == 401:
            print('Authentication failed: Invalid API key')
        elif response.status_code == 402:
            print('Payment required: Insufficient credits')
        elif response.status_code == 429:
            print('Rate limit exceeded: Too many requests')
        elif response.status_code == 500:
            print('Server error: Try again later')
        else:
            print(f'HTTP error occurred: {e}')
    except Exception as e:
        print(f'An error occurred: {e}')

Rate Limits and Quotas

Firecrawl implements rate limiting based on your subscription tier; the figures below are indicative, so check the current pricing page for exact limits:

  • Free tier: 500 requests/month
  • Starter tier: 10,000 requests/month, 10 requests/second
  • Growth tier: 100,000 requests/month, 50 requests/second
  • Enterprise tier: Custom limits

Monitor your usage through response headers:

response = requests.post(url, json=payload, headers=headers)

print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")

SDK Support

Firecrawl provides official SDKs for multiple languages:

Python SDK Installation

pip install firecrawl-py

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key='your_api_key')

# Scrape a single page
scrape_result = app.scrape_url('https://example.com',
                                params={'formats': ['markdown']})

# Crawl a website
crawl_result = app.crawl_url('https://example.com',
                              params={'limit': 100})

Node.js SDK Installation

npm install @mendable/firecrawl-js

import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({apiKey: 'your_api_key'});

// Scrape a page
const scrapeResult = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'html']
});

// Crawl a website
const crawlResult = await app.crawlUrl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown']
  }
});

Best Practices

  1. Use appropriate formats: Request only the formats you need (markdown, html, etc.) to reduce response size
  2. Implement retry logic: Handle rate limits and temporary failures with exponential backoff (see the sketch after this list)
  3. Set reasonable timeouts: Configure timeout values based on expected page load times
  4. Filter content effectively: Use onlyMainContent, includeTags, and excludeTags to reduce noise
  5. Monitor usage: Track your API usage to avoid unexpected quota exhaustion
  6. Cache responses: Store results locally when re-scraping the same URLs
  7. Use webhooks for crawls: For large crawl jobs, use webhook callbacks instead of polling
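
As a minimal sketch of the retry advice above, the helper below retries a scrape request with exponential backoff when it hits a 429 or a transient server error. The backoff parameters and the scrape_with_backoff name are illustrative choices, not part of the Firecrawl API:

import time

import requests

API_KEY = 'your_api_key_here'

def scrape_with_backoff(page_url, max_retries=5):
    # Retry a scrape request, doubling the wait after each retryable failure.
    delay = 1  # initial delay in seconds

    for attempt in range(max_retries):
        response = requests.post(
            'https://api.firecrawl.dev/v1/scrape',
            headers={
                'Authorization': f'Bearer {API_KEY}',
                'Content-Type': 'application/json'
            },
            json={'url': page_url, 'formats': ['markdown']},
            timeout=60
        )

        # Retry on rate limiting or temporary server errors
        if response.status_code == 429 or response.status_code >= 500:
            time.sleep(delay)
            delay *= 2
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError(f'Giving up after {max_retries} attempts for {page_url}')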

Conclusion

The Firecrawl API reference documentation provides a complete technical specification for integrating web scraping and crawling capabilities into your applications. By understanding the available endpoints, configuration options, and best practices, you can efficiently extract data from websites at scale. Whether you're scraping single pages, crawling entire sites, or extracting structured data with AI, Firecrawl's well-documented API makes it accessible through simple HTTP requests or official SDKs in your preferred programming language.

For developers needing even more control over browser automation, consider exploring how to programmatically interact with web pages using headless browser tools alongside Firecrawl's API capabilities.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
