What is the Firecrawl API reference documentation?
The Firecrawl API reference documentation is a comprehensive guide that outlines all available endpoints, parameters, request/response formats, and authentication methods for the Firecrawl web scraping and crawling service. It serves as the primary technical resource for developers integrating Firecrawl into their applications, providing detailed specifications for scraping web pages, crawling entire websites, and extracting structured data.
Overview of Firecrawl API Documentation
Firecrawl provides a RESTful API that allows developers to programmatically scrape and crawl websites, including pages that require JavaScript rendering. The API documentation is hosted at docs.firecrawl.dev and covers the following core components:
- Authentication: API key-based authentication system
- Endpoints: the /scrape, /crawl, /map, and /extract endpoints
- Request formats: JSON-based request bodies with various configuration options
- Response structures: Standardized JSON responses with metadata and content
- Rate limits: Usage quotas and throttling policies
- Error handling: HTTP status codes and error messages
Core API Endpoints
Scrape Endpoint
The /scrape endpoint is used to extract content from a single web page. It returns the page content in various formats, including HTML, Markdown, and structured data.
Python Example:
```python
import requests

API_KEY = 'your_api_key_here'
url = 'https://api.firecrawl.dev/v1/scrape'

headers = {
    'Authorization': f'Bearer {API_KEY}',
    'Content-Type': 'application/json'
}

payload = {
    'url': 'https://example.com',
    'formats': ['markdown', 'html'],  # output formats to return
    'onlyMainContent': True,          # strip navigation, footers, and other boilerplate
    'waitFor': 1000                   # wait 1 second for JavaScript to load
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()
print(data['data']['markdown'])
```
JavaScript Example:
```javascript
const FIRECRAWL_API_KEY = 'your_api_key_here';

async function scrapePage(targetUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: targetUrl,
      formats: ['markdown', 'html'],
      onlyMainContent: true,
      includeTags: ['article', 'main'],
      excludeTags: ['nav', 'footer']
    })
  });

  const data = await response.json();
  return data.data;
}

scrapePage('https://example.com')
  .then(result => console.log(result.markdown))
  .catch(error => console.error('Error:', error));
```
Crawl Endpoint
The /crawl endpoint crawls multiple pages from a website, following links and extracting content according to the rules you specify. It is particularly useful when you need to scrape entire sections of a site, including pages whose content loads via AJAX.
Python Example:
```python
import requests
import time

API_KEY = 'your_api_key_here'

def start_crawl(start_url, max_pages=10):
    url = 'https://api.firecrawl.dev/v1/crawl'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }
    payload = {
        'url': start_url,
        'limit': max_pages,
        'scrapeOptions': {
            'formats': ['markdown'],
            'onlyMainContent': True
        },
        'crawlerOptions': {
            'maxDepth': 3,
            'allowBackwardLinks': False,
            'allowExternalLinks': False
        }
    }
    response = requests.post(url, json=payload, headers=headers)
    result = response.json()
    return result['id']

def check_crawl_status(crawl_id):
    url = f'https://api.firecrawl.dev/v1/crawl/{crawl_id}'
    headers = {
        'Authorization': f'Bearer {API_KEY}'
    }
    response = requests.get(url, headers=headers)
    return response.json()

# Start a crawl job
crawl_id = start_crawl('https://example.com', max_pages=50)
print(f'Crawl started with ID: {crawl_id}')

# Poll for completion
while True:
    status = check_crawl_status(crawl_id)
    if status['status'] == 'completed':
        print(f'Crawl completed! Found {len(status["data"])} pages')
        for page in status['data']:
            print(f'URL: {page["metadata"]["sourceURL"]}')
        break
    elif status['status'] == 'failed':
        print('Crawl failed:', status.get('error'))
        break
    time.sleep(5)
```
Map Endpoint
The /map endpoint discovers all accessible URLs on a website without scraping their content, which is useful for understanding a site's structure before crawling.
JavaScript Example:
```javascript
async function mapWebsite(siteUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/map', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: siteUrl,
      search: 'documentation', // Optional: filter URLs
      ignoreSitemap: false,
      includeSubdomains: false,
      limit: 1000
    })
  });

  const data = await response.json();
  return data.links;
}

mapWebsite('https://docs.example.com')
  .then(links => {
    console.log(`Found ${links.length} URLs`);
    links.forEach(link => console.log(link));
  });
```
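A common follow-up is to feed the URLs returned by /map into /scrape so you only fetch the pages you care about. Below is a minimal Python sketch of that map-then-scrape workflow; the keyword filter and the cap on the number of pages are illustrative choices, not part of the API.

```python
import os
import requests

API_KEY = os.environ.get('FIRECRAWL_API_KEY', 'your_api_key_here')
HEADERS = {'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}

def map_then_scrape(site_url, keyword, max_pages=5):
    # Discover URLs on the site (no page content is fetched at this stage)
    mapped = requests.post('https://api.firecrawl.dev/v1/map',
                           json={'url': site_url, 'limit': 1000},
                           headers=HEADERS).json()

    # Keep only URLs containing the keyword -- a simple local filter
    targets = [u for u in mapped.get('links', []) if keyword in u][:max_pages]

    pages = []
    for target in targets:
        scraped = requests.post('https://api.firecrawl.dev/v1/scrape',
                                json={'url': target, 'formats': ['markdown']},
                                headers=HEADERS).json()
        pages.append(scraped.get('data', {}))
    return pages

for page in map_then_scrape('https://docs.example.com', 'guide'):
    print(page.get('metadata', {}).get('sourceURL'))
```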
Extract Endpoint
The /extract endpoint uses AI to extract structured data from web pages based on a JSON schema you define, letting you capture specific fields (such as product names or prices) rather than the full page content.
Python Example:
```python
import requests

API_KEY = 'your_api_key_here'

def extract_structured_data(url, schema):
    api_url = 'https://api.firecrawl.dev/v1/extract'
    headers = {
        'Authorization': f'Bearer {API_KEY}',
        'Content-Type': 'application/json'
    }
    payload = {
        'url': url,
        'schema': schema
    }
    response = requests.post(api_url, json=payload, headers=headers)
    return response.json()

# Define a schema for product data
product_schema = {
    'type': 'object',
    'properties': {
        'name': {'type': 'string'},
        'price': {'type': 'number'},
        'currency': {'type': 'string'},
        'description': {'type': 'string'},
        'availability': {'type': 'string'},
        'rating': {'type': 'number'},
        'reviewCount': {'type': 'integer'}
    },
    'required': ['name', 'price']
}

result = extract_structured_data(
    'https://example.com/product/12345',
    product_schema
)
print(result['data'])
```
Authentication
Firecrawl uses Bearer token authentication. You need to include your API key in the Authorization header of every request:
```bash
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
```
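To avoid hard-coding the key in source files, you can read it from an environment variable before building the header. A small Python sketch, assuming the key has been exported as FIRECRAWL_API_KEY (the variable name is a convention, not something the API requires):

```python
import os
import requests

# Read the key from the environment rather than embedding it in source code
api_key = os.environ['FIRECRAWL_API_KEY']

response = requests.post(
    'https://api.firecrawl.dev/v1/scrape',
    headers={
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    },
    json={'url': 'https://example.com', 'formats': ['markdown']}
)
print(response.json()['data']['markdown'])
```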
Advanced Configuration Options
JavaScript Rendering Options
Firecrawl supports advanced JavaScript rendering configurations, which are particularly useful when working with single-page applications:
```python
scrape_options = {
    'url': 'https://spa-example.com',
    'formats': ['markdown'],
    'waitFor': 5000,      # Wait 5 seconds for JS to load
    'screenshot': True,   # Capture screenshot
    'headers': {
        'User-Agent': 'Custom User Agent'
    },
    'actions': [
        {
            'type': 'click',
            'selector': '#load-more-button'
        },
        {
            'type': 'wait',
            'milliseconds': 2000
        },
        {
            'type': 'scroll',
            'direction': 'down'
        }
    ]
}
```
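These options are sent as the body of an ordinary /scrape request. A short usage sketch that reuses the scrape_options dictionary defined above (the placeholder API key is an assumption, as in the earlier examples):

```python
import requests

API_KEY = 'your_api_key_here'
headers = {'Authorization': f'Bearer {API_KEY}', 'Content-Type': 'application/json'}

# Post the rendering options above as the request body of a normal /scrape call
response = requests.post('https://api.firecrawl.dev/v1/scrape',
                         json=scrape_options, headers=headers)
print(response.json()['data']['markdown'])
```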
Proxy Configuration
```javascript
const scrapeConfig = {
  url: 'https://example.com',
  formats: ['markdown'],
  location: {
    country: 'US',          // route the request through a US-based location
    languages: ['en-US']    // preferred languages for the request
  },
  mobile: false,            // emulate a desktop rather than a mobile browser
  skipTlsVerification: false,
  timeout: 30000            // abort the request after 30 seconds
};
```
Response Structure
Standard Firecrawl API responses follow this structure:
```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent here...",
    "html": "<html>...</html>",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "sourceURL": "https://example.com",
      "statusCode": 200,
      "error": null
    },
    "links": ["https://example.com/page1", "https://example.com/page2"]
  }
}
```
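In practice it is worth checking the success flag before reading data, since failed requests still return JSON. A short Python sketch of that defensive handling, using only the fields shown above:

```python
import requests

def get_markdown(api_key, target_url):
    response = requests.post(
        'https://api.firecrawl.dev/v1/scrape',
        headers={'Authorization': f'Bearer {api_key}',
                 'Content-Type': 'application/json'},
        json={'url': target_url, 'formats': ['markdown']}
    )
    body = response.json()
    if not body.get('success'):
        # Failed requests still return JSON; surface the error instead of crashing
        raise RuntimeError(f'Scrape failed: {body}')

    data = body['data']
    print('Title:', data['metadata'].get('title'))
    print('Source:', data['metadata'].get('sourceURL'))
    return data.get('markdown', '')
```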
Error Handling
Firecrawl uses standard HTTP status codes and provides detailed error messages:
```python
import requests

API_KEY = 'your_api_key_here'

def scrape_with_error_handling(url):
    try:
        response = requests.post(
            'https://api.firecrawl.dev/v1/scrape',
            headers={
                'Authorization': f'Bearer {API_KEY}',
                'Content-Type': 'application/json'
            },
            json={'url': url}
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        status_code = e.response.status_code
        if status_code == 401:
            print('Authentication failed: Invalid API key')
        elif status_code == 402:
            print('Payment required: Insufficient credits')
        elif status_code == 429:
            print('Rate limit exceeded: Too many requests')
        elif status_code == 500:
            print('Server error: Try again later')
        else:
            print(f'HTTP error occurred: {e}')
    except Exception as e:
        print(f'An error occurred: {e}')
```
Rate Limits and Quotas
Firecrawl implements rate limiting based on your subscription tier:
- Free tier: 500 requests/month
- Starter tier: 10,000 requests/month, 10 requests/second
- Growth tier: 100,000 requests/month, 50 requests/second
- Enterprise tier: Custom limits
Monitor your usage through response headers:
```python
response = requests.post(url, json=payload, headers=headers)
print(f"Rate limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"Reset time: {response.headers.get('X-RateLimit-Reset')}")
```
SDK Support
Firecrawl provides official SDKs for multiple languages:
Python SDK Installation
```bash
pip install firecrawl-py
```

```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key='your_api_key')

# Scrape a single page
scrape_result = app.scrape_url('https://example.com',
                               params={'formats': ['markdown']})

# Crawl a website
crawl_result = app.crawl_url('https://example.com',
                             params={'limit': 100})
```
Node.js SDK Installation
```bash
npm install @mendable/firecrawl-js
```

```javascript
import FirecrawlApp from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: 'your_api_key' });

// Scrape a page
const scrapeResult = await app.scrapeUrl('https://example.com', {
  formats: ['markdown', 'html']
});

// Crawl a website
const crawlResult = await app.crawlUrl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown']
  }
});
```
Best Practices
- Use appropriate formats: Request only the formats you need (markdown, html, etc.) to reduce response size
- Implement retry logic: Handle rate limits and temporary failures with exponential backoff
- Set reasonable timeouts: Configure timeout values based on expected page load times
- Filter content effectively: Use onlyMainContent, includeTags, and excludeTags to reduce noise
- Monitor usage: Track your API usage to avoid unexpected quota exhaustion
- Cache responses: Store results locally when re-scraping the same URLs
- Use webhooks for crawls: For large crawl jobs, use webhook callbacks instead of polling (see the sketch below)
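For the last point, the idea is to pass a callback URL when starting the crawl so that Firecrawl pushes progress and completion events to your server rather than you polling the status endpoint. A minimal Python sketch, assuming the crawl request accepts a webhook field pointing at your endpoint; the receiver URL is illustrative, and the exact parameter shape should be confirmed in the API reference:

```python
import requests

API_KEY = 'your_api_key_here'

# Start a crawl and ask Firecrawl to notify us instead of being polled.
# The 'webhook' field is assumed here based on the docs' webhook recommendation.
payload = {
    'url': 'https://example.com',
    'limit': 500,
    'scrapeOptions': {'formats': ['markdown']},
    'webhook': 'https://your-server.example.com/firecrawl-events'
}

response = requests.post(
    'https://api.firecrawl.dev/v1/crawl',
    json=payload,
    headers={'Authorization': f'Bearer {API_KEY}',
             'Content-Type': 'application/json'}
)
print('Crawl job started:', response.json().get('id'))
```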
Conclusion
The Firecrawl API reference documentation provides a complete technical specification for integrating web scraping and crawling capabilities into your applications. By understanding the available endpoints, configuration options, and best practices, you can efficiently extract data from websites at scale. Whether you're scraping single pages, crawling entire sites, or extracting structured data with AI, the API is accessible through simple HTTP requests or through official SDKs in your preferred programming language.
For developers needing even more control over browser automation, consider exploring how to programmatically interact with web pages using headless browser tools alongside Firecrawl's API capabilities.