How do I Deploy Firecrawl with Docker?
Deploying Firecrawl with Docker is the recommended approach for both development and production environments. Docker containerization ensures consistent behavior across different systems, simplifies dependency management, and makes scaling easier. This guide covers everything you need to know about deploying Firecrawl using Docker.
Prerequisites
Before deploying Firecrawl with Docker, ensure you have the following installed:
- Docker: Version 20.10 or later
- Docker Compose: Version 2.0 or later (optional but recommended)
- Git: For cloning the Firecrawl repository
You can verify your installations with:
docker --version
docker-compose --version   # Compose V2 also supports: docker compose version
git --version
Quick Start with Docker
The fastest way to get Firecrawl running is using the official Docker image:
# Pull the latest Firecrawl image
docker pull mendableai/firecrawl:latest
# Run Firecrawl with basic configuration
docker run -d \
--name firecrawl \
-p 3002:3002 \
-e REDIS_URL=redis://redis:6379 \
-e PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000 \
mendableai/firecrawl:latest
However, for a production-ready deployment, using Docker Compose is recommended as Firecrawl requires several supporting services.
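If you find yourself re-running the quick-start command with different settings, a small wrapper script makes the flags easier to adjust. This is an illustrative sketch, not part of Firecrawl itself: the `DRY_RUN` switch is my own convention for previewing the command before executing it, and the defaults mirror the values shown above.

```shell
#!/bin/sh
# launch-firecrawl.sh - sketch of a parameterized wrapper around the
# docker run command above. Set DRY_RUN=1 to print the command instead
# of executing it.
IMAGE="${IMAGE:-mendableai/firecrawl:latest}"
PORT="${PORT:-3002}"
REDIS_URL="${REDIS_URL:-redis://redis:6379}"
PLAYWRIGHT_URL="${PLAYWRIGHT_URL:-http://playwright-service:3000}"

run_firecrawl() {
    # Assemble the full docker run invocation from the variables above
    cmd="docker run -d --name firecrawl -p ${PORT}:${PORT} -e REDIS_URL=${REDIS_URL} -e PLAYWRIGHT_MICROSERVICE_URL=${PLAYWRIGHT_URL} ${IMAGE}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        eval "$cmd"
    fi
}
```

For example, `DRY_RUN=1 PORT=4000 ./launch-firecrawl.sh` would show the command that a port override produces without touching Docker.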
Deploying with Docker Compose
Docker Compose allows you to manage multiple containers as a single application. Here's a complete docker-compose.yml configuration for Firecrawl:
version: '3.8'

services:
  firecrawl:
    image: mendableai/firecrawl:latest
    container_name: firecrawl-api
    ports:
      - "3002:3002"
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000
      - PORT=3002
      - NUM_WORKERS=8
      - API_KEY=${FIRECRAWL_API_KEY}
    depends_on:
      - redis
      - playwright-service
    restart: unless-stopped
    networks:
      - firecrawl-network

  redis:
    image: redis:7-alpine
    container_name: firecrawl-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    restart: unless-stopped
    networks:
      - firecrawl-network

  playwright-service:
    image: browserless/chrome:latest
    container_name: firecrawl-playwright
    ports:
      - "3000:3000"
    environment:
      - MAX_CONCURRENT_SESSIONS=10
      - CONNECTION_TIMEOUT=60000
    restart: unless-stopped
    networks:
      - firecrawl-network

volumes:
  redis-data:
    driver: local

networks:
  firecrawl-network:
    driver: bridge
Save this configuration as docker-compose.yml and start the services:
# Set your API key (optional but recommended for production)
export FIRECRAWL_API_KEY=your-secret-api-key
# Start all services
docker-compose up -d
# Check service status
docker-compose ps
# View logs
docker-compose logs -f firecrawl
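After `docker-compose up -d`, the API may take a few seconds before it accepts requests. A small polling helper can gate follow-up steps in deployment scripts; this is a generic sketch, and the `/health` endpoint used in the usage example is the same one referenced by the health check later in this guide:

```shell
#!/bin/sh
# wait_for: retry a command up to N times, sleeping 1 second between
# attempts. Returns 0 as soon as the command succeeds, 1 if it never does.
# Usage: wait_for 30 curl -sf http://localhost:3002/health
wait_for() {
    attempts="$1"; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "command did not succeed after ${attempts} attempts: $*" >&2
    return 1
}
```

For example, `wait_for 30 curl -sf http://localhost:3002/health` blocks until the API responds or 30 attempts elapse, which is handy before running smoke tests in CI.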
Building from Source
If you need to customize Firecrawl or build from the latest source code:
# Clone the Firecrawl repository
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
# Build the Docker image
docker build -t firecrawl-custom:latest .
# Or use Docker Compose to build
docker-compose build
Update your docker-compose.yml to use the custom image:
services:
  firecrawl:
    build: .
    # ... rest of configuration
Environment Variables and Configuration
Firecrawl supports numerous environment variables for customization:
Core Configuration
# Server settings
PORT=3002
NODE_ENV=production
HOST=0.0.0.0
# Redis configuration
REDIS_URL=redis://redis:6379
REDIS_RATE_LIMIT_URL=redis://redis:6379
# Playwright/Browser service
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000
Worker and Performance Settings
# Number of worker processes
NUM_WORKERS=8
# Maximum concurrent jobs
MAX_JOBS_PER_WORKER=10
# Timeout settings (in milliseconds)
PAGE_TIMEOUT=30000
SCRAPE_TIMEOUT=60000
Security Settings
# API authentication
API_KEY=your-secret-api-key
RATE_LIMIT_ENABLED=true
RATE_LIMIT_MAX_REQUESTS=100
RATE_LIMIT_WINDOW_MS=60000
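The variables above can be collected into a single environment file and passed to the container. A sketch (all values are placeholders to adapt for your deployment):

```shell
# .env - example consolidating the settings above (placeholder values)
PORT=3002
NODE_ENV=production
HOST=0.0.0.0
REDIS_URL=redis://redis:6379
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000
NUM_WORKERS=8
API_KEY=change-me
RATE_LIMIT_ENABLED=true
RATE_LIMIT_MAX_REQUESTS=100
```

Keeping these in one file makes it easy to diff staging and production configurations, and keeps secrets out of docker-compose.yml itself.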
Production Deployment Best Practices
1. Use Environment Files
Create a .env file for sensitive configuration:
# .env
FIRECRAWL_API_KEY=your-production-api-key
REDIS_PASSWORD=secure-redis-password
NODE_ENV=production
Reference it in your docker-compose.yml:
services:
  firecrawl:
    env_file:
      - .env
2. Implement Health Checks
Add health checks to ensure containers are running properly:
services:
  firecrawl:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3002/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
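Health checks can be extended to the supporting services as well. A sketch for Redis, where `redis-cli ping` is a standard liveness probe (the interval values here are illustrative, not Firecrawl requirements):

```yaml
services:
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
```

With health checks on dependencies in place, `depends_on` can optionally use `condition: service_healthy` so the API only starts once Redis is actually ready.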
3. Configure Resource Limits
Prevent containers from consuming excessive resources:
services:
  firecrawl:
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
4. Set Up Persistent Storage
Ensure data persistence for Redis:
services:
  redis:
    volumes:
      - ./redis-data:/data
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
5. Enable Logging
Configure proper logging for debugging and monitoring:
services:
  firecrawl:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
Running Browser-Based Scraping with Docker
Firecrawl uses Playwright to render JavaScript-heavy pages, so you need to ensure the browser service is configured properly:
playwright-service:
  image: browserless/chrome:latest
  environment:
    - MAX_CONCURRENT_SESSIONS=10
    - ENABLE_DEBUGGER=false
    - PREBOOT_CHROME=true
  shm_size: 2gb  # Important for Chrome stability
The shm_size setting is critical for preventing browser crashes in containerized environments.
Scaling Firecrawl with Docker
Horizontal Scaling with Docker Swarm
For handling high traffic, deploy multiple Firecrawl instances:
# Initialize Docker Swarm
docker swarm init
# Deploy stack
docker stack deploy -c docker-compose.yml firecrawl
# Scale the service
docker service scale firecrawl_firecrawl=5
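Alternatively, the replica count can be declared in the compose file itself, so `docker stack deploy` starts the desired number of instances without a separate scale command. A sketch (the count of 5 mirrors the example above; the restart policy is illustrative):

```yaml
services:
  firecrawl:
    deploy:
      replicas: 5
      restart_policy:
        condition: on-failure
```

Note that `container_name` must be dropped when running replicated services, since each replica needs a unique name.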
Using a Reverse Proxy
Add Nginx for load balancing:
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
    depends_on:
      - firecrawl
    networks:
      - firecrawl-network
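The compose service above mounts an nginx.conf; a minimal sketch that proxies to Firecrawl might look like the following (the upstream name and proxy headers are generic nginx conventions, not Firecrawl-specific requirements):

```nginx
events {}

http {
    upstream firecrawl_backend {
        # With Compose/Swarm DNS, the service name resolves to the instances
        server firecrawl:3002;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://firecrawl_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_read_timeout 120s;  # scraping requests can be slow
        }
    }
}
```

The generous `proxy_read_timeout` matters because crawl and scrape requests can run much longer than typical API calls.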
Monitoring and Debugging
Viewing Logs
# View all logs
docker-compose logs
# Follow logs for specific service
docker-compose logs -f firecrawl
# View last 100 lines
docker-compose logs --tail=100 firecrawl
Accessing Container Shell
# Access Firecrawl container
docker exec -it firecrawl-api /bin/sh
# Check running processes
docker top firecrawl-api
Monitoring Resource Usage
# Monitor real-time resource usage
docker stats
# Inspect specific container
docker inspect firecrawl-api
Troubleshooting Common Issues
Browser Service Connection Errors
If Firecrawl can't connect to the Playwright service, ensure proper network configuration:
# Test connectivity between containers
# (ping may be absent in minimal images; wget or nc are alternatives)
docker exec firecrawl-api ping playwright-service
# Check if Playwright service is running
docker-compose ps playwright-service
Redis Connection Issues
Verify Redis connectivity:
# Connect to Redis CLI
docker exec -it firecrawl-redis redis-cli
# Test connection
docker exec firecrawl-redis redis-cli ping
Memory Issues with Browser Automation
Browser-based scraping can be memory-intensive; frequent timeouts or crashed browser sessions are often a sign of insufficient shared memory or RAM. Ensure adequate resources:
playwright-service:
  shm_size: 2gb
  deploy:
    resources:
      limits:
        memory: 4G
Security Considerations
Network Isolation
Use Docker networks to isolate services:
networks:
  firecrawl-network:
    driver: bridge
    internal: false
  internal-network:
    driver: bridge
    internal: true  # No external access
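To make use of the internal network, attach the API container to both networks and move Redis onto the internal one, so only the API is reachable from outside. A sketch building on the compose file earlier in this guide:

```yaml
services:
  firecrawl:
    networks:
      - firecrawl-network   # exposed via published ports
      - internal-network    # reaches Redis privately
  redis:
    networks:
      - internal-network    # no route to the outside world
```

With this layout, Redis also no longer needs its `ports:` mapping, which removes a common attack surface.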
Running as Non-Root User
Modify your Dockerfile to run as a non-privileged user:
FROM node:18-alpine

# Create a dedicated non-root user and group
RUN addgroup -g 1001 -S firecrawl && \
    adduser -S firecrawl -u 1001

USER firecrawl

# ... rest of Dockerfile
Enable TLS/SSL
For production deployments, always use HTTPS with proper certificates.
Updating Firecrawl
Keep your Firecrawl deployment up to date:
# Pull latest images
docker-compose pull
# Restart with new images
docker-compose up -d
# Remove old images
docker image prune -a
Conclusion
Deploying Firecrawl with Docker provides a robust, scalable solution for web scraping needs. By following this guide, you can set up a production-ready Firecrawl instance with proper configuration, monitoring, and security practices. Whether you're scraping static HTML or handling JavaScript-heavy single-page applications, Docker ensures your Firecrawl deployment remains consistent and maintainable across different environments.
For additional configuration options and advanced use cases, refer to the official Firecrawl documentation and consider integrating with orchestration platforms like Kubernetes for enterprise-scale deployments.