How Do I Access the Firecrawl GitHub Repository?
Firecrawl is an open-source web scraping and crawling tool that converts websites into clean, structured markdown or JSON data. The Firecrawl GitHub repository is publicly available and provides access to the source code, documentation, and community contributions. This guide will walk you through accessing, cloning, and working with the Firecrawl repository.
Accessing the Firecrawl Repository
The official Firecrawl GitHub repository is hosted at:
https://github.com/mendableai/firecrawl
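You can access the repository directly through your web browser by visiting this URL. The repository is open source under the AGPL-3.0 license, so you can view, fork, modify, and contribute to the project. Note that AGPL-3.0 requires you to make the source of any modified version available to users who interact with it over a network.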
Repository Structure
Once you access the repository, you'll find the following key directories:
- apps/ - Contains the main application code
- docs/ - Documentation and guides
- examples/ - Sample implementations and use cases
- packages/ - SDK packages for different programming languages
- docker/ - Docker configuration files
- .github/ - GitHub workflows and issue templates
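After cloning, you can check which of these top-level directories exist in your checkout with a short Python script. This is a minimal sketch. The directory names are taken from the list above, and the actual layout may differ between versions of the repository. The helper name check_layout is hypothetical.

```python
from pathlib import Path

# Top-level directories described above; the real layout may differ by version.
EXPECTED_DIRS = ["apps", "docs", "examples", "packages", "docker", ".github"]


def check_layout(repo_root: str, expected: list[str] = EXPECTED_DIRS) -> dict[str, bool]:
    """Return a mapping of each expected directory name to whether it exists."""
    root = Path(repo_root)
    return {name: (root / name).is_dir() for name in expected}


if __name__ == "__main__":
    # Assumes the repository was cloned into ./firecrawl
    for name, present in check_layout("firecrawl").items():
        print(f"{name + '/':<12} {'present' if present else 'missing'}")
```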
Cloning the Repository
To work with Firecrawl locally, you'll need to clone the repository to your machine. Here's how to do it:
Prerequisites
Before cloning, ensure you have Git installed on your system:
# Check if Git is installed
git --version
# If not installed, download from https://git-scm.com/
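If you script your setup, you can do the same check from Python. Use shutil.which to see whether git is on your PATH, then parse the output of git --version. This is a sketch. The helper names are hypothetical, and the regular expression assumes output of the usual form "git version X.Y.Z", optionally followed by platform-specific suffixes.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_git_version(output: str) -> tuple[int, ...]:
    """Extract the numeric version from `git --version` output, e.g. 'git version 2.43.0'."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"Unrecognized git --version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_git_version() -> Optional[tuple[int, ...]]:
    """Return the installed Git version, or None if git is not on PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(["git", "--version"], capture_output=True, text=True, check=True)
    return parse_git_version(result.stdout)
```

For example, parse_git_version("git version 2.39.3 (Apple Git-145)") yields (2, 39, 3). Tuples compare element by element, so you can write a check such as installed_git_version() >= (2, 20).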
Clone via HTTPS
The simplest method to clone the repository is using HTTPS:
# Clone the repository
git clone https://github.com/mendableai/firecrawl.git
# Navigate into the directory
cd firecrawl
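If a setup script may run more than once, you can make the clone step idempotent: skip git clone when the target directory is already a Git checkout. This is a minimal sketch. The function name ensure_clone and the default destination are hypothetical, and the runner is injectable so the logic can be exercised without network access.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

REPO_URL = "https://github.com/mendableai/firecrawl.git"


def ensure_clone(
    url: str = REPO_URL,
    dest: str = "firecrawl",
    run: Callable[..., object] = subprocess.run,
) -> Optional[list[str]]:
    """Clone `url` into `dest` unless a Git checkout already exists there.

    Returns the git command that was run, or None if the clone was skipped.
    """
    if (Path(dest) / ".git").exists():
        return None  # Already cloned; update it with `git pull` instead.
    command = ["git", "clone", url, dest]
    # With subprocess.run, check=True raises CalledProcessError if git fails.
    run(command, check=True)
    return command
```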
Clone via SSH
If you have SSH keys configured with GitHub, you can clone using SSH:
# Clone via SSH
git clone git@github.com:mendableai/firecrawl.git
# Navigate into the directory
cd firecrawl
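The HTTPS and SSH URLs differ only in their prefix and separator. The conversion can be expressed as a small function, sketched here under the assumption that the input is a plain github.com HTTPS repository URL. The name https_to_ssh is hypothetical.

```python
import re


def https_to_ssh(url: str) -> str:
    """Convert an HTTPS GitHub clone URL to its SSH form.

    Example: https://github.com/mendableai/firecrawl.git
          -> git@github.com:mendableai/firecrawl.git
    """
    match = re.fullmatch(r"https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?", url)
    if match is None:
        raise ValueError(f"Not a GitHub HTTPS repository URL: {url}")
    owner, repo = match.groups()
    return f"git@github.com:{owner}/{repo}.git"
```

An existing HTTPS clone can be switched to SSH without recloning by running git remote set-url origin with the converted URL.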
Clone a Specific Branch
To clone a specific branch instead of the default main branch:
# Clone a specific branch
git clone -b branch-name https://github.com/mendableai/firecrawl.git
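The clone options used in this guide (-b, --single-branch, and --depth) can be combined. A helper that assembles the argument list makes those combinations explicit. This is a sketch. The function build_clone_command is hypothetical, and branch-name remains a placeholder for a real branch.

```python
from typing import Optional


def build_clone_command(
    url: str,
    branch: Optional[str] = None,
    single_branch: bool = False,
    depth: Optional[int] = None,
    dest: Optional[str] = None,
) -> list[str]:
    """Assemble a `git clone` argument list from common options."""
    command = ["git", "clone"]
    if branch is not None:
        command += ["-b", branch]  # check out this branch instead of the default
    if single_branch:
        command.append("--single-branch")  # fetch only that branch's history
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        command += ["--depth", str(depth)]  # truncate history to `depth` commits
    command.append(url)
    if dest is not None:
        command.append(dest)
    return command
```

Pass the result to subprocess.run(..., check=True) to execute it. Git already implies --single-branch whenever --depth is given, unless you pass --no-single-branch.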
Installing Firecrawl from the Repository
After cloning the repository, you can install and run Firecrawl locally. The installation process varies depending on whether you're using the self-hosted version or working with the SDKs.
Using Docker (Recommended)
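The easiest way to run Firecrawl locally is with Docker Compose. Before starting, check the self-hosting guide in the repository for the environment variables it expects (typically set in a .env file):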
# Navigate to the repository
cd firecrawl
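# Build the images and start the services in the background
# (Compose v2 syntax; older standalone installs use docker-compose)
docker compose up -d --build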
# Check if containers are running
docker ps
This starts the Firecrawl API server, which you can access at http://localhost:3002.
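Once the containers are running, you can call the local API over HTTP with only the Python standard library. This sketch makes an assumption: the self-hosted server exposes a v1 scrape endpoint at POST /v1/scrape that accepts a JSON body with "url" and "formats". Confirm the path and fields against the API reference for the version you cloned. The helper names are hypothetical, and the API key is optional because self-hosted setups may not require one.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the self-hosted API listens here and serves /v1/scrape.
BASE_URL = "http://localhost:3002"


def build_scrape_request(
    url: str,
    base_url: str = BASE_URL,
    formats: tuple[str, ...] = ("markdown",),
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the API to scrape `url`."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/scrape", data=body, headers=headers, method="POST"
    )


def scrape(url: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_scrape_request(url, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

With the server up, scrape("https://example.com") returns the decoded JSON response. Inspect it before relying on particular keys, since the response shape depends on the API version.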
Installing the Python SDK
If you want to use the Python SDK from the repository:
# Navigate to the Python SDK directory
cd packages/python-sdk
# Install in development mode
pip install -e .
# Alternatively, install only the dependencies (this does not install the SDK package itself)
pip install -r requirements.txt
Then use it in your Python code:
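from firecrawl import FirecrawlApp
# Initialize with your API key (for a self-hosted instance, also pass api_url='http://localhost:3002')
app = FirecrawlApp(api_key='your-api-key')
# Scrape a single page (this example assumes an SDK version that returns plain dictionaries)
result = app.scrape_url('https://example.com')
print(result['markdown'])
# Crawl a website, stopping after 100 pages
crawl_result = app.crawl_url('https://example.com', {'limit': 100})
# The crawled pages are in the 'data' list; each page's address is stored in its metadata
for page in crawl_result['data']:
    print(f"URL: {page['metadata']['sourceURL']}")
    print(f"Content: {page['markdown'][:200]}...")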
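The exact shape of crawl results has varied between SDK releases. A small helper that accepts either a top-level "url" key or metadata["sourceURL"] makes the printing loop less brittle. This sketch assumes dictionary-style results, as in the example above. SDK releases that return typed objects would need attribute access instead. The name page_summaries is hypothetical.

```python
from typing import Any, Iterable, Iterator


def page_summaries(
    pages: Iterable[dict[str, Any]], preview_chars: int = 200
) -> Iterator[tuple[str, str]]:
    """Yield (url, preview) pairs from a list of crawled page dictionaries.

    A page's address is read from a top-level "url" key or, failing that,
    from metadata["sourceURL"]; pages without markdown yield an empty preview.
    """
    for page in pages:
        metadata = page.get("metadata") or {}
        url = page.get("url") or metadata.get("sourceURL") or "<unknown>"
        markdown = page.get("markdown") or ""
        yield url, markdown[:preview_chars]
```

Assuming the crawl result stores its pages in a 'data' list, you would iterate with: for url, preview in page_summaries(crawl_result['data']).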
Installing the JavaScript/Node.js SDK
For the JavaScript SDK:
# Navigate to the JS SDK directory
cd packages/js-sdk
# Install dependencies
npm install
# Build the package
npm run build
Then use it in your JavaScript/Node.js code:
// Top-level await requires an ES module (for example, a .mjs file or "type": "module" in package.json)
import FirecrawlApp from '@mendable/firecrawl-js';
// Initialize with your API key
const app = new FirecrawlApp({ apiKey: 'your-api-key' });
// Scrape a single page and print its markdown
const scrapeResult = await app.scrapeUrl('https://example.com');
console.log(scrapeResult.markdown);
// Crawl a website (up to 100 pages), requesting markdown and HTML for each page
const crawlResult = await app.crawlUrl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown', 'html']
  }
});
console.log(crawlResult);
Working with the Repository
Keeping Your Local Copy Updated
To keep your local repository synchronized with the latest changes:
# Fetch the latest changes
git fetch origin
# Pull and merge changes
git pull origin main
# Or rebase your changes
git pull --rebase origin main
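Before pulling, you can check how far your local branch has diverged from the remote. Run git rev-list --left-right --count HEAD...origin/main after git fetch origin. It prints two counts: commits only on your branch (ahead) and commits only on origin/main (behind). This sketch wraps that command. The helper names are hypothetical.

```python
import subprocess


def parse_ahead_behind(output: str) -> tuple[int, int]:
    """Parse `git rev-list --left-right --count HEAD...<upstream>` output.

    The left count is commits only on HEAD (ahead); the right count is
    commits only on the upstream branch (behind).
    """
    ahead, behind = (int(n) for n in output.split())
    return ahead, behind


def ahead_behind(repo: str = ".", upstream: str = "origin/main") -> tuple[int, int]:
    """Run git in `repo` and report how far the current branch has diverged."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-list", "--left-right", "--count", f"HEAD...{upstream}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ahead_behind(result.stdout)
```

If the ahead count is zero, git pull can fast-forward. If both counts are non-zero, you will need a merge or a rebase.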
Creating a Fork
To contribute to Firecrawl, you'll typically want to fork the repository:
- Visit https://github.com/mendableai/firecrawl
- Click the "Fork" button in the top-right corner
- Clone your fork:
git clone https://github.com/YOUR-USERNAME/firecrawl.git
cd firecrawl
# Add upstream remote
git remote add upstream https://github.com/mendableai/firecrawl.git
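With the upstream remote in place, keeping your fork current means fetching from upstream, fast-forwarding your local branch, and pushing the result to your fork. This sketch lists those commands. It assumes the branch is called main and that you have not committed directly to it; with --ff-only, the merge refuses to run if the histories have diverged. The function names are hypothetical.

```python
import subprocess


def fork_sync_commands(branch: str = "main") -> list[list[str]]:
    """Commands that bring a fork's branch up to date with the upstream repository."""
    return [
        ["git", "fetch", "upstream"],  # download upstream changes
        ["git", "checkout", branch],  # switch to the local branch
        ["git", "merge", "--ff-only", f"upstream/{branch}"],  # fails if histories diverged
        ["git", "push", "origin", branch],  # update your fork on GitHub
    ]


def sync_fork(repo: str = ".", branch: str = "main") -> None:
    """Run the sync commands inside `repo`, stopping at the first failure."""
    for command in fork_sync_commands(branch):
        subprocess.run(command, cwd=repo, check=True)
```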
Creating a Branch for Development
When working on new features or fixes:
# Create and switch to a new branch
git checkout -b feature/my-new-feature
# Make your changes, then stage them
git add .
# Commit your changes
git commit -m "Add new feature description"
# Push to your fork
git push origin feature/my-new-feature
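The feature/ prefix is a naming convention rather than a Git requirement. If you create branches from scripts, a helper can turn a short description into a consistent branch name. This is a hypothetical sketch: it lowercases the text and replaces each run of non-alphanumeric characters with a hyphen.

```python
import re


def feature_branch_name(description: str, prefix: str = "feature") -> str:
    """Turn a short description into a branch name like 'feature/my-new-feature'."""
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"{prefix}/{slug}"
```

For example, feature_branch_name("My new feature") returns "feature/my-new-feature", which you can pass to git checkout -b.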
Exploring the Documentation
The Firecrawl repository includes documentation in the docs/ directory. Key documentation files include:
- API Reference - Detailed API endpoint documentation
- SDK Guides - Language-specific implementation guides
- Self-Hosting - Instructions for deploying Firecrawl on your infrastructure
- Contributing - Guidelines for contributing to the project
You can also access the online documentation at https://docs.firecrawl.dev
Using GitHub Features
Issues and Bug Reports
If you encounter bugs or have feature requests:
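# Search existing issues (open and closed) with the GitHub CLI,
# or browse https://github.com/mendableai/firecrawl/issues
gh issue list --repo mendableai/firecrawl --state all --search "your keywords"
# If nothing matches, open a new issue with reproduction steps, expected vs. actual behavior, and your environment
gh issue create --repo mendableai/firecrawl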
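If you do not use the GitHub CLI, you can build an issue-search link that opens the repository's issue list pre-filtered with your keywords. This sketch uses GitHub's standard q query parameter with the is:issue and is:open qualifiers. The helper name is hypothetical.

```python
from urllib.parse import urlencode

ISSUES_URL = "https://github.com/mendableai/firecrawl/issues"


def issue_search_url(terms: str, include_closed: bool = True) -> str:
    """Build a GitHub issue-search URL for the repository."""
    query = f"is:issue {terms}" if include_closed else f"is:issue is:open {terms}"
    return f"{ISSUES_URL}?{urlencode({'q': query})}"
```

For example, issue_search_url("crawl timeout") returns https://github.com/mendableai/firecrawl/issues?q=is%3Aissue+crawl+timeout.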
Pull Requests
To contribute code:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request with a clear description
Discussions
For questions and community support, use GitHub Discussions:
https://github.com/mendableai/firecrawl/discussions
Integrating Firecrawl with Browser Automation
Firecrawl can be combined with browser automation tools when you are dealing with dynamic, JavaScript-heavy websites, for example to handle AJAX requests with Puppeteer or to crawl single-page applications.
When you need to interact with dynamic elements before scraping, injecting JavaScript into a page with Puppeteer can complement Firecrawl's capabilities.
Repository Statistics and Activity
The Firecrawl repository is actively maintained with:
- Regular commits and updates
- Active issue tracking and resolution
- Community contributions via pull requests
- Comprehensive CI/CD workflows
- Multi-language SDK support
You can view the repository's activity, stars, forks, and watchers directly on the GitHub page to gauge community engagement and project health.
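The same figures are available programmatically from the GitHub REST API's repository endpoint. This sketch fetches the repository and keeps a few fields. In that API, subscribers_count is the number of watchers (watchers_count mirrors the star count), and open_issues_count includes open pull requests. Unauthenticated requests are rate-limited. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.github.com/repos/mendableai/firecrawl"


def summarize_repo(payload: dict) -> dict:
    """Pick a few health indicators out of a GitHub REST API repository response."""
    license_info = payload.get("license") or {}
    return {
        "stars": payload.get("stargazers_count"),
        "forks": payload.get("forks_count"),
        "watchers": payload.get("subscribers_count"),
        "open_issues_and_prs": payload.get("open_issues_count"),
        "license": license_info.get("spdx_id"),
        "default_branch": payload.get("default_branch"),
    }


def fetch_repo_summary(url: str = API_URL) -> dict:
    """Fetch repository metadata from the GitHub API and summarize it."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_repo(json.load(response))
```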
Checking Repository Releases
To use stable versions of Firecrawl:
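# List tags, newest version first
git tag -l --sort=-v:refname
# Check out a specific version (replace v1.0.0 with a tag from the list); this leaves you on a detached HEAD
git checkout tags/v1.0.0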
# View releases on GitHub
# Visit: https://github.com/mendableai/firecrawl/releases
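To pick the newest stable tag automatically, you can parse the output of git tag -l and compare version numbers numerically rather than alphabetically. This sketch assumes release tags follow the vMAJOR.MINOR.PATCH pattern used in the example above. Tags with other formats, such as pre-release suffixes, are ignored. The helper name is hypothetical.

```python
import re
from typing import Optional

SEMVER_TAG = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def latest_release_tag(tag_output: str) -> Optional[str]:
    """Return the highest vMAJOR.MINOR.PATCH tag from `git tag -l` output.

    Tags that are not plain semantic versions (e.g. pre-releases) are skipped.
    """
    best: Optional[tuple[tuple[int, int, int], str]] = None
    for tag in tag_output.split():
        match = SEMVER_TAG.fullmatch(tag)
        if match is None:
            continue
        major, minor, patch = (int(part) for part in match.groups())
        version = (major, minor, patch)
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None
```

Comparing integer tuples ranks v1.10.0 above v1.2.3, which a plain string sort would get wrong.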
Alternative: Using NPM or PyPI Packages
If you don't need the full repository and just want to use Firecrawl in your projects:
Python (PyPI)
pip install firecrawl-py
JavaScript/Node.js (NPM)
npm install @mendable/firecrawl-js
These packages are built from the GitHub repository and published to their respective package managers.
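To confirm which version of the Python package is installed, whether from PyPI or as an editable install from the repository, you can query the installed distribution metadata. This sketch uses importlib.metadata from the standard library and assumes the distribution name firecrawl-py, as in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "firecrawl-py") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```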
Troubleshooting Common Issues
Clone Failures
If cloning fails due to network issues:
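# A shallow clone downloads less data, which helps on slow or unstable connections
git clone --depth 1 https://github.com/mendableai/firecrawl.git
# If HTTPS is blocked or unreliable on your network, try SSH instead
# (avoid disabling SSL verification with http.sslVerify=false, which exposes the download to tampering)
git clone git@github.com:mendableai/firecrawl.git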
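For intermittent network failures, retrying the clone with exponential backoff is often enough. This sketch works with any of the clone commands in this section. It relies on git removing the partially created directory when a clone fails, so each retry starts cleanly. The runner and sleep function are injectable for testing, and the helper name is hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    command: Sequence[str],
    attempts: int = 3,
    base_delay: float = 2.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run `command`, retrying with exponential backoff if it exits non-zero."""
    for attempt in range(1, attempts + 1):
        result = run(list(command))
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # waits 2s, then 4s, then 8s, ...
    raise RuntimeError(f"{' '.join(command)} failed after {attempts} attempts")
```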
Permission Issues
If you encounter permission errors:
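# The repository is public, so permission errors usually mean your SSH key or credentials are not set up
# Check your GitHub CLI authentication
gh auth status
# Test SSH authentication with GitHub
ssh -T git@github.com
# Or use HTTPS instead of SSH
git clone https://github.com/mendableai/firecrawl.git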
Large Repository Size
To manage repository size:
# Shallow clone with limited history
git clone --depth 50 https://github.com/mendableai/firecrawl.git
# Clone specific branch only
git clone -b main --single-branch https://github.com/mendableai/firecrawl.git
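If you later need more history than a shallow clone provides, you can extend it in place instead of recloning. git fetch --deepen=N fetches N more commits, and git fetch --unshallow converts the clone to a full one. This sketch builds either command. The helper name is hypothetical.

```python
from typing import Optional


def deepen_command(extra_commits: Optional[int] = None) -> list[str]:
    """Build a `git fetch` command that extends a shallow clone's history.

    With a number, fetch that many additional commits; without one, fetch the
    complete history and turn the shallow clone into a full clone.
    """
    if extra_commits is None:
        return ["git", "fetch", "--unshallow"]
    if extra_commits < 1:
        raise ValueError("extra_commits must be positive")
    return ["git", "fetch", f"--deepen={extra_commits}"]
```

git rev-parse --is-shallow-repository prints true or false, which tells you whether deepening is needed at all.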
Best Practices
When working with the Firecrawl repository:
- Star or Watch the Repository - Starring shows support; watching (for example, releases only) sends you notifications about updates
- Read CONTRIBUTING.md - Follow contribution guidelines before submitting PRs
- Check Issues First - Avoid duplicate bug reports by searching existing issues
- Keep Your Fork Updated - Regularly sync with the upstream repository
- Use Release Versions - For production use, stick to tagged releases rather than the main branch
- Review the License - Understand the AGPL-3.0 obligations, especially if you modify Firecrawl and offer it as a network service
Conclusion
Accessing the Firecrawl GitHub repository is straightforward and provides you with complete access to the source code, documentation, and community resources. Whether you're looking to use Firecrawl in your projects, contribute to its development, or simply explore how it works, the repository at https://github.com/mendableai/firecrawl is your starting point.
By cloning the repository, you can run Firecrawl locally, customize it for your needs, contribute improvements, and stay up to date with the latest features and fixes. When you need help, the issue tracker, GitHub Discussions, and the documentation are all available.