
How do I Access the Firecrawl GitHub Repository?

Firecrawl is an open-source web scraping and crawling tool that converts websites into clean, structured markdown or JSON data. The Firecrawl GitHub repository is publicly available and provides access to the source code, documentation, and community contributions. This guide will walk you through accessing, cloning, and working with the Firecrawl repository.

Accessing the Firecrawl Repository

The official Firecrawl GitHub repository is hosted at:

https://github.com/mendableai/firecrawl

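You can access the repository directly through your web browser by visiting this URL. The repository is open source under the AGPL-3.0 license: you can view, fork, modify, and contribute to the code, but if you run a modified version as a network service, the license requires you to make your modified source code available to its users.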

Repository Structure

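Once you access the repository, you'll find the following key directories and files (the layout changes over time, so check the README for the current structure):

  • apps/ - Contains the main application code, including the API server (apps/api) and the SDKs (for example apps/python-sdk and apps/js-sdk)
  • examples/ - Sample implementations and use cases
  • .github/ - GitHub workflows and issue templates
  • docker-compose.yaml - Docker Compose configuration for self-hosting
  • SELF_HOST.md and CONTRIBUTING.md - Self-hosting instructions and contribution guidelines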

Cloning the Repository

To work with Firecrawl locally, you'll need to clone the repository to your machine. Here's how to do it:

Prerequisites

Before cloning, ensure you have Git installed on your system:

# Check if Git is installed
git --version

# If not installed, download from https://git-scm.com/

Clone via HTTPS

The simplest method to clone the repository is using HTTPS:

# Clone the repository
git clone https://github.com/mendableai/firecrawl.git

# Navigate into the directory
cd firecrawl

Clone via SSH

If you have SSH keys configured with GitHub, you can clone using SSH:

# Clone via SSH
git clone git@github.com:mendableai/firecrawl.git

# Navigate into the directory
cd firecrawl

Clone a Specific Branch

To clone a specific branch instead of the default main branch:

# Clone a specific branch
git clone -b branch-name https://github.com/mendableai/firecrawl.git
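The clone variants above differ only in their arguments. As an illustration, the hypothetical Python helper below assembles (but does not run) the matching git clone command for HTTPS or SSH, an optional branch, and an optional shallow-clone depth; pass the resulting list to subprocess.run to execute it.

```python
import shlex
from typing import List, Optional


def build_clone_command(
    owner: str = "mendableai",
    repo: str = "firecrawl",
    protocol: str = "https",
    branch: Optional[str] = None,
    single_branch: bool = False,
    depth: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: return a `git clone` argument list without running it."""
    if protocol == "https":
        url = f"https://github.com/{owner}/{repo}.git"
    elif protocol == "ssh":
        url = f"git@github.com:{owner}/{repo}.git"
    else:
        raise ValueError(f"unsupported protocol: {protocol!r}")

    cmd = ["git", "clone"]
    if branch is not None:
        cmd += ["-b", branch]  # check out this branch instead of the default one
    if single_branch:
        cmd.append("--single-branch")  # fetch only one branch's history
    if depth is not None:
        cmd += ["--depth", str(depth)]  # keep only the most recent `depth` commits
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Prints: git clone -b main --single-branch --depth 1 git@github.com:mendableai/firecrawl.git
    print(shlex.join(build_clone_command(protocol="ssh", branch="main", single_branch=True, depth=1)))
```

Building an argument list rather than a shell string avoids quoting problems when the list is handed to subprocess.run.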

Installing Firecrawl from the Repository

After cloning the repository, you can install and run Firecrawl locally. The installation process varies depending on whether you're using the self-hosted version or working with the SDKs.

Using Docker (Recommended)

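The easiest way to run a self-hosted Firecrawl instance is with Docker Compose. Before starting it, create the environment (.env) file described in the repository's SELF_HOST.md guide:

# Navigate to the repository
cd firecrawl

# Build the images and start the services in the background
# (older installations use the standalone docker-compose command instead)
docker compose build
docker compose up -d

# Check that the containers are running
docker compose ps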

This will start the Firecrawl API server, which you can access at http://localhost:3002.
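Once the containers are up, you can call the API directly over HTTP. The sketch below uses only the Python standard library and assumes the self-hosted instance is reachable on localhost:3002 and exposes the v1 scrape endpoint (POST /v1/scrape with a JSON body containing url and formats); other releases may use different API versions, so check the API reference for the version you run. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the self-hosted API is reachable on localhost:3002
BASE_URL = "http://localhost:3002"


def build_scrape_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for Markdown output."""
    payload = {"url": url, "formats": ["markdown"]}  # assumed v1 request shape
    return urllib.request.Request(
        f"{base_url}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_scrape_request(url, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack running, scrape("https://example.com") returns the decoded JSON response. If you enable authentication on your instance, add an Authorization: Bearer header carrying your key.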

Installing the Python SDK

If you want to use the Python SDK from the repository:

# Navigate to the Python SDK directory
cd apps/python-sdk

# Install in development (editable) mode, which also installs its dependencies
pip install -e .

# Alternatively, install only the dependencies
pip install -r requirements.txt

Then use it in your Python code:

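from firecrawl import FirecrawlApp

# Initialize with your API key
app = FirecrawlApp(api_key='your-api-key')

# Scrape a single page
# (return types differ between SDK versions: older releases return dicts, newer ones objects)
result = app.scrape_url('https://example.com')
print(result['markdown'])

# Crawl a website, capping the crawl at 100 pages
crawl_result = app.crawl_url('https://example.com', {'limit': 100})

# The crawl response is a dict; the crawled pages are in its 'data' list
for page in crawl_result['data']:
    print(f"URL: {page['metadata'].get('sourceURL')}")
    print(f"Content: {page['markdown'][:200]}...")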
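Because the SDK's return types have changed between releases (older versions return plain dicts, newer ones return objects with attributes), code that must work across versions can read fields defensively. The helper below is a hypothetical sketch, not part of the SDK; it assumes crawled pages are listed under data and each page carries a markdown field.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def _field(obj: Any, name: str) -> Any:
    """Read `name` from a dict-style or attribute-style response object."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def crawled_markdown(crawl_result: Any) -> List[Optional[str]]:
    """Return the Markdown of every crawled page (assumed to live under `data`)."""
    pages = _field(crawl_result, "data") or []
    return [_field(page, "markdown") for page in pages]


if __name__ == "__main__":
    # Offline demo with fake responses in both styles
    as_dict = {"data": [{"markdown": "# Home"}, {"markdown": "# About"}]}
    as_object = SimpleNamespace(data=[SimpleNamespace(markdown="# Pricing")])
    print(crawled_markdown(as_dict))    # ['# Home', '# About']
    print(crawled_markdown(as_object))  # ['# Pricing']
```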

Installing the JavaScript/Node.js SDK

For the JavaScript SDK:

# Navigate to the JS SDK directory
cd apps/js-sdk/firecrawl

# Install dependencies
npm install

# Build the package
npm run build

Then use it in your JavaScript/Node.js code:

// Note: top-level await requires an ES module (e.g. a .mjs file or "type": "module")
import FirecrawlApp from '@mendable/firecrawl-js';

// Initialize with your API key
const app = new FirecrawlApp({ apiKey: 'your-api-key' });

// Scrape a single page
const scrapeResult = await app.scrapeUrl('https://example.com');
console.log(scrapeResult.markdown);

// Crawl a website
const crawlResult = await app.crawlUrl('https://example.com', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown', 'html']
  }
});

console.log(crawlResult);
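Under the hood, a crawl is an asynchronous job: the client submits it, then polls its status until it finishes, which is what crawlUrl waits for. The Python sketch below shows that flow without an SDK. It assumes the v1 endpoints POST /v1/crawl (returning a job id) and GET /v1/crawl/{id} (returning a status such as "scraping", "completed", or "failed") on a self-hosted instance at localhost:3002; the function names are hypothetical, and the HTTP call is injected so the polling logic can be exercised offline. Large crawls may return results in pages via a next link, which this sketch does not follow.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

# (method, url, JSON body or None) -> decoded JSON response
HttpJson = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def urllib_http_json(method: str, url: str, body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Default transport: send JSON with the standard library and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl_and_wait(
    start_url: str,
    base_url: str = "http://localhost:3002",
    limit: int = 100,
    http_json: HttpJson = urllib_http_json,
    poll_interval: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a crawl job, then poll its status until it completes or fails."""
    job = http_json("POST", f"{base_url}/v1/crawl", {
        "url": start_url,
        "limit": limit,
        "scrapeOptions": {"formats": ["markdown", "html"]},
    })
    job_id = job["id"]  # assumed field name for the job identifier

    for _ in range(max_polls):
        status = http_json("GET", f"{base_url}/v1/crawl/{job_id}", None)
        state = status.get("status")
        if state == "completed":
            return status  # assumed to hold crawled pages under "data"
        if state in ("failed", "cancelled"):
            raise RuntimeError(f"crawl {job_id} ended with status {state!r}")
        sleep(poll_interval)

    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")
```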

Working with the Repository

Keeping Your Local Copy Updated

To keep your local repository synchronized with the latest changes:

# Fetch the latest changes
git fetch origin

# Pull and merge changes
git pull origin main

# Or rebase your changes
git pull --rebase origin main

Creating a Fork

To contribute to Firecrawl, you'll typically want to fork the repository:

  1. Visit https://github.com/mendableai/firecrawl
  2. Click the "Fork" button in the top-right corner
  3. Clone your fork and add the original repository as an upstream remote:

git clone https://github.com/YOUR-USERNAME/firecrawl.git
cd firecrawl

# Add upstream remote
git remote add upstream https://github.com/mendableai/firecrawl.git
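Once the upstream remote is configured, keeping your fork current is the same short sequence of Git commands each time. The sketch below scripts that sequence with Python's subprocess module; the function names are hypothetical, and it assumes your fork is the origin remote and your working tree is clean. The runner is injectable so the sequence can be checked without touching a real repository.

```python
import subprocess
from typing import Callable, List


def fork_sync_commands(branch: str = "main") -> List[List[str]]:
    """Git commands that bring `branch` of your fork up to date with upstream."""
    return [
        ["git", "fetch", "upstream"],                          # download upstream changes
        ["git", "checkout", branch],                           # switch to the local branch
        ["git", "merge", "--ff-only", f"upstream/{branch}"],   # fast-forward only, no merge commit
        ["git", "push", "origin", branch],                     # update your fork on GitHub
    ]


def sync_fork(branch: str = "main",
              runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> None:
    """Run each command in order; check=True makes the first failure raise."""
    for cmd in fork_sync_commands(branch):
        runner(cmd, check=True)
```

Using --ff-only makes the merge fail instead of silently creating a merge commit when your local main has diverged from upstream, which usually means work was committed to main by mistake.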

Creating a Branch for Development

When working on new features or fixes:

# Create and switch to a new branch
git checkout -b feature/my-new-feature

# Make your changes, then stage them
git add .

# Commit your changes
git commit -m "Add new feature description"

# Push to your fork
git push origin feature/my-new-feature

Exploring the Documentation

Firecrawl's documentation is split between guides in the repository and the online documentation site:

  • API Reference - Detailed API endpoint documentation
  • SDK Guides - Language-specific implementation guides; each SDK under apps/ also has its own README
  • Self-Hosting - SELF_HOST.md explains how to deploy Firecrawl on your own infrastructure
  • Contributing - CONTRIBUTING.md covers guidelines for contributing to the project

The complete online documentation is available at https://docs.firecrawl.dev

Using GitHub Features

Issues and Bug Reports

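If you encounter bugs or have feature requests, search the existing issues first, then open a new issue with detailed information (steps to reproduce, expected and actual behavior, and the version you are using):

# Search existing issues in the browser
# Visit: https://github.com/mendableai/firecrawl/issues

# Or search and create issues from the terminal with the GitHub CLI
gh issue list --repo mendableai/firecrawl --search "crawl timeout"
gh issue create --repo mendableai/firecrawl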
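To check for duplicates from a script, GitHub's REST search API accepts the same qualifiers as the web search (repo:, is:issue). The sketch below is hypothetical glue code using only the standard library; unauthenticated requests are subject to GitHub's low rate limits for the search API.

```python
import json
import urllib.request
from typing import List
from urllib.parse import urlencode

SEARCH_API = "https://api.github.com/search/issues"


def issue_search_url(keywords: str, repo: str = "mendableai/firecrawl") -> str:
    """Build a GitHub issue-search URL limited to one repository."""
    query = f"repo:{repo} is:issue {keywords}"
    return f"{SEARCH_API}?{urlencode({'q': query})}"


def search_issue_titles(keywords: str, repo: str = "mendableai/firecrawl") -> List[str]:
    """Fetch matching issues and return '#number title' strings (makes a network call)."""
    req = urllib.request.Request(issue_search_url(keywords, repo),
                                 headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        items = json.loads(resp.read().decode("utf-8"))["items"]
    return [f"#{item['number']} {item['title']}" for item in items]
```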

Pull Requests

To contribute code:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request with a clear description

Discussions

For questions and community support, use GitHub Discussions:

https://github.com/mendableai/firecrawl/discussions

Integrating Firecrawl with Browser Automation

Firecrawl can be combined with browser automation tools when dynamic content needs extra handling. For JavaScript-heavy websites, you might pair it with tools that handle AJAX requests using Puppeteer or that crawl single page applications.

For scenarios where you need to interact with dynamic elements before scraping, understanding how to inject JavaScript into a page using Puppeteer can complement Firecrawl's capabilities.

Repository Statistics and Activity

The Firecrawl repository is actively maintained with:

  • Regular commits and updates
  • Active issue tracking and resolution
  • Community contributions via pull requests
  • Comprehensive CI/CD workflows
  • Multi-language SDK support

You can view the repository's activity, stars, forks, and watchers directly on the GitHub page to gauge community engagement and project health.

Checking Repository Releases

To use stable versions of Firecrawl:

# List all tags/releases
git tag -l

# Check out a specific version (replace v1.0.0 with a tag from the list;
# this leaves you in a "detached HEAD" state)
git checkout tags/v1.0.0

# View releases on GitHub
# Visit: https://github.com/mendableai/firecrawl/releases
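To script "check out the newest stable release", the hypothetical helper below picks the highest version from git tag -l output. It assumes tags follow a vMAJOR.MINOR.PATCH pattern and treats anything else (such as release candidates) as non-stable; verify the repository's actual tag naming before relying on it.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed tag format: optional "v" prefix followed by MAJOR.MINOR.PATCH, nothing else
_STABLE_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_stable_tag(tags: Iterable[str]) -> Optional[str]:
    """Return the highest stable version tag, or None if there is none."""
    best: Optional[Tuple[Tuple[int, ...], str]] = None
    for raw in tags:
        tag = raw.strip()
        match = _STABLE_TAG.match(tag)
        if match is None:
            continue  # skip pre-releases such as v2.0.0-rc1 and non-version tags
        # Compare numerically so v1.10.0 ranks above v1.9.3
        version = tuple(int(part) for part in match.groups())
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best is not None else None
```

Inside your clone, feed it subprocess.run(["git", "tag", "-l"], capture_output=True, text=True).stdout.splitlines(). Comparing integer tuples avoids the string-sorting trap where "v1.10.0" sorts before "v1.9.3".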

Alternative: Using NPM or PyPI Packages

If you don't need the full repository and just want to use Firecrawl in your projects:

Python (PyPI)

pip install firecrawl-py

JavaScript/Node.js (NPM)

npm install @mendable/firecrawl-js

These packages are built from the GitHub repository and published to their respective package managers.

Troubleshooting Common Issues

Clone Failures

If cloning fails due to network issues:

# Use a shallow clone to download less data
git clone --depth 1 https://github.com/mendableai/firecrawl.git

# Or switch protocols, e.g. clone over SSH instead of HTTPS
git clone git@github.com:mendableai/firecrawl.git
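Transient network failures often succeed on a second attempt. A small retry loop with exponential backoff around subprocess.run automates that; the function name and default delays below are hypothetical choices, and the runner and sleep functions are injectable for testing.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    base_delay: float = 2.0,
    runner: Callable[[list], subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run `cmd`, retrying on a non-zero exit code with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 2s, 4s, 8s, ... with the defaults
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts (exit code {result.returncode})")
```

For example: run_with_retries(["git", "clone", "--depth", "1", "https://github.com/mendableai/firecrawl.git"]).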

Permission Issues

If you encounter permission errors:

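# The repository is public, so permission errors usually come from SSH authentication
# Test your SSH connection to GitHub
ssh -T git@github.com

# If you use the GitHub CLI, check its authentication status
gh auth status

# Or use HTTPS instead of SSH
git clone https://github.com/mendableai/firecrawl.git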
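If an existing clone was set up over SSH, you can switch it to HTTPS without re-cloning by rewriting the remote URL and applying it with git remote set-url origin <url>. The hypothetical helper below performs the rewrite for github.com SSH remotes of the form git@github.com:owner/repo.git.

```python
import re

# Matches SSH remotes such as git@github.com:owner/repo.git (".git" suffix optional)
_GITHUB_SSH = re.compile(r"^git@github\.com:(?P<path>[^/\s]+/[^/\s]+?)(?:\.git)?$")


def ssh_to_https(remote: str) -> str:
    """Rewrite a github.com SSH remote URL into its HTTPS equivalent."""
    match = _GITHUB_SSH.match(remote.strip())
    if match is None:
        raise ValueError(f"not a github.com SSH remote: {remote!r}")
    return f"https://github.com/{match.group('path')}.git"
```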

Large Repository Size

To manage repository size:

# Shallow clone with limited history
git clone --depth 50 https://github.com/mendableai/firecrawl.git

# Clone specific branch only
git clone -b main --single-branch https://github.com/mendableai/firecrawl.git

Best Practices

When working with the Firecrawl repository:

  1. Star and Watch the Repository - Starring shows support; watching subscribes you to notifications about releases and activity
  2. Read CONTRIBUTING.md - Follow contribution guidelines before submitting PRs
  3. Check Issues First - Avoid duplicate bug reports by searching existing issues
  4. Keep Your Fork Updated - Regularly sync with the upstream repository
  5. Use Release Versions - For production use, stick to tagged releases rather than the main branch
  6. Review the License - Understand AGPL-3.0 requirements before using Firecrawl in commercial projects, especially if you modify it and offer it as a network service

Conclusion

Accessing the Firecrawl GitHub repository is straightforward and provides you with complete access to the source code, documentation, and community resources. Whether you're looking to use Firecrawl in your projects, contribute to its development, or simply explore how it works, the repository at https://github.com/mendableai/firecrawl is your starting point.

By cloning the repository, you can run Firecrawl locally, customize it for your needs, contribute improvements, and stay up-to-date with the latest features and fixes. The active community and comprehensive documentation make it an excellent choice for developers looking for a powerful, open-source web scraping solution.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What%20is%20the%20main%20topic%3F&api_key=YOUR_API_KEY"

Extract structured data:

curl -g "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page%20title&fields[price]=Product%20price&api_key=YOUR_API_KEY"
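When calling these endpoints from code, let a URL-encoding routine handle the spaces, question marks, and square brackets in the parameters. The Python sketch below builds the same two request URLs as the curl examples with urllib.parse.urlencode; the endpoint paths and parameter names are taken from those examples, and the helper names are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

API_BASE = "https://api.webscraping.ai"


def ai_question_url(page_url: str, question: str, api_key: str) -> str:
    """URL for the /ai/question endpoint, with every parameter percent-encoded."""
    params = {"url": page_url, "question": question, "api_key": api_key}
    return f"{API_BASE}/ai/question?{urlencode(params)}"


def ai_fields_url(page_url: str, fields: Dict[str, str], api_key: str) -> str:
    """URL for /ai/fields; each field becomes a fields[name]=description pair."""
    params = [("url", page_url)]
    params += [(f"fields[{name}]", description) for name, description in fields.items()]
    params.append(("api_key", api_key))
    return f"{API_BASE}/ai/fields?{urlencode(params)}"
```

Pass the resulting URL to any HTTP client, for example urllib.request.urlopen.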
