How do I use Firecrawl to render HTML with JavaScript?
Firecrawl is designed to handle JavaScript-rendered websites out of the box, making it an excellent choice for scraping modern web applications, single-page applications (SPAs), and dynamic content. Unlike traditional web scrapers that only fetch static HTML, Firecrawl uses headless browser technology to execute JavaScript and wait for content to load before extracting data.
Understanding JavaScript Rendering in Firecrawl
Firecrawl automatically renders JavaScript by default when you use its API endpoints. This means that when you make a request to scrape a page, Firecrawl:
- Launches a headless browser (typically Chromium-based)
- Navigates to the target URL
- Executes all JavaScript code on the page
- Waits for dynamic content to load
- Returns the fully rendered HTML
This process is similar to handling AJAX requests with Puppeteer, but Firecrawl abstracts away the complexity of browser automation.
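For comparison, here is roughly what that workflow looks like when you manage the browser yourself — a minimal sketch using Playwright's sync API (this is the kind of code Firecrawl saves you from writing, not Firecrawl itself):
# A rough do-it-yourself equivalent of Firecrawl's rendering pipeline,
# sketched with Playwright (pip install playwright && playwright install chromium)
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # launch a headless browser
    page = browser.new_page()
    # Navigate and wait for network activity to settle so JS-loaded content renders
    page.goto('https://example.com/spa-application', wait_until='networkidle')
    html = page.content()  # the fully rendered HTML
    browser.close()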
Basic JavaScript Rendering with Firecrawl
Using Python
Here's how to scrape a JavaScript-rendered page using Firecrawl's Python SDK:
from firecrawl import FirecrawlApp
# Initialize the Firecrawl client
app = FirecrawlApp(api_key='your_api_key_here')
# Scrape a JavaScript-heavy website
result = app.scrape_url('https://example.com/spa-application')
# Access the fully rendered HTML
html_content = result['html']
# Access the markdown version (cleaned and formatted)
markdown_content = result['markdown']
# Access extracted metadata
metadata = result['metadata']
print(f"Title: {metadata['title']}")
print(f"Description: {metadata['description']}")
Using Node.js
With the Node.js SDK, the process is equally straightforward:
import FirecrawlApp from '@mendable/firecrawl-js';
// Initialize the Firecrawl client
const app = new FirecrawlApp({ apiKey: 'your_api_key_here' });
// Scrape a JavaScript-rendered page
async function scrapePage() {
  try {
    const result = await app.scrapeUrl('https://example.com/spa-application');

    // Access the fully rendered HTML
    console.log('HTML:', result.html);

    // Access the markdown version
    console.log('Markdown:', result.markdown);

    // Access metadata
    console.log('Title:', result.metadata.title);
    console.log('Description:', result.metadata.description);
  } catch (error) {
    console.error('Error scraping page:', error);
  }
}

scrapePage();
Advanced JavaScript Rendering Options
Wait for Dynamic Content
Sometimes you need to give a page extra time to finish rendering before it is scraped. Firecrawl provides the waitFor parameter, which delays extraction by a fixed number of milliseconds:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key='your_api_key_here')
# Wait 5 seconds for dynamic content to load before scraping
result = app.scrape_url(
    'https://example.com/dynamic-content',
    params={
        'waitFor': 5000  # Delay in milliseconds
    }
)
This is particularly useful when dealing with lazy-loaded content or animations, similar to using the waitFor function in Puppeteer.
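If you don't know in advance how long a page needs, one pragmatic pattern is to retry with a progressively longer delay until an expected marker appears in the output. This is a hypothetical helper built only on scrape_url as used above; the marker string is an assumption about your target page:
# Hypothetical helper: retry with longer waitFor values until a known
# piece of content shows up in the rendered output
def scrape_until_loaded(app, url, marker, delays_ms=(1000, 3000, 8000)):
    for delay in delays_ms:
        result = app.scrape_url(url, params={'waitFor': delay})
        if marker in result.get('markdown', ''):
            return result  # content loaded
    return result  # best effort after the longest wait

result = scrape_until_loaded(app, 'https://example.com/dynamic-content', 'Add to cart')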
Using Direct API Calls
If you prefer to use the REST API directly without an SDK:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "url": "https://example.com/spa-application",
    "formats": ["html", "markdown"],
    "waitFor": 3000
  }'
The response will include the fully rendered HTML:
{
  "success": true,
  "data": {
    "html": "<html>...</html>",
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "Page description",
      "language": "en",
      "sourceURL": "https://example.com/spa-application"
    }
  }
}
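The same call works from Python without the SDK, using the requests library; this is a direct translation of the curl command above:
# Call the scrape endpoint directly and unpack the response shown above
import requests

response = requests.post(
    'https://api.firecrawl.dev/v1/scrape',
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    json={
        'url': 'https://example.com/spa-application',
        'formats': ['html', 'markdown'],
        'waitFor': 3000,
    },
)
payload = response.json()
if payload.get('success'):
    print(payload['data']['metadata']['title'])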
Handling Different Types of JavaScript Content
Single Page Applications (SPAs)
SPAs built with React, Vue, Angular, or similar frameworks require JavaScript execution to render content. Firecrawl handles these automatically:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key='your_api_key_here')
# Scrape a React application
react_app = app.scrape_url('https://example.com/react-app')
# Scrape a Vue.js application
vue_app = app.scrape_url('https://example.com/vue-app')
# Scrape an Angular application
angular_app = app.scrape_url('https://example.com/angular-app')
# All will return fully rendered HTML with JavaScript executed
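To see the difference JavaScript execution makes, you can compare the raw server response with Firecrawl's rendered output. A quick sanity check, assuming requests is installed and the target page is client-side rendered:
# Compare the unrendered server HTML with Firecrawl's rendered HTML.
# On a client-side-rendered SPA, the raw response is often just an
# empty shell (e.g. <div id="root"></div>) plus script tags.
import requests

url = 'https://example.com/react-app'
raw_html = requests.get(url).text
rendered_html = app.scrape_url(url)['html']

print(f"Raw HTML length:      {len(raw_html)}")
print(f"Rendered HTML length: {len(rendered_html)}")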
Lazy-Loaded Content
For pages that load content as users scroll or interact:
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: 'your_api_key_here' });
async function scrapeLazyContent() {
  const result = await app.scrapeUrl('https://example.com/lazy-load', {
    waitFor: 5000, // Give extra time for lazy loading
    formats: ['html', 'markdown']
  });
  return result;
}
Infinite Scroll Pages
While Firecrawl renders JavaScript, it doesn't automatically trigger infinite scroll. For such cases, you might need to use the crawl endpoint with pagination:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key='your_api_key_here')
# Crawl multiple pages with JavaScript rendering
crawl_result = app.crawl_url(
    'https://example.com/infinite-scroll',
    params={
        'limit': 50,  # Maximum pages to crawl
        'scrapeOptions': {
            'waitFor': 3000,
            'formats': ['html']  # Request HTML for each crawled page
        }
    }
)

# Process each crawled page
for page in crawl_result['data']:
    print(f"URL: {page['metadata']['sourceURL']}")
    print(f"HTML Length: {len(page['html'])}")
Extracting Data from JavaScript-Rendered Pages
Using Schema-Based Extraction
Firecrawl can extract structured data from JavaScript-rendered pages: define a JSON schema and request the extract format alongside your scrape:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key='your_api_key_here')
# Define the schema for data extraction
schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "availability": {"type": "string"},
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "rating": {"type": "number"},
                    "comment": {"type": "string"}
                }
            }
        }
    },
    "required": ["product_name", "price"]
}
# Extract structured data from a JavaScript-heavy e-commerce page
result = app.scrape_url(
    'https://example.com/product/123',
    params={
        'formats': ['extract'],
        'extract': {'schema': schema}
    }
)

print(result['extract'])
Using the Map Endpoint for Site Discovery
Before scraping, you can map out a website's structure:
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: 'your_api_key_here' });
async function mapWebsite() {
  const result = await app.mapUrl('https://example.com');

  // Get all discovered URLs
  console.log('Discovered URLs:', result.links);

  // Now scrape each URL with JavaScript rendering
  for (const url of result.links) {
    const pageData = await app.scrapeUrl(url, {
      formats: ['html', 'markdown']
    });
    console.log(`Scraped: ${url}`);
  }
}
mapWebsite();
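The same discovery-then-scrape flow in Python — a sketch assuming your version of the Python SDK exposes a map_url method mirroring the Node.js mapUrl used above (check your SDK version):
# Map the site, then scrape each discovered URL with JavaScript rendering.
# Assumes map_url exists in the Python SDK and returns a 'links' list.
map_result = app.map_url('https://example.com')

for url in map_result['links']:
    page = app.scrape_url(url, params={'formats': ['markdown']})
    print(f"Scraped: {url}")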
Best Practices for JavaScript Rendering
1. Optimize Wait Times
Don't set unnecessarily long wait times. Monitor your target pages and adjust:
# Too long - wastes time and credits
result = app.scrape_url(url, params={'waitFor': 30000})
# Better - appropriate for most JavaScript apps
result = app.scrape_url(url, params={'waitFor': 3000})
# Best - no wait if page loads quickly
result = app.scrape_url(url) # Default behavior is usually sufficient
2. Handle Errors Gracefully
JavaScript rendering can fail for various reasons. Always implement error handling:
from firecrawl import FirecrawlApp
import logging
app = FirecrawlApp(api_key='your_api_key_here')
def safe_scrape(url):
    try:
        # The SDK raises an exception when a request fails, so a
        # successful call returns the page data directly
        return app.scrape_url(url, params={'waitFor': 5000})
    except Exception as e:
        logging.error(f"Exception while scraping {url}: {str(e)}")
        return None
# Use the safe scraper
data = safe_scrape('https://example.com/spa-page')
if data:
    print(f"Successfully scraped: {data['metadata']['title']}")
3. Respect Rate Limits
Firecrawl has rate limits to ensure service quality. Implement proper throttling:
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: 'your_api_key_here' });
async function scrapeWithRateLimit(urls, delayMs = 1000) {
  const results = [];

  for (const [index, url] of urls.entries()) {
    try {
      const result = await app.scrapeUrl(url);
      results.push(result);

      // Wait before the next request
      if (index < urls.length - 1) {
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    } catch (error) {
      console.error(`Error scraping ${url}:`, error);
    }
  }

  return results;
}
const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];
scrapeWithRateLimit(urls, 2000); // 2 second delay between requests
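If a request is rejected despite throttling, backing off and retrying usually resolves it. A minimal Python sketch, assuming the SDK raises an exception on a rejected request (the retry policy here is illustrative, not part of Firecrawl):
# Hypothetical retry helper with exponential backoff.
# Assumes the SDK raises an exception when a request is rejected.
import time

def scrape_with_backoff(app, url, max_retries=3):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return app.scrape_url(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2  # double the wait each retry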
4. Choose the Right Output Format
Firecrawl supports multiple output formats. Choose based on your needs:
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key='your_api_key_here')
# Get HTML for detailed DOM manipulation
html_result = app.scrape_url(url, params={'formats': ['html']})
# Get Markdown for cleaner text extraction
markdown_result = app.scrape_url(url, params={'formats': ['markdown']})
# Get both formats
both_formats = app.scrape_url(url, params={'formats': ['html', 'markdown']})
Troubleshooting JavaScript Rendering
Page Not Fully Loaded
If content is missing from the output, increase the wait time:
# Increase wait time
result = app.scrape_url(url, params={'waitFor': 10000})
# Or fetch the page through the crawl endpoint, limited to one page
result = app.crawl_url(
    url,
    params={
        'limit': 1,
        'scrapeOptions': {'waitFor': 5000}
    }
)
Performance Issues
If scraping is slow:
- Reduce the waitFor parameter
- Use the formats parameter to request only what you need
- Consider using the crawl endpoint for batch operations
# Optimized for performance
result = app.scrape_url(
    url,
    params={
        'formats': ['markdown'],  # Skip HTML if not needed
        'waitFor': 2000  # Minimal wait time
    }
)
Handling Dynamic URLs
For SPAs that use hash routing or query parameters, each route still requires JavaScript execution, since the server returns the same shell document for every hash fragment:
const urls = [
  'https://example.com/#/products/1',
  'https://example.com/#/products/2',
  'https://example.com/?page=1',
  'https://example.com/?page=2'
];

for (const url of urls) {
  const result = await app.scrapeUrl(url, {
    waitFor: 3000,
    formats: ['markdown']
  });
  console.log(`Scraped: ${url}`);
}
Comparison with Other Tools
Firecrawl's JavaScript rendering capability is similar to crawling single-page applications with Puppeteer, but it offers several advantages:
- No infrastructure management: No need to maintain headless browsers
- Built-in retries: Automatic retry logic for failed requests
- Scalability: Handles concurrent requests without managing browser instances
- Simplified API: Clean, consistent interface across all endpoints
Conclusion
Firecrawl makes JavaScript rendering simple and accessible through its API. Whether you're scraping modern SPAs, handling AJAX-loaded content, or extracting data from dynamic websites, Firecrawl's built-in browser automation handles the complexity for you.
Key takeaways:
- JavaScript rendering is enabled by default in Firecrawl
- Use the waitFor parameter for pages with delayed content loading
- Choose appropriate output formats (HTML, Markdown) based on your needs
- Implement error handling and respect rate limits
- Use schema-based extraction for structured data from JavaScript-heavy pages
By following these best practices, you can efficiently scrape JavaScript-rendered websites without the overhead of managing headless browsers or complex automation scripts.