Can Firecrawl Generate Screenshots of Web Pages?

Yes, Firecrawl can generate screenshots of web pages as part of its comprehensive scraping capabilities. When you scrape a URL with Firecrawl, you can request multiple output formats including screenshots alongside markdown, HTML, and other content formats. This feature is particularly useful when you need visual representations of pages for documentation, monitoring, testing, or AI-powered visual analysis.

Firecrawl handles all the complexity of browser automation, JavaScript rendering, and image capture behind the scenes, making screenshot generation as simple as adding a format parameter to your scraping request.
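Under the hood, each SDK call boils down to a single HTTP request. As a rough sketch (the endpoint path and payload shape here assume Firecrawl's v1 REST API; check the current API reference before relying on them), the request the SDKs build looks like this:

```python
import json
import os

# Hypothetical sketch of the raw HTTP request the SDKs wrap.
# The endpoint and payload shape assume Firecrawl's v1 REST API
# (POST /v1/scrape with a JSON body); verify against the API docs.
FIRECRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"

def build_scrape_request(url, formats, api_key):
    """Return the headers and JSON body for a scrape request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": url, "formats": formats})
    return headers, body

headers, body = build_scrape_request(
    "https://example.com",
    ["screenshot", "markdown"],
    os.getenv("FIRECRAWL_API_KEY", "test-key"),
)
print(json.loads(body)["formats"])  # ['screenshot', 'markdown']
```

Sending this payload (for example with `requests.post`) is exactly what the Python and JavaScript SDKs do for you, along with retries and response parsing.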

How Firecrawl Screenshots Work

Firecrawl's screenshot functionality leverages headless browser technology to render web pages exactly as they appear to users, including dynamic content loaded by JavaScript. The screenshots are returned as base64-encoded PNG images that you can save, display, or process as needed.

Unlike basic HTTP requests that only capture static HTML, Firecrawl waits for JavaScript to execute and the page to fully render before capturing the screenshot, ensuring you get an accurate visual representation of the live page.
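Because the screenshot arrives as a base64 string, a quick sanity check after decoding is to inspect the PNG magic bytes. A minimal sketch (the `screenshot_b64` value here is a fabricated stand-in for the string Firecrawl returns):

```python
import base64

# Every valid PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def is_png(screenshot_b64):
    """Decode a base64 string and check it starts with the PNG signature."""
    data = base64.b64decode(screenshot_b64)
    return data[:8] == PNG_SIGNATURE

# Fake payload for illustration: a PNG header plus dummy bytes
fake = base64.b64encode(PNG_SIGNATURE + b"\x00" * 16).decode()
print(is_png(fake))  # True
```

A check like this is a cheap guard before writing the decoded bytes to disk or passing them to an image library.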

Generating Screenshots with Python

The Firecrawl Python SDK makes it straightforward to capture screenshots. Here's a basic example:

from firecrawl import FirecrawlApp
import os

# Initialize the Firecrawl client
app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

# Scrape a page and request a screenshot
result = app.scrape_url(
    url='https://example.com',
    params={
        'formats': ['screenshot', 'markdown']
    }
)

# The screenshot is returned as base64-encoded data
screenshot_base64 = result['screenshot']
print(f"Screenshot captured: {len(screenshot_base64)} base64 characters")

Saving Screenshots to Files

To save the screenshot as an image file:

import base64
from firecrawl import FirecrawlApp
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

# Capture screenshot
result = app.scrape_url(
    url='https://example.com',
    params={'formats': ['screenshot']}
)

# Decode and save the screenshot
screenshot_data = base64.b64decode(result['screenshot'])
with open('screenshot.png', 'wb') as f:
    f.write(screenshot_data)

print("Screenshot saved as screenshot.png")

Capturing Screenshots with Custom Wait Times

For pages that take time to load content, you can specify a wait time before capturing the screenshot:

from firecrawl import FirecrawlApp
import base64
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

# Wait for JavaScript to load before screenshot
result = app.scrape_url(
    url='https://example.com',
    params={
        'formats': ['screenshot', 'markdown'],
        'waitFor': 3000,  # Wait 3 seconds for dynamic content
    }
)

# Save screenshot
screenshot_data = base64.b64decode(result['screenshot'])
with open('screenshot_delayed.png', 'wb') as f:
    f.write(screenshot_data)

This approach is similar to using Puppeteer's waitFor helpers, but without having to manage the browser yourself.

Generating Screenshots with JavaScript/Node.js

The Firecrawl JavaScript SDK provides identical screenshot functionality for Node.js applications:

import FirecrawlApp from '@mendable/firecrawl-js';
import fs from 'fs';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

async function captureScreenshot() {
  const scrapeResult = await app.scrapeUrl('https://example.com', {
    formats: ['screenshot', 'markdown']
  });

  console.log('Screenshot captured successfully');
  console.log('Screenshot size:', scrapeResult.screenshot.length, 'base64 characters');

  return scrapeResult.screenshot;
}

captureScreenshot();

Saving Screenshots in Node.js

To save the base64 screenshot to a file:

import FirecrawlApp from '@mendable/firecrawl-js';
import fs from 'fs';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

async function saveScreenshot(url, filename) {
  const result = await app.scrapeUrl(url, {
    formats: ['screenshot']
  });

  // Convert base64 to buffer and save
  const buffer = Buffer.from(result.screenshot, 'base64');
  fs.writeFileSync(filename, buffer);

  console.log(`Screenshot saved to ${filename}`);
}

saveScreenshot('https://example.com', 'example-screenshot.png');

Batch Screenshot Generation

Capture screenshots of multiple pages concurrently:

import FirecrawlApp from '@mendable/firecrawl-js';
import fs from 'fs';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

async function captureMultipleScreenshots(urls) {
  const promises = urls.map(async (url, index) => {
    const result = await app.scrapeUrl(url, {
      formats: ['screenshot'],
      waitFor: 2000
    });

    const buffer = Buffer.from(result.screenshot, 'base64');
    const filename = `screenshot-${index + 1}.png`;
    fs.writeFileSync(filename, buffer);

    return { url, filename };
  });

  const results = await Promise.all(promises);
  console.log('All screenshots captured:', results);
  return results;
}

const urls = [
  'https://example.com',
  'https://example.com/about',
  'https://example.com/contact'
];

captureMultipleScreenshots(urls);

Advanced Screenshot Options

Combining Screenshots with Other Data

You can request multiple output formats simultaneously to get both visual and textual representations:

from firecrawl import FirecrawlApp
import base64
import json
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

# Get screenshot, markdown, and HTML simultaneously
result = app.scrape_url(
    url='https://example.com/product',
    params={
        'formats': ['screenshot', 'markdown', 'html'],
        'onlyMainContent': True,
        'waitFor': 2000
    }
)

# Save screenshot
screenshot_data = base64.b64decode(result['screenshot'])
with open('page-screenshot.png', 'wb') as f:
    f.write(screenshot_data)

# Save markdown content
with open('page-content.md', 'w', encoding='utf-8') as f:
    f.write(result['markdown'])

# Save complete data as JSON
with open('page-data.json', 'w', encoding='utf-8') as f:
    # Exclude large screenshot from JSON
    data = {k: v for k, v in result.items() if k != 'screenshot'}
    json.dump(data, f, indent=2)

print("Screenshot and content saved successfully")

Screenshots During Crawling

When crawling multiple pages, you can capture screenshots of each discovered page:

from firecrawl import FirecrawlApp
import base64
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

# Crawl website with screenshots
crawl_result = app.crawl_url(
    url='https://example.com',
    params={
        'limit': 10,
        'scrapeOptions': {
            'formats': ['screenshot', 'markdown'],
            'waitFor': 1000
        }
    }
)

# Save screenshots from all crawled pages
for index, page in enumerate(crawl_result['data']):
    screenshot_data = base64.b64decode(page['screenshot'])
    url_slug = page['metadata']['sourceURL'].split('/')[-1] or 'home'
    filename = f'screenshots/{url_slug}-{index}.png'

    os.makedirs('screenshots', exist_ok=True)
    with open(filename, 'wb') as f:
        f.write(screenshot_data)

    print(f"Saved: {filename}")

Use Cases for Firecrawl Screenshots

1. Visual Regression Testing

Monitor visual changes across page updates:

from firecrawl import FirecrawlApp
import base64
import os
from datetime import datetime

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

def capture_baseline_screenshot(url, directory='baselines'):
    result = app.scrape_url(url, params={'formats': ['screenshot']})

    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    filename = f'{directory}/screenshot_{timestamp}.png'

    os.makedirs(directory, exist_ok=True)
    screenshot_data = base64.b64decode(result['screenshot'])
    with open(filename, 'wb') as f:
        f.write(screenshot_data)

    return filename

# Capture baseline
baseline = capture_baseline_screenshot('https://example.com')
print(f"Baseline saved: {baseline}")

2. Documentation Generation

Create visual documentation of web applications:

import FirecrawlApp from '@mendable/firecrawl-js';
import fs from 'fs';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

async function generateDocumentation(pages) {
  const documentation = [];

  // Ensure the output directory exists before writing screenshots
  fs.mkdirSync('docs/images', { recursive: true });
  for (const page of pages) {
    const result = await app.scrapeUrl(page.url, {
      formats: ['screenshot', 'markdown'],
      waitFor: 2000
    });

    // Save screenshot
    const buffer = Buffer.from(result.screenshot, 'base64');
    const screenshotPath = `docs/images/${page.name}.png`;
    fs.writeFileSync(screenshotPath, buffer);

    // Create documentation entry
    documentation.push({
      name: page.name,
      url: page.url,
      screenshot: screenshotPath,
      content: result.markdown
    });
  }

  return documentation;
}

const pages = [
  { name: 'homepage', url: 'https://example.com' },
  { name: 'features', url: 'https://example.com/features' },
  { name: 'pricing', url: 'https://example.com/pricing' }
];

generateDocumentation(pages).then(docs => {
  console.log('Documentation generated:', docs.length, 'pages');
});

3. Monitoring and Alerts

Monitor website appearance and detect unexpected changes:

from firecrawl import FirecrawlApp
import base64
import hashlib
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

def get_screenshot_hash(url):
    """Capture screenshot and return its hash"""
    result = app.scrape_url(url, params={'formats': ['screenshot']})
    screenshot_data = base64.b64decode(result['screenshot'])
    return hashlib.md5(screenshot_data).hexdigest()

def monitor_page_changes(url, previous_hash=None):
    """Monitor for visual changes"""
    current_hash = get_screenshot_hash(url)

    if previous_hash and current_hash != previous_hash:
        print(f"⚠️  Page has changed: {url}")
        return True
    else:
        print(f"✓ Page unchanged: {url}")
        return False

# Example usage
baseline_hash = get_screenshot_hash('https://example.com')
print(f"Baseline hash: {baseline_hash}")

# Later, check for changes
has_changed = monitor_page_changes('https://example.com', baseline_hash)

Handling Errors and Edge Cases

When working with screenshots, implement proper error handling:

from firecrawl import FirecrawlApp
import base64
import os

app = FirecrawlApp(api_key=os.getenv('FIRECRAWL_API_KEY'))

def safe_screenshot_capture(url, output_path, max_retries=3):
    """Capture screenshot with retry logic"""
    for attempt in range(max_retries):
        try:
            result = app.scrape_url(
                url,
                params={
                    'formats': ['screenshot'],
                    'waitFor': 2000,
                    'timeout': 30000
                }
            )

            if 'screenshot' not in result:
                raise ValueError("Screenshot not generated")

            screenshot_data = base64.b64decode(result['screenshot'])
            with open(output_path, 'wb') as f:
                f.write(screenshot_data)

            print(f"✓ Screenshot saved: {output_path}")
            return True

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                print(f"✗ Failed to capture screenshot after {max_retries} attempts")
                return False

# Usage
safe_screenshot_capture('https://example.com', 'output.png')

Performance Considerations

Optimize Screenshot Requests

Screenshots are larger than text data, so consider these optimization strategies:

import FirecrawlApp from '@mendable/firecrawl-js';
import fs from 'fs';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

async function optimizedScreenshots(urls, concurrency = 3) {
  // Process in batches to avoid overwhelming the API
  const results = [];

  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);

    const batchResults = await Promise.all(
      batch.map(url =>
        app.scrapeUrl(url, {
          formats: ['screenshot'],
          waitFor: 1000,
          timeout: 30000
        })
      )
    );

    results.push(...batchResults);
    console.log(`Processed batch ${Math.floor(i / concurrency) + 1}`);
  }

  return results;
}

// Process URLs in controlled batches
const urls = ['https://example.com/1', 'https://example.com/2'];
optimizedScreenshots(urls).then(results => {
  console.log(`Captured ${results.length} screenshots`);
});

Comparison with Other Screenshot Tools

Firecrawl's screenshot capability offers several advantages over alternatives:

| Feature | Firecrawl | Puppeteer | Selenium |
|---------|-----------|-----------|----------|
| Setup Complexity | Low (API-based) | Medium (requires browser) | High (driver management) |
| JavaScript Rendering | ✓ Automatic | ✓ Manual control | ✓ Manual control |
| Infrastructure | Managed | Self-hosted | Self-hosted |
| Proxy Handling | Built-in | Manual | Manual |
| Scaling | Automatic | Manual | Manual |

While tools like Puppeteer for SEO auditing offer more granular control, Firecrawl simplifies the process by handling infrastructure, anti-bot detection, and browser management automatically.

Best Practices

  1. Request Only What You Need - Only include 'screenshot' in formats when you actually need the image to minimize response size and processing time

  2. Use Appropriate Wait Times - Set waitFor values based on page complexity to ensure content is fully loaded before screenshot capture

  3. Implement Retry Logic - Network issues or timeouts can occur; always implement retry mechanisms for production use

  4. Store Screenshots Efficiently - Consider compressing or converting screenshots if storage is a concern

  5. Monitor API Usage - Screenshots consume more resources; track usage to stay within plan limits

  6. Handle Large Batches Carefully - When capturing many screenshots, process them in batches to avoid timeout issues

  7. Cache When Possible - If you need the same screenshot multiple times, cache it rather than requesting it repeatedly
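The caching advice above can be sketched with a simple file-based cache. This is a hypothetical helper, not part of the SDK; `capture` stands in for any callable that returns base64 screenshot data (for example, a small wrapper around `app.scrape_url`):

```python
import base64
import hashlib
import os

def cached_screenshot(url, capture, cache_dir="screenshot_cache"):
    """Return screenshot bytes for `url`, capturing only on a cache miss.

    `capture(url)` is assumed to return base64-encoded screenshot data,
    e.g. a wrapper around app.scrape_url that extracts result['screenshot'].
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Derive a stable filename from the URL
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    path = os.path.join(cache_dir, f"{key}.png")

    if os.path.exists(path):
        # Cache hit: serve the saved file, no API call
        with open(path, "rb") as f:
            return f.read()

    # Cache miss: capture, decode, and store for next time
    data = base64.b64decode(capture(url))
    with open(path, "wb") as f:
        f.write(data)
    return data
```

Adding a timestamp check (re-capture if the cached file is older than some TTL) is a natural extension when pages change frequently.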

Conclusion

Firecrawl's screenshot generation capability provides a powerful, managed solution for capturing visual representations of web pages without the complexity of managing headless browsers. By simply including 'screenshot' in your format options, you get high-quality PNG images of fully rendered pages, complete with JavaScript execution and dynamic content.

Whether you're building visual regression testing tools, generating documentation, monitoring website changes, or creating datasets for AI applications, Firecrawl's screenshot feature offers a straightforward API that handles the complexity behind the scenes. Combined with its other capabilities like markdown conversion and structured data extraction, Firecrawl provides a comprehensive solution for modern web scraping needs.

For scenarios where you need more granular control over browser behavior, such as navigating to different pages with specific interactions, you might consider combining Firecrawl with dedicated browser automation tools. However, for most screenshot use cases, Firecrawl's managed approach offers the right balance of simplicity, reliability, and functionality.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"


