
How do I use Firecrawl to convert websites to Markdown format?

{result['markdown']} """

with open('output.md', 'w', encoding='utf-8') as f:
    f.write(output)
```

Error Handling and Troubleshooting

Implement robust error handling when converting pages:

from firecrawl import FirecrawlApp
# Adjust this import if your firecrawl-py version exposes its exception class elsewhere
from firecrawl.exceptions import FirecrawlError

app = FirecrawlApp(api_key='YOUR_API_KEY')

def safe_scrape_to_markdown(url):
    try:
        result = app.scrape_url(url, {
            'formats': ['markdown'],
            'timeout': 30000  # milliseconds
        })
        return result['markdown']
    except FirecrawlError as e:
        print(f"Firecrawl error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Use the function
markdown = safe_scrape_to_markdown('https://example.com')
if markdown:
    print("Successfully converted to Markdown")
else:
    print("Conversion failed")
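
Transient failures such as timeouts often succeed on a second attempt, so a wrapper like safe_scrape_to_markdown can be combined with a retry loop. The following is a minimal sketch, not part of the Firecrawl SDK: `scrape_with_retries`, its parameters, and the backoff policy are all hypothetical, and it only assumes the wrapped function returns None on failure.

```python
import time


def scrape_with_retries(scrape, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call scrape(url) until it returns a non-None value or attempts run out.

    scrape is any callable that returns Markdown text, or None on failure
    (for example safe_scrape_to_markdown). The retry policy is illustrative.
    """
    for attempt in range(attempts):
        markdown = scrape(url)
        if markdown is not None:
            return markdown
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

Calling `scrape_with_retries(safe_scrape_to_markdown, 'https://example.com')` would try up to three times, pausing between attempts. The `sleep` parameter exists only so the delay can be replaced in tests.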

Use Cases for Markdown Conversion

1. Documentation Archival

Convert documentation sites to Markdown for offline storage or version control:

# Using cURL to archive documentation (one URL per line in urls.txt)
while IFS= read -r url; do
  curl -X POST https://api.firecrawl.dev/v1/scrape \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\":\"$url\",\"formats\":[\"markdown\"]}" \
    > "${url//[^A-Za-z0-9._-]/_}.json"
done < urls.txt
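
The loop above saves the raw API response for each URL, not the Markdown itself. Here is a minimal Python sketch for pulling the Markdown out of those files, assuming each one holds a v1 /scrape response shaped like `{"success": true, "data": {"markdown": "..."}}`. The helper names `markdown_from_response` and `convert_saved_responses` are hypothetical and not part of Firecrawl.

```python
import json
from pathlib import Path


def markdown_from_response(payload):
    """Return the Markdown string from a decoded scrape response, or None.

    Assumes the shape {"success": true, "data": {"markdown": "..."}}.
    """
    if not isinstance(payload, dict) or not payload.get("success"):
        return None
    data = payload.get("data")
    if not isinstance(data, dict):
        return None
    return data.get("markdown")


def convert_saved_responses(directory):
    """Write a .md file next to every .json response that contains Markdown."""
    written = []
    for json_path in sorted(Path(directory).glob("*.json")):
        try:
            payload = json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # Skip files that are not valid JSON, e.g. an HTML error page
        markdown = markdown_from_response(payload)
        if markdown is None:
            continue  # Skip failed scrapes and responses without Markdown
        md_path = json_path.with_suffix(".md")
        md_path.write_text(markdown, encoding="utf-8")
        written.append(md_path.name)
    return written
```

Running `convert_saved_responses('.')` in the directory that holds the saved responses would leave one `.md` file per successful scrape, ready to commit to version control.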

2. Content Migration

When migrating from one CMS to another, Markdown provides a clean intermediate format:

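# Migrate blog posts from old site to Markdown
# Assumes `app` is the FirecrawlApp instance created earlier
import os
import re

old_posts = [
    'https://oldblog.com/post-1',
    'https://oldblog.com/post-2'
]

# Make sure the output directory exists before writing
os.makedirs('migrated', exist_ok=True)

for url in old_posts:
    result = app.scrape_url(url, {'formats': ['markdown']})

    # Build a filesystem-safe filename from the page title
    title = result.get('metadata', {}).get('title') or 'untitled'
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-') or 'untitled'
    filename = slug + '.md'

    with open(f'migrated/{filename}', 'w', encoding='utf-8') as f:
        f.write(result['markdown'])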
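
Because the loop above derives each filename from the page title, two posts that share a title would overwrite each other, and a page without a title would have no usable name. One possible way to guard against both is sketched below. `slugify` and `unique_filename` are hypothetical helpers, and the ASCII-only slug rule is an assumption that may not suit non-English titles.

```python
import re


def slugify(title, fallback="untitled"):
    """Lowercase the title and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", (title or "").lower()).strip("-")
    # Titles with no ASCII letters or digits reduce to an empty slug
    return slug or fallback


def unique_filename(title, used):
    """Return a .md filename for title that is not already in the set used."""
    base = slugify(title)
    candidate = f"{base}.md"
    counter = 2
    while candidate in used:
        candidate = f"{base}-{counter}.md"
        counter += 1
    used.add(candidate)
    return candidate
```

In the migration loop, keeping one `used = set()` outside the loop and calling `unique_filename(title, used)` for each post would give every file a distinct name.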

3. LLM Training Data

Markdown works well for preparing web content for large language models because it keeps headings, lists, and links without the surrounding HTML markup. The workflow is similar to how you might process dynamic content with Puppeteer:

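// Assumes `app` is an initialized FirecrawlApp client from the Firecrawl JavaScript SDK
const fs = require('fs');

async function prepareTrainingData(urls) {
  const trainingData = [];

  for (const url of urls) {
    const result = await app.scrapeUrl(url, {
      formats: ['markdown'],
      onlyMainContent: true
    });

    // Skip pages that failed or returned no Markdown
    if (!result.markdown) {
      console.warn(`No Markdown returned for ${url}`);
      continue;
    }

    trainingData.push({
      source: url,
      content: result.markdown,
      title: result.metadata?.title ?? ''
    });
  }

  // Save as JSON for LLM training
  fs.writeFileSync(
    'training-data.json',
    JSON.stringify(trainingData, null, 2)
  );
}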
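
Whole pages are often too long to use as single training records, so a common follow-up step is to split each page's Markdown into sections. The sketch below is one illustrative approach in Python, not something Firecrawl provides: `split_by_heading` and `to_jsonl` are hypothetical helpers, and splitting at `#` headings and writing JSON Lines are assumptions about what a downstream pipeline wants.

```python
import json
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # Marker that opens and closes a fenced code block


def split_by_heading(markdown):
    """Split Markdown into (heading, body) pairs at # headings.

    Text before the first heading is returned under an empty heading.
    Lines inside fenced code blocks are never treated as headings.
    """
    sections = []
    heading, body, in_fence = "", [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous section unless it is completely empty
            if heading or any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def to_jsonl(records):
    """Serialize dicts as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

Each `(heading, body)` pair can be turned into a record alongside the source URL and passed to `to_jsonl`, which yields one record per line instead of a single large JSON array.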

Conclusion

Firecrawl's Markdown conversion turns web content into a clean, portable format. Calling the /scrape endpoint with the formats: ['markdown'] parameter converts an HTML page while preserving its structure and content. Whether you're archiving documentation, migrating content, or preparing data for LLMs, Firecrawl handles the HTML parsing and conversion, so you can focus on using the content rather than extracting it.

For more advanced scenarios involving dynamic content, consider exploring how to handle authentication or manage complex JavaScript-rendered pages before conversion.
