{result['markdown']} """
with open('output.md', 'w', encoding='utf-8') as f:
    f.write(output)
```
Error Handling and Troubleshooting
Implement robust error handling when converting pages:
from firecrawl import FirecrawlApp
# Verify that your installed SDK version exposes this exception class
from firecrawl.exceptions import FirecrawlError

app = FirecrawlApp(api_key='YOUR_API_KEY')

def safe_scrape_to_markdown(url):
    """Return the page as Markdown, or None if the scrape fails."""
    try:
        result = app.scrape_url(url, {
            'formats': ['markdown'],
            'timeout': 30000  # milliseconds
        })
        return result['markdown']
    except FirecrawlError as e:
        print(f"Firecrawl error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Use the function
markdown = safe_scrape_to_markdown('https://example.com')
if markdown:
    print("Successfully converted to Markdown")
else:
    print("Conversion failed")
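Transient failures such as timeouts often succeed on a second attempt, so a retry wrapper can complement the function above. The following is a minimal sketch, not part of the Firecrawl SDK: `retry_with_backoff` and its parameters are hypothetical names, and it assumes the wrapped function signals failure by returning `None`, as `safe_scrape_to_markdown` does. A stand-in function replaces the real scrape so the example runs without an API key.

```python
import time

def retry_with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it returns a non-None value or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        result = fn()
        if result is not None:
            return result
        # Only wait if another attempt will follow
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None


# Stand-in for safe_scrape_to_markdown that fails twice, then succeeds
calls = {"count": 0}

def flaky_scrape():
    calls["count"] += 1
    return "# Title" if calls["count"] >= 3 else None

# Record the delays instead of actually sleeping
delays = []
markdown = retry_with_backoff(flaky_scrape, attempts=4, base_delay=0.5, sleep=delays.append)
print(markdown)  # "# Title"
print(delays)    # [0.5, 1.0]
```

With the real function, the call would look like `retry_with_backoff(lambda: safe_scrape_to_markdown(url))`.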
Use Cases for Markdown Conversion
1. Documentation Archival
Convert documentation sites to Markdown for offline storage or version control:
# Using cURL to archive documentation (one URL per line in urls.txt)
while IFS= read -r url; do
  curl -s -X POST https://api.firecrawl.dev/v1/scrape \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\":\"$url\",\"formats\":[\"markdown\"]}" \
    > "${url//\//_}.json"
done < urls.txt
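The loop above saves raw JSON responses rather than Markdown files. A small post-processing step can pull the Markdown out of each one. This is a sketch under an assumption: it expects each response to look like `{"success": true, "data": {"markdown": "..."}}`, so check a saved file and adjust the keys if your responses differ. The function name `extract_markdown` and the directory names are illustrative.

```python
import json
from pathlib import Path

def extract_markdown(json_dir, out_dir):
    """Write a .md file for every saved response that contains Markdown."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON responses
        if not isinstance(payload, dict):
            continue
        # Assumed response shape: {"success": true, "data": {"markdown": "..."}}
        markdown = (payload.get("data") or {}).get("markdown")
        if not markdown:
            continue  # failed scrape or no Markdown in the response
        target = out / (path.stem + ".md")
        target.write_text(markdown, encoding="utf-8")
        written.append(target.name)
    return written
```

Calling `extract_markdown(".", "archive")` after the loop finishes would write one `.md` file per successful response into `archive/`.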
2. Content Migration
When migrating from one CMS to another, Markdown provides a clean intermediate format:
import os

# Migrate blog posts from old site to Markdown
old_posts = [
    'https://oldblog.com/post-1',
    'https://oldblog.com/post-2'
]

os.makedirs('migrated', exist_ok=True)  # ensure the output directory exists

for url in old_posts:
    result = app.scrape_url(url, {'formats': ['markdown']})
    # Extract title for filename
    title = result['metadata']['title']
    filename = title.lower().replace(' ', '-') + '.md'
    with open(f'migrated/{filename}', 'w', encoding='utf-8') as f:
        f.write(result['markdown'])
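Replacing spaces with hyphens leaves characters such as `/`, `?`, and `:` in the filename, which can break the write or place the file in an unexpected directory. A stricter slug function is one way to handle this. The helper below is a hypothetical sketch using only the standard library, and the `fallback` and `max_length` parameters are illustrative choices.

```python
import re
import unicodedata

def slugify(title, fallback="untitled", max_length=80):
    """Turn a page title into a filesystem-safe, lowercase slug."""
    # Decompose accented characters, then drop anything non-ASCII
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters outside a-z and 0-9 into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    # Truncate long titles without leaving a trailing hyphen
    slug = slug[:max_length].rstrip("-")
    return slug or fallback

print(slugify("What's New in 2024? A/B Testing: Part 1"))
# what-s-new-in-2024-a-b-testing-part-1
```

In the migration loop, `filename = slugify(title) + '.md'` would replace the `lower().replace()` line. Two posts with the same title would still collide, so appending an index or a hash of the URL may be worth considering.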
3. LLM Training Data
Markdown suits preparing web content for large language models because it preserves the page structure without the HTML markup, much as you might post-process dynamic content rendered with Puppeteer:
async function prepareTrainingData(urls) {
  const trainingData = [];

  for (const url of urls) {
    const result = await app.scrapeUrl(url, {
      formats: ['markdown'],
      onlyMainContent: true
    });

    trainingData.push({
      source: url,
      content: result.markdown,
      title: result.metadata.title
    });
  }

  // Save as JSON for LLM training
  require('fs').writeFileSync(
    'training-data.json',
    JSON.stringify(trainingData, null, 2)
  );
}
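The function above writes a single JSON array. Some training pipelines read JSON Lines instead (one record per line), and scraped batches can contain empty or repeated pages. The following Python sketch shows one way to write such a file while filtering those records. It assumes records with the same `source`, `content`, and `title` keys as above; `write_jsonl` and `min_chars` are hypothetical names.

```python
import json

def write_jsonl(records, path, min_chars=1):
    """Write one JSON object per line, skipping empty or duplicate content."""
    seen = set()
    kept = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            content = (record.get("content") or "").strip()
            # Drop pages that are too short or identical to an earlier page
            if len(content) < min_chars or content in seen:
                continue
            seen.add(content)
            line = json.dumps({**record, "content": content}, ensure_ascii=False)
            f.write(line + "\n")
            kept += 1
    return kept
```

The return value is the number of records kept, which makes it easy to log how many pages were filtered out of a batch.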
Conclusion
Firecrawl's Markdown conversion turns web content into a clean, portable format. By calling the /scrape endpoint with the formats: ['markdown'] parameter, you can convert HTML pages while preserving their structure and content. Whether you're archiving documentation, migrating content, or preparing data for LLMs, Firecrawl handles the HTML parsing and conversion, so you can focus on using the content rather than extracting it.
For more advanced scenarios involving dynamic content, consider exploring how to handle authentication or manage complex JavaScript-rendered pages before conversion.