Can I Use Headless Chromium to Generate PDFs from Web Pages?
Yes. Headless Chromium can convert web pages into PDF documents, which is useful for creating reports, invoices, documentation, and archival copies of web content. The generated PDF preserves CSS styling, JavaScript-rendered content, and responsive layouts.
How PDF Generation Works in Headless Chromium
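Headless Chromium loads and renders the page with the same engine as the desktop browser, then uses Chrome's built-in print-to-PDF functionality to generate the document. Because this is the print path, print media styles apply by default, so the PDF can differ from the on-screen view; in Puppeteer, calling page.emulateMediaType('screen') before page.pdf() switches to screen styles. The PDF output preserves: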
- CSS styles and layouts
- Web fonts
- Images and graphics
- JavaScript-generated content
- Responsive design elements
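Because Puppeteer's page.pdf() applies print media styles by default, the output can differ from what the browser shows on screen. The following is a minimal sketch of choosing the media type before printing. It assumes `page` is a Puppeteer Page (Puppeteer is introduced in the next section), and `renderPdf` is a hypothetical helper name, not part of any library.

```javascript
// Sketch: pick the CSS media type Chromium applies before printing.
// `page` is assumed to be a Puppeteer Page; `renderPdf` is a hypothetical helper.
async function renderPdf(page, { media = 'print', ...pdfOptions } = {}) {
  if (media !== 'print' && media !== 'screen') {
    throw new Error(`Unsupported media type: ${media}`);
  }
  // 'print' is the default for PDFs; 'screen' keeps the on-screen styles.
  await page.emulateMediaType(media);
  // Caller-supplied options override the printBackground default.
  return page.pdf({ printBackground: true, ...pdfOptions });
}
```

Calling `renderPdf(page, { media: 'screen', format: 'A4' })` would keep the screen stylesheet, which tends to suit pages that have no dedicated print CSS.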
PDF Generation with Puppeteer (Node.js)
Puppeteer is a widely used Node.js library for controlling Headless Chromium. Here's how to generate PDFs with it:
Basic PDF Generation
const puppeteer = require('puppeteer');

async function generatePDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', {
    waitUntil: 'networkidle0'
  });
  const pdf = await page.pdf({
    path: 'example.pdf',
    format: 'A4',
    printBackground: true
  });
  await browser.close();
  return pdf;
}

generatePDF();
Advanced PDF Configuration
const puppeteer = require('puppeteer');

async function generateAdvancedPDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set viewport for consistent rendering
  await page.setViewport({ width: 1200, height: 800 });
  await page.goto('https://example.com');

  // Wait for dynamic content to load
  await page.waitForSelector('.dynamic-content');

  const pdf = await page.pdf({
    path: 'advanced-example.pdf',
    format: 'A4',
    printBackground: true,
    margin: {
      top: '20mm',
      right: '20mm',
      bottom: '20mm',
      left: '20mm'
    },
    displayHeaderFooter: true,
    headerTemplate: '<div style="font-size:10px; width:100%; text-align:center;">Header Content</div>',
    footerTemplate: '<div style="font-size:10px; width:100%; text-align:center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>'
  });
  await browser.close();
  return pdf;
}
Handling Dynamic Content
When working with pages that load content dynamically, wait for the content to fully render before generating the PDF. The guide "How to handle AJAX requests using Puppeteer" covers managing dynamic content in more detail.
const puppeteer = require('puppeteer');

async function generatePDFWithDynamicContent() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://spa-example.com');

  // Wait for both readiness signals. Each wait rejects after 10 seconds
  // instead of hanging (page.waitForTimeout is no longer available in
  // recent Puppeteer releases, and inside Promise.all it only added a delay)
  await Promise.all([
    page.waitForSelector('.content-loaded', { timeout: 10000 }),
    page.waitForFunction(() => window.dataLoaded === true, { timeout: 10000 })
  ]);

  const pdf = await page.pdf({
    path: 'dynamic-content.pdf',
    format: 'A4',
    printBackground: true
  });
  await browser.close();
  return pdf;
}
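If the PDF should still be produced when a readiness signal never arrives, the wait can be raced against a timer rather than combined with it. The following is a minimal sketch of that idea. `waitOrFallback` is a hypothetical helper, and the 2000 ms value in the usage note is only an illustrative choice.

```javascript
// Sketch: resolve with { ready: true } if `waitPromise` settles first,
// or with { ready: false } once `ms` milliseconds have passed.
// A rejection of `waitPromise` before the timer fires is passed through.
function waitOrFallback(waitPromise, ms) {
  let timer;
  const fallback = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ ready: false }), ms);
  });
  const ready = waitPromise.then(() => ({ ready: true }));
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([ready, fallback]).finally(() => clearTimeout(timer));
}
```

With Puppeteer this might be used as `const { ready } = await waitOrFallback(page.waitForSelector('.content-loaded'), 2000);`, logging a warning when `ready` is false before calling page.pdf().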
PDF Generation with Pyppeteer (Python)
Pyppeteer is an unofficial Python port of Puppeteer that offers a similar API:
import asyncio
from pyppeteer import launch

async def generate_pdf():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.waitForSelector('body')
    await page.pdf({
        'path': 'example.pdf',
        'format': 'A4',
        'printBackground': True,
        'margin': {
            'top': '20mm',
            'right': '20mm',
            'bottom': '20mm',
            'left': '20mm'
        }
    })
    await browser.close()

asyncio.run(generate_pdf())
Python with Custom CSS for Print
import asyncio
from pyppeteer import launch

async def generate_pdf_with_custom_css():
    browser = await launch()
    page = await browser.newPage()
    # Navigate first: a style tag added before goto() is discarded
    # when the new document loads
    await page.goto('https://example.com')
    # Add print-specific CSS
    await page.addStyleTag({
        'content': '''
            @media print {
                .no-print { display: none !important; }
                .page-break { page-break-before: always; }
                body { font-size: 12pt; }
            }
        '''
    })
    pdf = await page.pdf({
        'path': 'styled-example.pdf',
        'format': 'A4',
        'printBackground': True
    })
    await browser.close()
    return pdf

asyncio.run(generate_pdf_with_custom_css())
Command Line PDF Generation
You can also generate PDFs directly using Chrome or Chromium from the command line:
# Basic PDF generation
google-chrome --headless --disable-gpu --print-to-pdf=output.pdf https://example.com

# Without the default header and footer, giving page scripts up to 5 seconds of virtual time
# (newer Chrome releases name the header flag --no-pdf-header-footer)
google-chrome --headless --disable-gpu \
  --print-to-pdf=output.pdf \
  --print-to-pdf-no-header \
  --virtual-time-budget=5000 \
  https://example.com

# Generate PDF with a specific window size
chromium --headless --disable-gpu \
  --window-size=1200,800 \
  --print-to-pdf=output.pdf \
  https://example.com
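The same command-line invocation can be scripted from Node.js without Puppeteer. The following is a minimal sketch that uses only the flags shown above. `buildChromeArgs` and `printToPdf` are hypothetical helper names, and the default binary name `google-chrome` is an assumption about what is installed on the PATH.

```javascript
const { execFile } = require('node:child_process');

// Build the argument list for a headless print-to-PDF run.
function buildChromeArgs(url, outputPath, { width, height } = {}) {
  const args = ['--headless', '--disable-gpu', `--print-to-pdf=${outputPath}`];
  if (width && height) {
    args.push(`--window-size=${width},${height}`);
  }
  args.push(url); // the URL goes last, as in the shell examples
  return args;
}

// Run the browser binary and resolve with the output path on success.
function printToPdf(url, outputPath, options = {}) {
  const binary = options.binary || 'google-chrome'; // assumption: available on PATH
  return new Promise((resolve, reject) => {
    execFile(binary, buildChromeArgs(url, outputPath, options), (error) => {
      if (error) reject(error);
      else resolve(outputPath);
    });
  });
}
```

Passing the arguments as an array to `execFile` avoids shell quoting problems when a URL contains characters such as `&` or `?`.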
PDF Configuration Options
Page Format and Size
// Standard paper sizes
await page.pdf({ format: 'A4' }); // 210mm x 297mm
await page.pdf({ format: 'A3' }); // 297mm x 420mm
await page.pdf({ format: 'Letter' }); // 8.5in x 11in
await page.pdf({ format: 'Legal' }); // 8.5in x 14in
// Custom dimensions
await page.pdf({
  width: '210mm',
  height: '297mm'
});
Margins and Layout
await page.pdf({
  margin: {
    top: '20mm',
    right: '15mm',
    bottom: '20mm',
    left: '15mm'
  },
  landscape: false, // Portrait orientation
  printBackground: true
});
Headers and Footers
await page.pdf({
  displayHeaderFooter: true,
  headerTemplate: `
    <div style="font-size:10px; width:100%; text-align:center; margin-top:5mm;">
      <span class="title"></span>
    </div>
  `,
  footerTemplate: `
    <div style="font-size:10px; width:100%; text-align:center; margin-bottom:5mm;">
      Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </div>
  `,
  margin: { top: '30mm', bottom: '30mm' }
});
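When a header or footer includes text that comes from data (a customer name, a report title), that text should be escaped before it is placed in the template. The following is a minimal sketch of one way to do this. `escapeHtml` and `buildFooterTemplate` are hypothetical helpers, and the label text is illustrative.

```javascript
// Escape text so it can be embedded safely in a header/footer template.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a footer that shows a label plus Chromium's page counters.
// The pageNumber and totalPages classes are filled in at print time.
function buildFooterTemplate(label) {
  return (
    '<div style="font-size:10px; width:100%; text-align:center;">' +
    `${escapeHtml(label)} | Page <span class="pageNumber"></span>` +
    ' of <span class="totalPages"></span></div>'
  );
}
```

The result can be passed as `footerTemplate` together with `displayHeaderFooter: true` and a bottom margin large enough to hold it.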
Best Practices for PDF Generation
1. Wait for Content to Load
Always ensure dynamic content has fully loaded before generating the PDF. The guide "How to handle timeouts in Puppeteer" offers strategies for managing loading times.
// Wait for network to be idle
await page.goto(url, { waitUntil: 'networkidle0' });
// Wait for specific elements
await page.waitForSelector('.main-content');
// Wait for custom conditions
await page.waitForFunction(() => document.querySelector('.loading') === null);
2. Optimize CSS for Print
@media print {
  /* Hide unnecessary elements */
  .no-print, nav, footer, .sidebar {
    display: none !important;
  }
  /* Control page breaks */
  .page-break {
    page-break-before: always;
  }
  /* Optimize text size */
  body {
    font-size: 12pt;
    line-height: 1.4;
  }
  /* Ensure backgrounds print (print-color-adjust replaces the deprecated color-adjust) */
  * {
    -webkit-print-color-adjust: exact !important;
    print-color-adjust: exact !important;
  }
}
3. Handle Large Documents
For large documents, consider memory management and processing time:
async function generateLargePDF(url) {
  // --no-sandbox disables Chromium's sandbox; use it only in trusted
  // environments (such as containers) where the sandbox cannot run
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });
  try {
    const page = await browser.newPage();
    // Increase timeout for large pages
    page.setDefaultTimeout(60000);
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.pdf({
      format: 'A4',
      printBackground: true,
      preferCSSPageSize: true
    });
  } finally {
    // Close the browser even if navigation or printing fails
    await browser.close();
  }
}
4. Error Handling and Debugging
async function generatePDFWithErrorHandling(url) {
  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Enable console logging for debugging
    page.on('console', msg => console.log('PAGE LOG:', msg.text()));
    page.on('pageerror', err => console.log('PAGE ERROR:', err.message));

    await page.goto(url);
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true
    });
    return pdf;
  } catch (error) {
    console.error('PDF generation failed:', error);
    throw error;
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}
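Transient failures such as a slow network or a navigation timeout often succeed on a second attempt, so a retry wrapper can sit around a function like the one above. The following is a minimal sketch. `withRetries` is a hypothetical helper, and the attempt count and delay are illustrative defaults rather than recommended values.

```javascript
// Retry an async task up to `attempts` times.
// Waits delayMs * attemptNumber between tries (a simple linear backoff).
async function withRetries(task, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

For example, `await withRetries(() => generatePDFWithErrorHandling(url))` would launch a fresh browser on each attempt, because that function opens and closes its own browser.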
Performance Considerations
Memory Management
// Reuse one browser instance; open and close a page for each URL
async function generateMultiplePDFs(urls) {
  const browser = await puppeteer.launch();
  const results = [];
  try {
    for (const url of urls) {
      const page = await browser.newPage();
      try {
        await page.goto(url);
        const pdf = await page.pdf({ format: 'A4' });
        results.push(pdf);
      } finally {
        await page.close(); // Important: close each page
      }
    }
  } finally {
    await browser.close();
  }
  return results;
}
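For very long URL lists, one option is to restart the browser every so often so that memory held by the browser process is released. The following is a minimal sketch under that assumption. `chunk` and `generateInBatches` are hypothetical helpers, the batch size of 20 is arbitrary, and `launchBrowser` is injected by the caller (for example `() => puppeteer.launch()`).

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Generate PDFs sequentially, using a fresh browser for each batch of URLs.
async function generateInBatches(launchBrowser, urls, batchSize = 20) {
  const results = [];
  for (const batch of chunk(urls, batchSize)) {
    const browser = await launchBrowser();
    try {
      for (const url of batch) {
        const page = await browser.newPage();
        try {
          await page.goto(url);
          results.push(await page.pdf({ format: 'A4' }));
        } finally {
          await page.close();
        }
      }
    } finally {
      await browser.close(); // ends this batch's browser process
    }
  }
  return results;
}
```

Whether restarting helps depends on the pages being rendered, so it is worth measuring memory use before adopting a particular batch size.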
Concurrent PDF Generation
To process multiple URLs simultaneously, open several pages in the same browser. The guide "How to run multiple pages in parallel with Puppeteer" covers strategies for concurrent operations in more detail.
async function generatePDFsConcurrently(urls) {
  const browser = await puppeteer.launch();
  try {
    // Opens one page per URL at the same time, so keep the list small
    const promises = urls.map(async (url) => {
      const page = await browser.newPage();
      try {
        await page.goto(url);
        return await page.pdf({ format: 'A4' });
      } finally {
        await page.close();
      }
    });
    return await Promise.all(promises);
  } finally {
    // Close the browser even if one of the pages fails
    await browser.close();
  }
}
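Opening a page for every URL at once can exhaust memory on long lists, so a cap on the number of pages in flight is often useful. The following is a minimal sketch of a concurrency-limited map. `mapWithLimit` is a hypothetical helper, and a limit of 4 in the usage note is only an example.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapWithLimit(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

With Puppeteer, the worker would open a page, call page.goto() and page.pdf(), and close the page in a finally block, for example `await mapWithLimit(urls, 4, (url) => renderOne(browser, url))`, where `renderOne` is a placeholder for that per-URL function.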
Common Issues and Solutions
1. Missing Fonts
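// --font-render-hinting=none only makes glyph rendering more consistent
// across platforms; fonts missing from the host must still be installed
const browser = await puppeteer.launch({
  args: ['--font-render-hinting=none']
});
// After page.goto(), wait for web fonts to finish loading
await page.evaluate(() => document.fonts.ready.then(() => true));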
2. Images Not Rendering
// Wait for images to load
await page.evaluate(() => {
  return Promise.all(Array.from(document.images)
    .filter(img => !img.complete)
    .map(img => new Promise(resolve => {
      img.onload = img.onerror = resolve;
    })));
});
3. CSS Not Applied
// Ensure CSS is fully loaded
await page.waitForFunction(() => {
  const sheets = Array.from(document.styleSheets);
  return sheets.every(sheet => {
    try {
      return sheet.cssRules.length > 0;
    } catch (e) {
      // Cross-origin stylesheets throw on access; treat them as loaded
      return true;
    }
  });
});
Conclusion
Headless Chromium can generate PDFs from complex web pages with dynamic content, custom styling, and responsive layouts. Whether you're using Puppeteer with Node.js, Pyppeteer with Python, or command-line tools, the key to successful PDF generation lies in properly waiting for content to load, optimizing CSS for print media, and implementing robust error handling.
With the rendering engine and the configuration options described above, you can produce professional-quality PDFs that accurately represent your web content.