How to Generate PDFs Using Puppeteer?
Puppeteer is a powerful Node.js library that provides a high-level API to control Chrome or Chromium browsers. One of its most useful features is the ability to generate PDFs from web pages programmatically. This capability is essential for creating reports, invoices, documentation, and other printable content from dynamic web applications.
Installation and Setup
Before generating PDFs, you need to install Puppeteer in your Node.js project:
npm install puppeteer
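If Chrome or Chromium is already installed, puppeteer-core gives a lighter installation that does not download a browser; you then pass the path to the existing binary through the executablePath launch option: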
npm install puppeteer-core
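Because puppeteer-core ships without a browser, the launch call needs an explicit executablePath. The sketch below assumes the path is supplied in the PUPPETEER_EXECUTABLE_PATH environment variable and reads it explicitly, since puppeteer-core, as far as I know, does not pick up PUPPETEER_* variables on its own. The helper name systemChromeLaunchOptions is hypothetical.

```javascript
// Hypothetical helper: build launch options for puppeteer-core from an
// environment object. The variable is read explicitly here rather than
// relying on the library to find it.
function systemChromeLaunchOptions(env = process.env) {
  const executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  if (!executablePath) {
    throw new Error('Set PUPPETEER_EXECUTABLE_PATH to your Chrome or Chromium binary');
  }
  return { executablePath, headless: true };
}

// Usage (needs puppeteer-core and a locally installed browser):
// const puppeteer = require('puppeteer-core');
// const browser = await puppeteer.launch(systemChromeLaunchOptions());
```

Failing early with a clear message tends to be easier to debug than the launch error raised when no browser can be found.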
Basic PDF Generation
Here's a simple example of generating a PDF from a webpage:
const puppeteer = require('puppeteer');

async function generatePDF() {
  // Launch browser
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the webpage
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  // Generate PDF
  const pdf = await page.pdf({
    path: 'example.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}

generatePDF().catch(console.error);
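The value returned by page.pdf() is the PDF as binary data: a Buffer in older Puppeteer releases and, as far as I know, a Uint8Array in recent ones. A small sketch for normalizing that value and checking that it really is a PDF before sending it on (for example in an HTTP response); toBuffer and looksLikePdf are hypothetical helper names.

```javascript
// Normalize whatever page.pdf() returned into a Node.js Buffer.
function toBuffer(pdfBytes) {
  return Buffer.isBuffer(pdfBytes) ? pdfBytes : Buffer.from(pdfBytes);
}

// Every PDF file starts with the ASCII signature "%PDF-".
function looksLikePdf(pdfBytes) {
  return toBuffer(pdfBytes).subarray(0, 5).toString('latin1') === '%PDF-';
}

// Usage with the function above:
// const pdf = await generatePDF();
// if (!looksLikePdf(pdf)) throw new Error('Unexpected output from page.pdf()');
```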
Advanced PDF Configuration Options
Puppeteer's pdf() method accepts numerous options for customizing the output:
const puppeteer = require('puppeteer');

async function generateAdvancedPDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  const pdf = await page.pdf({
    path: 'advanced-example.pdf',
    format: 'A4',
    printBackground: true,
    margin: {
      top: '20px',
      right: '20px',
      bottom: '20px',
      left: '20px'
    },
    displayHeaderFooter: true,
    headerTemplate: '<div style="font-size: 12px; width: 100%; text-align: center;">Header Content</div>',
    footerTemplate: '<div style="font-size: 12px; width: 100%; text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    preferCSSPageSize: true, // a CSS @page size, if the page declares one, takes priority over format
    landscape: false,
    scale: 1.0 // must be between 0.1 and 2
  });

  await browser.close();
  return pdf;
}
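One setting that lives outside the pdf() options is the CSS media type. page.pdf() renders with print styles by default, and Puppeteer's documentation suggests calling page.emulateMediaType('screen') beforehand when the PDF should match the on-screen look. A minimal sketch, assuming page is an already-open Puppeteer page; pdfWithScreenStyles is a hypothetical helper name.

```javascript
// Hypothetical helper: render the PDF with screen styles instead of print styles.
async function pdfWithScreenStyles(page, pdfOptions = {}) {
  // Switch the CSS media type before printing; @media print rules stop applying.
  await page.emulateMediaType('screen');
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
}

// Usage:
// const pdf = await pdfWithScreenStyles(page, { path: 'screen-styles.pdf' });
```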
Generating PDFs from HTML Content
You can also generate PDFs from HTML content directly without navigating to a URL:
const puppeteer = require('puppeteer');

async function generatePDFFromHTML() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const htmlContent = `
    <!DOCTYPE html>
    <html>
    <head>
      <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        .header { color: #333; border-bottom: 2px solid #333; padding-bottom: 10px; }
        .content { margin-top: 20px; line-height: 1.6; }
        @media print {
          .no-print { display: none; }
        }
      </style>
    </head>
    <body>
      <h1 class="header">PDF Report</h1>
      <div class="content">
        <p>This is a dynamically generated PDF from HTML content.</p>
        <p>You can include any HTML and CSS styling here.</p>
      </div>
    </body>
    </html>
  `;

  await page.setContent(htmlContent);

  const pdf = await page.pdf({
    path: 'from-html.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
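By default setContent() waits for the page's load event. If scripts inside the HTML fetch more data afterwards, a stricter waitUntil value such as 'networkidle0' may be needed before printing. A minimal sketch, assuming page is an open Puppeteer page; htmlToPdf is a hypothetical helper name.

```javascript
// Hypothetical helper: load an HTML string and print it once the network is quiet.
async function htmlToPdf(page, html, pdfOptions = {}) {
  // 'networkidle0' resolves when there have been no network connections for a short while.
  await page.setContent(html, { waitUntil: 'networkidle0' });
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
}

// Usage:
// const pdf = await htmlToPdf(page, '<h1>Report</h1>', { path: 'report.pdf' });
```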
Handling Dynamic Content
When dealing with dynamic content that loads via JavaScript, you need to wait for the content to fully load:
const puppeteer = require('puppeteer');

async function generatePDFWithDynamicContent() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/dynamic-content');

  // Wait for specific elements to load
  await page.waitForSelector('.dynamic-content', { timeout: 30000 });

  // Or wait for network to be idle
  await page.waitForNetworkIdle();

  // Optional: wait a little longer for animations to finish.
  // page.waitForTimeout() was deprecated and later removed from Puppeteer,
  // so a plain timer is used instead.
  await new Promise(resolve => setTimeout(resolve, 2000));

  const pdf = await page.pdf({
    path: 'dynamic-content.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
Custom Page Dimensions and Orientation
You can specify custom page dimensions and orientation:
const puppeteer = require('puppeteer');

async function generateCustomSizePDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Custom dimensions
  const pdf = await page.pdf({
    path: 'custom-size.pdf',
    width: '8.5in',
    height: '11in',
    printBackground: true,
    margin: {
      top: '1in',
      right: '1in',
      bottom: '1in',
      left: '1in'
    }
  });

  // Landscape orientation
  const landscapePdf = await page.pdf({
    path: 'landscape.pdf',
    format: 'A4',
    landscape: true,
    printBackground: true
  });

  await browser.close();
  return { pdf, landscapePdf };
}
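The width, height, and margin values accept the units px, in, cm, and mm, and a bare number is treated as pixels. When a viewport or layout calculation has to match the page size, it can help to convert these strings to pixels. The sketch below uses 96 pixels per inch, which to my knowledge is the factor Puppeteer applies; toPixels is a hypothetical helper name.

```javascript
// Conversion factors at 96 CSS pixels per inch.
const PX_PER_UNIT = { px: 1, in: 96, cm: 37.8, mm: 3.78 };

// Hypothetical helper: convert a page.pdf() dimension such as '8.5in' to pixels.
function toPixels(value) {
  if (typeof value === 'number') return value; // bare numbers are already pixels
  const match = /^(\d+(?:\.\d+)?)(px|in|cm|mm)$/.exec(value.trim());
  if (!match) throw new Error(`Unsupported dimension: ${value}`);
  return Number(match[1]) * PX_PER_UNIT[match[2]];
}

// US Letter with 1in margins leaves a printable width of 6.5in.
const printableWidth = toPixels('8.5in') - 2 * toPixels('1in');
console.log(printableWidth); // 624
```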
Adding Headers and Footers
Puppeteer allows you to add custom headers and footers to your PDFs:
const puppeteer = require('puppeteer');

async function generatePDFWithHeaderFooter() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  const pdf = await page.pdf({
    path: 'with-header-footer.pdf',
    format: 'A4',
    displayHeaderFooter: true,
    headerTemplate: `
      <div style="font-size: 10px; padding: 5px; width: 100%; text-align: center; border-bottom: 1px solid #ccc;">
        <span>Company Report - Generated on ${new Date().toLocaleDateString()}</span>
      </div>
    `,
    footerTemplate: `
      <div style="font-size: 10px; padding: 5px; width: 100%; text-align: center; border-top: 1px solid #ccc;">
        <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
      </div>
    `,
    margin: {
      top: '100px',
      bottom: '100px',
      left: '20px',
      right: '20px'
    }
  });

  await browser.close();
  return pdf;
}
Error Handling and Best Practices
Here's a robust implementation with proper error handling:
const puppeteer = require('puppeteer');

class PDFGenerator {
  constructor() {
    this.browser = null;
  }

  async initialize() {
    this.browser = await puppeteer.launch({
      headless: true,
      // These flags disable Chrome's sandbox, which is often needed in containers.
      // Keep the sandbox enabled when rendering untrusted content.
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async generatePDF(url, options = {}) {
    if (!this.browser) {
      await this.initialize();
    }

    const page = await this.browser.newPage();

    try {
      // Set viewport for consistent rendering
      await page.setViewport({ width: 1200, height: 800 });

      // Navigate with timeout
      await page.goto(url, {
        waitUntil: 'networkidle2',
        timeout: 30000
      });

      // Default PDF options
      const defaultOptions = {
        format: 'A4',
        printBackground: true,
        margin: {
          top: '20px',
          right: '20px',
          bottom: '20px',
          left: '20px'
        }
      };

      // Caller-supplied options override the defaults
      const pdfOptions = { ...defaultOptions, ...options };
      const pdf = await page.pdf(pdfOptions);
      return pdf;
    } catch (error) {
      console.error('PDF generation failed:', error);
      throw error;
    } finally {
      // Always release the page, even when generation fails
      await page.close();
    }
  }

  async close() {
    if (this.browser) {
      await this.browser.close();
    }
  }
}

// Usage example
async function main() {
  const generator = new PDFGenerator();
  try {
    await generator.generatePDF('https://example.com', {
      path: 'output.pdf',
      format: 'A4'
    });
    console.log('PDF generated successfully');
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await generator.close();
  }
}

main();
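Navigation timeouts and brief network failures are often transient, so retrying a failed generation a few times can be worthwhile. The following is a generic sketch, not part of Puppeteer; withRetry and its attempts and delayMs options are hypothetical names.

```javascript
// Hypothetical helper: run an async task, retrying with a growing delay on failure.
async function withRetry(task, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Wait a little longer after each failed attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}

// Usage with the class above:
// const pdf = await withRetry(() => generator.generatePDF('https://example.com'));
```

Retrying only makes sense for failures that can resolve on their own; an invalid URL or a missing selector will fail the same way every time.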
Alternative: Using Playwright for PDF Generation
While Puppeteer is excellent for PDF generation, Playwright offers similar functionality through its own page.pdf() method. Playwright supports several browser engines, although its PDF export works only in Chromium, so it is mainly worth considering if you already use it for other automation.
Performance Optimization
For better performance when generating multiple PDFs:
const puppeteer = require('puppeteer');

class BatchPDFGenerator {
  constructor(concurrency = 3) {
    this.concurrency = concurrency;
    this.browser = null;
  }

  async initialize() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async generateBatch(urls, options = {}) {
    if (!this.browser) {
      await this.initialize();
    }

    const chunks = [];
    for (let i = 0; i < urls.length; i += this.concurrency) {
      chunks.push(urls.slice(i, i + this.concurrency));
    }

    const results = [];
    for (const chunk of chunks) {
      const promises = chunk.map(url => this.generateSinglePDF(url, options));
      const chunkResults = await Promise.allSettled(promises);
      results.push(...chunkResults);
    }

    return results;
  }

  async generateSinglePDF(url, options) {
    const page = await this.browser.newPage();
    try {
      await page.goto(url, { waitUntil: 'networkidle2' });
      return await page.pdf(options);
    } finally {
      await page.close();
    }
  }
}
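Processing URLs in fixed chunks means each chunk waits for its slowest page before the next one starts. An alternative is a small worker pool that starts the next URL as soon as any slot frees up. The sketch below is generic JavaScript rather than a Puppeteer feature; mapWithConcurrency is a hypothetical helper that returns results in the same shape as Promise.allSettled.

```javascript
// Hypothetical helper: run worker(item) for every item with at most `limit` in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function run() {
    while (next < items.length) {
      const index = next++;
      try {
        const value = await worker(items[index], index);
        results[index] = { status: 'fulfilled', value };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  // Start up to `limit` runners that pull items until none are left.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}

// Usage with the class above:
// const results = await mapWithConcurrency(urls, 3, (url) => batch.generateSinglePDF(url, {}));
```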
Common Use Cases and Examples
1. Invoice Generation
const puppeteer = require('puppeteer');

async function generateInvoicePDF(invoiceData) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Values are interpolated as-is; escape them first if they come from user input.
  const invoiceHTML = `
    <!DOCTYPE html>
    <html>
    <head>
      <style>
        body { font-family: Arial, sans-serif; margin: 0; padding: 20px; }
        .invoice-header { text-align: center; margin-bottom: 30px; }
        .invoice-details { margin-bottom: 20px; }
        .invoice-table { width: 100%; border-collapse: collapse; }
        .invoice-table th, .invoice-table td {
          border: 1px solid #ddd; padding: 8px; text-align: left;
        }
        .total { font-weight: bold; background-color: #f0f0f0; }
      </style>
    </head>
    <body>
      <div class="invoice-header">
        <h1>Invoice #${invoiceData.number}</h1>
        <p>Date: ${invoiceData.date}</p>
      </div>
      <div class="invoice-details">
        <p><strong>Bill To:</strong> ${invoiceData.customer}</p>
        <p><strong>Address:</strong> ${invoiceData.address}</p>
      </div>
      <table class="invoice-table">
        <thead>
          <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
          </tr>
        </thead>
        <tbody>
          ${invoiceData.items.map(item => `
            <tr>
              <td>${item.description}</td>
              <td>${item.quantity}</td>
              <td>$${item.price}</td>
              <td>$${item.total}</td>
            </tr>
          `).join('')}
          <tr class="total">
            <td colspan="3"><strong>Total Amount</strong></td>
            <td><strong>$${invoiceData.totalAmount}</strong></td>
          </tr>
        </tbody>
      </table>
    </body>
    </html>
  `;

  await page.setContent(invoiceHTML);

  const pdf = await page.pdf({
    path: `invoice-${invoiceData.number}.pdf`,
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
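Invoice fields such as customer names and item descriptions usually come from user input, and inserting them into the template unescaped lets stray markup break the layout or inject scripts into the rendered page. A minimal escaping sketch; escapeHtml and renderItemRow are hypothetical helpers, not part of Puppeteer.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Replace the characters that are significant in HTML with their entities.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Build one table row of the invoice with every field escaped.
function renderItemRow(item) {
  return `<tr>` +
    `<td>${escapeHtml(item.description)}</td>` +
    `<td>${escapeHtml(item.quantity)}</td>` +
    `<td>$${escapeHtml(item.price)}</td>` +
    `<td>$${escapeHtml(item.total)}</td>` +
    `</tr>`;
}

console.log(escapeHtml('<b>A & B</b>')); // &lt;b&gt;A &amp; B&lt;/b&gt;
```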
2. Report Generation with Charts
async function generateReportWithCharts(reportData) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://your-app.com/report-page');

  // Wait for charts to render
  await page.waitForSelector('.chart-container', { timeout: 10000 });

  // Additional wait for animations. page.waitForTimeout() is no longer
  // available in current Puppeteer releases, so use a plain timer.
  await new Promise(resolve => setTimeout(resolve, 3000));

  const pdf = await page.pdf({
    path: 'report-with-charts.pdf',
    format: 'A4',
    printBackground: true,
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
  });

  await browser.close();
  return pdf;
}
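A fixed delay is either too long or occasionally too short. If you control the report page, a more dependable approach is to have it signal when drawing has finished and wait for that signal with page.waitForFunction(). The sketch assumes the page sets a flag named window.__chartsReady, which is a hypothetical convention you would have to implement in your own front-end code; waitForChartsReady is likewise a hypothetical helper name.

```javascript
// Hypothetical helper: wait until the page reports that its charts are drawn.
// Assumes the report page runs `window.__chartsReady = true` after rendering.
async function waitForChartsReady(page, timeoutMs = 10000) {
  await page.waitForFunction(() => window.__chartsReady === true, { timeout: timeoutMs });
}

// Usage:
// await page.goto('https://your-app.com/report-page');
// await waitForChartsReady(page);
// const pdf = await page.pdf({ format: 'A4', printBackground: true });
```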
Python Alternative with Pyppeteer
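For Python developers, pyppeteer is an unofficial Python port of Puppeteer. Its maintainers describe the project as unmaintained and point to playwright-python as an alternative, so treat the following as a starting point rather than a long-term solution: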
import asyncio
from pyppeteer import launch

async def generate_pdf():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')

    # Generate PDF
    await page.pdf({
        'path': 'example.pdf',
        'format': 'A4',
        'printBackground': True
    })

    await browser.close()

# Run the function
asyncio.run(generate_pdf())
Troubleshooting Common Issues
1. Missing Fonts
If your PDF has missing fonts, install them in your system or use web fonts:
await page.addStyleTag({
  url: 'https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap'
});
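Adding the stylesheet does not guarantee that the font files have finished downloading by the time page.pdf() runs. A minimal sketch that also waits on the browser's document.fonts.ready promise, assuming page is an open Puppeteer page; loadWebFont is a hypothetical helper name.

```javascript
// Hypothetical helper: inject a font stylesheet and wait for font loading to settle.
async function loadWebFont(page, stylesheetUrl) {
  // Adds a <link rel="stylesheet"> pointing at the given URL.
  await page.addStyleTag({ url: stylesheetUrl });
  // document.fonts.ready resolves once pending font loads have finished.
  // Returning `true` avoids serializing the FontFaceSet back to Node.js.
  await page.evaluate(() => document.fonts.ready.then(() => true));
}

// Usage:
// await loadWebFont(page, 'https://fonts.googleapis.com/css2?family=Roboto&display=swap');
```

Browsers typically start downloading a web font only when some text on the page uses it, so the page's CSS still needs a font-family rule that references the font.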
2. Large File Sizes
To reduce PDF file size:
const pdf = await page.pdf({
  path: 'optimized.pdf',
  format: 'A4',
  printBackground: false, // Disable background images
  scale: 0.8 // Reduce scale
});
3. Memory Issues
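Omitting the path option lets you take the PDF data returned by page.pdf() and decide yourself where it goes. This still holds the whole document in memory; for very large documents, page.createPDFStream() produces the same output as a stream instead.
const fs = require('fs');

const pdf = await page.pdf({
  // No path: nothing is written to disk, the PDF data is only returned
  format: 'A4',
  printBackground: true
});

// Write the complete buffer to a file in one call
fs.writeFileSync('large-document.pdf', pdf);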
4. Timeout Issues
For pages that take long to load, increase timeout values:
await page.goto('https://slow-loading-site.com', {
  waitUntil: 'networkidle2',
  timeout: 60000 // 60 seconds
});
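Rather than repeating a timeout on every call, the defaults can be raised once per page with page.setDefaultNavigationTimeout() and page.setDefaultTimeout(). A short sketch, assuming page is an open Puppeteer page; applyTimeouts and its option names are hypothetical.

```javascript
// Hypothetical helper: raise the default timeouts for one page.
function applyTimeouts(page, { navigationMs = 60000, otherMs = 30000 } = {}) {
  // Applies to waitForSelector, waitForFunction and similar waits.
  page.setDefaultTimeout(otherMs);
  // Applies to goto, reload and other navigations; takes priority over the general default.
  page.setDefaultNavigationTimeout(navigationMs);
}

// Usage:
// const page = await browser.newPage();
// applyTimeouts(page, { navigationMs: 90000 });
```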
Docker Integration
When using Puppeteer in Docker, you'll need additional configuration:
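# Node.js 16 has reached end of life; use a currently supported release
FROM node:22-alpine

# Install Chromium and the libraries and fonts it needs
RUN apk add --no-cache \
    chromium \
    nss \
    freetype \
    freetype-dev \
    harfbuzz \
    ca-certificates \
    ttf-freefont

# Skip Puppeteer's browser download and use the system Chromium instead.
# Older Puppeteer releases used PUPPETEER_SKIP_CHROMIUM_DOWNLOAD for this.
ENV PUPPETEER_SKIP_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

WORKDIR /app
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev
COPY . .

CMD ["node", "index.js"]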
Best Practices Summary
- Always close browser instances to prevent memory leaks
- Set appropriate timeouts for page loading and network requests
- Use headless mode in production for better performance
- Handle errors gracefully with try-catch blocks
- Optimize for performance by reusing browser instances when generating multiple PDFs
- Configure margins and print settings based on your content requirements
- Test with various content types to ensure consistent rendering
Conclusion
Puppeteer's PDF generation capabilities are robust and flexible, making it an excellent choice for creating professional documents from web content. Whether you're generating invoices, reports, or any other type of document, Puppeteer provides the tools you need to create high-quality PDFs programmatically.
The key to successful PDF generation lies in understanding your content, properly handling dynamic elements, and configuring the appropriate options for your use case. With the examples and best practices outlined in this guide, you'll be able to implement reliable PDF generation in your applications.
For more advanced automation needs, you might also want to explore Playwright's cross-browser capabilities, which can complement your PDF generation workflows.