
How to Generate PDFs Using Puppeteer?

Puppeteer is a powerful Node.js library that provides a high-level API to control Chrome or Chromium browsers. One of its most useful features is the ability to generate PDFs from web pages programmatically. This capability is essential for creating reports, invoices, documentation, and other printable content from dynamic web applications.

Installation and Setup

Before generating PDFs, you need to install Puppeteer in your Node.js project:

npm install puppeteer

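For a lighter installation that doesn't download a browser, use puppeteer-core instead. It needs an existing Chrome or Chromium installation, which you point to with the executablePath launch option, and it is imported with require('puppeteer-core'):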

npm install puppeteer-core
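Because puppeteer-core does not download a browser, launch() has to be told where Chrome lives. A minimal sketch of building those launch options, assuming a hypothetical CHROME_PATH environment variable holds the path to the browser binary (the variable and function names are our own, not part of Puppeteer):

```javascript
// Build launch options for puppeteer-core, which ships without a browser.
// CHROME_PATH is a hypothetical variable name chosen for this example.
function buildCoreLaunchOptions(env = process.env, overrides = {}) {
  const executablePath = overrides.executablePath || env.CHROME_PATH;
  if (!executablePath) {
    throw new Error('Set CHROME_PATH (or pass executablePath) to an installed Chrome/Chromium binary');
  }
  // Caller overrides win, but executablePath is always set
  return { headless: true, ...overrides, executablePath };
}

// Usage (requires puppeteer-core and a local Chrome):
// const puppeteer = require('puppeteer-core');
// const browser = await puppeteer.launch(buildCoreLaunchOptions());
```

Failing early with a clear message is easier to debug than a launch error deep inside Puppeteer.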

Basic PDF Generation

Here's a simple example of generating a PDF from a webpage:

const puppeteer = require('puppeteer');

async function generatePDF() {
  // Launch browser
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the webpage
  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  // Generate PDF
  const pdf = await page.pdf({
    path: 'example.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}

generatePDF().catch(console.error);
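page.pdf() resolves to the PDF bytes whether or not path is set (a Buffer in older releases; recent releases return a Uint8Array). When those bytes are stored or sent elsewhere, a cheap sanity check is the %PDF- header that every PDF file starts with. A small sketch; isPdf and savePdf are our own helper names:

```javascript
const { writeFile } = require('fs/promises');

// True if the bytes start with the "%PDF-" header that every PDF file begins with.
// Works for both Buffer and Uint8Array (Buffer is a Uint8Array subclass).
function isPdf(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < 5) return false;
  return Buffer.from(bytes.subarray(0, 5)).toString('latin1') === '%PDF-';
}

// Hypothetical helper: validate the bytes returned by page.pdf() before writing them.
async function savePdf(bytes, filePath) {
  if (!isPdf(bytes)) throw new Error('Not a PDF: missing %PDF- header');
  await writeFile(filePath, bytes);
}
```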

Advanced PDF Configuration Options

Puppeteer's pdf() method accepts numerous options for customizing the output:

const puppeteer = require('puppeteer');

async function generateAdvancedPDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com', { waitUntil: 'networkidle2' });

  const pdf = await page.pdf({
    path: 'advanced-example.pdf',
    format: 'A4',
    printBackground: true,
    margin: {
      top: '20px',
      right: '20px',
      bottom: '20px',
      left: '20px'
    },
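    // The header and footer are drawn inside the top and bottom margins,
    // so those margins must be tall enough for the templates to show
    displayHeaderFooter: true,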
    headerTemplate: '<div style="font-size: 12px; width: 100%; text-align: center;">Header Content</div>',
    footerTemplate: '<div style="font-size: 12px; width: 100%; text-align: center;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    // If the page declares a CSS @page size, it takes priority over format
    preferCSSPageSize: true,
    landscape: false,
    scale: 1.0
  });

  await browser.close();
  return pdf;
}
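Puppeteer documents a few constraints on these options: scale must be between 0.1 and 2, lengths are numbers (pixels) or strings with an optional px, in, cm or mm unit, and format takes priority over width and height. Checking them before calling page.pdf() can surface mistakes with clearer messages. An illustrative sketch; validatePdfOptions is not part of Puppeteer:

```javascript
// Illustrative pre-flight check for a few documented page.pdf() constraints.
const LENGTH_RE = /^\d+(\.\d+)?(px|in|cm|mm)?$/i;

function isLength(value) {
  return (typeof value === 'number' && value >= 0) ||
    (typeof value === 'string' && LENGTH_RE.test(value.trim()));
}

// Returns a list of human-readable problems (empty when the options look fine).
function validatePdfOptions(options) {
  const errors = [];
  if (options.scale !== undefined && !(options.scale >= 0.1 && options.scale <= 2)) {
    errors.push(`scale must be between 0.1 and 2, got ${options.scale}`);
  }
  for (const side of ['top', 'right', 'bottom', 'left']) {
    const value = options.margin?.[side];
    if (value !== undefined && !isLength(value)) {
      errors.push(`margin.${side} must be a number or a length like '20px', got ${JSON.stringify(value)}`);
    }
  }
  if (options.format && (options.width || options.height)) {
    errors.push('format takes priority over width/height; set one or the other');
  }
  return errors;
}
```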

Generating PDFs from HTML Content

You can also generate PDFs from HTML content directly without navigating to a URL:

const puppeteer = require('puppeteer');

async function generatePDFFromHTML() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const htmlContent = `
    <!DOCTYPE html>
    <html>
    <head>
      <style>
        body { font-family: Arial, sans-serif; margin: 40px; }
        .header { color: #333; border-bottom: 2px solid #333; padding-bottom: 10px; }
        .content { margin-top: 20px; line-height: 1.6; }
        @media print {
          .no-print { display: none; }
        }
      </style>
    </head>
    <body>
      <h1 class="header">PDF Report</h1>
      <div class="content">
        <p>This is a dynamically generated PDF from HTML content.</p>
        <p>You can include any HTML and CSS styling here.</p>
      </div>
    </body>
    </html>
  `;

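  // The markup is self-contained, so no extra waiting is needed. If it referenced
  // external images, fonts or stylesheets, pass { waitUntil: 'networkidle0' }
  // so they finish loading before the PDF is printed.
  await page.setContent(htmlContent);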

  const pdf = await page.pdf({
    path: 'from-html.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
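On a fresh page, page.setContent() writes the markup into a blank document, so relative URLs such as <img src="logo.png"> have no base to resolve against. One workaround is to inject a <base href> element pointing at where the assets live. A hypothetical sketch that assumes the HTML contains a <head> tag; the asset URL in the usage comment is a placeholder:

```javascript
// Insert a <base href> right after the opening <head> tag so relative URLs resolve.
function withBaseHref(html, baseUrl) {
  const href = new URL(baseUrl).href; // throws on invalid URLs
  const tag = `<base href="${href.replace(/"/g, '&quot;')}">`;
  // Matches <head> or <head ...attrs>, but not <header>
  const match = /<head(\s[^>]*)?>/i.exec(html);
  if (!match) {
    throw new Error('HTML has no <head> element to insert <base> into');
  }
  const end = match.index + match[0].length;
  return html.slice(0, end) + tag + html.slice(end);
}

// Usage (placeholder URL):
// await page.setContent(withBaseHref(htmlContent, 'https://assets.example.com/reports/'));
```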

Handling Dynamic Content

When dealing with dynamic content that loads via JavaScript, you need to wait for the content to fully load:

const puppeteer = require('puppeteer');

async function generatePDFWithDynamicContent() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com/dynamic-content');

  // Wait for specific elements to load
  await page.waitForSelector('.dynamic-content', { timeout: 30000 });

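  // Optionally, also wait until the network has been idle for a short period
  await page.waitForNetworkIdle();

  // Optional: give animations extra time to finish. page.waitForTimeout() was
  // deprecated and removed in newer Puppeteer releases, so use a plain timer
  await new Promise((resolve) => setTimeout(resolve, 2000));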

  const pdf = await page.pdf({
    path: 'dynamic-content.pdf',
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
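Each Puppeteer wait has its own timeout, but nothing above caps the total time one job can take. A generic sketch that races any promise against a deadline; withTimeout is our own helper, and it does not cancel the underlying work, so the caller should still close the page when it rejects:

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying operation; close the page/browser on failure.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage inside generatePDFWithDynamicContent():
// const pdf = await withTimeout(page.pdf({ format: 'A4' }), 60000, 'page.pdf');
```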

Custom Page Dimensions and Orientation

You can specify custom page dimensions and orientation:

const puppeteer = require('puppeteer');

async function generateCustomSizePDF() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com');

  // Custom dimensions: 8.5in x 11in is US Letter (the same as format: 'Letter')
  const pdf = await page.pdf({
    path: 'custom-size.pdf',
    width: '8.5in',
    height: '11in',
    printBackground: true,
    margin: {
      top: '1in',
      right: '1in',
      bottom: '1in',
      left: '1in'
    }
  });

  // Landscape orientation
  const landscapePdf = await page.pdf({
    path: 'landscape.pdf',
    format: 'A4',
    landscape: true,
    printBackground: true
  });

  await browser.close();
  return { pdf, landscapePdf };
}
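Widths, heights and margins can be given as numbers (pixels) or strings with a px, in, cm or mm unit, and CSS defines 1in as 96px. When mixing units it can help to normalize them. A small sketch using the exact CSS ratios (1in = 96px = 2.54cm = 25.4mm); Puppeteer's internal conversion factors may round slightly differently:

```javascript
// Convert a Puppeteer-style length (number = pixels, or "8.5in", "210mm", "2.54cm", "96px")
// to inches using the CSS reference ratios: 1in = 96px = 2.54cm = 25.4mm.
const PER_INCH = { px: 96, in: 1, cm: 2.54, mm: 25.4 };

function toInches(length) {
  if (typeof length === 'number') return length / PER_INCH.px;
  const match = /^\s*(\d+(?:\.\d+)?)\s*(px|in|cm|mm)?\s*$/i.exec(length);
  if (!match) throw new Error(`Unrecognized length: ${length}`);
  const unit = (match[2] || 'px').toLowerCase(); // unitless strings are treated as pixels
  return Number(match[1]) / PER_INCH[unit];
}

// A4 (210mm x 297mm) in inches
console.log(toInches('210mm').toFixed(2), toInches('297mm').toFixed(2)); // 8.27 11.69
```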

Adding Headers and Footers

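Puppeteer can add custom headers and footers to every page. The templates are HTML strings rendered separately from the page, so the page's CSS does not apply to them (use inline styles), and elements with the classes date, title, url, pageNumber and totalPages are filled in automatically. Reserve enough top and bottom margin for them to be visible: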

const puppeteer = require('puppeteer');

async function generatePDFWithHeaderFooter() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://example.com');

  const pdf = await page.pdf({
    path: 'with-header-footer.pdf',
    format: 'A4',
    displayHeaderFooter: true,
    headerTemplate: `
      <div style="font-size: 10px; padding: 5px; width: 100%; text-align: center; border-bottom: 1px solid #ccc;">
        <span>Company Report - Generated on ${new Date().toLocaleDateString()}</span>
      </div>
    `,
    footerTemplate: `
      <div style="font-size: 10px; padding: 5px; width: 100%; text-align: center; border-top: 1px solid #ccc;">
        <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
      </div>
    `,
    margin: {
      top: '100px',
      bottom: '100px',
      left: '20px',
      right: '20px'
    }
  });

  await browser.close();
  return pdf;
}
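Because the templates are plain HTML strings, dynamic text such as a report title should be HTML-escaped before it is inserted, while page numbers come from the special classes. A hypothetical helper that does both; the title parameter and the inline styles are illustrative:

```javascript
// Escape text before interpolating it into an HTML template.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper returning header/footer options to spread into page.pdf().
// Page CSS does not reach these templates, so styles are inline.
function buildHeaderFooter({ title, fontSize = '10px' }) {
  const style = `font-size: ${fontSize}; width: 100%; text-align: center;`;
  return {
    displayHeaderFooter: true,
    headerTemplate: `<div style="${style}">${escapeHtml(title)}</div>`,
    footerTemplate:
      `<div style="${style}">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>`,
  };
}

// Usage:
// await page.pdf({ format: 'A4', margin: { top: '60px', bottom: '60px' }, ...buildHeaderFooter({ title: 'Q3 Report' }) });
```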

Error Handling and Best Practices

Here's a robust implementation with proper error handling:

const puppeteer = require('puppeteer');
const fs = require('fs').promises;

class PDFGenerator {
  constructor() {
    this.browser = null;
  }

  async initialize() {
    this.browser = await puppeteer.launch({
      headless: true,
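      // Disabling Chrome's sandbox is often needed when running as root in containers,
      // but it weakens isolation; avoid it when rendering untrusted pages if you can
      args: ['--no-sandbox', '--disable-setuid-sandbox']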
    });
  }

  async generatePDF(url, options = {}) {
    if (!this.browser) {
      await this.initialize();
    }

    const page = await this.browser.newPage();

    try {
      // Set viewport for consistent rendering
      await page.setViewport({ width: 1200, height: 800 });

      // Navigate with timeout
      await page.goto(url, { 
        waitUntil: 'networkidle2',
        timeout: 30000 
      });

      // Default PDF options
      const defaultOptions = {
        format: 'A4',
        printBackground: true,
        margin: {
          top: '20px',
          right: '20px',
          bottom: '20px',
          left: '20px'
        }
      };

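      // Object spread is shallow, so merge margins separately; otherwise passing
      // margin: { top: '1in' } would drop the default right/bottom/left margins
      const pdfOptions = {
        ...defaultOptions,
        ...options,
        margin: { ...defaultOptions.margin, ...options.margin }
      };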

      const pdf = await page.pdf(pdfOptions);

      return pdf;

    } catch (error) {
      console.error('PDF generation failed:', error);
      throw error;
    } finally {
      await page.close();
    }
  }

  async close() {
    if (this.browser) {
      await this.browser.close();
    }
  }
}

// Usage example
async function main() {
  const generator = new PDFGenerator();

  try {
    const pdf = await generator.generatePDF('https://example.com', {
      path: 'output.pdf',
      format: 'A4'
    });

    console.log('PDF generated successfully');
  } catch (error) {
    console.error('Error:', error);
  } finally {
    await generator.close();
  }
}
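One subtlety in PDFGenerator: if generatePDF() is called several times at once before the first launch finishes, each call still sees this.browser as null and launches its own browser. A common fix is to cache the launch promise rather than the browser. The sketch takes the launch function as a parameter (for example () => puppeteer.launch()) so the pattern can be shown without a real browser; BrowserHolder is a hypothetical name:

```javascript
// Launch a browser at most once, even under concurrent callers, by caching the promise.
class BrowserHolder {
  constructor(launch) {
    this.launch = launch;       // e.g. () => puppeteer.launch({ headless: true })
    this.browserPromise = null;
  }

  get() {
    if (!this.browserPromise) {
      this.browserPromise = this.launch().catch((error) => {
        this.browserPromise = null; // allow a retry after a failed launch
        throw error;
      });
    }
    return this.browserPromise;
  }

  async close() {
    const pending = this.browserPromise;
    this.browserPromise = null;
    if (pending) {
      const browser = await pending;
      await browser.close();
    }
  }
}
```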

Alternative: Using Playwright for PDF Generation

While Puppeteer is well suited to PDF generation, Playwright offers similar functionality through its own page.pdf() method. Note that although Playwright drives Chromium, Firefox and WebKit, its PDF generation is only supported in headless Chromium.

Performance Optimization

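When generating many PDFs, reuse a single browser instance, open a new page per URL, and process a limited number of URLs at a time instead of launching a browser for each document: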

const puppeteer = require('puppeteer');

class BatchPDFGenerator {
  constructor(concurrency = 3) {
    this.concurrency = concurrency;
    this.browser = null;
  }

  async initialize() {
    this.browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
  }

  async generateBatch(urls, options = {}) {
    if (!this.browser) {
      await this.initialize();
    }

    const chunks = [];
    for (let i = 0; i < urls.length; i += this.concurrency) {
      chunks.push(urls.slice(i, i + this.concurrency));
    }

    const results = [];

    for (const chunk of chunks) {
      const promises = chunk.map(url => this.generateSinglePDF(url, options));
      const chunkResults = await Promise.allSettled(promises);
      results.push(...chunkResults);
    }

    return results;
  }

  async generateSinglePDF(url, options) {
    const page = await this.browser.newPage();

    try {
      await page.goto(url, { waitUntil: 'networkidle2' });
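      // Returns the PDF bytes; leave options.path unset, or every URL in the batch writes to the same file
      return await page.pdf(options);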
    } finally {
      await page.close();
    }
  }

  async close() {
    if (this.browser) {
      await this.browser.close();
      this.browser = null;
    }
  }
}
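Processing URLs in fixed chunks means each chunk waits for its slowest page before the next one starts. A worker-pool variant keeps up to the concurrency limit of jobs in flight at all times. The sketch is generic (it works with any async function, such as url => this.generateSinglePDF(url, options)) and returns results in the same shape as Promise.allSettled; mapWithConcurrency is our own name:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep input order and use Promise.allSettled's { status, value | reason } shape.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runWorker() {
    while (next < items.length) {
      const index = next++; // safe: JS runs this synchronously between awaits
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index], index) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, runWorker));
  return results;
}
```

In generateBatch, the chunk loop could then become return mapWithConcurrency(urls, this.concurrency, (url) => this.generateSinglePDF(url, options));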

Common Use Cases and Examples

1. Invoice Generation

const puppeteer = require('puppeteer');

async function generateInvoicePDF(invoiceData) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const invoiceHTML = `
    <!DOCTYPE html>
    <html>
    <head>
      <style>
        body { font-family: Arial, sans-serif; margin: 0; padding: 20px; }
        .invoice-header { text-align: center; margin-bottom: 30px; }
        .invoice-details { margin-bottom: 20px; }
        .invoice-table { width: 100%; border-collapse: collapse; }
        .invoice-table th, .invoice-table td { 
          border: 1px solid #ddd; padding: 8px; text-align: left; 
        }
        .total { font-weight: bold; background-color: #f0f0f0; }
      </style>
    </head>
    <body>
      <div class="invoice-header">
        <h1>Invoice #${invoiceData.number}</h1>
        <p>Date: ${invoiceData.date}</p>
      </div>

      <div class="invoice-details">
        <p><strong>Bill To:</strong> ${invoiceData.customer}</p>
        <p><strong>Address:</strong> ${invoiceData.address}</p>
      </div>

      <table class="invoice-table">
        <thead>
          <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
          </tr>
        </thead>
        <tbody>
          ${invoiceData.items.map(item => `
            <tr>
              <td>${item.description}</td>
              <td>${item.quantity}</td>
              <td>$${item.price}</td>
              <td>$${item.total}</td>
            </tr>
          `).join('')}
          <tr class="total">
            <td colspan="3"><strong>Total Amount</strong></td>
            <td><strong>$${invoiceData.totalAmount}</strong></td>
          </tr>
        </tbody>
      </table>
    </body>
    </html>
  `;

  await page.setContent(invoiceHTML);

  const pdf = await page.pdf({
    path: `invoice-${invoiceData.number}.pdf`,
    format: 'A4',
    printBackground: true
  });

  await browser.close();
  return pdf;
}
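The invoice template interpolates invoiceData values straight into HTML, so a customer name or item description containing characters such as < or & would be interpreted as markup. A sketch of escaping those values and formatting amounts before they reach the template; the field names mirror the example above, and the two-decimal dollar formatting is an assumption about the currency:

```javascript
// Characters that are significant in HTML text and attribute values.
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
const escapeHtml = (value) => String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);

// Format a numeric amount with two decimals (assumes a currency with cents).
const money = (amount) => `$${Number(amount).toFixed(2)}`;

// Render the <tbody> rows for invoiceData.items with every value escaped.
function renderInvoiceRows(items) {
  return items
    .map((item) => `
      <tr>
        <td>${escapeHtml(item.description)}</td>
        <td>${escapeHtml(item.quantity)}</td>
        <td>${money(item.price)}</td>
        <td>${money(item.total)}</td>
      </tr>`)
    .join('');
}
```

The result can replace the inline items.map(...) call inside the <tbody>, and escapeHtml can wrap the customer and address fields the same way.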

2. Report Generation with Charts

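const puppeteer = require('puppeteer');

async function generateReportWithCharts(reportData) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // reportData is not used below; this example assumes the report page loads its own data
  await page.goto('https://your-app.com/report-page');

  // Wait for charts to render
  await page.waitForSelector('.chart-container', { timeout: 10000 });
  // Additional wait for chart animations (page.waitForTimeout() was removed in newer Puppeteer releases)
  await new Promise((resolve) => setTimeout(resolve, 3000));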

  const pdf = await page.pdf({
    path: 'report-with-charts.pdf',
    format: 'A4',
    printBackground: true,
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
  });

  await browser.close();
  return pdf;
}
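generateReportWithCharts receives reportData but never passes it to the page. If your report page reads its parameters from the query string (an assumption about your app, not something Puppeteer requires), one hypothetical way to pass them is to build the URL with URLSearchParams:

```javascript
// Hypothetical: encode simple report parameters into the report page's query string.
function buildReportUrl(baseUrl, reportData = {}) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(reportData)) {
    if (value === undefined || value === null) continue; // skip missing values
    url.searchParams.set(key, Array.isArray(value) ? value.join(',') : String(value));
  }
  return url.toString();
}

// Usage: await page.goto(buildReportUrl('https://your-app.com/report-page', reportData));
```

For large or nested data, rendering the HTML yourself and using page.setContent() avoids URL length limits.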

Python Alternative with Pyppeteer

For Python developers, pyppeteer is an unofficial Python port of Puppeteer. It is no longer actively maintained, so for new projects Playwright for Python may be a safer choice:

import asyncio
from pyppeteer import launch

async def generate_pdf():
    browser = await launch()
    page = await browser.newPage()

    await page.goto('https://example.com')

    # Generate PDF
    await page.pdf({
        'path': 'example.pdf',
        'format': 'A4',
        'printBackground': True
    })

    await browser.close()

# Run the coroutine (asyncio.run is available in Python 3.7+)
asyncio.run(generate_pdf())

Troubleshooting Common Issues

1. Missing Fonts

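If your PDF has missing or substituted fonts, install the fonts on the machine that runs the browser, or load a web font, apply it, and wait for it to finish loading before printing:

await page.addStyleTag({
  url: 'https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;700&display=swap'
});
// Apply the font to the document
await page.addStyleTag({ content: 'body { font-family: "Roboto", sans-serif; }' });
// Wait until all fonts have loaded
await page.evaluateHandle('document.fonts.ready');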

2. Large File Sizes

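Large images in the page are a common cause, so optimizing them before printing often helps most. Some PDF options can also reduce size:

const pdf = await page.pdf({
  path: 'optimized.pdf',
  format: 'A4',
  printBackground: false, // Skip background colors and images
  scale: 0.8 // Shrink content, which can reduce the number of pages
});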

3. Memory Issues

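page.pdf() holds the entire PDF in memory, whether or not path is set. For large documents, page.createPDFStream() returns the output as a stream that can be piped to disk:

const fs = require('fs');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

const pdfStream = await page.createPDFStream({
  format: 'A4',
  printBackground: true
});

// Recent Puppeteer versions return a web ReadableStream; older ones return a Node.js Readable
const nodeStream = pdfStream instanceof Readable ? pdfStream : Readable.fromWeb(pdfStream);
await pipeline(nodeStream, fs.createWriteStream('large-document.pdf'));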

4. Timeout Issues

For pages that take long to load, increase timeout values:

await page.goto('https://slow-loading-site.com', { 
  waitUntil: 'networkidle2',
  timeout: 60000 // 60 seconds
});
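Raising timeouts helps with consistently slow pages; for intermittent failures (a flaky network, a briefly overloaded server), retrying the navigation a few times is another option. A generic sketch with exponential backoff; retryAsync and its defaults are our own choices, not a Puppeteer API:

```javascript
// Call `fn` up to `attempts` times, waiting baseDelayMs, 2*baseDelayMs, ... between tries.
async function retryAsync(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts failed
}

// Usage:
// await retryAsync(() => page.goto('https://slow-loading-site.com', { waitUntil: 'networkidle2', timeout: 60000 }));
```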

Docker Integration

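When using Puppeteer in Docker, install Chromium and the fonts it needs, and point Puppeteer at the system browser instead of downloading one. Puppeteer is only guaranteed to work with the browser version it bundles, so check that the Chromium in your base image is compatible with your Puppeteer version: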

# Recent Puppeteer releases require Node.js 18 or newer
FROM node:20-alpine

# Install Chrome dependencies
RUN apk add --no-cache \
    chromium \
    nss \
    freetype \
    freetype-dev \
    harfbuzz \
    ca-certificates \
    ttf-freefont

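# Skip Puppeteer's browser download and use the system Chromium
# (PUPPETEER_SKIP_DOWNLOAD in newer releases, PUPPETEER_SKIP_CHROMIUM_DOWNLOAD in older ones)
ENV PUPPETEER_SKIP_DOWNLOAD=true \
    PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser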

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .
CMD ["node", "index.js"]

Best Practices Summary

  1. Always close browser instances to prevent memory leaks
  2. Set appropriate timeouts for page loading and network requests
  3. Keep Puppeteer's default headless mode in production for better performance
  4. Handle errors gracefully with try-catch blocks
  5. Optimize for performance by reusing browser instances when generating multiple PDFs
  6. Configure margins and print settings based on your content requirements
  7. Test with various content types to ensure consistent rendering
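Points 1 and 4 can be combined in one small wrapper that always closes the browser, even when rendering throws. A sketch that takes the launch function as a parameter (for example () => puppeteer.launch()) so it can be reused and tested with a stand-in; withBrowser is a hypothetical name:

```javascript
// Launch a browser, run `task` with it, and always close it afterwards.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs on success and on error
  }
}

// Usage:
// const pdf = await withBrowser(() => puppeteer.launch(), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto('https://example.com', { waitUntil: 'networkidle2' });
//   return page.pdf({ format: 'A4' });
// });
```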

Conclusion

Puppeteer's PDF generation capabilities are robust and flexible, making it an excellent choice for creating professional documents from web content. Whether you're generating invoices, reports, or any other type of document, Puppeteer provides the tools you need to create high-quality PDFs programmatically.

The key to successful PDF generation lies in understanding your content, properly handling dynamic elements, and configuring the appropriate options for your use case. With the examples and best practices outlined in this guide, you'll be able to implement reliable PDF generation in your applications.

For more advanced automation needs, you might also want to explore Playwright's cross-browser capabilities, which can complement your PDF generation workflows.
