How to generate PDFs using Puppeteer?

Generating PDFs using Puppeteer is relatively straightforward. Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium (headless by default) over the DevTools Protocol. One of the many things Puppeteer can do is generate a PDF of any web page.

Here's a step-by-step guide on how you can do it:

Step 1: Install Puppeteer

If you haven't installed Puppeteer in your project, you can do so by running the following command in your terminal:

npm install puppeteer

Step 2: Write the Puppeteer Script

Now let's write a script that will generate a PDF from a webpage. Here's a simple example:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://example.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'example.pdf', format: 'A4'});

  await browser.close();
})();

In this script, we first require the Puppeteer module. Then we launch a new browser instance, open a new page, and navigate to 'http://example.com'. The waitUntil: 'networkidle2' option makes goto() wait until there have been no more than two network connections for at least 500 ms. Once the page is loaded, we generate a PDF of the page with the page.pdf() function and save it as 'example.pdf'.

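The page.pdf() function takes an options object where you can set properties such as format, landscape, margin, printBackground, and header and footer templates (headerTemplate and footerTemplate, enabled with displayHeaderFooter). In this case, we're setting the format to 'A4'.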
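To make the options object more concrete, here is a minimal sketch that sets several options at once. The helper name buildPdfOptions, the margin values, and the template markup are assumptions made up for this example; the option names themselves (format, printBackground, margin, displayHeaderFooter, headerTemplate, footerTemplate) are ones page.pdf() accepts.

```javascript
// Hypothetical helper (not part of Puppeteer): returns an options object
// that can be passed straight to page.pdf().
function buildPdfOptions(outputPath) {
  return {
    path: outputPath,              // where the PDF file is written
    format: 'A4',                  // paper size
    printBackground: true,         // include CSS background colors and images
    margin: { top: '20mm', bottom: '20mm', left: '15mm', right: '15mm' },
    displayHeaderFooter: true,     // required for the templates below to render
    // Puppeteer fills in elements with the classes "title", "pageNumber",
    // and "totalPages" when it prints the page.
    headerTemplate:
      '<div style="font-size:9px; width:100%; text-align:center;">' +
      '<span class="title"></span></div>',
    footerTemplate:
      '<div style="font-size:9px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
  };
}

// Usage inside the script from Step 2:
//   await page.pdf(buildPdfOptions('example.pdf'));
```

The header and footer are drawn inside the page margins, so the top and bottom margins need to be large enough to leave room for them.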

Finally, we close the browser.

Step 3: Run the Script

To run the script, save it to a file (e.g., generatePDF.js) and run it with Node:

node generatePDF.js

If everything goes well, you should see a file named 'example.pdf' in the directory you ran the command from (the path option is resolved relative to the current working directory). This file is the PDF version of the webpage you chose.

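Keep in mind that PDF generation in Puppeteer has historically been supported only in headless mode. This is the default for puppeteer.launch(), so avoid passing headless: false unless you have confirmed that your Puppeteer and Chrome versions can generate PDFs with a visible browser window.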

This simple script is the basic usage of Puppeteer for generating PDFs. You can adjust it according to your needs, for example, by adding error handling or by generating PDFs from multiple pages.
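As one possible way to combine both adjustments, the sketch below renders a list of URLs and adds basic error handling. The function name generatePdfs, the shape of the jobs array, and the result objects are assumptions made for this example. The Puppeteer module is passed in as an argument so that nothing is launched until the function is called.

```javascript
// Sketch: render several URLs to PDF, one after another.
// `puppeteer` is the module returned by require('puppeteer');
// `jobs` is a hypothetical array of { url, path } objects.
async function generatePdfs(puppeteer, jobs) {
  const browser = await puppeteer.launch();
  const results = [];
  try {
    for (const { url, path } of jobs) {
      const page = await browser.newPage();
      try {
        await page.goto(url, { waitUntil: 'networkidle2' });
        await page.pdf({ path, format: 'A4' });
        results.push({ url, path, ok: true });
      } catch (err) {
        // Record the failure and carry on with the next URL.
        results.push({ url, path, ok: false, error: err.message });
      } finally {
        // Close each page so tabs do not accumulate.
        await page.close();
      }
    }
  } finally {
    // Always close the browser, even if something above threw.
    await browser.close();
  }
  return results;
}

// Example usage (URL and file name are placeholders):
//   const puppeteer = require('puppeteer');
//   generatePdfs(puppeteer, [
//     { url: 'http://example.com', path: 'example.pdf' },
//   ]).then(console.log);
```

Reusing a single browser instance for all URLs avoids the cost of launching Chrome once per PDF, and the finally blocks ensure the browser process does not linger if a page fails to load.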
