How to handle file downloads in Puppeteer?

Handling file downloads in Puppeteer can be a bit tricky. Puppeteer is designed to automate and control Chrome or Chromium, and it has, at least historically, not exposed a dedicated high-level API for saving downloaded files.

However, there are workarounds to handle file downloads in Puppeteer. Here's an example of how to do this in JavaScript:

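Step 1: Launch the browser

First, launch the browser with puppeteer.launch. The example uses the headless: false and userDataDir options:

const browser = await puppeteer.launch({
  headless: false,
  userDataDir: './data',
});

Note that userDataDir sets the directory for the browser profile (cache, cookies, settings), which Chrome creates as data if it does not exist. It does not by itself decide where downloaded files are stored; that is configured in Step 2. Running with headless: false makes the download easy to observe, but once the download behavior from Step 2 is set, downloads generally work in headless mode as well.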

Step 2: Set the download behavior

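Next, set the download behavior for the page through a Chrome DevTools Protocol (CDP) session. Here, you set the behavior to allow and point the download path at the data directory:

const path = require('path');
const page = await browser.newPage();
// page._client is a private API that has changed between versions, so open a CDP session instead.
// On older Puppeteer versions, use page.target().createCDPSession().
const client = await page.createCDPSession();
await client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: path.resolve('./data'), // an absolute path is the safer choice
});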
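
Chrome does not necessarily create the download directory for you, and an absolute path is commonly reported to behave more reliably than a relative one. A minimal sketch of a helper for this follows; ensureDownloadDir is a hypothetical name, not part of Puppeteer, and it uses only Node's built-in fs and path modules:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of Puppeteer): create the download
// directory if it is missing and return its absolute path.
function ensureDownloadDir(dir) {
  const absolute = path.resolve(dir);
  fs.mkdirSync(absolute, { recursive: true }); // no-op if it already exists
  return absolute;
}

// Usage with the CDP call from Step 2:
//   downloadPath: ensureDownloadDir('./data')
```

Resolving the path once and reusing the returned value also keeps later code that inspects the directory independent of the current working directory.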

Step 3: Navigate and Download

Then, navigate to the page and click the download link:

await page.goto('https://example.com');
await page.click('#download_button');

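This starts the download into the data directory. Note that page.click() resolves as soon as the click has been dispatched, not when the file has finished downloading, so keep the browser open until the file is complete.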
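
One way to know when the file has actually arrived is to poll the download directory. The sketch below assumes Chrome's convention of giving in-progress downloads a .crdownload suffix; waitForDownload and isPartialDownload are hypothetical helper names, and the timeout and polling interval are arbitrary defaults:

```javascript
const fs = require('fs');

// Chrome writes in-progress downloads with a ".crdownload" suffix.
function isPartialDownload(fileName) {
  return fileName.endsWith('.crdownload');
}

// Hypothetical helper: resolve with the names of files that appeared in `dir`
// since the `before` listing was taken, once no partial download remains.
async function waitForDownload(dir, before = [], timeoutMs = 30000, pollMs = 250) {
  const known = new Set(before);
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const names = fs.readdirSync(dir);
    const fresh = names.filter((n) => !known.has(n) && !isPartialDownload(n));
    const pending = names.some((n) => isPartialDownload(n));
    if (fresh.length > 0 && !pending) return fresh;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}

// Usage sketch (assumes `page` and an existing `downloadDir`):
//   const before = fs.readdirSync(downloadDir);
//   await page.click('#download_button');
//   const [fileName] = await waitForDownload(downloadDir, before);
```

This works best when the directory is used only for downloads. If it doubles as the browser profile directory, unrelated profile files created after the snapshot could be mistaken for a finished download.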

Remember that this approach depends on a browser-level setting and might not work for all websites. If the file is served from a plain link, a more robust approach can be to use Puppeteer to get the download URL and then fetch the file with an HTTP client such as axios (request-promise, sometimes suggested for this, is deprecated).

const axios = require('axios');
const fs = require('fs');

// Get the download link with Puppeteer (assumes #download_button is an <a> element with an href)...
const downloadLink = await page.$eval('#download_button', (el) => el.href);

// Then download the file with Axios...
const response = await axios({
  url: downloadLink,
  method: 'GET',
  responseType: 'stream',
});

const writer = fs.createWriteStream('./data/file.pdf');

response.data.pipe(writer);

writer.on('finish', () => {
  console.log('Download finished');
});

writer.on('error', (error) => {
  console.error('Error occurred:', error);
});

The code above uses Puppeteer to get the download link, and then uses Axios to download the file. The file is written to the disk with the fs module.
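
One caveat: the Axios request does not share the browser's session, so a link that works in the page may be rejected when the site requires a login. A minimal sketch of forwarding the browser's cookies follows, assuming the site authenticates with cookies; toCookieHeader is a hypothetical helper name, and it expects cookie objects with name and value fields such as those returned by page.cookies():

```javascript
// Hypothetical helper: build a Cookie header value from cookie objects
// shaped like { name, value }, such as those returned by page.cookies().
function toCookieHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

// Usage sketch (assumes `page`, `axios`, and `downloadLink` from the example above):
//   const cookies = await page.cookies();
//   const response = await axios({
//     url: downloadLink,
//     method: 'GET',
//     responseType: 'stream',
//     headers: { Cookie: toCookieHeader(cookies) },
//   });
```

Sites that rely on other mechanisms, such as authorization headers or one-time download tokens, would need those values forwarded instead.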

In conclusion, Puppeteer is an excellent tool for web scraping and automation, but it has limitations when it comes to file downloads. Using Puppeteer in conjunction with a library like Axios for file downloads can be a good solution.
