Handling file downloads in Puppeteer can be tricky: Puppeteer is designed to automate and control Chrome or Chromium rather than to manage file downloads.
However, there are workarounds. Here's an example of how to handle downloads in JavaScript:
Step 1: Launch the browser with a data directory
First, launch the browser with the headless: false and userDataDir options in puppeteer.launch:
const browser = await puppeteer.launch({
  headless: false,
  userDataDir: './data',
});
This creates a directory named data. Note that userDataDir is where Chrome stores its profile (cookies, cache, preferences); it is not a download setting by itself. The download location is configured in Step 2, which in this example points at the same data directory.
Step 2: Set the download behavior
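Next, set the download behavior for the page through a Chrome DevTools Protocol (CDP) session. The Page.setDownloadBehavior command takes a behavior of allow and the download path:
const path = require('path');
const page = await browser.newPage();
// Use a public CDP session instead of the private page._client property.
const client = await page.target().createCDPSession();
await client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: path.resolve('./data'), // absolute path
});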
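The download path is safest as an absolute path to a directory that already exists, since relative paths are reported to be unreliable with this command. A minimal sketch of preparing such a path follows; ensureDownloadDir is a hypothetical helper name, not part of Puppeteer:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve a download directory to an absolute path
// and create it (including missing parents) if it does not exist yet.
function ensureDownloadDir(dir) {
  const absolute = path.resolve(dir);
  fs.mkdirSync(absolute, { recursive: true });
  return absolute;
}

// Usage sketch: pass the result as `downloadPath` in Step 2, e.g.
//   downloadPath: ensureDownloadDir('./data')
```

Creating the directory up front also makes it easy to inspect its contents later without first checking whether it exists.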
Step 3: Navigate and Download
Then, navigate to the page and click the download link:
await page.goto('https://example.com');
await page.click('#download_button');
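This downloads the file to the data directory. Note that page.click resolves once the click has been dispatched, not when the download has completed, so keep the browser open until the file has been fully written.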
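One way to know when the download has finished is to poll the download directory. The sketch below is illustrative only: waitForDownload is a hypothetical helper, it assumes Chrome marks in-progress downloads with a .crdownload suffix, and it assumes the directory is used only for downloads (a directory shared with the browser profile would also receive unrelated files).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve with the path of the first new file in `dir`
// that is not an in-progress `.crdownload` file, or reject on timeout.
async function waitForDownload(dir, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  // Snapshot taken synchronously, so files that exist before the click are ignored.
  const before = new Set(fs.readdirSync(dir));
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const names = await fs.promises.readdir(dir);
    const done = names.find(
      (name) => !before.has(name) && !name.endsWith('.crdownload')
    );
    if (done) return path.join(dir, done);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}

// Usage sketch (inside an async function, with `page` from Puppeteer):
//   const pending = waitForDownload(downloadDir);
//   await page.click('#download_button');
//   const filePath = await pending;
```

Starting the wait before the click means the snapshot of existing files is taken first, so a fast download is not mistaken for an old file.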
Remember that Puppeteer's file download capabilities are limited and might not work for all websites. A more robust approach is often to use Puppeteer only to obtain the download link and then fetch the file with an HTTP library such as axios (request-promise was a common choice in the past but is now deprecated). This works when the link points directly at the file and the request does not depend on the browser's session.
const axios = require('axios');
const fs = require('fs');
const { pipeline } = require('stream/promises');
// Get the download link with Puppeteer...
const downloadLink = await page.evaluate(() => {
  return document.querySelector('#download_button').href;
});
// Then download the file with Axios as a stream...
const response = await axios({
  url: downloadLink,
  method: 'GET',
  responseType: 'stream',
});
// pipeline() resolves once the file is fully written and rejects if
// either the HTTP stream or the file stream fails.
try {
  await pipeline(response.data, fs.createWriteStream('./data/file.pdf'));
  console.log('Download finished');
} catch (error) {
  console.error('Error occurred:', error);
}
The code above uses Puppeteer to get the download link and then uses Axios to download the file. The file is written to disk with the fs module.
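The example writes to a fixed file name, file.pdf. If the name should follow the link instead, it can be derived from the URL. This is a small sketch under that assumption; filenameFromUrl and the fallback name download.bin are hypothetical, and servers may suggest a different name through response headers, which this sketch ignores.

```javascript
const path = require('path');

// Hypothetical helper: derive a local file name from a download URL,
// using `fallback` when the URL path has no usable last segment.
function filenameFromUrl(downloadUrl, fallback = 'download.bin') {
  const { pathname } = new URL(downloadUrl);
  // Decode percent-escapes, then keep only the last path segment.
  const name = path.posix.basename(decodeURIComponent(pathname));
  return name && name !== '.' && name !== '..' ? name : fallback;
}

console.log(filenameFromUrl('https://example.com/files/report.pdf?token=abc')); // → report.pdf
console.log(filenameFromUrl('https://example.com/')); // → download.bin
```

The result can then be joined with the download directory, for example path.join('./data', filenameFromUrl(downloadLink)), before creating the write stream.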
In conclusion, Puppeteer is an excellent tool for web scraping and automation, but it has limitations when it comes to file downloads. Using Puppeteer in conjunction with a library like Axios for file downloads can be a good solution.