Running Puppeteer in a serverless environment can be tricky, because Puppeteer depends on a Chromium binary and the system libraries Chromium needs. It is certainly possible, though, and Google Cloud Functions is one popular provider where it can be run. Here's how you can do it:
Steps for Running Puppeteer in Google Cloud Functions
- Install Puppeteer: Add Puppeteer to your project by running the following command:
    npm install puppeteer
- Install the serverless-google-cloudfunctions plugin: The Serverless Framework needs this plugin to deploy your function to Google Cloud. With the framework itself already installed, add the plugin as a dev dependency:
    npm install --save-dev serverless-google-cloudfunctions
- Configure serverless.yml: Your serverless.yml should look something like this:
    service: puppeteer-gcf
    provider:
      name: google
      # Use a currently supported Node.js runtime; nodejs10 has been decommissioned.
      runtime: nodejs20
      project: your-gcp-project-id
      credentials: ~/.gcp/keyfile.json
    plugins:
      - serverless-google-cloudfunctions
    package:
      include:
        - node_modules/**
        - package.json
        - index.js
    functions:
      puppeteerFunc:
        # Must match the name exported from index.js.
        handler: puppeteerFunc
        events:
          - http: path
- Write your Cloud Function: Here's a basic example of a Puppeteer script running in a Google Cloud Function.
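    const puppeteer = require('puppeteer');

    // HTTP-triggered function: loads a page and responds with its title.
    exports.puppeteerFunc = async (req, res) => {
      // The sandbox flags are required in the Cloud Functions environment.
      const browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
        headless: true
      });
      try {
        const page = await browser.newPage();
        await page.goto('https://example.com');
        const title = await page.title();
        res.status(200).send(`Title of the page: ${title}`);
      } finally {
        // Always close the browser so a failed request does not leak a Chromium process.
        await browser.close();
      }
    };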
- Deploy your function: Run the following command from the directory that contains serverless.yml:
    sls deploy
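Before deploying, it can help to exercise the handler's control flow locally without downloading Chromium. The following is a minimal sketch under stated assumptions: `createHandler`, `fakeBrowser`, and `fakeResponse` are hypothetical names introduced here for illustration, not part of Puppeteer or Cloud Functions. The handler is built from an injected launch function, so production can pass `puppeteer.launch` while a local check passes a stand-in. Unlike the basic example, this version also answers with a 500 status when the browser fails.

```javascript
// Hypothetical factory: builds the handler from any `launch` function, so the
// same code can run with puppeteer.launch in production and a fake locally.
function createHandler(launch, targetUrl = 'https://example.com') {
  return async (req, res) => {
    let browser;
    try {
      browser = await launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
        headless: true,
      });
      const page = await browser.newPage();
      await page.goto(targetUrl);
      const title = await page.title();
      res.status(200).send(`Title of the page: ${title}`);
    } catch (err) {
      // Report the failure instead of leaving the request hanging.
      res.status(500).send(`Puppeteer failed: ${err.message}`);
    } finally {
      // Close the browser only if the launch actually succeeded.
      if (browser) await browser.close();
    }
  };
}

// Stand-in for a browser: just enough surface for the handler above.
const fakeBrowser = {
  closed: false,
  newPage: async () => ({
    goto: async () => {},
    title: async () => 'Example Domain',
  }),
  close: async function () {
    this.closed = true;
  },
};

// Stand-in for the Express-style response object that Cloud Functions passes in.
function fakeResponse() {
  return {
    statusCode: null,
    body: null,
    status(code) {
      this.statusCode = code;
      return this;
    },
    send(body) {
      this.body = body;
      return this;
    },
  };
}

// Production wiring (assumes puppeteer is installed):
//   exports.puppeteerFunc = createHandler((opts) => require('puppeteer').launch(opts));
```

Calling `createHandler(async () => fakeBrowser)({}, fakeResponse())` runs the whole request path in memory, which makes it easy to confirm that the browser is closed on both the success and the failure path.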
Note:
- You need to enable the Cloud Functions API and the Cloud Build API in the Google Cloud Console for your project.
- The --no-sandbox and --disable-setuid-sandbox flags are necessary to run Puppeteer in the Google Cloud Function environment.
- This example uses the Serverless Framework, but you can also use the gcloud CLI tool to deploy the function.
Remember that running Puppeteer on serverless might not be the best solution for every use case: both the cold-start time and the time it takes to launch a new browser instance add latency to a request. It's recommended for light usage and for tasks that can afford a bit of latency.
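One way to soften the browser-launch cost is to keep the browser in module scope, since a warm function instance may serve several requests in a row. The sketch below assumes that behavior and is not taken from the example above: `createBrowserCache` is a hypothetical helper, not a Puppeteer API. It relies only on the browser object emitting a `disconnected` event, which Puppeteer's `Browser` does. It cannot help with the cold start itself, only with later requests on the same instance.

```javascript
// Hypothetical helper: caches one browser per function instance so that warm
// invocations skip the launch. `launch` is any function returning a promise
// for a browser, for example () => puppeteer.launch({...}).
function createBrowserCache(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().then((browser) => {
        // Forget the cached browser if Chromium crashes or is closed.
        browser.once('disconnected', () => {
          browserPromise = null;
        });
        return browser;
      });
      // A failed launch must not stay cached, or every later call would fail too.
      browserPromise.catch(() => {
        browserPromise = null;
      });
    }
    return browserPromise;
  };
}

// Possible wiring in index.js (assumes puppeteer is installed):
//   const puppeteer = require('puppeteer');
//   const getBrowser = createBrowserCache(() =>
//     puppeteer.launch({
//       args: ['--no-sandbox', '--disable-setuid-sandbox'],
//       headless: true,
//     })
//   );
//   exports.puppeteerFunc = async (req, res) => {
//     const browser = await getBrowser();
//     const page = await browser.newPage();
//     try {
//       await page.goto('https://example.com');
//       res.status(200).send(`Title of the page: ${await page.title()}`);
//     } finally {
//       await page.close(); // close the page, keep the browser for the next request
//     }
//   };
```

With this design each request closes only its own page and leaves the browser running. The trade-off is that a long-lived Chromium process holds memory between requests, so the function's memory limit may need to be sized with that in mind.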