How can I use Puppeteer in a serverless environment?

Running Puppeteer in a serverless environment is tricky because Puppeteer depends on a full Chromium browser and the system libraries it needs. It is still possible, and Google Cloud Functions is one popular provider that supports it. The steps below use the Serverless Framework to deploy a function there.

Steps for Running Puppeteer in Google Cloud Functions

  1. Install Puppeteer: Add Puppeteer to your project by running the following command:
npm install puppeteer
  2. Install the serverless-google-cloudfunctions plugin: The Serverless Framework needs this plugin to deploy to Google Cloud. Install it as a development dependency:
npm install --save-dev serverless-google-cloudfunctions
  3. Configure serverless.yml: Replace the project ID and credentials path with your own. The file should look something like this:
service: puppeteer-gcf

provider:
  name: google
  runtime: nodejs20 # nodejs10 is no longer supported; use a current Node.js runtime
  project: your-gcp-project-id
  credentials: ~/.gcp/keyfile.json

plugins:
  - serverless-google-cloudfunctions

package:
  include:
    - node_modules/**
    - package.json
    - index.js

functions:
  puppeteerFunc:
    handler: puppeteerFunc
    events:
      - http: path
  4. Write your Cloud Function: Here is a basic index.js that opens a page with Puppeteer and returns its title.
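const puppeteer = require('puppeteer');

// Launches a browser, reads the page title, and always closes the browser.
async function getTitle(url) {
    const browser = await puppeteer.launch({
        // Chromium's sandbox has to be disabled in Cloud Functions.
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
        headless: true
    });
    try {
        const page = await browser.newPage();
        await page.goto(url);
        return await page.title();
    } finally {
        // Runs even if navigation fails, so no browser process is leaked.
        await browser.close();
    }
}

// HTTP-triggered entry point; the name matches the handler in serverless.yml.
exports.puppeteerFunc = async (req, res) => {
    try {
        const title = await getTitle('https://example.com');
        res.status(200).send(`Title of the page: ${title}`);
    } catch (err) {
        console.error(err);
        res.status(500).send('Failed to load the page');
    }
};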
  5. Deploy your function: Deploy it with the following command:
sls deploy

Note:

  • You need to enable the Cloud Functions API and the Cloud Build API in the Google Cloud Console for your project.
  • The --no-sandbox and --disable-setuid-sandbox flags disable Chromium's sandbox. They are necessary to run Puppeteer in the Google Cloud Functions environment.
  • This example uses the Serverless Framework, but you can also use the gcloud CLI tool to deploy the function.

Keep in mind that running Puppeteer on serverless is not the best fit for every use case. Each cold start adds delay, and launching a new browser instance takes additional time on top of that. It works best for light usage and for tasks that can tolerate some latency.
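
One common way to reduce the browser launch cost is to keep a single browser alive per function instance and reuse it while the instance stays warm. The sketch below illustrates that idea and is not part of the original example. The helper names createBrowserCache and makeHandler are hypothetical, and it assumes the platform keeps module-level state between invocations of a warm instance, which is not guaranteed. The launch function is passed in so the caching logic does not depend on Puppeteer directly.

```javascript
// Sketch only: createBrowserCache and makeHandler are hypothetical helper names.

// Returns a getBrowser() function that launches at most one browser per
// function instance and reuses it across warm invocations.
function createBrowserCache(launch) {
  let cached = null; // Promise for the shared browser, or null if none is live

  return function getBrowser() {
    if (!cached) {
      const pending = launch().then((browser) => {
        // Forget the browser once it crashes or is closed, so the next
        // invocation launches a fresh one.
        browser.once('disconnected', () => {
          if (cached === pending) cached = null;
        });
        return browser;
      });
      // Forget a failed launch as well; callers still see the rejection.
      pending.catch(() => {
        if (cached === pending) cached = null;
      });
      cached = pending;
    }
    return cached;
  };
}

// Builds an HTTP handler that opens a new page per request but keeps the browser.
function makeHandler(getBrowser) {
  return async (req, res) => {
    const browser = await getBrowser();
    const page = await browser.newPage();
    let title;
    try {
      await page.goto('https://example.com');
      title = await page.title();
    } finally {
      await page.close(); // close the page, not the shared browser
    }
    res.status(200).send(`Title of the page: ${title}`);
  };
}

// Wiring in index.js (assumes the puppeteer package is installed):
// const puppeteer = require('puppeteer');
// exports.puppeteerFunc = makeHandler(createBrowserCache(() =>
//   puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] })));
```

With this approach only the first request on a fresh instance pays for the launch. Cold starts still apply whenever the platform spins up a new instance.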
