How can I use Puppeteer with Docker?

Running Puppeteer inside Docker is a common way to deploy web scraping applications to production: the container pins Chrome, fonts, and system libraries so runs are reproducible across environments. This guide covers current best practices for containerizing Puppeteer applications.

Quick Start Dockerfile

Here's a production-ready Dockerfile based on a slim Node.js image, following current security best practices:

FROM node:18-slim

# Install Chrome dependencies
RUN apt-get update \
    && apt-get install -y wget gnupg ca-certificates \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/google-chrome.gpg \
    && echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-chrome.gpg] http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
    && mkdir -p /home/pptruser/Downloads \
    && chown -R pptruser:pptruser /home/pptruser

# Set working directory
WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci --omit=dev

# Copy application code
COPY . .

# Change ownership to non-root user
RUN chown -R pptruser:pptruser /app

# Switch to non-root user
USER pptruser

CMD ["node", "index.js"]
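One caveat with this image: npm ci will still download Puppeteer's bundled Chromium even though Chrome is installed via apt. Setting two environment variables before the install step avoids the duplicate download and points Puppeteer at the system binary (note that newer Puppeteer releases spell the skip variable PUPPETEER_SKIP_DOWNLOAD):

```dockerfile
# Use the system Chrome instead of downloading a bundled Chromium
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
```

Place these lines above the RUN npm ci instruction so the variables are set when Puppeteer's install script runs.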

Alternative: Using Official Puppeteer Image

For simpler setups, use the official Puppeteer Docker image:

FROM ghcr.io/puppeteer/puppeteer:21.5.2

# Copy package files (the image runs as pptruser, so copy with matching ownership)
COPY --chown=pptruser:pptruser package*.json ./

# Install dependencies (Puppeteer already installed)
RUN npm ci --omit=dev

# Copy application code
COPY --chown=pptruser:pptruser . .

CMD ["node", "index.js"]
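Both Dockerfiles copy the entire build context with COPY . ., so a .dockerignore file next to the Dockerfile keeps node_modules, local output, and VCS metadata out of the image (a minimal example; adjust to your project):

```
node_modules
output
npm-debug.log
.git
.dockerignore
Dockerfile
```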

Secure Container Configuration

1. Build and Run Commands

# Build the image
docker build -t puppeteer-app .

# Run with Chrome's own sandbox enabled: SYS_ADMIN grants the privileges
# the sandbox needs, at the cost of a broader container attack surface
docker run --rm --init --cap-add=SYS_ADMIN \
  puppeteer-app

# Run without extra capabilities (requires launching Chrome with
# --no-sandbox, as in the configuration below)
docker run --rm --init puppeteer-app

2. Docker Compose Setup

version: '3.8'
services:
  puppeteer:
    build: .
    init: true
    cap_add:
      - SYS_ADMIN
    volumes:
      - ./output:/app/output
    environment:
      - NODE_ENV=production
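Chrome makes heavy use of /dev/shm, which defaults to 64 MB in Docker; in Compose this is raised with shm_size, the equivalent of docker run --shm-size (the fragment below extends the service definition above):

```yaml
services:
  puppeteer:
    # ...same service definition as above...
    shm_size: '1gb'
```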

Puppeteer Configuration for Docker

Basic Configuration

const puppeteer = require('puppeteer');

const launchOptions = {
  headless: 'new',
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-accelerated-2d-canvas',
    '--no-first-run',
    '--no-zygote',
    '--single-process',
    '--disable-gpu'
  ]
};

(async () => {
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();

  await page.goto('https://example.com');
  await page.screenshot({
    path: '/app/output/screenshot.png',
    fullPage: true
  });

  await browser.close();
})();

Production-Ready Example

const puppeteer = require('puppeteer');

class DockerPuppeteerService {
  constructor() {
    this.launchOptions = {
      headless: 'new',
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-dev-shm-usage',
        '--disable-accelerated-2d-canvas',
        '--no-first-run',
        '--no-zygote',
        '--disable-gpu',
        '--disable-features=VizDisplayCompositor'
      ],
      timeout: 30000
    };
  }

  async scrapeWithRetry(url, maxRetries = 3) {
    let browser;

    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        browser = await puppeteer.launch(this.launchOptions);
        const page = await browser.newPage();

        await page.setViewport({ width: 1920, height: 1080 });
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 15000 });

        const data = await page.evaluate(() => {
          return {
            title: document.title,
            content: document.body.innerText,
            url: window.location.href
          };
        });

        return data;

      } catch (error) {
        console.error(`Attempt ${attempt} failed:`, error.message);
        if (attempt === maxRetries) throw error;
        await new Promise(resolve => setTimeout(resolve, 1000 * attempt));

      } finally {
        if (browser) {
          await browser.close();
        }
      }
    }
  }
}

// Usage
(async () => {
  const scraper = new DockerPuppeteerService();
  try {
    const result = await scraper.scrapeWithRetry('https://example.com');
    console.log('Scraped data:', result);
  } catch (error) {
    console.error('Scraping failed:', error);
    process.exit(1);
  }
})();

Troubleshooting Common Issues

Memory Issues

If you encounter memory problems, increase shared memory:

docker run --rm --init --shm-size=1gb \
  --cap-add=SYS_ADMIN puppeteer-app
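Beyond shared memory, it helps to cap total memory and CPU so a leaking Chrome process cannot starve the host (standard docker run flags):

docker run --rm --init --shm-size=1gb \
  --memory=2g --cpus=1.5 \
  puppeteer-app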

Permission Errors

Ensure proper user permissions in your Dockerfile:

# Create and switch to non-root user
RUN addgroup --system --gid 1001 nodejs \
    && adduser --system --uid 1001 --gid 1001 nodejs
USER nodejs

Font Rendering Issues

Install additional fonts for international content:

RUN apt-get update && apt-get install -y \
  fonts-liberation \
  fonts-noto-color-emoji \
  fonts-noto-cjk-extra \
  && rm -rf /var/lib/apt/lists/*

Best Practices

  1. Use multi-stage builds to reduce image size
  2. Pin specific versions of Node.js and Puppeteer
  3. Run as non-root user for security
  4. Set resource limits in production
  5. Use health checks to monitor container status
  6. Handle graceful shutdowns with proper signal handling
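For point 6: docker stop sends SIGTERM and force-kills the container after a grace period (10 seconds by default), so the process should close the browser promptly on that signal. A minimal sketch — registerShutdown and its injectable proc parameter are illustrative names, not a Puppeteer API:

```javascript
// Register SIGTERM/SIGINT handlers that run an async cleanup
// (e.g. browser.close()) exactly once, then exit.
function registerShutdown(cleanup, proc = process) {
  let closing = false;
  const handler = async () => {
    if (closing) return; // ignore repeated signals
    closing = true;
    try {
      await cleanup(); // e.g. await browser.close()
    } finally {
      proc.exit(0);
    }
  };
  proc.on('SIGTERM', handler);
  proc.on('SIGINT', handler);
  return handler; // returned for testing/inspection
}
```

In the service class above this would be wired up as registerShutdown(() => browser.close()) after launching the browser.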

This setup provides a robust foundation for running Puppeteer applications in Docker containers with proper security and performance considerations.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
