How to Handle Memory Leaks in Puppeteer?

Memory leaks in Puppeteer applications are a common issue that can lead to degraded performance, system crashes, and resource exhaustion. This comprehensive guide covers how to identify, prevent, and fix memory leaks in your Puppeteer-based web scraping and automation projects.

Understanding Memory Leaks in Puppeteer

Memory leaks in Puppeteer typically occur when browser resources aren't properly released after use. This includes browser instances, pages, and DOM elements that remain in memory even after they're no longer needed. Over time, these accumulated resources can consume significant system memory and cause your application to slow down or crash.

Common Causes of Memory Leaks

1. Not Closing Browser Instances

The most common cause of memory leaks is failing to close browser instances:

// ❌ Bad - Memory leak
const puppeteer = require('puppeteer');

async function scrapeWebsite() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Perform scraping operations
  const data = await page.evaluate(() => {
    return document.title;
  });

  // Browser instance is never closed - MEMORY LEAK!
  return data;
}
// ✅ Good - Proper cleanup
const puppeteer = require('puppeteer');

async function scrapeWebsite() {
  const browser = await puppeteer.launch();

  try {
    const page = await browser.newPage();

    // Perform scraping operations
    const data = await page.evaluate(() => {
      return document.title;
    });

    return data;
  } finally {
    await browser.close(); // Always close the browser
  }
}

2. Not Disposing of Pages

Creating multiple pages without properly closing them:

// ❌ Bad - Multiple pages without cleanup
async function scrapeMultiplePages(urls) {
  const browser = await puppeteer.launch();
  const results = [];

  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);

    const data = await page.evaluate(() => {
      return document.title;
    });

    results.push(data);
    // Page is never closed - MEMORY LEAK!
  }

  await browser.close();
  return results;
}
// ✅ Good - Proper page cleanup
async function scrapeMultiplePages(urls) {
  const browser = await puppeteer.launch();
  const results = [];

  try {
    for (const url of urls) {
      const page = await browser.newPage();

      try {
        await page.goto(url);

        const data = await page.evaluate(() => {
          return document.title;
        });

        results.push(data);
      } finally {
        await page.close(); // Always close the page
      }
    }

    return results;
  } finally {
    await browser.close();
  }
}
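The loop above processes URLs one at a time. When throughput matters, the work can be parallelized while keeping the same try/finally cleanup guarantee by capping concurrency. A minimal sketch of such a limiter (`mapWithLimit` is an illustrative helper, not a Puppeteer API):

```javascript
// Runs `worker` over `items`, with at most `limit` workers in flight.
// JavaScript is single-threaded, so `next++` between awaits is race-free.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners = Array.from(
    { length: Math.min(limit, items.length) },
    runner
  );
  await Promise.all(runners);
  return results;
}
```

Each worker would then open its own page inside a try/finally block, exactly as in the example above, so a failed URL still closes its page.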

3. Event Listeners Not Removed

Listeners attached to a long-lived page keep their handler closures, and everything those closures capture, in memory until they are removed:

// ❌ Bad - Event listeners not removed
async function setupPageWithListeners() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Adding event listeners
  page.on('console', (msg) => console.log(msg.text()));
  page.on('pageerror', (err) => console.error(err));
  page.on('response', (response) => console.log(response.url()));

  // Page operations...
  await page.goto('https://example.com');

  // Event listeners are still active - potential memory leak
  await page.close();
  await browser.close();
}
// ✅ Good - Proper event listener cleanup
async function setupPageWithListeners() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Define event handlers
  const consoleHandler = (msg) => console.log(msg.text());
  const errorHandler = (err) => console.error(err);
  const responseHandler = (response) => console.log(response.url());

  // Add event listeners
  page.on('console', consoleHandler);
  page.on('pageerror', errorHandler);
  page.on('response', responseHandler);

  try {
    // Page operations...
    await page.goto('https://example.com');
  } finally {
    // Remove event listeners
    page.off('console', consoleHandler);
    page.off('pageerror', errorHandler);
    page.off('response', responseHandler);

    await page.close();
    await browser.close();
  }
}

Best Practices for Memory Management

1. Use Browser Pooling

For applications that need to handle multiple concurrent requests, implement browser pooling:

class BrowserPool {
  constructor(maxBrowsers = 5) {
    this.maxBrowsers = maxBrowsers;
    this.browsers = [];
    this.busyBrowsers = new Set();
  }

  async getBrowser() {
    // Find available browser
    const availableBrowser = this.browsers.find(
      browser => !this.busyBrowsers.has(browser)
    );

    if (availableBrowser) {
      this.busyBrowsers.add(availableBrowser);
      return availableBrowser;
    }

    // Create new browser if under limit
    if (this.browsers.length < this.maxBrowsers) {
      const browser = await puppeteer.launch();
      this.browsers.push(browser);
      this.busyBrowsers.add(browser);
      return browser;
    }

    // Wait for available browser
    return new Promise((resolve) => {
      const checkAvailable = () => {
        const browser = this.browsers.find(
          b => !this.busyBrowsers.has(b)
        );
        if (browser) {
          this.busyBrowsers.add(browser);
          resolve(browser);
        } else {
          setTimeout(checkAvailable, 100);
        }
      };
      checkAvailable();
    });
  }

  releaseBrowser(browser) {
    this.busyBrowsers.delete(browser);
  }

  async closeAll() {
    await Promise.all(this.browsers.map(browser => browser.close()));
    this.browsers = [];
    this.busyBrowsers.clear();
  }
}
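The usual mistake with a pool like this is forgetting to call releaseBrowser on an error path, which permanently shrinks the pool. Wrapping acquire/release in a helper makes the finally guarantee reusable. A generic sketch of the pattern (`createPool` and `withResource` are illustrative names; in real code the factory would be `() => puppeteer.launch()`):

```javascript
// Generic bounded pool: resources are created lazily up to `max`,
// then reused from the idle list.
function createPool(factory, max = 5) {
  const idle = [];
  let total = 0;

  return {
    async acquire() {
      if (idle.length > 0) return idle.pop();
      if (total < max) {
        total++;
        return factory();
      }
      // Poll until a resource is released (same strategy as BrowserPool above)
      return new Promise((resolve) => {
        const check = () =>
          idle.length > 0 ? resolve(idle.pop()) : setTimeout(check, 50);
        check();
      });
    },
    release(resource) {
      idle.push(resource);
    },
  };
}

// Guarantees release even when `fn` throws
async function withResource(pool, fn) {
  const resource = await pool.acquire();
  try {
    return await fn(resource);
  } finally {
    pool.release(resource);
  }
}
```

With this wrapper, callers never touch release directly, so no error path can leak a pooled browser.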

2. Implement Proper Resource Cleanup

Create a utility class for managing Puppeteer resources:

class PuppeteerManager {
  constructor() {
    this.browsers = new Set();
    this.pages = new Set();
  }

  async createBrowser(options = {}) {
    const browser = await puppeteer.launch(options);
    this.browsers.add(browser);
    return browser;
  }

  async createPage(browser) {
    const page = await browser.newPage();
    this.pages.add(page);
    return page;
  }

  async closePage(page) {
    if (this.pages.has(page)) {
      await page.close();
      this.pages.delete(page);
    }
  }

  async closeBrowser(browser) {
    if (this.browsers.has(browser)) {
      await browser.close();
      this.browsers.delete(browser);
    }
  }

  async cleanup() {
    // Close all pages
    const pagePromises = Array.from(this.pages).map(page => 
      page.close().catch(err => console.error('Error closing page:', err))
    );
    await Promise.all(pagePromises);
    this.pages.clear();

    // Close all browsers
    const browserPromises = Array.from(this.browsers).map(browser => 
      browser.close().catch(err => console.error('Error closing browser:', err))
    );
    await Promise.all(browserPromises);
    this.browsers.clear();
  }
}

3. Monitor Memory Usage

Implement memory monitoring to detect potential leaks:

class MemoryMonitor {
  constructor(threshold = 500 * 1024 * 1024) { // 500MB threshold
    this.threshold = threshold;
    this.interval = null;
  }

  start() {
    this.interval = setInterval(() => {
      const memUsage = process.memoryUsage();
      const heapUsed = memUsage.heapUsed;

      console.log(`Memory usage: ${Math.round(heapUsed / 1024 / 1024)} MB`);

      if (heapUsed > this.threshold) {
        console.warn('Memory usage exceeds threshold!');
        // Force garbage collection if --expose-gc flag is used
        if (global.gc) {
          global.gc();
        }
      }
    }, 10000); // Check every 10 seconds
  }

  stop() {
    if (this.interval) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }
}
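One caveat to heap-based monitoring: Chromium runs in its own processes, so process.memoryUsage() only reflects the Node.js side. Within that limit, watching rss and external alongside heapUsed catches leaks in native buffers that never show up in the V8 heap. A small helper (`toMB` and `snapshotMemory` are illustrative names):

```javascript
function toMB(bytes) {
  return Math.round(bytes / 1024 / 1024);
}

function snapshotMemory() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return {
    rss: toMB(rss),           // whole Node process, including native allocations
    heapTotal: toMB(heapTotal), // V8 heap reserved
    heapUsed: toMB(heapUsed),   // V8 heap in use
    external: toMB(external),   // Buffers and other C++ objects bound to JS
  };
}
```

Chromium's own memory still has to be watched at the OS level (e.g. ps or a container metric), since it lives outside the Node process entirely.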

Advanced Memory Management Techniques

1. Page Reuse Strategy

Instead of creating new pages for each request, reuse pages when possible:

class PagePool {
  constructor(browser, maxPages = 10) {
    this.browser = browser;
    this.maxPages = maxPages;
    this.availablePages = [];
    this.busyPages = new Set();
  }

  async getPage() {
    if (this.availablePages.length > 0) {
      const page = this.availablePages.pop();
      this.busyPages.add(page);
      return page;
    }

    if (this.busyPages.size < this.maxPages) {
      const page = await this.browser.newPage();
      this.busyPages.add(page);
      return page;
    }

    // Wait for available page
    return new Promise((resolve) => {
      const checkAvailable = () => {
        if (this.availablePages.length > 0) {
          const page = this.availablePages.pop();
          this.busyPages.add(page);
          resolve(page);
        } else {
          setTimeout(checkAvailable, 100);
        }
      };
      checkAvailable();
    });
  }

  async releasePage(page) {
    if (this.busyPages.has(page)) {
      // Drop listeners added on the Node side while the page was in use
      page.removeAllListeners();

      // Clear in-page state; storage access throws on opaque origins
      // (e.g. about:blank), so guard it
      await page.evaluate(() => {
        try {
          localStorage.clear();
          sessionStorage.clear();
        } catch (err) {
          // Page has no accessible storage - nothing to clear
        }
      });

      // Navigate away so the old document, its DOM, and its in-page
      // listeners become collectible
      await page.goto('about:blank');

      this.busyPages.delete(page);
      this.availablePages.push(page);
    }
  }
}

2. Graceful Shutdown Handling

Implement proper shutdown handling to prevent memory leaks during application termination:

class GracefulShutdown {
  constructor() {
    this.resources = [];
    this.setupSignalHandlers();
  }

  addResource(resource) {
    this.resources.push(resource);
  }

  setupSignalHandlers() {
    const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT'];

    signals.forEach(signal => {
      process.on(signal, async () => {
        console.log(`Received ${signal}, shutting down gracefully...`);
        await this.cleanup();
        process.exit(0);
      });
    });

    process.on('uncaughtException', async (error) => {
      console.error('Uncaught Exception:', error);
      await this.cleanup();
      process.exit(1);
    });
  }

  async cleanup() {
    const cleanupPromises = this.resources.map(resource => {
      if (typeof resource.cleanup === 'function') {
        return resource.cleanup();
      }
      return Promise.resolve();
    });

    await Promise.all(cleanupPromises);
  }
}
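One subtlety the class above leaves open: if a second signal arrives while cleanup is still running (a user pressing Ctrl+C twice, say), cleanup() runs again against half-closed resources. Memoizing the cleanup promise makes it idempotent. A minimal sketch (`onceOnly` and `installShutdown` are illustrative names):

```javascript
// Returns a function that runs `cleanup` at most once; every caller
// awaits the same promise, so concurrent triggers can't double-close.
function onceOnly(cleanup) {
  let pending = null;
  return () => {
    if (!pending) pending = Promise.resolve().then(cleanup);
    return pending;
  };
}

function installShutdown(cleanup) {
  const runCleanup = onceOnly(cleanup);
  ['SIGINT', 'SIGTERM'].forEach((signal) =>
    process.once(signal, async () => {
      await runCleanup();
      process.exit(0);
    })
  );
  return runCleanup;
}
```

In a Puppeteer script, `installShutdown(() => manager.cleanup())` wires this to a resource manager like the one shown earlier.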

Python Example with pyppeteer

For Python developers using pyppeteer (an unofficial port of Puppeteer that is no longer actively maintained; Playwright for Python is the usual modern alternative), the same memory management principles apply:

import asyncio
from pyppeteer import launch
import gc

class PuppeteerManager:
    def __init__(self):
        self.browsers = set()
        self.pages = set()

    async def create_browser(self, **options):
        browser = await launch(**options)
        self.browsers.add(browser)
        return browser

    async def create_page(self, browser):
        page = await browser.newPage()
        self.pages.add(page)
        return page

    async def close_page(self, page):
        if page in self.pages:
            await page.close()
            self.pages.remove(page)

    async def close_browser(self, browser):
        if browser in self.browsers:
            await browser.close()
            self.browsers.remove(browser)

    async def cleanup(self):
        # Close all pages
        for page in list(self.pages):
            try:
                await page.close()
            except Exception as e:
                print(f"Error closing page: {e}")
        self.pages.clear()

        # Close all browsers
        for browser in list(self.browsers):
            try:
                await browser.close()
            except Exception as e:
                print(f"Error closing browser: {e}")
        self.browsers.clear()

        # Force garbage collection
        gc.collect()

# Usage example
async def scrape_with_proper_cleanup():
    manager = PuppeteerManager()

    try:
        browser = await manager.create_browser(headless=True)
        page = await manager.create_page(browser)

        await page.goto('https://example.com')
        title = await page.title()

        return title
    finally:
        await manager.cleanup()

Monitoring and Debugging Memory Issues

1. Using Chrome DevTools

You can connect Chrome DevTools to your Puppeteer instance for memory profiling:

const browser = await puppeteer.launch({
  headless: false, // devtools requires a visible browser window
  devtools: true,  // auto-open a DevTools panel for each tab
  slowMo: 100
});

2. Memory Profiling in Node.js

Use Node.js built-in profiling tools:

# Run with memory profiling
node --inspect --expose-gc your-script.js

# Generate heap snapshot
node --heapsnapshot-signal=SIGUSR2 your-script.js

3. Automated Memory Testing

Create tests to verify memory cleanup:

const { expect } = require('chai');
const puppeteer = require('puppeteer');

describe('Memory Leak Tests', () => {
  it('should properly clean up browser instances', async () => {
    const initialMemory = process.memoryUsage().heapUsed;

    // Create and close multiple browsers
    for (let i = 0; i < 10; i++) {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.close();
      await browser.close();
    }

    // Force garbage collection
    if (global.gc) {
      global.gc();
    }

    const finalMemory = process.memoryUsage().heapUsed;
    const memoryIncrease = finalMemory - initialMemory;

    // Memory increase should be minimal
    expect(memoryIncrease).to.be.lessThan(50 * 1024 * 1024); // 50MB
  });
});

Common Memory Leak Patterns to Avoid

1. Circular References

// ❌ Bad - Circular reference
class PageManager {
  constructor() {
    this.pages = [];
  }

  async createPage(browser) {
    const page = await browser.newPage();
    page.manager = this; // Back-reference ties the page's lifetime to the manager's
    this.pages.push(page); // Strong references keep every page alive, even after close
    return page;
  }
}
// ✅ Good - Weakly-held references
class PageManager {
  constructor() {
    // WeakSet holds pages weakly: once a page is closed and nothing else
    // references it, it can be garbage collected. Trade-off: WeakSets are
    // not iterable, so bulk cleanup needs separate bookkeeping.
    this.pages = new WeakSet();
  }

  async createPage(browser) {
    const page = await browser.newPage();
    this.pages.add(page);
    return page;
  }
}

2. Global Variables

// ❌ Bad - Global variables holding references
let globalPages = [];

async function createPage(browser) {
  const page = await browser.newPage();
  globalPages.push(page); // Global reference prevents cleanup
  return page;
}
// ✅ Good - Local scope management
async function createPage(browser) {
  const page = await browser.newPage();
  // Return page without storing global reference
  return page;
}

When to Consider Alternative Solutions

While Puppeteer is excellent for many use cases, consider alternatives like Playwright for cross-browser automation or specialized scraping services when dealing with high-volume operations that require extensive memory management.

For complex scraping scenarios that require sophisticated memory management, you might also want to explore best practices for web scraping optimization that apply to both Puppeteer and Playwright.

Conclusion

Handling memory leaks in Puppeteer requires a systematic approach involving proper resource cleanup, monitoring, and implementing best practices. By following the techniques outlined in this guide, you can build robust, memory-efficient Puppeteer applications that can handle high-volume scraping tasks without degrading performance.

Key takeaways for preventing memory leaks:

  1. Always close resources: Use try-finally blocks to ensure browsers and pages are closed
  2. Remove event listeners: Properly clean up event handlers to prevent memory retention
  3. Implement monitoring: Track memory usage and set up alerts for abnormal consumption
  4. Use resource pooling: Reuse browser instances and pages when possible
  5. Handle graceful shutdown: Implement proper cleanup on application termination
  6. Test for leaks: Create automated tests to verify memory cleanup

Remember that memory management is crucial for production applications, especially when dealing with long-running processes or high-volume scraping operations. Regular monitoring and proactive cleanup will help maintain optimal performance and prevent system failures due to memory exhaustion.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What%20is%20the%20main%20topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl -g "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page%20title&fields[price]=Product%20price&api_key=YOUR_API_KEY"
