How to Handle Memory Leaks in Puppeteer?
Memory leaks in Puppeteer applications are a common issue that can lead to degraded performance, system crashes, and resource exhaustion. This comprehensive guide covers how to identify, prevent, and fix memory leaks in your Puppeteer-based web scraping and automation projects.
Understanding Memory Leaks in Puppeteer
Memory leaks in Puppeteer typically occur when browser resources aren't properly released after use. This includes browser instances, pages, and DOM elements that remain in memory even after they're no longer needed. Over time, these accumulated resources can consume significant system memory and cause your application to slow down or crash.
Common Causes of Memory Leaks
1. Not Closing Browser Instances
The most common cause of memory leaks is failing to close browser instances:
// ❌ Bad - Memory leak
const puppeteer = require('puppeteer');

async function scrapeWebsite() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Perform scraping operations
  const data = await page.evaluate(() => {
    return document.title;
  });

  // Browser instance is never closed - MEMORY LEAK!
  return data;
}
// ✅ Good - Proper cleanup
const puppeteer = require('puppeteer');

async function scrapeWebsite() {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Perform scraping operations
    const data = await page.evaluate(() => {
      return document.title;
    });
    return data;
  } finally {
    await browser.close(); // Always close the browser
  }
}
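The try/finally pattern above can be factored into a small helper so that cleanup is impossible to forget. This is a sketch, not a Puppeteer API; `withResource` and the callbacks in the commented usage are hypothetical names:

```javascript
// Generic "open, use, always close" helper (a sketch, not a Puppeteer API).
// openFn creates a resource, closeFn releases it, fn runs in between.
// The finally block guarantees closeFn runs even when fn throws.
async function withResource(openFn, closeFn, fn) {
  const resource = await openFn();
  try {
    return await fn(resource);
  } finally {
    await closeFn(resource);
  }
}

// With Puppeteer it would be used roughly like:
//   const title = await withResource(
//     () => puppeteer.launch(),
//     (browser) => browser.close(),
//     async (browser) => {
//       const page = await browser.newPage();
//       try {
//         await page.goto('https://example.com');
//         return await page.title();
//       } finally {
//         await page.close();
//       }
//     }
//   );
```

Because the close call lives in one place, every new code path through `fn` automatically inherits the cleanup guarantee.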
2. Not Disposing of Pages
Creating multiple pages without properly closing them:
// ❌ Bad - Multiple pages without cleanup
async function scrapeMultiplePages(urls) {
  const browser = await puppeteer.launch();
  const results = [];
  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);
    const data = await page.evaluate(() => {
      return document.title;
    });
    results.push(data);
    // Page is never closed - MEMORY LEAK!
  }
  await browser.close();
  return results;
}
// ✅ Good - Proper page cleanup
async function scrapeMultiplePages(urls) {
  const browser = await puppeteer.launch();
  const results = [];
  try {
    for (const url of urls) {
      const page = await browser.newPage();
      try {
        await page.goto(url);
        const data = await page.evaluate(() => {
          return document.title;
        });
        results.push(data);
      } finally {
        await page.close(); // Always close the page
      }
    }
    return results;
  } finally {
    await browser.close();
  }
}
3. Event Listeners Not Removed
Event listeners that aren't properly removed can cause memory leaks:
// ❌ Bad - Event listeners not removed
async function setupPageWithListeners() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Adding anonymous event listeners
  page.on('console', (msg) => console.log(msg.text()));
  page.on('pageerror', (err) => console.error(err));
  page.on('response', (response) => console.log(response.url()));

  // Page operations...
  await page.goto('https://example.com');

  // Event listeners are still active - potential memory leak
  await page.close();
  await browser.close();
}
// ✅ Good - Proper event listener cleanup
async function setupPageWithListeners() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Define named event handlers so they can be removed later
  const consoleHandler = (msg) => console.log(msg.text());
  const errorHandler = (err) => console.error(err);
  const responseHandler = (response) => console.log(response.url());

  // Add event listeners
  page.on('console', consoleHandler);
  page.on('pageerror', errorHandler);
  page.on('response', responseHandler);

  try {
    // Page operations...
    await page.goto('https://example.com');
  } finally {
    // Remove event listeners
    page.off('console', consoleHandler);
    page.off('pageerror', errorHandler);
    page.off('response', responseHandler);
    await page.close();
    await browser.close();
  }
}
Best Practices for Memory Management
1. Use Browser Pooling
For applications that need to handle multiple concurrent requests, implement browser pooling:
class BrowserPool {
  constructor(maxBrowsers = 5) {
    this.maxBrowsers = maxBrowsers;
    this.browsers = [];
    this.busyBrowsers = new Set();
  }

  async getBrowser() {
    // Find an available browser
    const availableBrowser = this.browsers.find(
      (browser) => !this.busyBrowsers.has(browser)
    );
    if (availableBrowser) {
      this.busyBrowsers.add(availableBrowser);
      return availableBrowser;
    }

    // Create a new browser if under the limit
    if (this.browsers.length < this.maxBrowsers) {
      const browser = await puppeteer.launch();
      this.browsers.push(browser);
      this.busyBrowsers.add(browser);
      return browser;
    }

    // Wait for an available browser
    return new Promise((resolve) => {
      const checkAvailable = () => {
        const browser = this.browsers.find(
          (b) => !this.busyBrowsers.has(b)
        );
        if (browser) {
          this.busyBrowsers.add(browser);
          resolve(browser);
        } else {
          setTimeout(checkAvailable, 100);
        }
      };
      checkAvailable();
    });
  }

  releaseBrowser(browser) {
    this.busyBrowsers.delete(browser);
  }

  async closeAll() {
    await Promise.all(this.browsers.map((browser) => browser.close()));
    this.browsers = [];
    this.busyBrowsers.clear();
  }
}
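The acquire/release logic above is easier to exercise if the launch call is injected rather than hard-coded. Below is a factory-parameterized sketch of the same idea (`ResourcePool` is a hypothetical name); in real use you would pass `() => puppeteer.launch()` as the factory:

```javascript
// A factory-parameterized variant of the pool above (a sketch): injecting
// createFn makes the acquire/release logic testable without real browsers.
class ResourcePool {
  constructor(createFn, maxSize = 5) {
    this.createFn = createFn;
    this.maxSize = maxSize;
    this.resources = [];
    this.busy = new Set();
  }

  async acquire() {
    // Reuse an idle resource if one exists
    const free = this.resources.find((r) => !this.busy.has(r));
    if (free) {
      this.busy.add(free);
      return free;
    }
    // Otherwise create one, up to the cap
    if (this.resources.length < this.maxSize) {
      const created = await this.createFn();
      this.resources.push(created);
      this.busy.add(created);
      return created;
    }
    // Poll until a resource is released (same strategy as above)
    return new Promise((resolve) => {
      const poll = () => {
        const r = this.resources.find((x) => !this.busy.has(x));
        if (r) {
          this.busy.add(r);
          resolve(r);
        } else {
          setTimeout(poll, 50);
        }
      };
      poll();
    });
  }

  release(resource) {
    this.busy.delete(resource);
  }
}
```

Capping the pool size is what bounds memory: the process can never hold more than `maxSize` browsers, no matter how many requests arrive.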
2. Implement Proper Resource Cleanup
Create a utility class for managing Puppeteer resources:
class PuppeteerManager {
  constructor() {
    this.browsers = new Set();
    this.pages = new Set();
  }

  async createBrowser(options = {}) {
    const browser = await puppeteer.launch(options);
    this.browsers.add(browser);
    return browser;
  }

  async createPage(browser) {
    const page = await browser.newPage();
    this.pages.add(page);
    return page;
  }

  async closePage(page) {
    if (this.pages.has(page)) {
      await page.close();
      this.pages.delete(page);
    }
  }

  async closeBrowser(browser) {
    if (this.browsers.has(browser)) {
      await browser.close();
      this.browsers.delete(browser);
    }
  }

  async cleanup() {
    // Close all pages
    const pagePromises = Array.from(this.pages).map((page) =>
      page.close().catch((err) => console.error('Error closing page:', err))
    );
    await Promise.all(pagePromises);
    this.pages.clear();

    // Close all browsers
    const browserPromises = Array.from(this.browsers).map((browser) =>
      browser.close().catch((err) => console.error('Error closing browser:', err))
    );
    await Promise.all(browserPromises);
    this.browsers.clear();
  }
}
3. Monitor Memory Usage
Implement memory monitoring to detect potential leaks:
class MemoryMonitor {
  constructor(threshold = 500 * 1024 * 1024) { // 500MB threshold
    this.threshold = threshold;
    this.interval = null;
  }

  start() {
    this.interval = setInterval(() => {
      const memUsage = process.memoryUsage();
      const heapUsed = memUsage.heapUsed;
      console.log(`Memory usage: ${Math.round(heapUsed / 1024 / 1024)} MB`);
      if (heapUsed > this.threshold) {
        console.warn('Memory usage exceeds threshold!');
        // Force garbage collection if the --expose-gc flag is used
        if (global.gc) {
          global.gc();
        }
      }
    }, 10000); // Check every 10 seconds
  }

  stop() {
    if (this.interval) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }
}
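One caveat: `process.memoryUsage()` only measures the Node.js process itself, not the separate Chromium processes Puppeteer launches, so it catches leaks in your script rather than in the browsers. It also exposes more fields than `heapUsed`; a small helper that reports the main ones in megabytes:

```javascript
// Small helper around process.memoryUsage() that reports values in MB,
// convenient for logging alongside the monitor above.
function memorySnapshotMB() {
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    rss: toMB(rss),             // total memory held by the Node process
    heapTotal: toMB(heapTotal), // V8 heap allocated
    heapUsed: toMB(heapUsed),   // V8 heap actually in use
    external: toMB(external),   // buffers and other C++ allocations
  };
}

console.log(memorySnapshotMB());
```

Watching `rss` alongside `heapUsed` helps distinguish V8 heap growth from native-buffer growth (for example, accumulating screenshot buffers).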
Advanced Memory Management Techniques
1. Page Reuse Strategy
Instead of creating new pages for each request, reuse pages when possible:
class PagePool {
  constructor(browser, maxPages = 10) {
    this.browser = browser;
    this.maxPages = maxPages;
    this.availablePages = [];
    this.busyPages = new Set();
  }

  async getPage() {
    if (this.availablePages.length > 0) {
      const page = this.availablePages.pop();
      this.busyPages.add(page);
      return page;
    }
    if (this.busyPages.size < this.maxPages) {
      const page = await this.browser.newPage();
      this.busyPages.add(page);
      return page;
    }
    // Wait for an available page
    return new Promise((resolve) => {
      const checkAvailable = () => {
        if (this.availablePages.length > 0) {
          const page = this.availablePages.pop();
          this.busyPages.add(page);
          resolve(page);
        } else {
          setTimeout(checkAvailable, 100);
        }
      };
      checkAvailable();
    });
  }

  async releasePage(page) {
    if (this.busyPages.has(page)) {
      // Clear storage left behind by the previous task
      await page.evaluate(() => {
        localStorage.clear();
        sessionStorage.clear();
      });
      // Remove any Puppeteer-level listeners added by the previous task
      page.removeAllListeners();
      this.busyPages.delete(page);
      this.availablePages.push(page);
    }
  }
}
2. Graceful Shutdown Handling
Implement proper shutdown handling to prevent memory leaks during application termination:
class GracefulShutdown {
  constructor() {
    this.resources = [];
    this.setupSignalHandlers();
  }

  addResource(resource) {
    this.resources.push(resource);
  }

  setupSignalHandlers() {
    const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT'];
    signals.forEach((signal) => {
      process.on(signal, async () => {
        console.log(`Received ${signal}, shutting down gracefully...`);
        await this.cleanup();
        process.exit(0);
      });
    });

    process.on('uncaughtException', async (error) => {
      console.error('Uncaught Exception:', error);
      await this.cleanup();
      process.exit(1);
    });
  }

  async cleanup() {
    const cleanupPromises = this.resources.map((resource) => {
      if (typeof resource.cleanup === 'function') {
        return resource.cleanup();
      }
      return Promise.resolve();
    });
    await Promise.all(cleanupPromises);
  }
}
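One subtlety in `cleanup()`: `Promise.all` rejects as soon as any single cleanup fails, so later resources may never be released during shutdown. `Promise.allSettled` waits for every attempt. A sketch of that variant:

```javascript
// Variant of cleanup() that attempts every resource even if some fail:
// Promise.all rejects on the first failure, which can skip remaining
// cleanups; Promise.allSettled always waits for all of them.
async function cleanupAll(resources) {
  const results = await Promise.allSettled(
    resources.map((r) =>
      typeof r.cleanup === 'function' ? r.cleanup() : Promise.resolve()
    )
  );
  // Surface failures without aborting the shutdown sequence
  for (const result of results) {
    if (result.status === 'rejected') {
      console.error('Cleanup failed:', result.reason);
    }
  }
  return results;
}
```

During shutdown you generally want best-effort release of everything, since the process is exiting anyway and an early rejection would strand browsers.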
Python Example with pyppeteer
For Python developers using pyppeteer, similar memory management principles apply:
import asyncio
from pyppeteer import launch
import gc

class PuppeteerManager:
    def __init__(self):
        self.browsers = set()
        self.pages = set()

    async def create_browser(self, **options):
        browser = await launch(**options)
        self.browsers.add(browser)
        return browser

    async def create_page(self, browser):
        page = await browser.newPage()
        self.pages.add(page)
        return page

    async def close_page(self, page):
        if page in self.pages:
            await page.close()
            self.pages.remove(page)

    async def close_browser(self, browser):
        if browser in self.browsers:
            await browser.close()
            self.browsers.remove(browser)

    async def cleanup(self):
        # Close all pages
        for page in list(self.pages):
            try:
                await page.close()
            except Exception as e:
                print(f"Error closing page: {e}")
        self.pages.clear()
        # Close all browsers
        for browser in list(self.browsers):
            try:
                await browser.close()
            except Exception as e:
                print(f"Error closing browser: {e}")
        self.browsers.clear()
        # Force garbage collection
        gc.collect()

# Usage example
async def scrape_with_proper_cleanup():
    manager = PuppeteerManager()
    try:
        browser = await manager.create_browser(headless=True)
        page = await manager.create_page(browser)
        await page.goto('https://example.com')
        title = await page.title()
        return title
    finally:
        await manager.cleanup()
Monitoring and Debugging Memory Issues
1. Using Chrome DevTools
You can connect Chrome DevTools to your Puppeteer instance for memory profiling:
const browser = await puppeteer.launch({
  devtools: true, // auto-opens DevTools for each tab (also disables headless mode)
  slowMo: 100
});
2. Memory Profiling in Node.js
Use Node.js built-in profiling tools:
# Run with memory profiling
node --inspect --expose-gc your-script.js
# Generate heap snapshot
node --heapsnapshot-signal=SIGUSR2 your-script.js
3. Automated Memory Testing
Create tests to verify memory cleanup:
const { expect } = require('chai');
const puppeteer = require('puppeteer');

describe('Memory Leak Tests', () => {
  it('should properly clean up browser instances', async () => {
    const initialMemory = process.memoryUsage().heapUsed;

    // Create and close multiple browsers
    for (let i = 0; i < 10; i++) {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.close();
      await browser.close();
    }

    // Force garbage collection (run Node with --expose-gc)
    if (global.gc) {
      global.gc();
    }

    const finalMemory = process.memoryUsage().heapUsed;
    const memoryIncrease = finalMemory - initialMemory;

    // Memory increase should be minimal
    expect(memoryIncrease).to.be.lessThan(50 * 1024 * 1024); // 50MB
  });
});
Common Memory Leak Patterns to Avoid
1. Long-Lived References
V8's garbage collector can reclaim circular references on its own; the real danger is a long-lived container that keeps pages reachable forever:
// ❌ Bad - Manager pins every page it ever created
class PageManager {
  constructor() {
    this.pages = [];
  }

  async createPage(browser) {
    const page = await browser.newPage();
    page.manager = this; // Back-reference; the array below keeps every page alive
    this.pages.push(page);
    return page;
  }
}
// ✅ Good - Track pages without pinning them in memory
class PageManager {
  constructor() {
    this.pages = new WeakSet(); // WeakSet entries don't prevent garbage collection
  }

  async createPage(browser) {
    const page = await browser.newPage();
    this.pages.add(page);
    return page;
  }
}
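A quick demonstration of WeakSet semantics: membership checks work while an object is strongly referenced elsewhere, but the set never pins the object in memory, which is also why it deliberately has no size or iteration API:

```javascript
// WeakSet holds objects weakly: membership checks work while the object
// is otherwise reachable, but the set itself never prevents collection.
const tracked = new WeakSet();

let page = { url: 'https://example.com' };
tracked.add(page);

console.log(tracked.has(page)); // true while `page` is still referenced

// Once the last strong reference is dropped, the entry becomes
// collectable; there is no way to enumerate a WeakSet to observe it.
page = null;
```

If you need to enumerate tracked pages (for a cleanup() sweep, say), a regular Set plus explicit deletes is the right tool; WeakSet suits pure "have I seen this object?" tracking.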
2. Global Variables
// ❌ Bad - Global variable holding references
let globalPages = [];

async function createPage(browser) {
  const page = await browser.newPage();
  globalPages.push(page); // Global reference prevents cleanup
  return page;
}

// ✅ Good - Local scope management
async function createPage(browser) {
  const page = await browser.newPage();
  // Return the page without storing a global reference
  return page;
}
When to Consider Alternative Solutions
While Puppeteer is excellent for many use cases, consider alternatives like Playwright for cross-browser automation or specialized scraping services when dealing with high-volume operations that require extensive memory management.
For complex scraping scenarios that require sophisticated memory management, you might also want to explore best practices for web scraping optimization that apply to both Puppeteer and Playwright.
Conclusion
Handling memory leaks in Puppeteer requires a systematic approach involving proper resource cleanup, monitoring, and implementing best practices. By following the techniques outlined in this guide, you can build robust, memory-efficient Puppeteer applications that can handle high-volume scraping tasks without degrading performance.
Key takeaways for preventing memory leaks:
- Always close resources: Use try-finally blocks to ensure browsers and pages are closed
- Remove event listeners: Properly clean up event handlers to prevent memory retention
- Implement monitoring: Track memory usage and set up alerts for abnormal consumption
- Use resource pooling: Reuse browser instances and pages when possible
- Handle graceful shutdown: Implement proper cleanup on application termination
- Test for leaks: Create automated tests to verify memory cleanup
Remember that memory management is crucial for production applications, especially when dealing with long-running processes or high-volume scraping operations. Regular monitoring and proactive cleanup will help maintain optimal performance and prevent system failures due to memory exhaustion.