What are the Memory Management Best Practices When Running Long Playwright Sessions?

Long-running Playwright sessions can leak memory and slow down when pages, contexts, and listeners are not released. This guide covers memory management techniques that keep Playwright automation stable over extended periods.

Understanding Memory Challenges in Long Playwright Sessions

Long-running Playwright sessions face several memory-related challenges:

  • Browser process accumulation: Each new page or context consumes memory
  • DOM node retention: Unreferenced DOM elements may remain in memory
  • Event listener leaks: Attached listeners that aren't properly removed
  • Resource accumulation: Images, stylesheets, and scripts cached in memory
  • Network request buffers: Accumulated response data from numerous requests

Core Memory Management Strategies

1. Proper Context and Page Management

Always clean up browser contexts and pages when they're no longer needed:

// JavaScript/Node.js
const { chromium } = require('playwright');

async function runLongSession() {
  const browser = await chromium.launch();

  try {
    // Create a new context for each logical session
    const context = await browser.newContext();
    const page = await context.newPage();

    // Your automation logic here
    await page.goto('https://example.com');

    // Properly close resources
    await page.close();
    await context.close();
  } finally {
    await browser.close();
  }
}

# Python
import asyncio
from playwright.async_api import async_playwright

async def run_long_session():
    async with async_playwright() as p:
        browser = await p.chromium.launch()

        try:
            # Create context for session isolation
            context = await browser.new_context()
            page = await context.new_page()

            # Your automation logic
            await page.goto('https://example.com')

            # Clean up resources
            await page.close()
            await context.close()
        finally:
            await browser.close()

asyncio.run(run_long_session())
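
In the examples above, the page and context are closed inside the try block, so an exception in the automation logic skips those calls and cleanup falls to browser.close(). When the browser is meant to stay open across many tasks, a small wrapper can guarantee per-task cleanup. The following is a minimal sketch, not part of Playwright: `withPage` is a hypothetical helper name, and `browser` is assumed to be a Playwright Browser instance.

```javascript
// Hypothetical helper (not a Playwright API).
// `browser` is assumed to be a Playwright Browser, i.e. an object with newContext().
async function withPage(browser, task) {
  const context = await browser.newContext();
  try {
    const page = await context.newPage();
    // Closing the context in `finally` also closes every page that belongs to it.
    return await task(page);
  } finally {
    await context.close();
  }
}
```

A call such as `await withPage(browser, (page) => page.goto('https://example.com'))` then releases the context whether the task resolves or throws.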

2. Context Recycling Pattern

For long-running sessions, implement a context recycling pattern:

const { chromium } = require('playwright');

class PlaywrightSessionManager {
  constructor() {
    this.browser = null;
    this.currentContext = null;
    this.pageCount = 0;
    this.maxPagesPerContext = 10; // Recycle context after 10 pages
  }

  async initialize() {
    this.browser = await chromium.launch();
    await this.createNewContext();
  }

  async createNewContext() {
    if (this.currentContext) {
      await this.currentContext.close();
    }

    this.currentContext = await this.browser.newContext();
    this.pageCount = 0;
  }

  async getPage() {
    if (this.pageCount >= this.maxPagesPerContext) {
      await this.createNewContext();
    }

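    this.pageCount++;
    // The caller is responsible for closing the returned page when done;
    // recycling the context also closes any of its pages that are still open
    return await this.currentContext.newPage();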
  }

  async cleanup() {
    if (this.currentContext) {
      await this.currentContext.close();
    }
    if (this.browser) {
      await this.browser.close();
    }
  }
}
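
The manager above hands out pages but leaves closing them to the caller. For strictly sequential work, the same recycling idea can be written as a single loop that closes each page as soon as it is processed. This is a sketch under assumptions: `processWithRecycling` and `handlePage` are hypothetical names, and `browser` is assumed to be a Playwright Browser instance.

```javascript
// Hypothetical helper (not a Playwright API).
// `browser` is assumed to be a Playwright Browser; `handlePage` is a caller-supplied callback.
async function processWithRecycling(browser, urls, handlePage, maxPagesPerContext = 10) {
  let context = await browser.newContext();
  let pagesUsed = 0;
  try {
    for (const url of urls) {
      if (pagesUsed >= maxPagesPerContext) {
        // Release the old context (with its pages and storage) before creating a new one.
        const old = context;
        context = null;
        await old.close();
        context = await browser.newContext();
        pagesUsed = 0;
      }
      const page = await context.newPage();
      pagesUsed++;
      try {
        await page.goto(url);
        await handlePage(page, url);
      } finally {
        await page.close(); // free the page as soon as its work is done
      }
    }
  } finally {
    if (context) await context.close();
  }
}
```

Processing one page at a time keeps at most one page and one context alive, which trades throughput for a predictable memory ceiling.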

3. Resource Management and Cleanup

Disable unnecessary resource loading to reduce memory consumption:

// Context options; these do not block any resources (blocking is done by the route below)
const context = await browser.newContext({
  ignoreHTTPSErrors: true,
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9'
  }
});

// Block resource types that aren't needed
await context.route('**/*', async (route) => {
  const resourceType = route.request().resourceType();
  if (['image', 'stylesheet', 'font'].includes(resourceType)) {
    await route.abort();
  } else {
    await route.continue();
  }
});

# Python equivalent
async def block_resources(route):
    resource_type = route.request.resource_type
    if resource_type in ['image', 'stylesheet', 'font']:
        await route.abort()
    else:
        await route.continue_()

context = await browser.new_context()
await context.route('**/*', block_resources)
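
To confirm that blocking is actually taking effect, the route handler can keep simple counters. The sketch below is illustrative only: `makeResourceBlocker` is a hypothetical factory, not a Playwright API, and the default list of blocked types is an assumption taken from the example above.

```javascript
// Hypothetical factory (not a Playwright API): builds a route handler plus counters.
function makeResourceBlocker(blockedTypes = ['image', 'stylesheet', 'font']) {
  const blocked = new Set(blockedTypes);
  const stats = { aborted: 0, continued: 0 };

  // `route` is assumed to be a Playwright Route object.
  async function handler(route) {
    if (blocked.has(route.request().resourceType())) {
      stats.aborted++;
      await route.abort();
    } else {
      stats.continued++;
      await route.continue();
    }
  }

  return { handler, stats };
}
```

The handler would be registered with `await context.route('**/*', blocker.handler)` and can be removed again with `await context.unroute('**/*', blocker.handler)` once blocking is no longer wanted.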

4. Memory Monitoring and Limits

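Implement memory monitoring to track usage. Note that process.memoryUsage() reports the Node.js process only; the browser runs in separate processes whose memory is not included in these figures: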

const process = require('process');
const { chromium } = require('playwright');

function getMemoryUsage() {
  const used = process.memoryUsage();
  const usage = {};
  for (let key in used) {
    usage[key] = Math.round(used[key] / 1024 / 1024 * 100) / 100;
  }
  return usage;
}

async function monitoredAutomation() {
  console.log('Initial memory:', getMemoryUsage());

  // Your Playwright automation
  const browser = await chromium.launch();
  const context = await browser.newContext();

  // Check memory periodically
  setInterval(() => {
    const memory = getMemoryUsage();
    console.log('Memory usage:', memory);

    // Restart if memory exceeds threshold
    if (memory.heapUsed > 512) { // 512 MB threshold
      console.log('Memory threshold exceeded, restarting...');
      restartSession(); // Placeholder: implement this to close and relaunch the browser
    }
  }, 30000); // Check every 30 seconds
}
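
Because the figures above describe only the Node.js process, it can help to also read the page's own JavaScript heap. On Chromium this is possible through the Chrome DevTools Protocol. The following is a sketch under assumptions: `getPageHeapMB` is a hypothetical helper, and `cdpSession` is assumed to come from `await context.newCDPSession(page)`, which is available for Chromium only.

```javascript
// Hypothetical helper: reads the page's JS heap size over the Chrome DevTools Protocol.
// `cdpSession` is assumed to come from `await context.newCDPSession(page)` (Chromium only).
async function getPageHeapMB(cdpSession) {
  await cdpSession.send('Performance.enable');
  const { metrics } = await cdpSession.send('Performance.getMetrics');

  // Metrics arrive as [{ name, value }, ...]; index them by name.
  const byName = Object.fromEntries(metrics.map((m) => [m.name, m.value]));
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;

  return {
    jsHeapUsedMB: toMB(byName.JSHeapUsedSize),
    jsHeapTotalMB: toMB(byName.JSHeapTotalSize),
  };
}
```

These numbers cover the page's JavaScript heap, not the full footprint of the browser processes, so they are best treated as a trend indicator rather than an absolute limit.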

5. Efficient Page Navigation Patterns

When navigating between multiple pages, use efficient patterns:

// Reuse the same page instance instead of creating new ones
async function processMultipleUrls(urls) {
  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  try {
    for (const url of urls) {
      await page.goto(url);

      // Process the page
      const data = await page.evaluate(() => {
        // Extract data
        return document.title;
      });

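      // Optional cleanup: navigating to the next URL normally discards the previous
      // document's timers and listeners, so this matters mainly if the page stays open.
      // window.intervalIds is an app-specific array; it exists only if the page defines it.
      await page.evaluate(() => {
        if (window.intervalIds) {
          window.intervalIds.forEach(id => clearInterval(id));
        }
      });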

      console.log(`Processed: ${url} - ${data}`);
    }
  } finally {
    await page.close();
    await context.close();
    await browser.close();
  }
}

6. Garbage Collection Optimization

Force garbage collection at strategic points. global.gc() collects the Node.js heap only and does not free memory held by the browser processes:

// Force garbage collection (requires --expose-gc flag)
if (global.gc) {
  global.gc();
}

// Or use process-based memory management
async function withMemoryManagement(callback) {
  const initialMemory = process.memoryUsage();

  try {
    await callback();
  } finally {
    // Force cleanup
    if (global.gc) {
      global.gc();
    }

    const finalMemory = process.memoryUsage();
    console.log('Memory delta:', {
      heapUsed: finalMemory.heapUsed - initialMemory.heapUsed,
      heapTotal: finalMemory.heapTotal - initialMemory.heapTotal
    });
  }
}
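
A single memory delta is noisy, so a leak is easier to spot as a trend across several post-GC readings. The sketch below is one simple heuristic, not an established method: `sampleHeap` and `heapIsGrowing` are hypothetical names, and the 50 MB default growth threshold is an arbitrary assumption.

```javascript
// Hypothetical helpers for spotting a rising Node.js heap across batches of work.

// Takes one heapUsed reading, collecting garbage first when --expose-gc is set.
function sampleHeap() {
  if (typeof globalThis.gc === 'function') {
    globalThis.gc();
  }
  return process.memoryUsage().heapUsed;
}

// Returns true when every sample is larger than the previous one and the
// total growth reaches minGrowthBytes (50 MB by default, an arbitrary choice).
function heapIsGrowing(samples, minGrowthBytes = 50 * 1024 * 1024) {
  if (samples.length < 3) return false; // too few points to call it a trend
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] <= samples[i - 1]) return false;
  }
  return samples[samples.length - 1] - samples[0] >= minGrowthBytes;
}
```

Calling `sampleHeap()` after each batch of pages and passing the collected values to `heapIsGrowing()` gives a rough signal for when to recycle the session.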

Advanced Memory Management Techniques

Browser Pool Management

For high-volume operations, implement a browser pool:

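const { chromium } = require('playwright');

class BrowserPool {
  constructor(maxBrowsers = 3) {
    this.pool = [];      // idle browsers ready for reuse
    this.launched = 0;   // browsers created so far (idle + checked out)
    this.maxBrowsers = maxBrowsers;
  }

  async getBrowser() {
    if (this.pool.length > 0) {
      return this.pool.pop();
    }

    // Count checked-out browsers too; the idle list alone never reaches the limit
    if (this.launched < this.maxBrowsers) {
      this.launched++;
      try {
        return await chromium.launch();
      } catch (error) {
        this.launched--;
        throw error;
      }
    }

    // Wait for available browser
    return new Promise((resolve) => {
      const checkPool = setInterval(() => {
        if (this.pool.length > 0) {
          clearInterval(checkPool);
          resolve(this.pool.pop());
        }
      }, 100);
    });
  }

  async returnBrowser(browser) {
    // Close all contexts before returning to pool
    const contexts = browser.contexts();
    for (const context of contexts) {
      await context.close();
    }

    this.pool.push(browser);
  }

  async cleanup() {
    // Only idle browsers are closed here; return checked-out browsers first
    for (const browser of this.pool) {
      await browser.close();
    }
    this.launched -= this.pool.length;
    this.pool = [];
  }
}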
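
A pool only helps if every borrowed browser is returned, including when a task fails. The sketch below wraps the borrow and return steps. `withBrowser` is a hypothetical helper name, and `pool` is assumed to expose `getBrowser()` and `returnBrowser()` like the class above.

```javascript
// Hypothetical helper: borrows a browser from a pool and always gives it back.
// `pool` is assumed to expose getBrowser() and returnBrowser(browser).
async function withBrowser(pool, task) {
  const browser = await pool.getBrowser();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on failure, so a failed task cannot starve the pool.
    await pool.returnBrowser(browser);
  }
}
```

Without the finally block, a task that throws would leave its browser checked out, and other callers waiting on the pool could stall.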

Memory Profiling and Debugging

Use built-in tools to profile memory usage:

# Run Node.js with memory profiling
node --inspect --expose-gc your-playwright-script.js

# Monitor memory usage with process tools
ps aux | grep node
top -p <process-id>

Configuration Best Practices

Launch Options for Memory Optimization

const browser = await chromium.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-accelerated-2d-canvas',
    '--disable-gpu',
    '--disable-extensions',
    '--disable-plugins',
    '--disable-background-timer-throttling',
    '--disable-backgrounding-occluded-windows',
    '--disable-renderer-backgrounding',
    '--memory-pressure-off',
    '--js-flags=--max-old-space-size=4096' // V8 heap limit in MB, passed via --js-flags; a bare --max-old-space-size is a Node.js flag
  ]
});

Context Configuration

const context = await browser.newContext({
  viewport: { width: 1280, height: 720 },
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  // Enable only the features the task needs
  javaScriptEnabled: true, // set to false if the target pages work without JavaScript
  acceptDownloads: false,
  permissions: [], // Minimal permissions
  colorScheme: 'light'
});

Monitoring and Alerting

Implement comprehensive monitoring for production environments:

class MemoryMonitor {
  constructor(thresholds = {}) {
    this.thresholds = {
      heapUsed: thresholds.heapUsed || 512 * 1024 * 1024, // 512MB
      heapTotal: thresholds.heapTotal || 1024 * 1024 * 1024, // 1GB
      ...thresholds
    };
  }

  checkMemory() {
    const usage = process.memoryUsage();
    const alerts = [];

    if (usage.heapUsed > this.thresholds.heapUsed) {
      alerts.push(`Heap usage exceeded: ${(usage.heapUsed / 1024 / 1024).toFixed(2)}MB`);
    }

    if (usage.heapTotal > this.thresholds.heapTotal) {
      alerts.push(`Heap total exceeded: ${(usage.heapTotal / 1024 / 1024).toFixed(2)}MB`);
    }

    return alerts;
  }

  startMonitoring(interval = 30000) {
    setInterval(() => {
      const alerts = this.checkMemory();
      if (alerts.length > 0) {
        console.warn('Memory alerts:', alerts);
        // Trigger cleanup or restart logic
      }
    }, interval);
  }
}
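
The monitor above leaves the reaction to an alert open and never stops its timer. One way to fill that gap is sketched below. `startMemoryWatch` and its option names are hypothetical, and `onExceeded` is a caller-supplied callback that would hold the actual cleanup or restart logic.

```javascript
// Hypothetical helper: polls heap usage and calls `onExceeded` above a threshold.
// `readHeapUsed` is injectable so the check can be exercised without real memory pressure.
function startMemoryWatch({
  thresholdBytes,
  intervalMs = 30000,
  onExceeded,
  readHeapUsed = () => process.memoryUsage().heapUsed,
}) {
  let firing = false; // prevents overlapping restarts while one is still running

  async function check() {
    const heapUsed = readHeapUsed();
    if (heapUsed <= thresholdBytes || firing) return false;
    firing = true;
    try {
      await onExceeded(heapUsed);
    } finally {
      firing = false;
    }
    return true;
  }

  const timer = setInterval(() => {
    check().catch((err) => console.error('Memory watch failed:', err));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring

  return { check, stop: () => clearInterval(timer) };
}
```

Returning a `stop()` function lets the caller end monitoring during shutdown, and the `firing` flag keeps a slow restart from being triggered a second time by the next tick.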

Common Memory Leak Patterns to Avoid

  1. Unclosed pages and contexts: Always close resources in finally blocks
  2. Event listener accumulation: Remove event listeners when done
  3. Large data retention: Process and discard large datasets promptly
  4. Infinite loops: Implement proper exit conditions
  5. Resource hoarding: Don't keep references to DOM elements longer than necessary
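
As an illustration of the second pattern, a listener attached to a long-lived page can be scoped to the work that needs it. This is a minimal sketch: `recordFailedRequests` is a hypothetical helper, and `page` is assumed to be a Playwright Page, which exposes `on` and `off` for events such as `requestfailed`.

```javascript
// Hypothetical helper: listens for failed requests only while `action` runs.
// `page` is assumed to be a Playwright Page (an event emitter with on/off).
async function recordFailedRequests(page, action) {
  const failed = [];
  const onFailed = (request) => failed.push(request.url());

  page.on('requestfailed', onFailed);
  try {
    await action();
  } finally {
    // Detach the same function reference that was attached above.
    page.off('requestfailed', onFailed);
  }
  return failed;
}
```

Keeping the handler in a named variable matters here, since an anonymous function passed to `page.on` cannot be removed later.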

Best Practices Summary

  1. Always close resources: Use try-finally blocks or async context managers
  2. Implement resource recycling: Create new contexts periodically
  3. Monitor memory usage: Set up alerts and automatic restarts
  4. Optimize resource loading: Block unnecessary resources
  5. Use browser pools: Share browser instances efficiently
  6. Profile regularly: Monitor memory patterns in development

For more advanced browser automation patterns, see how to handle browser sessions in Puppeteer, which shares similar session management concepts. Understanding how to run multiple pages in parallel with Puppeteer can also help optimize resource usage across concurrent operations.

By implementing these memory management best practices, you can ensure your long-running Playwright sessions remain stable and efficient, preventing memory leaks and maintaining optimal performance throughout extended automation tasks.
