How do I use the Puppeteer MCP server for browser automation?

The Puppeteer MCP (Model Context Protocol) server bridges the gap between AI assistants and browser automation by exposing Puppeteer's powerful capabilities through a conversational interface. This integration enables developers to control headless Chrome browsers, scrape dynamic content, automate web interactions, and extract data using natural language instructions instead of writing explicit code.

What is the Puppeteer MCP Server?

The Puppeteer MCP server implements the Model Context Protocol to make Puppeteer's browser automation framework accessible to AI assistants like Claude. Puppeteer is Google's official Node.js library for controlling headless Chrome or Chromium browsers, widely used for web scraping, automated testing, and browser automation tasks.

Unlike traditional Puppeteer scripts where you write JavaScript code to define every action, the MCP server allows AI models to interpret your intent and execute the appropriate browser commands. This makes browser automation more accessible and reduces the learning curve for complex scraping scenarios.

Key Capabilities

The Puppeteer MCP server provides comprehensive browser automation features:

  • Page navigation: Load URLs, handle redirects, and navigate browser history
  • Element interaction: Click buttons, fill forms, and interact with dynamic content
  • Data extraction: Scrape text, HTML, and structured data from web pages
  • Screenshot capture: Take full-page or viewport screenshots for visual verification
  • JavaScript execution: Run custom scripts in the page context for advanced data extraction
  • Network interception: Monitor and modify network requests and responses
  • PDF generation: Convert web pages to PDF documents (see the sketch after this list)
  • Performance monitoring: Track page load times and resource usage
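
Each of these capabilities maps onto a standard Puppeteer API that the MCP server drives for you. As a quick illustration (the URL and output filename below are placeholders), PDF generation boils down to a single page.pdf() call:

// Minimal sketch of the PDF generation capability using plain Puppeteer.
// The target URL and output filename are placeholders.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  await page.pdf({ path: 'example.pdf', format: 'A4', printBackground: true });
  await browser.close();
})();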

Installation and Setup

Prerequisites

Before installing the Puppeteer MCP server, ensure you have:

  • Node.js: Version 18.x or higher (required by recent Puppeteer releases and the MCP SDK)
  • npm: Version 7.x or higher (comes with Node.js)
  • Operating System: Windows, macOS, or Linux

Installing the Puppeteer MCP Server

Install the Puppeteer MCP server using npm:

# Install globally for system-wide access
npm install -g @modelcontextprotocol/server-puppeteer

# Or install locally in your project
npm install @modelcontextprotocol/server-puppeteer

# Install Puppeteer (if not already installed)
npm install puppeteer

The Puppeteer package automatically downloads a compatible version of Chromium during installation. If you prefer to use an existing Chrome installation:

# Install puppeteer-core (without Chromium download)
npm install puppeteer-core

# Set the executable path in your configuration
export PUPPETEER_EXECUTABLE_PATH=/path/to/chrome
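
If you take the puppeteer-core route, the browser path can also be passed directly to launch(). A minimal sketch (the fallback path below is a typical macOS location and is only an example):

// Launch puppeteer-core against an existing Chrome installation.
// The fallback path is a common macOS location; adjust it for your system.
const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    executablePath:
      process.env.PUPPETEER_EXECUTABLE_PATH ||
      '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();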

Configuring Claude Desktop

To enable the Puppeteer MCP server in Claude Desktop, you need to modify the configuration file. The location varies by operating system:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

Add the Puppeteer MCP server configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-puppeteer"
      ],
      "env": {
        "PUPPETEER_HEADLESS": "true"
      }
    }
  }
}

For a global installation, use this alternative configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "mcp-server-puppeteer",
      "args": [],
      "env": {
        "PUPPETEER_HEADLESS": "true",
        "PUPPETEER_TIMEOUT": "30000"
      }
    }
  }
}

After updating the configuration file, restart Claude Desktop to activate the MCP server.
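
If the server does not appear after a restart, a common culprit is a JSON syntax error in the config file. A quick sanity check (the path shown is the macOS location; substitute the path for your OS):

// Parse the Claude Desktop config to confirm it is valid JSON.
// The path below is the macOS location; adjust for Windows or Linux.
const fs = require('fs');

const configPath = `${process.env.HOME}/Library/Application Support/Claude/claude_desktop_config.json`;
const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
console.log(Object.keys(config.mcpServers)); // should include "puppeteer"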

Core MCP Tools for Browser Automation

The Puppeteer MCP server exposes a comprehensive set of tools for browser control and web scraping:

Navigation Tools

  • puppeteer_navigate: Navigate to a URL and wait for the page to load
  • puppeteer_goto: Go to a URL with advanced options (timeout, waitUntil conditions)
  • puppeteer_go_back: Navigate to the previous page in browser history
  • puppeteer_go_forward: Navigate to the next page in browser history
  • puppeteer_reload: Reload the current page

Content Extraction Tools

  • puppeteer_content: Extract the HTML content of the current page
  • puppeteer_text: Get the text content of the page or specific elements
  • puppeteer_evaluate: Execute JavaScript in the page context and return results
  • puppeteer_screenshot: Capture screenshots of the page or specific elements
  • puppeteer_pdf: Generate a PDF from the current page

Interaction Tools

  • puppeteer_click: Click on elements using selectors
  • puppeteer_type: Type text into input fields
  • puppeteer_select: Select options from dropdown menus
  • puppeteer_hover: Hover over elements to trigger tooltips or menus
  • puppeteer_focus: Set focus on specific elements

Wait and Timing Tools

  • puppeteer_wait_for_selector: Wait for an element to appear in the DOM
  • puppeteer_wait_for_navigation: Wait for page navigation to complete
  • puppeteer_wait_for_timeout: Pause execution for a specified duration
  • puppeteer_wait_for_function: Wait for a custom JavaScript condition to be true

Advanced Tools

  • puppeteer_set_viewport: Configure browser viewport size and device emulation
  • puppeteer_set_user_agent: Set custom user agent strings
  • puppeteer_set_cookie: Add cookies to the browser session
  • puppeteer_get_cookies: Retrieve cookies from the current session
  • puppeteer_intercept_requests: Monitor and modify network requests
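
When Claude invokes one of these tools, the call travels over the Model Context Protocol as a JSON-RPC tools/call request. As a rough illustration (the exact tool names and argument schema depend on the server version), a navigation call looks roughly like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "puppeteer_navigate",
    "arguments": { "url": "https://example.com" }
  }
}

You normally never write this payload yourself; Claude generates it from your natural language instruction, and the server executes the corresponding Puppeteer command.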

Practical Browser Automation Examples

Example 1: Basic Web Scraping

Extract product information from an e-commerce site using natural language commands.

Natural language instruction to Claude: Use the Puppeteer MCP server to navigate to example-store.com/products, wait for the product grid to load, then extract the name, price, and rating for each product.

What happens behind the scenes:

  1. Claude calls puppeteer_navigate to load the target URL
  2. Uses puppeteer_wait_for_selector to ensure products are loaded
  3. Executes puppeteer_evaluate to extract structured data:

// JavaScript executed in the page context
() => {
  const products = Array.from(document.querySelectorAll('.product-card'));
  return products.map(product => ({
    name: product.querySelector('.product-name')?.textContent?.trim(),
    price: product.querySelector('.price')?.textContent?.trim(),
    rating: product.querySelector('.rating')?.getAttribute('data-rating')
  }));
}

Example 2: Handling Dynamic Content

When working with single-page applications or AJAX-loaded content (similar to handling AJAX requests using Puppeteer), you need to wait for content to load dynamically.

Instruction: Navigate to dashboard.example.com, wait for the analytics chart to fully render, then extract the data points displayed in the chart.

Workflow:

  • Navigate using puppeteer_navigate
  • Wait for the chart element: puppeteer_wait_for_selector with selector .chart-container
  • Wait for network requests to complete
  • Extract data from the rendered chart using puppeteer_evaluate

JavaScript for data extraction:

() => {
  const chartData = window.chartInstance?.data;
  if (!chartData) return null;

  return {
    labels: chartData.labels,
    datasets: chartData.datasets.map(ds => ({
      label: ds.label,
      data: ds.data
    }))
  };
}

Example 3: Form Submission and Search

Automate form filling and search operations to extract results.

Instruction: Go to jobs.example.com, search for "Senior Developer" positions in "New York", and extract the first 20 job listings with title, company, and salary information.

Step-by-step workflow:

  1. Navigate to the job search page
  2. Type in the search query using puppeteer_type
  3. Select location from dropdown using puppeteer_select
  4. Click search button with puppeteer_click
  5. Wait for results using puppeteer_wait_for_selector
  6. Extract job data with puppeteer_evaluate

Equivalent Puppeteer code:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Navigate to job search page
  await page.goto('https://jobs.example.com', { waitUntil: 'networkidle0' });

  // Fill search form
  await page.type('#job-search-input', 'Senior Developer');
  await page.select('#location-select', 'New York');
  await page.click('#search-button');

  // Wait for results
  await page.waitForSelector('.job-listing', { timeout: 5000 });

  // Extract job data
  const jobs = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.job-listing')).slice(0, 20).map(job => ({
      title: job.querySelector('.job-title')?.textContent?.trim(),
      company: job.querySelector('.company-name')?.textContent?.trim(),
      salary: job.querySelector('.salary')?.textContent?.trim()
    }));
  });

  console.log(jobs);
  await browser.close();
})();

Example 4: Multi-Page Scraping with Pagination

Extract data across multiple pages by handling pagination (similar to techniques used when navigating to different pages using Puppeteer).

Instruction: Navigate through the first 10 pages of blog.example.com/articles, extracting the title, author, and publish date from each article on every page.

Workflow:

  1. Navigate to the first page
  2. Extract articles from the current page
  3. Check for a "Next" button or pagination links
  4. Click the next page or construct the next URL
  5. Repeat until 10 pages are processed
  6. Aggregate all results

Example pagination logic:

// JavaScript to handle pagination (expects an already-created Puppeteer `page`)
async function scrapeAllPages(page, maxPages = 10) {
  const allArticles = [];
  let currentPage = 1;

  while (currentPage <= maxPages) {
    // Extract current page articles
    const articles = await page.evaluate(() => {
      return Array.from(document.querySelectorAll('.article')).map(article => ({
        title: article.querySelector('h2')?.textContent?.trim(),
        author: article.querySelector('.author')?.textContent?.trim(),
        date: article.querySelector('.publish-date')?.textContent?.trim()
      }));
    });

    allArticles.push(...articles);

    // Check for next page
    const hasNextPage = await page.$('.pagination .next');
    if (!hasNextPage) break;

    // Navigate to next page
    await Promise.all([
      page.click('.pagination .next'),
      page.waitForNavigation({ waitUntil: 'networkidle0' })
    ]);

    currentPage++;
  }

  return allArticles;
}

Example 5: Screenshot and Visual Verification

Capture screenshots for debugging or visual verification of page state.

Instruction: Navigate to pricing.example.com, take a full-page screenshot showing all pricing tiers, and save it as pricing-page.png.

Using the MCP server:

  • Navigate with puppeteer_navigate
  • Call puppeteer_screenshot with the full-page option
  • Specify the output path and image format

Equivalent code:

const page = await browser.newPage();
await page.goto('https://pricing.example.com');

// Take full-page screenshot
await page.screenshot({
  path: 'pricing-page.png',
  fullPage: true,
  type: 'png'
});

// Or screenshot a specific element
await page.screenshot({
  path: 'pricing-tier.png',
  clip: await page.$eval('.pricing-table', el => {
    const {x, y, width, height} = el.getBoundingClientRect();
    return {x, y, width, height};
  })
});

Advanced Automation Techniques

JavaScript Injection and Custom Extraction

Execute sophisticated data extraction logic directly in the page context using the puppeteer_evaluate tool. This approach (similar to injecting JavaScript into a page using Puppeteer) allows you to leverage the full power of the browser's JavaScript environment.

Instruction example: Execute custom JavaScript to extract all product data including nested reviews and variant information from the page.

Advanced extraction script:

() => {
  function extractProduct(productEl) {
    // Extract main product data
    const product = {
      id: productEl.getAttribute('data-product-id'),
      name: productEl.querySelector('.product-name')?.textContent?.trim(),
      price: parseFloat(productEl.querySelector('.price')?.textContent?.replace(/[^0-9.]/g, '')),
      images: Array.from(productEl.querySelectorAll('.product-image')).map(img => img.src),
      variants: []
    };

    // Extract variants
    const variantEls = productEl.querySelectorAll('.variant-option');
    product.variants = Array.from(variantEls).map(variant => ({
      size: variant.getAttribute('data-size'),
      color: variant.getAttribute('data-color'),
      available: variant.classList.contains('in-stock')
    }));

    // Extract reviews
    const reviewEls = productEl.querySelectorAll('.review');
    product.reviews = Array.from(reviewEls).map(review => ({
      rating: parseInt(review.querySelector('.star-rating')?.getAttribute('data-rating')),
      text: review.querySelector('.review-text')?.textContent?.trim(),
      author: review.querySelector('.review-author')?.textContent?.trim(),
      date: review.querySelector('.review-date')?.textContent?.trim()
    }));

    return product;
  }

  // Extract all products
  const products = Array.from(document.querySelectorAll('.product-card'));
  return products.map(extractProduct);
}

Network Request Monitoring

Monitor API calls and network activity to understand how data is loaded (useful for identifying backend APIs for direct scraping).

Instruction: Navigate to app.example.com, monitor all XHR requests made during page load, and extract the API endpoints and response data.

Puppeteer implementation:

const page = await browser.newPage();

// Enable request interception
await page.setRequestInterception(true);

const apiCalls = [];

page.on('request', request => {
  if (request.resourceType() === 'xhr' || request.resourceType() === 'fetch') {
    apiCalls.push({
      url: request.url(),
      method: request.method(),
      headers: request.headers(),
      postData: request.postData()
    });
  }
  request.continue();
});

page.on('response', async response => {
  if (response.request().resourceType() === 'xhr' || response.request().resourceType() === 'fetch') {
    try {
      const data = await response.json();
      console.log('API Response:', response.url(), data);
    } catch (e) {
      // Not JSON response
    }
  }
});

await page.goto('https://app.example.com');
console.log('API Calls:', apiCalls);

Handling Authentication

Automate login flows and maintain authenticated sessions (following patterns from handling authentication in Puppeteer).

Instruction: Log in to account.example.com using the provided credentials, then navigate to the user dashboard and extract account information.

Authentication workflow:

const page = await browser.newPage();

// Navigate to login page
await page.goto('https://account.example.com/login');

// Fill login form
await page.type('#email', 'user@example.com');
await page.type('#password', 'securepassword');

// Click login button and wait for navigation
await Promise.all([
  page.click('#login-button'),
  page.waitForNavigation({ waitUntil: 'networkidle0' })
]);

// Verify login success
const isLoggedIn = await page.evaluate(() => {
  return document.querySelector('.user-profile') !== null;
});

if (isLoggedIn) {
  // Navigate to dashboard
  await page.goto('https://account.example.com/dashboard');

  // Extract account data
  const accountData = await page.evaluate(() => ({
    name: document.querySelector('.user-name')?.textContent,
    email: document.querySelector('.user-email')?.textContent,
    memberSince: document.querySelector('.member-since')?.textContent
  }));

  console.log(accountData);
}

Cookie and Session Management

Save and restore browser sessions for authenticated scraping:

// Save cookies after login (requires the Node.js fs module)
const fs = require('fs');

const cookies = await page.cookies();
fs.writeFileSync('cookies.json', JSON.stringify(cookies, null, 2));

// Restore cookies in a new session
const savedCookies = JSON.parse(fs.readFileSync('cookies.json', 'utf8'));
await page.setCookie(...savedCookies);
await page.goto('https://account.example.com/dashboard');

Viewport and Device Emulation

Configure browser viewport to emulate different devices:

Instruction: Emulate an iPhone 12 and navigate to mobile.example.com to test the mobile version of the site.

Code implementation:

const iPhone = puppeteer.devices['iPhone 12'];
await page.emulate(iPhone);

// Or set custom viewport
await page.setViewport({
  width: 390,
  height: 844,
  deviceScaleFactor: 3,
  isMobile: true,
  hasTouch: true
});

await page.goto('https://mobile.example.com');

Best Practices for Puppeteer MCP Automation

1. Wait Strategically

Always use appropriate wait strategies (understanding how to use the 'waitFor' function in Puppeteer is essential):

// ❌ Avoid fixed timeouts
await page.waitForTimeout(5000);

// ✓ Wait for specific conditions
await page.waitForSelector('.content-loaded');
await page.waitForFunction('document.readyState === "complete"');
await page.waitForNavigation({ waitUntil: 'networkidle0' });

2. Handle Errors Gracefully

Implement proper error handling for robust automation:

try {
  await page.goto('https://example.com', { timeout: 30000 });
} catch (error) {
  if (error.name === 'TimeoutError') {
    console.error('Page load timeout');
    // Implement retry logic
  } else {
    throw error;
  }
}

// Check element existence before interaction
const buttonExists = await page.$('.submit-button') !== null;
if (buttonExists) {
  await page.click('.submit-button');
}

3. Optimize Performance

Use headless mode and disable unnecessary features:

const browser = await puppeteer.launch({
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-accelerated-2d-canvas',
    '--disable-gpu'
  ]
});

// Block unnecessary resources
await page.setRequestInterception(true);
page.on('request', (req) => {
  if (['image', 'stylesheet', 'font'].includes(req.resourceType())) {
    req.abort();
  } else {
    req.continue();
  }
});

4. Respect Website Policies

Implement rate limiting and respectful scraping:

// Add delays between requests
await page.waitForTimeout(2000); // 2 second delay

// Set realistic user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');

// Respect robots.txt
// Check robots.txt before scraping
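
Puppeteer has no built-in robots.txt support, so a pre-check has to be added in code. Below is a simplified sketch that only handles Disallow prefixes for the wildcard user agent (real parsers also handle Allow rules, wildcards, and agent-specific groups), using the global fetch available in Node 18+:

// Simplified robots.txt check: fetch the file and test a path against
// Disallow prefixes declared for User-agent: *. Illustration only.
async function isPathAllowed(origin, path) {
  const res = await fetch(`${origin}/robots.txt`);
  if (!res.ok) return true; // no robots.txt found: assume allowed

  const lines = (await res.text()).split('\n').map(line => line.trim());
  let inWildcardGroup = false;
  const disallowed = [];

  for (const line of lines) {
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(field)) inWildcardGroup = value === '*';
    else if (inWildcardGroup && /^disallow$/i.test(field) && value) disallowed.push(value);
  }

  return !disallowed.some(prefix => path.startsWith(prefix));
}

// Example usage before navigating:
// if (await isPathAllowed('https://example.com', '/products')) { /* scrape */ }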

5. Clean Up Resources

Always close browsers and pages to prevent memory leaks:

let browser;

try {
  browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Perform scraping tasks
  await page.goto('https://example.com');

  // Extract data...

} catch (error) {
  console.error('Scraping error:', error);
} finally {
  // Always close the browser, even if an error occurred
  if (browser) {
    await browser.close();
  }
}

6. Use Specific Selectors

Provide clear, specific element descriptions to Claude:

  • ❌ "Click the button"
  • ✓ "Click the 'Add to Cart' button with class 'btn-primary'"
  • ✓ "Click the submit button in the checkout form"

7. Handle Dynamic Content

For single-page applications, wait for content to render:

// Wait for specific content to appear
await page.waitForFunction(
  () => document.querySelectorAll('.product-card').length > 0
);

// Wait for API responses
await page.waitForResponse(
  response => response.url().includes('/api/products') && response.status() === 200
);

Troubleshooting Common Issues

Chromium Download Failures

If Puppeteer fails to download Chromium during installation:

# Skip Chromium download and use system Chrome
npm install puppeteer-core

# Or set custom download host
PUPPETEER_DOWNLOAD_HOST=https://npm.taobao.org/mirrors npm install puppeteer

# Use existing Chrome installation
export PUPPETEER_EXECUTABLE_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome

MCP Server Connection Issues

  1. Verify Node.js is accessible from Claude Desktop
  2. Check configuration file syntax (valid JSON)
  3. Ensure Puppeteer is installed in the correct location
  4. Review Claude Desktop logs for error messages
  5. Restart Claude Desktop after configuration changes

Memory and Performance Issues

// Limit concurrent pages
const maxConcurrent = 5;
const pages = [];

for (let i = 0; i < maxConcurrent; i++) {
  pages.push(await browser.newPage());
}

// Close unused pages
await page.close();

// Disable unnecessary features
await page.setJavaScriptEnabled(false); // If JS not needed

Element Not Found Errors

When elements aren't found:

  1. Verify the page has fully loaded
  2. Check for iframes containing the element
  3. Ensure element is visible (not hidden by CSS)
  4. Use more specific selectors
  5. Check for dynamic class names or IDs

// Handle iframes
const frames = page.frames();
for (const frame of frames) {
  const element = await frame.$('.target-element');
  if (element) {
    // Found in this frame
    await frame.click('.target-element');
    break;
  }
}

Integration with Production Systems

Transitioning to WebScraping.AI

While the Puppeteer MCP server is excellent for prototyping and development, production scraping requires robust infrastructure. Use the MCP server to:

  1. Explore website structure: Understand page layout and data flow
  2. Test selectors: Identify the correct CSS selectors or XPath expressions
  3. Prototype workflows: Develop and test scraping logic interactively
  4. Debug issues: Investigate why scraping fails on specific pages

Then transition to WebScraping.AI API for production deployments with:

  • Managed infrastructure: No need to maintain browser instances
  • Proxy rotation: Automatic IP rotation to avoid blocking
  • CAPTCHA solving: Built-in CAPTCHA detection and solving
  • JavaScript rendering: Full support for dynamic content
  • Scalability: Handle thousands of concurrent requests
  • Reliability: 99.9% uptime SLA with automatic retries

Example Migration

Development with Puppeteer MCP: Use Puppeteer to navigate to products.example.com and extract all product details including prices and descriptions.

Production with WebScraping.AI API:

import requests

api_key = "YOUR_API_KEY"
url = "https://api.webscraping.ai/html"

params = {
    "url": "https://products.example.com",
    "js": True,
    "js_timeout": 5000
}

headers = {
    "API-KEY": api_key
}

response = requests.get(url, params=params, headers=headers)
html_content = response.text

# Parse HTML with BeautifulSoup or similar
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

products = []
for product in soup.select('.product-card'):
    products.append({
        'name': product.select_one('.product-name').text.strip(),
        'price': product.select_one('.price').text.strip(),
        'description': product.select_one('.description').text.strip()
    })

Conclusion

The Puppeteer MCP server revolutionizes browser automation by making it accessible through natural language instructions. By combining Puppeteer's battle-tested browser control capabilities with AI-powered understanding, you can scrape complex websites, automate repetitive tasks, and extract data without writing extensive code.

Whether you're building a proof-of-concept scraper, debugging website interactions, or exploring API endpoints through network monitoring, the Puppeteer MCP server provides an intuitive interface for browser automation. The conversational approach reduces development time and makes complex scraping scenarios more manageable.

For development and prototyping, the Puppeteer MCP server offers unmatched flexibility and ease of use. When ready to scale to production workloads requiring reliability, anti-blocking measures, and guaranteed performance, consider professional solutions like the WebScraping.AI API.

Start by installing the Puppeteer MCP server, configuring it with Claude Desktop, and experimenting with browser automation through simple natural language commands. The combination of AI assistance and Puppeteer's powerful automation capabilities opens new possibilities for efficient web scraping and data extraction.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
