How do I use the Puppeteer MCP server for browser automation?
The Puppeteer MCP (Model Context Protocol) server bridges the gap between AI assistants and browser automation by exposing Puppeteer's powerful capabilities through a conversational interface. This integration enables developers to control headless Chrome browsers, scrape dynamic content, automate web interactions, and extract data using natural language instructions instead of writing explicit code.
What is the Puppeteer MCP Server?
The Puppeteer MCP server implements the Model Context Protocol to make Puppeteer's browser automation framework accessible to AI assistants like Claude. Puppeteer is Google's official Node.js library for controlling headless Chrome or Chromium browsers, widely used for web scraping, automated testing, and browser automation tasks.
Unlike traditional Puppeteer scripts where you write JavaScript code to define every action, the MCP server allows AI models to interpret your intent and execute the appropriate browser commands. This makes browser automation more accessible and reduces the learning curve for complex scraping scenarios.
Key Capabilities
The Puppeteer MCP server provides comprehensive browser automation features:
- Page navigation: Load URLs, handle redirects, and navigate browser history
- Element interaction: Click buttons, fill forms, and interact with dynamic content
- Data extraction: Scrape text, HTML, and structured data from web pages
- Screenshot capture: Take full-page or viewport screenshots for visual verification
- JavaScript execution: Run custom scripts in the page context for advanced data extraction
- Network interception: Monitor and modify network requests and responses
- PDF generation: Convert web pages to PDF documents
- Performance monitoring: Track page load times and resource usage
Installation and Setup
Prerequisites
Before installing the Puppeteer MCP server, ensure you have:
- Node.js: Version 16.x or higher
- npm: Version 7.x or higher (comes with Node.js)
- Operating System: Windows, macOS, or Linux
Installing the Puppeteer MCP Server
Install the Puppeteer MCP server using npm:
# Install globally for system-wide access
npm install -g @modelcontextprotocol/server-puppeteer
# Or install locally in your project
npm install @modelcontextprotocol/server-puppeteer
# Install Puppeteer (if not already installed)
npm install puppeteer
The Puppeteer package automatically downloads a compatible version of Chromium during installation. If you prefer to use an existing Chrome installation:
# Install puppeteer-core (without Chromium download)
npm install puppeteer-core
# Set the executable path in your configuration
export PUPPETEER_EXECUTABLE_PATH=/path/to/chrome
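If you take the puppeteer-core route, note that puppeteer-core ignores Puppeteer's environment variables, so the executable path must be passed to launch() explicitly. A minimal sketch (the fallback path is an assumption; point it at your own Chrome binary):
const puppeteer = require('puppeteer-core');
(async () => {
  const browser = await puppeteer.launch({
    // puppeteer-core does not read PUPPETEER_EXECUTABLE_PATH itself,
    // so read it here and fall back to a hypothetical system path
    executablePath: process.env.PUPPETEER_EXECUTABLE_PATH || '/usr/bin/google-chrome',
    headless: true
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();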
Configuring Claude Desktop
To enable the Puppeteer MCP server in Claude Desktop, you need to modify the configuration file. The location varies by operating system:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- Windows:
%APPDATA%\Claude\claude_desktop_config.json
- Linux:
~/.config/Claude/claude_desktop_config.json
Add the Puppeteer MCP server configuration:
{
"mcpServers": {
"puppeteer": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-puppeteer"
],
"env": {
"PUPPETEER_HEADLESS": "true"
}
}
}
}
For a global installation, use this alternative configuration:
{
"mcpServers": {
"puppeteer": {
"command": "mcp-server-puppeteer",
"args": [],
"env": {
"PUPPETEER_HEADLESS": "true",
"PUPPETEER_TIMEOUT": "30000"
}
}
}
}
After updating the configuration file, restart Claude Desktop to activate the MCP server.
Core MCP Tools for Browser Automation
The Puppeteer MCP server exposes a comprehensive set of tools for browser control and web scraping:
Navigation Tools
- puppeteer_navigate: Navigate to a URL and wait for the page to load
- puppeteer_goto: Go to a URL with advanced options (timeout, waitUntil conditions)
- puppeteer_go_back: Navigate to the previous page in browser history
- puppeteer_go_forward: Navigate to the next page in browser history
- puppeteer_reload: Reload the current page
Content Extraction Tools
- puppeteer_content: Extract the HTML content of the current page
- puppeteer_text: Get the text content of the page or specific elements
- puppeteer_evaluate: Execute JavaScript in the page context and return results
- puppeteer_screenshot: Capture screenshots of the page or specific elements
- puppeteer_pdf: Generate a PDF from the current page
Interaction Tools
- puppeteer_click: Click on elements using selectors
- puppeteer_type: Type text into input fields
- puppeteer_select: Select options from dropdown menus
- puppeteer_hover: Hover over elements to trigger tooltips or menus
- puppeteer_focus: Set focus on specific elements
Wait and Timing Tools
- puppeteer_wait_for_selector: Wait for an element to appear in the DOM
- puppeteer_wait_for_navigation: Wait for page navigation to complete
- puppeteer_wait_for_timeout: Pause execution for a specified duration
- puppeteer_wait_for_function: Wait for a custom JavaScript condition to be true
Advanced Tools
- puppeteer_set_viewport: Configure browser viewport size and device emulation
- puppeteer_set_user_agent: Set custom user agent strings
- puppeteer_set_cookie: Add cookies to the browser session
- puppeteer_get_cookies: Retrieve cookies from the current session
- puppeteer_intercept_requests: Monitor and modify network requests
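These tools are normally invoked for you by Claude, but they can also be driven programmatically. As a rough sketch (assuming the current MCP TypeScript SDK's stdio client API; the tool name comes from the list above), a script could spawn the server over stdio and call a tool directly:
// Sketch: driving the server with the MCP SDK's stdio client (ESM, Node 18+)
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', '@modelcontextprotocol/server-puppeteer']
});
const client = new Client({ name: 'example-client', version: '1.0.0' }, { capabilities: {} });
await client.connect(transport);

// Discover what the server actually exposes, then call a tool
console.log(await client.listTools());
const result = await client.callTool({
  name: 'puppeteer_navigate',
  arguments: { url: 'https://example.com' }
});
console.log(result);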
Practical Browser Automation Examples
Example 1: Basic Web Scraping
Extract product information from an e-commerce site using natural language commands.
Natural language instruction to Claude:
Use the Puppeteer MCP server to navigate to example-store.com/products, wait for the product grid to load, then extract the name, price, and rating for each product.
What happens behind the scenes:
- Claude calls puppeteer_navigate to load the target URL
- Uses puppeteer_wait_for_selector to ensure products are loaded
- Executes puppeteer_evaluate to extract structured data:
// JavaScript executed in the page context
() => {
const products = Array.from(document.querySelectorAll('.product-card'));
return products.map(product => ({
name: product.querySelector('.product-name')?.textContent?.trim(),
price: product.querySelector('.price')?.textContent?.trim(),
rating: product.querySelector('.rating')?.getAttribute('data-rating')
}));
}
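For comparison, the same extraction as a standalone Puppeteer script (example-store.com and the selectors are placeholders carried over from the instruction above):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example-store.com/products', { waitUntil: 'networkidle0' });
  // Make sure the product grid has rendered before extracting
  await page.waitForSelector('.product-card');
  const products = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.product-card')).map(product => ({
      name: product.querySelector('.product-name')?.textContent?.trim(),
      price: product.querySelector('.price')?.textContent?.trim(),
      rating: product.querySelector('.rating')?.getAttribute('data-rating')
    }))
  );
  console.log(products);
  await browser.close();
})();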
Example 2: Handling Dynamic Content
When working with single-page applications or AJAX-loaded content (similar to handling AJAX requests using Puppeteer), you need to wait for content to load dynamically.
Instruction:
Navigate to dashboard.example.com, wait for the analytics chart to fully render, then extract the data points displayed in the chart.
Workflow:
- Navigate using puppeteer_navigate
- Wait for the chart element using puppeteer_wait_for_selector with the selector .chart-container
- Wait for network requests to complete
- Extract data from the rendered chart using puppeteer_evaluate
JavaScript for data extraction:
() => {
const chartData = window.chartInstance?.data;
if (!chartData) return null;
return {
labels: chartData.labels,
datasets: chartData.datasets.map(ds => ({
label: ds.label,
data: ds.data
}))
};
}
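Tying the workflow together in plain Puppeteer might look like the following sketch (the URL, the .chart-container selector, and the global chartInstance object are assumptions carried over from the example):
await page.goto('https://dashboard.example.com', { waitUntil: 'networkidle0' });
// Wait for the chart container, then for the charting library to populate its data
await page.waitForSelector('.chart-container');
await page.waitForFunction(() => window.chartInstance?.data != null);
const chartData = await page.evaluate(() => ({
  labels: window.chartInstance.data.labels,
  datasets: window.chartInstance.data.datasets.map(ds => ({
    label: ds.label,
    data: ds.data
  }))
}));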
Example 3: Form Submission and Search
Automate form filling and search operations to extract results.
Instruction:
Go to jobs.example.com, search for "Senior Developer" positions in "New York", and extract the first 20 job listings with title, company, and salary information.
Step-by-step workflow:
- Navigate to the job search page
- Type in the search query using puppeteer_type
- Select the location from the dropdown using puppeteer_select
- Click the search button with puppeteer_click
- Wait for results using puppeteer_wait_for_selector
- Extract job data with puppeteer_evaluate
Equivalent Puppeteer code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
// Navigate to job search page
await page.goto('https://jobs.example.com', { waitUntil: 'networkidle0' });
// Fill search form
await page.type('#job-search-input', 'Senior Developer');
await page.select('#location-select', 'New York');
await page.click('#search-button');
// Wait for results
await page.waitForSelector('.job-listing', { timeout: 5000 });
// Extract job data
const jobs = await page.evaluate(() => {
return Array.from(document.querySelectorAll('.job-listing')).slice(0, 20).map(job => ({
title: job.querySelector('.job-title')?.textContent?.trim(),
company: job.querySelector('.company-name')?.textContent?.trim(),
salary: job.querySelector('.salary')?.textContent?.trim()
}));
});
console.log(jobs);
await browser.close();
})();
Example 4: Multi-Page Scraping with Pagination
Extract data across multiple pages by handling pagination (similar to techniques used when navigating to different pages using Puppeteer).
Instruction:
Navigate through the first 10 pages of blog.example.com/articles, extracting the title, author, and publish date from each article on every page.
Workflow:
1. Navigate to the first page
2. Extract articles from the current page
3. Check for a "Next" button or pagination links
4. Click the next page link or construct the next URL
5. Repeat until 10 pages are processed
6. Aggregate all results
Example pagination logic:
// JavaScript to handle pagination; the page object is passed in explicitly
async function scrapeAllPages(page, maxPages = 10) {
  const allArticles = [];
  let currentPage = 1;
  while (currentPage <= maxPages) {
    // Extract the articles on the current page
    const articles = await page.evaluate(() =>
      Array.from(document.querySelectorAll('.article')).map(article => ({
        title: article.querySelector('h2')?.textContent?.trim(),
        author: article.querySelector('.author')?.textContent?.trim(),
        date: article.querySelector('.publish-date')?.textContent?.trim()
      }))
    );
    allArticles.push(...articles);
    // Stop when there is no next-page link
    const hasNextPage = await page.$('.pagination .next');
    if (!hasNextPage) break;
    // Set up the navigation wait before triggering the click
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle0' }),
      page.click('.pagination .next')
    ]);
    currentPage++;
  }
  return allArticles;
}
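Usage, assuming the page object is already on the first listing page:
const articles = await scrapeAllPages(page, 10);
console.log(`Collected ${articles.length} articles across up to 10 pages`);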
Example 5: Screenshot and Visual Verification
Capture screenshots for debugging or visual verification of page state.
Instruction:
Navigate to pricing.example.com, take a full-page screenshot showing all pricing tiers, and save it as pricing-page.png.
Using the MCP server:
- Navigate with puppeteer_navigate
- Call puppeteer_screenshot with the full-page option
- Specify the output path and image format
Equivalent code:
const page = await browser.newPage();
await page.goto('https://pricing.example.com');
// Take full-page screenshot
await page.screenshot({
path: 'pricing-page.png',
fullPage: true,
type: 'png'
});
// Or screenshot a specific element; elementHandle.screenshot() scrolls the
// element into view and computes the clip region for you. (A manual clip
// built from getBoundingClientRect uses viewport-relative coordinates and
// breaks once the page is scrolled.)
const pricingTable = await page.$('.pricing-table');
await pricingTable.screenshot({ path: 'pricing-tier.png' });
Advanced Automation Techniques
JavaScript Injection and Custom Extraction
Execute sophisticated data extraction logic directly in the page context using the puppeteer_evaluate tool. This approach (similar to injecting JavaScript into a page using Puppeteer) allows you to leverage the full power of the browser's JavaScript environment.
Instruction example:
Execute custom JavaScript to extract all product data including nested reviews and variant information from the page.
Advanced extraction script:
() => {
function extractProduct(productEl) {
// Extract main product data
const product = {
id: productEl.getAttribute('data-product-id'),
name: productEl.querySelector('.product-name')?.textContent?.trim(),
price: parseFloat(productEl.querySelector('.price')?.textContent?.replace(/[^0-9.]/g, '')),
images: Array.from(productEl.querySelectorAll('.product-image')).map(img => img.src),
variants: []
};
// Extract variants
const variantEls = productEl.querySelectorAll('.variant-option');
product.variants = Array.from(variantEls).map(variant => ({
size: variant.getAttribute('data-size'),
color: variant.getAttribute('data-color'),
available: variant.classList.contains('in-stock')
}));
// Extract reviews
const reviewEls = productEl.querySelectorAll('.review');
product.reviews = Array.from(reviewEls).map(review => ({
rating: parseInt(review.querySelector('.star-rating')?.getAttribute('data-rating')),
text: review.querySelector('.review-text')?.textContent?.trim(),
author: review.querySelector('.review-author')?.textContent?.trim(),
date: review.querySelector('.review-date')?.textContent?.trim()
}));
return product;
}
// Extract all products
const products = Array.from(document.querySelectorAll('.product-card'));
return products.map(extractProduct);
}
Network Request Monitoring
Monitor API calls and network activity to understand how data is loaded (useful for identifying backend APIs for direct scraping).
Instruction:
Navigate to app.example.com, monitor all XHR requests made during page load, and extract the API endpoints and response data.
Puppeteer implementation:
const page = await browser.newPage();
// Enable request interception
await page.setRequestInterception(true);
const apiCalls = [];
page.on('request', request => {
if (request.resourceType() === 'xhr' || request.resourceType() === 'fetch') {
apiCalls.push({
url: request.url(),
method: request.method(),
headers: request.headers(),
postData: request.postData()
});
}
request.continue();
});
page.on('response', async response => {
if (response.request().resourceType() === 'xhr' || response.request().resourceType() === 'fetch') {
try {
const data = await response.json();
console.log('API Response:', response.url(), data);
} catch (e) {
// Not JSON response
}
}
});
await page.goto('https://app.example.com');
console.log('API Calls:', apiCalls);
Handling Authentication
Automate login flows and maintain authenticated sessions (following patterns from handling authentication in Puppeteer).
Instruction:
Log in to account.example.com using the provided credentials, then navigate to the user dashboard and extract account information.
Authentication workflow:
const page = await browser.newPage();
// Navigate to login page
await page.goto('https://account.example.com/login');
// Fill login form
await page.type('#email', 'user@example.com');
await page.type('#password', 'securepassword');
// Click login button and wait for navigation
await Promise.all([
page.click('#login-button'),
page.waitForNavigation({ waitUntil: 'networkidle0' })
]);
// Verify login success
const isLoggedIn = await page.evaluate(() => {
return document.querySelector('.user-profile') !== null;
});
if (isLoggedIn) {
// Navigate to dashboard
await page.goto('https://account.example.com/dashboard');
// Extract account data
const accountData = await page.evaluate(() => ({
name: document.querySelector('.user-name')?.textContent,
email: document.querySelector('.user-email')?.textContent,
memberSince: document.querySelector('.member-since')?.textContent
}));
console.log(accountData);
}
Cookie and Session Management
Save and restore browser sessions for authenticated scraping:
// Save cookies after login (requires the fs module)
const fs = require('fs');
const cookies = await page.cookies();
fs.writeFileSync('cookies.json', JSON.stringify(cookies, null, 2));
// Restore cookies in a new session
const savedCookies = JSON.parse(fs.readFileSync('cookies.json', 'utf8'));
await page.setCookie(...savedCookies);
await page.goto('https://account.example.com/dashboard');
Viewport and Device Emulation
Configure browser viewport to emulate different devices:
Instruction:
Emulate an iPhone 12 and navigate to mobile.example.com to test the mobile version of the site.
Code implementation:
// Puppeteer v18 and earlier expose puppeteer.devices; newer versions export
// the same map as KnownDevices: const { KnownDevices } = require('puppeteer');
const iPhone = puppeteer.devices['iPhone 12'];
await page.emulate(iPhone);
// Or set custom viewport
await page.setViewport({
width: 390,
height: 844,
deviceScaleFactor: 3,
isMobile: true,
hasTouch: true
});
await page.goto('https://mobile.example.com');
Best Practices for Puppeteer MCP Automation
1. Wait Strategically
Always use appropriate wait strategies (understanding how to use the 'waitFor' function in Puppeteer is essential):
// ❌ Avoid fixed timeouts (page.waitForTimeout was removed in Puppeteer v22)
await page.waitForTimeout(5000);
// ✓ Wait for specific conditions
await page.waitForSelector('.content-loaded');
await page.waitForFunction('document.readyState === "complete"');
await page.waitForNavigation({ waitUntil: 'networkidle0' });
2. Handle Errors Gracefully
Implement proper error handling for robust automation:
try {
await page.goto('https://example.com', { timeout: 30000 });
} catch (error) {
if (error.name === 'TimeoutError') {
console.error('Page load timeout');
// Implement retry logic (see the sketch after this block)
} else {
throw error;
}
}
// Check element existence before interaction
const buttonExists = await page.$('.submit-button') !== null;
if (buttonExists) {
await page.click('.submit-button');
}
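To flesh out the retry comment above, one option is a small generic wrapper. The helper and its parameters are illustrative, not part of Puppeteer:
// Hypothetical helper: retry an async action with a fixed delay between attempts
async function withRetries(action, { attempts = 3, delayMs = 2000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      if (attempt === attempts) throw error;
      console.warn(`Attempt ${attempt} failed (${error.message}), retrying...`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Usage
await withRetries(() => page.goto('https://example.com', { timeout: 30000 }));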
3. Optimize Performance
Use headless mode and disable unnecessary features:
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-accelerated-2d-canvas',
'--disable-gpu'
]
});
// Block unnecessary resources
await page.setRequestInterception(true);
page.on('request', (req) => {
if (['image', 'stylesheet', 'font'].includes(req.resourceType())) {
req.abort();
} else {
req.continue();
}
});
4. Respect Website Policies
Implement rate limiting and respectful scraping:
// Add delays between requests; page.waitForTimeout was removed in Puppeteer
// v22, so use a plain timer that works in every version
await new Promise(resolve => setTimeout(resolve, 2000)); // 2 second delay
// Set a realistic user agent
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36');
// Respect robots.txt: check it before scraping (a naive check is sketched below)
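A minimal robots.txt check might look like this. It is a deliberately naive sketch that only honors simple Disallow prefixes under the wildcard user-agent group; a real crawler should use a proper parser such as the robots-parser npm package:
// Naive robots.txt check (Node 18+ for global fetch)
async function isPathAllowed(origin, path) {
  const res = await fetch(`${origin}/robots.txt`);
  if (!res.ok) return true; // no robots.txt found: assume allowed
  let appliesToUs = false;
  for (const line of (await res.text()).split('\n')) {
    const trimmed = line.trim();
    if (/^user-agent:\s*\*$/i.test(trimmed)) appliesToUs = true;
    else if (/^user-agent:/i.test(trimmed)) appliesToUs = false;
    else if (appliesToUs && /^disallow:/i.test(trimmed)) {
      const rule = trimmed.slice('disallow:'.length).trim();
      if (rule && path.startsWith(rule)) return false;
    }
  }
  return true;
}

// Usage
if (await isPathAllowed('https://example.com', '/products')) {
  await page.goto('https://example.com/products');
}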
5. Clean Up Resources
Always close browsers and pages to prevent memory leaks:
// Declare browser outside the try block so the finally clause can see it
let browser;
try {
  browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Perform scraping tasks
  await page.goto('https://example.com');
  // Extract data...
} catch (error) {
  console.error('Scraping error:', error);
} finally {
  // Always close the browser, even when an error was thrown
  if (browser) await browser.close();
}
6. Use Specific Selectors
Provide clear, specific element descriptions to Claude:
- ❌ "Click the button"
- ✓ "Click the 'Add to Cart' button with class 'btn-primary'"
- ✓ "Click the submit button in the checkout form"
7. Handle Dynamic Content
For single-page applications, wait for content to render:
// Wait for specific content to appear
await page.waitForFunction(
() => document.querySelectorAll('.product-card').length > 0
);
// Wait for API responses
await page.waitForResponse(
response => response.url().includes('/api/products') && response.status() === 200
);
Troubleshooting Common Issues
Chromium Download Failures
If Puppeteer fails to download Chromium during installation:
# Skip Chromium download and use system Chrome
npm install puppeteer-core
# Or download Chromium through a mirror (npm.taobao.org is defunct; npmmirror.com is its successor)
PUPPETEER_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary npm install puppeteer
# Use existing Chrome installation
export PUPPETEER_EXECUTABLE_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
MCP Server Connection Issues
- Verify Node.js is accessible from Claude Desktop
- Check configuration file syntax (valid JSON)
- Ensure Puppeteer is installed in the correct location
- Review Claude Desktop logs for error messages
- Restart Claude Desktop after configuration changes
Memory and Performance Issues
// Cap the number of pages open at once instead of creating one per URL
const maxConcurrent = 5;
const pages = [];
for (let i = 0; i < maxConcurrent; i++) {
  pages.push(await browser.newPage());
}
// Close pages as soon as you are done with them
await page.close();
// Disable JavaScript when you only need the static HTML
await page.setJavaScriptEnabled(false);
Element Not Found Errors
When elements aren't found:
- Verify the page has fully loaded
- Check for iframes containing the element
- Ensure element is visible (not hidden by CSS)
- Use more specific selectors
- Check for dynamic class names or IDs
// Handle iframes
const frames = page.frames();
for (const frame of frames) {
const element = await frame.$('.target-element');
if (element) {
// Found in this frame
await frame.click('.target-element');
break;
}
}
Integration with Production Systems
Transitioning to WebScraping.AI
While the Puppeteer MCP server is excellent for prototyping and development, production scraping requires robust infrastructure. Use the MCP server to:
- Explore website structure: Understand page layout and data flow
- Test selectors: Identify the correct CSS selectors or XPath expressions
- Prototype workflows: Develop and test scraping logic interactively
- Debug issues: Investigate why scraping fails on specific pages
Then transition to WebScraping.AI API for production deployments with:
- Managed infrastructure: No need to maintain browser instances
- Proxy rotation: Automatic IP rotation to avoid blocking
- CAPTCHA solving: Built-in CAPTCHA detection and solving
- JavaScript rendering: Full support for dynamic content
- Scalability: Handle thousands of concurrent requests
- Reliability: 99.9% uptime SLA with automatic retries
Example Migration
Development with Puppeteer MCP:
Use Puppeteer to navigate to products.example.com and extract all product details including prices and descriptions.
Production with WebScraping.AI API:
import requests
api_key = "YOUR_API_KEY"
url = "https://api.webscraping.ai/html"
params = {
"url": "https://products.example.com",
"js": True,
"js_timeout": 5000
}
headers = {
"API-KEY": api_key
}
response = requests.get(url, params=params, headers=headers)
html_content = response.text
# Parse HTML with BeautifulSoup or similar
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
products = []
for product in soup.select('.product-card'):
products.append({
'name': product.select_one('.product-name').text.strip(),
'price': product.select_one('.price').text.strip(),
'description': product.select_one('.description').text.strip()
})
Conclusion
The Puppeteer MCP server revolutionizes browser automation by making it accessible through natural language instructions. By combining Puppeteer's battle-tested browser control capabilities with AI-powered understanding, you can scrape complex websites, automate repetitive tasks, and extract data without writing extensive code.
Whether you're building a proof-of-concept scraper, debugging website interactions, or exploring API endpoints through network monitoring, the Puppeteer MCP server provides an intuitive interface for browser automation. The conversational approach reduces development time and makes complex scraping scenarios more manageable.
For development and prototyping, the Puppeteer MCP server offers unmatched flexibility and ease of use. When ready to scale to production workloads requiring reliability, anti-blocking measures, and guaranteed performance, consider professional solutions like the WebScraping.AI API.
Start by installing the Puppeteer MCP server, configuring it with Claude Desktop, and experimenting with browser automation through simple natural language commands. The combination of AI assistance and Puppeteer's powerful automation capabilities opens new possibilities for efficient web scraping and data extraction.