How do I perform browser automation using MCP servers?
Browser automation via Model Context Protocol (MCP) servers provides a powerful way to control headless browsers for web scraping, testing, and data extraction. MCP servers built on Playwright and Puppeteer expose standardized interfaces that allow AI assistants and automation tools to drive web browsers programmatically.
Understanding MCP Browser Automation
MCP (Model Context Protocol) servers for browser automation expose browser control capabilities through a standardized interface. This allows you to:
- Navigate to web pages and interact with dynamic content
- Click buttons, fill forms, and simulate user interactions
- Wait for elements to load and handle asynchronous content
- Take screenshots and extract page data
- Handle authentication and session management
- Monitor network requests and responses
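Under the hood, each of these capabilities is a named MCP tool invoked through a JSON-RPC 2.0 tools/call request; the client SDKs shown later construct this envelope for you. A minimal sketch of what goes over the wire, following the shape defined by the MCP specification:
import json

# Approximate JSON-RPC 2.0 envelope an MCP client sends to invoke a browser tool
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "browser_navigate",                  # which tool to run
        "arguments": {"url": "https://example.com"}  # tool-specific arguments
    },
}
print(json.dumps(call, indent=2))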
The two most popular MCP servers for browser automation are:
- Playwright MCP Server (@playwright/mcp) - A modern, cross-browser automation server supporting Chromium, Firefox, and WebKit
- Puppeteer MCP Server (@modelcontextprotocol/server-puppeteer) - A reference server wrapping Puppeteer's high-level API for controlling Chrome/Chromium
Installing and Configuring MCP Browser Automation Servers
Setting Up Playwright MCP Server
First, set up the server and install the browsers it will drive:
# Fetch the Playwright MCP server via npx and confirm it runs
npx @playwright/mcp@latest --help
# Install browsers
npx playwright install chromium firefox webkit
# Or install specific browsers
npx playwright install chromium
Configure the MCP server in your claude_desktop_config.json or MCP configuration file (launch flags such as --browser and --headless can be appended to the args array):
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
Setting Up Puppeteer MCP Server
# Install the Puppeteer MCP server globally (optional; the npx-based
# config below fetches it on demand)
npm install -g @modelcontextprotocol/server-puppeteer
# A compatible Chromium build is downloaded automatically with Puppeteer
Configuration example:
{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_LAUNCH_OPTIONS": "{\"headless\": true}"
      }
    }
  }
}
Note that this server exposes its own tool names (puppeteer_navigate, puppeteer_screenshot, and so on); the examples in the rest of this guide use the Playwright MCP tool names (browser_*).
Basic Browser Automation Operations
Navigating to Web Pages
Once your MCP server is configured, you can navigate to pages using MCP tools:
# Using the official MCP Python SDK (async)
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the Playwright MCP server as a stdio subprocess
    server = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Navigate to a webpage
            result = await session.call_tool("browser_navigate", {"url": "https://example.com"})
            print(f"Navigation completed: {result}")

asyncio.run(main())
For brevity, the remaining Python snippets assume an already-connected client and abbreviate these calls as client.call_tool(tool_name, arguments).
For JavaScript implementations:
// Using the official MCP TypeScript SDK in Node.js
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the Playwright MCP server over stdio and connect to it
const transport = new StdioClientTransport({ command: 'npx', args: ['@playwright/mcp@latest'] });
const client = new Client({ name: 'browser-automation-example', version: '1.0.0' });
await client.connect(transport);

const result = await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://example.com' }
});
console.log('Page loaded:', result);
Later JavaScript snippets abbreviate this as client.callTool(name, args) on a connected client.
This mirrors page navigation in raw Puppeteer or Playwright, but routed through the standardized MCP interface.
Taking Page Snapshots
Capture the current state of a page for analysis:
# Get accessibility snapshot of current page
snapshot = client.call_tool("browser_snapshot", {})
# The snapshot contains structured data about the page
print(snapshot['data'])
// JavaScript equivalent
async function captureSnapshot() {
const snapshot = await client.callTool('browser_snapshot', {});
console.log('Page snapshot:', snapshot.data);
}
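The refs that other tools expect (for example browser_click below) come from this snapshot. A rough sketch of pulling them out, assuming the YAML-like text format with [ref=...] markers that current Playwright MCP versions emit:
import re

def extract_refs(snapshot_text):
    """Map each snapshot line containing a [ref=...] marker to its ref."""
    refs = {}
    for line in snapshot_text.splitlines():
        match = re.search(r"\[ref=(\w+)\]", line)
        if match:
            refs[line.strip()] = match.group(1)
    return refs

# Example: find the ref for a submit button, then click it
refs = extract_refs(snapshot["data"])
submit_ref = next((ref for desc, ref in refs.items() if "Submit" in desc), None)
if submit_ref:
    client.call_tool("browser_click", {"element": "Submit button", "ref": submit_ref})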
Taking Screenshots
# Take a full-page screenshot
screenshot = client.call_tool("browser_take_screenshot", {
"filename": "page-capture.png",
"fullPage": True,
"type": "png"
})
print(f"Screenshot saved: {screenshot['filename']}")
Clicking Elements
In Playwright MCP, the ref argument must come from the most recent accessibility snapshot (refs look like "e12"); the CSS-style selectors in these examples are readability placeholders:
# Click an element on the page
click_result = client.call_tool("browser_click", {
    "element": "Submit button",
    "ref": "button#submit",  # placeholder; use a ref from browser_snapshot
    "button": "left"
})
Typing Text and Filling Forms
# Type text into an input field
client.call_tool("browser_type", {
"element": "Search input field",
"ref": "input[name='q']",
"text": "web scraping with MCP",
"submit": True # Press Enter after typing
})
# Fill multiple form fields at once
client.call_tool("browser_fill_form", {
"fields": [
{
"name": "Email field",
"type": "textbox",
"ref": "input[name='email']",
"value": "user@example.com"
},
{
"name": "Password field",
"type": "textbox",
"ref": "input[name='password']",
"value": "securepassword123"
},
{
"name": "Remember me checkbox",
"type": "checkbox",
"ref": "input[name='remember']",
"value": "true"
}
]
})
Advanced Browser Automation Techniques
Waiting for Elements and Content
Properly waiting for content to load is crucial for scraping dynamic websites. MCP servers provide several waiting mechanisms:
# Wait for specific text to appear
client.call_tool("browser_wait_for", {
"text": "Results loaded"
})
# Wait for text to disappear (loading indicators)
client.call_tool("browser_wait_for", {
"textGone": "Loading..."
})
# Wait for a specific time period
client.call_tool("browser_wait_for", {
"time": 3 # Wait 3 seconds
})
These waiting strategies are essential when scraping AJAX-heavy pages or single-page applications, where content arrives well after the initial document load. A retry wrapper can make them more robust, as sketched below.
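A minimal sketch of such a wrapper, assuming the simplified client convention used throughout:
import time

def wait_with_retries(client, text, attempts=3, delay=2):
    """Retry browser_wait_for a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            client.call_tool("browser_wait_for", {"text": text})
            return True
        except Exception as exc:  # the wait timed out or the call failed
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
            time.sleep(delay)
    return False

if not wait_with_retries(client, "Results loaded"):
    raise RuntimeError("Content never appeared")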
Handling Dialogs and Pop-ups
# Accept a browser dialog (alert, confirm, prompt)
client.call_tool("browser_handle_dialog", {
"accept": True,
"promptText": "Optional text for prompt dialogs"
})
# Dismiss a dialog
client.call_tool("browser_handle_dialog", {
"accept": False
})
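Dialog handling is reactive: the dialog is produced by a preceding action, and the next call resolves it. A sketch of that ordering (the ref here is hypothetical):
# Trigger an action that opens a confirm() dialog...
client.call_tool("browser_click", {
    "element": "Delete account button",
    "ref": "e21"  # hypothetical ref taken from a prior snapshot
})
# ...then resolve the dialog the click produced
client.call_tool("browser_handle_dialog", {"accept": True})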
Executing Custom JavaScript
# Execute JavaScript on the page
result = client.call_tool("browser_evaluate", {
"function": """() => {
return {
title: document.title,
url: window.location.href,
links: Array.from(document.querySelectorAll('a')).length
};
}"""
})
print(f"Page analysis: {result}")
# Execute JavaScript on a specific element
result = client.call_tool("browser_evaluate", {
"element": "Main container div",
"ref": "div#main",
"function": """(element) => {
return {
width: element.offsetWidth,
height: element.offsetHeight,
children: element.children.length
};
}"""
})
Managing Browser Tabs
# List all open tabs
tabs = client.call_tool("browser_tabs", {
"action": "list"
})
# Open a new tab
client.call_tool("browser_tabs", {
"action": "new"
})
# Switch to a specific tab
client.call_tool("browser_tabs", {
"action": "select",
"index": 1
})
# Close a tab
client.call_tool("browser_tabs", {
"action": "close",
"index": 0
})
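These actions compose into multi-tab workflows, such as opening a page in a fresh tab, capturing it, and returning to the original tab. A sketch under the same simplified-client assumption:
# Open a new tab and work in it
client.call_tool("browser_tabs", {"action": "new"})
client.call_tool("browser_navigate", {"url": "https://example.com/details"})
details = client.call_tool("browser_snapshot", {})

# Close the new tab and return to the first one
client.call_tool("browser_tabs", {"action": "close", "index": 1})
client.call_tool("browser_tabs", {"action": "select", "index": 0})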
Monitoring Network Requests
# Get all network requests since page load
requests = client.call_tool("browser_network_requests", {})
# Exact fields vary by server version; method, URL, and status are typical
for request in requests['requests']:
    print(f"{request['method']} {request['url']} -> {request['status']}")
This is particularly useful for discovering the hidden API endpoints a page calls, which can often be scraped directly instead of driving the UI.
Handling File Uploads
# Upload files through file input
client.call_tool("browser_file_upload", {
"paths": [
"/absolute/path/to/file1.pdf",
"/absolute/path/to/file2.jpg"
]
})
# Cancel the file chooser (an empty paths array is typically treated
# as a cancellation, though this is server-dependent)
client.call_tool("browser_file_upload", {
    "paths": []
})
Resizing Browser Window
# Set viewport size for responsive testing
client.call_tool("browser_resize", {
"width": 1920,
"height": 1080
})
# Mobile viewport
client.call_tool("browser_resize", {
"width": 375,
"height": 667
})
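Combining browser_resize with screenshots gives a quick responsive-design check. A sketch iterating over a few common viewports (the sizes are illustrative):
# Capture the same page at several viewport sizes
viewports = [("desktop", 1920, 1080), ("tablet", 768, 1024), ("mobile", 375, 667)]
for name, width, height in viewports:
    client.call_tool("browser_resize", {"width": width, "height": height})
    client.call_tool("browser_take_screenshot", {
        "filename": f"page-{name}.png",
        "fullPage": True
    })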
Complete Browser Automation Workflow Example
Here's a comprehensive example that demonstrates a complete web scraping workflow:
import json

# Assumes `client` is an already-connected MCP client for the Playwright
# server, set up as shown earlier
def scrape_product_data(client, url):
# Navigate to product page
client.call_tool("browser_navigate", {"url": url})
# Wait for content to load
client.call_tool("browser_wait_for", {
"text": "Add to Cart"
})
# Take screenshot for verification
client.call_tool("browser_take_screenshot", {
"filename": "product-page.png",
"fullPage": True
})
# Extract product data using JavaScript
product_data = client.call_tool("browser_evaluate", {
"function": """() => {
return {
title: document.querySelector('h1.product-title')?.textContent,
price: document.querySelector('.price')?.textContent,
description: document.querySelector('.description')?.textContent,
availability: document.querySelector('.stock-status')?.textContent,
images: Array.from(document.querySelectorAll('.product-image img'))
.map(img => img.src)
};
}"""
})
# Get network requests to find API calls
network = client.call_tool("browser_network_requests", {})
api_calls = [req for req in network['requests']
if 'api' in req['url']]
# Close the browser
client.call_tool("browser_close", {})
return {
"product": product_data,
"api_calls": api_calls
}
# Use the scraper (client connected as shown earlier)
result = scrape_product_data(client, "https://example.com/product/123")
print(json.dumps(result, indent=2))
JavaScript equivalent:
// Assumes `client` is an already-connected MCP client for the Playwright
// server, set up as shown earlier
async function scrapeProductData(client, url) {
// Navigate to product page
await client.callTool('browser_navigate', { url });
// Wait for content
await client.callTool('browser_wait_for', {
text: 'Add to Cart'
});
// Take screenshot
await client.callTool('browser_take_screenshot', {
filename: 'product-page.png',
fullPage: true
});
// Extract data
const productData = await client.callTool('browser_evaluate', {
function: `() => {
return {
title: document.querySelector('h1.product-title')?.textContent,
price: document.querySelector('.price')?.textContent,
description: document.querySelector('.description')?.textContent,
availability: document.querySelector('.stock-status')?.textContent,
images: Array.from(document.querySelectorAll('.product-image img'))
.map(img => img.src)
};
}`
});
// Get network data
const network = await client.callTool('browser_network_requests', {});
// Close browser
await client.callTool('browser_close', {});
return {
product: productData,
networkRequests: network.requests
};
}
// Execute (client connected as shown earlier)
scrapeProductData(client, 'https://example.com/product/123')
.then(result => console.log(JSON.stringify(result, null, 2)))
.catch(error => console.error('Scraping failed:', error));
Best Practices for MCP Browser Automation
1. Always Wait for Elements
Never assume elements are immediately available. Always use appropriate waiting mechanisms:
# Good practice
client.call_tool("browser_wait_for", {"text": "Expected content"})
result = client.call_tool("browser_click", {
"element": "Button",
"ref": "button.submit"
})
# Avoid immediate actions without waiting
2. Handle Errors Gracefully
Implement error handling for network issues, missing elements, and timeouts:
try:
client.call_tool("browser_navigate", {"url": target_url})
client.call_tool("browser_wait_for", {"text": "Content loaded", "time": 10})
except TimeoutError:
print("Page load timeout - retrying...")
# Implement retry logic
except Exception as e:
print(f"Browser automation error: {e}")
# Log error and clean up
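One way to implement the retry logic mentioned above is exponential backoff. A minimal sketch:
import time

def navigate_with_backoff(client, url, attempts=3):
    """Retry navigation with exponentially increasing delays."""
    for attempt in range(attempts):
        try:
            client.call_tool("browser_navigate", {"url": url})
            client.call_tool("browser_wait_for", {"text": "Content loaded"})
            return
        except Exception as exc:
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {wait}s")
            time.sleep(wait)
    raise RuntimeError(f"Could not load {url} after {attempts} attempts")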
3. Clean Up Resources
Always close browsers and tabs when done:
try:
# Your automation code
result = perform_scraping()
finally:
# Always close the browser
client.call_tool("browser_close", {})
4. Use Snapshots Before Screenshots
Accessibility snapshots are more efficient than screenshots for data extraction:
# Efficient: Get structured data
snapshot = client.call_tool("browser_snapshot", {})
# Use screenshots only when visual verification is needed
client.call_tool("browser_take_screenshot", {
"filename": "visual-check.png"
})
5. Respect Rate Limits
Add delays between requests to avoid overwhelming servers:
import time
for url in urls:
client.call_tool("browser_navigate", {"url": url})
# Extract data...
time.sleep(2) # 2-second delay between pages
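Adding random jitter makes the traffic pattern less mechanical; a small variation on the loop above:
import random
import time

for url in urls:
    client.call_tool("browser_navigate", {"url": url})
    # Extract data...
    time.sleep(2 + random.uniform(0, 1.5))  # 2.0-3.5 seconds between pages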
Troubleshooting Common Issues
Browser Installation Problems
# Verify browser installation
npx playwright install --dry-run
# Reinstall specific browser
npx playwright install chromium --force
# Check the installed Playwright version
npx playwright --version
Connection Issues
# Test the MCP server connection (client setup abbreviated as earlier)
try:
result = client.call_tool("browser_navigate", {
"url": "https://example.com"
})
print("Connection successful")
except Exception as e:
print(f"Connection failed: {e}")
Performance Optimization
# browser_navigate accepts only a URL; page-load strategy is managed
# by the server rather than exposed as a tool parameter
client.call_tool("browser_navigate", {"url": target_url})
# Prefer accessibility snapshots over screenshots for extraction, and
# run the server headless via its launch flag:
# npx @playwright/mcp@latest --headless
Conclusion
Browser automation through MCP servers provides a powerful, standardized way to control headless browsers for web scraping and testing. By using Playwright or Puppeteer MCP servers, you can:
- Interact with dynamic JavaScript-heavy websites
- Automate complex user workflows
- Extract data from pages that require interaction
- Monitor network traffic and API calls
- Handle authentication and session management
The MCP protocol abstracts browser automation complexity, making it accessible through simple tool calls while maintaining the full power of underlying browser automation frameworks. Whether you're building web scraping pipelines, automated testing suites, or data extraction workflows, MCP browser automation servers offer a modern, efficient approach to controlling browsers programmatically.