How do I perform browser automation using MCP servers?

Browser automation using Model Context Protocol (MCP) servers provides a powerful way to control headless browsers for web scraping, testing, and data extraction. MCP servers built on Playwright and Puppeteer expose standardized interfaces that let AI assistants and automation tools drive web browsers programmatically.

Understanding MCP Browser Automation

MCP (Model Context Protocol) servers for browser automation expose browser control capabilities through a standardized tool interface (see the discovery sketch after this list). This allows you to:

  • Navigate to web pages and interact with dynamic content
  • Click buttons, fill forms, and simulate user interactions
  • Wait for elements to load and handle asynchronous content
  • Take screenshots and extract page data
  • Handle authentication and session management
  • Monitor network requests and responses
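
Because every MCP server speaks the same protocol, a client can discover these capabilities at runtime. As a taste, here is a minimal sketch using the official MCP Python SDK (pip install mcp) that launches the Playwright MCP server over stdio and lists its tools:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_browser_tools():
    # Launch the Playwright MCP server as a subprocess and enumerate its tools
    server = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(list_browser_tools())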

The two most popular MCP servers for browser automation are:

  1. Playwright MCP Server (@playwright/mcp) - Microsoft's cross-browser automation server supporting Chromium, Firefox, and WebKit
  2. Puppeteer MCP Server (@modelcontextprotocol/server-puppeteer) - Built on the Node.js library that provides a high-level API for controlling Chrome/Chromium

Installing and Configuring MCP Browser Automation Servers

Setting Up Playwright MCP Server

First, ensure you have the necessary browsers installed:

# The Playwright MCP server runs via npx, so no separate install step is needed;
# verify that it launches
npx @playwright/mcp@latest --help

# Install browsers
npx playwright install chromium firefox webkit

# Or install a specific browser
npx playwright install chromium

Configure the MCP server in your claude_desktop_config.json or MCP configuration file:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-playwright"],
      "env": {
        "PLAYWRIGHT_BROWSER": "chromium"
      }
    }
  }
}

Setting Up Puppeteer MCP Server

# Install the Puppeteer MCP server (optional; the npx -y config below fetches it on demand)
npm install -g @modelcontextprotocol/server-puppeteer

# The browser is automatically downloaded with Puppeteer

Configuration example:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
      "env": {
        "PUPPETEER_HEADLESS": "true"
      }
    }
  }
}

Basic Browser Automation Operations

Navigating to Web Pages

Once your MCP server is configured, you can navigate to pages using MCP tools:

# Using the official MCP Python SDK (pip install mcp)
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the Playwright MCP server as a subprocess over stdio
    server = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Navigate to a webpage
            result = await session.call_tool(
                "browser_navigate",
                {"url": "https://example.com"}
            )
            print(f"Navigation completed: {result}")

asyncio.run(main())

For brevity, the remaining Python snippets write client.call_tool(...) as a synchronous shorthand for await session.call_tool(...) inside a session like the one above.

For JavaScript implementations:

// Using the official MCP TypeScript SDK (npm install @modelcontextprotocol/sdk)
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function navigateToPage() {
    // Launch the Playwright MCP server as a subprocess over stdio
    const transport = new StdioClientTransport({
        command: 'npx',
        args: ['@playwright/mcp@latest']
    });

    const client = new Client({ name: 'browser-demo', version: '1.0.0' });
    await client.connect(transport);

    const result = await client.callTool({
        name: 'browser_navigate',
        arguments: { url: 'https://example.com' }
    });

    console.log('Page loaded:', result);
}

navigateToPage();

The later JavaScript snippets keep the shorter client.callTool('tool_name', args) form as shorthand for this call.

This approach is similar to navigating to different pages using Puppeteer, but through the standardized MCP interface.

Taking Page Snapshots

Capture the current state of a page for analysis:

# Get accessibility snapshot of current page
snapshot = client.call_tool("browser_snapshot", {})

# The snapshot contains structured data about the page
print(snapshot['data'])

JavaScript equivalent:

async function captureSnapshot() {
    const snapshot = await client.callTool('browser_snapshot', {});
    console.log('Page snapshot:', snapshot.data);
}

Taking Screenshots

# Take a full-page screenshot
screenshot = client.call_tool("browser_take_screenshot", {
    "filename": "page-capture.png",
    "fullPage": True,
    "type": "png"
})

print(f"Screenshot saved: {screenshot['filename']}")

Clicking Elements

# Click an element on the page
click_result = client.call_tool("browser_click", {
    "element": "Submit button",
    "ref": "button#submit",
    "button": "left"
})
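
Note that with the Playwright MCP server, ref values come from the most recent browser_snapshot rather than from CSS selectors, so a click is usually preceded by a snapshot. A minimal sketch, assuming the snapshot text marks elements with [ref=eNN] (the exact format may vary by server version):

import re

# Capture the accessibility snapshot and pull the ref for the submit button
snapshot = client.call_tool("browser_snapshot", {})
match = re.search(r'button "Submit[^"]*" \[ref=(e\d+)\]', str(snapshot))

if match:
    client.call_tool("browser_click", {
        "element": "Submit button",
        "ref": match.group(1)  # e.g. "e7"
    })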

Typing Text and Filling Forms

# Type text into an input field
client.call_tool("browser_type", {
    "element": "Search input field",
    "ref": "input[name='q']",
    "text": "web scraping with MCP",
    "submit": True  # Press Enter after typing
})

# Fill multiple form fields at once
client.call_tool("browser_fill_form", {
    "fields": [
        {
            "name": "Email field",
            "type": "textbox",
            "ref": "input[name='email']",
            "value": "user@example.com"
        },
        {
            "name": "Password field",
            "type": "textbox",
            "ref": "input[name='password']",
            "value": "securepassword123"
        },
        {
            "name": "Remember me checkbox",
            "type": "checkbox",
            "ref": "input[name='remember']",
            "value": "true"
        }
    ]
})

Advanced Browser Automation Techniques

Waiting for Elements and Content

Properly waiting for content to load is crucial for scraping dynamic websites. MCP servers provide several waiting mechanisms:

# Wait for specific text to appear
client.call_tool("browser_wait_for", {
    "text": "Results loaded"
})

# Wait for text to disappear (loading indicators)
client.call_tool("browser_wait_for", {
    "textGone": "Loading..."
})

# Wait for a specific time period
client.call_tool("browser_wait_for", {
    "time": 3  # Wait 3 seconds
})

These waiting strategies are essential when handling AJAX requests using Puppeteer or working with single-page applications.
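
On flaky pages a marker may never appear, so it helps to bound the wait. wait_for_text below is a hypothetical helper that retries the same tool call a few times before giving up:

import time

def wait_for_text(client, text, retries=3, pause=2.0):
    # Hypothetical helper: retry browser_wait_for, pausing between attempts
    for _ in range(retries):
        try:
            client.call_tool("browser_wait_for", {"text": text})
            return True
        except Exception:
            time.sleep(pause)
    return False

If wait_for_text returns False, skip the page or log it rather than letting the run hang.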

Handling Dialogs and Pop-ups

# Accept a browser dialog (alert, confirm, prompt)
client.call_tool("browser_handle_dialog", {
    "accept": True,
    "promptText": "Optional text for prompt dialogs"
})

# Dismiss a dialog
client.call_tool("browser_handle_dialog", {
    "accept": False
})
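
Dialogs appear in response to an action, and with the Playwright MCP server a pending dialog blocks further tool calls until it is handled, so the handler call typically follows the triggering click directly. A sketch (the ref is illustrative):

# Click something that opens a confirm() dialog...
client.call_tool("browser_click", {
    "element": "Delete account button",
    "ref": "e12"  # illustrative ref from a prior snapshot
})

# ...then resolve the dialog before issuing any other tool call
client.call_tool("browser_handle_dialog", {"accept": True})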

Executing Custom JavaScript

# Execute JavaScript on the page
result = client.call_tool("browser_evaluate", {
    "function": """() => {
        return {
            title: document.title,
            url: window.location.href,
            links: Array.from(document.querySelectorAll('a')).length
        };
    }"""
})

print(f"Page analysis: {result}")

# Execute JavaScript on a specific element
result = client.call_tool("browser_evaluate", {
    "element": "Main container div",
    "ref": "div#main",
    "function": """(element) => {
        return {
            width: element.offsetWidth,
            height: element.offsetHeight,
            children: element.children.length
        };
    }"""
})

Managing Browser Tabs

# List all open tabs
tabs = client.call_tool("browser_tabs", {
    "action": "list"
})

# Open a new tab
client.call_tool("browser_tabs", {
    "action": "new"
})

# Switch to a specific tab
client.call_tool("browser_tabs", {
    "action": "select",
    "index": 1
})

# Close a tab
client.call_tool("browser_tabs", {
    "action": "close",
    "index": 0
})

Monitoring Network Requests

# Get all network requests since page load
requests = client.call_tool("browser_network_requests", {})

for request in requests['requests']:
    print(f"{request['method']} {request['url']}")
    print(f"Status: {request['status']}")
    print(f"Response size: {request['size']} bytes")

This is particularly useful for monitoring network requests in Puppeteer-based workflows.

Handling File Uploads

# Upload files through file input
client.call_tool("browser_file_upload", {
    "paths": [
        "/absolute/path/to/file1.pdf",
        "/absolute/path/to/file2.jpg"
    ]
})

# Cancel file chooser
client.call_tool("browser_file_upload", {
    "paths": []  # Empty array cancels the file chooser
})
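
The upload tool responds to an open file chooser, so the usual sequence is to first click the element that opens the chooser. A sketch (the ref is illustrative):

# Open the file chooser...
client.call_tool("browser_click", {
    "element": "Choose file button",
    "ref": "e8"  # illustrative ref from a prior snapshot
})

# ...then hand the chooser an absolute path
client.call_tool("browser_file_upload", {
    "paths": ["/absolute/path/to/file1.pdf"]
})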

Resizing Browser Window

# Set viewport size for responsive testing
client.call_tool("browser_resize", {
    "width": 1920,
    "height": 1080
})

# Mobile viewport
client.call_tool("browser_resize", {
    "width": 375,
    "height": 667
})
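
For responsive checks it is handy to sweep several breakpoints and capture each one; the sizes below are illustrative:

# Screenshot the page at common breakpoints
breakpoints = {"mobile": (375, 667), "tablet": (768, 1024), "desktop": (1920, 1080)}

for name, (width, height) in breakpoints.items():
    client.call_tool("browser_resize", {"width": width, "height": height})
    client.call_tool("browser_take_screenshot", {
        "filename": f"layout-{name}.png",
        "fullPage": True
    })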

Complete Browser Automation Workflow Example

Here's a comprehensive example that demonstrates a complete web scraping workflow:

import json

def scrape_product_data(client, url):
    # "client" is the synchronous shorthand wrapper introduced earlier

    # Navigate to product page
    client.call_tool("browser_navigate", {"url": url})

    # Wait for content to load
    client.call_tool("browser_wait_for", {
        "text": "Add to Cart"
    })

    # Take screenshot for verification
    client.call_tool("browser_take_screenshot", {
        "filename": "product-page.png",
        "fullPage": True
    })

    # Extract product data using JavaScript
    product_data = client.call_tool("browser_evaluate", {
        "function": """() => {
            return {
                title: document.querySelector('h1.product-title')?.textContent,
                price: document.querySelector('.price')?.textContent,
                description: document.querySelector('.description')?.textContent,
                availability: document.querySelector('.stock-status')?.textContent,
                images: Array.from(document.querySelectorAll('.product-image img'))
                    .map(img => img.src)
            };
        }"""
    })

    # Get network requests to find API calls
    network = client.call_tool("browser_network_requests", {})
    api_calls = [req for req in network['requests']
                 if 'api' in req['url']]

    # Close the browser
    client.call_tool("browser_close", {})

    return {
        "product": product_data,
        "api_calls": api_calls
    }

# Use the scraper with a connected shorthand client
result = scrape_product_data(client, "https://example.com/product/123")
print(json.dumps(result, indent=2))

JavaScript equivalent:

// "client" is a connected SDK Client, set up as in the navigation example above
async function scrapeProductData(client, url) {

    // Navigate to product page
    await client.callTool('browser_navigate', { url });

    // Wait for content
    await client.callTool('browser_wait_for', {
        text: 'Add to Cart'
    });

    // Take screenshot
    await client.callTool('browser_take_screenshot', {
        filename: 'product-page.png',
        fullPage: true
    });

    // Extract data
    const productData = await client.callTool('browser_evaluate', {
        function: `() => {
            return {
                title: document.querySelector('h1.product-title')?.textContent,
                price: document.querySelector('.price')?.textContent,
                description: document.querySelector('.description')?.textContent,
                availability: document.querySelector('.stock-status')?.textContent,
                images: Array.from(document.querySelectorAll('.product-image img'))
                    .map(img => img.src)
            };
        }`
    });

    // Get network data
    const network = await client.callTool('browser_network_requests', {});

    // Close browser
    await client.callTool('browser_close', {});

    return {
        product: productData,
        networkRequests: network.requests
    };
}

// Execute with a connected client
scrapeProductData(client, 'https://example.com/product/123')
    .then(result => console.log(JSON.stringify(result, null, 2)))
    .catch(error => console.error('Scraping failed:', error));

Best Practices for MCP Browser Automation

1. Always Wait for Elements

Never assume elements are immediately available. Always use appropriate waiting mechanisms:

# Good practice
client.call_tool("browser_wait_for", {"text": "Expected content"})
result = client.call_tool("browser_click", {
    "element": "Button",
    "ref": "button.submit"
})

# Avoid immediate actions without waiting

2. Handle Errors Gracefully

Implement error handling for network issues, missing elements, and timeouts:

try:
    client.call_tool("browser_navigate", {"url": target_url})
    client.call_tool("browser_wait_for", {"text": "Content loaded", "time": 10})
except TimeoutError:
    print("Page load timeout - retrying...")
    # Implement retry logic
except Exception as e:
    print(f"Browser automation error: {e}")
    # Log error and clean up
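
For transient failures, a small bounded-retry wrapper with exponential backoff is often enough; navigate_with_retry is a hypothetical helper:

import time

def navigate_with_retry(client, url, attempts=3):
    # Hypothetical helper: retry navigation, doubling the delay each time
    for attempt in range(attempts):
        try:
            client.call_tool("browser_navigate", {"url": url})
            return
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s ... backoff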

3. Clean Up Resources

Always close browsers and tabs when done:

try:
    # Your automation code
    result = perform_scraping()
finally:
    # Always close the browser
    client.call_tool("browser_close", {})

4. Use Snapshots Before Screenshots

Accessibility snapshots are more efficient than screenshots for data extraction:

# Efficient: Get structured data
snapshot = client.call_tool("browser_snapshot", {})

# Use screenshots only when visual verification is needed
client.call_tool("browser_take_screenshot", {
    "filename": "visual-check.png"
})

5. Respect Rate Limits

Add delays between requests to avoid overwhelming servers:

import time

for url in urls:
    client.call_tool("browser_navigate", {"url": url})
    # Extract data...
    time.sleep(2)  # 2-second delay between pages

Troubleshooting Common Issues

Browser Installation Problems

# Verify browser installation
npx playwright install --dry-run

# Reinstall specific browser
npx playwright install chromium --force

# Check the installed Playwright version
npx playwright --version

Connection Issues

# Test the MCP server connection (client is the shorthand wrapper from earlier)
try:
    result = client.call_tool("browser_navigate", {
        "url": "https://example.com"
    })
    print("Connection successful")
except Exception as e:
    print(f"Connection failed: {e}")

Performance Optimization

# The Playwright MCP navigate tool takes only a URL; tune loading behavior
# on the server side rather than per call
client.call_tool("browser_navigate", {"url": target_url})

# Run headless for speed by adding the --headless flag to the server args:
# "args": ["@playwright/mcp@latest", "--headless"]

Conclusion

Browser automation through MCP servers provides a powerful, standardized way to control headless browsers for web scraping and testing. By using Playwright or Puppeteer MCP servers, you can:

  • Interact with dynamic JavaScript-heavy websites
  • Automate complex user workflows
  • Extract data from pages that require interaction
  • Monitor network traffic and API calls
  • Handle authentication and session management

The MCP protocol abstracts browser automation complexity, making it accessible through simple tool calls while maintaining the full power of underlying browser automation frameworks. Whether you're building web scraping pipelines, automated testing suites, or data extraction workflows, MCP browser automation servers offer a modern, efficient approach to controlling browsers programmatically.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
