How do I scrape websites using Playwright with MCP?
The Model Context Protocol (MCP) provides a powerful way to integrate Playwright browser automation into AI-powered workflows, particularly with Claude Desktop and other AI assistants. The Playwright MCP server enables you to perform sophisticated web scraping tasks through a simple, natural language interface while leveraging Playwright's full browser automation capabilities.
What is Playwright MCP Server?
The Playwright MCP server is maintained by the Playwright team at Microsoft (published on npm as @playwright/mcp) and exposes Playwright browser automation functionality through the Model Context Protocol. It allows AI models to control a browser, navigate web pages, interact with elements, and extract data, all through structured tool calls rather than hand-written scripts.
Unlike traditional Playwright scripting, the MCP approach lets you describe what you want to scrape in natural language, and the AI assistant handles the browser automation details. This is particularly useful for rapid prototyping, one-off scraping tasks, or building complex automation workflows.
Installing Playwright MCP Server
Prerequisites
Before you begin, ensure you have:
- Node.js 18 or higher installed
- Claude Desktop application (or another MCP-compatible client)
- Basic familiarity with web scraping concepts
Installation Steps
1. Install the Playwright MCP server via npm:
npm install -g @playwright/mcp
2. Configure Claude Desktop to use the MCP server:
Edit your Claude Desktop configuration file:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- Windows:
%APPDATA%\Claude\claude_desktop_config.json
Add the Playwright server configuration:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
3. Install Playwright browsers:
After configuration, the MCP server will prompt you to install browsers on first use. You can also install them manually:
npx playwright install
4. Restart Claude Desktop to load the new configuration.
Available Playwright MCP Tools
The Playwright MCP server provides numerous tools for browser automation and web scraping:
Navigation Tools
- browser_navigate - Navigate to a URL
- browser_navigate_back - Go back to the previous page
- browser_tabs - List, create, close, or switch browser tabs
Content Extraction Tools
- browser_snapshot - Capture an accessibility tree snapshot (recommended for content extraction)
- browser_take_screenshot - Take PNG or JPEG screenshots
- browser_console_messages - Retrieve console logs and errors
Interaction Tools
- browser_click - Click on elements
- browser_type - Type text into input fields
- browser_fill_form - Fill multiple form fields at once
- browser_select_option - Select dropdown options
- browser_hover - Hover over elements
- browser_drag - Drag and drop elements
Advanced Tools
- browser_evaluate - Execute JavaScript in the page context
- browser_wait_for - Wait for text to appear/disappear or for a specific time
- browser_network_requests - Monitor network activity
- browser_mouse_move_xy, browser_mouse_click_xy - Precise mouse control
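If you want to confirm the server is wired up and see the exact tool list for your installed version, you can enumerate the tools programmatically. A minimal sketch using the Python MCP SDK (the same connection pattern used in the programmatic examples later in this article):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_playwright_tools():
    # Spawn the Playwright MCP server over stdio and list its tools
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(list_playwright_tools())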
Basic Web Scraping with Playwright MCP
Here's how to perform common web scraping tasks using the Playwright MCP server through Claude Desktop:
Example 1: Extracting Article Content
Simply describe what you want in natural language:
Navigate to https://example.com/article and extract the article title and content.
Behind the scenes, Claude will:
1. Use browser_navigate to load the page
2. Use browser_snapshot to capture the page structure
3. Parse and extract the requested content
4. Present it in a readable format
Example 2: Scraping Product Listings
For more complex scenarios like paginated product listings:
Go to https://example-store.com/products, extract all product names and prices
from the first page, then click the "Next" button and extract products from
the second page as well.
The AI assistant will:
1. Navigate to the URL
2. Take a snapshot to identify products
3. Extract structured data
4. Locate and click the pagination button
5. Extract data from the next page
6. Compile results from both pages
Example 3: Handling Dynamic Content
For websites that load content dynamically, similar to handling AJAX requests using Puppeteer:
Navigate to https://spa-example.com, wait for the products section to load,
then extract all product titles.
The MCP server can use browser_wait_for to ensure content is loaded before extraction.
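Programmatically, the same pattern looks like the sketch below, using the Python MCP SDK. The URL and the "Products loaded" readiness text are placeholders for whatever signals that your target page has finished rendering:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_dynamic_page():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://spa-example.com"})
        # Block until the page shows text indicating the dynamic content arrived
        await session.call_tool("browser_wait_for", {"text": "Products loaded"})
        # The snapshot now reflects the fully loaded page
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(scrape_dynamic_page())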
Advanced Scraping Techniques
Form Submission and Authentication
You can automate login flows and form submissions:
Navigate to https://example.com/login, fill in the username field with "user@example.com",
fill in the password field with "password123", click the login button, wait for the
dashboard to load, then extract the user's account balance.
For more complex authentication scenarios, check out how to handle authentication in Puppeteer, which uses similar concepts.
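As raw tool calls, the login flow above looks roughly like this sketch (Python MCP SDK). Note that the "ref" values identify elements from a prior browser_snapshot, not CSS selectors; the e12-style refs and the "Dashboard" readiness text below are placeholders you would take from a real snapshot:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def login_and_scrape():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com/login"})
        # Snapshot first: element "ref" values come from the snapshot output
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)  # find the real refs here
        await session.call_tool("browser_type", {
            "element": "username field", "ref": "e12", "text": "user@example.com"})
        await session.call_tool("browser_type", {
            "element": "password field", "ref": "e13", "text": "password123"})
        await session.call_tool("browser_click", {"element": "login button", "ref": "e14"})
        await session.call_tool("browser_wait_for", {"text": "Dashboard"})

asyncio.run(login_and_scrape())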
JavaScript Execution
Execute custom JavaScript to interact with the page:
Navigate to https://example.com and execute JavaScript to scroll to the bottom
of the page, then extract all loaded items.
This uses the browser_evaluate tool to run custom code.
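A minimal programmatic sketch of the same idea, assuming browser_wait_for accepts a time argument in seconds (verify against your server version):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scroll_and_extract():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        # Run arbitrary JavaScript in the page: scroll down to trigger lazy loading
        await session.call_tool("browser_evaluate", {
            "function": "() => window.scrollTo(0, document.body.scrollHeight)"})
        # Give lazily loaded items a moment to render, then snapshot
        await session.call_tool("browser_wait_for", {"time": 2})
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(scroll_and_extract())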
Multi-Tab Scraping
Handle multiple pages simultaneously:
Open three new tabs, navigate each to different product category pages,
extract the top 5 products from each, and compile them into a single list.
The browser_tabs tool manages multiple browser tabs efficiently.
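A sketch of driving tabs directly. The action/index argument shape below is my reading of the browser_tabs schema, and the category URLs are placeholders; treat both as assumptions to verify:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_categories():
    categories = ["https://example-store.com/books", "https://example-store.com/games"]
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        for url in categories:
            # Open a fresh tab, then navigate it
            await session.call_tool("browser_tabs", {"action": "new"})
            await session.call_tool("browser_navigate", {"url": url})
            snapshot = await session.call_tool("browser_snapshot", {})
            print(url, snapshot.content[0].text[:200])
        # List open tabs, then switch back to the first one
        await session.call_tool("browser_tabs", {"action": "list"})
        await session.call_tool("browser_tabs", {"action": "select", "index": 0})

asyncio.run(scrape_categories())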
Screenshot-Based Verification
Capture visual evidence of scraping results:
Navigate to the pricing page, take a screenshot of the pricing table,
then extract all plan names and prices.
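Programmatically, a screenshot call might look like the sketch below. The type and fullPage parameters are my reading of the browser_take_screenshot schema; verify them against your server version:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def capture_pricing_page():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com/pricing"})
        # Full-page PNG screenshot as visual evidence alongside the extracted data
        await session.call_tool("browser_take_screenshot", {"type": "png", "fullPage": True})
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(capture_pricing_page())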
Programmatic MCP Integration
While Claude Desktop provides a natural language interface, you can also integrate the Playwright MCP server programmatically using the MCP SDK.
Python Example
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_with_playwright_mcp():
    server_params = StdioServerParameters(
        command="npx",
        args=["@playwright/mcp@latest"]
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Navigate to a page
            await session.call_tool("browser_navigate", {
                "url": "https://example.com"
            })

            # Take an accessibility snapshot of the page
            snapshot = await session.call_tool("browser_snapshot", {})

            # Extract data from the snapshot (the result's content is a list of blocks)
            print(snapshot.content[0].text)

            # Click an element; "ref" must be an element reference taken
            # from a previous snapshot, not a CSS selector
            await session.call_tool("browser_click", {
                "element": "Submit button",
                "ref": "e5"  # placeholder: copy the real ref from the snapshot
            })

asyncio.run(scrape_with_playwright_mcp())
JavaScript/TypeScript Example
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function scrapeWithPlaywrightMCP() {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["@playwright/mcp@latest"],
  });

  const client = new Client({
    name: "playwright-scraper",
    version: "1.0.0",
  }, {
    capabilities: {}
  });

  await client.connect(transport);

  // Navigate to URL
  await client.callTool({
    name: "browser_navigate",
    arguments: { url: "https://example.com" },
  });

  // Get page snapshot
  const snapshot = await client.callTool({ name: "browser_snapshot", arguments: {} });
  console.log(snapshot);

  // Extract specific content by running JavaScript in the page
  const result = await client.callTool({
    name: "browser_evaluate",
    arguments: { function: "() => document.querySelector('h1').textContent" },
  });
  console.log("Page title:", result);

  await client.close();
}

scrapeWithPlaywrightMCP();
Best Practices for Playwright MCP Scraping
1. Use Snapshots Over Screenshots
The browser_snapshot tool returns an accessibility tree representation of the page, which is more efficient and easier to parse than screenshots for data extraction.
2. Handle Timeouts Appropriately
Always use browser_wait_for when dealing with dynamic content. This is crucial for handling timeouts in Puppeteer and applies equally to Playwright MCP.
Before extracting data, wait for the text "Products loaded" to appear on the page.
3. Respect Rate Limits
When scraping multiple pages, add delays between requests:
Navigate to each URL in the list, waiting 2 seconds between each navigation,
and extract the product information.
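In SDK code, the delay is simply a sleep between navigations, as in this sketch (the URL list is a placeholder):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

URLS = ["https://example.com/p/1", "https://example.com/p/2", "https://example.com/p/3"]

async def polite_crawl():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        for url in URLS:
            await session.call_tool("browser_navigate", {"url": url})
            snapshot = await session.call_tool("browser_snapshot", {})
            print(url, len(snapshot.content[0].text))
            await asyncio.sleep(2)  # be polite: 2 seconds between requests

asyncio.run(polite_crawl())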
4. Monitor Network Activity
Use browser_network_requests to understand what data the page loads:
Navigate to the page, then show me all API requests that were made.
This helps identify API endpoints you could call directly instead of automating a browser.
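A sketch of the programmatic equivalent, printing the captured requests so you can scan them for API endpoints:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def inspect_network():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        # Returns the requests the page made since navigation
        requests = await session.call_tool("browser_network_requests", {})
        print(requests.content[0].text)

asyncio.run(inspect_network())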
5. Error Handling
Check console messages for JavaScript errors that might affect scraping:
Navigate to the page, extract the data, and also show me any console errors
that occurred.
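Programmatically, fetch the console log and filter for errors client-side. This is a sketch; the exact text format of the returned messages may vary by server version:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def check_console():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        messages = await session.call_tool("browser_console_messages", {})
        # Naive filter: surface lines that look like errors
        for line in messages.content[0].text.splitlines():
            if "error" in line.lower():
                print(line)

asyncio.run(check_console())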
Limitations and Considerations
Resource Usage
Running a full browser instance through MCP is resource-intensive. For simple HTML scraping, consider using the WebScraping.AI API which handles browser management and proxy rotation automatically.
Scaling Challenges
The Playwright MCP server runs a single browser instance. For parallel scraping of many pages, traditional Playwright scripts or specialized scraping APIs are more efficient.
Session Persistence
Browser sessions are not automatically persisted between Claude Desktop restarts. For workflows requiring session continuity, implement custom session management.
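One option is pointing the server at a persistent browser profile. The @playwright/mcp CLI accepts a --user-data-dir flag (an assumption to verify against your installed version); wiring it into the Claude Desktop config would look like:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]
    }
  }
}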
Anti-Bot Detection
While Playwright MCP uses a real browser, sophisticated bot detection may still block automated access. Use proxies and consider professional scraping services for production use.
Comparing Playwright MCP with Direct API Scraping
For production web scraping, dedicated scraping APIs often provide better performance and reliability:
Playwright MCP Advantages:
- Natural language interface for rapid prototyping
- Full browser automation capabilities
- Excellent for interactive debugging
- Handles complex JavaScript-heavy sites
WebScraping.AI API Advantages:
- No browser management required
- Built-in proxy rotation and CAPTCHA handling
- Faster for simple HTML extraction
- Better for high-volume scraping
- Automatic JavaScript rendering when needed
Example: Complete Scraping Workflow
Here's a complete workflow that demonstrates multiple MCP capabilities:
1. Navigate to https://news-site.example.com
2. Wait for the articles section to load
3. Extract headlines and links from the first 10 articles
4. Open the first article link in a new tab
5. Switch to that tab
6. Take a screenshot of the article
7. Extract the full article text
8. Check console for any errors
9. Close the article tab
10. Return the compiled data
This workflow uses navigation, waiting, extraction, tab management, screenshots, and error checking - all through simple natural language instructions.
Conclusion
The Playwright MCP server transforms browser automation into a conversational interface, making web scraping more accessible while maintaining the power of Playwright. It's ideal for exploratory scraping, rapid prototyping, and building AI-powered automation workflows.
For production scraping needs, consider combining MCP for development and testing with robust scraping APIs like WebScraping.AI for execution. This gives you the flexibility of browser automation when needed and the efficiency of API-based scraping for scale.
To get started, install the Playwright MCP server, configure it in Claude Desktop, and begin describing your scraping tasks in natural language. The AI assistant will handle the browser automation details, letting you focus on extracting the data you need.