How do I scrape websites using Playwright with MCP?
The Model Context Protocol (MCP) provides a powerful way to integrate Playwright browser automation into AI-powered workflows, particularly with Claude Desktop and other AI assistants. The Playwright MCP server enables you to perform sophisticated web scraping tasks through a simple, natural language interface while leveraging Playwright's full browser automation capabilities.
What is Playwright MCP Server?
The Playwright MCP server is maintained by the Playwright team at Microsoft (published on npm as @playwright/mcp) and exposes Playwright browser automation functionality through the Model Context Protocol. It allows AI models to control a browser, navigate web pages, interact with elements, and extract data, all through structured tool calls rather than hand-written scripts.
Unlike traditional Playwright scripting, the MCP approach lets you describe what you want to scrape in natural language, and the AI assistant handles the browser automation details. This is particularly useful for rapid prototyping, one-off scraping tasks, or building complex automation workflows.
Installing Playwright MCP Server
Prerequisites
Before you begin, ensure you have:
- Node.js 18 or higher installed
- Claude Desktop application (or another MCP-compatible client)
- Basic familiarity with web scraping concepts
Installation Steps
1. Install the Playwright MCP server via npm:
npm install -g @playwright/mcp
2. Configure Claude Desktop to use the MCP server:
Edit your Claude Desktop configuration file:
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
- Windows:
%APPDATA%\Claude\claude_desktop_config.json
Add the Playwright server configuration:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "@playwright/mcp@latest"
      ]
    }
  }
}
3. Install Playwright browsers:
After configuration, the MCP server will prompt you to install browsers on first use. You can also install them manually:
npx playwright install
4. Restart Claude Desktop to load the new configuration.
Available Playwright MCP Tools
The Playwright MCP server provides numerous tools for browser automation and web scraping:
Navigation Tools
- browser_navigate - Navigate to a URL
- browser_navigate_back - Go back to the previous page
- browser_tabs - List, create, close, or switch browser tabs
Content Extraction Tools
- browser_snapshot - Capture an accessibility tree snapshot (recommended for content extraction)
- browser_take_screenshot - Take PNG or JPEG screenshots
- browser_console_messages - Retrieve console logs and errors
Interaction Tools
- browser_click - Click on elements
- browser_type - Type text into input fields
- browser_fill_form - Fill multiple form fields at once
- browser_select_option - Select dropdown options
- browser_hover - Hover over elements
- browser_drag - Drag and drop elements
Advanced Tools
- browser_evaluate - Execute JavaScript in the page context
- browser_wait_for - Wait for text to appear/disappear or for a specific time
- browser_network_requests - Monitor network activity
- browser_mouse_move_xy, browser_mouse_click_xy - Precise mouse control
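If you want to confirm the server is wired up and see the exact tool list for your installed version, you can enumerate the tools programmatically. A minimal sketch using the Python MCP SDK (the same connection pattern used in the programmatic examples later in this article):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_playwright_tools():
    # Spawn the Playwright MCP server over stdio and list its tools
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(list_playwright_tools())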
Basic Web Scraping with Playwright MCP
Here's how to perform common web scraping tasks using the Playwright MCP server through Claude Desktop:
Example 1: Extracting Article Content
Simply describe what you want in natural language:
Navigate to https://example.com/article and extract the article title and content.
Behind the scenes, Claude will:
1. Use browser_navigate to load the page
2. Use browser_snapshot to capture the page structure
3. Parse and extract the requested content
4. Present it in a readable format
Example 2: Scraping Product Listings
For more complex scenarios like paginated product listings:
Go to https://example-store.com/products, extract all product names and prices
from the first page, then click the "Next" button and extract products from
the second page as well.
The AI assistant will:
1. Navigate to the URL
2. Take a snapshot to identify products
3. Extract structured data
4. Locate and click the pagination button
5. Extract data from the next page
6. Compile results from both pages
Example 3: Handling Dynamic Content
For websites that load content dynamically, similar to handling AJAX requests using Puppeteer:
Navigate to https://spa-example.com, wait for the products section to load,
then extract all product titles.
The MCP server can use browser_wait_for to ensure content is loaded before extraction.
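Programmatically, the same pattern looks like the sketch below, using the Python MCP SDK. The URL and the "Products loaded" readiness text are placeholders for whatever signals that your target page has finished rendering:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_dynamic_page():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://spa-example.com"})
        # Block until the page shows text indicating the dynamic content arrived
        await session.call_tool("browser_wait_for", {"text": "Products loaded"})
        # The snapshot now reflects the fully loaded page
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(scrape_dynamic_page())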
Advanced Scraping Techniques
Form Submission and Authentication
You can automate login flows and form submissions:
Navigate to https://example.com/login, fill in the username field with "user@example.com",
fill in the password field with "password123", click the login button, wait for the
dashboard to load, then extract the user's account balance.
For more complex authentication scenarios, check out how to handle authentication in Puppeteer, which uses similar concepts.
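As raw tool calls, the login flow above looks roughly like this sketch (Python MCP SDK). Note that the "ref" values identify elements from a prior browser_snapshot, not CSS selectors; the e12-style refs and the "Dashboard" readiness text below are placeholders you would take from a real snapshot:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def login_and_scrape():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com/login"})
        # Snapshot first: element "ref" values come from the snapshot output
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)  # find the real refs here
        await session.call_tool("browser_type", {
            "element": "username field", "ref": "e12", "text": "user@example.com"})
        await session.call_tool("browser_type", {
            "element": "password field", "ref": "e13", "text": "password123"})
        await session.call_tool("browser_click", {"element": "login button", "ref": "e14"})
        await session.call_tool("browser_wait_for", {"text": "Dashboard"})

asyncio.run(login_and_scrape())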
JavaScript Execution
Execute custom JavaScript to interact with the page:
Navigate to https://example.com and execute JavaScript to scroll to the bottom
of the page, then extract all loaded items.
This uses the browser_evaluate tool to run custom code.
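A minimal programmatic sketch of the same idea, assuming browser_wait_for accepts a time argument in seconds (verify against your server version):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scroll_and_extract():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        # Run arbitrary JavaScript in the page: scroll down to trigger lazy loading
        await session.call_tool("browser_evaluate", {
            "function": "() => window.scrollTo(0, document.body.scrollHeight)"})
        # Give lazily loaded items a moment to render, then snapshot
        await session.call_tool("browser_wait_for", {"time": 2})
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(scroll_and_extract())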
Multi-Tab Scraping
Handle multiple pages simultaneously:
Open three new tabs, navigate each to different product category pages,
extract the top 5 products from each, and compile them into a single list.
The browser_tabs tool manages multiple browser tabs efficiently.
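A sketch of driving tabs directly. The action/index argument shape below is my reading of the browser_tabs schema, and the category URLs are placeholders; treat both as assumptions to verify:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_categories():
    categories = ["https://example-store.com/books", "https://example-store.com/games"]
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        for url in categories:
            # Open a fresh tab, then navigate it
            await session.call_tool("browser_tabs", {"action": "new"})
            await session.call_tool("browser_navigate", {"url": url})
            snapshot = await session.call_tool("browser_snapshot", {})
            print(url, snapshot.content[0].text[:200])
        # List open tabs, then switch back to the first one
        await session.call_tool("browser_tabs", {"action": "list"})
        await session.call_tool("browser_tabs", {"action": "select", "index": 0})

asyncio.run(scrape_categories())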
Screenshot-Based Verification
Capture visual evidence of scraping results:
Navigate to the pricing page, take a screenshot of the pricing table,
then extract all plan names and prices.
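Programmatically, a screenshot call might look like the sketch below. The type and fullPage parameters are my reading of the browser_take_screenshot schema; verify them against your server version:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def capture_pricing_page():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com/pricing"})
        # Full-page PNG screenshot as visual evidence alongside the extracted data
        await session.call_tool("browser_take_screenshot", {"type": "png", "fullPage": True})
        snapshot = await session.call_tool("browser_snapshot", {})
        print(snapshot.content[0].text)

asyncio.run(capture_pricing_page())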
Programmatic MCP Integration
While Claude Desktop provides a natural language interface, you can also integrate the Playwright MCP server programmatically using the MCP SDK.
Python Example
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def scrape_with_playwright_mcp():
    server_params = StdioServerParameters(
        command="npx",
        args=["@playwright/mcp@latest"]
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Navigate to a page
            await session.call_tool("browser_navigate", {
                "url": "https://example.com"
            })

            # Take an accessibility snapshot of the page
            snapshot = await session.call_tool("browser_snapshot", {})

            # Extract data from the snapshot (the result's content is a list of blocks)
            print(snapshot.content[0].text)

            # Click an element; "ref" must be an element reference taken
            # from a previous snapshot, not a CSS selector
            await session.call_tool("browser_click", {
                "element": "Submit button",
                "ref": "e5"  # placeholder: copy the real ref from the snapshot
            })

asyncio.run(scrape_with_playwright_mcp())
JavaScript/TypeScript Example
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function scrapeWithPlaywrightMCP() {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["@playwright/mcp@latest"],
  });

  const client = new Client({
    name: "playwright-scraper",
    version: "1.0.0",
  }, {
    capabilities: {}
  });

  await client.connect(transport);

  // Navigate to URL
  await client.callTool({
    name: "browser_navigate",
    arguments: { url: "https://example.com" },
  });

  // Get page snapshot
  const snapshot = await client.callTool({ name: "browser_snapshot", arguments: {} });
  console.log(snapshot);

  // Extract specific content by running JavaScript in the page
  const result = await client.callTool({
    name: "browser_evaluate",
    arguments: { function: "() => document.querySelector('h1').textContent" },
  });
  console.log("Page title:", result);

  await client.close();
}

scrapeWithPlaywrightMCP();
Best Practices for Playwright MCP Scraping
1. Use Snapshots Over Screenshots
The browser_snapshot tool returns an accessibility tree representation of the page, which is more efficient and easier to parse than screenshots for data extraction.
2. Handle Timeouts Appropriately
Always use browser_wait_for when dealing with dynamic content. This is crucial for handling timeouts in Puppeteer and applies equally to Playwright MCP.
Before extracting data, wait for the text "Products loaded" to appear on the page.
3. Respect Rate Limits
When scraping multiple pages, add delays between requests:
Navigate to each URL in the list, waiting 2 seconds between each navigation,
and extract the product information.
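In SDK code, the delay is simply a sleep between navigations, as in this sketch (the URL list is a placeholder):
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

URLS = ["https://example.com/p/1", "https://example.com/p/2", "https://example.com/p/3"]

async def polite_crawl():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        for url in URLS:
            await session.call_tool("browser_navigate", {"url": url})
            snapshot = await session.call_tool("browser_snapshot", {})
            print(url, len(snapshot.content[0].text))
            await asyncio.sleep(2)  # be polite: 2 seconds between requests

asyncio.run(polite_crawl())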
4. Monitor Network Activity
Use browser_network_requests to understand what data the page loads:
Navigate to the page, then show me all API requests that were made.
This helps identify API endpoints you could call directly instead of automating a browser.
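A sketch of the programmatic equivalent, printing the captured requests so you can scan them for API endpoints:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def inspect_network():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        # Returns the requests the page made since navigation
        requests = await session.call_tool("browser_network_requests", {})
        print(requests.content[0].text)

asyncio.run(inspect_network())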
5. Error Handling
Check console messages for JavaScript errors that might affect scraping:
Navigate to the page, extract the data, and also show me any console errors
that occurred.
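Programmatically, fetch the console log and filter for errors client-side. This is a sketch; the exact text format of the returned messages may vary by server version:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def check_console():
    params = StdioServerParameters(command="npx", args=["@playwright/mcp@latest"])
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()
        await session.call_tool("browser_navigate", {"url": "https://example.com"})
        messages = await session.call_tool("browser_console_messages", {})
        # Naive filter: surface lines that look like errors
        for line in messages.content[0].text.splitlines():
            if "error" in line.lower():
                print(line)

asyncio.run(check_console())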
Limitations and Considerations
Resource Usage
Running a full browser instance through MCP is resource-intensive. For simple HTML scraping, consider using the WebScraping.AI API which handles browser management and proxy rotation automatically.
Scaling Challenges
The Playwright MCP server runs a single browser instance. For parallel scraping of many pages, traditional Playwright scripts or specialized scraping APIs are more efficient.
Session Persistence
Browser sessions are not automatically persisted between Claude Desktop restarts. For workflows requiring session continuity, implement custom session management.
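One option is pointing the server at a persistent browser profile. The @playwright/mcp CLI accepts a --user-data-dir flag (an assumption to verify against your installed version); wiring it into the Claude Desktop config would look like:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest", "--user-data-dir", "/path/to/profile"]
    }
  }
}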
Anti-Bot Detection
While Playwright MCP uses a real browser, sophisticated bot detection may still block automated access. Use proxies and consider professional scraping services for production use.
Comparing Playwright MCP with Direct API Scraping
For production web scraping, dedicated scraping APIs often provide better performance and reliability:
Playwright MCP Advantages:
- Natural language interface for rapid prototyping
- Full browser automation capabilities
- Excellent for interactive debugging
- Handles complex JavaScript-heavy sites
WebScraping.AI API Advantages:
- No browser management required
- Built-in proxy rotation and CAPTCHA handling
- Faster for simple HTML extraction
- Better for high-volume scraping
- Automatic JavaScript rendering when needed
Example: Complete Scraping Workflow
Here's a complete workflow that demonstrates multiple MCP capabilities:
1. Navigate to https://news-site.example.com
2. Wait for the articles section to load
3. Extract headlines and links from the first 10 articles
4. Open the first article link in a new tab
5. Switch to that tab
6. Take a screenshot of the article
7. Extract the full article text
8. Check console for any errors
9. Close the article tab
10. Return the compiled data
This workflow uses navigation, waiting, extraction, tab management, screenshots, and error checking - all through simple natural language instructions.
Conclusion
The Playwright MCP server transforms browser automation into a conversational interface, making web scraping more accessible while maintaining the power of Playwright. It's ideal for exploratory scraping, rapid prototyping, and building AI-powered automation workflows.
For production scraping needs, consider combining MCP for development and testing with robust scraping APIs like WebScraping.AI for execution. This gives you the flexibility of browser automation when needed and the efficiency of API-based scraping for scale.
To get started, install the Playwright MCP server, configure it in Claude Desktop, and begin describing your scraping tasks in natural language. The AI assistant will handle the browser automation details, letting you focus on extracting the data you need.