# What is Anthropic MCP and How Do I Use It?
Anthropic MCP (Model Context Protocol) is an open-source protocol developed by Anthropic that enables AI applications like Claude to connect with external data sources, tools, and services through a standardized interface. For web scraping developers, MCP provides a revolutionary way to build AI-powered scraping workflows by allowing language models to directly interact with scraping APIs, browser automation tools, and data storage systems.
MCP transforms how developers approach web scraping by bridging the gap between AI intelligence and practical scraping tools. Instead of writing complex custom integrations, you can create MCP servers that expose scraping capabilities through a unified protocol that works across different AI assistants and applications.
## Understanding Anthropic MCP

Anthropic introduced MCP in November 2024 as part of its vision to make AI assistants more useful by connecting them to real-world data and tools. The protocol is designed around three core principles:
### 1. Standardization
MCP provides a common language for AI-tool communication. Whether you're building with Python, JavaScript, or any other language, the protocol remains consistent, ensuring your scraping tools work with any MCP-compatible AI application.
### 2. Security
MCP implements strict security boundaries between AI applications and external services. Your scraping credentials, API keys, and sensitive data remain protected through proper isolation and authentication mechanisms.
### 3. Modularity
Build once, use everywhere. An MCP server you create for web scraping can be shared, reused, and integrated into different projects without modification.
## MCP Architecture for Web Scraping
The MCP architecture consists of three key components working together:
```
┌──────────────────┐         ┌──────────────────┐         ┌──────────────────┐
│   AI Assistant   │         │    MCP Server    │         │  Scraping Tools  │
│     (Claude)     │◄───────►│   (Your Code)    │◄───────►│  APIs/Browsers   │
└──────────────────┘   MCP   └──────────────────┘  HTTP   └──────────────────┘
                                       │
                                       ▼
                              ┌──────────────────┐
                              │   Data Storage   │
                              │  Database/Files  │
                              └──────────────────┘
```
**MCP Hosts:** Applications like Claude Desktop that embed AI models and initiate connections to MCP servers.

**MCP Servers:** Lightweight programs you build that expose web scraping capabilities (tools, resources, and prompts).

**External Services:** The actual scraping infrastructure: APIs like WebScraping.AI, browser automation with Puppeteer/Playwright, or databases.
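On the wire, hosts and servers exchange JSON-RPC 2.0 messages. To make that concrete, here is roughly what a tool invocation looks like as it travels from Claude to your server (the `scrape_html` tool name and its arguments refer to the server built later in this guide; the field values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_html",
    "arguments": { "url": "https://example.com" }
  }
}
```

The server replies with a result whose `content` array carries the scraped data back to the model:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [{ "type": "text", "text": "<html>...</html>" }]
  }
}
```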
## Building Your First MCP Server for Web Scraping
Let's create a practical MCP server that exposes web scraping capabilities to Claude or other AI assistants.
### Python Implementation
Here's a complete Python MCP server using the official SDK:
```python
import asyncio
import os

import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

# Initialize the MCP server
app = Server("webscraping-ai-mcp")

# Store your API key securely
WEBSCRAPING_AI_KEY = os.getenv("WEBSCRAPING_AI_API_KEY")


# Define the tools available to AI assistants
@app.list_tools()
async def list_tools() -> list[Tool]:
    """Register web scraping tools with the MCP server"""
    return [
        Tool(
            name="scrape_html",
            description="Extract HTML content from any URL with JavaScript rendering support",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The target URL to scrape"
                    },
                    "js": {
                        "type": "boolean",
                        "description": "Enable JavaScript rendering (default: true)",
                        "default": True
                    },
                    "wait_for": {
                        "type": "string",
                        "description": "CSS selector to wait for before returning content"
                    },
                    "timeout": {
                        "type": "integer",
                        "description": "Maximum wait time in milliseconds",
                        "default": 15000
                    }
                },
                "required": ["url"]
            }
        ),
        Tool(
            name="extract_fields",
            description="Extract structured data fields from a webpage using AI",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to extract data from"
                    },
                    "fields": {
                        "type": "object",
                        "description": "Dictionary of field names and their descriptions",
                        "additionalProperties": {
                            "type": "string"
                        }
                    }
                },
                "required": ["url", "fields"]
            }
        ),
        Tool(
            name="ask_question",
            description="Ask a natural language question about webpage content",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The webpage URL"
                    },
                    "question": {
                        "type": "string",
                        "description": "Your question about the page content"
                    }
                },
                "required": ["url", "question"]
            }
        )
    ]


# Implement the tool execution logic
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Execute the requested scraping tool"""
    if name == "scrape_html":
        async with httpx.AsyncClient(timeout=60.0) as client:
            params = {
                "url": arguments["url"],
                "api_key": WEBSCRAPING_AI_KEY,
                "js": arguments.get("js", True)
            }
            if "wait_for" in arguments:
                params["wait_for"] = arguments["wait_for"]
            if "timeout" in arguments:
                params["timeout"] = arguments["timeout"]

            response = await client.get(
                "https://api.webscraping.ai/html",
                params=params
            )
            return [TextContent(
                type="text",
                text=response.text
            )]

    elif name == "extract_fields":
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                "https://api.webscraping.ai/fields",
                params={
                    "url": arguments["url"],
                    "api_key": WEBSCRAPING_AI_KEY
                },
                json={"fields": arguments["fields"]}
            )
            result = response.json()
            return [TextContent(
                type="text",
                text=f"Extracted Data:\n{result}"
            )]

    elif name == "ask_question":
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                "https://api.webscraping.ai/question",
                params={
                    "url": arguments["url"],
                    "api_key": WEBSCRAPING_AI_KEY
                },
                json={"question": arguments["question"]}
            )
            result = response.json()
            return [TextContent(
                type="text",
                text=f"Answer: {result.get('answer', result)}"
            )]

    raise ValueError(f"Unknown tool: {name}")


# Start the MCP server
async def main():
    async with stdio_server() as (read_stream, write_stream):
        # run() also needs the server's initialization options
        # for the MCP handshake
        await app.run(
            read_stream,
            write_stream,
            app.create_initialization_options()
        )


if __name__ == "__main__":
    asyncio.run(main())
```
### JavaScript/TypeScript Implementation
For Node.js developers, here's the equivalent implementation:
```javascript
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import fetch from "node-fetch";

// Create MCP server instance
const server = new Server(
  {
    name: "webscraping-ai-mcp",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

const API_KEY = process.env.WEBSCRAPING_AI_API_KEY;
const BASE_URL = "https://api.webscraping.ai";

// Register available scraping tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "scrape_html",
        description: "Scrape HTML content with JavaScript rendering",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "URL to scrape",
            },
            js: {
              type: "boolean",
              description: "Enable JavaScript rendering",
              default: true,
            },
            wait_for: {
              type: "string",
              description: "CSS selector to wait for",
            },
          },
          required: ["url"],
        },
      },
      {
        name: "extract_text",
        description: "Extract clean text content from a webpage",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "URL to extract text from",
            },
            return_links: {
              type: "boolean",
              description: "Include links in the response",
              default: false,
            },
          },
          required: ["url"],
        },
      },
      {
        name: "scrape_selected",
        description: "Extract content from specific CSS selectors",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "Target URL",
            },
            selector: {
              type: "string",
              description: "CSS selector to extract",
            },
          },
          required: ["url", "selector"],
        },
      },
    ],
  };
});

// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    if (name === "scrape_html") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        js: args.js !== false ? "true" : "false",
      });
      if (args.wait_for) {
        params.append("wait_for", args.wait_for);
      }

      const response = await fetch(`${BASE_URL}/html?${params}`);
      const html = await response.text();

      return {
        content: [
          {
            type: "text",
            text: html,
          },
        ],
      };
    }

    if (name === "extract_text") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        return_links: args.return_links ? "true" : "false",
      });

      const response = await fetch(`${BASE_URL}/text?${params}`);
      const data = await response.json();

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }

    if (name === "scrape_selected") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        selector: args.selector,
      });

      const response = await fetch(`${BASE_URL}/selected?${params}`);
      const data = await response.json();

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }

    throw new Error(`Unknown tool: ${name}`);
  } catch (error) {
    return {
      content: [
        {
          type: "text",
          text: `Error: ${error.message}`,
        },
      ],
      isError: true,
    };
  }
});

// Start the server
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("WebScraping.AI MCP Server running");
}

main().catch(console.error);
```
## Installing and Configuring Your MCP Server

### Step 1: Install Dependencies

For Python (the official `mcp` SDK requires Python 3.10 or newer):

```bash
pip install mcp httpx
```

For Node.js:

```bash
npm install @modelcontextprotocol/sdk node-fetch
```
### Step 2: Save Your Server Code

Save the Python code as `webscraping_mcp_server.py` or the JavaScript code as `server.js`.
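Note that the JavaScript server uses ES module `import` syntax, so the project's `package.json` needs `"type": "module"` (or the file must use the `.mjs` extension). A minimal manifest might look like this (the version ranges are illustrative):

```json
{
  "name": "webscraping-ai-mcp",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "node-fetch": "^3.0.0"
  }
}
```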
### Step 3: Configure Claude Desktop

To connect your MCP server to Claude Desktop, edit the configuration file:

**On macOS:** `~/Library/Application Support/Claude/claude_desktop_config.json`

**On Windows:** `%APPDATA%\Claude\claude_desktop_config.json`

Add your server configuration:
```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "python",
      "args": ["/absolute/path/to/webscraping_mcp_server.py"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
For Node.js servers:

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "node",
      "args": ["/absolute/path/to/server.js"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```
### Step 4: Restart Claude Desktop
After saving the configuration, restart Claude Desktop. Your MCP server will automatically start when Claude launches.
## Using Your MCP Server with Claude
Once configured, you can interact with your scraping tools through natural language:
**Example 1: Simple HTML Scraping**

```
User: "Use the scrape_html tool to get the content from https://example.com"
```

Claude will execute the tool and return the HTML content.

**Example 2: Structured Data Extraction**

```
User: "Extract the product name, price, and description from https://shop.example.com/product/123"
```

Claude uses the extract_fields tool with appropriate field definitions.

**Example 3: Question Answering**

```
User: "What is the main topic discussed on https://blog.example.com/latest-post?"
```

Claude uses the ask_question tool to analyze the content.
## Advanced MCP Features for Web Scraping

### Adding Resources
MCP servers can expose scraped data as resources that AI assistants can read:
```python
import json

from mcp.types import Resource

# In-memory cache that your scraping tools populate elsewhere
cached_results: dict = {}


@app.list_resources()
async def list_resources() -> list[Resource]:
    return [
        Resource(
            uri="scraping://cache/latest",
            name="Latest scraping results",
            mimeType="application/json"
        )
    ]


@app.read_resource()
async def read_resource(uri: str) -> str:
    # Compare as a string: some SDK versions pass the URI as a pydantic AnyUrl
    if str(uri) == "scraping://cache/latest":
        return json.dumps(cached_results)
    raise ValueError(f"Unknown resource: {uri}")
```
### Implementing Prompts
Prompts are reusable templates for common scraping workflows:
```python
from mcp.types import GetPromptResult, Prompt, PromptArgument, PromptMessage, TextContent


@app.list_prompts()
async def list_prompts() -> list[Prompt]:
    return [
        Prompt(
            name="monitor_price",
            description="Monitor product prices on e-commerce sites",
            arguments=[
                PromptArgument(name="product_url", description="URL of the product page", required=True)
            ]
        )
    ]


@app.get_prompt()
async def get_prompt(name: str, arguments: dict | None) -> GetPromptResult:
    if name == "monitor_price":
        return GetPromptResult(
            messages=[
                PromptMessage(
                    role="user",
                    # Message content must be a TextContent object, not a bare string
                    content=TextContent(
                        type="text",
                        text=f"Please scrape {arguments['product_url']} and extract the current price, product name, and availability status."
                    )
                )
            ]
        )
    raise ValueError(f"Unknown prompt: {name}")
```
## Real-World Use Cases

### 1. E-commerce Price Monitoring
Create an MCP server that monitors product prices across multiple retailers, similar to how you would handle browser sessions in Puppeteer for authenticated scraping:
```python
import json

# Add this branch to the call_tool handler shown earlier
@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "monitor_competitors":
        urls = arguments["competitor_urls"]
        results = []
        async with httpx.AsyncClient(timeout=60.0) as client:
            for url in urls:
                response = await client.post(
                    "https://api.webscraping.ai/fields",
                    params={"url": url, "api_key": WEBSCRAPING_AI_KEY},
                    json={
                        "fields": {
                            "product_name": "Name of the product",
                            "price": "Current price",
                            "stock_status": "In stock or out of stock"
                        }
                    }
                )
                results.append(response.json())
        return [TextContent(type="text", text=json.dumps(results, indent=2))]
```
### 2. Content Aggregation Pipeline
Build a content aggregation system that scrapes multiple news sources:
```javascript
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "aggregate_articles") {
    const sources = request.params.arguments.sources;
    const articles = [];

    for (const source of sources) {
      const response = await fetch(
        `${BASE_URL}/fields?url=${encodeURIComponent(source)}&api_key=${API_KEY}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            fields: {
              title: "Article headline",
              author: "Author name",
              date: "Publication date",
              summary: "Article summary",
            },
          }),
        }
      );
      const data = await response.json();
      articles.push(data);
    }

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(articles, null, 2),
        },
      ],
    };
  }
});
```
### 3. SEO and Competitive Analysis
Create tools for SEO monitoring, similar to monitoring network requests in Puppeteer:
```python
Tool(
    name="seo_audit",
    description="Perform SEO analysis on a webpage",
    inputSchema={
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "checks": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of SEO elements to check (title, meta, headings, etc.)"
            }
        },
        "required": ["url"]
    }
)
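```

The schema above only declares the tool; the audit logic itself lives in your `call_tool` handler. Here is a minimal sketch of one way to implement it, reusing the WebScraping.AI `/html` endpoint from earlier and Python's built-in `html.parser` (the `run_seo_audit` helper and the specific checks are illustrative, not part of the MCP SDK):

```python
import json
from html.parser import HTMLParser


class SEOParser(HTMLParser):
    """Collects the title, meta description, and heading counts from raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.heading_counts: dict[str, int] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.meta_description = attrs.get("content", "")
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.heading_counts[tag] = self.heading_counts.get(tag, 0) + 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


async def run_seo_audit(url: str) -> str:
    """Fetch rendered HTML via WebScraping.AI and report basic SEO signals."""
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.get(
            "https://api.webscraping.ai/html",
            params={"url": url, "api_key": WEBSCRAPING_AI_KEY, "js": True},
        )
    parser = SEOParser()
    parser.feed(response.text)
    title = parser.title.strip()
    return json.dumps({
        "title": title,
        "title_length": len(title),
        "meta_description": parser.meta_description,
        "heading_counts": parser.heading_counts,
    }, indent=2)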
## Security Best Practices
When building MCP servers for web scraping:
### 1. Environment Variables

Never hardcode API keys:

```python
import os

API_KEY = os.getenv("WEBSCRAPING_AI_API_KEY")
if not API_KEY:
    raise ValueError("WEBSCRAPING_AI_API_KEY environment variable not set")
```
### 2. Input Validation

Validate all URLs before scraping:

```python
from urllib.parse import urlparse


def validate_url(url: str) -> bool:
    try:
        result = urlparse(url)
        # Accept only http(s) URLs that include a host
        return all([result.scheme in ("http", "https"), result.netloc])
    except ValueError:
        return False
```
### 3. Rate Limiting

Implement request throttling:

```python
import asyncio
from datetime import datetime, timedelta


class RateLimiter:
    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window  # Window length in seconds
        self.requests: list[datetime] = []

    async def acquire(self):
        now = datetime.now()
        # Drop timestamps that have aged out of the window
        self.requests = [
            r for r in self.requests
            if now - r < timedelta(seconds=self.time_window)
        ]
        if len(self.requests) >= self.max_requests:
            wait_time = (
                self.requests[0] + timedelta(seconds=self.time_window) - now
            ).total_seconds()
            await asyncio.sleep(wait_time)
        # Record this request with a fresh timestamp (we may have slept)
        self.requests.append(datetime.now())
```
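Wiring the limiter into your tool handler then takes a single line per request. A sketch (the 10-requests-per-minute budget is an arbitrary example):

```python
# Allow at most 10 upstream API calls per 60-second window
limiter = RateLimiter(max_requests=10, time_window=60)


@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    await limiter.acquire()  # Blocks until a request slot frees up
    ...  # Dispatch to the scraping logic shown earlier
```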
### 4. Error Handling

Provide clear error messages:

```javascript
try {
  const response = await fetch(apiUrl);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  return await response.text();
} catch (error) {
  return {
    content: [
      {
        type: "text",
        text: `Scraping failed: ${error.message}`,
      },
    ],
    isError: true,
  };
}
```
## Testing Your MCP Server
Create a test script to verify functionality:
```python
# test_mcp_server.py
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def test_server():
    # Launch the server as a subprocess over stdio
    server_params = StdioServerParameters(
        command="python",
        args=["webscraping_mcp_server.py"],
        env={"WEBSCRAPING_AI_API_KEY": "your_key"}
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Perform the MCP initialization handshake
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print(f"Available tools: {[t.name for t in tools.tools]}")

            # Test scraping
            result = await session.call_tool(
                "scrape_html",
                {"url": "https://example.com"}
            )
            print(f"Scraping result length: {len(result.content[0].text)}")


asyncio.run(test_server())
```
## Troubleshooting Common Issues

### MCP Server Not Connecting

- Verify the path in `claude_desktop_config.json` is absolute
- Check that all dependencies are installed
- Review Claude Desktop's MCP logs for error messages (on macOS they typically live under `~/Library/Logs/Claude/`)
- Ensure environment variables are properly set
### Scraping Timeouts

- Increase timeout values in API requests
- Use the `wait_for` parameter for dynamic content
- Consider using proxies for geo-restricted content
### API Rate Limits

- Implement proper rate limiting in your server
- Cache results when possible (see the sketch below)
- Use the WebScraping.AI API efficiently
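For the caching point, a minimal in-memory TTL cache often suffices to avoid re-scraping the same URL within a short window. A sketch (the `cached_scrape` helper and the 5-minute TTL are illustrative choices, not part of the SDK):

```python
import time

CACHE_TTL = 300  # Seconds before a cached result goes stale
_cache: dict[str, tuple[float, str]] = {}  # url -> (timestamp, html)


async def cached_scrape(url: str) -> str:
    """Return cached HTML when fresh, otherwise fetch through the API and cache it."""
    entry = _cache.get(url)
    if entry and time.time() - entry[0] < CACHE_TTL:
        return entry[1]
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.get(
            "https://api.webscraping.ai/html",
            params={"url": url, "api_key": WEBSCRAPING_AI_KEY},
        )
    _cache[url] = (time.time(), response.text)
    return response.text
```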
## Conclusion
Anthropic MCP revolutionizes web scraping by providing a standardized way to connect AI assistants with scraping tools and data sources. By building MCP servers, you create reusable, AI-accessible scraping capabilities that can be controlled through natural language, making complex data extraction workflows more intuitive and maintainable.
Whether you're monitoring competitor prices, aggregating content from multiple sources, or building custom data extraction pipelines, MCP provides the infrastructure to seamlessly integrate web scraping with AI intelligence. The combination of Anthropic's Claude and WebScraping.AI's robust scraping capabilities, connected through MCP, offers a powerful solution for modern data extraction challenges.
Start building your first MCP server today, and transform how you approach AI-powered web scraping automation.