What is Anthropic MCP and How Do I Use It?

Anthropic MCP (Model Context Protocol) is an open-source protocol developed by Anthropic that enables AI applications like Claude to connect with external data sources, tools, and services through a standardized interface. For web scraping developers, MCP provides a revolutionary way to build AI-powered scraping workflows by allowing language models to directly interact with scraping APIs, browser automation tools, and data storage systems.

MCP transforms how developers approach web scraping by bridging the gap between AI intelligence and practical scraping tools. Instead of writing complex custom integrations, you can create MCP servers that expose scraping capabilities through a unified protocol that works across different AI assistants and applications.

Understanding Anthropic MCP

Anthropic introduced MCP in November 2024 as part of their vision to make AI assistants more useful by connecting them to real-world data and tools. The protocol is designed with three core principles:

1. Standardization

MCP provides a common language for AI-tool communication. Whether you're building with Python, JavaScript, or any other language, the protocol remains consistent, ensuring your scraping tools work with any MCP-compatible AI application.
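
Under the hood, every MCP message is JSON-RPC 2.0, which is what makes this language independence work. As a minimal sketch, the tools/call request a host sends to invoke a scraping tool (here, the scrape_html tool built later in this article) looks like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "scrape_html",
    "arguments": {
      "url": "https://example.com"
    }
  }
}

Any server that understands this envelope works with any host that emits it, whatever language either side is written in.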

2. Security

MCP implements strict security boundaries between AI applications and external services. Your scraping credentials, API keys, and sensitive data remain protected through proper isolation and authentication mechanisms.

3. Modularity

Build once, use everywhere. An MCP server you create for web scraping can be shared, reused, and integrated into different projects without modification.

MCP Architecture for Web Scraping

The MCP architecture consists of three key components working together:

┌──────────────────┐         ┌──────────────────┐         ┌──────────────────┐
│   AI Assistant   │         │   MCP Server     │         │  Scraping Tools  │
│   (Claude)       │◄───────►│   (Your Code)    │◄───────►│  APIs/Browsers   │
└──────────────────┘   MCP   └──────────────────┘  HTTP   └──────────────────┘
                                      │
                                      ▼
                             ┌──────────────────┐
                             │  Data Storage    │
                             │  Database/Files  │
                             └──────────────────┘

MCP Hosts: Applications like Claude Desktop that embed AI models and initiate connections to MCP servers.

MCP Servers: Lightweight programs you build that expose web scraping capabilities (tools, resources, and prompts).

External Services: The actual scraping infrastructure—APIs like WebScraping.AI, browser automation with Puppeteer/Playwright, or databases.

Building Your First MCP Server for Web Scraping

Let's create a practical MCP server that exposes web scraping capabilities to Claude or other AI assistants.

Python Implementation

Here's a complete Python MCP server using the official SDK:

import asyncio
import os
import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

# Initialize the MCP server
app = Server("webscraping-ai-mcp")

# Store your API key securely
WEBSCRAPING_AI_KEY = os.getenv("WEBSCRAPING_AI_API_KEY")

# Define the tools available to AI assistants
@app.list_tools()
async def list_tools() -> list[Tool]:
    """Register web scraping tools with the MCP server"""
    return [
        Tool(
            name="scrape_html",
            description="Extract HTML content from any URL with JavaScript rendering support",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The target URL to scrape"
                    },
                    "js": {
                        "type": "boolean",
                        "description": "Enable JavaScript rendering (default: true)",
                        "default": True
                    },
                    "wait_for": {
                        "type": "string",
                        "description": "CSS selector to wait for before returning content"
                    },
                    "timeout": {
                        "type": "integer",
                        "description": "Maximum wait time in milliseconds",
                        "default": 15000
                    }
                },
                "required": ["url"]
            }
        ),
        Tool(
            name="extract_fields",
            description="Extract structured data fields from a webpage using AI",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to extract data from"
                    },
                    "fields": {
                        "type": "object",
                        "description": "Dictionary of field names and their descriptions",
                        "additionalProperties": {
                            "type": "string"
                        }
                    }
                },
                "required": ["url", "fields"]
            }
        ),
        Tool(
            name="ask_question",
            description="Ask a natural language question about webpage content",
            inputSchema={
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The webpage URL"
                    },
                    "question": {
                        "type": "string",
                        "description": "Your question about the page content"
                    }
                },
                "required": ["url", "question"]
            }
        )
    ]

# Implement the tool execution logic
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Execute the requested scraping tool"""

    if name == "scrape_html":
        async with httpx.AsyncClient(timeout=60.0) as client:
            params = {
                "url": arguments["url"],
                "api_key": WEBSCRAPING_AI_KEY,
                "js": arguments.get("js", True)
            }

            if "wait_for" in arguments:
                params["wait_for"] = arguments["wait_for"]
            if "timeout" in arguments:
                params["timeout"] = arguments["timeout"]

            response = await client.get(
                "https://api.webscraping.ai/html",
                params=params
            )

            return [TextContent(
                type="text",
                text=response.text
            )]

    elif name == "extract_fields":
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                "https://api.webscraping.ai/fields",
                params={
                    "url": arguments["url"],
                    "api_key": WEBSCRAPING_AI_KEY
                },
                json={"fields": arguments["fields"]}
            )

            result = response.json()
            return [TextContent(
                type="text",
                text=f"Extracted Data:\n{result}"
            )]

    elif name == "ask_question":
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                "https://api.webscraping.ai/question",
                params={
                    "url": arguments["url"],
                    "api_key": WEBSCRAPING_AI_KEY
                },
                json={"question": arguments["question"]}
            )

            result = response.json()
            return [TextContent(
                type="text",
                text=f"Answer: {result.get('answer', result)}"
            )]

    raise ValueError(f"Unknown tool: {name}")

# Start the MCP server
async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

JavaScript/TypeScript Implementation

For Node.js developers, here's the equivalent implementation:

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import fetch from "node-fetch";

// Create MCP server instance
const server = new Server(
  {
    name: "webscraping-ai-mcp",
    version: "1.0.0",
  },
  {
    capabilities: {
      tools: {},
    },
  }
);

const API_KEY = process.env.WEBSCRAPING_AI_API_KEY;
const BASE_URL = "https://api.webscraping.ai";

// Register available scraping tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "scrape_html",
        description: "Scrape HTML content with JavaScript rendering",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "URL to scrape",
            },
            js: {
              type: "boolean",
              description: "Enable JavaScript rendering",
              default: true,
            },
            wait_for: {
              type: "string",
              description: "CSS selector to wait for",
            },
          },
          required: ["url"],
        },
      },
      {
        name: "extract_text",
        description: "Extract clean text content from a webpage",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "URL to extract text from",
            },
            return_links: {
              type: "boolean",
              description: "Include links in the response",
              default: false,
            },
          },
          required: ["url"],
        },
      },
      {
        name: "scrape_selected",
        description: "Extract content from specific CSS selectors",
        inputSchema: {
          type: "object",
          properties: {
            url: {
              type: "string",
              description: "Target URL",
            },
            selector: {
              type: "string",
              description: "CSS selector to extract",
            },
          },
          required: ["url", "selector"],
        },
      },
    ],
  };
});

// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  try {
    if (name === "scrape_html") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        js: args.js !== false ? "true" : "false",
      });

      if (args.wait_for) {
        params.append("wait_for", args.wait_for);
      }

      const response = await fetch(`${BASE_URL}/html?${params}`);
      const html = await response.text();

      return {
        content: [
          {
            type: "text",
            text: html,
          },
        ],
      };
    }

    if (name === "extract_text") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        return_links: args.return_links ? "true" : "false",
      });

      const response = await fetch(`${BASE_URL}/text?${params}`);
      const data = await response.json();

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }

    if (name === "scrape_selected") {
      const params = new URLSearchParams({
        url: args.url,
        api_key: API_KEY,
        selector: args.selector,
      });

      const response = await fetch(`${BASE_URL}/selected?${params}`);
      const data = await response.json();

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    }

    throw new Error(`Unknown tool: ${name}`);
  } catch (error) {
    return {
      content: [
        {
          type: "text",
          text: `Error: ${error.message}`,
        },
      ],
      isError: true,
    };
  }
});

// Start the server
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  console.error("WebScraping.AI MCP Server running");
}

main().catch(console.error);

Installing and Configuring Your MCP Server

Step 1: Install Dependencies

For Python:

pip install mcp httpx

For Node.js:

npm install @modelcontextprotocol/sdk node-fetch

Step 2: Save Your Server Code

Save the Python code as webscraping_mcp_server.py or the JavaScript code as server.js.

Step 3: Configure Claude Desktop

To connect your MCP server to Claude Desktop, edit the configuration file:

On macOS: Edit ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: Edit %APPDATA%\Claude\claude_desktop_config.json

Add your server configuration:

{
  "mcpServers": {
    "webscraping-ai": {
      "command": "python",
      "args": ["/absolute/path/to/webscraping_mcp_server.py"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your_api_key_here"
      }
    }
  }
}

For Node.js servers:

{
  "mcpServers": {
    "webscraping-ai": {
      "command": "node",
      "args": ["/absolute/path/to/server.js"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your_api_key_here"
      }
    }
  }
}

Step 4: Restart Claude Desktop

After saving the configuration, restart Claude Desktop. Your MCP server will automatically start when Claude launches.

Using Your MCP Server with Claude

Once configured, you can interact with your scraping tools through natural language:

Example 1: Simple HTML Scraping

User: "Use the scrape_html tool to get the content from https://example.com"

Claude will execute the tool and return the HTML content.

Example 2: Structured Data Extraction

User: "Extract the product name, price, and description from https://shop.example.com/product/123"

Claude uses the extract_fields tool with appropriate field definitions.

Example 3: Question Answering

User: "What is the main topic discussed on https://blog.example.com/latest-post?"

Claude uses the ask_question tool to analyze the content.

Advanced MCP Features for Web Scraping

Adding Resources

MCP servers can expose scraped data as resources that AI assistants can read:

import json

from mcp.types import Resource

# Simple in-memory cache that your scraping tools can populate
cached_results: dict = {}

@app.list_resources()
async def list_resources() -> list[Resource]:
    return [
        Resource(
            uri="scraping://cache/latest",
            name="Latest scraping results",
            mimeType="application/json"
        )
    ]

@app.read_resource()
async def read_resource(uri: str) -> str:
    # Return cached scraping data
    if uri == "scraping://cache/latest":
        return json.dumps(cached_results)
    raise ValueError(f"Unknown resource: {uri}")

Implementing Prompts

Prompts are reusable templates for common scraping workflows:

from mcp.types import GetPromptResult, Prompt, PromptMessage, TextContent

@app.list_prompts()
async def list_prompts() -> list[Prompt]:
    return [
        Prompt(
            name="monitor_price",
            description="Monitor product prices on e-commerce sites",
            arguments=[
                {"name": "product_url", "description": "URL of the product page", "required": True}
            ]
        )
    ]

@app.get_prompt()
async def get_prompt(name: str, arguments: dict) -> GetPromptResult:
    if name == "monitor_price":
        return GetPromptResult(
            messages=[
                PromptMessage(
                    role="user",
                    content=TextContent(
                        type="text",
                        text=f"Please scrape {arguments['product_url']} and extract the current price, product name, and availability status."
                    )
                )
            ]
        )
    raise ValueError(f"Unknown prompt: {name}")

Real-World Use Cases

1. E-commerce Price Monitoring

Create an MCP server tool that monitors product prices across multiple retailers:

import json

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "monitor_competitors":
        urls = arguments["competitor_urls"]
        results = []

        async with httpx.AsyncClient(timeout=60.0) as client:
            for url in urls:
                response = await client.post(
                    "https://api.webscraping.ai/ai/fields",
                    params={"url": url, "api_key": WEBSCRAPING_AI_KEY},
                    json={
                        "fields": {
                            "product_name": "Name of the product",
                            "price": "Current price",
                            "stock_status": "In stock or out of stock"
                        }
                    }
                )
                results.append(response.json())

        return [TextContent(type="text", text=json.dumps(results, indent=2))]

2. Content Aggregation Pipeline

Build a content aggregation system that scrapes multiple news sources:

server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === "aggregate_articles") {
    const sources = request.params.arguments.sources;
    const articles = [];

    for (const source of sources) {
      const response = await fetch(
        `${BASE_URL}/ai/fields?url=${encodeURIComponent(source)}&api_key=${API_KEY}`,
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            fields: {
              title: "Article headline",
              author: "Author name",
              date: "Publication date",
              summary: "Article summary",
            },
          }),
        }
      );

      const data = await response.json();
      articles.push(data);
    }

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(articles, null, 2),
        },
      ],
    };
  }
});

3. SEO and Competitive Analysis

Create tools for SEO monitoring and competitive analysis:

Tool(
    name="seo_audit",
    description="Perform SEO analysis on a webpage",
    inputSchema={
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "checks": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of SEO elements to check (title, meta, headings, etc.)"
            }
        },
        "required": ["url"]
    }
)
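
The schema alone doesn't execute anything, so here is a minimal, illustrative handler to pair with it. The specific checks (title, meta description, h1 count) and the helper name run_seo_audit are assumptions for this sketch; it reuses the /html endpoint and WEBSCRAPING_AI_KEY from the Python server above:

import re

async def run_seo_audit(url: str, checks: list[str] | None = None) -> dict:
    """Fetch rendered HTML and report on basic on-page SEO elements."""
    checks = checks or ["title", "meta_description", "h1"]
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.get(
            "https://api.webscraping.ai/html",
            params={"url": url, "api_key": WEBSCRAPING_AI_KEY},
        )
        html = response.text

    results = {}
    if "title" in checks:
        match = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
        results["title"] = match.group(1).strip() if match else "MISSING"
    if "meta_description" in checks:
        present = re.search(r'<meta[^>]+name=["\']description["\']', html, re.I)
        results["meta_description"] = "present" if present else "MISSING"
    if "h1" in checks:
        results["h1_count"] = len(re.findall(r"<h1[\s>]", html, re.I))
    return results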

Security Best Practices

When building MCP servers for web scraping:

1. Environment Variables

Never hardcode API keys:

import os

API_KEY = os.getenv("WEBSCRAPING_AI_API_KEY")
if not API_KEY:
    raise ValueError("WEBSCRAPING_AI_API_KEY environment variable not set")

2. Input Validation

Validate all URLs before scraping:

from urllib.parse import urlparse

def validate_url(url: str) -> bool:
    try:
        result = urlparse(url)
        return all([result.scheme in ['http', 'https'], result.netloc])
    except ValueError:
        return False

3. Rate Limiting

Implement request throttling:

import asyncio
from datetime import datetime, timedelta

class RateLimiter:
    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []

    async def acquire(self):
        now = datetime.now()
        # Drop timestamps that have fallen outside the sliding window
        self.requests = [r for r in self.requests if now - r < timedelta(seconds=self.time_window)]

        if len(self.requests) >= self.max_requests:
            wait_time = (self.requests[0] + timedelta(seconds=self.time_window) - now).total_seconds()
            await asyncio.sleep(max(wait_time, 0))
            now = datetime.now()

        self.requests.append(now)
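
The limiter only helps if every outbound request goes through it. A minimal wiring sketch (the limits are illustrative):

# One shared limiter for the whole server, e.g. 10 requests per 60 seconds
rate_limiter = RateLimiter(max_requests=10, time_window=60)

async def throttled_get(client: httpx.AsyncClient, url: str, **kwargs):
    """Wrap client.get so every scraping call respects the rate limit."""
    await rate_limiter.acquire()
    return await client.get(url, **kwargs)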

4. Error Handling

Provide clear error messages:

try {
  const response = await fetch(apiUrl);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  return {
    content: [{ type: "text", text: await response.text() }],
  };
} catch (error) {
  return {
    content: [
      {
        type: "text",
        text: `Scraping failed: ${error.message}`,
      },
    ],
    isError: true,
  };
}

Testing Your MCP Server

Create a test script to verify functionality:

# test_mcp_server.py
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def test_server():
    server_params = StdioServerParameters(
        command="python",
        args=["webscraping_mcp_server.py"],
        env={"WEBSCRAPING_AI_API_KEY": "your_key"}
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print(f"Available tools: {[t.name for t in tools.tools]}")

            # Test scraping
            result = await session.call_tool(
                "scrape_html",
                {"url": "https://example.com"}
            )
            print(f"Scraping result length: {len(result.content[0].text)}")

asyncio.run(test_server())

Troubleshooting Common Issues

MCP Server Not Connecting

  • Verify the path in claude_desktop_config.json is absolute
  • Check that all dependencies are installed
  • Review Claude Desktop logs for error messages (on macOS they live under ~/Library/Logs/Claude/)
  • Ensure environment variables are properly set

Scraping Timeouts

  • Increase timeout values in API requests
  • Use the wait_for parameter for dynamic content (see the sketch after this list)
  • Consider using proxies for geo-restricted content
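
For instance, a minimal sketch of a request that raises the API-side timeout and waits for a specific element (the selector and values are placeholders):

# Wait up to 30 seconds overall, and for a price element, before returning HTML
async with httpx.AsyncClient(timeout=60.0) as client:
    response = await client.get(
        "https://api.webscraping.ai/html",
        params={
            "url": "https://example.com",
            "api_key": WEBSCRAPING_AI_KEY,
            "timeout": 30000,       # milliseconds, raised from the 15000 default
            "wait_for": "#price",   # placeholder CSS selector
        },
    )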

API Rate Limits

  • Implement proper rate limiting in your server
  • Cache results when possible (see the sketch after this list)
  • Use the WebScraping.AI API efficiently
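
Caching can be as simple as an in-memory store keyed by URL; the helper below is a sketch, and the 5-minute TTL is an assumption to tune for your data:

import time

_cache: dict[str, tuple[float, str]] = {}
CACHE_TTL = 300  # seconds; tune to how fresh the data must be

async def cached_scrape(client: httpx.AsyncClient, url: str) -> str:
    """Return cached HTML when it is fresh enough, otherwise re-fetch."""
    hit = _cache.get(url)
    if hit and time.time() - hit[0] < CACHE_TTL:
        return hit[1]
    response = await client.get(
        "https://api.webscraping.ai/html",
        params={"url": url, "api_key": WEBSCRAPING_AI_KEY},
    )
    _cache[url] = (time.time(), response.text)
    return response.text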

Conclusion

Anthropic MCP revolutionizes web scraping by providing a standardized way to connect AI assistants with scraping tools and data sources. By building MCP servers, you create reusable, AI-accessible scraping capabilities that can be controlled through natural language, making complex data extraction workflows more intuitive and maintainable.

Whether you're monitoring competitor prices, aggregating content from multiple sources, or building custom data extraction pipelines, MCP provides the infrastructure to seamlessly integrate web scraping with AI intelligence. The combination of Anthropic's Claude and WebScraping.AI's robust scraping capabilities, connected through MCP, offers a powerful solution for modern data extraction challenges.

Start building your first MCP server today, and transform how you approach AI-powered web scraping automation.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
