How Do I Find a List of Available MCP Servers?

You can find Model Context Protocol (MCP) servers for web scraping and automation through several official and community resources. MCP servers provide specialized tools and capabilities that can be integrated into your scraping workflows, from browser automation to data extraction.

Official MCP Server Directory

The primary resource for finding MCP servers is the official modelcontextprotocol/servers repository on GitHub, maintained by Anthropic. It contains reference server implementations along with a curated README listing official integrations and community-built servers.

Browsing the Official Repository

You can explore available servers by visiting the repository directly:

# Clone the official MCP servers repository
git clone https://github.com/modelcontextprotocol/servers.git
cd servers

# List all available server directories
ls -la src/

The repository keeps each reference server in its own directory under the src/ folder, and its README links out to servers maintained elsewhere. Servers commonly used for web scraping and automation include:

  • Playwright MCP Server (maintained by Microsoft) - Browser automation and web scraping
  • Puppeteer MCP Server - Headless Chrome automation
  • Fetch MCP Server - HTTP requests and API interactions
  • Memory MCP Server - State management across scraping sessions
  • Filesystem MCP Server - File operations for saving scraped data

Inspecting Servers with the MCP Inspector

There is no official search command in the MCP SDKs for discovering servers; discovery happens through the repository, npm, and the community resources described below. Once you have found a server, though, the MCP Inspector gives you an interactive way to connect to it and browse the tools, resources, and prompts it exposes:

# Point the MCP Inspector at a published server package
npx @modelcontextprotocol/inspector npx @modelcontextprotocol/server-puppeteer

# Or at a locally built server
npx @modelcontextprotocol/inspector node path/to/server/dist/index.js

The Inspector starts a local web UI where you can list a server's tools and call them interactively before wiring the server into your workflow.

Installing Servers from NPM

Many MCP servers are published as npm packages, making them easy to discover and install:

# Search for MCP servers on npm
npm search @modelcontextprotocol
npm search mcp-server

# Install the Puppeteer MCP server for browser automation
npm install @modelcontextprotocol/server-puppeteer

# Install Microsoft's Playwright MCP server
npm install @playwright/mcp
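
If you prefer to search programmatically, the public npm registry also exposes a search endpoint you can query for MCP-related packages. The following is a minimal sketch, assuming Node 18+ for the built-in fetch; the search terms are only examples:

// npm-search.mjs - query the public npm registry search API for MCP servers
const query = encodeURIComponent("mcp server scraping");
const response = await fetch(
  `https://registry.npmjs.org/-/v1/search?text=${query}&size=10`
);
const { objects } = await response.json();

// Print each matching package with its latest version and description
for (const { package: pkg } of objects) {
  console.log(`${pkg.name}@${pkg.version} - ${pkg.description ?? "no description"}`);
}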

Configuring MCP Servers in Claude Desktop

Once you've identified servers you want to use, add them to your Claude Desktop configuration file (claude_desktop_config.json, located under ~/Library/Application Support/Claude on macOS and %APPDATA%\Claude on Windows), then restart Claude Desktop for the changes to take effect:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    },
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "@drakula2k/webscraping-ai-mcp-server"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key-here"
      }
    }
  }
}
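
To see at a glance which servers are already configured, a short script can read the config file and print each entry. This is a sketch assuming the macOS config location; on Windows the file lives under %APPDATA%\Claude instead:

import { readFile } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

// macOS location of the Claude Desktop config; adjust the path for your platform
const configPath = join(
  homedir(),
  "Library/Application Support/Claude/claude_desktop_config.json"
);

const config = JSON.parse(await readFile(configPath, "utf-8"));

// Print each configured MCP server and the command used to launch it
for (const [name, server] of Object.entries(config.mcpServers ?? {})) {
  console.log(`${name}: ${server.command} ${(server.args ?? []).join(" ")}`);
}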

Programmatically Listing Available MCP Tools

After connecting to an MCP server, you can programmatically list available tools using the MCP SDK:

Python Example

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_mcp_tools(server_command, server_args):
    """List all tools available from an MCP server"""
    server_params = StdioServerParameters(
        command=server_command,
        args=server_args
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List all available tools
            tools = await session.list_tools()

            print(f"Available tools from {server_command}:")
            for tool in tools.tools:
                print(f"\n- {tool.name}")
                print(f"  Description: {tool.description}")
                print(f"  Input schema: {tool.inputSchema}")

            return tools

# Example: List tools from the Playwright MCP server (launched via npx)
import asyncio

asyncio.run(list_mcp_tools(
    "npx",
    ["@playwright/mcp@latest"]
))

JavaScript Example

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function listMcpTools(serverCommand, serverArgs) {
  // Create transport for stdio communication
  const transport = new StdioClientTransport({
    command: serverCommand,
    args: serverArgs
  });

  // Create MCP client
  const client = new Client({
    name: "mcp-tool-lister",
    version: "1.0.0"
  }, {
    capabilities: {}
  });

  await client.connect(transport);

  // List all available tools
  const tools = await client.listTools();

  console.log(`Available tools from ${serverCommand}:`);
  tools.tools.forEach(tool => {
    console.log(`\n- ${tool.name}`);
    console.log(`  Description: ${tool.description}`);
    console.log(`  Input schema:`, JSON.stringify(tool.inputSchema, null, 2));
  });

  await client.close();
  return tools;
}

// Example: List tools from Puppeteer MCP server
listMcpTools(
  "node",
  ["node_modules/@modelcontextprotocol/server-puppeteer/dist/index.js"]
).catch(console.error);

Community MCP Server Resources

Beyond the official repository, you can find community-built MCP servers through several channels:

GitHub Search

Use GitHub's search functionality to discover MCP servers:

# Search GitHub for MCP server repositories
# Visit: https://github.com/search?q=mcp+server+web+scraping
# Or: https://github.com/search?q=modelcontextprotocol+server

Popular Community MCP Servers for Web Scraping

  • WebScraping.AI MCP Server - Full-featured web scraping with proxy support and AI extraction
  • Selenium MCP Server - WebDriver-based browser automation
  • Axios MCP Server - HTTP client for API requests
  • Cheerio MCP Server - Fast HTML parsing and manipulation
  • BeautifulSoup MCP Server - Python HTML/XML parsing

Listing MCP Resources Programmatically

MCP servers can expose resources (like URLs, files, or data sources) in addition to tools. Not every server implements resources, but those that do let you list them:

# Uses the same imports and SDK setup as the Python tools example above
async def list_mcp_resources(server_command, server_args):
    """List all resources available from an MCP server"""
    server_params = StdioServerParameters(
        command=server_command,
        args=server_args
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List all available resources
            resources = await session.list_resources()

            print(f"Available resources from {server_command}:")
            for resource in resources.resources:
                print(f"\n- {resource.uri}")
                print(f"  Name: {resource.name}")
                print(f"  Description: {resource.description}")
                print(f"  MIME type: {resource.mimeType}")

            return resources

Testing MCP Server Availability

Before integrating an MCP server into your workflow, test its availability and functionality:

# Test MCP server connection
node test-mcp-server.js

Create a test script (test-mcp-server.js):

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function testMcpServer(serverCommand, serverArgs) {
  try {
    const transport = new StdioClientTransport({
      command: serverCommand,
      args: serverArgs
    });

    const client = new Client({
      name: "mcp-tester",
      version: "1.0.0"
    }, {
      capabilities: {}
    });

    await client.connect(transport);
    console.log("✓ Successfully connected to MCP server");

    const tools = await client.listTools();
    console.log(`✓ Server provides ${tools.tools.length} tools`);

    // Not every server implements resources; treat a missing capability as non-fatal
    try {
      const resources = await client.listResources();
      console.log(`✓ Server provides ${resources.resources.length} resources`);
    } catch {
      console.log("ℹ Server does not expose resources");
    }

    await client.close();
    console.log("✓ Test completed successfully");

    return true;
  } catch (error) {
    console.error("✗ MCP server test failed:", error);
    return false;
  }
}

// Test the Playwright MCP server (launched via npx)
testMcpServer(
  "npx",
  ["@playwright/mcp@latest"]
);

MCP Server Discovery Best Practices

When searching for MCP servers for web scraping tasks, consider these best practices:

  1. Check Server Maintenance - Verify the repository is actively maintained with recent commits (see the sketch after this list)
  2. Review Documentation - Ensure the server has clear documentation and examples
  3. Test Locally First - Always test servers in a development environment before production use
  4. Check Dependencies - Review the server's dependencies and security status
  5. Community Support - Look for servers with active community support and issue resolution
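
As a quick way to automate the maintenance check, the sketch below queries the public GitHub REST API for a repository's last push date, star count, and archived status. The repository slug is only an example; substitute the server you are evaluating:

// maintenance-check.mjs - basic health check for an MCP server repository
async function checkRepoHealth(ownerRepo) {
  const response = await fetch(`https://api.github.com/repos/${ownerRepo}`, {
    headers: { Accept: "application/vnd.github+json" }
  });
  if (!response.ok) {
    throw new Error(`GitHub API returned ${response.status} for ${ownerRepo}`);
  }

  const repo = await response.json();
  const daysSincePush = Math.round(
    (Date.now() - new Date(repo.pushed_at)) / 86_400_000
  );

  console.log(ownerRepo);
  console.log(`  Last push: ${repo.pushed_at} (${daysSincePush} days ago)`);
  console.log(`  Stars: ${repo.stargazers_count}, open issues: ${repo.open_issues_count}`);
  console.log(`  Archived: ${repo.archived}`);
}

checkRepoHealth("modelcontextprotocol/servers").catch(console.error);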

Integration with Web Scraping Workflows

Once you've found suitable MCP servers, integrate them into your scraping workflows. For browser automation tasks such as handling browser sessions in Puppeteer or monitoring network requests in Puppeteer, MCP-powered tools can simplify the work.
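
As a minimal sketch of that integration, the snippet below connects to the Puppeteer MCP server from the earlier examples and invokes one of its tools. The tool name and arguments (puppeteer_navigate with a url field) are assumptions about that server's tool set; list the tools first, as shown above, and adjust the call to whatever your server actually exposes.

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function scrapeWithMcp(url) {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-puppeteer"]
  });

  const client = new Client(
    { name: "mcp-scraper", version: "1.0.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Tool name and arguments are illustrative; inspect listTools() for the real schema
  const result = await client.callTool({
    name: "puppeteer_navigate",
    arguments: { url }
  });

  console.log(result.content);
  await client.close();
}

scrapeWithMcp("https://example.com").catch(console.error);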

Creating a Custom MCP Server Registry

For organizations managing multiple MCP servers, consider creating a custom registry:

// mcp-registry.json
{
  "servers": [
    {
      "name": "playwright-scraper",
      "description": "Browser automation with Playwright",
      "category": "browser-automation",
      "command": "node",
      "args": ["./servers/playwright/index.js"],
      "tags": ["browser", "scraping", "automation"]
    },
    {
      "name": "webscraping-ai",
      "description": "AI-powered web scraping API",
      "category": "api-scraping",
      "command": "npx",
      "args": ["-y", "@drakula2k/webscraping-ai-mcp-server"],
      "tags": ["api", "ai", "extraction"],
      "requiresApiKey": true
    }
  ]
}

Load and use the registry:

import fs from 'fs/promises';

async function loadMcpRegistry(registryPath) {
  const registry = JSON.parse(
    await fs.readFile(registryPath, 'utf-8')
  );

  // Filter servers by category or tags
  const browserServers = registry.servers.filter(
    s => s.category === 'browser-automation'
  );

  const apiServers = registry.servers.filter(
    s => s.tags.includes('api')
  );

  return {
    all: registry.servers,
    browser: browserServers,
    api: apiServers
  };
}

// Use the registry
const servers = await loadMcpRegistry('./mcp-registry.json');
console.log('Available browser automation servers:', servers.browser);
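
With the registry loaded, starting a client for any entry reuses the StdioClientTransport pattern from the earlier examples. The helper below is a sketch that builds on the loadMcpRegistry function above:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function connectFromRegistry(entry) {
  // Servers flagged with requiresApiKey expect their key to be set in the environment
  const transport = new StdioClientTransport({
    command: entry.command,
    args: entry.args
  });

  const client = new Client(
    { name: `registry-${entry.name}`, version: "1.0.0" },
    { capabilities: {} }
  );

  await client.connect(transport);
  return client;
}

// Example: connect to the first browser-automation server in the registry
const client = await connectFromRegistry(servers.browser[0]);
console.log((await client.listTools()).tools.map(tool => tool.name));
await client.close();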

Conclusion

Finding and using MCP servers for web scraping involves exploring the official repository, community resources, and npm packages. By leveraging the MCP SDK's discovery tools and programmatically listing available tools and resources, you can build powerful, modular scraping workflows that combine multiple specialized servers.

Start with the official Anthropic MCP servers repository, test servers locally, and gradually build your custom registry of trusted MCP servers for your specific web scraping needs. Whether you're using browser automation with Puppeteer or API-based scraping solutions, MCP servers provide a standardized way to extend your capabilities.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
