How do I use MCP server authentication in my scraper?
Integrating MCP server authentication into your web scraping workflows requires understanding how to securely pass credentials, manage authenticated sessions, and handle multi-service authentication patterns. Unlike standalone scrapers where you might hardcode API keys or manage credentials manually, MCP servers provide a structured approach to credential management that enhances security and maintainability.
This guide demonstrates practical implementation patterns for integrating authentication into your scrapers, whether you're building custom MCP servers or using existing ones to handle authenticated web scraping tasks.
Understanding MCP Authentication in Scraping Context
When you use MCP servers for web scraping, authentication operates at multiple levels:
- MCP Server Access: The server itself runs as a trusted process with access to environment variables containing credentials
- Scraping API Authentication: Your scraper authenticates with services like WebScraping.AI using API keys
- Target Website Authentication: Scrapers may need to pass cookies, tokens, or credentials to access protected content
- Proxy Authentication: Requests may route through authenticated proxy services
The key advantage of using MCP servers is centralizing credential management while allowing your scraping tools to access them securely.
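As a minimal sketch of this flow (using the environment variable names assumed throughout this guide), the MCP client injects the env block from its configuration into the server process, and the server reads credentials from its environment rather than accepting them as tool arguments:

```python
import os

# Injected by the MCP client from the "env" block of its server configuration.
API_KEY = os.environ.get("WEBSCRAPING_AI_API_KEY")
PROXY_USERNAME = os.environ.get("PROXY_USERNAME")  # optional
PROXY_PASSWORD = os.environ.get("PROXY_PASSWORD")  # optional

if not API_KEY:
    # Fail fast: refusing to start beats silently scraping unauthenticated.
    raise RuntimeError("WEBSCRAPING_AI_API_KEY is not set in the MCP server environment")
```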
Basic MCP Server Authentication Setup
Python Scraper with MCP Authentication
Here's a complete example of a Python-based MCP server that implements authentication for web scraping:
import os
import json
import asyncio
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent
import httpx
from typing import Dict, Optional
class AuthenticatedScraper:
"""Web scraper with MCP-managed authentication"""
def __init__(self):
# Load credentials from environment (set via MCP config)
self.api_key = os.environ.get("WEBSCRAPING_AI_API_KEY")
self.proxy_username = os.environ.get("PROXY_USERNAME")
self.proxy_password = os.environ.get("PROXY_PASSWORD")
if not self.api_key:
raise ValueError(
"WEBSCRAPING_AI_API_KEY must be set in MCP server config"
)
async def scrape_html(
self,
url: str,
use_proxy: bool = False,
wait_for: Optional[str] = None,
headers: Optional[Dict[str, str]] = None
) -> str:
"""
Scrape HTML with authenticated API access
Args:
url: Target URL to scrape
use_proxy: Whether to use authenticated proxy
wait_for: CSS selector to wait for
headers: Custom headers for request
Returns:
Scraped HTML content
"""
params = {
"url": url,
"api_key": self.api_key,
"js": "true"
}
# Add proxy authentication if enabled
if use_proxy and self.proxy_username and self.proxy_password:
params["proxy"] = "residential"
params["proxy_username"] = self.proxy_username
params["proxy_password"] = self.proxy_password
# Add wait condition
if wait_for:
params["wait_for"] = wait_for
        # Add custom headers (JSON-encoded, matching the JavaScript example below)
        if headers:
            params["headers"] = json.dumps(headers)
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.get(
"https://api.webscraping.ai/html",
params=params
)
response.raise_for_status()
return response.text
async def scrape_with_cookies(
self,
url: str,
cookies: Dict[str, str]
) -> str:
"""
Scrape authenticated pages using session cookies
Args:
url: Target URL requiring authentication
cookies: Session cookies for authentication
Returns:
Scraped HTML content
"""
async with httpx.AsyncClient(timeout=60.0) as client:
response = await client.post(
"https://api.webscraping.ai/html",
params={
"url": url,
"api_key": self.api_key,
"js": "true"
},
json={"cookies": cookies}
)
response.raise_for_status()
return response.text
# Initialize MCP server with authentication
app = Server("authenticated-scraper")
scraper = AuthenticatedScraper()
@app.list_tools()
async def list_tools() -> list[Tool]:
"""Define available scraping tools"""
return [
Tool(
name="scrape_page",
description="Scrape any webpage with API authentication",
inputSchema={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL to scrape"
},
"use_proxy": {
"type": "boolean",
"description": "Use authenticated proxy (default: false)"
},
"wait_for": {
"type": "string",
"description": "CSS selector to wait for before extraction"
}
},
"required": ["url"]
}
),
Tool(
name="scrape_authenticated_page",
description="Scrape pages requiring login/session cookies",
inputSchema={
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "URL requiring authentication"
},
"cookies": {
"type": "object",
"description": "Session cookies as key-value pairs"
}
},
"required": ["url", "cookies"]
}
)
]
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
"""Handle tool execution with authentication"""
try:
if name == "scrape_page":
html = await scraper.scrape_html(
url=arguments["url"],
use_proxy=arguments.get("use_proxy", False),
wait_for=arguments.get("wait_for")
)
return [TextContent(
type="text",
text=f"Successfully scraped {arguments['url']}\n\n{html}"
)]
elif name == "scrape_authenticated_page":
html = await scraper.scrape_with_cookies(
url=arguments["url"],
cookies=arguments["cookies"]
)
return [TextContent(
type="text",
text=f"Successfully scraped authenticated page\n\n{html}"
)]
else:
raise ValueError(f"Unknown tool: {name}")
except httpx.HTTPStatusError as e:
return [TextContent(
type="text",
text=f"Scraping failed: {e.response.status_code} - {e.response.text}"
)]
except Exception as e:
return [TextContent(
type="text",
text=f"Error: {str(e)}"
)]
async def main():
"""Start the authenticated MCP server"""
async with stdio_server() as (read_stream, write_stream):
await app.run(read_stream, write_stream)
if __name__ == "__main__":
asyncio.run(main())
JavaScript/TypeScript Scraper with MCP Authentication
For Node.js environments, implement authentication similarly:
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
CallToolRequestSchema,
ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import axios, { AxiosError } from "axios";
interface ScraperConfig {
apiKey: string;
proxyUsername?: string;
proxyPassword?: string;
}
class AuthenticatedScraper {
private config: ScraperConfig;
constructor() {
// Load credentials from environment
const apiKey = process.env.WEBSCRAPING_AI_API_KEY;
if (!apiKey) {
throw new Error(
"WEBSCRAPING_AI_API_KEY must be configured in MCP server environment"
);
}
this.config = {
apiKey,
proxyUsername: process.env.PROXY_USERNAME,
proxyPassword: process.env.PROXY_PASSWORD,
};
}
async scrapeHtml(
url: string,
options: {
useProxy?: boolean;
waitFor?: string;
timeout?: number;
headers?: Record<string, string>;
} = {}
): Promise<string> {
const params: any = {
url,
api_key: this.config.apiKey,
js: true,
timeout: options.timeout || 30000,
};
// Configure proxy authentication
if (options.useProxy && this.config.proxyUsername && this.config.proxyPassword) {
params.proxy = "residential";
params.proxy_username = this.config.proxyUsername;
params.proxy_password = this.config.proxyPassword;
}
// Add wait condition
if (options.waitFor) {
params.wait_for = options.waitFor;
}
// Add custom headers
if (options.headers) {
params.headers = JSON.stringify(options.headers);
}
try {
const response = await axios.get(
"https://api.webscraping.ai/html",
{
params,
timeout: 60000,
}
);
return response.data;
} catch (error) {
if (axios.isAxiosError(error)) {
throw new Error(
`Scraping failed: ${error.response?.status} - ${error.response?.data}`
);
}
throw error;
}
}
async scrapeWithAuth(
url: string,
cookies: Record<string, string>
): Promise<string> {
try {
const response = await axios.post(
"https://api.webscraping.ai/html",
{
cookies,
},
{
params: {
url,
api_key: this.config.apiKey,
js: true,
},
timeout: 60000,
}
);
return response.data;
} catch (error) {
if (axios.isAxiosError(error)) {
throw new Error(
`Authentication failed: ${error.response?.status} - ${error.response?.data}`
);
}
throw error;
}
}
async extractData(url: string, fields: Record<string, string>): Promise<any> {
try {
const response = await axios.post(
"https://api.webscraping.ai/fields",
{
fields,
},
{
params: {
url,
api_key: this.config.apiKey,
},
timeout: 60000,
}
);
return response.data;
} catch (error) {
if (axios.isAxiosError(error)) {
throw new Error(
`Data extraction failed: ${error.response?.status}`
);
}
throw error;
}
}
}
// Initialize server and scraper
const server = new Server(
{
name: "authenticated-web-scraper",
version: "1.0.0",
},
{
capabilities: {
tools: {},
},
}
);
const scraper = new AuthenticatedScraper();
// Define available tools
server.setRequestHandler(ListToolsRequestSchema, async () => {
return {
tools: [
{
name: "scrape_webpage",
description: "Scrape any webpage with authenticated API access",
inputSchema: {
type: "object",
properties: {
url: {
type: "string",
description: "Target URL to scrape",
},
use_proxy: {
type: "boolean",
description: "Use authenticated residential proxy",
},
wait_for: {
type: "string",
description: "CSS selector to wait for",
},
timeout: {
type: "number",
description: "Request timeout in milliseconds",
},
},
required: ["url"],
},
},
{
name: "scrape_with_cookies",
description: "Scrape authenticated pages using session cookies",
inputSchema: {
type: "object",
properties: {
url: {
type: "string",
description: "URL requiring authentication",
},
cookies: {
type: "object",
description: "Session cookies as key-value pairs",
},
},
required: ["url", "cookies"],
},
},
{
name: "extract_fields",
description: "Extract specific data fields using AI",
inputSchema: {
type: "object",
properties: {
url: {
type: "string",
description: "URL to extract data from",
},
fields: {
type: "object",
description: "Field names and extraction instructions",
},
},
required: ["url", "fields"],
},
},
],
};
});
// Handle tool execution
server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params;
try {
switch (name) {
case "scrape_webpage": {
const html = await scraper.scrapeHtml(args.url, {
useProxy: args.use_proxy,
waitFor: args.wait_for,
timeout: args.timeout,
});
return {
content: [
{
type: "text",
text: `Successfully scraped ${args.url}\n\n${html}`,
},
],
};
}
case "scrape_with_cookies": {
const html = await scraper.scrapeWithAuth(args.url, args.cookies);
return {
content: [
{
type: "text",
text: `Successfully scraped authenticated page\n\n${html}`,
},
],
};
}
case "extract_fields": {
const data = await scraper.extractData(args.url, args.fields);
return {
content: [
{
type: "text",
text: JSON.stringify(data, null, 2),
},
],
};
}
default:
throw new Error(`Unknown tool: ${name}`);
}
} catch (error: any) {
return {
content: [
{
type: "text",
text: `Error: ${error.message}`,
},
],
isError: true,
};
}
});
// Start the server
async function main() {
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("Authenticated web scraping MCP server running");
}
main().catch((error) => {
console.error("Failed to start server:", error);
process.exit(1);
});
MCP Server Configuration with Credentials
To use your authenticated scraper, configure the MCP client with necessary credentials:
macOS Configuration
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"web-scraper": {
"command": "python",
"args": ["/path/to/authenticated_scraper.py"],
"env": {
"WEBSCRAPING_AI_API_KEY": "your_api_key_here",
"PROXY_USERNAME": "your_proxy_username",
"PROXY_PASSWORD": "your_proxy_password"
}
}
}
}
Windows Configuration
Edit %APPDATA%\Claude\claude_desktop_config.json:
{
"mcpServers": {
"web-scraper": {
"command": "node",
"args": ["C:\\path\\to\\authenticated_scraper.js"],
"env": {
"WEBSCRAPING_AI_API_KEY": "your_api_key_here",
"PROXY_USERNAME": "your_proxy_username",
"PROXY_PASSWORD": "your_proxy_password"
}
}
}
}
Using System Environment Variables
For better security, reference system environment variables instead of storing raw values in the config file (note that support for ${VAR} expansion depends on your MCP client):
{
"mcpServers": {
"web-scraper": {
"command": "python",
"args": ["/path/to/scraper.py"],
"env": {
"WEBSCRAPING_AI_API_KEY": "${WEBSCRAPING_AI_API_KEY}",
"PROXY_USERNAME": "${PROXY_USERNAME}",
"PROXY_PASSWORD": "${PROXY_PASSWORD}"
}
}
}
}
Set system variables:
# macOS/Linux - Add to ~/.bashrc or ~/.zshrc
export WEBSCRAPING_AI_API_KEY="your_key"
export PROXY_USERNAME="your_username"
export PROXY_PASSWORD="your_password"
# Windows PowerShell
[System.Environment]::SetEnvironmentVariable('WEBSCRAPING_AI_API_KEY', 'your_key', 'User')
Advanced Authentication Patterns for Scrapers
Multi-Target Scraping with Session Management
When scraping multiple authenticated sites, manage separate session credentials:
from dataclasses import dataclass
from typing import Dict
import httpx
@dataclass
class SessionCredentials:
"""Manage credentials for different target sites"""
cookies: Dict[str, str]
headers: Dict[str, str]
auth_token: str = ""
class MultiSiteScraper:
def __init__(self, api_key: str):
self.api_key = api_key
self.sessions: Dict[str, SessionCredentials] = {}
def add_session(self, site_name: str, credentials: SessionCredentials):
"""Register authentication credentials for a specific site"""
self.sessions[site_name] = credentials
async def scrape_with_session(self, site_name: str, url: str) -> str:
"""Scrape using stored session credentials"""
if site_name not in self.sessions:
raise ValueError(f"No session found for {site_name}")
creds = self.sessions[site_name]
async with httpx.AsyncClient() as client:
# Combine session cookies with headers
headers = {**creds.headers}
if creds.auth_token:
headers["Authorization"] = f"Bearer {creds.auth_token}"
response = await client.post(
"https://api.webscraping.ai/html",
params={
"url": url,
"api_key": self.api_key,
"js": "true"
},
json={
"cookies": creds.cookies,
"headers": headers
}
)
return response.text
# Usage in MCP server
@app.call_tool()
async def call_tool(name: str, arguments: dict):
if name == "scrape_with_session":
# Add session before scraping
scraper.add_session(
site_name=arguments["site_name"],
credentials=SessionCredentials(
cookies=arguments["cookies"],
headers=arguments.get("headers", {}),
auth_token=arguments.get("auth_token", "")
)
)
html = await scraper.scrape_with_session(
site_name=arguments["site_name"],
url=arguments["url"]
)
return [TextContent(type="text", text=html)]
Rotating Proxy Authentication
Implement proxy rotation with authentication for large-scale scraping:
class ProxyRotator {
private proxyList: Array<{
host: string;
username: string;
password: string;
}>;
private currentIndex: number = 0;
constructor() {
// Load proxy credentials from environment
this.proxyList = JSON.parse(
process.env.PROXY_LIST || "[]"
);
}
getNextProxy() {
if (this.proxyList.length === 0) {
return null;
}
const proxy = this.proxyList[this.currentIndex];
this.currentIndex = (this.currentIndex + 1) % this.proxyList.length;
return proxy;
}
async scrapeWithRotation(url: string, apiKey: string): Promise<string> {
const proxy = this.getNextProxy();
if (!proxy) {
throw new Error("No proxies configured");
}
const response = await axios.get("https://api.webscraping.ai/html", {
params: {
url,
api_key: apiKey,
proxy: "custom",
proxy_url: `http://${proxy.username}:${proxy.password}@${proxy.host}`,
},
});
return response.data;
}
}
const proxyRotator = new ProxyRotator();
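// Note: registering a second CallToolRequestSchema handler replaces the earlier one,
// so in a real server fold this case into your main tool handler instead.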
server.setRequestHandler(CallToolRequestSchema, async (request) => {
if (request.params.name === "scrape_with_rotating_proxy") {
const html = await proxyRotator.scrapeWithRotation(
request.params.arguments.url,
process.env.WEBSCRAPING_AI_API_KEY!
);
return {
content: [{ type: "text", text: html }],
};
}
});
Browser Automation with Authentication
For complex authentication flows, similar to handling authentication in Puppeteer, you can drive the login step in the browser and reuse the resulting session cookies:
@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "scrape_with_browser_auth":
        # scraper is the AuthenticatedScraper instance created earlier
        async with httpx.AsyncClient(timeout=60.0) as client:
            # First, perform the login flow to obtain session cookies
            login_response = await client.post(
                "https://api.webscraping.ai/html",
                params={
                    "url": arguments["login_url"],
                    "api_key": scraper.api_key,
                    "js": "true"
                },
                json={
                    "js_script": f"""
                        document.querySelector('#username').value = '{arguments['username']}';
                        document.querySelector('#password').value = '{arguments['password']}';
                        document.querySelector('form').submit();
                    """
                }
            )
            login_response.raise_for_status()
            # Collect the session cookies returned with the login response
            session_cookies = dict(login_response.cookies)
            # Use the session cookies to scrape protected pages
            protected_response = await client.post(
                "https://api.webscraping.ai/html",
                params={
                    "url": arguments["target_url"],
                    "api_key": scraper.api_key
                },
                json={"cookies": session_cookies}
            )
            protected_response.raise_for_status()
            return [TextContent(type="text", text=protected_response.text)]
Security Best Practices for Authenticated Scrapers
1. Credential Validation at Startup
Fail fast if a required credential is missing rather than discovering it mid-scrape:
def validate_credentials():
    """Validate that all required credentials are present"""
    required = {
        "WEBSCRAPING_AI_API_KEY": "WebScraping.AI API key",
        "PROXY_USERNAME": "Proxy username (optional)",
        "PROXY_PASSWORD": "Proxy password (optional)"
    }
    missing = []
    for var, description in required.items():
        # Skip credentials whose description marks them as optional
        if description.endswith("(optional)"):
            continue
        if not os.environ.get(var):
            missing.append(f"{var} ({description})")
    if missing:
        raise ValueError(
            "Missing required credentials:\n" +
            "\n".join(f"  - {item}" for item in missing)
        )

# Validate before starting the server
validate_credentials()
2. Secure Logging
Never log sensitive authentication data:
import winston from "winston";
const logger = winston.createLogger({
level: "info",
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
transports: [
new winston.transports.File({ filename: "scraper.log" }),
],
});
// Good: Log without exposing credentials
logger.info("Scraping request", {
url: targetUrl,
useProxy: true,
timestamp: new Date().toISOString(),
});
// Bad: Never log credentials
// logger.info("Request", { apiKey: API_KEY }); // DON'T DO THIS
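The same principle applies on the Python side. Here is a small sketch using the standard logging module (the variable names are the ones assumed throughout this guide): a filter masks any known credential value before records reach the log file:

```python
import logging
import os

# Collect the secret values this process knows about.
SECRET_VALUES = [
    value for value in (
        os.environ.get("WEBSCRAPING_AI_API_KEY"),
        os.environ.get("PROXY_PASSWORD"),
    )
    if value
]

class RedactSecretsFilter(logging.Filter):
    """Mask known credential values before log records are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for secret in SECRET_VALUES:
            message = message.replace(secret, "***")
        record.msg, record.args = message, None
        return True

logger = logging.getLogger("scraper")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("scraper.log")
handler.addFilter(RedactSecretsFilter())
logger.addHandler(handler)

# Even if a message accidentally contains the key, it is masked on the way out.
logger.info("Scraping request for %s", "https://example.com")
```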
3. Rate Limiting with Authentication
Protect your API quota:
from datetime import datetime, timedelta
from collections import deque
class AuthenticatedRateLimiter:
def __init__(self, max_requests_per_minute: int):
self.max_requests = max_requests_per_minute
self.requests = deque()
async def check_limit(self):
"""Enforce rate limiting"""
now = datetime.now()
# Remove requests older than 1 minute
while self.requests and self.requests[0] < now - timedelta(minutes=1):
self.requests.popleft()
if len(self.requests) >= self.max_requests:
wait_time = (self.requests[0] + timedelta(minutes=1) - now).total_seconds()
raise ValueError(f"Rate limit exceeded. Wait {wait_time:.0f} seconds")
self.requests.append(now)
rate_limiter = AuthenticatedRateLimiter(max_requests_per_minute=60)
@app.call_tool()
async def call_tool(name: str, arguments: dict):
await rate_limiter.check_limit()
# Proceed with authenticated scraping
html = await scraper.scrape_html(arguments["url"])
return [TextContent(type="text", text=html)]
4. Error Handling for Authentication Failures
Distinguish credential errors (fail immediately) from rate limiting (retry with backoff):
async function scrapeWithRetry(
url: string,
maxRetries: number = 3
): Promise<string> {
let lastError: Error;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const response = await axios.get("https://api.webscraping.ai/html", {
params: {
url,
api_key: process.env.WEBSCRAPING_AI_API_KEY,
},
});
return response.data;
} catch (error: any) {
lastError = error;
if (error.response?.status === 401) {
throw new Error("Authentication failed: Invalid API key");
}
if (error.response?.status === 429) {
const waitTime = Math.pow(2, attempt) * 1000;
console.error(`Rate limited. Waiting ${waitTime}ms before retry ${attempt}/${maxRetries}`);
await new Promise((resolve) => setTimeout(resolve, waitTime));
continue;
}
if (attempt === maxRetries) {
throw new Error(`Scraping failed after ${maxRetries} attempts: ${error.message}`);
}
}
}
throw lastError!;
}
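A comparable sketch on the Python side, assuming httpx and the same endpoint: treat 401 as fatal, since retrying with a bad key never helps, and back off exponentially on 429 before giving up:

```python
import asyncio
import os
import httpx

async def scrape_with_retry(url: str, max_retries: int = 3) -> str:
    """Retry transient failures; fail fast on credential errors."""
    api_key = os.environ["WEBSCRAPING_AI_API_KEY"]  # raises KeyError if unset
    async with httpx.AsyncClient(timeout=60.0) as client:
        for attempt in range(1, max_retries + 1):
            response = await client.get(
                "https://api.webscraping.ai/html",
                params={"url": url, "api_key": api_key},
            )
            if response.status_code == 401:
                # Bad credentials never recover on retry
                raise RuntimeError("Authentication failed: invalid API key")
            if response.status_code == 429 and attempt < max_retries:
                # Exponential backoff: 2s, 4s, 8s...
                await asyncio.sleep(2 ** attempt)
                continue
            response.raise_for_status()
            return response.text
    raise RuntimeError(f"Scraping failed after {max_retries} attempts")
```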
Testing Your Authenticated Scraper
Create a test suite to verify authentication works correctly:
import asyncio
import os
async def test_scraper_authentication():
"""Test MCP scraper authentication"""
print("Testing MCP scraper authentication...\n")
# Test 1: Environment variables
api_key = os.environ.get("WEBSCRAPING_AI_API_KEY")
if not api_key:
print("❌ FAIL: API key not found")
return False
print("✅ PASS: API key loaded from environment")
# Test 2: Basic scraping
try:
scraper = AuthenticatedScraper()
html = await scraper.scrape_html("https://example.com")
if html and len(html) > 0:
print("✅ PASS: Basic scraping works")
else:
print("❌ FAIL: Empty response")
return False
except Exception as e:
print(f"❌ FAIL: Scraping error: {e}")
return False
# Test 3: Proxy authentication (if configured)
proxy_user = os.environ.get("PROXY_USERNAME")
if proxy_user:
try:
html = await scraper.scrape_html(
"https://httpbin.org/ip",
use_proxy=True
)
print("✅ PASS: Proxy authentication works")
except Exception as e:
print(f"⚠️ WARNING: Proxy test failed: {e}")
print("\n✅ All tests passed!")
return True
if __name__ == "__main__":
asyncio.run(test_scraper_authentication())
Run tests:
python test_scraper.py
Troubleshooting Common Issues
Issue: "API key not configured"
Solution: Verify MCP configuration includes environment variables:
# Check MCP config file
cat ~/Library/Application\ Support/Claude/claude_desktop_config.json
# Ensure env section exists
{
"mcpServers": {
"scraper": {
"env": {
"WEBSCRAPING_AI_API_KEY": "your_key"
}
}
}
}
Issue: "401 Unauthorized" errors
Causes:
- Expired or invalid API key
- API key not properly passed to scraping service
- Incorrect parameter format
Solution:
# Add debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Verify API key is being sent
print(f"Using API key: {API_KEY[:10]}...{API_KEY[-4:]}") # Partial display only
Issue: Authentication works locally but fails in MCP
Solution: Restart MCP client after configuration changes:
- Quit Claude Desktop completely
- Update claude_desktop_config.json
- Relaunch Claude Desktop
- Test scraper tools
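If the problem persists after a restart, confirm that the server process actually sees the variables. A minimal check you can temporarily drop into the top of the server script (it writes to stderr, which MCP clients record in their server logs without disturbing the stdio protocol):

```python
import os
import sys

# Temporary startup check: confirm credentials reached the server process.
for var in ("WEBSCRAPING_AI_API_KEY", "PROXY_USERNAME", "PROXY_PASSWORD"):
    print(f"{var} set: {bool(os.environ.get(var))}", file=sys.stderr)
```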
Conclusion
Integrating authentication into MCP-based scrapers provides a secure, maintainable approach to credential management while enabling powerful web scraping capabilities. By centralizing authentication in your MCP server configuration and implementing proper error handling, rate limiting, and security practices, you can build robust scraping tools that handle both public and authenticated content reliably.
When building your scraper, remember to validate credentials at startup, implement comprehensive logging without exposing secrets, and test authentication thoroughly before deploying to production. For more complex scenarios requiring browser session handling or navigating multi-step authentication flows, consider combining MCP server authentication with browser automation tools.
The patterns demonstrated here work with any web scraping API that uses API key authentication, making them broadly applicable whether you're scraping social media platforms, e-commerce sites, or internal web applications that require authenticated access.