What are MCP Resources and How Do I Use Them?

MCP (Model Context Protocol) resources are a fundamental part of the MCP architecture, enabling AI assistants to access and interact with external data sources. Resources represent pieces of content or data that an MCP server can provide to clients, making them essential for web scraping workflows, data extraction, and automation tasks.

Understanding MCP Resources

MCP resources are URI-addressable pieces of content that servers expose to clients. They function similarly to REST API endpoints but are specifically designed for AI model consumption. Each resource has a unique identifier (URI) and can contain various types of data, from simple text to complex structured information.
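
For orientation, here is a minimal sketch of the JSON-RPC message a client sends to fetch one of these resources. The file URI is an illustrative placeholder, but the resources/read method name and params shape come from the MCP specification:

# Minimal sketch of an MCP resources/read request (JSON-RPC 2.0);
# the URI is an illustrative placeholder
read_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "file:///path/to/scraped/data.json"},
}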

Key Characteristics of MCP Resources

Resources in the Model Context Protocol have several important properties, all of which show up in the example entry after this list:

  • URI-based identification: Each resource is uniquely identified by a URI
  • Typed content: Resources can contain text, binary data, or structured information
  • Metadata support: Resources include descriptive metadata like names, descriptions, and MIME types
  • Dynamic or static: Resources can be static files or dynamically generated content
  • Access control: Servers can implement permissions and authentication for resources
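
For example, a single entry in a server's resources/list response carries this metadata (the field names follow the MCP specification; the values are illustrative):

# One resource entry as returned by resources/list;
# uri, name, description, and mimeType are the spec-defined fields
resource_entry = {
    "uri": "scraping://pages/product-list",
    "name": "Product list page",
    "description": "Most recently scraped product listing",
    "mimeType": "application/json",
}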

Types of MCP Resources

MCP resources generally fall into several categories depending on their use case:

1. File System Resources

These resources represent files and directories on the local or remote file system. They're commonly used for accessing scraped data, configuration files, or output results.

// Example: File system resource URI
const resourceUri = 'file:///path/to/scraped/data.json';

2. Web Resources

Web resources represent data fetched from URLs or web services. These are particularly useful for web scraping scenarios where you need to access and process web content.

# Example: Web resource URI
resource_uri = 'https://example.com/api/data'

3. Database Resources

Database resources provide access to structured data stored in databases, allowing AI assistants to query and analyze scraped data.
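
A server might expose such data behind a custom database scheme. The URI below is a hypothetical example, in the same style as the schemes shown later in this article:

# Example: Database resource URI (hypothetical scheme)
resource_uri = 'database://tables/scraped_products/rows'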

4. Custom Resources

Servers can define custom resource types specific to their domain, such as browser snapshots, scraped pages, or extracted entities.
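
For instance, a scraping-oriented server might mint URIs like these (illustrative, matching the custom schemes shown later):

# Example: Custom resource URIs (illustrative)
snapshot_uri = 'browser://tabs/tab-1/screenshot'
page_uri = 'scraping://pages/product-list'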

How to Use MCP Resources in Web Scraping

Listing Available Resources

The first step in working with MCP resources is discovering what resources are available from connected servers. Use the resource listing capability to enumerate available resources:

import asyncio
# Note: Client here is a simplified interface used throughout this article;
# the official MCP Python SDK exposes a ClientSession over an explicit transport.
from mcp import Client

async def list_resources():
    async with Client("http://localhost:3000") as client:
        # List all available resources
        resources = await client.list_resources()

        for resource in resources:
            print(f"URI: {resource.uri}")
            print(f"Name: {resource.name}")
            print(f"Description: {resource.description}")
            print(f"MIME Type: {resource.mime_type}")
            print("---")

asyncio.run(list_resources())

The same listing with the JavaScript SDK:

// JavaScript example using MCP SDK
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function listResources() {
  const transport = new StdioClientTransport({
    command: 'mcp-server-command',
    args: []
  });

  const client = new Client({
    name: 'scraping-client',
    version: '1.0.0'
  }, {
    capabilities: {
      resources: {}
    }
  });

  await client.connect(transport);

  // List all resources
  const response = await client.listResources();

  response.resources.forEach(resource => {
    console.log(`URI: ${resource.uri}`);
    console.log(`Name: ${resource.name}`);
    console.log(`Description: ${resource.description}`);
    console.log('---');
  });

  await client.close();
}

listResources();

Reading Resource Content

Once you've identified the resources you need, you can read their content using the resource read operation:

async def read_scraped_data():
    async with Client("http://localhost:3000") as client:
        # Read a specific resource
        resource_content = await client.read_resource(
            uri="scraping://pages/product-list"
        )

        # Access the content
        for content_item in resource_content.contents:
            if content_item.mime_type == "application/json":
                import json
                data = json.loads(content_item.text)
                print(f"Scraped {len(data)} items")
            elif content_item.mime_type == "text/html":
                print(f"HTML content length: {len(content_item.text)}")

asyncio.run(read_scraped_data())

And the equivalent in JavaScript:

async function readScrapedData() {
  // ... (client setup as before)

  const resourceUri = 'scraping://pages/product-list';
  const response = await client.readResource({ uri: resourceUri });

  response.contents.forEach(content => {
    if (content.mimeType === 'application/json') {
      const data = JSON.parse(content.text);
      console.log(`Scraped ${data.length} items`);
    } else if (content.mimeType === 'text/html') {
      console.log(`HTML content length: ${content.text.length}`);
    }
  });
}

Practical Web Scraping Example with MCP Resources

Here's a complete example demonstrating how to use MCP resources for a web scraping workflow:

import asyncio
from mcp import Client
import json

class WebScrapingMCPClient:
    def __init__(self, server_url):
        self.server_url = server_url

    async def scrape_and_process(self, target_url):
        async with Client(self.server_url) as client:
            # Step 1: Trigger scraping via a tool call
            scrape_result = await client.call_tool(
                name="scrape_page",
                arguments={
                    "url": target_url,
                    "wait_for": "body",
                    "extract_links": True
                }
            )

            # Step 2: Get the resource URI from the result
            resource_uri = scrape_result.content[0].text

            # Step 3: Read the scraped content as a resource
            resource = await client.read_resource(uri=resource_uri)

            # Step 4: Process the scraped data
            scraped_data = None
            for content in resource.contents:
                if content.mime_type == "application/json":
                    scraped_data = json.loads(content.text)
                    print(f"Title: {scraped_data.get('title')}")
                    print(f"Links found: {len(scraped_data.get('links', []))}")

                    # Save processed data
                    with open('output.json', 'w') as f:
                        json.dump(scraped_data, f, indent=2)

            return scraped_data

# Usage
async def main():
    scraper = WebScrapingMCPClient("http://localhost:3000")
    data = await scraper.scrape_and_process("https://example.com/products")

asyncio.run(main())

Working with Playwright MCP Server Resources

When using the Playwright MCP server for web scraping, resources typically represent browser snapshots, screenshots, and page content:

async function capturePageSnapshot() {
  // Launch the Playwright MCP server over stdio, as in the earlier example
  const transport = new StdioClientTransport({
    command: 'npx',
    args: ['@playwright/mcp@latest']
  });

  const client = new Client({
    name: 'playwright-scraper',
    version: '1.0.0'
  }, {
    capabilities: { resources: {} }
  });

  await client.connect(transport);

  // Take a snapshot using a tool
  const snapshotResult = await client.callTool({
    name: 'browser_snapshot',
    arguments: {}
  });

  // The snapshot might be available as a resource
  const resources = await client.listResources();
  const snapshotResource = resources.resources.find(
    r => r.uri.includes('snapshot')
  );

  if (snapshotResource) {
    const content = await client.readResource({
      uri: snapshotResource.uri
    });

    console.log('Snapshot content:', content.contents[0].text);
  }
}

Resource URI Schemes

Different MCP servers use different URI schemes to organize their resources. Here are common patterns:

File-based Resources

file:///absolute/path/to/resource.json
file:///absolute/path/to/output/data.html

Custom Scheme Resources

scraping://sessions/session-id-123
browser://tabs/tab-1/screenshot
database://tables/scraped_products/rows

HTTP/HTTPS Resources

https://api.example.com/data
http://localhost:8080/scraping-results

Resource Subscriptions and Updates

Some MCP servers support resource subscriptions, allowing clients to receive notifications when resources change. This is particularly useful for monitoring scraping jobs:

import json

async def monitor_scraping_job():
    async with Client("http://localhost:3000") as client:
        # Subscribe to resource updates (the underlying spec method is resources/subscribe)
        await client.subscribe_resource(
            uri="scraping://jobs/job-123/status"
        )

        # Listen for updates
        async for notification in client.notifications():
            if notification.method == "notifications/resources/updated":
                uri = notification.params["uri"]

                # Read the updated resource
                resource = await client.read_resource(uri=uri)
                status = json.loads(resource.contents[0].text)

                print(f"Job status: {status['state']}")
                print(f"Progress: {status['progress']}%")

                if status['state'] == 'completed':
                    break

asyncio.run(monitor_scraping_job())

Best Practices for Using MCP Resources

1. Cache Resource Listings

Resource listings can be expensive operations. Cache the results when possible:

import time

class ResourceCache:
    def __init__(self, client, ttl=300):
        self.client = client
        self.ttl = ttl  # cache lifetime in seconds
        self._cache = {}
        self._timestamps = {}

    async def get_resources(self, force_refresh=False):
        now = time.time()

        if force_refresh or now - self._timestamps.get('resources', 0) > self.ttl:
            self._cache['resources'] = await self.client.list_resources()
            self._timestamps['resources'] = now

        return self._cache['resources']
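
Usage is then a one-liner wherever you previously listed resources directly (a sketch, assuming the simplified client from the earlier examples):

# Usage (sketch): inside an async function, reuse the cached listing
cache = ResourceCache(client, ttl=300)
resources = await cache.get_resources()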

2. Handle Resource Errors Gracefully

Resources might not always be available. Implement proper error handling:

async function safeReadResource(client, uri) {
  try {
    const resource = await client.readResource({ uri });
    return resource;
  } catch (error) {
    // Error codes are server-specific (MCP surfaces JSON-RPC errors);
    // match whatever code or message your server reports
    if (error.code === 'RESOURCE_NOT_FOUND') {
      console.error(`Resource not found: ${uri}`);
      return null;
    }
    throw error;
  }
}

3. Validate MIME Types

Always check the MIME type before processing resource content:

import base64
import json

from bs4 import BeautifulSoup

def process_resource_content(content):
    mime_type = content.mime_type

    if mime_type == "application/json":
        return json.loads(content.text)
    elif mime_type == "text/html":
        return BeautifulSoup(content.text, 'html.parser')
    elif mime_type.startswith("image/"):
        # Binary resource content arrives base64-encoded in the blob field
        return base64.b64decode(content.blob)
    else:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
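
A typical call site dispatches every content item of a read resource through this helper (a sketch, reusing the simplified client from earlier):

# Usage (sketch): inside an async function
resource = await client.read_resource(uri="scraping://pages/product-list")
parsed_items = [process_resource_content(item) for item in resource.contents]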

Integration with Web Scraping Workflows

MCP resources integrate seamlessly with web scraping workflows. When you handle browser sessions in Puppeteer or work with other automation tools, resources provide a standardized way to access scraped data:

import json

async def automated_scraping_pipeline():
    async with Client("http://localhost:3000") as client:
        # Step 1: List available scraping templates
        resources = await client.list_resources()
        templates = [r for r in resources if r.uri.startswith('templates://')]

        # Step 2: Use a template resource
        template = await client.read_resource(uri=templates[0].uri)
        template_config = json.loads(template.contents[0].text)

        # Step 3: Execute scraping with template
        result = await client.call_tool(
            name="execute_template",
            arguments=template_config
        )

        # Step 4: Access results as resources
        result_uri = result.content[0].text
        scraped_data = await client.read_resource(uri=result_uri)

        return scraped_data

Conclusion

MCP resources provide a powerful abstraction for accessing and managing data in AI-powered web scraping workflows. By understanding resource types, URI schemes, and proper usage patterns, you can build robust scraping systems that leverage the full capabilities of the Model Context Protocol.

Whether you're working with browser automation tools, managing scraped data, or building complex extraction pipelines, MCP resources offer a standardized interface that simplifies data access and enables seamless integration between different components of your scraping infrastructure. When combined with proper authentication mechanisms, MCP resources become an essential tool in your web scraping toolkit.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
