What are MCP Resources and How Do I Use Them?
MCP (Model Context Protocol) resources are a fundamental concept in the MCP architecture that enable AI assistants to access and interact with external data sources. Resources represent pieces of content or data that an MCP server can provide to clients, making them essential for web scraping workflows, data extraction, and automation tasks.
Understanding MCP Resources
MCP resources are URI-addressable pieces of content that servers expose to clients. They function similarly to REST API endpoints but are specifically designed for AI model consumption. Each resource has a unique identifier (URI) and can contain various types of data, from simple text to complex structured information.
Key Characteristics of MCP Resources
Resources in the Model Context Protocol have several important properties:
- URI-based identification: Each resource is uniquely identified by a URI scheme
- Typed content: Resources can contain text, binary data, or structured information
- Metadata support: Resources include descriptive metadata like names, descriptions, and MIME types
- Dynamic or static: Resources can be static files or dynamically generated content
- Access control: Servers can implement permissions and authentication for resources
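Taken together, these properties correspond to the metadata fields a server returns for each resource. A minimal sketch of such a descriptor, modeled as a plain Python dict (the URI and values are invented for illustration; the field names follow MCP's resource metadata):

```python
# Illustrative resource descriptor; the URI and values are made up for this example.
resource_descriptor = {
    "uri": "scraping://pages/product-list",   # URI-based identification
    "name": "Product list snapshot",          # human-readable name
    "description": "JSON extracted from the latest crawl of the product listing",
    "mimeType": "application/json",           # typed content
}

# A client can use the metadata to decide how to handle the content
# before ever fetching it.
def is_json_resource(descriptor):
    return descriptor.get("mimeType") == "application/json"

print(is_json_resource(resource_descriptor))  # True
```

Checking metadata up front like this avoids fetching (possibly large) resource bodies the client cannot process anyway.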
Types of MCP Resources
MCP resources generally fall into several categories depending on their use case:
1. File System Resources
These resources represent files and directories on the local or remote file system. They're commonly used for accessing scraped data, configuration files, or output results.
// Example: File system resource URI
const resourceUri = 'file:///path/to/scraped/data.json';
2. Web Resources
Web resources represent data fetched from URLs or web services. These are particularly useful for web scraping scenarios where you need to access and process web content.
# Example: Web resource URI
resource_uri = 'https://example.com/api/data'
3. Database Resources
Database resources provide access to structured data stored in databases, allowing AI assistants to query and analyze scraped data.
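The exact URI layout for database resources is server-specific. As a sketch, a server might encode the table and projection in the path (the `database://` scheme and its layout below are hypothetical), which a client can pick apart with the standard library:

```python
from urllib.parse import urlparse

# Hypothetical database resource URI; real servers define their own layout.
uri = "database://tables/scraped_products/rows"

parsed = urlparse(uri)
kind = parsed.netloc                              # "tables"
table, view = parsed.path.strip("/").split("/")   # "scraped_products", "rows"

print(kind, table, view)  # tables scraped_products rows
```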
4. Custom Resources
Servers can define custom resource types specific to their domain, such as browser snapshots, scraped pages, or extracted entities.
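On the server side, supporting custom resource types usually comes down to routing reads on the URI scheme. A minimal sketch of such a dispatcher (the scheme names and handler functions are invented for illustration; a real server would return actual snapshot or page content):

```python
from urllib.parse import urlparse

# Hypothetical handlers for custom schemes.
def read_snapshot(path):
    return f"snapshot data for {path}"

def read_scraped_page(path):
    return f"page content for {path}"

HANDLERS = {
    "browser": read_snapshot,
    "scraping": read_scraped_page,
}

def read_custom_resource(uri):
    parsed = urlparse(uri)
    handler = HANDLERS.get(parsed.scheme)
    if handler is None:
        raise ValueError(f"Unknown resource scheme: {parsed.scheme}")
    return handler(parsed.netloc + parsed.path)

print(read_custom_resource("browser://tabs/tab-1/screenshot"))
```

Keeping the scheme-to-handler mapping in one table makes it easy to add new resource types without touching the dispatch logic.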
How to Use MCP Resources in Web Scraping
Listing Available Resources
The first step in working with MCP resources is discovering what resources are available from connected servers. Use the resource listing capability to enumerate available resources:
import asyncio
from mcp import Client

async def list_resources():
    async with Client("http://localhost:3000") as client:
        # List all available resources
        resources = await client.list_resources()
        for resource in resources:
            print(f"URI: {resource.uri}")
            print(f"Name: {resource.name}")
            print(f"Description: {resource.description}")
            print(f"MIME Type: {resource.mime_type}")
            print("---")

asyncio.run(list_resources())
// JavaScript example using MCP SDK
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

async function listResources() {
  const transport = new StdioClientTransport({
    command: 'mcp-server-command',
    args: []
  });
  const client = new Client({
    name: 'scraping-client',
    version: '1.0.0'
  }, {
    capabilities: {
      resources: {}
    }
  });
  await client.connect(transport);

  // List all resources
  const response = await client.listResources();
  response.resources.forEach(resource => {
    console.log(`URI: ${resource.uri}`);
    console.log(`Name: ${resource.name}`);
    console.log(`Description: ${resource.description}`);
    console.log('---');
  });

  await client.close();
}

listResources();
Reading Resource Content
Once you've identified the resources you need, you can read their content using the resource read operation:
import asyncio
import json

async def read_scraped_data():
    async with Client("http://localhost:3000") as client:
        # Read a specific resource
        resource_content = await client.read_resource(
            uri="scraping://pages/product-list"
        )
        # Access the content
        for content_item in resource_content.contents:
            if content_item.mime_type == "application/json":
                data = json.loads(content_item.text)
                print(f"Scraped {len(data)} items")
            elif content_item.mime_type == "text/html":
                print(f"HTML content length: {len(content_item.text)}")

asyncio.run(read_scraped_data())
async function readScrapedData() {
  // ... (client setup as before)
  const resourceUri = 'scraping://pages/product-list';
  const response = await client.readResource({ uri: resourceUri });

  response.contents.forEach(content => {
    if (content.mimeType === 'application/json') {
      const data = JSON.parse(content.text);
      console.log(`Scraped ${data.length} items`);
    } else if (content.mimeType === 'text/html') {
      console.log(`HTML content length: ${content.text.length}`);
    }
  });
}
Practical Web Scraping Example with MCP Resources
Here's a complete example demonstrating how to use MCP resources for a web scraping workflow:
import asyncio
from mcp import Client
import json

class WebScrapingMCPClient:
    def __init__(self, server_url):
        self.server_url = server_url

    async def scrape_and_process(self, target_url):
        async with Client(self.server_url) as client:
            # Step 1: Trigger scraping via a tool call
            scrape_result = await client.call_tool(
                name="scrape_page",
                arguments={
                    "url": target_url,
                    "wait_for": "body",
                    "extract_links": True
                }
            )
            # Step 2: Get the resource URI from the result
            resource_uri = scrape_result.content[0].text
            # Step 3: Read the scraped content as a resource
            resource = await client.read_resource(uri=resource_uri)
            # Step 4: Process the scraped data
            for content in resource.contents:
                if content.mime_type == "application/json":
                    scraped_data = json.loads(content.text)
                    print(f"Title: {scraped_data.get('title')}")
                    print(f"Links found: {len(scraped_data.get('links', []))}")
                    # Save processed data
                    with open('output.json', 'w') as f:
                        json.dump(scraped_data, f, indent=2)
                    return scraped_data

# Usage
async def main():
    scraper = WebScrapingMCPClient("http://localhost:3000")
    data = await scraper.scrape_and_process("https://example.com/products")

asyncio.run(main())
Working with Playwright MCP Server Resources
When using the Playwright MCP server for web scraping, resources typically represent browser snapshots, screenshots, and page content:
async function capturePageSnapshot() {
  // transport is created as in the earlier listResources example
  const client = new Client({
    name: 'playwright-scraper',
    version: '1.0.0'
  }, {
    capabilities: { resources: {} }
  });
  await client.connect(transport);

  // Take a snapshot using a tool
  const snapshotResult = await client.callTool({
    name: 'browser_snapshot',
    arguments: {}
  });

  // The snapshot might be available as a resource
  const resources = await client.listResources();
  const snapshotResource = resources.resources.find(
    r => r.uri.includes('snapshot')
  );

  if (snapshotResource) {
    const content = await client.readResource({
      uri: snapshotResource.uri
    });
    console.log('Snapshot content:', content.contents[0].text);
  }
}
Resource URI Schemes
Different MCP servers use different URI schemes to organize their resources. Here are common patterns:
File-based Resources
file:///absolute/path/to/resource.json
file:///home/user/scraped/data.html (file URIs always use absolute paths)
Custom Scheme Resources
scraping://sessions/session-id-123
browser://tabs/tab-1/screenshot
database://tables/scraped_products/rows
HTTP/HTTPS Resources
https://api.example.com/data
http://localhost:8080/scraping-results
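Since clients often juggle several of these schemes at once, it can help to classify a URI before deciding how to fetch it. A small sketch using only the standard library (the scheme groupings mirror the example URIs above):

```python
from urllib.parse import urlparse

# Scheme groupings mirror the example URIs above; extend as your servers require.
def classify_resource(uri):
    scheme = urlparse(uri).scheme
    if scheme == "file":
        return "filesystem"
    if scheme in ("http", "https"):
        return "web"
    return "custom"

print(classify_resource("file:///absolute/path/to/resource.json"))  # filesystem
print(classify_resource("https://api.example.com/data"))            # web
print(classify_resource("scraping://sessions/session-id-123"))      # custom
```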
Resource Subscriptions and Updates
Some MCP servers support resource subscriptions, allowing clients to receive notifications when resources change. This is particularly useful for monitoring scraping jobs:
import asyncio
import json
from mcp import Client

async def monitor_scraping_job():
    async with Client("http://localhost:3000") as client:
        # Subscribe to resource updates
        await client.subscribe_resource(
            uri="scraping://jobs/job-123/status"
        )
        # Listen for updates
        async for notification in client.notifications():
            if notification.method == "notifications/resources/updated":
                uri = notification.params["uri"]
                # Read the updated resource
                resource = await client.read_resource(uri=uri)
                status = json.loads(resource.contents[0].text)
                print(f"Job status: {status['state']}")
                print(f"Progress: {status['progress']}%")
                if status['state'] == 'completed':
                    break

asyncio.run(monitor_scraping_job())
Best Practices for Using MCP Resources
1. Cache Resource Listings
Resource listings can be expensive operations. Cache the results when possible:
import time

class ResourceCache:
    def __init__(self, client, ttl=300):
        self.client = client
        self.ttl = ttl
        self._cache = {}
        self._timestamps = {}

    async def get_resources(self, force_refresh=False):
        now = time.time()
        if force_refresh or now - self._timestamps.get('resources', 0) > self.ttl:
            self._cache['resources'] = await self.client.list_resources()
            self._timestamps['resources'] = now
        return self._cache['resources']
2. Handle Resource Errors Gracefully
Resources might not always be available. Implement proper error handling:
async function safeReadResource(client, uri) {
  try {
    const resource = await client.readResource({ uri });
    return resource;
  } catch (error) {
    if (error.code === 'RESOURCE_NOT_FOUND') {
      console.error(`Resource not found: ${uri}`);
      return null;
    }
    throw error;
  }
}
3. Validate MIME Types
Always check the MIME type before processing resource content:
import base64
import json

from bs4 import BeautifulSoup

def process_resource_content(content):
    mime_type = content.mime_type
    if mime_type == "application/json":
        return json.loads(content.text)
    elif mime_type == "text/html":
        return BeautifulSoup(content.text, 'html.parser')
    elif mime_type.startswith("image/"):
        return base64.b64decode(content.blob)
    else:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
Integration with Web Scraping Workflows
MCP resources integrate seamlessly with web scraping workflows. When you handle browser sessions in Puppeteer or work with other automation tools, resources provide a standardized way to access scraped data:
import json
from mcp import Client

async def automated_scraping_pipeline():
    async with Client("http://localhost:3000") as client:
        # Step 1: List available scraping templates
        resources = await client.list_resources()
        templates = [r for r in resources if r.uri.startswith('templates://')]
        # Step 2: Use a template resource
        template = await client.read_resource(uri=templates[0].uri)
        template_config = json.loads(template.contents[0].text)
        # Step 3: Execute scraping with template
        result = await client.call_tool(
            name="execute_template",
            arguments=template_config
        )
        # Step 4: Access results as resources
        result_uri = result.content[0].text
        scraped_data = await client.read_resource(uri=result_uri)
        return scraped_data
Conclusion
MCP resources provide a powerful abstraction for accessing and managing data in AI-powered web scraping workflows. By understanding resource types, URI schemes, and proper usage patterns, you can build robust scraping systems that leverage the full capabilities of the Model Context Protocol.
Whether you're working with browser automation tools, managing scraped data, or building complex extraction pipelines, MCP resources offer a standardized interface that simplifies data access and enables seamless integration between different components of your scraping infrastructure. When combined with proper authentication mechanisms, MCP resources become an essential tool in your web scraping toolkit.