How do I configure request interception for specific URLs in Puppeteer-Sharp?
Request interception in Puppeteer-Sharp is a powerful feature that allows you to monitor, modify, or block HTTP requests made by a page. This capability is essential for testing, debugging, performance optimization, and implementing custom behaviors during web scraping or automation tasks.
Understanding Request Interception
Request interception works by capturing network requests before they are sent to the server. Once intercepted, you can:
- Modify request headers, URL, or payload
- Block specific requests (images, stylesheets, ads)
- Mock responses for testing
- Log network activity
- Implement custom caching strategies
Basic Request Interception Setup
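To use request interception in Puppeteer-Sharp, enable it on the page and attach a handler to the page's Request event. Once interception is on, every request stays paused until the handler continues, aborts, or responds to it: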
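using PuppeteerSharp;
// Download a compatible browser build if one is not already cached
// (older Puppeteer-Sharp releases require a revision argument here)
await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
Headless = true
});
var page = await browser.NewPageAsync();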
// Enable request interception
await page.SetRequestInterceptionAsync(true);
// Set up request handler
page.Request += async (sender, e) =>
{
var request = e.Request;
// Your interception logic here
await request.ContinueAsync();
};
await page.GoToAsync("https://example.com");
Filtering Requests by URL
Exact URL Matching
The most straightforward approach is to match exact URLs:
using System.Net; // for HttpStatusCode
page.Request += async (sender, e) =>
{
var request = e.Request;
var url = request.Url;
if (url == "https://example.com/api/data")
{
// Intercept this specific URL
await request.RespondAsync(new ResponseData
{
Status = HttpStatusCode.OK,
ContentType = "application/json",
Body = "{\"message\": \"Mocked response\"}"
});
}
else
{
await request.ContinueAsync();
}
};
URL Pattern Matching
For more flexible filtering, use pattern matching with regular expressions or string methods:
using System.Linq;
using System.Text.RegularExpressions;
page.Request += async (sender, e) =>
{
var request = e.Request;
var url = request.Url;
// Block all image requests
if (Regex.IsMatch(url, @"\.(jpg|jpeg|png|gif|svg|webp)(\?.*)?$", RegexOptions.IgnoreCase))
{
await request.AbortAsync();
return;
}
// Intercept API endpoints
if (url.Contains("/api/") && url.Contains("users"))
{
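// Copy the headers, then set the one we need. The indexer overwrites an
// existing "Authorization" entry, whereas Concat(...).ToDictionary(...)
// throws an ArgumentException when the key is already present.
var headers = request.Headers.ToDictionary(x => x.Key, x => x.Value);
headers["Authorization"] = "Bearer your-token-here";
await request.ContinueAsync(new Payload
{
Headers = headers
});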
return;
}
// Allow all other requests
await request.ContinueAsync();
};
Domain-Based Filtering
Filter requests based on domain or subdomain:
page.Request += async (sender, e) =>
{
var request = e.Request;
var uri = new Uri(request.Url);
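// Block requests to tracking domains and their subdomains.
// Comparing the host suffix avoids false positives such as "notfacebook.com".
var blockedDomains = new[] { "google-analytics.com", "facebook.com", "doubleclick.net" };
if (blockedDomains.Any(domain =>
uri.Host == domain || uri.Host.EndsWith("." + domain)))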
{
await request.AbortAsync();
return;
}
// Intercept requests to specific subdomain
if (uri.Host == "api.example.com")
{
// Handle API requests differently
await HandleApiRequest(request);
return;
}
await request.ContinueAsync();
};
// IRequest on recent Puppeteer-Sharp releases; older releases use the concrete Request type
private async Task HandleApiRequest(IRequest request)
{
// Custom logic for API requests
var modifiedHeaders = request.Headers.ToDictionary(x => x.Key, x => x.Value);
modifiedHeaders["X-Custom-Header"] = "InterceptedRequest";
await request.ContinueAsync(new Payload
{
Headers = modifiedHeaders
});
}
Advanced Request Modification
Modifying Request Payload
using System.Net.Http; // for HttpMethod
page.Request += async (sender, e) =>
{
var request = e.Request;
if (request.Method == HttpMethod.Post && request.Url.Contains("/submit-form"))
{
// Modify POST data
var originalData = request.PostData;
var modifiedData = originalData + "&additional_field=value";
await request.ContinueAsync(new Payload
{
PostData = modifiedData
});
return;
}
await request.ContinueAsync();
};
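Appending a raw `&name=value` string, as above, assumes the body is non-empty URL-encoded form data and that the new value needs no escaping. A slightly more defensive sketch follows. `AppendFormField` is a hypothetical helper, not a Puppeteer-Sharp API, and it assumes an `application/x-www-form-urlencoded` body:

```csharp
using System;

// Hypothetical helper: appends one URL-encoded field to a form body.
// An empty or null body yields just the new field, without a leading '&'.
static string AppendFormField(string body, string name, string value)
{
    var field = Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    return string.IsNullOrEmpty(body) ? field : body + "&" + field;
}

Console.WriteLine(AppendFormField("user=alice", "note", "a&b c"));
// prints: user=alice&note=a%26b%20c
```

The result can be passed as `PostData` in the `Payload` given to `ContinueAsync`. Note that `Uri.EscapeDataString` encodes a space as `%20` rather than `+`; servers that parse form data generally accept both.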
Adding Custom Headers
page.Request += async (sender, e) =>
{
var request = e.Request;
// Add custom headers to specific URLs
if (request.Url.StartsWith("https://api.example.com"))
{
var headers = request.Headers.ToDictionary(x => x.Key, x => x.Value);
headers["Authorization"] = "Bearer your-api-token";
headers["X-Client-Version"] = "1.0.0";
headers["User-Agent"] = "CustomBot/1.0";
await request.ContinueAsync(new Payload
{
Headers = headers
});
return;
}
await request.ContinueAsync();
};
Resource Type Filtering
Each intercepted request exposes a ResourceType value (image, stylesheet, script, XHR, and so on), which is often a more reliable filter than the file extension in the URL:
page.Request += async (sender, e) =>
{
var request = e.Request;
var resourceType = request.ResourceType;
switch (resourceType)
{
case ResourceType.Image:
// Block images to improve performance
await request.AbortAsync();
break;
case ResourceType.StyleSheet:
// Allow stylesheets but log them
Console.WriteLine($"Loading CSS: {request.Url}");
await request.ContinueAsync();
break;
case ResourceType.Script:
// Intercept specific JavaScript files
if (request.Url.Contains("analytics.js"))
{
await request.AbortAsync();
}
else
{
await request.ContinueAsync();
}
break;
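case ResourceType.Xhr:
case ResourceType.Fetch:
// Handle AJAX requests. HandleAjaxRequest is a placeholder for your own
// method; it must continue, abort, or respond to the request.
await HandleAjaxRequest(request);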
break;
default:
await request.ContinueAsync();
break;
}
};
Mocking Responses for Testing
Request interception is particularly useful for testing scenarios where you need to mock API responses:
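// Requires "using System.Net;" for HttpStatusCode.
// The raw string literals (""") below need C# 11 or later.
public class RequestInterceptor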
{
private readonly Dictionary<string, ResponseData> _mockedResponses;
public RequestInterceptor()
{
_mockedResponses = new Dictionary<string, ResponseData>
{
["https://api.example.com/users"] = new ResponseData
{
Status = HttpStatusCode.OK,
ContentType = "application/json",
Body = """
{
"users": [
{"id": 1, "name": "John Doe"},
{"id": 2, "name": "Jane Smith"}
]
}
"""
},
["https://api.example.com/config"] = new ResponseData
{
Status = HttpStatusCode.OK,
ContentType = "application/json",
Body = """{"theme": "dark", "version": "1.2.3"}"""
}
};
}
public async Task InterceptRequest(object sender, RequestEventArgs e)
{
var request = e.Request;
if (_mockedResponses.TryGetValue(request.Url, out var mockResponse))
{
await request.RespondAsync(mockResponse);
}
else
{
await request.ContinueAsync();
}
}
}
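// Usage
var interceptor = new RequestInterceptor();
await page.SetRequestInterceptionAsync(true);
// The Request event expects a void-returning handler, so the Task-returning
// method is wrapped in an async lambda instead of being assigned directly.
page.Request += async (sender, e) => await interceptor.InterceptRequest(sender, e);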
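Exact-match lookups like the dictionary above miss requests that carry a query string or fragment, for example `https://api.example.com/users?page=2`. One possible approach, sketched here under the assumption that a mock should apply regardless of query parameters, is to normalize the URL before the lookup. `NormalizeForLookup` is a hypothetical helper name, not a Puppeteer-Sharp API:

```csharp
using System;

// Hypothetical helper: keeps scheme, host, and path; drops query string and fragment.
static string NormalizeForLookup(string url) =>
    Uri.TryCreate(url, UriKind.Absolute, out var uri)
        ? uri.GetLeftPart(UriPartial.Path)
        : url; // leave input that is not an absolute URL untouched

Console.WriteLine(NormalizeForLookup("https://api.example.com/users?page=2#top"));
// prints: https://api.example.com/users
```

Inside `InterceptRequest`, the lookup would then read `_mockedResponses.TryGetValue(NormalizeForLookup(request.Url), out var mockResponse)`. If some endpoints need different mocks per query string, keep exact matching for those.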
Performance Optimization Strategies
The handler runs for every request a page makes, so keep it cheap. Two techniques that help when scraping are aborting requests you do not need and pointing a slow endpoint at a faster mirror:
public class OptimizedRequestInterceptor
{
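// A plain array is enough here: the patterns are scanned with Contains,
// so a HashSet's constant-time lookup is never used.
private readonly string[] _blockedPatterns;
private readonly Dictionary<string, string> _urlRedirects;
public OptimizedRequestInterceptor()
{
_blockedPatterns = new[]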
{
".css", ".jpg", ".png", ".gif", ".svg", ".woff", ".woff2",
"google-analytics", "facebook.com", "twitter.com"
};
_urlRedirects = new Dictionary<string, string>
{
["https://slow-api.com/data"] = "https://fast-cache.com/data"
};
}
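// Returns Task, so subscribe through a lambda:
// page.Request += async (s, e) => await interceptor.HandleRequest(s, e);
public async Task HandleRequest(object sender, RequestEventArgs e)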
{
var request = e.Request;
var url = request.Url;
// Linear scan over a short pattern list (substring match, so ".css" also matches mid-path)
if (_blockedPatterns.Any(pattern => url.Contains(pattern)))
{
await request.AbortAsync();
return;
}
// URL redirection for performance
if (_urlRedirects.TryGetValue(url, out var redirectUrl))
{
await request.ContinueAsync(new Payload
{
Url = redirectUrl
});
return;
}
await request.ContinueAsync();
}
}
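Substring checks such as `url.Contains(".css")` also match URLs where the text appears mid-path, and they are evaluated pattern by pattern for each request. One alternative, offered as a sketch rather than a measured optimization, is a single regular expression that is compiled once and anchored to the end of the path. The names `blockedExtensions` and `ShouldBlock` are illustrative, and the extension list is an assumption based on the patterns above:

```csharp
using System;
using System.Text.RegularExpressions;

// Built once and reused for every request.
// Matches a blocked extension at the end of the path, optionally followed by a query string.
var blockedExtensions = new Regex(
    @"\.(css|jpe?g|png|gif|svg|woff2?)(\?.*)?$",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

bool ShouldBlock(string url) => blockedExtensions.IsMatch(url);

Console.WriteLine(ShouldBlock("https://example.com/static/app.CSS?v=3")); // True
Console.WriteLine(ShouldBlock("https://example.com/docs/css-guide"));     // False
```

In the handler, `ShouldBlock(request.Url)` would replace the `Any(...)` scan for file extensions; host-based patterns are better handled by comparing against the parsed `Uri.Host`.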
Integration with Browser Sessions
Request interception is enabled per page, so covering every page in a browser context means enabling it as each page is created. One way to do that is to listen for new targets on the context:
public class SessionManager
{
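// Recent Puppeteer-Sharp releases expose interfaces (IBrowserContext, IPage);
// on older releases use the concrete BrowserContext and Page types instead.
private readonly IBrowserContext _context;
public SessionManager(IBrowserContext context)
{
_context = context;
}
// Only subscribes to an event, so it does not need to be async
public void SetupGlobalInterception()
{
// Apply interception to every page created in the context from now on;
// pages that already exist must be set up separately
_context.TargetCreated += async (sender, e) =>
{
if (e.Target.Type == TargetType.Page)
{
var page = await e.Target.PageAsync();
if (page != null)
{
await SetupPageInterception(page);
}
}
};
}
private async Task SetupPageInterception(IPage page)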
{
await page.SetRequestInterceptionAsync(true);
page.Request += async (sender, e) =>
{
// Your global interception logic
await e.Request.ContinueAsync();
};
}
}
Monitoring Network Requests
Request interception can be combined with monitoring network requests in Puppeteer for comprehensive network analysis:
public class NetworkMonitor
{
// ConcurrentQueue (System.Collections.Concurrent) keeps the log safe
// if request events arrive on different threads.
private readonly ConcurrentQueue<RequestInfo> _requests = new();
// IPage on recent Puppeteer-Sharp releases; Page on older ones
public async Task SetupMonitoring(IPage page)
{
await page.SetRequestInterceptionAsync(true);
page.Request += async (sender, e) =>
{
var request = e.Request;
var requestInfo = new RequestInfo
{
Url = request.Url,
Method = request.Method.ToString(),
Headers = request.Headers,
Timestamp = DateTime.UtcNow,
ResourceType = request.ResourceType.ToString()
};
_requests.Enqueue(requestInfo);
// Continue with the request
await request.ContinueAsync();
};
}
public void PrintNetworkSummary()
{
var summary = _requests
.GroupBy(r => new Uri(r.Url).Host)
.Select(g => new { Host = g.Key, Count = g.Count() })
.OrderByDescending(x => x.Count);
Console.WriteLine("Network Request Summary:");
foreach (var item in summary)
{
Console.WriteLine($"{item.Host}: {item.Count} requests");
}
}
}
public class RequestInfo
{
public string Url { get; set; }
public string Method { get; set; }
public Dictionary<string, string> Headers { get; set; }
public DateTime Timestamp { get; set; }
public string ResourceType { get; set; }
}
Error Handling and Best Practices
An unhandled exception inside the handler leaves the request paused and the page waiting, so wrap the interception logic in a try/catch and fall back to continuing the request:
page.Request += async (sender, e) =>
{
try
{
var request = e.Request;
// Your interception logic here
await request.ContinueAsync();
}
catch (Exception ex)
{
Console.WriteLine($"Request interception error: {ex.Message}");
// Always ensure the request is handled to prevent hanging
try
{
await e.Request.ContinueAsync();
}
catch
{
// Request might already be handled
}
}
};
Combining with Page Navigation
When combining request interception with page navigation in Puppeteer, set up interception before navigating. Otherwise the first requests, including the one for the main document, go out before your handler is attached:
var page = await browser.NewPageAsync();
// Set up interception BEFORE navigating
await page.SetRequestInterceptionAsync(true);
page.Request += async (sender, e) =>
{
var request = e.Request;
// Filter out unnecessary resources for faster navigation
if (request.ResourceType == ResourceType.Image ||
request.ResourceType == ResourceType.StyleSheet)
{
await request.AbortAsync();
return;
}
await request.ContinueAsync();
};
// Now navigate - the interceptor is already active
await page.GoToAsync("https://example.com");
Conclusion
Request interception in Puppeteer-Sharp provides powerful capabilities for controlling network behavior during web automation and scraping tasks. Whether you're optimizing performance by blocking unnecessary resources, testing with mocked responses, or implementing custom authentication flows, proper request interception can significantly enhance your web automation projects.
Remember to always handle requests appropriately (either continue, abort, or respond) to prevent your application from hanging, and implement proper error handling to ensure robust operation in production environments.