How to Intercept and Modify HTTP Requests in Puppeteer-Sharp
Request interception in Puppeteer-Sharp is a powerful feature that allows you to monitor, modify, block, or redirect HTTP requests before they reach the server. This capability is essential for web scraping, testing, and performance optimization scenarios where you need fine-grained control over network traffic.
Understanding Request Interception
When interception is enabled on a page, Puppeteer-Sharp pauses every network request from that page before it is sent. Your handler can then examine each request and decide whether to continue it unchanged, continue it with modified parameters, abort it, or answer it with a custom response. A paused request stays pending until one of these actions is taken.
Basic Request Interception Setup
To start intercepting requests, first enable request interception on a page, then attach an event handler that processes each outgoing request.
Enabling Request Interception
using System;
using PuppeteerSharp;
// Download a compatible browser if none is installed locally
await new BrowserFetcher().DownloadAsync();
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});
var page = await browser.NewPageAsync();
// Enable request interception
await page.SetRequestInterceptionAsync(true);
// Set up request handler
page.Request += async (sender, e) =>
{
    var request = e.Request;
    // Log all requests
    Console.WriteLine($"Request: {request.Method} {request.Url}");
    // Continue with original request
    await request.ContinueAsync();
};
await page.GoToAsync("https://example.com");
// Close the browser when finished
await browser.CloseAsync();
Modifying Request Headers
One of the most common use cases is modifying request headers, such as changing the User-Agent or adding custom headers for authentication:
page.Request += async (sender, e) =>
{
    var request = e.Request;
    // Copy the original headers. The case-insensitive comparer makes
    // "User-Agent" replace an existing "user-agent" entry instead of duplicating it.
    var headers = new Dictionary<string, string>(request.Headers, StringComparer.OrdinalIgnoreCase)
    {
        ["User-Agent"] = "Custom Bot 1.0",
        ["Authorization"] = "Bearer your-token-here",
        ["X-Custom-Header"] = "custom-value"
    };
    // Continue with modified headers
    await request.ContinueAsync(new Payload
    {
        Headers = headers
    });
};
Blocking Specific Requests
You can block requests to improve performance by preventing unnecessary resources from loading:
page.Request += async (sender, e) =>
{
    var request = e.Request;
    var url = request.Url;
    // Block images, stylesheets, and fonts
    if (request.ResourceType == ResourceType.Image ||
        request.ResourceType == ResourceType.StyleSheet ||
        request.ResourceType == ResourceType.Font)
    {
        await request.AbortAsync();
        return;
    }
    // Block specific domains
    if (url.Contains("ads.google.com") || url.Contains("analytics.google.com"))
    {
        await request.AbortAsync();
        return;
    }
    // Continue with allowed requests
    await request.ContinueAsync();
};
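The substring checks above also match any URL that merely contains the domain text, for example in a query string. A stricter check compares the parsed host instead. The following is a minimal sketch using only `System.Uri`; the `HostMatcher` class and its `IsHostOrSubdomain` method are hypothetical names introduced for illustration and are not part of Puppeteer-Sharp.

```csharp
using System;

// Hypothetical helper, not part of Puppeteer-Sharp: compares the parsed host
// of a URL against a domain instead of searching the whole URL string.
public static class HostMatcher
{
    public static bool IsHostOrSubdomain(string url, string domain)
    {
        // Relative or malformed URLs, and URIs without a host (such as data: URIs), never match.
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) || string.IsNullOrEmpty(uri.Host))
        {
            return false;
        }

        // Match the exact host, or any subdomain ending in "." + domain.
        return uri.Host.Equals(domain, StringComparison.OrdinalIgnoreCase)
            || uri.Host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside a handler, `HostMatcher.IsHostOrSubdomain(request.Url, "ads.google.com")` could stand in for `url.Contains("ads.google.com")`. A URL such as `https://example.com/?ref=ads.google.com` would then no longer be treated as a blocked domain.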
Modifying Request URLs
You can redirect requests to different URLs, which is useful for testing or working with different environments:
page.Request += async (sender, e) =>
{
    var request = e.Request;
    var originalUrl = request.Url;
    // Redirect API calls to staging environment
    if (originalUrl.Contains("api.production.com"))
    {
        var newUrl = originalUrl.Replace("api.production.com", "api.staging.com");
        await request.ContinueAsync(new Payload
        {
            Url = newUrl
        });
        return;
    }
    // Continue with original URL
    await request.ContinueAsync();
};
Modifying POST Data
For POST requests, you can intercept and modify the request body:
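page.Request += async (sender, e) =>
{
    var request = e.Request;
    if (request.Method == HttpMethod.Post && request.Url.Contains("/api/submit"))
    {
        // Read the existing POST data (assumed to be form-urlencoded; may be null)
        var originalData = request.PostData?.ToString() ?? string.Empty;
        // Append a field, adding "&" only when there is existing data
        var separator = originalData.Length > 0 ? "&" : string.Empty;
        var modifiedData = originalData + separator + "additional_field=custom_value";
        await request.ContinueAsync(new Payload
        {
            PostData = modifiedData,
            Headers = new Dictionary<string, string>(request.Headers, StringComparer.OrdinalIgnoreCase)
            {
                // Content-Length counts bytes, not characters
                ["Content-Length"] = System.Text.Encoding.UTF8.GetByteCount(modifiedData).ToString()
            }
        });
        return;
    }
    await request.ContinueAsync();
};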
Providing Mock Responses
Instead of making actual HTTP requests, you can provide custom responses directly:
// Requires: using System.Net; (for HttpStatusCode)
page.Request += async (sender, e) =>
{
    var request = e.Request;
    // Mock API responses
    if (request.Url.Contains("/api/user"))
    {
        var mockResponse = new
        {
            id = 123,
            name = "Test User",
            email = "test@example.com"
        };
        // Answer the request locally; it is never sent to the server
        await request.RespondAsync(new ResponseData
        {
            Status = HttpStatusCode.OK,
            ContentType = "application/json",
            Body = System.Text.Json.JsonSerializer.Serialize(mockResponse)
        });
        return;
    }
    await request.ContinueAsync();
};
Advanced Request Filtering
For complex scenarios, you can move the logic into a dedicated class that combines blocking, URL rewriting, and header injection:
// Requires: using System.Linq; (for Any)
public class RequestInterceptor
{
    private readonly HashSet<string> _blockedDomains;
    private readonly Dictionary<string, string> _urlReplacements;
    public RequestInterceptor()
    {
        _blockedDomains = new HashSet<string>
        {
            "ads.google.com",
            "facebook.com/tr",
            "analytics.google.com"
        };
        _urlReplacements = new Dictionary<string, string>
        {
            { "cdn.production.com", "cdn.staging.com" },
            { "api.v1.com", "api.v2.com" }
        };
    }
    public async Task HandleRequestAsync(object sender, RequestEventArgs e)
    {
        var request = e.Request;
        var url = request.Url;
        // Check if request should be blocked
        if (ShouldBlockRequest(url))
        {
            await request.AbortAsync();
            return;
        }
        // Apply URL replacements
        var modifiedUrl = ApplyUrlReplacements(url);
        // Add custom headers
        var headers = AddCustomHeaders(request.Headers);
        await request.ContinueAsync(new Payload
        {
            // Leave Url unset when nothing was replaced
            Url = modifiedUrl != url ? modifiedUrl : null,
            Headers = headers
        });
    }
    private bool ShouldBlockRequest(string url)
    {
        return _blockedDomains.Any(domain => url.Contains(domain));
    }
    private string ApplyUrlReplacements(string url)
    {
        // Only the first matching replacement is applied
        foreach (var replacement in _urlReplacements)
        {
            if (url.Contains(replacement.Key))
            {
                return url.Replace(replacement.Key, replacement.Value);
            }
        }
        return url;
    }
    private Dictionary<string, string> AddCustomHeaders(Dictionary<string, string> originalHeaders)
    {
        // Case-insensitive keys so "Accept-Language" replaces an existing "accept-language"
        return new Dictionary<string, string>(originalHeaders, StringComparer.OrdinalIgnoreCase)
        {
            ["X-Scraper-Version"] = "1.0",
            ["Accept-Language"] = "en-US,en;q=0.9"
        };
    }
}
// Usage
var interceptor = new RequestInterceptor();
await page.SetRequestInterceptionAsync(true);
// The Request event expects a void-returning handler, so the Task-returning
// method is wrapped in an async lambda rather than attached directly
page.Request += async (sender, e) => await interceptor.HandleRequestAsync(sender, e);
Monitoring Network Activity
Similar to monitoring network requests in Puppeteer, you can track and log all network activity:
// Requires: using System.Linq; (for GroupBy and Count)
public class NetworkMonitor
{
    private readonly List<RequestInfo> _requests = new List<RequestInfo>();
    // List<T> is not thread-safe and handlers may fire on different threads,
    // so access to the list is guarded by a lock
    private readonly object _sync = new object();
    public async Task SetupMonitoring(IPage page)
    {
        await page.SetRequestInterceptionAsync(true);
        page.Request += async (sender, e) =>
        {
            var request = e.Request;
            var info = new RequestInfo
            {
                Url = request.Url,
                Method = request.Method.ToString(),
                Headers = request.Headers,
                ResourceType = request.ResourceType.ToString(),
                Timestamp = DateTime.UtcNow
            };
            lock (_sync)
            {
                _requests.Add(info);
            }
            await request.ContinueAsync();
        };
        page.Response += (sender, e) =>
        {
            var response = e.Response;
            Console.WriteLine($"Response: {response.Status} {response.Url}");
        };
    }
    public void PrintNetworkSummary()
    {
        // Work on a snapshot so requests can keep arriving while printing
        List<RequestInfo> snapshot;
        lock (_sync)
        {
            snapshot = new List<RequestInfo>(_requests);
        }
        Console.WriteLine($"Total Requests: {snapshot.Count}");
        var groupedByType = snapshot.GroupBy(r => r.ResourceType);
        foreach (var group in groupedByType)
        {
            Console.WriteLine($"{group.Key}: {group.Count()}");
        }
    }
}
public class RequestInfo
{
    public string Url { get; set; }
    public string Method { get; set; }
    public Dictionary<string, string> Headers { get; set; }
    public string ResourceType { get; set; }
    public DateTime Timestamp { get; set; }
}
Error Handling and Best Practices
An exception thrown inside a handler leaves the request paused, so catch errors and still resolve the request:
page.Request += async (sender, e) =>
{
    var request = e.Request;
    try
    {
        // Your interception logic here
        await ProcessRequest(request);
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing request {request.Url}: {ex.Message}");
        // Always continue or abort the request to prevent hanging
        try
        {
            await request.ContinueAsync();
        }
        catch
        {
            // Request might have already been handled
        }
    }
};
// Declared as a method of the enclosing class
private async Task ProcessRequest(IRequest request)
{
    // Complex processing logic
    if (request.ResourceType == ResourceType.Document)
    {
        // Handle main document requests differently
        await request.ContinueAsync();
    }
    else if (request.ResourceType == ResourceType.Xhr)
    {
        // Handle AJAX requests (ModifyAjaxRequest is a helper you define yourself)
        await ModifyAjaxRequest(request);
    }
    else
    {
        await request.ContinueAsync();
    }
}
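`ModifyAjaxRequest` is called above but never defined. One possible shape is sketched below, assuming the goal is only to tag AJAX requests with an extra header before continuing them. The `AjaxRequestHelpers` class name and the header value are placeholders chosen for illustration, not something the original code specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using PuppeteerSharp;

public static class AjaxRequestHelpers
{
    // Hypothetical helper: adds an illustrative header to an XHR request
    // and then lets it continue. Header name and value are placeholders.
    public static async Task ModifyAjaxRequest(IRequest request)
    {
        // Case-insensitive copy so the new key replaces any existing
        // lowercase variant instead of duplicating it.
        var headers = new Dictionary<string, string>(request.Headers, StringComparer.OrdinalIgnoreCase)
        {
            ["X-Requested-With"] = "XMLHttpRequest"
        };

        // Every code path must resolve the request; here it is continued.
        await request.ContinueAsync(new Payload { Headers = headers });
    }
}
```

With this layout the call becomes `await AjaxRequestHelpers.ModifyAjaxRequest(request);`. Alternatively, the method can be moved into the same class as `ProcessRequest` so the original call compiles unchanged.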
Performance Considerations
Request interception adds overhead to page loading. To minimize impact:
- Be Selective: Only intercept when necessary
- Fast Processing: Keep request handlers lightweight
- Avoid Blocking: Don't perform long-running operations in handlers
- Use Patterns: Implement efficient URL matching
// Efficient URL pattern matching
private static readonly Regex BlockedUrlPattern = new Regex(
    @"(ads\.google\.com|facebook\.com/tr|analytics\.google\.com)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase
);
page.Request += async (sender, e) =>
{
    var request = e.Request;
    if (BlockedUrlPattern.IsMatch(request.Url))
    {
        await request.AbortAsync();
        return;
    }
    await request.ContinueAsync();
};
Integration with Authentication Workflows
Request interception is particularly useful for handling authentication in web scraping scenarios:
public class AuthenticationInterceptor
{
    private readonly string _authToken;
    public AuthenticationInterceptor(string authToken)
    {
        _authToken = authToken;
    }
    public async Task HandleRequest(object sender, RequestEventArgs e)
    {
        var request = e.Request;
        // Add authentication to API requests
        if (request.Url.Contains("/api/"))
        {
            // Case-insensitive keys so an existing "authorization" header is replaced
            var headers = new Dictionary<string, string>(request.Headers, StringComparer.OrdinalIgnoreCase)
            {
                ["Authorization"] = $"Bearer {_authToken}"
            };
            await request.ContinueAsync(new Payload { Headers = headers });
            return;
        }
        await request.ContinueAsync();
    }
}
// Usage: wrap the Task-returning method in an async lambda
// page.Request += async (sender, e) => await authInterceptor.HandleRequest(sender, e);
Combining with Page Navigation
When working with multi-page applications, you can combine request interception with page navigation techniques to create comprehensive scraping workflows:
public class MultiPageScraper
{
    private readonly IPage _page;
    private readonly RequestInterceptor _interceptor;
    public MultiPageScraper(IPage page)
    {
        _page = page;
        _interceptor = new RequestInterceptor();
    }
    public async Task SetupAndNavigate(string url)
    {
        await _page.SetRequestInterceptionAsync(true);
        // The event expects a void-returning handler, so wrap the Task-returning method
        _page.Request += async (sender, e) => await _interceptor.HandleRequestAsync(sender, e);
        // Navigate with interception active
        await _page.GoToAsync(url);
        // Wait for dynamic content to load
        await _page.WaitForSelectorAsync(".content");
    }
}
Conclusion
Request interception in Puppeteer-Sharp lets you control network traffic during web automation and scraping tasks. Whether you need to modify headers, block resources, redirect URLs, or provide mock responses, the request interception API gives you fine-grained control over each HTTP request a page makes.
Remember to handle errors appropriately, keep handlers fast, and ensure that every intercepted request is continued, aborted, or answered with a response so that page loads do not hang. Used this way, request interception can speed up scraping runs and make tests more deterministic.