Yes, in Puppeteer-Sharp, you can limit the resources loaded by a page by intercepting network requests and aborting those you don't want to load. This can be useful to speed up page loads and save bandwidth, especially when you're only interested in certain types of resources, such as document markup, and not in images, stylesheets, or scripts.
Here's an example of how to use Puppeteer-Sharp to intercept network requests and cancel loading of all resources except for documents (HTML):
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

class Program
{
    public static async Task Main(string[] args)
    {
        // Download the Chromium revision if it does not already exist.
        // Note: this overload is version-dependent; check the BrowserFetcher API of your release.
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

        // Launch the browser
        var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true // Change to false if you need a GUI
        });

        // Create a new page
        var page = await browser.NewPageAsync();

        // Enable request interception, then decide the fate of each request
        await page.SetRequestInterceptionAsync(true);
        page.Request += async (sender, e) =>
        {
            // Abort requests for resources that are not documents (HTML)
            if (e.Request.ResourceType != ResourceType.Document)
            {
                // Await the call so a failure is not silently dropped
                await e.Request.AbortAsync();
            }
            else
            {
                await e.Request.ContinueAsync();
            }
        };

        // Navigate to the target URL
        await page.GoToAsync("https://example.com");

        // Do something with the page content, like extracting data or taking a screenshot
        // ...

        // Close the browser
        await browser.CloseAsync();
    }
}
In this example, we use the SetRequestInterceptionAsync(true) method to enable request interception for the page. Then, we attach an event listener to the Request event, which is triggered for each network request made by the page. Inside the event handler, we check the ResourceType of the request, and if it's not a Document, we call AbortAsync() to cancel the request. If it is a document, we call ContinueAsync() to allow the request to proceed.
You can adjust the condition to allow other types of resources by checking against other ResourceType values, such as Image, StyleSheet, Script, Font, etc., depending on your scraping needs.
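One way to express such a condition is an allow-list: keep the permitted resource types in a set and abort everything else. The following is only a sketch under a few assumptions. The class name AllowListExample and the particular choice of allowed types (Document, StyleSheet, Script) are illustrative and not part of the original example, and the BrowserFetcher call mirrors the one above, so it may need adjusting for your Puppeteer-Sharp version.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical example class; the name and the allowed set are illustrative only.
class AllowListExample
{
    // Resource types that are allowed to load; every other type is aborted.
    private static readonly HashSet<ResourceType> Allowed = new HashSet<ResourceType>
    {
        ResourceType.Document,
        ResourceType.StyleSheet,
        ResourceType.Script
    };

    public static async Task Main()
    {
        // Same download call as the example above (version-dependent).
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

        var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        var page = await browser.NewPageAsync();

        await page.SetRequestInterceptionAsync(true);
        page.Request += async (sender, e) =>
        {
            // Continue only the request types in the allow-list.
            if (Allowed.Contains(e.Request.ResourceType))
            {
                await e.Request.ContinueAsync();
            }
            else
            {
                await e.Request.AbortAsync();
            }
        };

        await page.GoToAsync("https://example.com");
        await browser.CloseAsync();
    }
}
```

Keeping the set in one place means you can change what loads by editing a single list instead of growing a chain of comparisons. Keep in mind that if Script is not allowed, content rendered by JavaScript will not appear on the page.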
Remember to include error handling and dispose of resources properly in a real-world application. The example above is simplified for clarity and brevity.
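As a rough illustration of that advice, the sketch below wraps the interception handler and the navigation in try/catch blocks and uses await using for cleanup. It assumes C# 8 or later and a Puppeteer-Sharp release in which the browser and page objects implement IAsyncDisposable. The class name SafeInterceptionExample is hypothetical, and logging to the console stands in for whatever error reporting your application uses.

```csharp
using System;
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical example class; assumes C# 8+ and IAsyncDisposable support in your release.
class SafeInterceptionExample
{
    public static async Task Main()
    {
        // Same download call as the example above (version-dependent).
        await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);

        // "await using" disposes the page and the browser even if an exception is thrown.
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
        await using var page = await browser.NewPageAsync();

        await page.SetRequestInterceptionAsync(true);
        page.Request += async (sender, e) =>
        {
            try
            {
                if (e.Request.ResourceType == ResourceType.Document)
                {
                    await e.Request.ContinueAsync();
                }
                else
                {
                    await e.Request.AbortAsync();
                }
            }
            catch (PuppeteerException ex)
            {
                // Log here so the failure does not escape the async event handler.
                Console.Error.WriteLine($"Interception failed for {e.Request.Url}: {ex.Message}");
            }
        };

        try
        {
            await page.GoToAsync("https://example.com");
            Console.WriteLine(await page.GetContentAsync());
        }
        catch (NavigationException ex)
        {
            // Raised when the navigation fails or times out.
            Console.Error.WriteLine($"Navigation failed: {ex.Message}");
        }
    }
}
```

The try/catch inside the handler matters because the handler is an async lambda attached to an event, so nothing awaits it and an exception that leaves it cannot be caught by the code around GoToAsync.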