
How do I configure request interception for specific URLs in Puppeteer-Sharp?

Request interception in Puppeteer-Sharp is a powerful feature that allows you to monitor, modify, or block HTTP requests made by a page. This capability is essential for testing, debugging, performance optimization, and implementing custom behaviors during web scraping or automation tasks.

Understanding Request Interception

Request interception works by capturing network requests before they are sent to the server. Once intercepted, you can:

  • Modify request headers, URL, or payload
  • Block specific requests (images, stylesheets, ads)
  • Mock responses for testing
  • Log network activity
  • Implement custom caching strategies

Basic Request Interception Setup

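To use request interception in Puppeteer-Sharp, enable it on the page and then attach a handler to the Request event. Once interception is enabled, every request is paused until your handler calls ContinueAsync, AbortAsync, or RespondAsync on it, so each request must be resolved exactly once: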

using PuppeteerSharp;

// Make sure a compatible browser build is available locally
await new BrowserFetcher().DownloadAsync();

await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});

var page = await browser.NewPageAsync();

// Enable request interception
await page.SetRequestInterceptionAsync(true);

// Set up request handler; each request waits here until it is resolved
page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Your interception logic here
    await request.ContinueAsync();
};

await page.GoToAsync("https://example.com");

Filtering Requests by URL

Exact URL Matching

The most straightforward approach is to match exact URLs:

using System.Net; // for HttpStatusCode

page.Request += async (sender, e) =>
{
    var request = e.Request;
    var url = request.Url;

    if (url == "https://example.com/api/data")
    {
        // Intercept this specific URL
        await request.RespondAsync(new ResponseData
        {
            Status = HttpStatusCode.OK,
            ContentType = "application/json",
            Body = "{\"message\": \"Mocked response\"}"
        });
    }
    else
    {
        await request.ContinueAsync();
    }
};
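Exact matching compares the whole URL string, including the query string, so a request for https://example.com/api/data?page=2 is not caught by the handler above. One option is to compare only the scheme, host, port, and path. The sketch below uses a hypothetical helper, UrlMatching.SamePath, which is not part of Puppeteer-Sharp and relies only on System.Uri:

```csharp
using System;

// Hypothetical helper, not part of Puppeteer-Sharp: compares two absolute URLs
// while ignoring the query string and the fragment.
public static class UrlMatching
{
    public static bool SamePath(string requestUrl, string targetUrl)
    {
        if (!Uri.TryCreate(requestUrl, UriKind.Absolute, out var a) ||
            !Uri.TryCreate(targetUrl, UriKind.Absolute, out var b))
        {
            return false;
        }

        // GetLeftPart(UriPartial.Path) keeps scheme, host, port and path only;
        // Uri also normalizes the host to lowercase.
        return string.Equals(
            a.GetLeftPart(UriPartial.Path),
            b.GetLeftPart(UriPartial.Path),
            StringComparison.Ordinal);
    }
}
```

Inside the handler, the condition would then read if (UrlMatching.SamePath(request.Url, "https://example.com/api/data")).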

URL Pattern Matching

For more flexible filtering, use pattern matching with regular expressions or string methods:

using System.Text.RegularExpressions;

page.Request += async (sender, e) =>
{
    var request = e.Request;
    var url = request.Url;

    // Block all image requests
    if (Regex.IsMatch(url, @"\.(jpg|jpeg|png|gif|svg|webp)(\?.*)?$", RegexOptions.IgnoreCase))
    {
        await request.AbortAsync();
        return;
    }

    // Intercept API endpoints
    if (url.Contains("/api/") && url.Contains("users"))
    {
        // Copy the original headers, then add or overwrite Authorization.
        // (Concat + ToDictionary would throw if the header already existed.)
        var headers = new Dictionary<string, string>(request.Headers)
        {
            ["Authorization"] = "Bearer your-token-here"
        };

        await request.ContinueAsync(new Payload { Headers = headers });
        return;
    }

    // Allow all other requests
    await request.ContinueAsync();
};
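If you prefer wildcard patterns such as https://example.com/api/*/users to hand-written regular expressions, a small converter can turn them into a Regex. This is a sketch with a hypothetical GlobPattern helper (not a Puppeteer-Sharp API) in which * matches any run of characters, including slashes:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Puppeteer-Sharp: converts a glob such as
// "https://example.com/api/*" into an anchored, case-insensitive Regex.
public static class GlobPattern
{
    public static Regex ToRegex(string glob)
    {
        // Escape every special character, then turn the escaped "*" into ".*"
        var pattern = "^" + Regex.Escape(glob).Replace(@"\*", ".*") + "$";
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }
}
```

Build the Regex once, outside the Request handler, and call IsMatch(request.Url) per request. Unlike the image regex above, a pattern like *.png does not tolerate a trailing query string unless you add one to the glob.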

Domain-Based Filtering

Filter requests based on domain or subdomain:

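// Tracking domains to block (matched against the host and its subdomains)
var blockedDomains = new[] { "google-analytics.com", "facebook.com", "doubleclick.net" };

page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Let through anything that is not an absolute URL instead of throwing
    if (!Uri.TryCreate(request.Url, UriKind.Absolute, out var uri))
    {
        await request.ContinueAsync();
        return;
    }

    // Block the listed domains and their subdomains. A plain Host.Contains
    // check would also block unrelated hosts such as "notfacebook.com".
    if (blockedDomains.Any(domain =>
            uri.Host == domain || uri.Host.EndsWith("." + domain)))
    {
        await request.AbortAsync();
        return;
    }

    // Intercept requests to a specific subdomain
    if (uri.Host == "api.example.com")
    {
        // Handle API requests differently
        await HandleApiRequest(request);
        return;
    }

    await request.ContinueAsync();
};

// Helper used above: a local function in a top-level program
// (inside a class it can be a private method instead)
async Task HandleApiRequest(IRequest request)
{
    // Copy the headers and add a custom one
    var modifiedHeaders = new Dictionary<string, string>(request.Headers);
    modifiedHeaders["X-Custom-Header"] = "InterceptedRequest";

    await request.ContinueAsync(new Payload
    {
        Headers = modifiedHeaders
    });
}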
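Checking each request against every blocklist entry with Any takes time proportional to the length of the list. For large blocklists, a common alternative is to store the domains in a HashSet and look up the host and each of its parent domains, so stats.g.doubleclick.net is checked as stats.g.doubleclick.net, g.doubleclick.net, doubleclick.net, and net. A minimal sketch; DomainBlocklist is a hypothetical class, not part of Puppeteer-Sharp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Puppeteer-Sharp: matches a host against a
// set of blocked domains, including any of their subdomains.
public sealed class DomainBlocklist
{
    private readonly HashSet<string> _domains;

    public DomainBlocklist(IEnumerable<string> domains) =>
        _domains = new HashSet<string>(domains, StringComparer.OrdinalIgnoreCase);

    public bool IsBlocked(string host)
    {
        // Try the full host, then strip one leading label at a time
        var candidate = host;
        while (true)
        {
            if (_domains.Contains(candidate))
            {
                return true;
            }

            var dot = candidate.IndexOf('.');
            if (dot < 0)
            {
                return false;
            }

            candidate = candidate.Substring(dot + 1);
        }
    }
}
```

In the handler, the blocking condition becomes blocklist.IsBlocked(uri.Host), with the blocklist created once outside the handler.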

Advanced Request Modification

Modifying Request Payload

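using System.Net.Http; // for HttpMethod

page.Request += async (sender, e) =>
{
    var request = e.Request;

    if (request.Method == HttpMethod.Post && request.Url.Contains("/submit-form"))
    {
        // Assumes an application/x-www-form-urlencoded body ("a=1&b=2"),
        // so an extra field can be appended with "&". PostData may be null.
        var originalData = request.PostData?.ToString() ?? string.Empty;
        var modifiedData = originalData.Length == 0
            ? "additional_field=value"
            : originalData + "&additional_field=value";

        await request.ContinueAsync(new Payload
        {
            PostData = modifiedData
        });
        return;
    }

    await request.ContinueAsync();
};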
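Appending a raw string to the POST body is only safe if the body is application/x-www-form-urlencoded and the new name and value contain no reserved characters such as &, =, or spaces. A slightly safer sketch percent-encodes the field first; FormBody.AppendField is a hypothetical helper, not a Puppeteer-Sharp API:

```csharp
using System;

// Hypothetical helper, not part of Puppeteer-Sharp: appends one field to an
// application/x-www-form-urlencoded request body (body may be null or empty).
public static class FormBody
{
    public static string AppendField(string body, string name, string value)
    {
        // Percent-encode so characters like '&', '=' or ' ' cannot break the body
        var field = Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
        return string.IsNullOrEmpty(body) ? field : body + "&" + field;
    }
}
```

In the handler this would be used as PostData = FormBody.AppendField(request.PostData?.ToString(), "additional_field", "value"). Spaces are encoded as %20 rather than +, which standard form decoders accept.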

Adding Custom Headers

page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Add custom headers to specific URLs. The trailing slash stops
    // look-alike hosts such as api.example.com.evil.net from matching.
    if (request.Url.StartsWith("https://api.example.com/"))
    {
        var headers = request.Headers.ToDictionary(x => x.Key, x => x.Value);
        headers["Authorization"] = "Bearer your-api-token";
        headers["X-Client-Version"] = "1.0.0";
        headers["User-Agent"] = "CustomBot/1.0";

        await request.ContinueAsync(new Payload
        {
            Headers = headers
        });
        return;
    }

    await request.ContinueAsync();
};
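HTTP header names are case-insensitive, but ToDictionary (and the Dictionary copy constructor) uses the default case-sensitive comparer, so setting "Authorization" when the request already carries "authorization" would leave two entries. A hedged sketch of a merge that overrides headers regardless of case; HeaderMerge.With is a hypothetical helper, not part of Puppeteer-Sharp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Puppeteer-Sharp: returns a copy of the
// original headers with the overrides applied, matching names case-insensitively.
public static class HeaderMerge
{
    public static Dictionary<string, string> With(
        IReadOnlyDictionary<string, string> original,
        IReadOnlyDictionary<string, string> overrides)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (var (name, value) in original)
        {
            result[name] = value;
        }

        foreach (var (name, value) in overrides)
        {
            result.Remove(name);   // drop any existing spelling of the name
            result[name] = value;  // store the override's spelling and value
        }

        return result;
    }
}
```

The result is still a Dictionary<string, string>, so it can be assigned directly to Payload.Headers.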

Resource Type Filtering

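Each request also exposes its resource type (document, image, stylesheet, script, XHR, fetch, and so on) through the ResourceType property. This is often a more reliable filter than the URL, because assets are not always served with a telling file extension: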

page.Request += async (sender, e) =>
{
    var request = e.Request;
    var resourceType = request.ResourceType;

    switch (resourceType)
    {
        case ResourceType.Image:
            // Block images to improve performance
            await request.AbortAsync();
            break;

        case ResourceType.StyleSheet:
            // Allow stylesheets but log them
            Console.WriteLine($"Loading CSS: {request.Url}");
            await request.ContinueAsync();
            break;

        case ResourceType.Script:
            // Intercept specific JavaScript files
            if (request.Url.Contains("analytics.js"))
            {
                await request.AbortAsync();
            }
            else
            {
                await request.ContinueAsync();
            }
            break;

        case ResourceType.Xhr:
        case ResourceType.Fetch:
            // Handle AJAX requests (here they are logged and allowed through)
            Console.WriteLine($"AJAX {request.Method} {request.Url}");
            await request.ContinueAsync();
            break;

        default:
            await request.ContinueAsync();
            break;
    }
};
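When the rules grow, a lookup table can be easier to maintain than a switch statement. The sketch below keys decisions on the resource type name and compares names case-insensitively, so the value of request.ResourceType.ToString() can be passed in directly. ResourcePolicy and RequestDecision are hypothetical types, not part of Puppeteer-Sharp, and unlisted types default to Continue:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types, not part of Puppeteer-Sharp.
public enum RequestDecision
{
    Continue,
    Abort,
}

public sealed class ResourcePolicy
{
    // Case-insensitive, so "StyleSheet" and "stylesheet" refer to the same rule
    private readonly Dictionary<string, RequestDecision> _rules =
        new(StringComparer.OrdinalIgnoreCase);

    public ResourcePolicy Set(string resourceType, RequestDecision decision)
    {
        _rules[resourceType] = decision;
        return this; // allow chaining
    }

    // Resource types without a rule are allowed through
    public RequestDecision Decide(string resourceType) =>
        _rules.TryGetValue(resourceType, out var decision) ? decision : RequestDecision.Continue;
}
```

In the handler, call policy.Decide(request.ResourceType.ToString()) and then AbortAsync or ContinueAsync depending on the result.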

Mocking Responses for Testing

Request interception is particularly useful for testing scenarios where you need to mock API responses:

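using System.Net; // for HttpStatusCode
using PuppeteerSharp;

// The """ raw string literals below require C# 11 or later
public class RequestInterceptor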
{
    private readonly Dictionary<string, ResponseData> _mockedResponses;

    public RequestInterceptor()
    {
        _mockedResponses = new Dictionary<string, ResponseData>
        {
            ["https://api.example.com/users"] = new ResponseData
            {
                Status = HttpStatusCode.OK,
                ContentType = "application/json",
                Body = """
                {
                    "users": [
                        {"id": 1, "name": "John Doe"},
                        {"id": 2, "name": "Jane Smith"}
                    ]
                }
                """
            },
            ["https://api.example.com/config"] = new ResponseData
            {
                Status = HttpStatusCode.OK,
                ContentType = "application/json",
                Body = """{"theme": "dark", "version": "1.2.3"}"""
            }
        };
    }

    public async Task InterceptRequest(object sender, RequestEventArgs e)
    {
        var request = e.Request;

        if (_mockedResponses.TryGetValue(request.Url, out var mockResponse))
        {
            await request.RespondAsync(mockResponse);
        }
        else
        {
            await request.ContinueAsync();
        }
    }
}

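// Usage: the Request event expects a void-returning handler, so wrap the
// Task-returning method in an async lambda instead of subscribing it directly
var interceptor = new RequestInterceptor();
page.Request += async (sender, e) => await interceptor.InterceptRequest(sender, e);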
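Hand-written JSON strings are easy to break with a missing quote or comma. An alternative is to build mock bodies with System.Text.Json, which ships with .NET, and assign the result to ResponseData.Body. MockBodies.Users is a hypothetical helper that produces the same users payload as the mock above, in compact form:

```csharp
using System.Text.Json;

// Hypothetical helper, not part of Puppeteer-Sharp: serializes mock payloads
// so the JSON is always well-formed.
public static class MockBodies
{
    public static string Users() =>
        JsonSerializer.Serialize(new
        {
            users = new[]
            {
                new { id = 1, name = "John Doe" },
                new { id = 2, name = "Jane Smith" },
            },
        });
}
```

The mocked entry would then use Body = MockBodies.Users() instead of the raw string literal.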

Performance Optimization Strategies

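Because every intercepted request waits until your handler resolves it, slow handler logic delays the whole page load, and the cost adds up quickly on pages that issue hundreds of requests. Keep the per-request checks cheap and build any lookup tables once, outside the handler: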

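public class OptimizedRequestInterceptor
{
    // File extensions to block, matched against the URL path only
    private static readonly string[] BlockedExtensions =
        { ".css", ".jpg", ".png", ".gif", ".svg", ".woff", ".woff2" };

    // Domains to block, matched against the host and its subdomains
    private static readonly string[] BlockedDomains =
        { "google-analytics.com", "facebook.com", "twitter.com" };

    // Exact URLs to rewrite. This is not an HTTP redirect: the request is
    // silently forwarded to the new URL and the page still sees the original.
    private readonly Dictionary<string, string> _urlRewrites = new()
    {
        ["https://slow-api.com/data"] = "https://fast-cache.com/data"
    };

    public async Task HandleRequest(object sender, RequestEventArgs e)
    {
        var request = e.Request;
        var url = request.Url;

        if (Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            var path = uri.AbsolutePath;
            var host = uri.Host;

            // Quick blocking check on the parsed path and host, so that
            // e.g. "example.css.com" is not caught by the ".css" rule
            if (BlockedExtensions.Any(ext => path.EndsWith(ext, StringComparison.OrdinalIgnoreCase)) ||
                BlockedDomains.Any(d => host == d || host.EndsWith("." + d)))
            {
                await request.AbortAsync();
                return;
            }
        }

        // URL rewrite for performance
        if (_urlRewrites.TryGetValue(url, out var rewrittenUrl))
        {
            await request.ContinueAsync(new Payload
            {
                Url = rewrittenUrl
            });
            return;
        }

        await request.ContinueAsync();
    }
}

// Usage:
// var optimized = new OptimizedRequestInterceptor();
// page.Request += async (sender, e) => await optimized.HandleRequest(sender, e);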
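If your blocklist is mostly file extensions, note that a HashSet only pays off when you look a key up directly; scanning it with Any costs the same as scanning an array. A sketch that extracts the extension from the URL path once and does a single set lookup; AssetFilter is a hypothetical helper, not a Puppeteer-Sharp API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Puppeteer-Sharp: one hash lookup per URL.
public static class AssetFilter
{
    private static readonly HashSet<string> BlockedExtensions =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ".css", ".jpg", ".png", ".gif", ".svg", ".woff", ".woff2",
        };

    public static bool IsBlockedAsset(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            return false;
        }

        // AbsolutePath excludes the query string, so "logo.png?v=3" still matches
        var extension = Path.GetExtension(uri.AbsolutePath);
        return BlockedExtensions.Contains(extension);
    }
}
```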

Integration with Browser Sessions

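When a scraping session opens several pages in the same browser context, you usually want the same interception rules on all of them. The class below applies them to the pages of a context. Keep in mind that a newly opened page may already be loading by the time its handler is attached, so its very first requests can escape interception: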

public class SessionManager
{
    private readonly IBrowserContext _context;

    public SessionManager(IBrowserContext context)
    {
        _context = context;
    }

    public async Task SetupGlobalInterception()
    {
        // Pages that already exist in the context
        foreach (var existingPage in await _context.PagesAsync())
        {
            await SetupPageInterception(existingPage);
        }

        // Pages opened later in the context
        _context.TargetCreated += async (sender, e) =>
        {
            if (e.Target.Type == TargetType.Page)
            {
                var page = await e.Target.PageAsync();
                if (page != null)
                {
                    await SetupPageInterception(page);
                }
            }
        };
    }

    private static async Task SetupPageInterception(IPage page)
    {
        await page.SetRequestInterceptionAsync(true);

        page.Request += async (sender, e) =>
        {
            // Your global interception logic
            await e.Request.ContinueAsync();
        };
    }
}

Monitoring Network Requests

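The same Request event can also record network activity for later analysis. Interception is not required just to observe traffic: the Request event fires whether or not interception is enabled, and leaving it off avoids pausing every request. The monitor below enables it so that the same handler can also block or modify requests: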

using System.Collections.Concurrent;
using PuppeteerSharp;

public class NetworkMonitor
{
    // Thread-safe, since handlers can fire while the summary is being read
    private readonly ConcurrentQueue<RequestInfo> _requests = new();

    public async Task SetupMonitoring(IPage page)
    {
        // Enabled so this handler can also block or mock requests
        await page.SetRequestInterceptionAsync(true);

        page.Request += async (sender, e) =>
        {
            var request = e.Request;

            var requestInfo = new RequestInfo
            {
                Url = request.Url,
                Method = request.Method.ToString(),
                Headers = request.Headers,
                Timestamp = DateTime.UtcNow,
                ResourceType = request.ResourceType.ToString()
            };

            _requests.Enqueue(requestInfo);

            // Continue with the request
            await request.ContinueAsync();
        };
    }

    public void PrintNetworkSummary()
    {
        var summary = _requests
            .GroupBy(r => new Uri(r.Url).Host)
            .Select(g => new { Host = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count);

        Console.WriteLine("Network Request Summary:");
        foreach (var item in summary)
        {
            Console.WriteLine($"{item.Host}: {item.Count} requests");
        }
    }
}

public class RequestInfo
{
    public string Url { get; set; }
    public string Method { get; set; }
    public Dictionary<string, string> Headers { get; set; }
    public DateTime Timestamp { get; set; }
    public string ResourceType { get; set; }
}

Error Handling and Best Practices

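Handlers attached with an async lambda run as async void, so an unhandled exception cannot be awaited by the caller and will typically crash the process, while a request that is never continued, aborted, or answered leaves the page waiting. Wrap the handler body in a try/catch and make sure every request is still resolved: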

page.Request += async (sender, e) =>
{
    try
    {
        var request = e.Request;

        // Your interception logic here

        await request.ContinueAsync();
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Request interception error: {ex.Message}");

        // Always ensure the request is handled to prevent hanging
        try
        {
            await e.Request.ContinueAsync();
        }
        catch
        {
            // Request might already be handled
        }
    }
};
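The fallback in the catch block can itself fail when the request was already resolved before the exception was thrown. One way to make the intent explicit is a once-only guard per request, so that only the first of continue, abort, or respond actually runs. ResolveOnce is a hypothetical helper, not a Puppeteer-Sharp API; you would create one per request and route every ContinueAsync, AbortAsync, and RespondAsync call through it:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Puppeteer-Sharp: runs at most one of the
// actions passed to it, even if several code paths try to resolve a request.
public sealed class ResolveOnce
{
    private int _resolved; // 0 = not yet resolved, 1 = resolved

    public bool IsResolved => Volatile.Read(ref _resolved) == 1;

    public async Task<bool> TryRunAsync(Func<Task> resolve)
    {
        // Only the first caller flips the flag from 0 to 1 and gets to run
        if (Interlocked.Exchange(ref _resolved, 1) == 1)
        {
            return false;
        }

        await resolve();
        return true;
    }
}
```

Inside the handler this looks like var once = new ResolveOnce(); followed by await once.TryRunAsync(() => e.Request.ContinueAsync()) in both the try and the catch block. If the first resolution attempt itself throws, the guard deliberately does not retry, since the request's state is then unknown.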

Combining with Page Navigation

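When combining request interception with page navigation, enable interception and attach the handler before calling GoToAsync; otherwise the main document and the earliest subresource requests are sent without passing through your handler: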

var page = await browser.NewPageAsync();

// Set up interception BEFORE navigating
await page.SetRequestInterceptionAsync(true);

page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Filter out unnecessary resources for faster navigation
    if (request.ResourceType == ResourceType.Image ||
        request.ResourceType == ResourceType.StyleSheet)
    {
        await request.AbortAsync();
        return;
    }

    await request.ContinueAsync();
};

// Now navigate - the interceptor is already active
await page.GoToAsync("https://example.com");

Conclusion

Request interception in Puppeteer-Sharp provides powerful capabilities for controlling network behavior during web automation and scraping tasks. Whether you're optimizing performance by blocking unnecessary resources, testing with mocked responses, or implementing custom authentication flows, proper request interception can significantly enhance your web automation projects.

Remember to always handle requests appropriately (either continue, abort, or respond) to prevent your application from hanging, and implement proper error handling to ensure robust operation in production environments.
