
How to Intercept and Modify HTTP Requests in Puppeteer-Sharp

Request interception in Puppeteer-Sharp is a powerful feature that allows you to monitor, modify, block, or redirect HTTP requests before they reach the server. This capability is essential for web scraping, testing, and performance optimization scenarios where you need fine-grained control over network traffic.

Understanding Request Interception

Request interception works by enabling a mode in which Puppeteer-Sharp pauses every network request before it is sent. Your handler can then examine the request details and decide how to resolve it: continue it unchanged, continue it with modified parameters, abort it, or answer it with a custom response. Each paused request must be resolved exactly once. A request that is never continued, aborted, or answered stays pending, and the page stalls waiting for it.

Basic Request Interception Setup

To start intercepting requests, you must first enable request interception on a page and then set up event handlers to process incoming requests.

Enabling Request Interception

using PuppeteerSharp;

// Assumes a compatible browser is already installed (for example, downloaded via BrowserFetcher)
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});

var page = await browser.NewPageAsync();

// Enable request interception
await page.SetRequestInterceptionAsync(true);

// Set up request handler
page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Log all requests
    Console.WriteLine($"Request: {request.Method} {request.Url}");

    // Continue with original request
    await request.ContinueAsync();
};

await page.GoToAsync("https://example.com");

// Release browser resources when finished
await browser.CloseAsync();

Modifying Request Headers

One of the most common use cases is modifying request headers, such as changing the User-Agent or adding custom headers for authentication:

page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Create modified headers
    var headers = new Dictionary<string, string>(request.Headers)
    {
        ["User-Agent"] = "Custom Bot 1.0",
        ["Authorization"] = "Bearer your-token-here",
        ["X-Custom-Header"] = "custom-value"
    };

    // Continue with modified headers
    await request.ContinueAsync(new Payload
    {
        Headers = headers
    });
};
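HTTP header names are case-insensitive. Copying `request.Headers` into a default `Dictionary<string, string>` creates a case-sensitive copy, so if the browser reports the name as `user-agent`, setting `"User-Agent"` adds a second entry instead of replacing it. The following is a minimal sketch of a case-insensitive merge. `MergeHeaders` is a hypothetical helper, not a Puppeteer-Sharp API, and the sample header values are made up:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Puppeteer-Sharp): copy the original headers
// into a case-insensitive dictionary, then apply the overrides, so "User-Agent"
// replaces an existing "user-agent" entry instead of sitting next to it.
static Dictionary<string, string> MergeHeaders(
    IDictionary<string, string> original,
    IDictionary<string, string> overrides)
{
    var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (var pair in original)
    {
        merged[pair.Key] = pair.Value;
    }
    foreach (var pair in overrides)
    {
        merged[pair.Key] = pair.Value;
    }
    return merged;
}

// Sample headers as a browser might report them (lowercase names)
var browserHeaders = new Dictionary<string, string>
{
    ["user-agent"] = "Mozilla/5.0",
    ["accept"] = "text/html"
};

var result = MergeHeaders(browserHeaders, new Dictionary<string, string>
{
    ["User-Agent"] = "Custom Bot 1.0",
    ["X-Custom-Header"] = "custom-value"
});

Console.WriteLine(result.Count);         // 3
Console.WriteLine(result["user-agent"]); // Custom Bot 1.0
```

Inside the handler, pass `request.Headers` as the first argument and assign the result to `Payload.Headers`.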

Blocking Specific Requests

You can block requests to improve performance by preventing unnecessary resources from loading:

page.Request += async (sender, e) =>
{
    var request = e.Request;
    var url = request.Url;

    // Block images, stylesheets, and fonts
    if (request.ResourceType == ResourceType.Image ||
        request.ResourceType == ResourceType.StyleSheet ||
        request.ResourceType == ResourceType.Font)
    {
        await request.AbortAsync();
        return;
    }

    // Block specific domains
    if (url.Contains("ads.google.com") || url.Contains("analytics.google.com"))
    {
        await request.AbortAsync();
        return;
    }

    // Continue with allowed requests
    await request.ContinueAsync();
};
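`url.Contains(...)` also matches URLs that merely mention a blocked domain, for example in a query string such as `?ref=ads.google.com`. A stricter sketch parses the URL and compares only the host, and it also treats subdomains as blocked. The `IsBlockedHost` name and the domain list are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: true when the URL's host is a blocked domain or a
// subdomain of one. URLs that cannot be parsed are not blocked.
static bool IsBlockedHost(string url, IReadOnlyCollection<string> domains)
{
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
    {
        return false;
    }

    var host = uri.Host;
    return domains.Any(domain =>
        host.Equals(domain, StringComparison.OrdinalIgnoreCase) ||
        host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase));
}

// Illustrative block list
var blockedDomains = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "ads.google.com",
    "analytics.google.com"
};

Console.WriteLine(IsBlockedHost("https://ads.google.com/pixel.gif", blockedDomains));        // True
Console.WriteLine(IsBlockedHost("https://x.ads.google.com/a.js", blockedDomains));           // True
Console.WriteLine(IsBlockedHost("https://example.com/?ref=ads.google.com", blockedDomains)); // False
```

In the handler, `IsBlockedHost(request.Url, blockedDomains)` would take the place of the two `Contains` checks.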

Modifying Request URLs

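You can rewrite request URLs, which is useful for testing or for pointing a page at a different environment. The browser fetches the new URL transparently. This is not an HTTP redirect, so the page still sees the original URL: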

page.Request += async (sender, e) =>
{
    var request = e.Request;
    var originalUrl = request.Url;

    // Redirect API calls to staging environment
    if (originalUrl.Contains("api.production.com"))
    {
        var newUrl = originalUrl.Replace("api.production.com", "api.staging.com");

        await request.ContinueAsync(new Payload
        {
            Url = newUrl
        });
        return;
    }

    // Continue with original URL
    await request.ContinueAsync();
};
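`String.Replace` on the whole URL also rewrites any matching text in the path or query string. A narrower approach swaps only the host, and only when it matches exactly. This is a sketch; `RewriteHost` is a hypothetical helper name:

```csharp
using System;

// Hypothetical helper: replace the host only when it matches exactly,
// leaving scheme, path, and query untouched.
static string RewriteHost(string url, string fromHost, string toHost)
{
    if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) ||
        !uri.Host.Equals(fromHost, StringComparison.OrdinalIgnoreCase))
    {
        return url;
    }

    var builder = new UriBuilder(uri) { Host = toHost };
    return builder.Uri.AbsoluteUri;
}

Console.WriteLine(RewriteHost(
    "https://api.production.com/v1/users?id=42",
    "api.production.com",
    "api.staging.com"));
// https://api.staging.com/v1/users?id=42
```

In the handler, pass the result as `Payload.Url` only when it differs from `request.Url`.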

Modifying POST Data

For POST requests, you can intercept and modify the request body. This example assumes a URL-encoded form body (application/x-www-form-urlencoded):

page.Request += async (sender, e) =>
{
    var request = e.Request;

    if (request.Method == HttpMethod.Post && request.Url.Contains("/api/submit"))
    {
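        // Read the existing form body (PostData is null when there is no body)
        var originalData = request.PostData ?? string.Empty;

        // Append an extra URL-encoded field
        var modifiedData = originalData.Length > 0
            ? originalData + "&additional_field=custom_value"
            : "additional_field=custom_value";

        await request.ContinueAsync(new Payload
        {
            PostData = modifiedData,
            Headers = new Dictionary<string, string>(request.Headers)
            {
                // Content-Length counts bytes, not UTF-16 characters
                ["Content-Length"] = System.Text.Encoding.UTF8.GetByteCount(modifiedData).ToString()
            }
        });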
        return;
    }

    await request.ContinueAsync();
};
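The example above appends a field to a URL-encoded form body. Many APIs expect JSON instead, and plain string concatenation would corrupt that payload. The following is a minimal sketch, assuming a JSON object body and .NET 6 or later (for `System.Text.Json.Nodes`). `AddJsonField` is a hypothetical helper:

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical helper: parse a JSON object body, set one field, and serialize
// it back. Empty, non-JSON, or non-object bodies are returned unchanged.
static string AddJsonField(string body, string name, string value)
{
    if (string.IsNullOrEmpty(body))
    {
        return body;
    }

    try
    {
        if (JsonNode.Parse(body) is JsonObject obj)
        {
            obj[name] = value;
            return obj.ToJsonString();
        }
    }
    catch (JsonException)
    {
        // Not valid JSON: fall through and leave the body as it was
    }

    return body;
}

var modifiedBody = AddJsonField("{\"query\":\"shoes\"}", "additional_field", "custom_value");
Console.WriteLine(modifiedBody);
// {"query":"shoes","additional_field":"custom_value"}

// Content-Length is a byte count, so measure the UTF-8 encoding, not string.Length
Console.WriteLine(Encoding.UTF8.GetByteCount(modifiedBody));
```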

Providing Mock Responses

Instead of making actual HTTP requests, you can provide custom responses directly:

page.Request += async (sender, e) =>
{
    var request = e.Request;

    // Mock API responses
    if (request.Url.Contains("/api/user"))
    {
        var mockResponse = new
        {
            id = 123,
            name = "Test User",
            email = "test@example.com"
        };

        await request.RespondAsync(new ResponseData
        {
            Status = HttpStatusCode.OK,
            ContentType = "application/json",
            Body = System.Text.Json.JsonSerializer.Serialize(mockResponse)
        });
        return;
    }

    await request.ContinueAsync();
};
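`Contains("/api/user")` also matches `/api/users` and `/api/user-settings`. When you mock several endpoints, a lookup keyed on the exact URL path is easier to reason about. This is a sketch under that assumption; the routes and bodies are made-up test data, and `TryGetMock` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: find a mock body by exact URL path (query string ignored)
static bool TryGetMock(string url, IReadOnlyDictionary<string, string> routes, out string body)
{
    body = string.Empty;
    return Uri.TryCreate(url, UriKind.Absolute, out var uri)
        && routes.TryGetValue(uri.AbsolutePath, out body);
}

// Made-up mock data keyed on the path
var mockRoutes = new Dictionary<string, string>
{
    ["/api/user"] = "{\"id\":123,\"name\":\"Test User\",\"email\":\"test@example.com\"}",
    ["/api/settings"] = "{\"theme\":\"dark\"}"
};

Console.WriteLine(TryGetMock("https://example.com/api/user?id=1", mockRoutes, out var userJson)); // True
Console.WriteLine(userJson);
Console.WriteLine(TryGetMock("https://example.com/api/users", mockRoutes, out _));                // False
```

In the handler, pass `request.Url` to `TryGetMock` and, on a match, hand the body to `RespondAsync` as in the example above. If the page fetches the mocked endpoint from a different origin, the mock may also need an `Access-Control-Allow-Origin` entry in `ResponseData.Headers`.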

Advanced Request Filtering

For complex scenarios, you can implement sophisticated filtering logic:

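using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using PuppeteerSharp;

public class RequestInterceptor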
{
    private readonly HashSet<string> _blockedDomains;
    private readonly Dictionary<string, string> _urlReplacements;

    public RequestInterceptor()
    {
        _blockedDomains = new HashSet<string>
        {
            "ads.google.com",
            "facebook.com/tr",
            "analytics.google.com"
        };

        _urlReplacements = new Dictionary<string, string>
        {
            { "cdn.production.com", "cdn.staging.com" },
            { "api.v1.com", "api.v2.com" }
        };
    }

    public async Task HandleRequestAsync(object sender, RequestEventArgs e)
    {
        var request = e.Request;
        var url = request.Url;

        // Check if request should be blocked
        if (ShouldBlockRequest(url))
        {
            await request.AbortAsync();
            return;
        }

        // Apply URL replacements
        var modifiedUrl = ApplyUrlReplacements(url);

        // Add custom headers
        var headers = AddCustomHeaders(request.Headers);

        await request.ContinueAsync(new Payload
        {
            Url = modifiedUrl != url ? modifiedUrl : null,
            Headers = headers
        });
    }

    private bool ShouldBlockRequest(string url)
    {
        return _blockedDomains.Any(domain => url.Contains(domain));
    }

    private string ApplyUrlReplacements(string url)
    {
        foreach (var replacement in _urlReplacements)
        {
            if (url.Contains(replacement.Key))
            {
                return url.Replace(replacement.Key, replacement.Value);
            }
        }
        return url;
    }

    private Dictionary<string, string> AddCustomHeaders(Dictionary<string, string> originalHeaders)
    {
        var headers = new Dictionary<string, string>(originalHeaders)
        {
            ["X-Scraper-Version"] = "1.0",
            ["Accept-Language"] = "en-US,en;q=0.9"
        };

        return headers;
    }
}

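// Usage: HandleRequestAsync returns Task, but the Request event expects a
// void-returning EventHandler<RequestEventArgs>, so wrap it in an async lambda
var interceptor = new RequestInterceptor();
page.Request += async (sender, e) => await interceptor.HandleRequestAsync(sender, e);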

Monitoring Network Activity

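Similar to monitoring network requests in Puppeteer, you can track and log all network activity. The Request and Response events fire even when interception is disabled, so you only need interception if the same handler also modifies or blocks requests: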

public class NetworkMonitor
{
    private readonly List<RequestInfo> _requests = new List<RequestInfo>();

    public async Task SetupMonitoring(IPage page)
    {
        await page.SetRequestInterceptionAsync(true);

        page.Request += async (sender, e) =>
        {
            var request = e.Request;

            _requests.Add(new RequestInfo
            {
                Url = request.Url,
                Method = request.Method.ToString(),
                Headers = request.Headers,
                ResourceType = request.ResourceType.ToString(),
                Timestamp = DateTime.UtcNow
            });

            await request.ContinueAsync();
        };

        page.Response += (sender, e) =>
        {
            var response = e.Response;
            Console.WriteLine($"Response: {response.Status} {response.Url}");
        };
    }

    public void PrintNetworkSummary()
    {
        Console.WriteLine($"Total Requests: {_requests.Count}");

        var groupedByType = _requests.GroupBy(r => r.ResourceType);
        foreach (var group in groupedByType)
        {
            Console.WriteLine($"{group.Key}: {group.Count()}");
        }
    }
}

public class RequestInfo
{
    public string Url { get; set; }
    public string Method { get; set; }
    public Dictionary<string, string> Headers { get; set; }
    public string ResourceType { get; set; }
    public DateTime Timestamp { get; set; }
}
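If the Request event can be raised from more than one thread, the `List<RequestInfo>` in `NetworkMonitor` is not safe to mutate concurrently. A more defensive sketch records entries in a `ConcurrentQueue` and builds the per-type summary with LINQ. The tuple shape and sample data are assumptions for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Thread-safe store for (url, resource type) pairs captured in the handler
var log = new ConcurrentQueue<(string Url, string ResourceType)>();

// In the Request handler this would be:
// log.Enqueue((request.Url, request.ResourceType.ToString()));
log.Enqueue(("https://example.com/", "Document"));
log.Enqueue(("https://example.com/app.js", "Script"));
log.Enqueue(("https://example.com/logo.png", "Image"));
log.Enqueue(("https://example.com/vendor.js", "Script"));

// Count requests per resource type, most frequent first, ties by name
var summary = log
    .GroupBy(entry => entry.ResourceType)
    .Select(group => (Type: group.Key, Count: group.Count()))
    .OrderByDescending(item => item.Count)
    .ThenBy(item => item.Type)
    .ToList();

Console.WriteLine($"Total Requests: {log.Count}");
foreach (var (type, count) in summary)
{
    Console.WriteLine($"{type}: {count}");
}
```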

Error Handling and Best Practices

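When implementing request interception, handle errors explicitly. An async lambda attached to an event runs as async void, so an exception that escapes it cannot be observed by the caller and can terminate the process. In addition, a request left unresolved after an error stalls the page: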

page.Request += async (sender, e) =>
{
    var request = e.Request;

    try
    {
        // Your interception logic here
        await ProcessRequest(request);
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing request {request.Url}: {ex.Message}");

        // Always continue or abort the request to prevent hanging
        try
        {
            await request.ContinueAsync();
        }
        catch
        {
            // Request might have already been handled
        }
    }
};

private async Task ProcessRequest(IRequest request)
{
    // Complex processing logic
    if (request.ResourceType == ResourceType.Document)
    {
        // Handle main document requests differently
        await request.ContinueAsync();
    }
    else if (request.ResourceType == ResourceType.Xhr)
    {
        // Handle AJAX (XHR) requests; ModifyAjaxRequest is a placeholder for your own logic
        await ModifyAjaxRequest(request);
    }
    else
    {
        await request.ContinueAsync();
    }
}
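The try/catch pattern above can be factored into a reusable wrapper, so that every handler gets the same fallback. This is a sketch; `HandleSafelyAsync` is a hypothetical name, and the simulated handler stands in for real interception logic:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper: run the main handler; if it throws, log the error and
// run a fallback (for example request.ContinueAsync) so the request is not
// left paused. Errors from the fallback are swallowed because the request may
// already have been resolved.
static async Task HandleSafelyAsync(string url, Func<Task> handler, Func<Task> fallback)
{
    try
    {
        await handler();
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error processing request {url}: {ex.Message}");
        try
        {
            await fallback();
        }
        catch (Exception)
        {
            // Already continued, aborted, or responded to
        }
    }
}

// Simulated handler that fails before resolving the request
var fallbackCalls = 0;
await HandleSafelyAsync(
    "https://example.com/",
    () => throw new InvalidOperationException("boom"),
    () => { fallbackCalls++; return Task.CompletedTask; });

Console.WriteLine($"Fallback calls: {fallbackCalls}");
```

Wiring it up could look like `page.Request += async (sender, e) => await HandleSafelyAsync(e.Request.Url, () => ProcessRequest(e.Request), () => e.Request.ContinueAsync());`.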

Performance Considerations

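Request interception adds overhead to page loading: each request is paused and waits for a round trip to your .NET handler before it proceeds. To minimize the impact:

  1. Be Selective: Only intercept when necessary
  2. Fast Processing: Keep request handlers lightweight
  3. Avoid Blocking: Don't perform long-running operations in handlers
  4. Use Patterns: Implement efficient URL matching, for example with a precompiled regular expression

using System.Text.RegularExpressions;

// Efficient URL pattern matching: the pattern is compiled once, not per request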
private static readonly Regex BlockedUrlPattern = new Regex(
    @"(ads\.google\.com|facebook\.com/tr|analytics\.google\.com)",
    RegexOptions.Compiled | RegexOptions.IgnoreCase
);

page.Request += async (sender, e) =>
{
    var request = e.Request;

    if (BlockedUrlPattern.IsMatch(request.Url))
    {
        await request.AbortAsync();
        return;
    }

    await request.ContinueAsync();
};

Integration with Authentication Workflows

Request interception is particularly useful for handling authentication in web scraping scenarios:

public class AuthenticationInterceptor
{
    private readonly string _authToken;

    public AuthenticationInterceptor(string authToken)
    {
        _authToken = authToken;
    }

    public async Task HandleRequest(object sender, RequestEventArgs e)
    {
        var request = e.Request;

        // Add authentication to API requests
        if (request.Url.Contains("/api/"))
        {
            var headers = new Dictionary<string, string>(request.Headers)
            {
                ["Authorization"] = $"Bearer {_authToken}"
            };

            await request.ContinueAsync(new Payload { Headers = headers });
            return;
        }

        await request.ContinueAsync();
    }
}
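Matching on `/api/` alone attaches the token to any host whose path contains that segment, including third-party APIs the page happens to call. A more conservative sketch sends the token only over HTTPS to an explicitly trusted host. `ShouldAttachToken` and `api.example.com` are placeholders:

```csharp
using System;

// Hypothetical check: attach the bearer token only over HTTPS, only to the
// trusted API host, and only for paths under /api/.
static bool ShouldAttachToken(string url, string trustedHost)
{
    return Uri.TryCreate(url, UriKind.Absolute, out var uri)
        && uri.Scheme == Uri.UriSchemeHttps
        && uri.Host.Equals(trustedHost, StringComparison.OrdinalIgnoreCase)
        && uri.AbsolutePath.StartsWith("/api/", StringComparison.Ordinal);
}

Console.WriteLine(ShouldAttachToken("https://api.example.com/api/orders", "api.example.com"));      // True
Console.WriteLine(ShouldAttachToken("https://tracker.example.net/api/collect", "api.example.com")); // False
Console.WriteLine(ShouldAttachToken("http://api.example.com/api/orders", "api.example.com"));       // False
```

In `HandleRequest`, this check would replace `request.Url.Contains("/api/")`. Because `HandleRequest` returns `Task`, attach it with `page.Request += async (sender, e) => await interceptor.HandleRequest(sender, e);`.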

Combining with Page Navigation

When working with multi-page applications, you can combine request interception with page navigation techniques to create comprehensive scraping workflows:

public class MultiPageScraper
{
    private readonly IPage _page;
    private readonly RequestInterceptor _interceptor;

    public MultiPageScraper(IPage page)
    {
        _page = page;
        _interceptor = new RequestInterceptor();
    }

    public async Task SetupAndNavigate(string url)
    {
        await _page.SetRequestInterceptionAsync(true);
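        // HandleRequestAsync returns Task, so wrap it to match EventHandler<RequestEventArgs>
        _page.Request += async (sender, e) => await _interceptor.HandleRequestAsync(sender, e);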

        // Navigate with interception active
        await _page.GoToAsync(url);

        // Wait for dynamic content to load
        await _page.WaitForSelectorAsync(".content");
    }
}

Conclusion

Request interception in Puppeteer-Sharp provides powerful capabilities for controlling network traffic during web automation and scraping tasks. Whether you need to modify headers, block resources, rewrite URLs, or provide mock responses, the request interception API gives you fine-grained control over every HTTP request the page makes.

Remember to handle errors appropriately, keep processing efficient, and make sure every intercepted request is continued, aborted, or answered with RespondAsync, so that no request is left hanging. With proper implementation, request interception can significantly enhance your web scraping and testing scenarios.
