How do I configure proxy settings for Puppeteer-Sharp?

Configuring proxy settings in Puppeteer-Sharp is essential for web scraping applications that need to route traffic through proxy servers for anonymity, geo-location changes, or to bypass rate limiting. This guide covers various proxy configurations including HTTP proxies, SOCKS proxies, and authenticated proxies.

Basic Proxy Configuration

The most straightforward way to configure a proxy in Puppeteer-Sharp is through the LaunchOptions when creating a browser instance. Here's the basic syntax:

using PuppeteerSharp;

var launchOptions = new LaunchOptions
{
    Args = new[] { "--proxy-server=http://proxy-server:port" }
};

using var browser = await Puppeteer.LaunchAsync(launchOptions);
using var page = await browser.NewPageAsync();
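
Because the value passed to --proxy-server is a plain string, a typo in the scheme or port only surfaces later as a failed navigation. The following is a minimal sketch of a helper that validates the parts before building the argument. BuildProxyArg is a hypothetical name, not part of Puppeteer-Sharp, and the accepted schemes are limited to the ones this guide covers:

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Puppeteer-Sharp): builds the Chromium
// --proxy-server argument from its parts and rejects obviously bad input.
static string BuildProxyArg(string scheme, string host, int port)
{
    // Schemes covered in this guide (an assumption of this sketch).
    string[] allowed = { "http", "socks4", "socks5" };
    if (!allowed.Contains(scheme))
        throw new ArgumentException($"Unsupported proxy scheme: {scheme}", nameof(scheme));
    if (string.IsNullOrWhiteSpace(host))
        throw new ArgumentException("Proxy host is required.", nameof(host));
    if (port < 1 || port > 65535)
        throw new ArgumentOutOfRangeException(nameof(port), "Port must be between 1 and 65535.");

    return $"--proxy-server={scheme}://{host}:{port}";
}

Console.WriteLine(BuildProxyArg("http", "127.0.0.1", 8080));
// prints: --proxy-server=http://127.0.0.1:8080
```

The returned string can be placed directly in the Args array of LaunchOptions.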

HTTP Proxy Configuration

For standard HTTP proxies, you can configure them using the --proxy-server argument:

using PuppeteerSharp;
using System;
using System.Threading.Tasks;

class Program
{
    static async Task Main(string[] args)
    {
        // Download browser if not already downloaded
        await new BrowserFetcher().DownloadAsync();

        var launchOptions = new LaunchOptions
        {
            Headless = true,
            Args = new[]
            {
                "--proxy-server=http://your-proxy-server.com:8080",
                "--no-sandbox",
                "--disable-setuid-sandbox"
            }
        };

        using var browser = await Puppeteer.LaunchAsync(launchOptions);
        using var page = await browser.NewPageAsync();

        // Navigate to a page to test proxy
        await page.GoToAsync("https://httpbin.org/ip");
        var content = await page.GetContentAsync();
        Console.WriteLine(content);
    }
}

SOCKS Proxy Configuration

Puppeteer-Sharp also supports SOCKS4 and SOCKS5 proxies. Here's how to configure them:

// SOCKS5 proxy configuration
var launchOptions = new LaunchOptions
{
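    Args = new[]
    {
        "--proxy-server=socks5://your-socks-proxy.com:1080",
        // Fail local DNS lookups so hostnames are resolved through the proxy.
        // The value contains spaces, so it is quoted: Puppeteer-Sharp passes Args
        // to the browser as a single command line.
        "--host-resolver-rules=\"MAP * ~NOTFOUND , EXCLUDE your-socks-proxy.com\""
    }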
};

// SOCKS4 proxy configuration
var launchOptionsSOCKS4 = new LaunchOptions
{
    Args = new[]
    {
        "--proxy-server=socks4://your-socks4-proxy.com:1080"
    }
};

Authenticated Proxy Configuration

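Chromium ignores credentials embedded in the --proxy-server value, so proxies that require authentication need the credentials supplied separately. Puppeteer-Sharp does this through the page's AuthenticateAsync method, which answers the proxy's authentication challenge. This applies to HTTP proxies; Chromium does not support authentication for SOCKS proxies: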

using PuppeteerSharp;

var launchOptions = new LaunchOptions
{
    Args = new[] { "--proxy-server=http://proxy-server:port" }
};

using var browser = await Puppeteer.LaunchAsync(launchOptions);
using var page = await browser.NewPageAsync();

// Set up authentication for the proxy
await page.AuthenticateAsync(new Credentials
{
    Username = "your-username",
    Password = "your-password"
});

await page.GoToAsync("https://example.com");
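
Proxy providers often hand out a single URL with the credentials embedded, such as http://user:pass@host:port. Since the credentials cannot go into --proxy-server, one option is to split the URL first. The sketch below uses only System.Uri; SplitProxyUrl is a hypothetical helper name, and it assumes the URL carries an explicit port:

```csharp
using System;

// Hypothetical helper: splits "scheme://user:pass@host:port" into the value for
// --proxy-server (credentials removed) and the username/password pair.
// Assumes the URL includes an explicit port.
static (string Server, string Username, string Password) SplitProxyUrl(string proxyUrl)
{
    var uri = new Uri(proxyUrl);
    var serverPart = $"{uri.Scheme}://{uri.Host}:{uri.Port}";

    // UserInfo is "user:pass" (percent-encoded), or empty when no credentials are given.
    var parts = uri.UserInfo.Split(':', 2);
    var user = Uri.UnescapeDataString(parts[0]);
    var pass = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
    return (serverPart, user, pass);
}

var proxy = SplitProxyUrl("http://alice:s3cret@proxy.example.com:8080");
Console.WriteLine($"--proxy-server={proxy.Server}");
Console.WriteLine($"Username: {proxy.Username}");
```

proxy.Server then goes into the Args array, while proxy.Username and proxy.Password fill the Credentials object passed to AuthenticateAsync.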

Advanced Proxy Configuration with Multiple Options

For more complex scenarios, you can combine multiple proxy-related arguments:

var launchOptions = new LaunchOptions
{
    Headless = true,
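    Args = new[]
    {
        // Fixed proxy server; hosts in the bypass list connect directly
        "--proxy-server=http://proxy-server:port",
        "--proxy-bypass-list=localhost,127.0.0.1",
        // Alternative to --proxy-server: let a PAC script choose the proxy.
        // Use one or the other, not both.
        // "--proxy-pac-url=http://example.com/proxy.pac",
        // Disables the same-origin policy; unrelated to proxying, enable only if needed
        "--disable-web-security",
        // Accept invalid certificates, e.g. from a TLS-intercepting proxy (testing only)
        "--ignore-certificate-errors"
    }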
};

Dynamic Proxy Configuration

For applications that need to rotate proxies or change proxy settings dynamically, you can create multiple browser instances:

public class ProxyManager
{
    private readonly string[] _proxies = 
    {
        "http://proxy1.example.com:8080",
        "http://proxy2.example.com:8080",
        "http://proxy3.example.com:8080"
    };

    // Recent Puppeteer-Sharp releases return IBrowser; older ones return Browser
    public async Task<IBrowser> CreateBrowserWithProxy(int proxyIndex)
    {
        var proxy = _proxies[proxyIndex % _proxies.Length];

        var launchOptions = new LaunchOptions
        {
            Args = new[] { $"--proxy-server={proxy}" }
        };

        return await Puppeteer.LaunchAsync(launchOptions);
    }
}

// Usage
var proxyManager = new ProxyManager();
using var browser1 = await proxyManager.CreateBrowserWithProxy(0);
using var browser2 = await proxyManager.CreateBrowserWithProxy(1);

Proxy Configuration with Custom User Agent

When using proxies, it's often beneficial to also configure custom user agents to avoid detection. This approach is similar to handling browser sessions in Puppeteer where you manage browser identity:

var launchOptions = new LaunchOptions
{
    Args = new[]
    {
        "--proxy-server=http://proxy-server:port",
        // Quoted because the value contains spaces and Args are passed as one command line
        "--user-agent=\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36\""
    }
};

using var browser = await Puppeteer.LaunchAsync(launchOptions);
using var page = await browser.NewPageAsync();

// Additional user agent setting at page level
await page.SetUserAgentAsync("Custom User Agent String");

Testing Proxy Configuration

To verify that your proxy is working correctly, you can test it by checking the IP address:

public async Task TestProxyConfiguration(string proxyUrl)
{
    var launchOptions = new LaunchOptions
    {
        Args = new[] { $"--proxy-server={proxyUrl}" }
    };

    using var browser = await Puppeteer.LaunchAsync(launchOptions);
    using var page = await browser.NewPageAsync();

    try
    {
        await page.GoToAsync("https://httpbin.org/ip", new NavigationOptions
        {
            WaitUntil = new[] { WaitUntilNavigation.Networkidle0 },
            Timeout = 30000
        });

        // httpbin returns a small JSON body such as {"origin": "203.0.113.7"}
        var ipInfo = await page.EvaluateExpressionAsync<string>("document.body.innerText");
        Console.WriteLine($"Current IP: {ipInfo}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Proxy test failed: {ex.Message}");
    }
}
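
Printing the response only confirms that the request succeeded. To confirm that traffic actually leaves through the proxy, the reported address can be compared with the address seen without a proxy. The sketch below parses the JSON body returned by httpbin.org/ip with System.Text.Json; ExtractOrigin is a hypothetical helper name, and the two sample bodies stand in for real responses:

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: pulls the "origin" field out of the JSON body that
// https://httpbin.org/ip returns, e.g. {"origin": "203.0.113.7"}.
static string ExtractOrigin(string json)
{
    using var doc = JsonDocument.Parse(json);
    return doc.RootElement.GetProperty("origin").GetString();
}

// Sample bodies standing in for a direct request and a proxied request.
var directIp = ExtractOrigin("{\"origin\": \"198.51.100.4\"}");
var proxiedIp = ExtractOrigin("{\"origin\": \"203.0.113.7\"}");

Console.WriteLine(directIp != proxiedIp
    ? $"Proxy in use, exit IP: {proxiedIp}"
    : "Proxy not applied: exit IP matches the direct IP");
```

In a real check, the first body would come from a browser launched without --proxy-server and the second from the proxied browser.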

Error Handling and Troubleshooting

When working with proxies, implement proper error handling, particularly for timeouts (see handling timeouts in Puppeteer):

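// IPage and IBrowser are the types used by recent Puppeteer-Sharp releases;
// older releases expose Page and Browser instead.
public async Task<IPage> CreatePageWithProxyAndErrorHandling(string proxyUrl)
{
    var launchOptions = new LaunchOptions
    {
        Args = new[] 
        { 
            $"--proxy-server={proxyUrl}",
            "--ignore-certificate-errors"
        },
        Timeout = 60000 // Browser launch timeout: 60 seconds
    };

    IBrowser browser = null;
    try
    {
        browser = await Puppeteer.LaunchAsync(launchOptions);
        var page = await browser.NewPageAsync();

        // Set timeouts (milliseconds)
        page.DefaultTimeout = 30000;
        page.DefaultNavigationTimeout = 30000;

        // The caller owns the browser and should dispose it via page.Browser
        return page;
    }
    catch (PuppeteerException ex)
    {
        Console.WriteLine($"Failed to launch browser with proxy: {ex.Message}");
        // Do not leave an orphaned browser process behind
        browser?.Dispose();
        throw;
    }
}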
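
When a navigation fails, it helps to know whether the proxy or the target site is at fault, since only the former justifies switching proxies. The sketch below checks an error message for Chromium network error codes that point at the proxy. It assumes the exception message from a failed GoToAsync contains the net::ERR_ code, and the list of codes is illustrative rather than exhaustive; IsProxyFailure is a hypothetical helper name:

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns true when the message contains a Chromium network
// error code that points at the proxy rather than the target site.
static bool IsProxyFailure(string errorMessage)
{
    // Illustrative, not exhaustive.
    string[] proxyErrors =
    {
        "net::ERR_PROXY_CONNECTION_FAILED",
        "net::ERR_TUNNEL_CONNECTION_FAILED",
        "net::ERR_SOCKS_CONNECTION_FAILED",
        "net::ERR_PROXY_AUTH_UNSUPPORTED",
    };
    return proxyErrors.Any(code => errorMessage.Contains(code, StringComparison.Ordinal));
}

Console.WriteLine(IsProxyFailure("net::ERR_PROXY_CONNECTION_FAILED at https://example.com")); // True
Console.WriteLine(IsProxyFailure("net::ERR_NAME_NOT_RESOLVED at https://example.com"));       // False
```

A caller could pass ex.Message from the catch block to this helper and rotate to another proxy only when it returns true.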

Proxy Rotation Implementation

For large-scale scraping operations, implementing proxy rotation is crucial:

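// Requires: using System; using System.Collections.Generic; using System.Linq;
public class RotatingProxyManager
{
    private readonly List<string> _proxies;
    private int _currentIndex = 0;
    private readonly object _lock = new object();

    public RotatingProxyManager(IEnumerable<string> proxies)
    {
        _proxies = proxies.ToList();
        // An empty list would make GetNextProxy fail with an index error
        if (_proxies.Count == 0)
            throw new ArgumentException("At least one proxy is required.", nameof(proxies));
    }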

    public string GetNextProxy()
    {
        lock (_lock)
        {
            var proxy = _proxies[_currentIndex];
            _currentIndex = (_currentIndex + 1) % _proxies.Count;
            return proxy;
        }
    }

    // Recent Puppeteer-Sharp releases return IBrowser; older ones return Browser
    public async Task<IBrowser> CreateBrowserWithRotatedProxy()
    {
        var proxy = GetNextProxy();
        var launchOptions = new LaunchOptions
        {
            Args = new[] { $"--proxy-server={proxy}" }
        };

        return await Puppeteer.LaunchAsync(launchOptions);
    }
}
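
If the lock becomes a point of contention, round-robin selection can also be done with an atomic counter. The sketch below is an alternative to the lock-based manager above, not a replacement the original prescribes; CreateRotator is a hypothetical name, and it returns a delegate that is safe to call from multiple threads:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical alternative to the lock-based manager: returns a delegate that
// hands out proxies in round-robin order using an atomic counter.
static Func<string> CreateRotator(IReadOnlyList<string> proxies)
{
    if (proxies.Count == 0)
        throw new ArgumentException("At least one proxy is required.", nameof(proxies));

    int counter = -1;
    return () =>
    {
        // Interlocked.Increment wraps on overflow; the uint cast keeps the index non-negative.
        uint ticket = unchecked((uint)Interlocked.Increment(ref counter));
        return proxies[(int)(ticket % (uint)proxies.Count)];
    };
}

var next = CreateRotator(new[]
{
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080"
});
Console.WriteLine(next()); // http://proxy1.example.com:8080
Console.WriteLine(next()); // http://proxy2.example.com:8080
Console.WriteLine(next()); // http://proxy1.example.com:8080
```

Each returned string can be used as the value of --proxy-server when launching the next browser.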

Proxy Authentication with Custom Headers

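Some proxy providers accept credentials in a Proxy-Authorization header, which you can set alongside other custom headers in Puppeteer-Sharp. Headers set with SetExtraHttpHeadersAsync are attached to the page's own requests, so this only reaches the proxy for plain HTTP targets. For HTTPS targets the proxy sees only the CONNECT request, and the header travels on to the destination site instead. Prefer AuthenticateAsync where the proxy supports it: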

var launchOptions = new LaunchOptions
{
    Args = new[] { "--proxy-server=http://proxy-server:port" }
};

using var browser = await Puppeteer.LaunchAsync(launchOptions);
using var page = await browser.NewPageAsync();

// Set custom headers including proxy authentication
await page.SetExtraHttpHeadersAsync(new Dictionary<string, string>
{
    {"Proxy-Authorization", "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes("username:password"))},
    {"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"}
});

await page.GoToAsync("https://example.com");

Performance Optimization with Proxy Pools

When using multiple proxies, consider implementing a proxy pool manager that records when each proxy was last used, counts its failures, and retires proxies that fail repeatedly:

public class ProxyPool
{
    private readonly ConcurrentQueue<string> _availableProxies;
    private readonly Dictionary<string, DateTime> _lastUsed;
    private readonly Dictionary<string, int> _failureCount;
    private readonly object _lockObj = new object();

    public ProxyPool(IEnumerable<string> proxies)
    {
        _availableProxies = new ConcurrentQueue<string>(proxies);
        _lastUsed = new Dictionary<string, DateTime>();
        _failureCount = new Dictionary<string, int>();
    }

    public string GetProxy()
    {
        lock (_lockObj)
        {
            if (_availableProxies.TryDequeue(out string proxy))
            {
                // UTC avoids jumps from time zone or daylight saving changes
                _lastUsed[proxy] = DateTime.UtcNow;
                return proxy;
            }
            return null; // No proxies available
        }
    }

    public void ReturnProxy(string proxy, bool wasSuccessful)
    {
        lock (_lockObj)
        {
            if (wasSuccessful)
            {
                _failureCount[proxy] = 0;
                _availableProxies.Enqueue(proxy);
            }
            else
            {
                _failureCount[proxy] = _failureCount.GetValueOrDefault(proxy, 0) + 1;

                // Only return to pool if failure count is below threshold
                if (_failureCount[proxy] < 3)
                {
                    _availableProxies.Enqueue(proxy);
                }
            }
        }
    }
}

Best Practices

  1. Always test proxy connectivity before using them in production
  2. Implement retry logic for failed proxy connections
  3. Monitor proxy performance and rotate slow or failed proxies
  4. Use appropriate timeouts to avoid hanging on unresponsive proxies
  5. Consider proxy authentication requirements and handle them properly
  6. Validate proxy configuration with simple requests before complex scraping
  7. Respect rate limits even when using proxies to avoid detection
  8. Keep proxy credentials secure and avoid hardcoding them in your application
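
For the last point, one common option is to read credentials from environment variables at startup. The sketch below is a minimal illustration: RequireEnv is a hypothetical helper, and PROXY_USERNAME and PROXY_PASSWORD are assumed variable names, not ones Puppeteer-Sharp defines:

```csharp
using System;

// Hypothetical helper: reads a required setting from the environment and fails
// fast with a clear message when it is missing.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrEmpty(value))
        throw new InvalidOperationException($"Environment variable {name} is not set.");
    return value;
}

// PROXY_USERNAME and PROXY_PASSWORD are assumed names. They are set here only so
// the sketch runs on its own; in practice they come from the deployment environment.
Environment.SetEnvironmentVariable("PROXY_USERNAME", "demo-user");
Environment.SetEnvironmentVariable("PROXY_PASSWORD", "demo-pass");

var proxyUsername = RequireEnv("PROXY_USERNAME");
var proxyPassword = RequireEnv("PROXY_PASSWORD");
Console.WriteLine($"Loaded proxy credentials for {proxyUsername}");
```

The two values can then be passed to the Credentials object used with AuthenticateAsync instead of string literals.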

Common Issues and Solutions

Connection Refused Errors

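// Add connection retry logic. A refused proxy connection shows up when the page
// navigates, not when the browser launches, so the navigation is retried as well.
var maxRetries = 3;
var retryDelay = TimeSpan.FromSeconds(5);

for (int i = 0; i < maxRetries; i++)
{
    try
    {
        // launchOptions is configured with --proxy-server as shown earlier
        using var browser = await Puppeteer.LaunchAsync(launchOptions);
        using var page = await browser.NewPageAsync();
        await page.GoToAsync("https://example.com");

        // Success - work with the page here, before the browser is disposed
        break;
    }
    catch (Exception ex) when (i < maxRetries - 1)
    {
        Console.WriteLine($"Connection attempt {i + 1} failed: {ex.Message}");
        await Task.Delay(retryDelay);
    }
}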

Proxy Authentication Issues

// Ensure proper encoding for proxy credentials
var credentials = $"{username}:{password}";
var encodedCredentials = Convert.ToBase64String(Encoding.UTF8.GetBytes(credentials));
var proxyAuthHeader = $"Basic {encodedCredentials}";

Conclusion

Configuring proxy settings in Puppeteer-Sharp provides flexibility for various web scraping scenarios. Whether you need simple HTTP proxies, SOCKS proxies, authenticated proxies, or dynamic proxy rotation, Puppeteer-Sharp offers the tools needed to implement robust proxy solutions. Remember to always test your proxy configuration and implement proper error handling for production applications.

The key to successful proxy implementation is understanding your specific requirements and choosing the appropriate configuration method that balances performance, reliability, and security for your web scraping needs. Combined with proper error handling practices in Puppeteer-Sharp, proxy configuration becomes a powerful tool for scalable web scraping applications.
