How do I make HTTP GET requests in C# for web scraping?

Making HTTP GET requests is the foundation of web scraping in C#. Whether you're extracting data from APIs, downloading HTML pages, or collecting information from multiple websites, understanding how to properly execute GET requests is essential. This guide covers multiple approaches, from modern HttpClient to legacy methods, with practical examples for building robust web scrapers.

Understanding HTTP GET Requests

HTTP GET requests retrieve data from a specified resource. In web scraping, GET requests are used to:

  • Fetch HTML content from web pages
  • Retrieve data from RESTful APIs
  • Download files and images
  • Access paginated content
  • Collect structured data in JSON or XML format (see the short JSON example below)

C# provides several ways to make HTTP GET requests, each with different features and use cases.
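
As a quick illustration of the API and JSON use cases above, here is a minimal sketch that fetches a JSON payload and deserializes it with System.Text.Json. The endpoint URL and the Product type are placeholders for whatever API you are scraping, not a real service:

using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Placeholder DTO; shape it to match the actual API response
public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class JsonApiExample
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<Product> GetProductAsync(string url)
    {
        // GET the raw JSON body
        string json = await client.GetStringAsync(url);

        // Deserialize into a typed object
        return JsonSerializer.Deserialize<Product>(json,
            new JsonSerializerOptions { PropertyNameCaseInsensitive = true });
    }
}

// Usage (hypothetical endpoint)
var product = await JsonApiExample.GetProductAsync("https://example.com/api/products/1");
Console.WriteLine($"{product.Name}: {product.Price}");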

Using HttpClient (Recommended Method)

HttpClient is the modern, recommended approach for making HTTP requests in C#. It's part of the System.Net.Http namespace and is designed for async operations, reusability, and high performance.

Basic GET Request with HttpClient

Here's a simple example of making a GET request:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        try
        {
            string url = "https://example.com";

            // Send GET request
            HttpResponseMessage response = await client.GetAsync(url);

            // Ensure success status code (throws exception if not successful)
            response.EnsureSuccessStatusCode();

            // Read response content as string
            string htmlContent = await response.Content.ReadAsStringAsync();

            Console.WriteLine($"Retrieved {htmlContent.Length} characters");
            Console.WriteLine(htmlContent);
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine($"Request error: {e.Message}");
        }
    }
}

Setting Headers and User-Agent

Many websites require proper headers to accept requests. Here's how to configure them:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class WebScraper
{
    private static readonly HttpClient client = new HttpClient();

    static WebScraper()
    {
        // Configure default request headers
        client.DefaultRequestHeaders.Add("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36");
        client.DefaultRequestHeaders.Add("Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.9");

        // Set timeout to 30 seconds
        client.Timeout = TimeSpan.FromSeconds(30);
    }

    public static async Task<string> FetchPage(string url)
    {
        var response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}

// Usage
var html = await WebScraper.FetchPage("https://example.com");

This configuration makes your scraper appear more like a legitimate browser, which can help avoid being blocked by websites. For more advanced scenarios, you can learn about using async/await in C# for asynchronous web scraping.

Handling Query Parameters

When scraping URLs with query parameters, use UriBuilder together with HttpUtility.ParseQueryString to build the query string cleanly:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web;

public class QueryParameterExample
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<string> SearchPage(string baseUrl, string searchTerm, int page)
    {
        var uriBuilder = new UriBuilder(baseUrl);
        var query = HttpUtility.ParseQueryString(uriBuilder.Query);

        query["q"] = searchTerm;
        query["page"] = page.ToString();

        uriBuilder.Query = query.ToString();
        string finalUrl = uriBuilder.ToString();

        var response = await client.GetAsync(finalUrl);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}

// Usage
var results = await QueryParameterExample.SearchPage(
    "https://example.com/search",
    "web scraping",
    1
);

Checking Response Status Codes

Always check status codes to handle different scenarios appropriately:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class StatusCodeHandler
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<string> FetchWithStatusCheck(string url)
    {
        var response = await client.GetAsync(url);

        switch (response.StatusCode)
        {
            case HttpStatusCode.OK:
                return await response.Content.ReadAsStringAsync();

            case HttpStatusCode.NotFound:
                Console.WriteLine($"Page not found: {url}");
                return null;

            case HttpStatusCode.Forbidden:
                Console.WriteLine($"Access forbidden: {url}");
                return null;

            case HttpStatusCode.TooManyRequests:
                Console.WriteLine("Rate limit exceeded, waiting...");
                await Task.Delay(5000); // Wait 5 seconds
                return await FetchWithStatusCheck(url); // Retry (unbounded here; cap retries in production)

            default:
                Console.WriteLine($"Unexpected status: {response.StatusCode}");
                return null;
        }
    }
}
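
When a server responds with 429 Too Many Requests, it often includes a Retry-After header that tells you how long to wait. The following is a sketch of one possible retry loop that honors that header when present and falls back to a fixed delay; it is not the only way to handle rate limiting:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class RetryAfterExample
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<string> FetchRespectingRetryAfter(string url, int maxAttempts = 3)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var response = await client.GetAsync(url);

            if (response.StatusCode != HttpStatusCode.TooManyRequests)
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }

            // Prefer the server-provided Retry-After delay; fall back to 5 seconds
            TimeSpan delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(5);
            Console.WriteLine($"Rate limited, waiting {delay.TotalSeconds} seconds...");
            await Task.Delay(delay);
        }

        throw new HttpRequestException($"Still rate limited after {maxAttempts} attempts: {url}");
    }
}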

Using WebClient (Legacy Method)

WebClient is an older, simpler class for making HTTP requests. While it's considered legacy (marked as obsolete in .NET 6+), you may encounter it in existing code:

using System;
using System.Net;

class WebClientExample
{
    static void Main()
    {
        using (WebClient client = new WebClient())
        {
            try
            {
                // Set User-Agent header
                client.Headers.Add("User-Agent",
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36");

                // Download string content
                string html = client.DownloadString("https://example.com");

                Console.WriteLine($"Downloaded {html.Length} characters");
            }
            catch (WebException ex)
            {
                Console.WriteLine($"Error: {ex.Message}");
            }
        }
    }
}

WebClient with Query Parameters

using System;
using System.Collections.Specialized;
using System.Net;

class WebClientQueryExample
{
    static void Main()
    {
        using (WebClient client = new WebClient())
        {
            client.Headers.Add("User-Agent", "Mozilla/5.0");

            // Add query parameters
            var query = new NameValueCollection();
            query.Add("search", "web scraping");
            query.Add("page", "1");

            client.QueryString = query;

            string result = client.DownloadString("https://example.com/search");
            Console.WriteLine(result);
        }
    }
}

Note: While WebClient is simpler for basic scenarios, HttpClient is recommended for modern applications due to better performance and async support.

Using HttpWebRequest (Legacy Method)

HttpWebRequest offers fine-grained control but requires more verbose code:

using System;
using System.IO;
using System.Net;

class HttpWebRequestExample
{
    static void Main()
    {
        try
        {
            // Create request
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://example.com");

            // Configure request
            request.Method = "GET";
            request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36";
            request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
            request.Timeout = 30000; // 30 seconds

            // Get response
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                Console.WriteLine($"Status: {response.StatusCode}");
                Console.WriteLine($"Content: {html.Length} characters");
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine($"Request failed: {ex.Message}");
        }
    }
}

Advanced GET Request Patterns

Handling Redirects

By default, HttpClient follows redirects automatically. To handle them manually:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class RedirectHandler
{
    public static async Task<string> FollowRedirectsManually(string url)
    {
        var handler = new HttpClientHandler
        {
            AllowAutoRedirect = false
        };

        using (var client = new HttpClient(handler))
        {
            HttpResponseMessage response = await client.GetAsync(url);

            if (response.StatusCode == HttpStatusCode.MovedPermanently ||
                response.StatusCode == HttpStatusCode.Redirect)
            {
                // Note: Location may be a relative URL; resolve it against the request URL if needed
                string redirectUrl = response.Headers.Location.ToString();
                Console.WriteLine($"Redirected to: {redirectUrl}");

                // Follow redirect
                response = await client.GetAsync(redirectUrl);
            }

            return await response.Content.ReadAsStringAsync();
        }
    }
}

Setting Timeout Values

Proper timeout configuration prevents hanging requests. Learn more about setting up timeout values for HTTP requests in C#:

using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class TimeoutExample
{
    public static async Task<string> FetchWithCustomTimeout(string url, int timeoutSeconds)
    {
        using (var client = new HttpClient())
        {
            client.Timeout = TimeSpan.FromSeconds(timeoutSeconds);

            // Or use CancellationToken for more control
            var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));

            try
            {
                var response = await client.GetAsync(url, cts.Token);
                return await response.Content.ReadAsStringAsync();
            }
            catch (TaskCanceledException)
            {
                Console.WriteLine($"Request timed out after {timeoutSeconds} seconds");
                throw;
            }
        }
    }
}

Using Proxies

For large-scale scraping, proxies are essential. Check out the detailed guide on configuring proxy settings in C#:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class ProxyExample
{
    public static async Task<string> FetchWithProxy(string url, string proxyUrl)
    {
        var proxy = new WebProxy(proxyUrl);

        var handler = new HttpClientHandler
        {
            Proxy = proxy,
            UseProxy = true
        };

        using (var client = new HttpClient(handler))
        {
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");

            var response = await client.GetAsync(url);
            return await response.Content.ReadAsStringAsync();
        }
    }
}

Scraping Multiple Pages Concurrently

One of the most powerful features of HttpClient is the ability to make multiple concurrent requests:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public class ParallelScraper
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<List<string>> ScrapeManyPages(List<string> urls)
    {
        // Create tasks for all URLs
        var tasks = urls.Select(url => ScrapePageAsync(url));

        // Wait for all to complete
        var results = await Task.WhenAll(tasks);

        // Filter out null results (failed requests)
        return results.Where(r => r != null).ToList();
    }

    private static async Task<string> ScrapePageAsync(string url)
    {
        try
        {
            var response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Failed to scrape {url}: {ex.Message}");
            return null;
        }
    }
}

// Usage
var urls = new List<string>
{
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
};

var results = await ParallelScraper.ScrapeManyPages(urls);
Console.WriteLine($"Successfully scraped {results.Count} pages");
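
Keep in mind that Task.WhenAll with no limit can flood a site with simultaneous requests and get your IP blocked. One way to cap concurrency, sketched below, is to gate each request with a SemaphoreSlim (limited here to 3 requests in flight; tune the number to the target site):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class ThrottledScraper
{
    private static readonly HttpClient client = new HttpClient();

    // Allow at most 3 requests in flight at any time
    private static readonly SemaphoreSlim semaphore = new SemaphoreSlim(3);

    public static async Task<List<string>> ScrapeManyPages(List<string> urls)
    {
        var tasks = urls.Select(ScrapePageAsync);
        var results = await Task.WhenAll(tasks);
        return results.Where(r => r != null).ToList();
    }

    private static async Task<string> ScrapePageAsync(string url)
    {
        // Wait for a free slot before sending the request
        await semaphore.WaitAsync();
        try
        {
            var response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Failed to scrape {url}: {ex.Message}");
            return null;
        }
        finally
        {
            semaphore.Release();
        }
    }
}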

Handling Cookies and Sessions

For websites requiring authentication or session management:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class CookieExample
{
    public static async Task<string> ScrapeWithCookies(string url)
    {
        var cookieContainer = new CookieContainer();

        var handler = new HttpClientHandler
        {
            CookieContainer = cookieContainer,
            UseCookies = true
        };

        using (var client = new HttpClient(handler))
        {
            // Cookies will be automatically stored and sent with subsequent requests
            var response = await client.GetAsync(url);
            return await response.Content.ReadAsStringAsync();
        }
    }
}
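
Because the handler stores cookies in the CookieContainer, you can inspect what the server set (or pre-seed cookies before the first request). A brief sketch:

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class CookieInspectionExample
{
    public static async Task PrintCookies(string url)
    {
        var cookieContainer = new CookieContainer();
        var handler = new HttpClientHandler { CookieContainer = cookieContainer };

        using (var client = new HttpClient(handler))
        {
            await client.GetAsync(url);

            // List the cookies the server set for this URL
            foreach (Cookie cookie in cookieContainer.GetCookies(new Uri(url)))
            {
                Console.WriteLine($"{cookie.Name} = {cookie.Value}");
            }
        }
    }
}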

Error Handling Best Practices

Robust error handling is crucial for production scrapers:

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class RobustScraper
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task<string> SafeFetch(string url, int maxRetries = 3)
    {
        int retries = 0;

        while (retries < maxRetries)
        {
            try
            {
                var response = await client.GetAsync(url);
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
            catch (HttpRequestException ex)
            {
                retries++;
                Console.WriteLine($"Attempt {retries} failed: {ex.Message}");

                if (retries >= maxRetries)
                    throw;

                // Exponential backoff: 2, 4, 8 seconds...
                await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, retries)));
            }
            catch (TaskCanceledException ex)
            {
                Console.WriteLine($"Request timeout: {ex.Message}");
                throw;
            }
        }

        return null;
    }
}

Downloading Files with GET Requests

You can also use GET requests to download files:

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public class FileDownloader
{
    private static readonly HttpClient client = new HttpClient();

    public static async Task DownloadFile(string url, string destinationPath)
    {
        try
        {
            var response = await client.GetAsync(url);
            response.EnsureSuccessStatusCode();

            // Note: this buffers the entire file in memory; for large files,
            // stream it with ReadAsStreamAsync and Stream.CopyToAsync instead
            byte[] fileBytes = await response.Content.ReadAsByteArrayAsync();
            await File.WriteAllBytesAsync(destinationPath, fileBytes);

            Console.WriteLine($"Downloaded {fileBytes.Length} bytes to {destinationPath}");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Download failed: {ex.Message}");
        }
    }
}

// Usage
await FileDownloader.DownloadFile(
    "https://example.com/document.pdf",
    "downloaded_document.pdf"
);

Best Practices for HTTP GET Requests in Web Scraping

  1. Reuse HttpClient: Create a single static instance instead of new instances for each request to avoid socket exhaustion
  2. Use async/await: Always use asynchronous methods for better performance and scalability
  3. Set appropriate headers: Include User-Agent and other headers to appear as a legitimate browser
  4. Implement retry logic: Network failures are common; retry with exponential backoff
  5. Handle rate limiting: Respect server resources and avoid overwhelming target websites
  6. Check status codes: Don't assume all requests succeed; handle different HTTP status codes appropriately
  7. Set timeouts: Always configure timeout values to prevent hanging requests
  8. Use proper error handling: Catch and handle HttpRequestException, TaskCanceledException, and other exceptions
  9. Respect robots.txt: Check the website's robots.txt file before scraping (see the sketch after this list)
  10. Consider using proxies: For large-scale scraping, rotate proxies to avoid IP blocks
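
For item 9, note that the base class library does not ship a robots.txt parser. The sketch below performs a deliberately simplified check of Disallow rules under the wildcard User-agent group; it is not a full implementation of the robots exclusion standard, but it illustrates the idea:

using System;
using System.Net.Http;
using System.Threading.Tasks;

public class RobotsTxtChecker
{
    private static readonly HttpClient client = new HttpClient();

    // Naive check: only looks at "Disallow:" rules under "User-agent: *"
    public static async Task<bool> IsPathAllowed(string baseUrl, string path)
    {
        string robotsUrl = new Uri(new Uri(baseUrl), "/robots.txt").ToString();
        string robotsTxt;

        try
        {
            robotsTxt = await client.GetStringAsync(robotsUrl);
        }
        catch (HttpRequestException)
        {
            return true; // No robots.txt reachable; assume allowed
        }

        bool inWildcardGroup = false;
        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine.Trim();

            if (line.StartsWith("User-agent:", StringComparison.OrdinalIgnoreCase))
            {
                inWildcardGroup = line.EndsWith("*");
            }
            else if (inWildcardGroup &&
                     line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
            {
                string rule = line.Substring("Disallow:".Length).Trim();
                if (rule.Length > 0 && path.StartsWith(rule))
                    return false;
            }
        }

        return true;
    }
}

// Usage
bool allowed = await RobotsTxtChecker.IsPathAllowed("https://example.com", "/search");
Console.WriteLine(allowed ? "Allowed" : "Disallowed by robots.txt");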

Conclusion

Making HTTP GET requests in C# is straightforward with multiple options available. HttpClient is the recommended approach for modern applications, offering async support, excellent performance, and comprehensive features. Whether you're building a simple scraper or a complex data extraction system, understanding these HTTP GET request patterns will form the foundation of your web scraping projects.

For more complex scenarios involving form submissions, check out how to make HTTP POST requests in C#. Remember to always scrape responsibly, respect website terms of service, and implement appropriate delays and rate limiting to avoid overwhelming target servers.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
