How Do I Check if a String Contains Specific Text in C# During Web Scraping?

When web scraping with C#, checking whether a string contains specific text is a fundamental operation you'll perform countless times. Whether you're validating scraped data, filtering content, or searching for specific elements, C# provides multiple powerful methods to accomplish this task efficiently.

Basic String Contains Method

The simplest way to check if a string contains specific text in C# is using the built-in Contains() method:

string scrapedContent = "This is a sample product description with price $99.99";

if (scrapedContent.Contains("price"))
{
    Console.WriteLine("The content mentions a price!");
}

This method is case-sensitive by default, meaning "Price" and "price" would be treated as different strings.
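A quick sketch demonstrating this default behavior (the sample string is invented for illustration):

```csharp
using System;

string content = "Price: $19.99";

// Contains() performs an ordinal, case-sensitive comparison by default
Console.WriteLine(content.Contains("Price")); // True
Console.WriteLine(content.Contains("price")); // False: casing differs
```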

Case-Insensitive String Matching

For web scraping, you often need case-insensitive searches since HTML content can have inconsistent capitalization. Here are several approaches:

Using StringComparison.OrdinalIgnoreCase (.NET Core 2.1+)

string htmlContent = "<div class='Product-Title'>Gaming Laptop</div>";

if (htmlContent.Contains("product", StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine("Product information found!");
}

Using IndexOf for Older .NET Versions

string scrapedText = "Available in Multiple Colors";

if (scrapedText.IndexOf("multiple", StringComparison.OrdinalIgnoreCase) >= 0)
{
    Console.WriteLine("Multiple options available!");
}

Using ToLower() or ToUpper()

string productName = "WIRELESS HEADPHONES";

if (productName.ToLower().Contains("wireless"))
{
    Console.WriteLine("This is a wireless product");
}

Note: While simple, this approach creates new string objects and can be less efficient for large-scale scraping operations.

Advanced Pattern Matching with Regular Expressions

For complex text matching scenarios in web scraping, regular expressions provide powerful capabilities:

using System.Text.RegularExpressions;

string pageContent = "Price: $299.99, Discount: 20% off";

// Check if content contains a price pattern
if (Regex.IsMatch(pageContent, @"\$\d+\.\d{2}"))
{
    Console.WriteLine("Price information found");

    // Extract the actual price
    Match match = Regex.Match(pageContent, @"\$(\d+\.\d{2})");
    if (match.Success)
    {
        string price = match.Groups[1].Value;
        Console.WriteLine($"Extracted price: ${price}");
    }
}

// Case-insensitive regex search
if (Regex.IsMatch(pageContent, "discount", RegexOptions.IgnoreCase))
{
    Console.WriteLine("Discount information available");
}

Practical Web Scraping Examples

Example 1: Filtering Product Listings

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;

public class ProductScraper
{
    public List<string> FindProductsContainingKeyword(string html, string keyword)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var products = doc.DocumentNode
            .SelectNodes("//div[@class='product-item']")
            ?.Where(node => node.InnerText.Contains(keyword, StringComparison.OrdinalIgnoreCase))
            .Select(node => node.InnerText.Trim())
            .ToList();

        return products ?? new List<string>();
    }
}

// Usage
var scraper = new ProductScraper();
using var client = new HttpClient();
string htmlContent = await client.GetStringAsync("https://example.com/products");
var wirelessProducts = scraper.FindProductsContainingKeyword(htmlContent, "wireless");

Example 2: Validating Scraped Data

public class DataValidator
{
    public bool IsValidProductPage(string pageContent)
    {
        var requiredElements = new[] { "price", "add to cart", "description" };

        return requiredElements.All(element =>
            pageContent.Contains(element, StringComparison.OrdinalIgnoreCase));
    }

    public bool ContainsAnyKeyword(string text, params string[] keywords)
    {
        return keywords.Any(keyword =>
            text.Contains(keyword, StringComparison.OrdinalIgnoreCase));
    }
}

// Usage
var validator = new DataValidator();
string scrapedPage = GetScrapedContent();

if (validator.IsValidProductPage(scrapedPage))
{
    Console.WriteLine("Valid product page detected");
}

if (validator.ContainsAnyKeyword(scrapedPage, "sale", "discount", "offer"))
{
    Console.WriteLine("Special offer available!");
}

Example 3: Content Classification

public class ContentClassifier
{
    public string ClassifyProductCategory(string productDescription)
    {
        var categories = new Dictionary<string, string[]>
        {
            { "Electronics", new[] { "laptop", "phone", "tablet", "computer" } },
            { "Clothing", new[] { "shirt", "pants", "dress", "shoes" } },
            { "Home", new[] { "furniture", "decor", "kitchen", "bedroom" } }
        };

        foreach (var category in categories)
        {
            if (category.Value.Any(keyword =>
                productDescription.Contains(keyword, StringComparison.OrdinalIgnoreCase)))
            {
                return category.Key;
            }
        }

        return "Uncategorized";
    }
}

Performance Considerations

When dealing with large-scale web scraping operations, performance matters:

Avoid Repeated Allocations for Multiple Checks

using System;
using System.Collections.Generic;
using System.Linq;

public bool ContainsMultipleKeywords(string content, List<string> keywords)
{
    // Passing a StringComparison avoids allocating lowercased copies of the content
    return keywords.All(keyword =>
        content.Contains(keyword, StringComparison.OrdinalIgnoreCase));
}

Compile Regular Expressions

For repeated pattern matching during web scraping, compile your regex patterns:

public class PatternMatcher
{
    private static readonly Regex EmailPattern = new Regex(
        @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase
    );

    private static readonly Regex PhonePattern = new Regex(
        @"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}",
        RegexOptions.Compiled
    );

    public bool ContainsEmail(string text) => EmailPattern.IsMatch(text);
    public bool ContainsPhone(string text) => PhonePattern.IsMatch(text);
}

Handling Special Characters and Encoding

Web content often includes special characters that need careful handling:

using System.Net;
using System.Text.RegularExpressions;

public class TextProcessor
{
    public string NormalizeHtmlText(string htmlText)
    {
        // Decode HTML entities
        string decoded = WebUtility.HtmlDecode(htmlText);

        // Remove extra whitespace
        string normalized = Regex.Replace(decoded, @"\s+", " ").Trim();

        return normalized;
    }

    public bool ContainsTextInHtml(string html, string searchText)
    {
        string normalizedHtml = NormalizeHtmlText(html);
        string normalizedSearch = NormalizeHtmlText(searchText);

        return normalizedHtml.Contains(normalizedSearch, StringComparison.OrdinalIgnoreCase);
    }
}

// Usage
var processor = new TextProcessor();
string rawHtml = "&lt;div&gt;Product&nbsp;&nbsp;Name&lt;/div&gt;";

if (processor.ContainsTextInHtml(rawHtml, "Product Name"))
{
    Console.WriteLine("Match found after normalization");
}

Integration with LINQ for Data Filtering

LINQ provides powerful querying capabilities for filtering scraped data:

using System;
using System.Collections.Generic;
using System.Linq;

public class ScrapedDataFilter
{
    public class Product
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public decimal Price { get; set; }
    }

    public List<Product> FilterProducts(List<Product> products, string keyword)
    {
        return products
            // Null-conditional guards against missing fields in scraped data
            .Where(p => p.Name?.Contains(keyword, StringComparison.OrdinalIgnoreCase) == true ||
                        p.Description?.Contains(keyword, StringComparison.OrdinalIgnoreCase) == true)
            .ToList();
    }

    public List<Product> FilterByMultipleKeywords(List<Product> products,
        params string[] keywords)
    {
        return products
            .Where(p => keywords.Any(k =>
                p.Name?.Contains(k, StringComparison.OrdinalIgnoreCase) == true ||
                p.Description?.Contains(k, StringComparison.OrdinalIgnoreCase) == true))
            .ToList();
    }
}

Error Handling and Null Safety

Always handle potential null values when working with scraped content:

public class SafeStringChecker
{
    public bool SafeContains(string source, string value)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(value))
        {
            return false;
        }

        return source.Contains(value, StringComparison.OrdinalIgnoreCase);
    }

    public bool SafeContainsAny(string source, params string[] values)
    {
        if (string.IsNullOrEmpty(source) || values == null || values.Length == 0)
        {
            return false;
        }

        return values.Any(v => !string.IsNullOrEmpty(v) &&
            source.Contains(v, StringComparison.OrdinalIgnoreCase));
    }
}

Complete Web Scraping Example

Here's a comprehensive example combining multiple techniques:

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class AdvancedProductScraper
{
    private readonly HttpClient _httpClient;

    public AdvancedProductScraper()
    {
        _httpClient = new HttpClient();
    }

    public async Task<List<ProductInfo>> ScrapeProducts(string url,
        string[] keywords, bool mustContainAll = false)
    {
        var html = await _httpClient.GetStringAsync(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var products = new List<ProductInfo>();
        var productNodes = doc.DocumentNode.SelectNodes("//div[@class='product']");

        if (productNodes == null) return products;

        foreach (var node in productNodes)
        {
            var productText = WebUtility.HtmlDecode(node.InnerText);

            bool matches = mustContainAll
                ? keywords.All(k => productText.Contains(k, StringComparison.OrdinalIgnoreCase))
                : keywords.Any(k => productText.Contains(k, StringComparison.OrdinalIgnoreCase));

            if (matches)
            {
                var product = new ProductInfo
                {
                    Name = node.SelectSingleNode(".//h3")?.InnerText.Trim(),
                    Description = node.SelectSingleNode(".//p")?.InnerText.Trim(),
                    HasDiscount = productText.Contains("sale", StringComparison.OrdinalIgnoreCase) ||
                                 productText.Contains("discount", StringComparison.OrdinalIgnoreCase),
                    MatchedKeywords = keywords.Where(k =>
                        productText.Contains(k, StringComparison.OrdinalIgnoreCase)).ToList()
                };

                products.Add(product);
            }
        }

        return products;
    }

    public class ProductInfo
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public bool HasDiscount { get; set; }
        public List<string> MatchedKeywords { get; set; }
    }
}

Best Practices

  1. Choose the Right Method: Use Contains() for simple checks, IndexOf() for position-aware searches, and regex for complex patterns.

  2. Consider Case Sensitivity: Web content is unpredictable; use case-insensitive comparisons unless you specifically need case sensitivity.

  3. Normalize Text: Always decode HTML entities and normalize whitespace before performing text searches.

  4. Handle Nulls: Check for null or empty strings before performing contains operations to avoid NullReferenceException.

  5. Optimize for Scale: When processing large amounts of scraped data, compile regex patterns and avoid creating unnecessary string copies.

  6. Use String Interpolation: Modern C# string interpolation makes code more readable when building search queries.
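To make the first point concrete, here is a brief sketch contrasting the three approaches on the same input (the sample string and values are invented for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string snippet = "Sale price: $49.99 (was $79.99)";

// Contains(): a simple yes/no membership test
bool hasSale = snippet.Contains("sale", StringComparison.OrdinalIgnoreCase);

// IndexOf(): also tells you *where* the match starts (-1 if absent)
int pricePos = snippet.IndexOf("price", StringComparison.OrdinalIgnoreCase);

// Regex: matches a pattern rather than a literal substring
bool hasAmount = Regex.IsMatch(snippet, @"\$\d+\.\d{2}");

Console.WriteLine($"hasSale={hasSale}, pricePos={pricePos}, hasAmount={hasAmount}");
```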

Conclusion

Checking if strings contain specific text is a core operation in C# web scraping. By understanding the various methods available—from simple Contains() calls to advanced regex patterns—you can build robust and efficient web scrapers. Remember to always consider case sensitivity, handle special characters properly, and implement appropriate error handling for production-grade scraping applications.

Whether you're filtering product listings, validating scraped data, or classifying content, these techniques will help you effectively search and match text in your C# web scraping projects.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
