How Do I Check if a String Contains Specific Text in C# During Web Scraping?

When web scraping with C#, checking whether a string contains specific text is a fundamental operation you'll perform countless times. Whether you're validating scraped data, filtering content, or searching for specific elements, C# provides multiple powerful methods to accomplish this task efficiently.

Basic String Contains Method

The simplest way to check if a string contains specific text in C# is using the built-in Contains() method:

string scrapedContent = "This is a sample product description with price $99.99";

if (scrapedContent.Contains("price"))
{
    Console.WriteLine("The content mentions a price!");
}

This method is case-sensitive by default, meaning "Price" and "price" would be treated as different strings.
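A quick sketch demonstrating this default behavior (the sample string is invented for illustration):

```csharp
using System;

string content = "Price: $19.99";

// Contains() performs an ordinal, case-sensitive comparison by default
Console.WriteLine(content.Contains("Price")); // True
Console.WriteLine(content.Contains("price")); // False: casing differs
```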

Case-Insensitive String Matching

For web scraping, you often need case-insensitive searches since HTML content can have inconsistent capitalization. Here are several approaches:

Using StringComparison.OrdinalIgnoreCase (.NET Core 2.1+)

string htmlContent = "<div class='Product-Title'>Gaming Laptop</div>";

if (htmlContent.Contains("product", StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine("Product information found!");
}

Using IndexOf for Older .NET Versions

string scrapedText = "Available in Multiple Colors";

if (scrapedText.IndexOf("multiple", StringComparison.OrdinalIgnoreCase) >= 0)
{
    Console.WriteLine("Multiple options available!");
}

Using ToLower() or ToUpper()

string productName = "WIRELESS HEADPHONES";

if (productName.ToLower().Contains("wireless"))
{
    Console.WriteLine("This is a wireless product");
}

Note: While simple, this approach creates new string objects and can be less efficient for large-scale scraping operations.

Advanced Pattern Matching with Regular Expressions

For complex text matching scenarios in web scraping, regular expressions provide powerful capabilities:

using System.Text.RegularExpressions;

string pageContent = "Price: $299.99, Discount: 20% off";

// Check if content contains a price pattern
if (Regex.IsMatch(pageContent, @"\$\d+\.\d{2}"))
{
    Console.WriteLine("Price information found");

    // Extract the actual price
    Match match = Regex.Match(pageContent, @"\$(\d+\.\d{2})");
    if (match.Success)
    {
        string price = match.Groups[1].Value;
        Console.WriteLine($"Extracted price: ${price}");
    }
}

// Case-insensitive regex search
if (Regex.IsMatch(pageContent, "discount", RegexOptions.IgnoreCase))
{
    Console.WriteLine("Discount information available");
}

Practical Web Scraping Examples

Example 1: Filtering Product Listings

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;

public class ProductScraper
{
    public List<string> FindProductsContainingKeyword(string html, string keyword)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var products = doc.DocumentNode
            .SelectNodes("//div[@class='product-item']")
            ?.Where(node => node.InnerText.Contains(keyword, StringComparison.OrdinalIgnoreCase))
            .Select(node => node.InnerText.Trim())
            .ToList();

        return products ?? new List<string>();
    }
}

// Usage
var scraper = new ProductScraper();
using var client = new HttpClient();
string htmlContent = await client.GetStringAsync("https://example.com/products");
var wirelessProducts = scraper.FindProductsContainingKeyword(htmlContent, "wireless");

Example 2: Validating Scraped Data

public class DataValidator
{
    public bool IsValidProductPage(string pageContent)
    {
        var requiredElements = new[] { "price", "add to cart", "description" };

        return requiredElements.All(element =>
            pageContent.Contains(element, StringComparison.OrdinalIgnoreCase));
    }

    public bool ContainsAnyKeyword(string text, params string[] keywords)
    {
        return keywords.Any(keyword =>
            text.Contains(keyword, StringComparison.OrdinalIgnoreCase));
    }
}

// Usage
var validator = new DataValidator();
string scrapedPage = GetScrapedContent();

if (validator.IsValidProductPage(scrapedPage))
{
    Console.WriteLine("Valid product page detected");
}

if (validator.ContainsAnyKeyword(scrapedPage, "sale", "discount", "offer"))
{
    Console.WriteLine("Special offer available!");
}

Example 3: Content Classification

public class ContentClassifier
{
    public string ClassifyProductCategory(string productDescription)
    {
        var categories = new Dictionary<string, string[]>
        {
            { "Electronics", new[] { "laptop", "phone", "tablet", "computer" } },
            { "Clothing", new[] { "shirt", "pants", "dress", "shoes" } },
            { "Home", new[] { "furniture", "decor", "kitchen", "bedroom" } }
        };

        foreach (var category in categories)
        {
            if (category.Value.Any(keyword =>
                productDescription.Contains(keyword, StringComparison.OrdinalIgnoreCase)))
            {
                return category.Key;
            }
        }

        return "Uncategorized";
    }
}

Performance Considerations

When dealing with large-scale web scraping operations, performance matters:

Avoid Repeated Allocations for Multiple Checks

using System;
using System.Collections.Generic;
using System.Linq;

public bool ContainsMultipleKeywords(string content, List<string> keywords)
{
    // Passing a StringComparison avoids allocating lowercased copies of the content
    return keywords.All(keyword =>
        content.Contains(keyword, StringComparison.OrdinalIgnoreCase));
}

Compile Regular Expressions

For repeated pattern matching during web scraping, compile your regex patterns:

public class PatternMatcher
{
    private static readonly Regex EmailPattern = new Regex(
        @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase
    );

    private static readonly Regex PhonePattern = new Regex(
        @"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}",
        RegexOptions.Compiled
    );

    public bool ContainsEmail(string text) => EmailPattern.IsMatch(text);
    public bool ContainsPhone(string text) => PhonePattern.IsMatch(text);
}

Handling Special Characters and Encoding

Web content often includes special characters that need careful handling:

using System.Net;
using System.Text.RegularExpressions;

public class TextProcessor
{
    public string NormalizeHtmlText(string htmlText)
    {
        // Decode HTML entities
        string decoded = WebUtility.HtmlDecode(htmlText);

        // Remove extra whitespace
        string normalized = Regex.Replace(decoded, @"\s+", " ").Trim();

        return normalized;
    }

    public bool ContainsTextInHtml(string html, string searchText)
    {
        string normalizedHtml = NormalizeHtmlText(html);
        string normalizedSearch = NormalizeHtmlText(searchText);

        return normalizedHtml.Contains(normalizedSearch, StringComparison.OrdinalIgnoreCase);
    }
}

// Usage
var processor = new TextProcessor();
string rawHtml = "&lt;div&gt;Product&nbsp;&nbsp;Name&lt;/div&gt;";

if (processor.ContainsTextInHtml(rawHtml, "Product Name"))
{
    Console.WriteLine("Match found after normalization");
}

Integration with LINQ for Data Filtering

LINQ provides powerful querying capabilities for filtering scraped data:

using System;
using System.Collections.Generic;
using System.Linq;

public class ScrapedDataFilter
{
    public class Product
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public decimal Price { get; set; }
    }

    public List<Product> FilterProducts(List<Product> products, string keyword)
    {
        return products
            // Null-conditional guards against missing fields in scraped data
            .Where(p => p.Name?.Contains(keyword, StringComparison.OrdinalIgnoreCase) == true ||
                        p.Description?.Contains(keyword, StringComparison.OrdinalIgnoreCase) == true)
            .ToList();
    }

    public List<Product> FilterByMultipleKeywords(List<Product> products,
        params string[] keywords)
    {
        return products
            .Where(p => keywords.Any(k =>
                p.Name?.Contains(k, StringComparison.OrdinalIgnoreCase) == true ||
                p.Description?.Contains(k, StringComparison.OrdinalIgnoreCase) == true))
            .ToList();
    }
}

Error Handling and Null Safety

Always handle potential null values when working with scraped content:

public class SafeStringChecker
{
    public bool SafeContains(string source, string value)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(value))
        {
            return false;
        }

        return source.Contains(value, StringComparison.OrdinalIgnoreCase);
    }

    public bool SafeContainsAny(string source, params string[] values)
    {
        if (string.IsNullOrEmpty(source) || values == null || values.Length == 0)
        {
            return false;
        }

        return values.Any(v => !string.IsNullOrEmpty(v) &&
            source.Contains(v, StringComparison.OrdinalIgnoreCase));
    }
}

Complete Web Scraping Example

Here's a comprehensive example combining multiple techniques:

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class AdvancedProductScraper
{
    private readonly HttpClient _httpClient;

    public AdvancedProductScraper()
    {
        _httpClient = new HttpClient();
    }

    public async Task<List<ProductInfo>> ScrapeProducts(string url,
        string[] keywords, bool mustContainAll = false)
    {
        var html = await _httpClient.GetStringAsync(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var products = new List<ProductInfo>();
        var productNodes = doc.DocumentNode.SelectNodes("//div[@class='product']");

        if (productNodes == null) return products;

        foreach (var node in productNodes)
        {
            var productText = WebUtility.HtmlDecode(node.InnerText);

            bool matches = mustContainAll
                ? keywords.All(k => productText.Contains(k, StringComparison.OrdinalIgnoreCase))
                : keywords.Any(k => productText.Contains(k, StringComparison.OrdinalIgnoreCase));

            if (matches)
            {
                var product = new ProductInfo
                {
                    Name = node.SelectSingleNode(".//h3")?.InnerText.Trim(),
                    Description = node.SelectSingleNode(".//p")?.InnerText.Trim(),
                    HasDiscount = productText.Contains("sale", StringComparison.OrdinalIgnoreCase) ||
                                 productText.Contains("discount", StringComparison.OrdinalIgnoreCase),
                    MatchedKeywords = keywords.Where(k =>
                        productText.Contains(k, StringComparison.OrdinalIgnoreCase)).ToList()
                };

                products.Add(product);
            }
        }

        return products;
    }

    public class ProductInfo
    {
        public string Name { get; set; }
        public string Description { get; set; }
        public bool HasDiscount { get; set; }
        public List<string> MatchedKeywords { get; set; }
    }
}

Best Practices

  1. Choose the Right Method: Use Contains() for simple checks, IndexOf() for position-aware searches, and regex for complex patterns.

  2. Consider Case Sensitivity: Web content is unpredictable; use case-insensitive comparisons unless you specifically need case sensitivity.

  3. Normalize Text: Always decode HTML entities and normalize whitespace before performing text searches.

  4. Handle Nulls: Check for null or empty strings before performing contains operations to avoid NullReferenceException.

  5. Optimize for Scale: When processing large amounts of scraped data, compile regex patterns and avoid creating unnecessary string copies.

  6. Use String Interpolation: Modern C# string interpolation makes code more readable when building search queries.
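To make the first point concrete, here is a brief sketch contrasting the three approaches on the same input (the sample string and values are invented for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string snippet = "Sale price: $49.99 (was $79.99)";

// Contains(): a simple yes/no membership test
bool hasSale = snippet.Contains("sale", StringComparison.OrdinalIgnoreCase);

// IndexOf(): also tells you *where* the match starts (-1 if absent)
int pricePos = snippet.IndexOf("price", StringComparison.OrdinalIgnoreCase);

// Regex: matches a pattern rather than a literal substring
bool hasAmount = Regex.IsMatch(snippet, @"\$\d+\.\d{2}");

Console.WriteLine($"hasSale={hasSale}, pricePos={pricePos}, hasAmount={hasAmount}");
```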

Conclusion

Checking if strings contain specific text is a core operation in C# web scraping. By understanding the various methods available—from simple Contains() calls to advanced regex patterns—you can build robust and efficient web scrapers. Remember to always consider case sensitivity, handle special characters properly, and implement appropriate error handling for production-grade scraping applications.

Whether you're filtering product listings, validating scraped data, or classifying content, these techniques will help you effectively search and match text in your C# web scraping projects.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
