How do I select elements that contain specific text using CSS selectors?

Selecting elements that contain specific text is a common requirement in web scraping and DOM manipulation. While pure CSS selectors have limitations for text-based selection, there are several effective approaches using CSS pseudo-selectors, XPath expressions, and JavaScript methods. This guide covers all the available techniques with practical examples.

The CSS Selector Limitation

Important: Pure CSS selectors cannot directly select elements based on their text content. CSS selectors target structure, attributes, classes, and IDs, not textual content; a :contains() pseudo-class appeared in early CSS3 Selectors drafts but was dropped from the specification, and jQuery's :contains() is a non-standard extension. However, there are workarounds and alternative approaches that achieve the same goal.
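To make the workaround concrete before diving into the browser APIs, the "select structurally, then filter by text" idea can be sketched in dependency-free Python using only the standard library's html.parser (the class name, sample HTML, and tag choice are illustrative; nested tags of the same name are not handled in this sketch):

```python
# A dependency-free sketch of the "filter by text" workaround using only
# Python's standard library (sample HTML and names are illustrative).
from html.parser import HTMLParser


class TextFilter(HTMLParser):
    """Collects the text of every tag of a given name that contains needle."""

    def __init__(self, tag, needle):
        super().__init__()
        self.tag, self.needle = tag, needle
        self._in_tag = False
        self._buffer = []
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._in_tag = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._in_tag:
            text = "".join(self._buffer)
            if self.needle in text:
                self.matches.append(text)
            self._in_tag = False


html = "<div><p>Hello World</p><p>Goodbye</p><p>Hello again</p></div>"
parser = TextFilter("p", "Hello")
parser.feed(html)
print(parser.matches)  # ['Hello World', 'Hello again']
```

The structural selection (tag name) and the text filtering are separate steps, which is exactly the shape every method below takes.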

Method 1: Using JavaScript with CSS-like Syntax

The most common approach is combining CSS selectors with JavaScript to filter elements by text content:

Basic Text Matching

// Select all elements matching `selector` whose text contains `text`
// (note: textContent includes descendant text, so ancestors of a
// matching element will match too)
function selectByText(selector, text) {
    return Array.from(document.querySelectorAll(selector))
        .filter(element => element.textContent.includes(text));
}

// Usage examples
const elementsWithText = selectByText('p', 'Hello World');
const linksWithText = selectByText('a', 'Click here');
const buttonsWithText = selectByText('button', 'Submit');

Case-Insensitive Text Matching

function selectByTextIgnoreCase(selector, text) {
    return Array.from(document.querySelectorAll(selector))
        .filter(element => 
            element.textContent.toLowerCase().includes(text.toLowerCase())
        );
}

// Select all divs containing "error" (case-insensitive)
const errorDivs = selectByTextIgnoreCase('div', 'error');

Exact Text Matching

function selectByExactText(selector, text) {
    return Array.from(document.querySelectorAll(selector))
        .filter(element => element.textContent.trim() === text);
}

// Select button with exact text "Submit Form"
const submitButton = selectByExactText('button', 'Submit Form')[0];

Method 2: XPath Expressions (Recommended)

XPath provides powerful text-based selection capabilities and is supported by most web scraping tools:

Basic XPath Text Selection

// Evaluate an XPath expression and return the matches as an array of nodes
function selectByXPath(xpath) {
    const result = document.evaluate(
        xpath,
        document,
        null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
        null
    );
    return Array.from({ length: result.snapshotLength }, (_, i) =>
        result.snapshotItem(i)
    );
}

// XPath examples
const examples = [
    "//p[text()='Hello World']",              // Exact text match
    "//a[contains(text(), 'Click')]",         // Contains text
    "//button[normalize-space(text())='OK']", // Normalized text
    "//div[starts-with(text(), 'Error:')]",   // Text starts with
    // ends-with() is XPath 2.0 only; browsers implement XPath 1.0, so use:
    "//span[substring(text(), string-length(text()) - 3) = '.pdf']"
];

Advanced XPath Text Patterns

// Case-insensitive text matching with XPath
const caseInsensitiveXPath = "//p[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'hello')]";

// Multiple text conditions
const multipleConditions = "//div[contains(text(), 'Error') and contains(text(), 'failed')]";

// Text in child elements
const childText = "//div[.//span[text()='Important']]";
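The same text predicates are useful outside the browser too. Python's standard-library ElementTree supports a limited subset of XPath 1.0 (exact-text predicates work, but contains() and translate() do not), which is enough for simple matches; the sample markup below is illustrative:

```python
# Exact-text XPath predicates with Python's stdlib ElementTree.
# Note: ElementTree supports only a subset of XPath 1.0; functions like
# contains() and translate() are NOT available here.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    "<p>Hello World</p>"
    "<p>Other text</p>"
    "<div><span>Important</span></div>"
    "</body>"
)

# Exact text match, analogous to //p[text()='Hello World']
hits = doc.findall(".//p[.='Hello World']")
print([el.text for el in hits])  # ['Hello World']

# Parent selected by a child's text, analogous to //div[.//span[text()='Important']]
parents = doc.findall(".//div[span='Important']")
print(len(parents))  # 1
```

For full XPath 1.0 support in Python (including contains() and translate()), the third-party lxml library is the usual choice.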

Method 3: Using Web Scraping Libraries

Python with BeautifulSoup

from bs4 import BeautifulSoup
import requests

# Fetch and parse HTML
response = requests.get('https://example.com')
soup = BeautifulSoup(response.content, 'html.parser')

# Find elements whose entire text is a single string containing the target
# (note: tag.string is None when a tag has multiple children)
elements_with_text = soup.find_all(lambda tag: tag.string and 'Hello World' in tag.string)

# Find elements with text using CSS selectors + text filtering
paragraphs = soup.select('p')
filtered_paragraphs = [p for p in paragraphs if 'specific text' in p.get_text()]

# Using regex for advanced text matching
# (the text= keyword is deprecated since Beautiful Soup 4.4; use string=)
import re
regex_elements = soup.find_all(string=re.compile(r'Error: \d+'))

Python with Selenium

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://example.com')

# Using XPath with Selenium
element = driver.find_element(By.XPATH, "//button[contains(text(), 'Submit')]")
elements = driver.find_elements(By.XPATH, "//p[text()='Hello World']")

# Wait for element with specific text
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.XPATH, "//div[contains(text(), 'Loading complete')]"))
)

JavaScript with Puppeteer

When working with dynamic content that requires JavaScript execution, Puppeteer provides powerful tools for handling browser automation:

const puppeteer = require('puppeteer');

async function scrapeTextElements() {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // Wait for specific text to appear
    await page.waitForFunction(
        () => document.querySelector('body').innerText.includes('Content loaded')
    );

    // Select elements by text content. DOM nodes are not serializable
    // across the page/Node boundary, so return plain data such as the text.
    const texts = await page.$$eval('p', paragraphs =>
        paragraphs
            .filter(p => p.textContent.includes('Hello World'))
            .map(p => p.textContent)
    );

    // Using XPath in Puppeteer (page.$x is deprecated in recent versions;
    // the 'xpath/' selector prefix works with page.$$ instead)
    const xpathElements = await page.$$("xpath///button[contains(text(), 'Click me')]");

    await browser.close();
    return texts;
}

Method 4: Advanced CSS Selectors

While CSS can't select by arbitrary text content, attribute selectors and a few pseudo-classes can help in specific scenarios:

Using CSS Attribute Selectors

/* Select elements with specific title attributes */
[title*="error"] { /* elements with "error" in title */ }
[alt^="Photo"] { /* images with alt text starting with "Photo" */ }
[data-text$="end"] { /* elements with data-text ending with "end" */ }

CSS Content-Based Selection (Limited)

/* Select empty elements */
:empty { }

/* Select elements that are not empty */
:not(:empty) { }

/* Select inputs by their value attribute (matches the initial
   HTML attribute, not the current user-edited value) */
input[value="Submit"] { }

Practical Web Scraping Examples

Example 1: Finding Error Messages

// Function to find all error messages on a page
function findErrorMessages() {
    const selectors = ['div', 'p', 'span', '.error', '.alert'];
    const errorKeywords = ['error', 'failed', 'invalid', 'required'];

    const errorElements = [];

    selectors.forEach(selector => {
        const elements = document.querySelectorAll(selector);
        elements.forEach(element => {
            const text = element.textContent.toLowerCase();
            if (errorKeywords.some(keyword => text.includes(keyword))) {
                errorElements.push(element);
            }
        });
    });

    return errorElements;
}

Example 2: Product Price Extraction

# Using BeautifulSoup to find price elements
import re
from bs4 import BeautifulSoup

def extract_prices(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find text nodes matching price patterns
    # (the text= keyword is deprecated; string= returns matching text nodes)
    price_pattern = re.compile(r'\$\d+\.?\d*')
    price_elements = soup.find_all(string=price_pattern)

    # Alternative: Find elements with price-related classes containing numbers
    price_containers = soup.find_all(['span', 'div'], 
        class_=re.compile(r'price|cost|amount', re.I))

    prices = []
    for element in price_containers:
        text = element.get_text()
        if re.search(price_pattern, text):
            prices.append(text.strip())

    return prices

Example 3: Navigation Menu Items

When dealing with complex navigation structures, especially in single-page applications, you might need to handle dynamic content loading:

async function findNavigationItems(page, searchText) {
    // Wait for navigation to be fully loaded
    await page.waitForSelector('nav', { timeout: 5000 });

    // Find navigation links containing specific text
    const navItems = await page.evaluate((text) => {
        const links = Array.from(document.querySelectorAll('nav a, .nav-item a'));
        return links
            .filter(link => link.textContent.toLowerCase().includes(text.toLowerCase()))
            .map(link => ({
                text: link.textContent.trim(),
                href: link.href,
                visible: link.offsetParent !== null
            }));
    }, searchText);

    return navItems;
}

Performance Considerations

When selecting elements by text content, keep these performance tips in mind:

Optimize Selector Scope

// Bad: Search entire document
const badSearch = Array.from(document.querySelectorAll('*'))
    .filter(el => el.textContent.includes('search term'));

// Good: Limit search scope
const goodSearch = Array.from(document.querySelectorAll('.content-area p, .content-area div'))
    .filter(el => el.textContent.includes('search term'));

Use Efficient Text Matching

// includes() and indexOf() perform comparably in modern engines;
// prefer whichever reads more clearly
function fastTextSearch(elements, searchText) {
    return elements.filter(el => el.textContent.includes(searchText));
}

// Pre-compile regex for repeated searches
// (no 'g' flag: test() on a global regex is stateful via lastIndex)
const emailPattern = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/;
function findEmails(elements) {
    return elements.filter(el => emailPattern.test(el.textContent));
}
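The same precompilation advice applies outside the browser. A minimal Python sketch (the pattern and sample strings are illustrative): compile the pattern once at module level and reuse it, rather than rebuilding it on every call.

```python
# Compile once, reuse everywhere (pattern and samples are illustrative).
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_emails(texts):
    """Return the texts that contain at least one email address."""
    return [t for t in texts if EMAIL_PATTERN.search(t)]

samples = ["contact: admin@example.com", "no address here", "sales@shop.io"]
print(find_emails(samples))  # ['contact: admin@example.com', 'sales@shop.io']
```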

Browser Compatibility and Limitations

  • XPath Support: browsers implement XPath 1.0 via document.evaluate; XPath 2.0 functions such as ends-with() are unavailable
  • CSS Selector Level 4: Some advanced selectors are not universally supported
  • Performance: Text-based searching can be slower than structural selectors on large documents
  • Dynamic Content: May require waiting for content to load, especially with SPAs

Conclusion

While pure CSS selectors cannot directly select elements by text content, combining CSS selectors with JavaScript, XPath expressions, or web scraping libraries provides powerful solutions. For most web scraping scenarios, XPath expressions offer the most straightforward approach, while JavaScript methods provide maximum flexibility for complex text matching requirements.

Choose the method that best fits your specific use case: XPath for simplicity and power, JavaScript for custom logic, or specialized libraries for comprehensive web scraping projects. When working with dynamic content, consider the loading patterns and implement appropriate waiting strategies to ensure reliable element selection.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

