Table of contents

How can I select elements based on their font-family or color properties?

Selecting HTML elements based on their visual properties like font-family or color is a common requirement in web scraping and automation. Unlike traditional CSS selectors that target structural or attribute-based properties, visual property selection requires accessing computed styles through browser APIs or specialized tools.

Understanding Computed Styles

CSS properties like font-family and color are computed styles that browsers calculate from multiple sources including inline styles, stylesheets, and inheritance. These properties aren't directly accessible through standard CSS selectors but can be retrieved using JavaScript's getComputedStyle() method.

Basic Approach with JavaScript

Here's how to select elements based on their computed styles:

// Find all elements with a specific font-family
function findElementsByFontFamily(fontFamily) {
    const allElements = document.querySelectorAll('*');
    const matchingElements = [];

    allElements.forEach(element => {
        const computedStyle = window.getComputedStyle(element);
        if (computedStyle.fontFamily.includes(fontFamily)) {
            matchingElements.push(element);
        }
    });

    return matchingElements;
}

// Find elements with specific color
function findElementsByColor(color) {
    const allElements = document.querySelectorAll('*');
    const matchingElements = [];

    allElements.forEach(element => {
        const computedStyle = window.getComputedStyle(element);
        if (computedStyle.color === color) {
            matchingElements.push(element);
        }
    });

    return matchingElements;
}

// Usage examples
const arialElements = findElementsByFontFamily('Arial');
const redElements = findElementsByColor('rgb(255, 0, 0)');

Using Puppeteer for Style-Based Selection

Puppeteer provides powerful capabilities for selecting elements based on computed styles. This approach is particularly useful when handling dynamic content that loads after page load:

const puppeteer = require('puppeteer');

async function selectByVisualProperties() {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // Select elements by font-family
    const elementsWithArial = await page.evaluate(() => {
        const elements = Array.from(document.querySelectorAll('*'));
        return elements
            .filter(el => {
                const style = window.getComputedStyle(el);
                return style.fontFamily.toLowerCase().includes('arial');
            })
            .map(el => ({
                tagName: el.tagName,
                textContent: el.textContent.substring(0, 50),
                fontFamily: window.getComputedStyle(el).fontFamily
            }));
    });

    // Select elements by color
    const redTextElements = await page.evaluate(() => {
        const elements = Array.from(document.querySelectorAll('*'));
        return elements
            .filter(el => {
                const style = window.getComputedStyle(el);
                return style.color === 'rgb(255, 0, 0)';
            })
            .map(el => ({
                tagName: el.tagName,
                textContent: el.textContent.substring(0, 50),
                color: window.getComputedStyle(el).color
            }));
    });

    console.log('Elements with Arial font:', elementsWithArial);
    console.log('Elements with red text:', redTextElements);

    await browser.close();
}

Python Implementation with Selenium

Selenium WebDriver also supports computed style access for element selection:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def find_elements_by_font_family(driver, font_family):
    """Find elements with specific font-family"""
    script = """
    const elements = Array.from(document.querySelectorAll('*'));
    return elements.filter(el => {
        const style = window.getComputedStyle(el);
        return style.fontFamily.toLowerCase().includes(arguments[0].toLowerCase());
    });
    """
    return driver.execute_script(script, font_family)

def find_elements_by_color(driver, color):
    """Find elements with specific color"""
    script = """
    const elements = Array.from(document.querySelectorAll('*'));
    return elements.filter(el => {
        const style = window.getComputedStyle(el);
        return style.color === arguments[0];
    });
    """
    return driver.execute_script(script, color)

# Usage example
driver = webdriver.Chrome()
driver.get('https://example.com')

# Wait for page to load
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "body"))
)

# Find elements with specific styles
arial_elements = find_elements_by_font_family(driver, 'Arial')
red_elements = find_elements_by_color(driver, 'rgb(255, 0, 0)')

print(f"Found {len(arial_elements)} elements with Arial font")
print(f"Found {len(red_elements)} elements with red color")

driver.quit()

Advanced Filtering Techniques

Font-Family Matching with Fallbacks

Font-family declarations often include fallback fonts. Here's how to handle complex font stacks:

function matchesFontFamily(computedFontFamily, targetFont) {
    const fontStack = computedFontFamily.toLowerCase().split(',');
    return fontStack.some(font => 
        font.trim().replace(/['"]/g, '').includes(targetFont.toLowerCase())
    );
}

// More robust font-family matching
function findElementsByFontFamilyAdvanced(targetFont) {
    const allElements = document.querySelectorAll('*');
    const matchingElements = [];

    allElements.forEach(element => {
        const computedStyle = window.getComputedStyle(element);
        if (matchesFontFamily(computedStyle.fontFamily, targetFont)) {
            matchingElements.push(element);
        }
    });

    return matchingElements;
}

Color Format Normalization

Colors can be specified in various formats (hex, rgb, hsl, named colors). Here's a utility to normalize color comparisons:

function normalizeColor(color) {
    // Create a temporary element to get computed color
    const temp = document.createElement('div');
    temp.style.color = color;
    document.body.appendChild(temp);
    const computedColor = window.getComputedStyle(temp).color;
    document.body.removeChild(temp);
    return computedColor;
}

function findElementsByNormalizedColor(targetColor) {
    const normalizedTarget = normalizeColor(targetColor);
    const allElements = document.querySelectorAll('*');
    const matchingElements = [];

    allElements.forEach(element => {
        const computedStyle = window.getComputedStyle(element);
        if (computedStyle.color === normalizedTarget) {
            matchingElements.push(element);
        }
    });

    return matchingElements;
}

// Usage with different color formats
const redElements1 = findElementsByNormalizedColor('#FF0000');
const redElements2 = findElementsByNormalizedColor('red');
const redElements3 = findElementsByNormalizedColor('rgb(255, 0, 0)');

Performance Considerations

Selecting elements by computed styles can be expensive since it requires style calculation for every element. Here are optimization strategies:

Scoped Searches

// Instead of searching the entire document
function findInScope(scope, property, value) {
    const elements = scope.querySelectorAll('*');
    const matches = [];

    for (const element of elements) {
        const style = window.getComputedStyle(element);
        if (style[property] === value) {
            matches.push(element);
        }
    }

    return matches;
}

// Search within specific containers
const contentArea = document.querySelector('.content');
const redElementsInContent = findInScope(contentArea, 'color', 'rgb(255, 0, 0)');

Caching Computed Styles

class StyleCache {
    constructor() {
        this.cache = new WeakMap();
    }

    getComputedStyle(element) {
        if (!this.cache.has(element)) {
            this.cache.set(element, window.getComputedStyle(element));
        }
        return this.cache.get(element);
    }
}

const styleCache = new StyleCache();

function findElementsWithCache(property, value) {
    const elements = document.querySelectorAll('*');
    const matches = [];

    for (const element of elements) {
        const style = styleCache.getComputedStyle(element);
        if (style[property] === value) {
            matches.push(element);
        }
    }

    return matches;
}

Practical Use Cases

Content Analysis

This technique is valuable for analyzing visual content patterns:

def analyze_typography(driver):
    """Analyze typography usage on a page"""
    script = """
    const elements = Array.from(document.querySelectorAll('h1, h2, h3, h4, h5, h6, p, span, div'));
    const typography = {};

    elements.forEach(el => {
        if (el.textContent.trim()) {
            const style = window.getComputedStyle(el);
            const fontFamily = style.fontFamily;
            const fontSize = style.fontSize;
            const fontWeight = style.fontWeight;

            const key = `${fontFamily}-${fontSize}-${fontWeight}`;
            if (!typography[key]) {
                typography[key] = {
                    fontFamily,
                    fontSize,
                    fontWeight,
                    count: 0,
                    elements: []
                };
            }
            typography[key].count++;
            typography[key].elements.push({
                tagName: el.tagName,
                text: el.textContent.substring(0, 50)
            });
        }
    });

    return typography;
    """
    return driver.execute_script(script)

Brand Consistency Checking

async function checkBrandColors(page, brandColors) {
    const colorUsage = await page.evaluate((colors) => {
        const elements = Array.from(document.querySelectorAll('*'));
        const usage = {};

        colors.forEach(color => {
            usage[color] = elements.filter(el => {
                const style = window.getComputedStyle(el);
                return style.color === color || style.backgroundColor === color;
            }).length;
        });

        return usage;
    }, brandColors);

    return colorUsage;
}

// Usage
const brandColors = ['rgb(0, 123, 255)', 'rgb(40, 167, 69)', 'rgb(220, 53, 69)'];
const colorUsage = await checkBrandColors(page, brandColors);
console.log('Brand color usage:', colorUsage);

Integration with Modern Web Scraping

When working with modern web applications, combining style-based selection with other techniques enhances scraping reliability. For instance, when interacting with DOM elements in Puppeteer, you might need to wait for styles to be applied before making selections.

Limitations and Considerations

  1. Performance Impact: Computing styles for all elements is resource-intensive
  2. Dynamic Styles: Styles may change based on user interactions or media queries
  3. Cross-Browser Differences: Color representation may vary between browsers
  4. Inherited Styles: Elements may inherit styles from parent elements
  5. Pseudo-elements: Computed styles don't include pseudo-element styles

Best Practices

  1. Combine with Structural Selectors: Use computed styles as a secondary filter after structural selection
  2. Cache Results: Store computed styles when processing multiple properties
  3. Normalize Values: Convert colors and fonts to consistent formats before comparison
  4. Scope Searches: Limit searches to specific page sections when possible
  5. Handle Edge Cases: Account for transparent colors, system fonts, and fallback values

By understanding these techniques and considerations, you can effectively select elements based on their visual properties, opening up new possibilities for content analysis, quality assurance, and advanced web scraping scenarios.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Related Questions

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon