What is the :has() pseudo-class and is it supported in web scraping?

The :has() pseudo-class is a CSS Selectors Level 4 feature that selects an element based on what it contains, which is why it is often called the "parent selector." For web scraping and DOM manipulation, it lets you target a container directly instead of selecting a child and walking back up the tree.

Understanding the :has() Pseudo-Class

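The :has() pseudo-class selects elements that contain at least one element matching the selector passed as its argument. The argument is a relative selector, so it can target any descendant (the default), only direct children (> child), or following siblings (+ sibling, ~ sibling). Its basic pattern is: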

parent:has(child-selector)

This selector matches the parent element itself, not the child, whenever the parent contains at least one descendant matching child-selector.

Basic Syntax Examples

/* Select divs that contain an h2 element */
div:has(h2)

/* Select articles that contain both an image and a paragraph */
article:has(img):has(p)

/* Select list items that contain a link with specific class */
li:has(a.external-link)

/* Select sections that don't contain any images */
section:not(:has(img))
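
To make the matching rule concrete, here is a minimal Python sketch that emulates div:has(h2) and section:not(:has(img)) on a tiny well-formed fragment, using only the standard library. The sample markup and the helper names (has_descendant, select_has) are assumptions made for illustration; the sketch handles plain tag names only and is not how a browser implements :has().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for this illustration.
SAMPLE = """
<body>
  <div id="a"><h2>Title</h2></div>
  <div id="b"><p>No heading here</p></div>
  <section id="c"><p><img src="x.png"/></p></section>
  <section id="d"><p>Text only</p></section>
</body>
"""

def has_descendant(element, tag):
    """True if any descendant of `element` (not the element itself) has `tag`."""
    return any(node is not element for node in element.iter(tag))

def select_has(root, parent_tag, child_tag, negate=False):
    """Emulate parent:has(child), or parent:not(:has(child)) when negate=True."""
    return [
        el for el in root.iter(parent_tag)
        if has_descendant(el, child_tag) != negate
    ]

root = ET.fromstring(SAMPLE)
divs_with_h2 = [el.get("id") for el in select_has(root, "div", "h2")]
sections_without_img = [
    el.get("id") for el in select_has(root, "section", "img", negate=True)
]
print(divs_with_h2)          # ['a']
print(sections_without_img)  # ['d']
```

Note that section "c" is excluded from the negated query even though its image is nested inside a paragraph, because :has() looks at all descendants by default.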

Browser Support and Compatibility

Current Browser Support

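As of 2024, every major browser engine supports :has(), although the version that introduced it differs:

Chrome: Full support since version 105 (August 2022)
Edge: Full support since version 105 (September 2022)
Firefox: Full support since version 121 (December 2023)
Safari: Full support since version 15.4 (March 2022)
Mobile browsers: Supported in the mobile builds of the same versions (Chrome for Android 105, Safari on iOS 15.4, Firefox for Android 121)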
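
When a scraper runs against browser builds you do not control, it can help to encode these minimum versions in one place. The following is a small sketch, assuming you already know the browser name and version string (for example from the driver's reported capabilities); the table and the function name supports_has are hypothetical and simply mirror the version numbers listed above.

```python
# Minimum (major, minor) versions with :has() support, mirroring the list above.
MIN_HAS_VERSION = {
    "chrome": (105, 0),
    "edge": (105, 0),
    "firefox": (121, 0),
    "safari": (15, 4),
}

def supports_has(browser: str, version: str) -> bool:
    """Return True if `browser` at `version` meets the listed minimum."""
    minimum = MIN_HAS_VERSION.get(browser.lower())
    if minimum is None:
        return False  # unknown browser: conservatively assume no support
    # Keep only major.minor, padding with "0" when the minor part is missing.
    parts = (version.split(".") + ["0"])[:2]
    return tuple(int(p) for p in parts) >= minimum

print(supports_has("Chrome", "104.0.5112.81"))  # False
print(supports_has("Safari", "15.4"))           # True
```

A static table like this goes stale and cannot see feature flags, so treat it as a quick pre-check and prefer a runtime feature test where one is available.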

Checking Support Programmatically

// Check if :has() is supported
function hasSupport() {
  try {
    document.querySelector(':has(div)');
    return true;
  } catch (e) {
    return false;
  }
}

console.log('Has support:', hasSupport());
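
The same check can be run from a Python scraper by asking the page's JavaScript engine directly. This is a sketch, assuming driver is a Selenium WebDriver (any object with an execute_script method works); the function name browser_supports_has is hypothetical.

```python
def browser_supports_has(driver) -> bool:
    """Ask the browser behind `driver` whether it can parse :has().

    `driver` is assumed to be a Selenium WebDriver; only its
    execute_script() method is used.
    """
    script = (
        "return typeof CSS !== 'undefined'"
        " && typeof CSS.supports === 'function'"
        " && CSS.supports('selector(:has(*))');"
    )
    # execute_script returns the JavaScript value converted to Python.
    return bool(driver.execute_script(script))
```

Calling it once after the first driver.get() and storing the result lets the rest of the scraper choose between a :has() selector and a manual filter without repeated round trips.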

Web Scraping with :has() Pseudo-Class

Python Examples with Selenium

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

# Set up headless Chrome (version 105 or later is needed for :has())
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

try:
    driver.get('https://example.com')

    # Select articles that contain both title and author
    articles = driver.find_elements(
        By.CSS_SELECTOR, 
        'article:has(h2):has(.author)'
    )

    # Select product cards that have discount badges
    discounted_products = driver.find_elements(
        By.CSS_SELECTOR,
        '.product-card:has(.discount-badge)'
    )

    # Select navigation items with dropdown menus
    dropdown_navs = driver.find_elements(
        By.CSS_SELECTOR,
        'nav li:has(ul.dropdown)'
    )

    for article in articles:
        title = article.find_element(By.CSS_SELECTOR, 'h2').text
        author = article.find_element(By.CSS_SELECTOR, '.author').text
        print(f"Title: {title}, Author: {author}")

finally:
    driver.quit()

JavaScript Examples with Puppeteer

const puppeteer = require('puppeteer');

async function scrapeWithHasSelector() {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://example.com');

  // Select blog posts that contain featured images
  const featuredPosts = await page.$$eval(
    '.blog-post:has(.featured-image)',
    posts => posts.map(post => ({
      title: post.querySelector('h2')?.textContent,
      image: post.querySelector('.featured-image img')?.src
    }))
  );

  // Select form fields that have validation errors
  const errorFields = await page.$$eval(
    '.form-group:has(.error-message)',
    groups => groups.map(group => ({
      fieldName: group.querySelector('label')?.textContent,
      errorMessage: group.querySelector('.error-message')?.textContent
    }))
  );

  // Select cards that contain call-to-action buttons
  const actionableCards = await page.$$eval(
    '.card:has(.cta-button)',
    cards => cards.map(card => ({
      title: card.querySelector('.card-title')?.textContent,
      ctaText: card.querySelector('.cta-button')?.textContent
    }))
  );

  console.log('Featured posts:', featuredPosts);
  console.log('Error fields:', errorFields);
  console.log('Actionable cards:', actionableCards);

  await browser.close();
}

// Report failures instead of leaving an unhandled promise rejection
scrapeWithHasSelector().catch(console.error);

Advanced :has() Selector Patterns

Complex Nested Selections

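/* Select tables with a row that holds both a price cell and a discount cell.
   :has() cannot be nested inside another :has(), so sibling combinators are used */
table:has(td.price ~ td.discount, td.discount ~ td.price)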

/* Select containers with multiple media types */
.media-container:has(img):has(video):has(.caption)

/* Select forms with required fields that have errors.
   A descendant selector replaces nested :has(), which browsers reject */
form:has(.required .error)

Combining with Other Pseudo-Classes

/* Select first child elements that contain specific content */
.item:first-child:has(.featured-badge)

/* Select hover-able elements that contain interactive content */
.card:hover:has(button, a, input)

/* Select elements that don't have certain children */
.product:not(:has(.out-of-stock))
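
The patterns above differ in whether conditions are combined with AND, OR, or negation: chained :has() calls require every condition, a comma-separated list inside one :has() requires at least one, and :not(:has(...)) requires none. The helpers below are a sketch for building such selector strings in a scraper; the function names are hypothetical and no validation of the selectors themselves is attempted.

```python
def _require(children):
    if not children:
        raise ValueError("at least one child selector is required")

def has_all(parent: str, *children: str) -> str:
    """AND: one chained :has() per child, e.g. article:has(img):has(p)."""
    _require(children)
    return parent + "".join(f":has({child})" for child in children)

def has_any(parent: str, *children: str) -> str:
    """OR: a single :has() with a comma-separated selector list."""
    _require(children)
    return f"{parent}:has({', '.join(children)})"

def has_none(parent: str, *children: str) -> str:
    """NOT: the parent must contain none of the children."""
    _require(children)
    return f"{parent}:not(:has({', '.join(children)}))"

print(has_all("article", "img", "p"))           # article:has(img):has(p)
print(has_any(".card", "button", "a", "input")) # .card:has(button, a, input)
print(has_none(".product", ".out-of-stock"))    # .product:not(:has(.out-of-stock))
```

The resulting strings can be passed unchanged to any lookup that accepts a CSS selector.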

Performance Considerations

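The :has() pseudo-class can be computationally expensive because the engine may have to inspect an element's entire subtree to decide whether it matches. Prefer the child combinator and a narrow starting scope: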

// More efficient - specific targeting
document.querySelectorAll('article:has(> .author)')

// Less efficient - broad descendant search
document.querySelectorAll('div:has(.deeply-nested-element)')

// Optimize by limiting scope
const container = document.querySelector('#main-content');
const results = container.querySelectorAll('.item:has(.special-badge)');
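
As a rough intuition for why the child combinator is cheaper, the sketch below counts how many nodes a naive matcher would have to look at in the worst case (no match found) for :has(> x) versus :has(x). The markup and the function name count_nodes_inspected are assumptions for illustration; real browser engines apply their own optimizations, so this shows relative search-space size, not actual timings.

```python
import xml.etree.ElementTree as ET

def count_nodes_inspected(element, direct_only):
    """Worst-case number of nodes a naive check examines below `element`."""
    if direct_only:
        # Like :has(> x): only the immediate children are candidates.
        return len(list(element))
    # Like :has(x): every descendant is a candidate (iter() includes self).
    return sum(1 for _ in element.iter()) - 1

# Hypothetical article: one direct child plus a deeply nested block.
article = ET.fromstring(
    "<article><span class='author'/>"
    "<div><div><div><p/><p/><p/></div></div></div></article>"
)
print(count_nodes_inspected(article, direct_only=True))   # 2
print(count_nodes_inspected(article, direct_only=False))  # 7
```

The gap grows with the depth and size of the subtree, which is why broad selectors such as div:has(...) on large pages are the ones most worth narrowing.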

Practical Web Scraping Applications

E-commerce Product Scraping

from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_products_with_reviews():
    driver = webdriver.Chrome()
    try:
        driver.get('https://shop.example.com')

        # Only scrape products that have customer reviews
        products_with_reviews = driver.find_elements(
            By.CSS_SELECTOR,
            '.product-item:has(.reviews-section):has(.rating-stars)'
        )

        products_data = []
        for product in products_with_reviews:
            data = {
                'name': product.find_element(By.CSS_SELECTOR, '.product-name').text,
                'price': product.find_element(By.CSS_SELECTOR, '.price').text,
                'rating': product.find_element(By.CSS_SELECTOR, '.rating-stars').get_attribute('data-rating'),
                'review_count': product.find_element(By.CSS_SELECTOR, '.review-count').text
            }
            products_data.append(data)

        return products_data
    finally:
        # Always release the browser, even if a lookup raises
        driver.quit()

Content Management Systems

// Scraping CMS content that has specific metadata
async function scrapeCMSContent(page) {
  // Select articles that have both publication date and author
  const completeArticles = await page.$$eval(
    '.article:has(.publish-date):has(.author-info)',
    articles => articles.map(article => ({
      title: article.querySelector('.article-title')?.textContent,
      author: article.querySelector('.author-info .name')?.textContent,
      publishDate: article.querySelector('.publish-date')?.textContent,
      summary: article.querySelector('.article-summary')?.textContent
    }))
  );

  return completeArticles;
}

Fallback Strategies for Unsupported Browsers

When :has() isn't supported, implement fallback strategies:

JavaScript-Based Fallback

function selectElementsWithChild(parentSelector, childSelector) {
  // Try modern :has() first
  if (typeof CSS !== 'undefined' && CSS.supports('selector(:has(*))')) {
    // Convert the NodeList so both branches return a plain Array
    return Array.from(
      document.querySelectorAll(`${parentSelector}:has(${childSelector})`)
    );
  }

  // Fallback for older browsers
  const parents = document.querySelectorAll(parentSelector);
  return Array.from(parents).filter(parent => 
    parent.querySelector(childSelector) !== null
  );
}

// Usage
const cardsWithButtons = selectElementsWithChild('.card', '.cta-button');

Python Fallback with Beautiful Soup

from bs4 import BeautifulSoup
import requests

def find_elements_with_child(soup, parent_selector, child_selector):
    """Manual equivalent of parent:has(child).

    Beautiful Soup 4.7+ runs select() through Soup Sieve, which understands
    :has() itself, so this helper is only needed on older installations.
    """
    parents = soup.select(parent_selector)
    return [parent for parent in parents
            if parent.select_one(child_selector) is not None]

# Usage
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Find articles that contain images
articles_with_images = find_elements_with_child(
    soup, 'article', 'img'
)
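
Another fallback that works in browser automation tools regardless of :has() support is an XPath predicate, since //parent[.//child] expresses "parent that contains child". The helper below is a sketch limited to plain tag names; the name xpath_has is hypothetical. The standard library's ElementTree implements only a subset of XPath, so the demonstration uses the direct-child form, which that subset supports.

```python
import xml.etree.ElementTree as ET

def xpath_has(parent_tag: str, child_tag: str, direct: bool = False) -> str:
    """Build an XPath roughly equivalent to parent:has(child).

    direct=True corresponds to parent:has(> child).
    Only plain tag names are handled; classes and attributes are not.
    """
    axis = "" if direct else ".//"
    return f"//{parent_tag}[{axis}{child_tag}]"

print(xpath_has("article", "img"))               # //article[.//img]
print(xpath_has("article", "img", direct=True))  # //article[img]

# Hypothetical well-formed fragment to exercise the direct-child form.
doc = ET.fromstring(
    "<body><article id='1'><img/></article>"
    "<article id='2'><p>text</p></article></body>"
)
# ElementTree needs a relative path, hence the leading ".".
matches = doc.findall("." + xpath_has("article", "img", direct=True))
print([el.get("id") for el in matches])  # ['1']
```

With Selenium, the generated expression can be passed to driver.find_elements(By.XPATH, ...) in place of the CSS lookup.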

Best Practices for Web Scraping with :has()

1. Performance Optimization

// Cache query results instead of re-running the same :has() query
const complexSelector = '.container:has(.special):has(.featured)';
const cachedElements = document.querySelectorAll(complexSelector);

// Use more specific selectors when possible
// Good: article:has(> .author)
// Avoid: div:has(.author) // Too broad

2. Error Handling

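from selenium.common.exceptions import InvalidSelectorException
from selenium.webdriver.common.by import By

def safe_has_selector(driver, parent_selector, child_selector):
    """Find parents that contain a child, even if :has() is rejected."""
    try:
        return driver.find_elements(
            By.CSS_SELECTOR, f'{parent_selector}:has({child_selector})'
        )
    except InvalidSelectorException:
        # The browser rejected :has(); filter the parents manually instead
        parents = driver.find_elements(By.CSS_SELECTOR, parent_selector)
        return [parent for parent in parents
                if parent.find_elements(By.CSS_SELECTOR, child_selector)]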

3. Cross-Browser Testing

Browser versions differ in their :has() support, so run your Puppeteer or Selenium scrapers against the oldest browser build you need to support and confirm that the fallback path is taken there.

Integration with Modern Web Scraping Tools

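Because :has() is evaluated by the browser's own selector engine, it works anywhere a tool passes CSS selectors through to the page, such as Puppeteer's page.$$eval() and Selenium's By.CSS_SELECTOR lookups. This allows more precise element targeting when extracting dynamic content.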

Conclusion

The :has() pseudo-class offers genuine parent selection, which makes it particularly valuable for web scraping: a single selector can pick out a container based on the structure of its content, and complex extraction logic becomes easier to write and maintain.

Browser support is now widespread, but production scrapers should still keep a fallback for older browser builds, keep :has() arguments narrow for performance, and test selectors against every browser they run on.
