What is the :has() pseudo-class and is it supported in web scraping?
The :has() pseudo-class is a CSS Selectors Level 4 feature that enables "parent selection": it matches an element based on what that element contains. Often called the "parent selector," it simplifies many web scraping and DOM manipulation tasks because you can select a container by its contents instead of walking up the tree from a child.
Understanding the :has() Pseudo-Class
The :has() pseudo-class selects elements that contain specific child elements or descendants. Its syntax follows the pattern:
parent:has(child-selector)
This selector targets the parent element when it contains a descendant matching the specified selector. To require a direct child rather than any descendant, write parent:has(> child-selector).
Basic Syntax Examples
/* Select divs that contain an h2 element */
div:has(h2)
/* Select articles that contain both an image and a paragraph */
article:has(img):has(p)
/* Select list items that contain a link with specific class */
li:has(a.external-link)
/* Select sections that don't contain any images */
section:not(:has(img))
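To make the matching rules concrete, the following is a minimal sketch that emulates two of the selectors above with Python's standard library. It is not a CSS engine: the sample markup, the element ids, and the helper name has_descendant are all assumptions made for illustration, and ElementTree only accepts well-formed XML, so real pages would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (illustration only).
SAMPLE = """
<body>
  <article id="a1"><img src="x.png"/><p>Text</p></article>
  <article id="a2"><p>No image here</p></article>
  <section id="s1"><div><img src="y.png"/></div></section>
  <section id="s2"><p>Plain section</p></section>
</body>
"""


def has_descendant(element, tag):
    """Return True if `element` contains a descendant with the given tag."""
    return element.find(f".//{tag}") is not None


root = ET.fromstring(SAMPLE)

# article:has(img):has(p) -> both conditions must hold for the same article
articles = [a.get("id") for a in root.iter("article")
            if has_descendant(a, "img") and has_descendant(a, "p")]

# section:not(:has(img)) -> keep sections without any img descendant
sections = [s.get("id") for s in root.iter("section")
            if not has_descendant(s, "img")]

print(articles)  # ['a1']
print(sections)  # ['s2']
```

Note that the section with id s1 is excluded even though its image is nested inside a div: without a combinator, :has() looks at all descendants, not only direct children.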
Browser Support and Compatibility
Current Browser Support
As of 2024, all major browser engines support the :has() pseudo-class in their current releases, but the version that introduced it differs:
- Chrome/Edge: Full support since version 105 (late August / early September 2022)
- Firefox: Full support since version 121 (December 2023)
- Safari: Full support since version 15.4 (March 2022)
- Mobile browsers: Generally supported in modern versions
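When a scraper may run against different browser builds, the version numbers listed above can be turned into a simple gate. The sketch below is one way to do that in plain Python; the names MIN_HAS_VERSIONS, parse_version, and supports_has are hypothetical, and the extra Edge aliases are an assumption about how drivers may report the browser name.

```python
# Minimum versions taken from the support list above.
# The Edge aliases are assumptions: drivers may report the name differently.
MIN_HAS_VERSIONS = {
    "chrome": (105,),
    "edge": (105,),
    "msedge": (105,),
    "microsoftedge": (105,),
    "firefox": (121,),
    "safari": (15, 4),
}


def parse_version(version):
    """Turn '15.4' or '121.0.1' into a tuple of ints for comparison."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at non-numeric suffixes such as 'beta'
        parts.append(int(piece))
    return tuple(parts)


def supports_has(browser_name, browser_version):
    """Best-effort check: True if the browser should support :has()."""
    minimum = MIN_HAS_VERSIONS.get(browser_name.strip().lower())
    if minimum is None:
        return False  # unknown browser: conservatively assume no support
    return parse_version(browser_version) >= minimum


print(supports_has("chrome", "105.0.5195.52"))  # True
print(supports_has("firefox", "120.0"))         # False
```

With Selenium, the inputs would typically come from driver.capabilities (the browserName and browserVersion entries). A version table like this is only a heuristic; a runtime check inside the page remains the more reliable test.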
Checking Support Programmatically
// Check if :has() is supported
function hasSupport() {
  try {
    // Unsupported browsers throw a SyntaxError for an unknown pseudo-class
    document.querySelector(':has(div)');
    return true;
  } catch (e) {
    return false;
  }
}

console.log('Has support:', hasSupport());
Web Scraping with :has() Pseudo-Class
Python Examples with Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

# Setup Chrome with modern version
options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

try:
    driver.get('https://example.com')

    # Select articles that contain both title and author
    articles = driver.find_elements(
        By.CSS_SELECTOR,
        'article:has(h2):has(.author)'
    )

    # Select product cards that have discount badges
    discounted_products = driver.find_elements(
        By.CSS_SELECTOR,
        '.product-card:has(.discount-badge)'
    )

    # Select navigation items with dropdown menus
    dropdown_navs = driver.find_elements(
        By.CSS_SELECTOR,
        'nav li:has(ul.dropdown)'
    )

    for article in articles:
        title = article.find_element(By.CSS_SELECTOR, 'h2').text
        author = article.find_element(By.CSS_SELECTOR, '.author').text
        print(f"Title: {title}, Author: {author}")
finally:
    driver.quit()
JavaScript Examples with Puppeteer
const puppeteer = require('puppeteer');

async function scrapeWithHasSelector() {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto('https://example.com');

    // Select blog posts that contain featured images
    const featuredPosts = await page.$$eval(
      '.blog-post:has(.featured-image)',
      posts => posts.map(post => ({
        title: post.querySelector('h2')?.textContent,
        image: post.querySelector('.featured-image img')?.src
      }))
    );

    // Select form fields that have validation errors
    const errorFields = await page.$$eval(
      '.form-group:has(.error-message)',
      groups => groups.map(group => ({
        fieldName: group.querySelector('label')?.textContent,
        errorMessage: group.querySelector('.error-message')?.textContent
      }))
    );

    // Select cards that contain call-to-action buttons
    const actionableCards = await page.$$eval(
      '.card:has(.cta-button)',
      cards => cards.map(card => ({
        title: card.querySelector('.card-title')?.textContent,
        ctaText: card.querySelector('.cta-button')?.textContent
      }))
    );

    console.log('Featured posts:', featuredPosts);
    console.log('Error fields:', errorFields);
    console.log('Actionable cards:', actionableCards);
  } finally {
    // Close the browser even if navigation or extraction fails
    await browser.close();
  }
}

scrapeWithHasSelector().catch(console.error);
Advanced :has() Selector Patterns
Complex Nested Selections
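/* Select tables where a single row contains both a price and a discount cell.
   :has() cannot be nested inside another :has(), so sibling combinators
   inside one :has() express "same row" instead. */
table:has(td.price ~ td.discount, td.discount ~ td.price)
/* Select containers with multiple media types */
.media-container:has(img):has(video):has(.caption)
/* Select forms with required fields that have errors
   (a descendant combinator replaces the invalid nested :has()) */
form:has(.required .error)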
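Browsers reject a :has() that is nested inside another :has() (Selectors Level 4 disallows it), so a scraper that assembles selectors from configuration may want to catch that mistake before it reaches the driver. The helper below is a hypothetical sketch of such a guard; build_has_selector is not part of any library, and its substring check is deliberately simple.

```python
def build_has_selector(parent, conditions):
    """Build 'parent:has(a):has(b)' from a parent selector and conditions.

    Raises ValueError if a condition is empty or itself contains ':has(',
    because nested :has() is invalid in browsers.
    """
    parts = [parent]
    for condition in conditions:
        condition = condition.strip()
        if not condition:
            raise ValueError("empty :has() condition")
        if ":has(" in condition:
            raise ValueError(f"nested :has() is not allowed: {condition!r}")
        parts.append(f":has({condition})")
    return "".join(parts)


print(build_has_selector("article", ["h2", ".author"]))
# article:has(h2):has(.author)
print(build_has_selector("form", [".required .error"]))
# form:has(.required .error)
```

Chaining :has() calls means every condition must hold for the same parent, while a descendant or sibling combinator inside a single :has() covers most cases where nesting looks tempting.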
Combining with Other Pseudo-Classes
/* Select first child elements that contain specific content */
.item:first-child:has(.featured-badge)
/* Select hover-able elements that contain interactive content */
.card:hover:has(button, a, input)
/* Select elements that don't have certain children */
.product:not(:has(.out-of-stock))
Performance Considerations
The :has() pseudo-class can be computationally expensive, especially with complex selectors:
// More efficient - specific targeting
document.querySelectorAll('article:has(> .author)')
// Less efficient - broad descendant search
document.querySelectorAll('div:has(.deeply-nested-element)')
// Optimize by limiting scope
const container = document.querySelector('#main-content');
const results = container.querySelectorAll('.item:has(.special-badge)');
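The difference between the child form :has(> x) and the descendant form :has(x) can be illustrated by counting how many nodes each check could have to inspect. The sketch below does this with the standard library on hypothetical markup; it only models the size of the search space, and real browser engines use their own optimizations, so treat the numbers as intuition rather than a benchmark.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one author as a direct child, one deeply nested.
SAMPLE = """
<main>
  <article id="direct"><span class="author">Ann</span></article>
  <article id="nested"><div><div><span class="author">Bob</span></div></div></article>
</main>
"""


def has_child_with_class(element, cls):
    """Rough analogue of :has(> .cls): inspect direct children only.

    Compares the whole class attribute, so it assumes a single class name.
    """
    return element.find(f"./*[@class='{cls}']") is not None


def has_descendant_with_class(element, cls):
    """Rough analogue of :has(.cls): search the whole subtree."""
    return element.find(f".//*[@class='{cls}']") is not None


root = ET.fromstring(SAMPLE)
results = {}
for article in root.iter("article"):
    results[article.get("id")] = {
        "child_match": has_child_with_class(article, "author"),
        "descendant_match": has_descendant_with_class(article, "author"),
        "children_to_check": len(list(article)),
        "descendants_to_check": sum(1 for _ in article.iter()) - 1,
    }

for article_id, info in results.items():
    print(article_id, info)
```

The nested article matches only the descendant form, and its descendant search space is larger than its list of direct children. On real pages with deep trees that gap grows quickly, which is why the child combinator is the cheaper choice when the page structure allows it.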
Practical Web Scraping Applications
E-commerce Product Scraping
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_products_with_reviews():
    driver = webdriver.Chrome()
    try:
        driver.get('https://shop.example.com')

        # Only scrape products that have customer reviews
        products_with_reviews = driver.find_elements(
            By.CSS_SELECTOR,
            '.product-item:has(.reviews-section):has(.rating-stars)'
        )

        products_data = []
        for product in products_with_reviews:
            data = {
                'name': product.find_element(By.CSS_SELECTOR, '.product-name').text,
                'price': product.find_element(By.CSS_SELECTOR, '.price').text,
                'rating': product.find_element(By.CSS_SELECTOR, '.rating-stars').get_attribute('data-rating'),
                'review_count': product.find_element(By.CSS_SELECTOR, '.review-count').text
            }
            products_data.append(data)
        return products_data
    finally:
        # Always release the browser, even if an element lookup fails
        driver.quit()
Content Management Systems
// Scraping CMS content that has specific metadata
async function scrapeCMSContent(page) {
// Select articles that have both publication date and author
const completeArticles = await page.$$eval(
'.article:has(.publish-date):has(.author-info)',
articles => articles.map(article => ({
title: article.querySelector('.article-title')?.textContent,
author: article.querySelector('.author-info .name')?.textContent,
publishDate: article.querySelector('.publish-date')?.textContent,
summary: article.querySelector('.article-summary')?.textContent
}))
);
return completeArticles;
}
Fallback Strategies for Unsupported Browsers
When :has() isn't supported, implement fallback strategies:
JavaScript-Based Fallback
function selectElementsWithChild(parentSelector, childSelector) {
  // Try native :has() first
  if (CSS.supports('selector(:has(*))')) {
    // Convert to an array so both branches return the same type
    return Array.from(
      document.querySelectorAll(`${parentSelector}:has(${childSelector})`)
    );
  }
  // Fallback for older browsers: filter parents manually
  const parents = document.querySelectorAll(parentSelector);
  return Array.from(parents).filter(parent =>
    parent.querySelector(childSelector) !== null
  );
}

// Usage
const cardsWithButtons = selectElementsWithChild('.card', '.cta-button');
Python Fallback with Beautiful Soup
from bs4 import BeautifulSoup
import requests

def find_elements_with_child(soup, parent_selector, child_selector):
    """Fallback implementation for :has() functionality.

    Note: Beautiful Soup 4.7+ selects via Soup Sieve, which supports :has()
    directly, e.g. soup.select('article:has(img)'). This helper is only
    needed for older versions.
    """
    parents = soup.select(parent_selector)
    return [parent for parent in parents
            if parent.select_one(child_selector) is not None]

# Usage
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

# Find articles that contain images
articles_with_images = find_elements_with_child(
    soup, 'article', 'img'
)
Best Practices for Web Scraping with :has()
1. Performance Optimization
// Cache selectors to avoid repeated DOM queries
const complexSelector = '.container:has(.special):has(.featured)';
const cachedElements = document.querySelectorAll(complexSelector);
// Use more specific selectors when possible
// Good: article:has(> .author)
// Avoid: div:has(.author) // Too broad
2. Error Handling
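from selenium.common.exceptions import InvalidSelectorException
from selenium.webdriver.common.by import By

def safe_has_selector(driver, parent_selector, child_selector):
    """Select parents containing a matching descendant, with a non-:has() fallback."""
    try:
        return driver.find_elements(
            By.CSS_SELECTOR, f'{parent_selector}:has({child_selector})'
        )
    except InvalidSelectorException:
        # Fallback for browsers without :has(): filter parents manually
        parents = driver.find_elements(By.CSS_SELECTOR, parent_selector)
        return [p for p in parents
                if p.find_elements(By.CSS_SELECTOR, child_selector)]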
3. Cross-Browser Testing
When handling browser sessions in Puppeteer, ensure your scraping logic works across different browser versions that may have varying :has() support.
Integration with Modern Web Scraping Tools
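The :has() pseudo-class works with modern browser-automation tools such as Puppeteer and Selenium because the selector is ultimately evaluated by the browser. When interacting with DOM elements in Puppeteer, it provides more precise element targeting for dynamic content extraction.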
Conclusion
The :has() pseudo-class represents a significant advancement in CSS selector capabilities, offering parent selection that is particularly valuable for web scraping. With support in all major browsers and practical fallback strategies available, it is becoming an essential tool for developers who need to select elements based on their content structure.
Even though browser support is now widespread, keep fallbacks and performance in mind when using :has() in production web scraping applications. The selector's ability to target parent elements based on child content makes complex data extraction scenarios much more manageable and maintainable.
Remember to test your selectors across different browsers and implement graceful degradation for maximum compatibility in your web scraping projects.