CSS selectors provide multiple ways to target elements in web scraping and DOM manipulation. Here are the five essential techniques for selecting multiple elements:
1. Comma-Separated Selectors (OR Logic)
Select all elements that match any of several selectors by separating them with commas:
/* Selects all <h1> and <h2> elements */
h1, h2 {
  color: blue;
}
/* Multiple types and classes */
.header, .title, h1, h2 {
  font-weight: bold;
}
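To see the OR logic outside a stylesheet, here is a minimal Beautiful Soup sketch. It assumes bs4 is installed and uses made-up markup. select() is backed by the Soup Sieve library, which returns each matching element once, in document order, regardless of the order the selectors are listed in or how many of them match the same element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <h2> comes before the <h1>,
# and the <h1> matches three of the selectors in the list.
html = """
<section>
  <h2 class="title">Subtitle</h2>
  <h1 class="header title">Page Title</h1>
  <p>Body text</p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# A selector list matches an element if ANY of its selectors match.
matches = soup.select(".header, .title, h1, h2")

# Each element appears once, in document order (not selector order).
texts = [el.get_text() for el in matches]
print(texts)  # ['Subtitle', 'Page Title']
```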
2. Descendant Selector (Nested Elements)
Select elements that are descendants (at any depth) of an ancestor element by separating the two selectors with a space:
/* Selects all <p> elements anywhere inside a <div> */
div p {
  font-size: 16px;
}
/* Multiple levels */
.container article p {
  line-height: 1.5;
}
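The "any depth" part is easy to check with Beautiful Soup. A minimal sketch, assuming bs4 is installed and using hypothetical markup with paragraphs at different nesting levels:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with paragraphs at different nesting depths.
html = """
<div class="container">
  <p>Direct child</p>
  <article>
    <section>
      <p>Nested two levels down</p>
    </section>
  </article>
</div>
<p>Outside the container</p>
"""

soup = BeautifulSoup(html, "html.parser")

# The space (descendant combinator) matches at ANY depth below the ancestor.
in_container = [p.get_text() for p in soup.select(".container p")]
print(in_container)  # ['Direct child', 'Nested two levels down']

# Multiple levels: the <p> must be inside an <article> that is inside .container.
in_article = [p.get_text() for p in soup.select(".container article p")]
print(in_article)  # ['Nested two levels down']
```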
3. Child Selector (Direct Children)
Select only direct children using the > symbol:
/* Selects only direct <p> children of <div> */
div > p {
  margin-left: 20px;
}
/* Useful for avoiding nested elements */
.nav > li > a {
  text-decoration: none;
}
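The .nav > li > a pattern is typically used with nested menus. This sketch (hypothetical markup, bs4 assumed installed) compares it with the plain descendant version to show which link the child combinators exclude:

```python
from bs4 import BeautifulSoup

# Hypothetical navigation with a nested submenu.
html = """
<ul class="nav">
  <li><a href="/home">Home</a></li>
  <li>
    <a href="/docs">Docs</a>
    <ul class="submenu">
      <li><a href="/docs/api">API</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinators: only <a> elements directly inside the top-level <li>s.
top_level = [a.get_text() for a in soup.select(".nav > li > a")]
print(top_level)  # ['Home', 'Docs']

# Descendant combinator for comparison: also picks up the submenu link.
all_links = [a.get_text() for a in soup.select(".nav a")]
print(all_links)  # ['Home', 'Docs', 'API']
```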
4. Adjacent Sibling Selector
Select an element only when it immediately follows another element with the same parent, using +:
/* Selects a <p> that immediately follows an <h2> */
h2 + p {
  font-weight: bold;
}
/* Common pattern: style the paragraph directly after an .intro element */
.intro + p {
  font-size: 18px;
}
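The key word is "immediately": if any other element sits in between, the rule does not apply. A minimal sketch with hypothetical markup, assuming bs4 is installed (whitespace text between tags does not count as an element):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <h2> is followed by a <div>, not a <p>.
html = """
<main>
  <h2>First heading</h2>
  <p>Lead paragraph</p>
  <p>Second paragraph</p>
  <h2>Second heading</h2>
  <div>Callout</div>
  <p>Paragraph after the callout</p>
</main>
"""

soup = BeautifulSoup(html, "html.parser")

# h2 + p matches a <p> only if it is the very next element sibling of an <h2>.
leads = [p.get_text() for p in soup.select("h2 + p")]
print(leads)  # ['Lead paragraph']
```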
5. General Sibling Selector
Select all matching siblings that come after an element (same parent, not necessarily adjacent) using ~:
/* Selects every <p> sibling that follows an <h2> */
h2 ~ p {
  text-decoration: underline;
}
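Two limits of ~ are worth checking: it only looks forward, and only at siblings, not at their children. A minimal sketch, assuming bs4 is installed and using hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with <p> elements before, after, and nested below an <h2>'s siblings.
html = """
<div>
  <p>Before the heading</p>
  <h2>Heading</h2>
  <p>After 1</p>
  <blockquote><p>Nested, not a sibling</p></blockquote>
  <p>After 2</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# h2 ~ p: every <p> that shares the <h2>'s parent and comes after it.
followers = [p.get_text() for p in soup.select("h2 ~ p")]
print(followers)  # ['After 1', 'After 2']
```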
Web Scraping Examples
Python with Beautiful Soup
from bs4 import BeautifulSoup
html = """
<div class="container">
<h1>Main Title</h1>
<p class="intro">Introduction paragraph</p>
<div class="content">
<h2>Section Title</h2>
<p>First paragraph</p>
<p>Second paragraph</p>
<ul>
<li><a href="/page1">Link 1</a></li>
<li><a href="/page2">Link 2</a></li>
</ul>
</div>
</div>
"""
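soup = BeautifulSoup(html, 'html.parser')
# Comma-separated: all h1, h2 and h3 headings
headers = soup.select('h1, h2, h3')
print("Headers:", [h.text for h in headers])
# Output: Headers: ['Main Title', 'Section Title']
# Descendant: all paragraphs anywhere inside .container
paragraphs = soup.select('.container p')
print("Paragraphs:", [p.text for p in paragraphs])
# Output: Paragraphs: ['Introduction paragraph', 'First paragraph', 'Second paragraph']
# Child: only <h1> elements that are direct children of .container
direct_children = soup.select('.container > h1')
print("Direct children:", [c.text for c in direct_children])
# Output: Direct children: ['Main Title']
# Adjacent sibling: the paragraph immediately after h2
after_h2 = soup.select('h2 + p')
print("After H2:", [p.text for p in after_h2])
# Output: After H2: ['First paragraph']
# General sibling: all paragraphs after h2
all_after_h2 = soup.select('h2 ~ p')
print("All after H2:", [p.text for p in all_after_h2])
# Output: All after H2: ['First paragraph', 'Second paragraph']
# Complex combinations: results come back in document order
links = soup.select('ul > li > a, .intro')
print("Links and intro:", [elem.text for elem in links])
# Output: Links and intro: ['Introduction paragraph', 'Link 1', 'Link 2']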
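In real scrapers you usually need attributes as well as text, and you need to cope with selectors that match nothing. A minimal sketch using trimmed, hypothetical markup shaped like the example above: select_one() returns the first match or None, and Tag.get() reads an attribute without raising an error when it is missing.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical markup shaped like the example above.
html = """
<div class="container">
  <h1>Main Title</h1>
  <div class="content">
    <ul>
      <li><a href="/page1">Link 1</a></li>
      <li><a href="/page2">Link 2</a></li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first match or None, so guard before using it.
title_tag = soup.select_one(".container > h1")
title = title_tag.get_text(strip=True) if title_tag else None

# Read attributes from the matched elements, not just their text.
links = [(a.get_text(strip=True), a.get("href")) for a in soup.select("ul > li > a")]

# No match: select_one() gives None, select() gives an empty list.
missing = soup.select_one(".sidebar a")

print(title)    # Main Title
print(links)    # [('Link 1', '/page1'), ('Link 2', '/page2')]
print(missing)  # None
```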
JavaScript (Browser)
// Select all form inputs and textareas
const inputs = document.querySelectorAll('input, textarea, select');
// Select all links within navigation
const navLinks = document.querySelectorAll('nav a, .navigation a');
// Select images that have an alt attribute, plus any image inside a <figure>
const images = document.querySelectorAll('img[alt], figure img');
// Process multiple element types
inputs.forEach(input => {
  input.addEventListener('focus', () => {
    input.style.borderColor = '#007bff';
  });
});
// Get all headers for table of contents
const tocHeaders = document.querySelectorAll('h1, h2, h3, h4, h5, h6');
const toc = Array.from(tocHeaders).map(header => ({
  level: parseInt(header.tagName.charAt(1), 10), // "H2" -> 2
  text: header.textContent,
  id: header.id
}));
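The same table-of-contents idea carries over to scraping. This is a rough Python port with Beautiful Soup in place of the browser DOM, using hypothetical markup; as in the JavaScript version, the level is read from the second character of the tag name (bs4 reports names in lowercase, so "h2" gives 2).

```python
from bs4 import BeautifulSoup

# Hypothetical page with headings at several levels.
html = """
<body>
  <h1 id="intro">Introduction</h1>
  <h2 id="setup">Setup</h2>
  <h3>Details</h3>
  <h2 id="usage">Usage</h2>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# The selector list returns headings in document order, like querySelectorAll.
toc = [
    {
        "level": int(h.name[1]),          # "h2" -> 2
        "text": h.get_text(strip=True),
        "id": h.get("id", ""),            # empty string when absent, like header.id
    }
    for h in soup.select("h1, h2, h3, h4, h5, h6")
]

# Print an indented outline.
for entry in toc:
    print("  " * (entry["level"] - 1) + entry["text"])
```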
Node.js with Cheerio
const cheerio = require('cheerio');
const html = `
<article>
<h1>Article Title</h1>
<div class="meta">
<span class="author">John Doe</span>
<time datetime="2023-01-01">January 1, 2023</time>
</div>
<p>First paragraph</p>
<p>Second paragraph</p>
<div class="related">
<h3>Related Articles</h3>
<a href="/article1">Article 1</a>
<a href="/article2">Article 2</a>
</div>
</article>
`;
const $ = cheerio.load(html);
// Extract article metadata
const metadata = {
  title: $('h1').text(),
  author: $('.meta .author').text(),
  date: $('.meta time').attr('datetime'),
  paragraphs: $('article > p').map((i, el) => $(el).text()).get(),
  relatedLinks: $('.related a').map((i, el) => ({
    text: $(el).text(),
    url: $(el).attr('href')
  })).get()
};
console.log(metadata);
Advanced Selector Combinations
/* Multiple conditions */
div.container > p.highlight,
.sidebar ul > li:first-child,
article h2 + p.summary {
  background-color: #f0f0f0;
}
/* Attribute selectors with multiples */
input[type="text"],
input[type="email"],
textarea {
  border: 1px solid #ccc;
}
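A quick way to check what a combined rule actually hits is to run it through select(). The sketch below assumes bs4 is installed and uses hypothetical markup. Note the untyped <input>: browsers treat it as a text field, but input[type="text"] only tests the attribute as written in the markup, so it does not match.

```python
from bs4 import BeautifulSoup

# Hypothetical form markup.
form_html = """
<form>
  <input type="text" name="name">
  <input type="email" name="email">
  <input type="password" name="pw">
  <input name="untyped">
  <textarea name="msg"></textarea>
</form>
"""

fields = BeautifulSoup(form_html, "html.parser").select(
    'input[type="text"], input[type="email"], textarea'
)
names = [f.get("name") for f in fields]
print(names)  # ['name', 'email', 'msg']

# Hypothetical page for the "multiple conditions" rule.
page_html = """
<div class="container"><p class="highlight">Hi</p><div><p class="highlight">Deep</p></div></div>
<aside class="sidebar"><ul><li>First item</li><li>Second item</li></ul></aside>
<article><h2>Heading</h2><p class="summary">Summary</p><p class="summary">Not adjacent</p></article>
"""

rule = "div.container > p.highlight, .sidebar ul > li:first-child, article h2 + p.summary"
hits = [el.get_text() for el in BeautifulSoup(page_html, "html.parser").select(rule)]
print(hits)  # ['Hi', 'First item', 'Summary']
```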
Performance Tips
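- Specificity: specificity decides which rule wins in the cascade; it does not make matching faster
- Right-to-left: browser CSS engines match selectors from right to left, so the rightmost (key) selector determines how many elements get checked; keep it selective
- Avoid universal selectors: * as the key selector (for example .sidebar *) matches every element and can be slow on large documents
- Use classes over long, complex selector chains when possible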
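To make the right-to-left point concrete, here is a toy matcher for a two-part descendant selector such as div p. It is a simplified illustration written against Beautiful Soup's tree (the function name is hypothetical), not how any browser engine is actually implemented: it collects candidates from the rightmost part first, then walks up each candidate's ancestors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the demonstration.
html = """
<div><section><p>A</p></section></div>
<p>B</p>
<div><p>C</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

def match_descendant_right_to_left(root, ancestor_tag, key_tag):
    """Toy matcher for the selector '<ancestor_tag> <key_tag>' (hypothetical helper).

    Step 1 gathers candidates using only the rightmost (key) selector.
    Step 2 walks up each candidate's ancestors looking for the left part.
    """
    results = []
    for el in root.find_all(key_tag):                 # 1. candidates from the key selector
        if el.find_parent(ancestor_tag) is not None:  # 2. check ancestors upward
            results.append(el)
    return results

manual = [p.get_text() for p in match_descendant_right_to_left(soup, "div", "p")]
builtin = [p.get_text() for p in soup.select("div p")]
print(manual)   # ['A', 'C']
print(builtin)  # ['A', 'C']
```

Because the candidate set comes from the key selector, a rule ending in * starts from every element in the document, which is why a selective rightmost part matters.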
These techniques give you powerful control over element selection in both styling and web scraping scenarios.