CSS (Cascading Style Sheets) selectors are patterns used to select the elements you want to style on a web page. In web scraping, selectors are also used to identify the specific elements you want to extract data from. There are several types of CSS selectors, each with its own use cases:
Simple selectors:
- Type selector: Selects all elements that match the given node name.
div { color: blue; }
- Class selector: Selects all elements that have the specified class attribute.
.alert { font-weight: bold; }
- ID selector: Selects an element based on the value of its id attribute. There should be only one element with a given ID in a document.
#header { background-color: #ff0; }
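For scraping, the same three simple selectors can be passed to BeautifulSoup's select() and select_one(). The following is a minimal sketch; the HTML snippet and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate the selectors
html = """
<div id="header">Site title</div>
<div class="alert">Disk almost full</div>
<p class="alert">Battery low</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Type selector: every <div> element
type_matches = [el.get_text() for el in soup.select("div")]

# Class selector: every element whose class list contains "alert"
class_matches = [el.get_text() for el in soup.select(".alert")]

# ID selector: the single element with id="header"
id_match = soup.select_one("#header").get_text()

print(type_matches)
print(class_matches)
print(id_match)
```

Note that the class selector matches both the div and the p, because it does not constrain the tag name; writing div.alert would narrow it to the div.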
Attribute selectors:
- Presence and value attribute selectors: Selects elements with a certain attribute or with a specific value for an attribute.
/* Presence */
input[required] { border: 1px solid red; }
/* Exact value */
input[type="text"] { width: 200px; }
- Partial value attribute selectors: Select elements whose attribute value contains, begins with, or ends with a specified substring.
/* Substring anywhere */
a[href*="example"] { color: green; }
/* Substring at the beginning */
a[href^="http"] { font-style: italic; }
/* Substring at the end */
a[href$=".com"] { font-weight: bold; }
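Attribute selectors are often the most useful ones for scraping, since links and form fields tend to be identified by their attributes rather than by classes. The sketch below runs each operator against a small, invented HTML fragment; the field names and URLs are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate the selectors
html = """
<input type="text" name="q" required>
<input type="checkbox" name="agree">
<a href="https://example.com/docs">Docs</a>
<a href="/about">About</a>
<a href="http://test.org">Test</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Presence: inputs that carry a "required" attribute, whatever its value
required_names = [el["name"] for el in soup.select("input[required]")]

# Exact value: inputs whose type is exactly "text"
text_names = [el["name"] for el in soup.select('input[type="text"]')]

# Substring anywhere, at the beginning, and at the end of href
contains_example = [a.get_text() for a in soup.select('a[href*="example"]')]
starts_with_http = [a.get_text() for a in soup.select('a[href^="http"]')]
ends_with_org = [a.get_text() for a in soup.select('a[href$=".org"]')]

print(required_names, text_names)
print(contains_example, starts_with_http, ends_with_org)
```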
Pseudo-classes:
- Pseudo-classes select elements based on a special state (such as being hovered or focused) or on their position in the document tree (such as :nth-child). For example:
a:hover { color: red; }
input:focus { border: 2px solid blue; }
li:nth-child(odd) { background-color: #eee; }
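When scraping statically parsed HTML, interaction-based pseudo-classes such as :hover and :focus have nothing meaningful to match, because no user is interacting with the page. Position-based pseudo-classes, however, work well for picking rows or list items. A minimal sketch with an invented list:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate the selectors
html = "<ul><li>one</li><li>two</li><li>three</li><li>four</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# :nth-child(odd) matches the 1st, 3rd, 5th, ... child of its parent
odd_items = [li.get_text() for li in soup.select("li:nth-child(odd)")]

# :last-child matches an element that is the final child of its parent
last_item = soup.select_one("li:last-child").get_text()

print(odd_items)
print(last_item)
```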
Pseudo-elements:
- Pseudo-elements are used to style specified parts of an element.
p::first-line { font-weight: bold; }
p::first-letter { font-size: 130%; }
::before { content: "Note: "; font-weight: bold; }
::after { content: "."; }
Combinators:
- Descendant combinator: Selects all elements that are descendants of a specified element.
div p { color: red; }
- Child combinator: Selects all elements that are the direct children of a specified element.
ul > li { border: 1px solid blue; }
- Adjacent sibling combinator: Selects an element that is directly after another specific element.
h1 + p { font-size: 18px; }
- General sibling combinator: Selects all siblings that come after a specified element and share its parent, whether or not they follow it immediately.
h1 ~ p { color: green; }
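The difference between the four combinators is easiest to see by running them against the same markup. The sketch below uses an invented fragment in which one paragraph is nested inside a section, so it is a descendant of the div but neither a direct child of it nor a sibling of the h1.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate the combinators
html = """
<div>
  <h1>Title</h1>
  <p>Intro</p>
  <section><p>Nested</p></section>
  <p>Outro</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def texts(selector):
    """Return the text of every element matched by the selector."""
    return [el.get_text() for el in soup.select(selector)]

descendants = texts("div p")    # any <p> anywhere inside the <div>
children = texts("div > p")     # only <p> elements directly under the <div>
adjacent = texts("h1 + p")      # the <p> immediately following the <h1>
siblings = texts("h1 ~ p")      # every later <p> sharing the <h1>'s parent

print(descendants, children, adjacent, siblings)
```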
Grouping selectors:
- Grouping selectors are used to apply the same style to multiple selectors.
h1, h2, h3 { font-family: Arial, sans-serif; }
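In scraping, a grouped selector collects several kinds of elements in a single query. The short sketch below, using invented markup, also illustrates that the matches come back in document order rather than in the order the selectors were listed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate grouping
html = "<h2>Sub</h2><h1>Main</h1><p>Body text</p><h3>Minor</h3>"
soup = BeautifulSoup(html, "html.parser")

# One query for all three heading levels; the <p> is not matched
headings = [(el.name, el.get_text()) for el in soup.select("h1, h2, h3")]

print(headings)
```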
In web scraping, these selectors can be used with tools like BeautifulSoup in Python (through its select and select_one methods) or querySelector/querySelectorAll in JavaScript to locate DOM elements and read data from them.
Here's a simple example of using CSS selectors in Python with BeautifulSoup:
from bs4 import BeautifulSoup
import requests
# Fetch the webpage
response = requests.get('https://example.com', timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
html = response.text
# Parse the webpage
soup = BeautifulSoup(html, 'html.parser')
# Use CSS selectors to find elements
articles = soup.select('div.articles > article')
# Extract data from elements
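for article in articles:
    title_el = article.select_one('h2.title')
    # select_one returns None when nothing matches, so check before reading the text
    if title_el is not None:
        print(title_el.get_text(strip=True))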
And here's an example of using CSS selectors in JavaScript:
// Use CSS selectors to find elements
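const articles = document.querySelectorAll('div.articles > article');
// Extract data from elements
articles.forEach((article) => {
  const titleEl = article.querySelector('h2.title');
  // querySelector returns null when nothing matches, so check before reading the text
  if (titleEl) {
    console.log(titleEl.textContent.trim());
  }
});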
Each CSS selector type serves a different purpose, and they can be combined to create more complex selection paths. This makes them a powerful tool for both styling and scraping web content.
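As a rough illustration of combining selector types, the sketch below chains a class selector, a child combinator, an attribute presence test, a descendant combinator, and a prefix attribute match into one path. The markup, the data-id attribute, and the URLs are all invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to demonstrate a combined selector
html = """
<div class="articles">
  <article data-id="1"><h2 class="title"><a href="/posts/one">First post</a></h2></article>
  <article><h2 class="title"><a href="/posts/two">Second post</a></h2></article>
  <article data-id="3"><h2 class="title"><a href="https://other.example/three">Third post</a></h2></article>
</div>
<article data-id="4"><h2 class="title"><a href="/posts/four">Outside</a></h2></article>
"""
soup = BeautifulSoup(html, "html.parser")

# Site-relative title links of articles that have a data-id
# and sit directly inside div.articles
selector = 'div.articles > article[data-id] h2.title > a[href^="/"]'
links = [(a.get_text(), a["href"]) for a in soup.select(selector)]

print(links)
```

Only the first article satisfies every condition: the second has no data-id, the third links to an absolute URL, and the fourth is not a child of div.articles.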