How to Select Elements by CSS Selector in SwiftSoup

SwiftSoup is a pure Swift HTML parser with CSS selector support, which makes it straightforward to extract specific elements from HTML documents in iOS and other Swift applications. Selectors work much as they do in web development: you describe the elements you want, and SwiftSoup returns the matches.

Basic Element Selection

Installing SwiftSoup

First, add SwiftSoup to your iOS project using Swift Package Manager:

// Add to Package.swift dependencies
.package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
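
The dependency line above only declares the package; the target that imports SwiftSoup also has to list it. A minimal manifest sketch follows, assuming a command-line package. The package and target name `MyScraper` is hypothetical, and in an Xcode app project you would add the same URL through Xcode's package dependency UI instead of editing a manifest.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyScraper", // hypothetical package name
    dependencies: [
        // Same dependency line as above
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
    ],
    targets: [
        .executableTarget(
            name: "MyScraper", // hypothetical target name
            dependencies: [
                // Makes `import SwiftSoup` available inside this target
                .product(name: "SwiftSoup", package: "SwiftSoup")
            ]
        )
    ]
)
```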

Basic Usage Pattern

Here's the fundamental approach to selecting elements with CSS selectors in SwiftSoup:

import SwiftSoup

do {
    let html = """
    <html>
        <body>
            <div class="container">
                <h1 id="title">Welcome</h1>
                <p class="intro">Introduction paragraph</p>
                <ul class="list">
                    <li class="item">Item 1</li>
                    <li class="item">Item 2</li>
                </ul>
            </div>
        </body>
    </html>
    """

    let document = try SwiftSoup.parse(html)

    // Select elements using a CSS selector (here: every <li> with class "item")
    let elements = try document.select("li.item")

    for element in elements {
        // text() can throw, so it needs `try`
        print(try element.text())
    }
} catch {
    print("Error: \(error)")
}

Common CSS Selector Types

1. Tag Selectors

Select elements by their HTML tag name:

// Select all paragraph elements
let paragraphs = try document.select("p")

// Select all div elements
let divs = try document.select("div")

// Select all anchor tags
let links = try document.select("a")
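
Selecting by tag is usually the first step; the next is reading something off each match. The sketch below collects the text and `href` of every anchor. The sample markup and the helper name `linkPairs` are made up for illustration.

```swift
import SwiftSoup

// Hypothetical sample markup
let linkHTML = """
<p>See <a href="/docs">the docs</a> or <a href="https://example.com">the site</a>.</p>
"""

/// Returns a (text, href) pair for every <a> element in the given HTML.
func linkPairs(in html: String) throws -> [(text: String, href: String)] {
    let document = try SwiftSoup.parse(html)
    var pairs: [(text: String, href: String)] = []
    for link in try document.select("a") {
        // Both text() and attr(_:) can throw
        pairs.append((text: try link.text(), href: try link.attr("href")))
    }
    return pairs
}
```

Calling `try linkPairs(in: linkHTML)` on the sample yields two pairs, the first being `"the docs"` with `"/docs"`.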

2. Class Selectors

Select elements by their CSS class using the dot (.) prefix:

// Select elements with class "container"
let containers = try document.select(".container")

// Select elements with class "nav-item"
let navItems = try document.select(".nav-item")

// Select multiple classes (elements must have both classes)
let specificItems = try document.select(".item.active")
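
To see how single-class, multi-class, and grouped selectors differ, it can help to run them against the same markup. This is a small sketch; the sample HTML and the helper `texts(matching:in:)` are hypothetical.

```swift
import SwiftSoup

// Hypothetical sample markup
let menuHTML = """
<ul>
  <li class="item">Home</li>
  <li class="item active">Docs</li>
  <li class="active">Banner</li>
</ul>
"""

/// Returns the text of every element matching `selector`.
func texts(matching selector: String, in html: String) throws -> [String] {
    let document = try SwiftSoup.parse(html)
    return try document.select(selector).map { try $0.text() }
}
```

With the sample above, `.item` matches Home and Docs, `.item.active` matches only Docs (both classes are required), and the comma-separated group `.item, .active` matches all three items.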

3. ID Selectors

Select elements by their ID using the hash (#) prefix:

// Select element with ID "header"
let header = try document.select("#header")

// Select element with ID "main-content"
let mainContent = try document.select("#main-content")
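
Note that `select` returns an `Elements` collection even when the selector is an ID, so you normally take the first match. A minimal sketch, with a hypothetical helper name and sample markup:

```swift
import SwiftSoup

// Hypothetical sample markup
let pageHTML = """
<div id="header">Site title</div>
<div id="main-content">Body text</div>
"""

/// Returns the text of the element with ID "header", or nil if there is none.
func headerText(in html: String) throws -> String? {
    let document = try SwiftSoup.parse(html)
    // first() is nil when nothing matched, so the optional chain stops there
    return try document.select("#header").first()?.text()
}
```

Returning an optional keeps "no such element" separate from "element with empty text", which the `?? ""` pattern would merge.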

4. Attribute Selectors

Select elements based on their attributes:

// Elements with specific attribute
let elementsWithHref = try document.select("[href]")

// Elements with specific attribute value
let externalLinks = try document.select("[target='_blank']")

// Elements with attribute containing specific text
let mailtoLinks = try document.select("[href*='mailto:']")

// Elements with attribute starting with specific text
let httpsLinks = try document.select("[href^='https://']")

// Elements with attribute ending with specific text
let pdfLinks = try document.select("[href$='.pdf']")
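
Attribute selectors can be combined with a tag name and chained with each other. The sketch below runs a few of them against made-up markup; the helper `hrefs(matching:in:)` is hypothetical.

```swift
import SwiftSoup

// Hypothetical sample markup
let linksHTML = """
<a href="https://example.com/report.pdf">Report</a>
<a href="mailto:team@example.com">Email</a>
<a href="/about" target="_blank">About</a>
<a name="top">Anchor without href</a>
"""

/// Returns the href value of every element matching `selector`.
func hrefs(matching selector: String, in html: String) throws -> [String] {
    let document = try SwiftSoup.parse(html)
    return try document.select(selector).map { try $0.attr("href") }
}
```

On this sample, `a[href]` skips the anchor that has no `href` and returns three values, `a[href^='https://'][href$='.pdf']` returns only the report link, and `a[target='_blank']` returns `/about`.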

Advanced CSS Selectors

Combinators and Hierarchical Selection

// Descendant combinator (space) - select nested elements
let nestedItems = try document.select("ul li")

// Child combinator (>) - select direct children only
let directChildren = try document.select("ul > li")

// Adjacent sibling combinator (+) - select immediately following sibling
let nextSibling = try document.select("h1 + p")

// General sibling combinator (~) - select all following siblings
let allSiblings = try document.select("h1 ~ p")
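
The difference between these combinators is easiest to see by counting matches on nested markup. A small sketch follows; the sample HTML and the helper `matchCount` are hypothetical.

```swift
import SwiftSoup

// Hypothetical sample markup: a nested list followed by a heading and two paragraphs
let nestedHTML = """
<ul id="menu">
  <li>Fruit
    <ul>
      <li>Apple</li>
      <li>Pear</li>
    </ul>
  </li>
  <li>Bread</li>
</ul>
<h1>Title</h1>
<p>First</p>
<p>Second</p>
"""

/// Returns how many elements match `selector`.
func matchCount(_ selector: String, in html: String) throws -> Int {
    let document = try SwiftSoup.parse(html)
    return try document.select(selector).array().count
}
```

Here `#menu li` counts 4 (nested items included), `#menu > li` counts 2 (direct children only), `h1 + p` counts 1 (just the paragraph right after the heading), and `h1 ~ p` counts 2.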

Pseudo-selectors

SwiftSoup supports various pseudo-selectors for more precise targeting:

// First child element
let firstItem = try document.select("li:first-child")

// Last child element
let lastItem = try document.select("li:last-child")

// Nth child (1-based indexing)
let thirdItem = try document.select("li:nth-child(3)")

// Even/odd children
let evenItems = try document.select("li:nth-child(even)")
let oddItems = try document.select("li:nth-child(odd)")

// Elements containing specific text (case-insensitive; ancestors of the matching element match too)
let elementsWithText = try document.select(":contains(Welcome)")

// Empty elements
let emptyElements = try document.select(":empty")
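
One thing to watch with `:contains` is that it tests an element's full text, including the text of its descendants, so the ancestors of the element you care about match as well. SwiftSoup, like jsoup, also offers `:containsOwn`, which tests only the element's own text. A sketch with hypothetical sample markup and helper name:

```swift
import SwiftSoup

// Hypothetical sample markup
let greetingHTML = """
<div class="container"><h1 id="title">Welcome</h1><p>Other text</p></div>
"""

/// Returns the tag name of every element matching `selector`.
func tagNames(matching selector: String, in html: String) throws -> [String] {
    let document = try SwiftSoup.parse(html)
    return try document.select(selector).map { $0.tagName() }
}
```

With this sample, `:containsOwn(Welcome)` matches only the `h1`, while `:contains(Welcome)` also matches the enclosing `div` and the elements above it. Prefixing a tag, as in `h1:contains(Welcome)`, is another way to narrow the result.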

Practical Examples

Extracting Article Information

let html = """
<article class="blog-post">
    <header>
        <h1 class="title">Understanding SwiftSoup</h1>
        <div class="meta">
            <span class="author">John Doe</span>
            <span class="date">2024-01-15</span>
        </div>
    </header>
    <div class="content">
        <p class="intro">This article covers SwiftSoup basics...</p>
        <p>More content here...</p>
    </div>
    <footer>
        <div class="tags">
            <span class="tag">swift</span>
            <span class="tag">html</span>
        </div>
    </footer>
</article>
"""

do {
    let document = try SwiftSoup.parse(html)

    // Extract article title
    let title = try document.select(".blog-post .title").first()?.text() ?? ""

    // Extract author
    let author = try document.select(".meta .author").first()?.text() ?? ""

    // Extract publication date
    let date = try document.select(".meta .date").first()?.text() ?? ""

    // Extract all paragraphs from content
    let contentParagraphs = try document.select(".content p")

    // Extract all tags
    let tags = try document.select(".tags .tag").map { try $0.text() }

    print("Title: \(title)")
    print("Author: \(author)")
    print("Date: \(date)")
    print("Tags: \(tags.joined(separator: ", "))")

} catch {
    print("Parsing error: \(error)")
}

Working with Tables

let tableHTML = """
<table class="data-table">
    <thead>
        <tr>
            <th>Name</th>
            <th>Age</th>
            <th>City</th>
        </tr>
    </thead>
    <tbody>
        <tr class="person" data-id="1">
            <td class="name">Alice</td>
            <td class="age">25</td>
            <td class="city">New York</td>
        </tr>
        <tr class="person" data-id="2">
            <td class="name">Bob</td>
            <td class="age">30</td>
            <td class="city">London</td>
        </tr>
    </tbody>
</table>
"""

do {
    let document = try SwiftSoup.parse(tableHTML)

    // Extract table headers
    let headers = try document.select("thead th").map { try $0.text() }

    // Extract all data rows
    let rows = try document.select("tbody tr.person")

    for row in rows {
        let id = try row.attr("data-id")
        let name = try row.select(".name").first()?.text() ?? ""
        let age = try row.select(".age").first()?.text() ?? ""
        let city = try row.select(".city").first()?.text() ?? ""

        print("ID: \(id), Name: \(name), Age: \(age), City: \(city)")
    }

} catch {
    print("Table parsing error: \(error)")
}
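
When cells do not carry convenient classes such as `.name`, an alternative is to pair each cell with its column header by position. The following is a sketch of that approach, not the only way to do it; the function name `tableRecords` and the trimmed sample table are assumptions for illustration.

```swift
import SwiftSoup

// Trimmed-down sample table (hypothetical)
let sampleTable = """
<table>
  <thead><tr><th>Name</th><th>City</th></tr></thead>
  <tbody>
    <tr><td>Alice</td><td>New York</td></tr>
    <tr><td>Bob</td><td>London</td></tr>
  </tbody>
</table>
"""

/// Turns each body row into a dictionary keyed by the column headers.
func tableRecords(fromHTML html: String) throws -> [[String: String]] {
    let document = try SwiftSoup.parse(html)
    let headers = try document.select("thead th").map { try $0.text() }
    var records: [[String: String]] = []
    for row in try document.select("tbody tr") {
        let cells = try row.select("td").map { try $0.text() }
        // zip stops at the shorter sequence, so a short row just yields fewer keys;
        // if two headers share a name, the first column wins
        let record = Dictionary(zip(headers, cells), uniquingKeysWith: { first, _ in first })
        records.append(record)
    }
    return records
}
```

For the sample table, `try tableRecords(fromHTML: sampleTable)` produces two records, and the first one maps `"Name"` to `"Alice"` and `"City"` to `"New York"`.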

Error Handling and Best Practices

Robust Element Selection

extension Document {
    func safeSelect(_ selector: String) -> Elements? {
        do {
            let elements = try self.select(selector)
            return elements.isEmpty() ? nil : elements
        } catch {
            print("Selector error for '\(selector)': \(error)")
            return nil
        }
    }
}

// Usage
if let elements = document.safeSelect(".content p") {
    for element in elements {
        // Parenthesize try? so that ?? supplies the fallback instead of printing an Optional
        print((try? element.text()) ?? "")
    }
}

Chaining Selectors Safely

func extractNestedContent(from document: Document, selector: String) -> [String] {
    do {
        return try document.select(selector).compactMap { element -> String? in
            // Read the text once and drop elements that have none
            let text = try element.text()
            return text.isEmpty ? nil : text
        }
    } catch {
        print("Error extracting content with selector '\(selector)': \(error)")
        return []
    }
}

// Usage
let contentItems = extractNestedContent(from: document, selector: ".article .content p")

Performance Considerations

Efficient Selector Usage

// Prefer selectors that describe exactly what you need
// Specific: returns only the intro paragraph inside the article body
let specificElements = try document.select("article.blog-post .content p.intro")

// Broad: returns every <p> in the document and leaves the filtering to your code
let broadElements = try document.select("p")

// Use parent context when selecting multiple related elements
let article = try document.select("article.blog-post").first()
if let articleElement = article {
    let title = try articleElement.select(".title").first()?.text()
    let content = try articleElement.select(".content p")
}

Caching Selected Elements

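struct ArticleParser {
    private let document: Document
    // Looked up once in init and reused by every accessor.
    // (A lazy var on a struct would force the accessors below to be `mutating`.)
    private let articleElement: Element?

    init(html: String) throws {
        let document = try SwiftSoup.parse(html)
        self.document = document
        self.articleElement = try document.select("article.blog-post").first()
    }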

    func getTitle() -> String? {
        try? articleElement?.select(".title").first()?.text()
    }

    func getContent() -> [String] {
        guard let article = articleElement else { return [] }
        return (try? article.select(".content p").map { try $0.text() }) ?? []
    }
}

Integration with Web Scraping Workflows

SwiftSoup parses the HTML it is given; it does not execute JavaScript. For pages whose content is rendered client-side, the markup you download may not contain the elements you want to select. In that case you may need a server-side renderer or a web scraping API that returns the rendered HTML, which you can then pass to SwiftSoup.

SwiftSoup excels at parsing static HTML content efficiently, making it ideal for scenarios where you need fast, reliable element extraction without the overhead of a full browser engine. This approach is particularly valuable when handling multiple pages or implementing systematic data extraction workflows in iOS applications.
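
As a rough sketch of such a workflow, the parsing step can be kept as a pure function and the download wrapped around it. The function names below are hypothetical. The sketch assumes UTF-8 pages and a runtime with async `URLSession` support (iOS 15 or macOS 12 and later), and it does no status-code checking or retrying.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession lives here on Linux
#endif
import SwiftSoup

/// Pure parsing step: returns the text of every element matching `selector`.
func extractTexts(from html: String, selector: String) throws -> [String] {
    let document = try SwiftSoup.parse(html)
    return try document.select(selector).map { try $0.text() }
}

/// Downloads a page and runs the selector against the returned HTML.
/// Assumes the response body is UTF-8 encoded static HTML.
func fetchTexts(from url: URL, selector: String) async throws -> [String] {
    let (data, _) = try await URLSession.shared.data(from: url)
    let html = String(decoding: data, as: UTF8.self)
    return try extractTexts(from: html, selector: selector)
}
```

Keeping `extractTexts` free of networking means it can be tested against saved HTML, and the same function can be reused for every page in a batch.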

Conclusion

CSS selectors in SwiftSoup provide a powerful and familiar way to extract data from HTML documents in Swift applications. By mastering the various selector types, from basic tag and class selectors to advanced pseudo-selectors and combinators, you can efficiently parse and extract the exact content you need from web pages.

The key to successful HTML parsing with SwiftSoup lies in understanding your target HTML structure, writing selectors specific enough to return only what you need, handling thrown errors, and unwrapping optional results safely. With these techniques, you can build robust web scraping and HTML parsing functionality directly into your iOS applications.
