How to Select Elements That Contain Specific Text in SwiftSoup

SwiftSoup is a Swift library for parsing and manipulating HTML documents, with functionality similar to Java's jsoup library. Selecting elements by their text content is one of the most common tasks when working with HTML. This guide shows several ways to select elements that contain specific text using SwiftSoup.

Understanding Text-Based Element Selection

SwiftSoup offers several approaches to select elements based on their text content. The main methods include using CSS selectors with the :contains() pseudo-class and utilizing SwiftSoup's built-in methods for text matching.

Basic Text Selection with :contains()

The most straightforward way to select elements containing specific text is using the CSS :contains() pseudo-selector:

import SwiftSoup

do {
    let html = """
    <html>
        <body>
            <div>Welcome to our website</div>
            <p>This paragraph contains important information</p>
            <div>Another div with different content</div>
            <span>Welcome message here</span>
        </body>
    </html>
    """

    let doc = try SwiftSoup.parse(html)

    // Select all elements containing "Welcome"
    let welcomeElements = try doc.select(":contains(Welcome)")

    for element in welcomeElements {
        print("Found: \(try element.text())")
        print("Tag: \(element.tagName())")
    }
} catch Exception.Error(let type, let message) {
    print("Error: \(type) - \(message)")
} catch {
    print("Unknown error occurred")
}
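
Note that :contains() compares against an element's full text, descendants included, so ancestors such as html and body match along with the div and span. The following is a minimal sketch of narrowing the result to elements whose own text matches. It assumes the :containsOwn() pseudo-selector that SwiftSoup carries over from jsoup; tagNames(matching:in:) is a hypothetical helper written for this illustration.

```swift
import SwiftSoup

/// Hypothetical helper: returns the tag names of all elements matching `selector`.
func tagNames(matching selector: String, in html: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    return try doc.select(selector).array().map { $0.tagName() }
}

do {
    let html = "<div>Welcome to our website</div><p>Other text</p><span>Welcome message here</span>"

    // :contains() looks at the combined text, so wrapping elements match too
    print(try tagNames(matching: ":contains(Welcome)", in: html))

    // :containsOwn() only looks at text that belongs directly to each element
    print(try tagNames(matching: ":containsOwn(Welcome)", in: html))
} catch {
    print("Error: \(error)")
}
```

If only the innermost matches are wanted, the own-text variant is usually the better starting point.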

Case-Sensitive vs Case-Insensitive Matching

SwiftSoup's :contains() selector follows jsoup and matches case-insensitively. When you need to control the comparison yourself, for example to apply your own normalization, you can filter the elements manually:

import SwiftSoup

func selectElementsContainingTextIgnoreCase(_ doc: Document, _ text: String) throws -> Elements {
    let allElements = try doc.select("*")
    var matchingElements = Elements()

    for element in allElements {
        let elementText = try element.text().lowercased()
        if elementText.contains(text.lowercased()) {
            try matchingElements.add(element)
        }
    }

    return matchingElements
}

// Usage example
do {
    let html = "<div>HELLO World</div><p>hello there</p><span>Hi HELLO</span>"
    let doc = try SwiftSoup.parse(html)

    let elements = try selectElementsContainingTextIgnoreCase(doc, "hello")

    for element in elements {
        print("Found: \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}
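
When matching manually, lowercasing alone does not handle accented characters. As a rough sketch, the comparison inside the loop could be swapped for a Foundation range search that ignores both case and diacritics. The helper name looselyContains is an assumption made for this example and is not part of SwiftSoup.

```swift
import Foundation

/// Hypothetical helper: true when `haystack` contains `needle`,
/// ignoring letter case and diacritics (so "cafe" matches "Café").
func looselyContains(_ haystack: String, _ needle: String) -> Bool {
    let options: String.CompareOptions = [.caseInsensitive, .diacriticInsensitive]
    return haystack.range(of: needle, options: options) != nil
}

print(looselyContains("HELLO World", "hello"))
print(looselyContains("Café MENU", "cafe"))
print(looselyContains("Hi there", "hello"))
```

Inside selectElementsContainingTextIgnoreCase, the check would then read looselyContains(try element.text(), text).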

Advanced Text Selection Techniques

Selecting Elements with Exact Text Matches

Sometimes you need elements that contain exactly the specified text, not just as a substring:

import SwiftSoup

func selectElementsWithExactText(_ doc: Document, _ exactText: String) throws -> Elements {
    let allElements = try doc.select("*")
    var matchingElements = Elements()

    for element in allElements {
        let elementText = try element.text().trimmingCharacters(in: .whitespacesAndNewlines)
        if elementText == exactText {
            try matchingElements.add(element)
        }
    }

    return matchingElements
}

// Example usage
do {
    let html = """
    <div>Contact Us</div>
    <p>Please Contact Us for more information</p>
    <button>Contact Us</button>
    """

    let doc = try SwiftSoup.parse(html)
    let exactMatches = try selectElementsWithExactText(doc, "Contact Us")

    for element in exactMatches {
        print("Exact match found: \(element.tagName()) - \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}

Combining Text Selection with Other Selectors

You can combine text-based selection with other CSS selectors for more precise targeting:

import SwiftSoup

do {
    let html = """
    <div class="content">
        <h1>Important Announcement</h1>
        <p>This is an important message</p>
        <div class="sidebar">
            <h2>Important Links</h2>
            <p>Some sidebar content</p>
        </div>
    </div>
    """

    let doc = try SwiftSoup.parse(html)

    // Select paragraphs containing "important" (case-insensitive)
    let importantParagraphs = try doc.select("p:contains(important)")

    // Select headings in sidebar containing "Important"
    let sidebarHeadings = try doc.select(".sidebar h2:contains(Important)")

    // Select any element with class "content" containing "Announcement"
    let contentWithAnnouncement = try doc.select(".content:contains(Announcement)")

    print("Important paragraphs: \(importantParagraphs.size())")
    print("Sidebar headings: \(sidebarHeadings.size())")
    print("Content with announcement: \(contentWithAnnouncement.size())")

} catch {
    print("Error: \(error)")
}

Working with Own Text vs. All Text

SwiftSoup distinguishes between an element's own text and all text (including child elements):

import SwiftSoup

do {
    let html = """
    <div>
        Parent text
        <span>Child text</span>
        More parent text
    </div>
    """

    let doc = try SwiftSoup.parse(html)
    let divElement = try doc.select("div").first()!

    // Get all text (including children)
    let allText = try divElement.text()
    print("All text: \(allText)")

    // Get only direct text (own text)
    let ownText = try divElement.ownText()
    print("Own text: \(ownText)")

    // Select based on own text only
    let elementsWithOwnText = try doc.select("*").filter { element in
        let ownText = try element.ownText().trimmingCharacters(in: .whitespacesAndNewlines)
        return ownText.contains("Parent")
    }

    for element in elementsWithOwnText {
        print("Element with own text: \(element.tagName())")
    }

} catch {
    print("Error: \(error)")
}

Pattern Matching and Regular Expressions

For more complex text matching scenarios, you can implement pattern-based selection:

import SwiftSoup
import Foundation

func selectElementsMatchingPattern(_ doc: Document, _ pattern: String) throws -> Elements {
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let allElements = try doc.select("*")
    var matchingElements = Elements()

    for element in allElements {
        let elementText = try element.text()
        let range = NSRange(location: 0, length: elementText.utf16.count)

        if regex.firstMatch(in: elementText, options: [], range: range) != nil {
            try matchingElements.add(element)
        }
    }

    return matchingElements
}

// Example: Select elements containing email addresses
do {
    let html = """
    <div>Contact us at support@example.com</div>
    <p>Email john.doe@company.org for details</p>
    <span>No email here</span>
    <div>Another email: admin@site.net</div>
    """

    let doc = try SwiftSoup.parse(html)
    let emailPattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"

    let elementsWithEmails = try selectElementsMatchingPattern(doc, emailPattern)

    for element in elementsWithEmails {
        print("Element with email: \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}
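
The function above reports which elements contain a match, but not the matched strings themselves. Here is a small sketch of pulling the matches out of a piece of text using only Foundation; substrings(matching:in:) is a hypothetical helper name, and its input could be the result of element.text().

```swift
import Foundation

/// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func substrings(matching pattern: String, in text: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    // Convert the Swift string range to the NSRange that NSRegularExpression expects
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { result in
        // Map each NSRange back to a Swift range before slicing
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

do {
    let emailPattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
    let found = try substrings(matching: emailPattern, in: "Email john.doe@company.org or admin@site.net")
    print(found)
} catch {
    print("Error: \(error)")
}
```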

Performance Considerations and Best Practices

Optimizing Text-Based Selections

When working with large HTML documents, text-based selections can be expensive. Here are some optimization strategies:

import SwiftSoup

func optimizedTextSelection(_ doc: Document, _ searchText: String, _ tagFilter: String? = nil) throws -> Elements {
    // First, narrow down the search space if possible
    let searchScope = try doc.select(tagFilter ?? "*")
    var results = Elements()

    // Check each element in the narrowed scope and collect every match
    for element in searchScope {
        let text = try element.text()
        if text.localizedCaseInsensitiveContains(searchText) {
            try results.add(element)
        }
    }

    return results
}

// Usage example
do {
    let html = """
    <html>
        <body>
            <div class="content">
                <p>This is important information</p>
                <p>Regular paragraph</p>
                <p>Another important note</p>
            </div>
            <footer>
                <p>Footer content</p>
            </footer>
        </body>
    </html>
    """

    let doc = try SwiftSoup.parse(html)

    // Optimize by searching only within content div paragraphs
    let importantParagraphs = try optimizedTextSelection(doc, "important", ".content p")

    for element in importantParagraphs {
        print("Found: \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}
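
When only one match is needed, the scan can stop at the first hit instead of visiting every element. The following sketch shows that early exit; firstElement(containing:in:scope:) is a hypothetical helper, not a SwiftSoup API.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper: returns the first element within `scope` whose text
/// contains `searchText` (case-insensitive), or nil when nothing matches.
func firstElement(containing searchText: String, in doc: Document, scope: String = "*") throws -> Element? {
    for element in try doc.select(scope) {
        if try element.text().localizedCaseInsensitiveContains(searchText) {
            return element // stop scanning as soon as a match is found
        }
    }
    return nil
}

do {
    let html = "<div class=\"content\"><p>Regular paragraph</p><p>An important note</p><p>More important text</p></div>"
    let doc = try SwiftSoup.parse(html)
    if let hit = try firstElement(containing: "important", in: doc, scope: ".content p") {
        print("First match: \(try hit.text())")
    }
} catch {
    print("Error: \(error)")
}
```

The selector still has to be evaluated in full, so the saving comes from skipping the text comparison for the remaining elements.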

Error Handling and Edge Cases

When selecting elements by text content, it's crucial to handle various edge cases:

import SwiftSoup

func robustTextSelection(_ htmlString: String, _ searchText: String) -> [String] {
    var results: [String] = []

    do {
        guard !htmlString.isEmpty && !searchText.isEmpty else {
            print("Warning: Empty HTML or search text provided")
            return results
        }

        let doc = try SwiftSoup.parse(htmlString)
        let elements = try doc.select(":contains(\(searchText))")

        for element in elements {
            let text = try element.text().trimmingCharacters(in: .whitespacesAndNewlines)
            if !text.isEmpty {
                results.append(text)
            }
        }

    } catch Exception.Error(let type, let message) {
        print("SwiftSoup Error: \(type) - \(message)")
    } catch {
        print("Unexpected error: \(error.localizedDescription)")
    }

    return results
}

// Test with various edge cases
let testCases = [
    "<div></div>", // Empty elements
    "<p>   </p>", // Whitespace only
    "<span>Normal text</span>", // Normal case
    "", // Empty HTML
    "<div>Test&amp;Example</div>" // HTML entities
]

for (index, testHtml) in testCases.enumerated() {
    let results = robustTextSelection(testHtml, "Test")
    print("Test case \(index + 1): \(results)")
}
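
One more edge case: text interpolated into a :contains(...) selector is parsed as part of the selector, so a search string that contains parentheses may not behave as intended. A possible workaround, sketched here under the assumption that the jsoup-style getElementsContainingOwnText method is available on Element, is to pass the text as a plain argument instead. The wrapper name elementsWithOwnText is hypothetical.

```swift
import SwiftSoup

/// Hypothetical wrapper: finds elements whose own text contains `searchText`
/// without building a selector string from user input.
func elementsWithOwnText(_ searchText: String, in html: String) throws -> [Element] {
    let doc = try SwiftSoup.parse(html)
    return try doc.getElementsContainingOwnText(searchText).array()
}

do {
    let hits = try elementsWithOwnText("(beta)", in: "<p>Version (beta) released</p><p>Stable</p>")
    for element in hits {
        print("Found in <\(element.tagName())>: \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}
```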

Integration with Modern iOS Development

When building iOS applications that require web scraping or HTML parsing, SwiftSoup integrates well with modern Swift patterns:

import SwiftSoup
import Combine

class HTMLTextExtractor {
    func findElementsContaining(_ text: String, in html: String) -> AnyPublisher<[String], Error> {
        return Future { promise in
            DispatchQueue.global(qos: .background).async {
                do {
                    let doc = try SwiftSoup.parse(html)
                    let elements = try doc.select(":contains(\(text))")

                    // text() always yields a String, so map is sufficient here
                    let texts = try elements.map { element in
                        try element.text().trimmingCharacters(in: .whitespacesAndNewlines)
                    }.filter { !$0.isEmpty }

                    DispatchQueue.main.async {
                        promise(.success(texts))
                    }
                } catch {
                    DispatchQueue.main.async {
                        promise(.failure(error))
                    }
                }
            }
        }
        .eraseToAnyPublisher()
    }
}

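// Usage in a SwiftUI view or view controller
// htmlContent is assumed to hold the HTML string to search
var cancellables = Set<AnyCancellable>()
let extractor = HTMLTextExtractor()
extractor.findElementsContaining("important", in: htmlContent)
    .sink(
        receiveCompletion: { completion in
            switch completion {
            case .finished:
                print("Extraction completed")
            case .failure(let error):
                print("Error: \(error)")
            }
        },
        receiveValue: { texts in
            print("Found texts: \(texts)")
        }
    )
    // Keep the subscription alive; otherwise it is cancelled before the result arrives
    .store(in: &cancellables)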

Working with Dynamic Content

SwiftSoup parses static HTML only; unlike a browser, it does not execute JavaScript. Content that a page loads dynamically after the initial response will therefore be missing from the parsed document. To handle such content, you need to combine SwiftSoup with other techniques or use a JavaScript-based solution.

Advanced SwiftSoup Text Selection Patterns

Selecting Elements by Text Length

Sometimes you need to select elements based on the length of their text content:

import SwiftSoup

func selectElementsByTextLength(_ doc: Document, minLength: Int, maxLength: Int? = nil) throws -> Elements {
    let allElements = try doc.select("*")
    var matchingElements = Elements()

    for element in allElements {
        let text = try element.ownText().trimmingCharacters(in: .whitespacesAndNewlines)
        let length = text.count

        if length >= minLength {
            if let maxLength = maxLength {
                if length <= maxLength {
                    try matchingElements.add(element)
                }
            } else {
                try matchingElements.add(element)
            }
        }
    }

    return matchingElements
}

// Example: Find elements with text between 10 and 50 characters
do {
    let html = """
    <div>Short</div>
    <p>A medium paragraph that should be selected.</p>
    <span>This is a very long text content that exceeds the maximum character limit we've set for our selection criteria.</span>
    """

    let doc = try SwiftSoup.parse(html)
    let mediumTextElements = try selectElementsByTextLength(doc, minLength: 10, maxLength: 50)

    for element in mediumTextElements {
        print("Medium text: \(try element.text())")
    }
} catch {
    print("Error: \(error)")
}

Combining Multiple Text Criteria

You can create more sophisticated selection logic by combining multiple text-based criteria:

import SwiftSoup

struct TextSelectionCriteria {
    let containsText: String?
    let startsWithText: String?
    let endsWithText: String?
    let minLength: Int?
    let maxLength: Int?
    let caseInsensitive: Bool

    init(contains: String? = nil, startsWith: String? = nil, endsWith: String? = nil, 
         minLength: Int? = nil, maxLength: Int? = nil, caseInsensitive: Bool = true) {
        self.containsText = contains
        self.startsWithText = startsWith
        self.endsWithText = endsWith
        self.minLength = minLength
        self.maxLength = maxLength
        self.caseInsensitive = caseInsensitive
    }
}

func selectElementsByCriteria(_ doc: Document, criteria: TextSelectionCriteria) throws -> Elements {
    let allElements = try doc.select("*")
    var matchingElements = Elements()

    for element in allElements {
        var text = try element.text().trimmingCharacters(in: .whitespacesAndNewlines)

        if criteria.caseInsensitive {
            text = text.lowercased()
        }

        var matches = true

        // Check contains criteria
        if let containsText = criteria.containsText {
            let searchText = criteria.caseInsensitive ? containsText.lowercased() : containsText
            if !text.contains(searchText) {
                matches = false
            }
        }

        // Check starts with criteria
        if let startsWithText = criteria.startsWithText {
            let searchText = criteria.caseInsensitive ? startsWithText.lowercased() : startsWithText
            if !text.hasPrefix(searchText) {
                matches = false
            }
        }

        // Check ends with criteria
        if let endsWithText = criteria.endsWithText {
            let searchText = criteria.caseInsensitive ? endsWithText.lowercased() : endsWithText
            if !text.hasSuffix(searchText) {
                matches = false
            }
        }

        // Check length criteria
        if let minLength = criteria.minLength, text.count < minLength {
            matches = false
        }

        if let maxLength = criteria.maxLength, text.count > maxLength {
            matches = false
        }

        if matches {
            try matchingElements.add(element)
        }
    }

    return matchingElements
}
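
As an alternative design, the individual checks can be expressed as small predicates and combined, which keeps each rule independently testable. This is only a sketch operating on plain strings; TextPredicate, containsText, lengthBetween, and allOf are all names invented for this example.

```swift
import Foundation

/// Hypothetical type: a rule that accepts or rejects a piece of text.
typealias TextPredicate = (String) -> Bool

/// Case-insensitive substring rule.
func containsText(_ needle: String) -> TextPredicate {
    return { text in text.range(of: needle, options: .caseInsensitive) != nil }
}

/// Length rule; callers are expected to pass min <= max.
func lengthBetween(_ min: Int, _ max: Int) -> TextPredicate {
    return { text in text.count >= min && text.count <= max }
}

/// Combines rules so that all of them must hold.
func allOf(_ predicates: TextPredicate...) -> TextPredicate {
    return { text in predicates.allSatisfy { $0(text) } }
}

let rule = allOf(containsText("important"), lengthBetween(10, 50))
print(rule("This is important information"))
print(rule("Important"))
```

Applied to a parsed document, the combined rule could be used in a filter over doc.select("*"), passing each element's trimmed text.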

For more complex scraping scenarios involving JavaScript-heavy websites, you might want to explore how to handle AJAX requests using Puppeteer when building comprehensive web scraping solutions.

Conclusion

Selecting elements by text content in SwiftSoup is a powerful technique for HTML parsing and web scraping in iOS applications. Whether you need simple text matching with the :contains() selector or more complex pattern-based selection, SwiftSoup provides the tools necessary for effective HTML manipulation.

Remember to consider performance implications when working with large documents, handle edge cases properly, and leverage Swift's modern language features for cleaner, more maintainable code. By combining SwiftSoup's text selection capabilities with proper error handling and optimization techniques, you can build robust HTML parsing solutions for your iOS applications.

The key to successful text-based element selection lies in understanding your specific use case and choosing the appropriate method—whether it's simple substring matching, exact text matching, pattern-based selection, or complex multi-criteria filtering. With these techniques in your toolkit, you'll be well-equipped to extract the precise data you need from HTML documents in your Swift applications.
