How do I parse HTML tables with SwiftSoup?

Parsing HTML tables is a fundamental task when extracting structured data from web pages. SwiftSoup provides powerful tools to select table elements, iterate through rows and cells, and extract data efficiently. This guide covers everything from basic table parsing to advanced scenarios with nested tables and complex structures.

Quick Answer

SwiftSoup allows you to parse HTML tables by selecting the table element and iterating through its rows (<tr>) and cells (<td> or <th>):

import SwiftSoup

do {
    let html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>John</td><td>25</td></tr></table>"
    let doc: Document = try SwiftSoup.parse(html)
    // first() returns nil when the document contains no <table>, so avoid force-unwrapping
    if let table = try doc.select("table").first() {
        let rows: Elements = try table.select("tr")

        for row in rows.array() {
            let cells: Elements = try row.select("td, th")
            for cell in cells.array() {
                print(try cell.text())
            }
        }
    }
} catch {
    print("Error parsing HTML: \(error)")
}

Setting Up SwiftSoup

First, add SwiftSoup to your project using Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
]

Then import SwiftSoup in your Swift file:

import SwiftSoup

Basic Table Parsing

Extracting All Table Data

Here's a comprehensive example of parsing a basic HTML table:

import SwiftSoup

func parseBasicTable() {
    let html = """
    <table>
        <thead>
            <tr>
                <th>Product</th>
                <th>Price</th>
                <th>Stock</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>iPhone 15</td>
                <td>$999</td>
                <td>50</td>
            </tr>
            <tr>
                <td>MacBook Pro</td>
                <td>$2499</td>
                <td>25</td>
            </tr>
        </tbody>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
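        // first() returns nil when no <table> exists, so avoid force-unwrapping
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }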

        // Extract headers
        let headers: Elements = try table.select("thead th")
        var headerTexts: [String] = []

        for header in headers.array() {
            headerTexts.append(try header.text())
        }
        print("Headers: \(headerTexts)")

        // Extract data rows
        let dataRows: Elements = try table.select("tbody tr")
        var tableData: [[String]] = []

        for row in dataRows.array() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []

            for cell in cells.array() {
                rowData.append(try cell.text())
            }
            tableData.append(rowData)
        }

        print("Table Data: \(tableData)")

    } catch {
        print("Error parsing table: \(error)")
    }
}

Parsing Tables Without Headers

When dealing with tables that don't have explicit header sections:

func parseTableWithoutHeaders() {
    let html = """
    <table>
        <tr><td>Name</td><td>Email</td><td>Role</td></tr>
        <tr><td>Alice Johnson</td><td>alice@example.com</td><td>Manager</td></tr>
        <tr><td>Bob Smith</td><td>bob@example.com</td><td>Developer</td></tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")

        for (index, row) in rows.array().enumerated() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []

            for cell in cells.array() {
                rowData.append(try cell.text())
            }

            if index == 0 {
                print("Headers: \(rowData)")
            } else {
                print("Row \(index): \(rowData)")
            }
        }

    } catch {
        print("Error: \(error)")
    }
}

Advanced Table Parsing Techniques

Working with Table Attributes

Extract additional information from table elements using attributes:

func parseTableWithAttributes() {
    let html = """
    <table id="products" class="data-table">
        <tr>
            <td data-sort="name">Product A</td>
            <td data-sort="price">$29.99</td>
            <td class="stock-low">5</td>
        </tr>
        <tr>
            <td data-sort="name">Product B</td>
            <td data-sort="price">$49.99</td>
            <td class="stock-ok">150</td>
        </tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
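        // first() returns nil when no matching table exists, so avoid force-unwrapping
        guard let table = try doc.select("table#products").first() else {
            print("No table with id \"products\" found")
            return
        }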

        // Get table attributes
        let tableId = try table.attr("id")
        let tableClass = try table.attr("class")
        print("Table ID: \(tableId), Class: \(tableClass)")

        let rows: Elements = try table.select("tr")

        for row in rows.array() {
            let cells: Elements = try row.select("td")

            for cell in cells.array() {
                let cellText = try cell.text()
                let sortAttribute = try cell.attr("data-sort")
                let cellClass = try cell.attr("class")

                print("Cell: \(cellText)")
                if !sortAttribute.isEmpty {
                    print("  Sort key: \(sortAttribute)")
                }
                if !cellClass.isEmpty {
                    print("  Class: \(cellClass)")
                }
            }
        }

    } catch {
        print("Error: \(error)")
    }
}

Handling Complex Table Structures

Deal with tables that have colspan and rowspan attributes:

func parseComplexTable() {
    let html = """
    <table>
        <tr>
            <th colspan="2">Sales Report</th>
            <th rowspan="2">Total</th>
        </tr>
        <tr>
            <th>Q1</th>
            <th>Q2</th>
        </tr>
        <tr>
            <td>$1000</td>
            <td>$1500</td>
            <td>$2500</td>
        </tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")

        for (rowIndex, row) in rows.array().enumerated() {
            print("Row \(rowIndex):")

            let cells: Elements = try row.select("th, td")

            for cell in cells.array() {
                let cellText = try cell.text()
                let colspan = try cell.attr("colspan")
                let rowspan = try cell.attr("rowspan")

                var cellInfo = "  Cell: \(cellText)"

                if !colspan.isEmpty && colspan != "1" {
                    cellInfo += " (colspan: \(colspan))"
                }
                if !rowspan.isEmpty && rowspan != "1" {
                    cellInfo += " (rowspan: \(rowspan))"
                }

                print(cellInfo)
            }
        }

    } catch {
        print("Error: \(error)")
    }
}
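The function above only reports the span attributes. To get a rectangular grid, each spanning cell has to be copied into every position it covers. The following is a minimal sketch in plain Swift, not a SwiftSoup feature. It assumes the cells have already been read into a hypothetical `SpanCell` value, for example with `Int(try cell.attr("colspan")) ?? 1`, and `expandSpans` is likewise a name made up for illustration:

```swift
// Hypothetical value type: one parsed cell plus its span attributes.
struct SpanCell {
    let text: String
    var colspan: Int = 1
    var rowspan: Int = 1
}

// Copies each spanning cell into every grid position it covers,
// so every row ends up with one entry per visual column.
func expandSpans(_ rows: [[SpanCell]]) -> [[String]] {
    var grid: [[String?]] = Array(repeating: [], count: rows.count)

    for (r, row) in rows.enumerated() {
        var c = 0
        for cell in row {
            // Skip positions already filled by a rowspan from an earlier row.
            while c < grid[r].count, grid[r][c] != nil { c += 1 }

            for dr in 0..<max(cell.rowspan, 1) where r + dr < rows.count {
                for dc in 0..<max(cell.colspan, 1) {
                    let col = c + dc
                    // Grow the target row with empty slots as needed.
                    while grid[r + dr].count <= col { grid[r + dr].append(nil) }
                    grid[r + dr][col] = cell.text
                }
            }
            c += max(cell.colspan, 1)
        }
    }
    // Positions that no cell covered become empty strings.
    return grid.map { $0.map { $0 ?? "" } }
}

// The sales report table from the example above.
let salesGrid = expandSpans([
    [SpanCell(text: "Sales Report", colspan: 2), SpanCell(text: "Total", rowspan: 2)],
    [SpanCell(text: "Q1"), SpanCell(text: "Q2")],
    [SpanCell(text: "$1000"), SpanCell(text: "$1500"), SpanCell(text: "$2500")]
])
for row in salesGrid {
    print(row)
}
```

With this approach the second row gains a third column, "Total", inherited from the rowspan cell, so all three rows line up. Repeating the text in each covered position is one possible design choice; leaving those positions empty may suit other uses better.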

Extracting Specific Data Patterns

Finding Tables by Content

Locate specific tables based on their content:

func findTableByContent() {
    let html = """
    <div>
        <table>
            <tr><th>Users</th></tr>
            <tr><td>John</td></tr>
        </table>
        <table>
            <tr><th>Products</th><th>Price</th></tr>
            <tr><td>iPhone</td><td>$999</td></tr>
        </table>
    </div>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)

        // Find table containing "Products" header
        let productTables: Elements = try doc.select("table:has(th:contains(Products))")

        for table in productTables.array() {
            print("Found products table:")
            let rows: Elements = try table.select("tr")

            for row in rows.array() {
                let cells: Elements = try row.select("th, td")
                var rowData: [String] = []

                for cell in cells.array() {
                    rowData.append(try cell.text())
                }

                print("  \(rowData)")
            }
        }

    } catch {
        print("Error: \(error)")
    }
}

Converting Table to Dictionary

Create structured data from table content:

struct TableData {
    let headers: [String]
    let rows: [[String]]

    func toDictionaryArray() -> [[String: String]] {
        return rows.map { row in
            var dict: [String: String] = [:]
            for (index, value) in row.enumerated() {
                if index < headers.count {
                    dict[headers[index]] = value
                }
            }
            return dict
        }
    }
}

func parseTableToDictionary() {
    let html = """
    <table>
        <tr><th>Name</th><th>Age</th><th>City</th></tr>
        <tr><td>Alice</td><td>30</td><td>New York</td></tr>
        <tr><td>Bob</td><td>25</td><td>London</td></tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
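        // first() returns nil when no <table> exists, so avoid force-unwrapping
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }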

        // Extract headers
        let headerElements: Elements = try table.select("tr:first-child th, tr:first-child td")
        let headers = try headerElements.array().map { try $0.text() }

        // Extract data rows (every row after the first, which holds the headers)
        let dataRowElements: Elements = try table.select("tr:gt(0)")
        var rows: [[String]] = []

        for row in dataRowElements.array() {
            let cells: Elements = try row.select("td")
            let rowData = try cells.array().map { try $0.text() }
            rows.append(rowData)
        }

        let tableData = TableData(headers: headers, rows: rows)
        let dictArray = tableData.toDictionaryArray()

        for dict in dictArray {
            print(dict)
        }

    } catch {
        print("Error: \(error)")
    }
}
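Dictionaries of strings are often only an intermediate step. As a sketch of what could follow, the rows can be mapped into a typed model that rejects incomplete or malformed entries. The `Person` type is hypothetical, and the sketch assumes the `Name`, `Age`, and `City` headers from the sample table:

```swift
// Hypothetical model for the sample table's Name / Age / City columns.
struct Person: Equatable {
    let name: String
    let age: Int
    let city: String

    // Returns nil when a column is missing or Age is not an integer.
    init?(row: [String: String]) {
        guard let name = row["Name"],
              let ageText = row["Age"],
              let age = Int(ageText),
              let city = row["City"] else {
            return nil
        }
        self.name = name
        self.age = age
        self.city = city
    }
}

let dictRows: [[String: String]] = [
    ["Name": "Alice", "Age": "30", "City": "New York"],
    ["Name": "Bob", "Age": "n/a", "City": "London"]
]

// compactMap drops rows that fail validation instead of crashing.
let people = dictRows.compactMap(Person.init(row:))
print(people.map(\.name))
```

Here the second row is dropped because "n/a" is not an integer. Whether to drop, log, or keep such rows with an optional field depends on the data you are scraping.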

Error Handling and Edge Cases

Robust Table Parser

Handle various edge cases when parsing tables:

func robustTableParser(html: String) -> [[String]] {
    do {
        let doc: Document = try SwiftSoup.parse(html)
        let tables: Elements = try doc.select("table")

        guard !tables.isEmpty() else {
            print("No tables found in HTML")
            return []
        }

        let table = tables.first()!
        let rows: Elements = try table.select("tr")
        var result: [[String]] = []

        for row in rows.array() {
            // Handle both th and td elements
            let cells: Elements = try row.select("th, td")
            var rowData: [String] = []

            for cell in cells.array() {
                let cellText = try cell.text().trimmingCharacters(in: .whitespacesAndNewlines)
                rowData.append(cellText)
            }

            // Only add non-empty rows
            if !rowData.isEmpty {
                result.append(rowData)
            }
        }

        return result

    } catch Exception.Error(let type, let message) {
        print("SwiftSoup error (\(type)): \(message)")
        return []
    } catch {
        print("Unexpected error: \(error)")
        return []
    }
}
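Nested tables are another edge case: `table.select("tr")` matches every descendant row, including rows of tables placed inside a cell. One way to keep only the outer table's own rows is to check each row's nearest `<table>` ancestor. This is a sketch rather than a built-in SwiftSoup feature. The helper name `ownRows(of:)` and the sample HTML are assumptions made for illustration:

```swift
import SwiftSoup

// Returns only the rows whose nearest <table> ancestor is `table` itself,
// skipping rows that belong to tables nested inside its cells.
func ownRows(of table: Element) throws -> [Element] {
    return try table.select("tr").array().filter { row in
        // Walk up the tree until the closest enclosing <table> is reached.
        var ancestor = row.parent()
        while let current = ancestor, current.tagName() != "table" {
            ancestor = current.parent()
        }
        return ancestor === table
    }
}

let nestedHTML = """
<table id="outer">
    <tr><td>Outer A</td><td><table><tr><td>Inner</td></tr></table></td></tr>
    <tr><td>Outer B</td><td>Plain</td></tr>
</table>
"""

do {
    let doc = try SwiftSoup.parse(nestedHTML)
    if let outer = try doc.select("table#outer").first() {
        let rows = try ownRows(of: outer)
        print("Outer table rows: \(rows.count)")
    }
} catch {
    print("Error: \(error)")
}
```

The same ancestor check can be applied to cells if you also need to exclude the nested table's `td` elements from an outer row.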

Performance Considerations

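SwiftSoup parses the whole document into an in-memory DOM, so for large tables or batches of tables the main saving comes from handling each row as it is read rather than accumulating every row in an array: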

Memory-Efficient Processing

func processLargeTable(html: String) {
    do {
        let doc: Document = try SwiftSoup.parse(html)
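        // first() returns nil when no <table> exists, so avoid force-unwrapping
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }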
        let rows: Elements = try table.select("tr")

        // Process rows one at a time to minimize memory usage
        for (index, row) in rows.array().enumerated() {
            autoreleasepool {
                do {
                    let cells: Elements = try row.select("td, th")

                    // Process each row immediately rather than storing all data
                    var rowData: [String] = []
                    for cell in cells.array() {
                        rowData.append(try cell.text())
                    }

                    // Process the row data here
                    processRow(index: index, data: rowData)

                } catch {
                    print("Error processing row \(index): \(error)")
                }
            }
        }

    } catch {
        print("Error parsing table: \(error)")
    }
}

func processRow(index: Int, data: [String]) {
    // Your row processing logic here
    print("Processing row \(index): \(data)")
}

Integration with Web Scraping

Combined with a network request, SwiftSoup can scrape tables from live pages. It parses only the HTML the server returns and does not execute JavaScript, so for pages that render tables client-side or require complex interactions, consider using browser automation tools for handling dynamic content.

Complete Web Scraping Example

import Foundation
import SwiftSoup

func scrapeWebTable(from urlString: String, completion: @escaping ([[String]]) -> Void) {
    guard let url = URL(string: urlString) else {
        print("Invalid URL")
        completion([])
        return
    }

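    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        // Report transport errors instead of silently discarding them
        if let error = error {
            print("Request failed: \(error)")
            completion([])
            return
        }

        guard let data = data,
              let htmlString = String(data: data, encoding: .utf8) else {
            print("Failed to decode response as UTF-8")
            completion([])
            return
        }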

        let tableData = robustTableParser(html: htmlString)
        completion(tableData)
    }

    task.resume()
}

// Usage
scrapeWebTable(from: "https://example.com/data.html") { tableData in
    DispatchQueue.main.async {
        for row in tableData {
            print(row)
        }
    }
}
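The example above gives up when the response is not valid UTF-8, which happens with older pages served in other encodings. A minimal sketch of a more forgiving decoding step follows. The helper name `decodeHTML` is hypothetical, and falling back to ISO Latin-1 is an assumption; a real page may declare a different charset in its `Content-Type` header or a `<meta>` tag:

```swift
import Foundation

// Hypothetical helper: try UTF-8 first, then fall back to ISO Latin-1,
// which maps every byte to a character and therefore always decodes.
func decodeHTML(_ data: Data) -> String {
    if let utf8 = String(data: data, encoding: .utf8) {
        return utf8
    }
    return String(data: data, encoding: .isoLatin1) ?? ""
}

// "café" encoded as ISO Latin-1; the trailing 0xE9 byte is not valid UTF-8.
let latin1Bytes = Data([0x63, 0x61, 0x66, 0xE9])
print(decodeHTML(latin1Bytes))
```

The decoded string can then be passed to `robustTableParser(html:)` in place of the UTF-8-only conversion.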

Best Practices

Always handle errors: most SwiftSoup calls are marked throws, so wrap them in do-catch blocks
Validate table structure: Check if tables exist before attempting to parse them
Handle empty cells: Some table cells might be empty or contain only whitespace
Consider cell spanning: Tables with colspan/rowspan require special handling
Optimize for performance: For large tables, process data incrementally rather than loading everything into memory
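As an illustration of the points about table structure and empty cells, the sketch below cleans up already-extracted rows so that each one has the same number of columns. The `normalize(rows:columnCount:)` helper is hypothetical and works on plain string arrays such as those returned by `robustTableParser(html:)`:

```swift
import Foundation

// Hypothetical helper: trims each cell, pads short rows with empty strings,
// and truncates long rows so every row has exactly `columnCount` entries.
func normalize(rows: [[String]], columnCount: Int) -> [[String]] {
    return rows.map { (row: [String]) -> [String] in
        let trimmed = row.map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        let missing = max(0, columnCount - trimmed.count)
        let padded = trimmed + Array(repeating: "", count: missing)
        return Array(padded.prefix(columnCount))
    }
}

let ragged = [["  Alice ", "30"], ["Bob", "25", "London", "extra"]]
let cleaned = normalize(rows: ragged, columnCount: 3)
print(cleaned)
```

Padding with empty strings keeps column positions stable for the header-to-value mapping shown earlier. Truncating extra cells is a simplification that discards data, so you may prefer to log such rows instead.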

Conclusion

SwiftSoup provides excellent capabilities for parsing HTML tables in Swift applications. Whether you're dealing with simple data tables or complex structures with spanning cells, SwiftSoup's CSS selector syntax makes it easy to extract the data you need. Combined with proper error handling and performance considerations, you can build robust table parsing solutions for your iOS and macOS applications.

For more complex scenarios involving JavaScript-heavy sites, consider complementing SwiftSoup with browser automation techniques for single-page applications or specialized web scraping APIs that handle dynamic content.
