
How do I parse HTML tables with SwiftSoup?

Parsing HTML tables is a common task when extracting structured data from web pages. SwiftSoup, a pure-Swift port of the Java library jsoup, lets you select table elements with CSS selectors, iterate through rows and cells, and read their text and attributes. This guide covers basic table parsing as well as more complex cases such as spanning cells, locating tables by their content, and converting rows into dictionaries.

Quick Answer

SwiftSoup allows you to parse HTML tables by selecting the table element and iterating through its rows (<tr>) and cells (<td> or <th>):

import SwiftSoup

do {
    let html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>John</td><td>25</td></tr></table>"
    let doc: Document = try SwiftSoup.parse(html)
    let table: Element = try doc.select("table").first()!
    let rows: Elements = try table.select("tr")

    for row in rows.array() {
        let cells: Elements = try row.select("td, th")
        for cell in cells.array() {
            print(try cell.text())
        }
    }
} catch {
    print("Error parsing HTML: \(error)")
}
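The quick answer force-unwraps first() with !, which crashes if the HTML contains no table. A sketch of the same extraction that returns an empty array instead; tableRows(in:) is a hypothetical helper name, not part of SwiftSoup:

```swift
import SwiftSoup

/// Returns the text of every cell in the first <table>, row by row.
/// Returns an empty array (instead of crashing) when the HTML has no table.
func tableRows(in html: String) throws -> [[String]] {
    let doc = try SwiftSoup.parse(html)
    // first() returns an optional, so bind it safely rather than using !
    guard let table = try doc.select("table").first() else {
        return []
    }
    var rows: [[String]] = []
    for row in try table.select("tr").array() {
        // "th, td" collects header and data cells in document order
        let cells = try row.select("th, td").array().map { try $0.text() }
        rows.append(cells)
    }
    return rows
}

do {
    let html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>John</td><td>25</td></tr></table>"
    print(try tableRows(in: html)) // [["Name", "Age"], ["John", "25"]]
} catch {
    print("Error parsing HTML: \(error)")
}
```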

Setting Up SwiftSoup

First, add SwiftSoup to your project using Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
]
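The dependencies entry belongs in Package.swift, and the SwiftSoup library product must also be listed on the target that imports it. A minimal manifest sketch, where the package and target name TableScraper is hypothetical:

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "TableScraper", // hypothetical package name
    dependencies: [
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
    ],
    targets: [
        .executableTarget(
            name: "TableScraper", // hypothetical target name
            dependencies: [
                // Link the SwiftSoup library product into this target
                .product(name: "SwiftSoup", package: "SwiftSoup")
            ]
        )
    ]
)
```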

Then import SwiftSoup in your Swift file:

import SwiftSoup

Basic Table Parsing

Extracting All Table Data

Here's a comprehensive example of parsing a basic HTML table:

import SwiftSoup

func parseBasicTable() {
    let html = """
    <table>
        <thead>
            <tr>
                <th>Product</th>
                <th>Price</th>
                <th>Stock</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>iPhone 15</td>
                <td>$999</td>
                <td>50</td>
            </tr>
            <tr>
                <td>MacBook Pro</td>
                <td>$2499</td>
                <td>25</td>
            </tr>
        </tbody>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let table: Element = try doc.select("table").first()!

        // Extract headers
        let headers: Elements = try table.select("thead th")
        var headerTexts: [String] = []

        for header in headers.array() {
            headerTexts.append(try header.text())
        }
        print("Headers: \(headerTexts)")

        // Extract data rows
        let dataRows: Elements = try table.select("tbody tr")
        var tableData: [[String]] = []

        for row in dataRows.array() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []

            for cell in cells.array() {
                rowData.append(try cell.text())
            }
            tableData.append(rowData)
        }

        print("Table Data: \(tableData)")

    } catch {
        print("Error parsing table: \(error)")
    }
}
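Selecting rows with table.select("tr") also returns rows from any table nested inside a cell, because select searches all descendants. One way to handle nested tables, sketched below, is to walk the table's direct children instead of using a descendant selector. ownRows(of:) and ownCells(of:) are hypothetical helper names:

```swift
import SwiftSoup

/// Rows that belong directly to `table`, excluding rows of nested tables.
func ownRows(of table: Element) -> [Element] {
    var rows: [Element] = []
    for child in table.children().array() {
        switch child.tagName() {
        case "tr":
            rows.append(child)
        case "thead", "tbody", "tfoot":
            // Row groups contain the rows; look exactly one level deeper
            rows.append(contentsOf: child.children().array().filter { $0.tagName() == "tr" })
        default:
            break // <caption>, <colgroup>, and similar elements hold no rows
        }
    }
    return rows
}

/// Cells that belong directly to `row`, excluding cells of nested tables.
func ownCells(of row: Element) -> [Element] {
    row.children().array().filter { ["th", "td"].contains($0.tagName()) }
}

let html = """
<table id="outer">
    <tr><td>A</td><td><table><tr><td>inner</td></tr></table></td></tr>
    <tr><td>B</td><td>C</td></tr>
</table>
"""

do {
    let doc = try SwiftSoup.parse(html)
    if let outer = try doc.select("table#outer").first() {
        print(try outer.select("tr").size()) // 3, including the nested row
        for row in ownRows(of: outer) {
            print(try ownCells(of: row).map { try $0.text() })
        }
    }
} catch {
    print("Error: \(error)")
}
```

Note that a cell's text() still includes the text of any table nested inside it; only the row and cell lists exclude the nested structure.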

Parsing Tables Without Headers

When a table has no <thead> section or <th> cells, treat its first row as the header row:

func parseTableWithoutHeaders() {
    let html = """
    <table>
        <tr><td>Name</td><td>Email</td><td>Role</td></tr>
        <tr><td>Alice Johnson</td><td>alice@example.com</td><td>Manager</td></tr>
        <tr><td>Bob Smith</td><td>bob@example.com</td><td>Developer</td></tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")

        for (index, row) in rows.array().enumerated() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []

            for cell in cells.array() {
                rowData.append(try cell.text())
            }

            if index == 0 {
                print("Headers: \(rowData)")
            } else {
                print("Row \(index): \(rowData)")
            }
        }

    } catch {
        print("Error: \(error)")
    }
}
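SwiftSoup's parser follows the HTML5 rules for tables, which wrap bare <tr> rows in an implied <tbody>. That is why a descendant selector such as table tr finds these rows while a child selector such as table > tr does not. The sketch below shows this and skips the header row with dropFirst() instead of checking the index; dataRows(in:) is a hypothetical name:

```swift
import SwiftSoup

/// Returns every row after the first, treating the first row as headers.
func dataRows(in html: String) throws -> [[String]] {
    let doc = try SwiftSoup.parse(html)
    // "table tr" matches at any depth, so it also finds rows inside the implied <tbody>
    let rows = try doc.select("table tr").array()
    var result: [[String]] = []
    for row in rows.dropFirst() { // dropFirst() skips the header row
        result.append(try row.select("td").array().map { try $0.text() })
    }
    return result
}

let html = """
<table>
    <tr><td>Name</td><td>Email</td><td>Role</td></tr>
    <tr><td>Alice Johnson</td><td>alice@example.com</td><td>Manager</td></tr>
</table>
"""

do {
    let doc = try SwiftSoup.parse(html)
    print(try doc.select("table > tr").size())         // 0: rows are not direct children
    print(try doc.select("table > tbody > tr").size()) // 2
    print(try dataRows(in: html)) // [["Alice Johnson", "alice@example.com", "Manager"]]
} catch {
    print("Error: \(error)")
}
```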

Advanced Table Parsing Techniques

Working with Table Attributes

Extract additional information from table elements using attributes:

func parseTableWithAttributes() {
    let html = """
    <table id="products" class="data-table">
        <tr>
            <td data-sort="name">Product A</td>
            <td data-sort="price">$29.99</td>
            <td class="stock-low">5</td>
        </tr>
        <tr>
            <td data-sort="name">Product B</td>
            <td data-sort="price">$49.99</td>
            <td class="stock-ok">150</td>
        </tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let table: Element = try doc.select("table#products").first()!

        // Get table attributes
        let tableId = try table.attr("id")
        let tableClass = try table.attr("class")
        print("Table ID: \(tableId), Class: \(tableClass)")

        let rows: Elements = try table.select("tr")

        for row in rows.array() {
            let cells: Elements = try row.select("td")

            for cell in cells.array() {
                let cellText = try cell.text()
                let sortAttribute = try cell.attr("data-sort")
                let cellClass = try cell.attr("class")

                print("Cell: \(cellText)")
                if !sortAttribute.isEmpty {
                    print("  Sort key: \(sortAttribute)")
                }
                if !cellClass.isEmpty {
                    print("  Class: \(cellClass)")
                }
            }
        }

    } catch {
        print("Error: \(error)")
    }
}
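Instead of reading every cell's attributes and filtering in Swift, you can let CSS attribute and class selectors do the filtering. A sketch against the same #products markup; productPrices(in:) and lowStockProducts(in:) are hypothetical names, and :has() is the same pseudo-selector used later for finding tables by content:

```swift
import SwiftSoup

/// Text of every price cell in the #products table.
func productPrices(in html: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    // [attribute=value] matches cells by their data-sort value
    return try doc.select("#products td[data-sort=price]").array().map { try $0.text() }
}

/// Names of products whose row contains a cell with class "stock-low".
func lowStockProducts(in html: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    // :has() keeps only rows that contain a matching descendant cell
    let rows = try doc.select("#products tr:has(td.stock-low)").array()
    return try rows.map { row in
        try row.select("td[data-sort=name]").first()?.text() ?? ""
    }
}

let html = """
<table id="products" class="data-table">
    <tr><td data-sort="name">Product A</td><td data-sort="price">$29.99</td><td class="stock-low">5</td></tr>
    <tr><td data-sort="name">Product B</td><td data-sort="price">$49.99</td><td class="stock-ok">150</td></tr>
</table>
"""

do {
    print(try productPrices(in: html))    // ["$29.99", "$49.99"]
    print(try lowStockProducts(in: html)) // ["Product A"]
} catch {
    print("Error: \(error)")
}
```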

Handling Complex Table Structures

Cells with colspan or rowspan attributes span several columns or rows. The example below reports these attributes alongside each cell's text:

func parseComplexTable() {
    let html = """
    <table>
        <tr>
            <th colspan="2">Sales Report</th>
            <th rowspan="2">Total</th>
        </tr>
        <tr>
            <th>Q1</th>
            <th>Q2</th>
        </tr>
        <tr>
            <td>$1000</td>
            <td>$1500</td>
            <td>$2500</td>
        </tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")

        for (rowIndex, row) in rows.array().enumerated() {
            print("Row \(rowIndex):")

            let cells: Elements = try row.select("th, td")

            for cell in cells.array() {
                let cellText = try cell.text()
                let colspan = try cell.attr("colspan")
                let rowspan = try cell.attr("rowspan")

                var cellInfo = "  Cell: \(cellText)"

                if !colspan.isEmpty && colspan != "1" {
                    cellInfo += " (colspan: \(colspan))"
                }
                if !rowspan.isEmpty && rowspan != "1" {
                    cellInfo += " (rowspan: \(rowspan))"
                }

                print(cellInfo)
            }
        }

    } catch {
        print("Error: \(error)")
    }
}
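Reporting colspan and rowspan is only half the job: to read such a table by column, each cell has to be placed in a grid. A sketch of one common approach, assuming well-formed spans and no nested tables (it collects rows with table.select("tr")); expandedGrid(of:) is a hypothetical name, each spanning cell's text is copied into every position it covers, and overlapping spans in malformed tables are not resolved:

```swift
import SwiftSoup

/// Expands colspan/rowspan so each row has one entry per logical column.
/// A spanning cell's text is repeated in every grid position it covers.
func expandedGrid(of table: Element) throws -> [[String]] {
    let rows = try table.select("tr").array()
    // nil marks a grid position that no cell has filled yet
    var grid: [[String?]] = Array(repeating: [], count: rows.count)

    for (r, row) in rows.enumerated() {
        var c = 0
        for cell in try row.select("th, td").array() {
            // Skip positions already claimed by a rowspan from a row above
            while c < grid[r].count, grid[r][c] != nil { c += 1 }

            let text = try cell.text()
            let colspanValue = try cell.attr("colspan")
            let rowspanValue = try cell.attr("rowspan")
            // Missing or non-numeric values count as 1
            let colspan = max(1, Int(colspanValue) ?? 1)
            let rowspan = max(1, Int(rowspanValue) ?? 1)

            // Fill every covered position, ignoring rowspans past the last row
            for target in r..<min(r + rowspan, rows.count) {
                for col in c..<(c + colspan) {
                    while grid[target].count <= col { grid[target].append(nil) }
                    grid[target][col] = text
                }
            }
            c += colspan
        }
    }
    // Positions that no cell covered (short rows) become empty strings
    return grid.map { row in row.map { $0 ?? "" } }
}

let html = """
<table>
    <tr><th colspan="2">Sales Report</th><th rowspan="2">Total</th></tr>
    <tr><th>Q1</th><th>Q2</th></tr>
    <tr><td>$1000</td><td>$1500</td><td>$2500</td></tr>
</table>
"""

do {
    if let table = try SwiftSoup.parse(html).select("table").first() {
        for row in try expandedGrid(of: table) {
            print(row)
        }
    }
} catch {
    print("Error: \(error)")
}
```

For this table every row comes out with three columns: "Sales Report" fills columns 0 and 1 of the first row, and "Total" appears in column 2 of both the first and second rows.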

Extracting Specific Data Patterns

Finding Tables by Content

Locate specific tables based on their content:

func findTableByContent() {
    let html = """
    <div>
        <table>
            <tr><th>Users</th></tr>
            <tr><td>John</td></tr>
        </table>
        <table>
            <tr><th>Products</th><th>Price</th></tr>
            <tr><td>iPhone</td><td>$999</td></tr>
        </table>
    </div>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)

        // Find table containing "Products" header
        let productTables: Elements = try doc.select("table:has(th:contains(Products))")

        for table in productTables.array() {
            print("Found products table:")
            let rows: Elements = try table.select("tr")

            for row in rows.array() {
                let cells: Elements = try row.select("th, td")
                var rowData: [String] = []

                for cell in cells.array() {
                    rowData.append(try cell.text())
                }

                print("  \(rowData)")
            }
        }

    } catch {
        print("Error: \(error)")
    }
}
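Note that :contains(Products) matches any header whose text includes that substring, so a header such as "Old Products" would match too. Once the right table is found, you often need just one column. A sketch that looks a column up by its exact header text, assuming the first row holds the headers; column(named:in:) is a hypothetical helper:

```swift
import SwiftSoup

/// Values in the column whose header text equals `header`, or nil if no
/// such header exists. Assumes the first row holds the headers.
func column(named header: String, in table: Element) throws -> [String]? {
    let rows = try table.select("tr").array()
    guard let headerRow = rows.first else { return nil }
    let headers = try headerRow.select("th, td").array().map { try $0.text() }
    // Exact comparison avoids the substring matching of :contains()
    guard let index = headers.firstIndex(of: header) else { return nil }

    var values: [String] = []
    for row in rows.dropFirst() {
        let cells = try row.select("th, td").array()
        // Skip rows that are too short to have this column
        if index < cells.count {
            values.append(try cells[index].text())
        }
    }
    return values
}

let html = """
<table><tr><th>Users</th></tr><tr><td>John</td></tr></table>
<table><tr><th>Products</th><th>Price</th></tr><tr><td>iPhone</td><td>$999</td></tr></table>
"""

do {
    let doc = try SwiftSoup.parse(html)
    if let table = try doc.select("table:has(th:contains(Products))").first() {
        print(try column(named: "Price", in: table) ?? []) // ["$999"]
    }
} catch {
    print("Error: \(error)")
}
```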

Converting Table to Dictionary

Create structured data from table content:

struct TableData {
    let headers: [String]
    let rows: [[String]]

    func toDictionaryArray() -> [[String: String]] {
        return rows.map { row in
            var dict: [String: String] = [:]
            for (index, value) in row.enumerated() {
                if index < headers.count {
                    dict[headers[index]] = value
                }
            }
            return dict
        }
    }
}

func parseTableToDictionary() {
    let html = """
    <table>
        <tr><th>Name</th><th>Age</th><th>City</th></tr>
        <tr><td>Alice</td><td>30</td><td>New York</td></tr>
        <tr><td>Bob</td><td>25</td><td>London</td></tr>
    </table>
    """

    do {
        let doc: Document = try SwiftSoup.parse(html)
        let table: Element = try doc.select("table").first()!

        // Extract headers
        let headerElements: Elements = try table.select("tr:first-child th, tr:first-child td")
        let headers = try headerElements.array().map { try $0.text() }

        // Extract data rows (tr:gt(0) skips the first row of its section, the header row here)
        let dataRowElements: Elements = try table.select("tr:gt(0)")
        var rows: [[String]] = []

        for row in dataRowElements.array() {
            let cells: Elements = try row.select("td")
            let rowData = try cells.array().map { try $0.text() }
            rows.append(rowData)
        }

        let tableData = TableData(headers: headers, rows: rows)
        let dictArray = tableData.toDictionaryArray()

        for dict in dictArray {
            print(dict)
        }

    } catch {
        print("Error: \(error)")
    }
}
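The selectors tr:first-child and tr:gt(0) look at a row's position among its siblings, so in a table with both <thead> and <tbody> the first row of the <tbody> is also treated as a header row. Splitting the row array in Swift avoids that. The sketch below does this and maps each row into a typed value; Person and parsePeople(from:) are hypothetical names for the Name/Age/City example:

```swift
import SwiftSoup

// Hypothetical model type for the Name/Age/City table
struct Person: Equatable {
    let name: String
    let age: Int
    let city: String
}

func parsePeople(from html: String) throws -> [Person] {
    let doc = try SwiftSoup.parse(html)
    guard let table = try doc.select("table").first() else { return [] }
    let rows = try table.select("tr").array()
    guard let headerRow = rows.first else { return [] }
    let headers = try headerRow.select("th, td").array().map { try $0.text() }

    var people: [Person] = []
    for row in rows.dropFirst() {
        let values = try row.select("td").array().map { try $0.text() }
        // Pair each header with its value; zip stops at the shorter array
        var record: [String: String] = [:]
        for (key, value) in zip(headers, values) { record[key] = value }
        // Skip rows with a missing field or a non-numeric age
        guard let name = record["Name"],
              let ageText = record["Age"], let age = Int(ageText),
              let city = record["City"] else { continue }
        people.append(Person(name: name, age: age, city: city))
    }
    return people
}

let html = """
<table>
    <tr><th>Name</th><th>Age</th><th>City</th></tr>
    <tr><td>Alice</td><td>30</td><td>New York</td></tr>
    <tr><td>Bob</td><td>25</td><td>London</td></tr>
</table>
"""

do {
    for person in try parsePeople(from: html) {
        print("\(person.name), \(person.age), \(person.city)")
    }
} catch {
    print("Error: \(error)")
}
```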

Error Handling and Edge Cases

Robust Table Parser

Handle various edge cases when parsing tables:

func robustTableParser(html: String) -> [[String]] {
    do {
        let doc: Document = try SwiftSoup.parse(html)
        let tables: Elements = try doc.select("table")

        guard !tables.isEmpty() else {
            print("No tables found in HTML")
            return []
        }

        let table = tables.first()!
        let rows: Elements = try table.select("tr")
        var result: [[String]] = []

        for row in rows.array() {
            // Handle both th and td elements
            let cells: Elements = try row.select("th, td")
            var rowData: [String] = []

            for cell in cells.array() {
                let cellText = try cell.text().trimmingCharacters(in: .whitespacesAndNewlines)
                rowData.append(cellText)
            }

            // Skip rows that contain no th or td cells
            if !rowData.isEmpty {
                result.append(rowData)
            }
        }

        return result

    } catch Exception.Error(let type, let message) {
        print("SwiftSoup error (\(type)): \(message)")
        return []
    } catch {
        print("Unexpected error: \(error)")
        return []
    }
}
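robustTableParser returns rows of different lengths when some rows have fewer cells, for example because of colspan or missing cells. If later code indexes columns by position, padding every row to a common width first prevents out-of-range crashes. A small sketch; normalized(_:) is a hypothetical helper:

```swift
/// Pads every row with empty strings so all rows have the same number of
/// columns, which makes later column-by-index access safe.
func normalized(_ rows: [[String]]) -> [[String]] {
    let width = rows.map(\.count).max() ?? 0
    return rows.map { row in
        row + Array(repeating: "", count: width - row.count)
    }
}

let ragged = [["Name", "Email"], ["Alice"], ["Bob", "bob@example.com"]]
print(normalized(ragged))
// [["Name", "Email"], ["Alice", ""], ["Bob", "bob@example.com"]]
```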

Performance Considerations

For large tables or when processing multiple tables, consider these optimization techniques:

Memory-Efficient Processing

import Foundation
import SwiftSoup

func processLargeTable(html: String) {
    do {
        // SwiftSoup builds the full DOM in memory; the savings below come from
        // not accumulating every row's extracted text a second time
        let doc: Document = try SwiftSoup.parse(html)
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }
        let rows: Elements = try table.select("tr")

        // Handle each row as soon as it is read instead of storing all rows
        for (index, row) in rows.array().enumerated() {
            // autoreleasepool exists only on Apple platforms; it drains temporary
            // Foundation objects created while handling this row
            autoreleasepool {
                do {
                    let cells: Elements = try row.select("td, th")

                    var rowData: [String] = []
                    for cell in cells.array() {
                        rowData.append(try cell.text())
                    }

                    // Hand the row off immediately (save, aggregate, etc.)
                    processRow(index: index, data: rowData)

                } catch {
                    print("Error processing row \(index): \(error)")
                }
            }
        }

    } catch {
        print("Error parsing table: \(error)")
    }
}

func processRow(index: Int, data: [String]) {
    // Your row processing logic here
    print("Processing row \(index): \(data)")
}

Integration with Web Scraping

SwiftSoup only parses HTML; to scrape a live page, fetch it first with a networking API such as URLSession. SwiftSoup does not execute JavaScript, so tables that a page builds client-side will be missing from the downloaded HTML. For those pages, consider browser automation tools that render dynamic content.

Complete Web Scraping Example

import Foundation
import SwiftSoup

func scrapeWebTable(from urlString: String, completion: @escaping ([[String]]) -> Void) {
    guard let url = URL(string: urlString) else {
        print("Invalid URL")
        completion([])
        return
    }

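    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        // Report transport errors (no connection, timeout) instead of hiding them
        if let error = error {
            print("Request failed: \(error)")
            completion([])
            return
        }
        guard let data = data,
              let htmlString = String(data: data, encoding: .utf8) else {
            print("Failed to decode the response as UTF-8")
            completion([])
            return
        }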

        let tableData = robustTableParser(html: htmlString)
        completion(tableData)
    }

    task.resume()
}

// Usage (the handler runs on a background queue; a command-line tool must stay alive until it is called)
scrapeWebTable(from: "https://example.com/data.html") { tableData in
    DispatchQueue.main.async {
        for row in tableData {
            print(row)
        }
    }
}
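The completion-handler version decodes only UTF-8 and parses whatever body comes back, even for an HTTP error status. On iOS 15 / macOS 12 and later, URLSession also offers async/await. The sketch below is an alternative, not part of the original example: fetchFirstTable(from:), decodeHTML(_:), and TableFetchError are hypothetical names, and the Latin-1 fallback is a simplification (a fuller scraper would read the charset from the Content-Type header or a <meta charset> tag):

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession and HTTPURLResponse live here on Linux
#endif
import SwiftSoup

enum TableFetchError: Error {
    case badStatus(Int)
}

/// Decodes HTML bytes as UTF-8, falling back to ISO Latin-1 for legacy pages.
func decodeHTML(_ data: Data) -> String? {
    String(data: data, encoding: .utf8) ?? String(data: data, encoding: .isoLatin1)
}

/// Fetches a page and returns the cell text of its first table, row by row.
@available(macOS 12.0, iOS 15.0, *)
func fetchFirstTable(from url: URL) async throws -> [[String]] {
    let (data, response) = try await URLSession.shared.data(from: url)
    // Treat non-2xx responses as failures instead of parsing an error page
    if let http = response as? HTTPURLResponse, !(200..<300).contains(http.statusCode) {
        throw TableFetchError.badStatus(http.statusCode)
    }
    guard let html = decodeHTML(data) else { return [] }
    let doc = try SwiftSoup.parse(html)
    guard let table = try doc.select("table").first() else { return [] }
    return try table.select("tr").array().map { row in
        try row.select("th, td").array().map { try $0.text() }
    }
}

// Usage from an async context, for example inside a Task:
// let rows = try await fetchFirstTable(from: URL(string: "https://example.com/data.html")!)
```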

Best Practices

  1. Always handle errors: SwiftSoup's parsing, selection, and text-extraction methods are marked throws, so call them inside do-catch blocks
  2. Validate table structure: Check that a table exists before parsing it, and prefer optional binding over force-unwrapping first()
  3. Handle empty cells: Some table cells might be empty or contain only whitespace
  4. Account for cell spanning: A cell with colspan or rowspan covers several grid positions, so a cell's index within its row no longer matches its column
  5. Keep memory in check: SwiftSoup holds the entire parsed document in memory, so for large tables process each row as you read it rather than also accumulating every row's data

Conclusion

SwiftSoup provides excellent capabilities for parsing HTML tables in Swift applications. Whether you're dealing with simple data tables or complex structures with spanning cells, SwiftSoup's CSS selector syntax makes it easy to extract the data you need. Combined with proper error handling and performance considerations, you can build robust table parsing solutions for your iOS and macOS applications.

For more complex scenarios involving JavaScript-heavy sites, consider complementing SwiftSoup with browser automation techniques for single-page applications or specialized web scraping APIs that handle dynamic content.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl -G "https://api.webscraping.ai/ai/question" --data-urlencode "url=https://example.com" --data-urlencode "question=What is the main topic?" --data-urlencode "api_key=YOUR_API_KEY"

Extract structured data:

curl -g -G "https://api.webscraping.ai/ai/fields" --data-urlencode "url=https://example.com" --data-urlencode "fields[title]=Page title" --data-urlencode "fields[price]=Product price" --data-urlencode "api_key=YOUR_API_KEY"
