How do I parse HTML tables with SwiftSoup?
Parsing HTML tables is a fundamental task when extracting structured data from web pages. SwiftSoup provides powerful tools to select table elements, iterate through rows and cells, and extract data efficiently. This guide covers everything from basic table parsing to advanced scenarios with nested tables and complex structures.
Quick Answer
SwiftSoup allows you to parse HTML tables by selecting the table element and iterating through its rows (<tr>) and cells (<td> or <th>):
import SwiftSoup
do {
    let html = "<table><tr><th>Name</th><th>Age</th></tr><tr><td>John</td><td>25</td></tr></table>"
    let doc: Document = try SwiftSoup.parse(html)
    // first() returns nil when no table matches, so avoid force-unwrapping it
    if let table = try doc.select("table").first() {
        let rows: Elements = try table.select("tr")
        for row in rows.array() {
            let cells: Elements = try row.select("td, th")
            for cell in cells.array() {
                print(try cell.text())
            }
        }
    }
} catch {
    print("Error parsing HTML: \(error)")
}
Setting Up SwiftSoup
First, add SwiftSoup to your project using Swift Package Manager:
dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
]
Then import SwiftSoup in your Swift file:
import SwiftSoup
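The dependencies entry above is only one part of a package manifest; the target that uses SwiftSoup also has to list it as a dependency. A minimal Package.swift sketch follows. The package and target name "TableScraper" and the tools version are assumptions made for illustration, not something this guide prescribes.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "TableScraper", // hypothetical package name
    dependencies: [
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
    ],
    targets: [
        .executableTarget(
            name: "TableScraper", // hypothetical target name
            dependencies: [
                // Link the SwiftSoup library product into this target
                .product(name: "SwiftSoup", package: "SwiftSoup")
            ]
        )
    ]
)
```

In an Xcode project you would typically add the same repository URL through the package dependency dialog instead of editing a manifest by hand.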
Basic Table Parsing
Extracting All Table Data
Here's a comprehensive example of parsing a basic HTML table:
import SwiftSoup
func parseBasicTable() {
    let html = """
    <table>
      <thead>
        <tr>
          <th>Product</th>
          <th>Price</th>
          <th>Stock</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>iPhone 15</td>
          <td>$999</td>
          <td>50</td>
        </tr>
        <tr>
          <td>MacBook Pro</td>
          <td>$2499</td>
          <td>25</td>
        </tr>
      </tbody>
    </table>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        // first() returns nil when the document contains no table
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }
        // Extract headers
        let headers: Elements = try table.select("thead th")
        var headerTexts: [String] = []
        for header in headers.array() {
            headerTexts.append(try header.text())
        }
        print("Headers: \(headerTexts)")
        // Extract data rows
        let dataRows: Elements = try table.select("tbody tr")
        var tableData: [[String]] = []
        for row in dataRows.array() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []
            for cell in cells.array() {
                rowData.append(try cell.text())
            }
            tableData.append(rowData)
        }
        print("Table Data: \(tableData)")
    } catch {
        print("Error parsing table: \(error)")
    }
}
Parsing Tables Without Headers
When dealing with tables that don't have explicit header sections:
func parseTableWithoutHeaders() {
    let html = """
    <table>
      <tr><td>Name</td><td>Email</td><td>Role</td></tr>
      <tr><td>Alice Johnson</td><td>alice@example.com</td><td>Manager</td></tr>
      <tr><td>Bob Smith</td><td>bob@example.com</td><td>Developer</td></tr>
    </table>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")
        for (index, row) in rows.array().enumerated() {
            let cells: Elements = try row.select("td")
            var rowData: [String] = []
            for cell in cells.array() {
                rowData.append(try cell.text())
            }
            if index == 0 {
                print("Headers: \(rowData)")
            } else {
                print("Row \(index): \(rowData)")
            }
        }
    } catch {
        print("Error: \(error)")
    }
}
Advanced Table Parsing Techniques
Working with Table Attributes
Extract additional information from table elements using attributes:
func parseTableWithAttributes() {
    let html = """
    <table id="products" class="data-table">
      <tr>
        <td data-sort="name">Product A</td>
        <td data-sort="price">$29.99</td>
        <td class="stock-low">5</td>
      </tr>
      <tr>
        <td data-sort="name">Product B</td>
        <td data-sort="price">$49.99</td>
        <td class="stock-ok">150</td>
      </tr>
    </table>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        // first() returns nil when no element matches the selector
        guard let table = try doc.select("table#products").first() else {
            print("No table with id 'products' found")
            return
        }
        // Get table attributes
        let tableId = try table.attr("id")
        let tableClass = try table.attr("class")
        print("Table ID: \(tableId), Class: \(tableClass)")
        let rows: Elements = try table.select("tr")
        for row in rows.array() {
            let cells: Elements = try row.select("td")
            for cell in cells.array() {
                let cellText = try cell.text()
                // attr() returns an empty string when the attribute is missing
                let sortAttribute = try cell.attr("data-sort")
                let cellClass = try cell.attr("class")
                print("Cell: \(cellText)")
                if !sortAttribute.isEmpty {
                    print(" Sort key: \(sortAttribute)")
                }
                if !cellClass.isEmpty {
                    print(" Class: \(cellClass)")
                }
            }
        }
    } catch {
        print("Error: \(error)")
    }
}
Handling Complex Table Structures
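Cells with colspan or rowspan attributes cover several columns or rows, so the number of cells in a row no longer matches the number of columns in the table. The following example reads those attributes and reports them for each cell: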
func parseComplexTable() {
    let html = """
    <table>
      <tr>
        <th colspan="2">Sales Report</th>
        <th rowspan="2">Total</th>
      </tr>
      <tr>
        <th>Q1</th>
        <th>Q2</th>
      </tr>
      <tr>
        <td>$1000</td>
        <td>$1500</td>
        <td>$2500</td>
      </tr>
    </table>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        let rows: Elements = try doc.select("table tr")
        for (rowIndex, row) in rows.array().enumerated() {
            print("Row \(rowIndex):")
            let cells: Elements = try row.select("th, td")
            for cell in cells.array() {
                let cellText = try cell.text()
                let colspan = try cell.attr("colspan")
                let rowspan = try cell.attr("rowspan")
                var cellInfo = " Cell: \(cellText)"
                if !colspan.isEmpty && colspan != "1" {
                    cellInfo += " (colspan: \(colspan))"
                }
                if !rowspan.isEmpty && rowspan != "1" {
                    cellInfo += " (rowspan: \(rowspan))"
                }
                print(cellInfo)
            }
        }
    } catch {
        print("Error: \(error)")
    }
}
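Reading the span attributes is only the first half of the job: to get a rectangular grid, each spanning cell has to be copied into every position it covers. The sketch below shows one way to do that in plain Swift. SpannedCell and expandSpans are hypothetical names introduced for this illustration, and the code assumes spans do not overlap each other.

```swift
// Hypothetical value type: one parsed cell plus its span attributes.
struct SpannedCell {
    let text: String
    var colspan: Int = 1
    var rowspan: Int = 1
}

/// Expands spanning cells into a rectangular grid by repeating the cell
/// text in every position the cell covers.
func expandSpans(_ rows: [[SpannedCell]]) -> [[String]] {
    var grid: [[String?]] = Array(repeating: [], count: rows.count)

    for (r, row) in rows.enumerated() {
        var c = 0
        for cell in row {
            // Skip columns already filled by a rowspan from an earlier row.
            while c < grid[r].count, grid[r][c] != nil { c += 1 }

            let lastCol = c + max(cell.colspan, 1)
            // Clamp rowspans that would run past the end of the table.
            let lastRow = min(r + max(cell.rowspan, 1), rows.count)
            for rr in r..<lastRow {
                // Pad the target row so the covered columns exist.
                if grid[rr].count < lastCol {
                    grid[rr] += [String?](repeating: nil, count: lastCol - grid[rr].count)
                }
                for cc in c..<lastCol { grid[rr][cc] = cell.text }
            }
            c = lastCol
        }
    }

    // Pad short rows and replace unfilled slots with empty strings.
    let width = grid.map { $0.count }.max() ?? 0
    return grid.map { row in
        (0..<width).map { $0 < row.count ? (row[$0] ?? "") : "" }
    }
}

// The sales report table from the example above, written out by hand.
let grid = expandSpans([
    [SpannedCell(text: "Sales Report", colspan: 2), SpannedCell(text: "Total", rowspan: 2)],
    [SpannedCell(text: "Q1"), SpannedCell(text: "Q2")],
    [SpannedCell(text: "$1000"), SpannedCell(text: "$1500"), SpannedCell(text: "$2500")]
])
for row in grid {
    print(row)
}
```

With SwiftSoup, each SpannedCell could be built from cell.text() together with Int(try cell.attr("colspan")) ?? 1 and the equivalent expression for rowspan, since attr() returns an empty string when the attribute is absent.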
Extracting Specific Data Patterns
Finding Tables by Content
Locate specific tables based on their content:
func findTableByContent() {
    let html = """
    <div>
      <table>
        <tr><th>Users</th></tr>
        <tr><td>John</td></tr>
      </table>
      <table>
        <tr><th>Products</th><th>Price</th></tr>
        <tr><td>iPhone</td><td>$999</td></tr>
      </table>
    </div>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        // Find table containing "Products" header
        let productTables: Elements = try doc.select("table:has(th:contains(Products))")
        for table in productTables.array() {
            print("Found products table:")
            let rows: Elements = try table.select("tr")
            for row in rows.array() {
                let cells: Elements = try row.select("th, td")
                var rowData: [String] = []
                for cell in cells.array() {
                    rowData.append(try cell.text())
                }
                print(" \(rowData)")
            }
        }
    } catch {
        print("Error: \(error)")
    }
}
Converting Table to Dictionary
Create structured data from table content:
struct TableData {
    let headers: [String]
    let rows: [[String]]
    // Pairs each value with the header at the same index;
    // values beyond the last header are dropped
    func toDictionaryArray() -> [[String: String]] {
        return rows.map { row in
            var dict: [String: String] = [:]
            for (index, value) in row.enumerated() {
                if index < headers.count {
                    dict[headers[index]] = value
                }
            }
            return dict
        }
    }
}
func parseTableToDictionary() {
    let html = """
    <table>
      <tr><th>Name</th><th>Age</th><th>City</th></tr>
      <tr><td>Alice</td><td>30</td><td>New York</td></tr>
      <tr><td>Bob</td><td>25</td><td>London</td></tr>
    </table>
    """
    do {
        let doc: Document = try SwiftSoup.parse(html)
        // first() returns nil when the document contains no table
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }
        // Extract headers
        let headerElements: Elements = try table.select("tr:first-child th, tr:first-child td")
        let headers = try headerElements.array().map { try $0.text() }
        // Extract data rows (skip the first row, which holds the headers)
        let dataRowElements: Elements = try table.select("tr:gt(0)")
        var rows: [[String]] = []
        for row in dataRowElements.array() {
            let cells: Elements = try row.select("td")
            let rowData = try cells.array().map { try $0.text() }
            rows.append(rowData)
        }
        let tableData = TableData(headers: headers, rows: rows)
        let dictArray = tableData.toDictionaryArray()
        for dict in dictArray {
            print(dict)
        }
    } catch {
        print("Error: \(error)")
    }
}
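The dictionaries produced above hold every value as a string. If you need typed values, a small mapping step can convert each dictionary into a struct and skip rows that do not fit. The sketch below assumes the Name, Age, and City headers from the example table; Person and makePeople are hypothetical names used only for illustration.

```swift
// Hypothetical model type matching the example table's columns.
struct Person: Equatable {
    let name: String
    let age: Int
    let city: String
}

/// Converts header-keyed string records into typed values.
/// Records with a missing key or a non-numeric age are skipped.
func makePeople(from records: [[String: String]]) -> [Person] {
    records.compactMap { record in
        guard let name = record["Name"],
              let ageText = record["Age"],
              let age = Int(ageText),
              let city = record["City"] else {
            return nil
        }
        return Person(name: name, age: age, city: city)
    }
}

let people = makePeople(from: [
    ["Name": "Alice", "Age": "30", "City": "New York"],
    ["Name": "Bob", "Age": "n/a", "City": "London"] // skipped: age is not a number
])
print(people.map { $0.name })
```

Skipping invalid rows silently is a design choice that suits quick scripts; for production data you may prefer to collect the rejected rows and report them.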
Error Handling and Edge Cases
Robust Table Parser
Handle various edge cases when parsing tables:
import Foundation // needed for trimmingCharacters(in:)
import SwiftSoup
func robustTableParser(html: String) -> [[String]] {
    do {
        let doc: Document = try SwiftSoup.parse(html)
        let tables: Elements = try doc.select("table")
        // first() returns nil when no table exists, so no force-unwrap is needed
        guard let table = tables.first() else {
            print("No tables found in HTML")
            return []
        }
        let rows: Elements = try table.select("tr")
        var result: [[String]] = []
        for row in rows.array() {
            // Handle both th and td elements
            let cells: Elements = try row.select("th, td")
            var rowData: [String] = []
            for cell in cells.array() {
                let cellText = try cell.text().trimmingCharacters(in: .whitespacesAndNewlines)
                rowData.append(cellText)
            }
            // Only add rows that contain at least one cell
            if !rowData.isEmpty {
                result.append(rowData)
            }
        }
        return result
    } catch Exception.Error(let type, let message) {
        print("SwiftSoup error (\(type)): \(message)")
        return []
    } catch {
        print("Unexpected error: \(error)")
        return []
    }
}
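One edge case the parser above does not cover is nested tables: table.select("tr") matches every descendant row, including rows of a table placed inside a cell. A possible workaround is to walk only the direct children of the table. The sketch below assumes SwiftSoup's children() and tagName() accessors on Element; directRows and directCellTexts are hypothetical helper names.

```swift
import SwiftSoup

/// Returns only the rows that belong directly to `table`, ignoring rows
/// of any tables nested inside its cells.
func directRows(of table: Element) -> [Element] {
    var rows: [Element] = []
    for child in table.children().array() {
        switch child.tagName() {
        case "tr":
            rows.append(child)
        case "thead", "tbody", "tfoot":
            // The HTML parser wraps bare rows in a tbody, so look one level down.
            rows += child.children().array().filter { $0.tagName() == "tr" }
        default:
            break
        }
    }
    return rows
}

/// Returns the text of the cells that belong directly to `row`.
func directCellTexts(of row: Element) throws -> [String] {
    try row.children().array()
        .filter { ["td", "th"].contains($0.tagName()) }
        .map { try $0.text() }
}
```

Note that text() on an outer cell still includes the text of any table nested inside it, so you may want to process nested tables separately and remove them before reading the outer cell.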
Performance Considerations
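SwiftSoup parses the whole document into an in-memory tree, so parsing itself is not incremental. For large tables, or when processing multiple tables, the practical saving is to handle each row as you read it instead of collecting every row first.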
Memory-Efficient Processing
func processLargeTable(html: String) {
    do {
        let doc: Document = try SwiftSoup.parse(html)
        guard let table = try doc.select("table").first() else {
            print("No table found")
            return
        }
        let rows: Elements = try table.select("tr")
        // Handle rows one at a time; the parsed document itself stays in memory
        for (index, row) in rows.array().enumerated() {
            // autoreleasepool exists on Apple platforms only; it frees
            // temporary objects created while handling each row
            autoreleasepool {
                do {
                    let cells: Elements = try row.select("td, th")
                    // Process each row immediately rather than storing all data
                    var rowData: [String] = []
                    for cell in cells.array() {
                        rowData.append(try cell.text())
                    }
                    // Process the row data here
                    processRow(index: index, data: rowData)
                } catch {
                    print("Error processing row \(index): \(error)")
                }
            }
        }
    } catch {
        print("Error parsing table: \(error)")
    }
}
func processRow(index: Int, data: [String]) {
    // Your row processing logic here
    print("Processing row \(index): \(data)")
}
Integration with Web Scraping
When combined with network requests, SwiftSoup becomes powerful for web scraping. For scenarios requiring JavaScript execution or complex page interactions, consider using browser automation tools for handling dynamic content.
Complete Web Scraping Example
import Foundation
import SwiftSoup
func scrapeWebTable(from urlString: String, completion: @escaping ([[String]]) -> Void) {
    guard let url = URL(string: urlString) else {
        print("Invalid URL")
        completion([])
        return
    }
    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        // Report transport errors (no connection, timeout, and so on) explicitly
        if let error = error {
            print("Request failed: \(error)")
            completion([])
            return
        }
        // Assumes the page is UTF-8 encoded
        guard let data = data,
              let htmlString = String(data: data, encoding: .utf8) else {
            print("Failed to load data")
            completion([])
            return
        }
        // robustTableParser(html:) is defined in the error handling section
        let tableData = robustTableParser(html: htmlString)
        completion(tableData)
    }
    task.resume()
}
// Usage
scrapeWebTable(from: "https://example.com/data.html") { tableData in
    DispatchQueue.main.async {
        for row in tableData {
            print(row)
        }
    }
}
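On recent Apple platforms the same download can be written with async/await, which keeps error handling in one place. The following is a sketch under stated assumptions: fetchHTML and decodeHTML are hypothetical helper names, the availability annotation reflects URLSession's async data(from:) method, and the ISO Latin-1 fallback is a simple heuristic rather than real charset detection.

```swift
import Foundation

/// Decodes response bytes as UTF-8, falling back to ISO Latin-1, which
/// accepts any byte sequence (possibly with wrong characters).
func decodeHTML(_ data: Data) -> String {
    String(data: data, encoding: .utf8)
        ?? String(data: data, encoding: .isoLatin1)
        ?? ""
}

/// Downloads a page and returns its HTML, throwing on transport errors
/// and on HTTP status codes outside the 2xx range.
@available(iOS 15.0, macOS 12.0, tvOS 15.0, watchOS 8.0, *)
func fetchHTML(from url: URL) async throws -> String {
    let (data, response) = try await URLSession.shared.data(from: url)
    if let http = response as? HTTPURLResponse,
       !(200..<300).contains(http.statusCode) {
        throw URLError(.badServerResponse)
    }
    return decodeHTML(data)
}
```

The returned string can then be passed to a parser such as the robustTableParser(html:) function shown earlier, for example from inside a Task.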
Best Practices
- Always handle errors: Most SwiftSoup calls are marked throws, so call them with try inside do-catch blocks
- Validate table structure: Check if tables exist before attempting to parse them
- Handle empty cells: Some table cells might be empty or contain only whitespace
- Consider cell spanning: Tables with colspan/rowspan require special handling
- Optimize for performance: For large tables, process each row as you read it rather than collecting every row in memory first
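The points about validating structure and handling empty cells can be sketched as a small clean-up pass over the parsed rows. normalizeRows is a hypothetical helper, not part of SwiftSoup; it trims whitespace-only cells and pads short rows so that every row has the same number of columns.

```swift
import Foundation

/// Trims each cell, replaces blank cells with `filler`, and pads short
/// rows so that all rows match the width of the longest row.
func normalizeRows(_ rows: [[String]], filler: String = "") -> [[String]] {
    let width = rows.map { $0.count }.max() ?? 0
    return rows.map { row in
        let cleaned = row.map { cell -> String in
            let trimmed = cell.trimmingCharacters(in: .whitespacesAndNewlines)
            return trimmed.isEmpty ? filler : trimmed
        }
        return cleaned + Array(repeating: filler, count: width - cleaned.count)
    }
}

let normalized = normalizeRows([["a", "b", "c"], [" x "], []], filler: "-")
for row in normalized {
    print(row)
}
```

Choosing a visible filler such as "-" makes missing values easy to spot; an empty string is the safer default when the data is passed on to other code.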
Conclusion
SwiftSoup provides excellent capabilities for parsing HTML tables in Swift applications. Whether you're dealing with simple data tables or complex structures with spanning cells, SwiftSoup's CSS selector syntax makes it easy to extract the data you need. Combined with proper error handling and performance considerations, you can build robust table parsing solutions for your iOS and macOS applications.
For more complex scenarios involving JavaScript-heavy sites, consider complementing SwiftSoup with browser automation techniques for single-page applications or specialized web scraping APIs that handle dynamic content.