SwiftSoup is a pure Swift library for parsing and manipulating HTML content, similar to jsoup in the Java ecosystem. To scrape a table from a webpage with SwiftSoup, you first download the page's HTML and then parse it to extract the table data.
Here's how you can scrape a table from a webpage using SwiftSoup:
- Add SwiftSoup to your project. If you're using CocoaPods, add it to your Podfile:
pod 'SwiftSoup'
Alternatively, if you're using Swift Package Manager, add the following to your Package.swift and list SwiftSoup in the dependencies of the target that uses it:
dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.3.2")
]
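For context, the dependency fragment above might sit in a complete manifest roughly like the following. This is a minimal sketch, not a required layout: the package and target name TableScraper and the tools version are assumptions chosen for illustration.

```swift
// swift-tools-version:5.5
import PackageDescription

// "TableScraper" is a hypothetical name; use your own package and target names.
let package = Package(
    name: "TableScraper",
    dependencies: [
        // Same dependency line as in the fragment above
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.3.2")
    ],
    targets: [
        .executableTarget(
            name: "TableScraper",
            // The target must also list the SwiftSoup product,
            // otherwise `import SwiftSoup` will not resolve.
            dependencies: [
                .product(name: "SwiftSoup", package: "SwiftSoup")
            ]
        )
    ]
)
```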
- Import the SwiftSoup library in your Swift file:
import SwiftSoup
- Fetch the HTML content of the webpage. You can use URLSession for this purpose:
func fetchHTML(from urlString: String, completion: @escaping (String?) -> Void) {
    // Reject strings that cannot be turned into a URL
    guard let url = URL(string: urlString) else {
        completion(nil)
        return
    }
    // The completion handler is called on a background queue
    let task = URLSession.shared.dataTask(with: url) { data, _, error in
        guard let data = data, error == nil else {
            completion(nil)
            return
        }
        // Assumes the page is UTF-8; yields nil if the bytes are not valid UTF-8
        completion(String(data: data, encoding: .utf8))
    }
    task.resume()
}
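The fetchHTML function above passes any response body through, including error pages, and returns nil for pages that are not valid UTF-8. A stricter decoding step could look roughly like the sketch below. The names FetchError and decodeHTML are hypothetical, and the ISO Latin-1 fallback is an assumption made for illustration; a page's real charset is normally declared in its Content-Type header or meta tag.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLResponse types live here on Linux
#endif

// Hypothetical error type for this sketch
enum FetchError: Error, Equatable {
    case badStatus(Int)
    case undecodable
}

/// Validates the HTTP status and decodes the body into a string.
func decodeHTML(data: Data, response: URLResponse?) throws -> String {
    // Treat anything outside 2xx as a failure instead of parsing an error page
    if let http = response as? HTTPURLResponse,
       !(200...299).contains(http.statusCode) {
        throw FetchError.badStatus(http.statusCode)
    }
    // Try UTF-8 first
    if let text = String(data: data, encoding: .utf8) {
        return text
    }
    // Assumption: fall back to ISO Latin-1 when the bytes are not valid UTF-8
    guard let text = String(data: data, encoding: .isoLatin1) else {
        throw FetchError.undecodable
    }
    return text
}
```

Inside the dataTask closure, this could replace the direct String conversion, for example `completion(try? decodeHTML(data: data, response: response))`.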
- Parse the HTML and extract the table data using SwiftSoup:
func parseTable(html: String) throws -> [[String]] {
    let document = try SwiftSoup.parse(html)
    var tableData = [[String]]()
    // Find the table you want to scrape.
    // If the page has more than one table, use a more specific selector.
    let table = try document.select("table").first()
    // Iterate through each row of the table
    try table?.select("tr").forEach({ row in
        var rowData = [String]()
        // Collect the text of each data cell in the row.
        // Only <td> cells are read, so rows made up solely of <th> header cells are skipped.
        try row.select("td").forEach({ column in
            let text = try column.text()
            rowData.append(text)
        })
        if !rowData.isEmpty {
            tableData.append(rowData)
        }
    })
    return tableData
}
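Because parseTable reads only td cells, any header row built from th cells is dropped. If you also need the column names, or want to target one table among several, a variant along these lines may help. It is a sketch under assumptions: the function name parseTableWithHeaders and the selector parameter are hypothetical, and the first row that contains only th cells is taken to be the header row.

```swift
import SwiftSoup

/// Returns the header row (if any) and the data rows of the first table
/// matching `selector`.
func parseTableWithHeaders(
    html: String,
    selector: String = "table"  // e.g. "table#prices" to pick one specific table
) throws -> (headers: [String], rows: [[String]]) {
    let document = try SwiftSoup.parse(html)
    guard let table = try document.select(selector).first() else {
        return ([], [])
    }
    var headers = [String]()
    var rows = [[String]]()
    // Note: "tr" also matches rows of any nested tables
    for row in try table.select("tr").array() {
        let headerCells = try row.select("th").array().map { try $0.text() }
        let dataCells = try row.select("td").array().map { try $0.text() }
        if dataCells.isEmpty, !headerCells.isEmpty, headers.isEmpty {
            // Assumption: the first th-only row holds the column names
            headers = headerCells
        } else if !dataCells.isEmpty {
            rows.append(dataCells)
        }
    }
    return (headers, rows)
}
```

Rows that mix th and td cells contribute only their td cells here, matching the behavior of parseTable.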
- Call the functions and handle the table data:
let url = "https://example.com/tablepage.html"
fetchHTML(from: url) { html in
    guard let html = html else {
        print("Error fetching HTML.")
        return
    }
    do {
        let tableData = try parseTable(html: html)
        // Now you have the table data in a 2D array
        // You can process it further as needed
        print(tableData)
    } catch {
        print("Error parsing HTML: \(error.localizedDescription)")
    }
}
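As one example of processing the 2D array further, the rows can be serialized to CSV. The sketch below uses only Foundation; the helper names escapeCSVField and makeCSV are hypothetical, and the quoting follows the common convention of wrapping a field in double quotes and doubling any embedded quotes.

```swift
import Foundation

/// Quotes a field only when it contains a comma, a double quote, or a line break.
func escapeCSVField(_ field: String) -> String {
    let needsQuoting = field.contains(where: { $0 == "," || $0 == "\"" || $0.isNewline })
    guard needsQuoting else { return field }
    // Embedded double quotes are escaped by doubling them
    let escaped = field.replacingOccurrences(of: "\"", with: "\"\"")
    return "\"" + escaped + "\""
}

/// Joins cells with commas and rows with newlines.
func makeCSV(from rows: [[String]]) -> String {
    return rows
        .map { row in row.map(escapeCSVField).joined(separator: ",") }
        .joined(separator: "\n")
}
```

With the table data from the previous step, `print(makeCSV(from: tableData))` would emit one CSV line per table row.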
Please note that web scraping should be done responsibly. Always check the website's robots.txt file and terms of service to confirm that you are allowed to scrape the data. Also, be respectful of the website's resources and do not make excessive requests that could impact the server's performance.
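To make the robots.txt check concrete, here is a deliberately simplified sketch that tests a path against the Disallow rules of the `User-agent: *` group. The function name isDisallowed is hypothetical, and the sketch ignores Allow rules, wildcard patterns, and groups that list several user agents, so treat it as a first-pass check rather than a complete robots.txt parser.

```swift
import Foundation

/// Returns true when `path` starts with a Disallow rule in the `User-agent: *` group.
/// Simplified: no Allow rules, no wildcards, no multi-agent groups.
func isDisallowed(path: String, robotsTxt: String) -> Bool {
    var inWildcardGroup = false
    for rawLine in robotsTxt.components(separatedBy: .newlines) {
        // Drop trailing comments and surrounding whitespace
        let line = rawLine.components(separatedBy: "#")[0]
            .trimmingCharacters(in: .whitespaces)
        guard let colon = line.firstIndex(of: ":") else { continue }
        let field = line[..<colon].trimmingCharacters(in: .whitespaces).lowercased()
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        if field == "user-agent" {
            // Track whether the following rules apply to every crawler
            inWildcardGroup = (value == "*")
        } else if field == "disallow", inWildcardGroup, !value.isEmpty,
                  path.hasPrefix(value) {
            return true
        }
    }
    return false
}
```

The robots.txt text itself can be downloaded with the same fetchHTML function, from the /robots.txt path at the root of the site.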