How do I scrape a table from a webpage using SwiftSoup?

SwiftSoup is a pure Swift library for parsing and manipulating HTML, similar to jsoup in the Java ecosystem. To scrape a table from a webpage with SwiftSoup, first download the page's HTML, then parse it to extract the table data.

Here's how you can scrape a table from a webpage using SwiftSoup:

  1. Add SwiftSoup to your project. If you're using CocoaPods, you can add it to your Podfile:
pod 'SwiftSoup'

Alternatively, if you're using Swift Package Manager, add the package to the dependencies in your Package.swift, and list "SwiftSoup" in the dependencies of the target that uses it:

dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.3.2")
]
  1. Import the SwiftSoup library in your Swift file:
import SwiftSoup
  1. Fetch the HTML content of the webpage. You can use URLSession for this purpose:
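// Downloads the page and passes its HTML to `completion`, or nil on failure.
// Note: the completion handler runs on a background queue, not the main thread.
func fetchHTML(from url: String, completion: @escaping (String?) -> Void) {
    guard let url = URL(string: url) else {
        completion(nil)
        return
    }

    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        guard let data = data, error == nil else {
            completion(nil)
            return
        }
        // Decoding returns nil if the response body is not valid UTF-8
        let htmlString = String(data: data, encoding: .utf8)
        completion(htmlString)
    }
    task.resume()
}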
  1. Parse the HTML and extract the table data using SwiftSoup:
func parseTable(html: String) throws -> [[String]] {
    let document = try SwiftSoup.parse(html)
    var tableData = [[String]]()

    // Find the table you want to scrape
    // If there's more than one table, you may need to be more specific in your selector
    let table = try document.select("table").first()

    // Iterate through each row of the table
    try table?.select("tr").forEach({ row in
        var rowData = [String]()

        // Iterate through each data cell (<td>) of the row.
        // Header cells (<th>) are not selected, so header-only rows are skipped below.
        try row.select("td").forEach({ column in
            let text = try column.text()
            rowData.append(text)
        })

        if !rowData.isEmpty {
            tableData.append(rowData)
        }
    })

    return tableData
}
  1. Call the functions and handle the table data:
let url = "https://example.com/tablepage.html"

fetchHTML(from: url) { html in
    guard let html = html else {
        print("Error fetching HTML.")
        return
    }

    do {
        let tableData = try parseTable(html: html)
        // Now you have the table data in a 2D array
        // You can process it further as needed
        print(tableData)
    } catch {
        print("Error parsing HTML: \(error.localizedDescription)")
    }
}
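
If the page has several tables, or you also want the header row, the parsing step can be adapted. The following is a minimal sketch, not part of the original steps: the `ScrapedTable` type, the `parseTable(html:selector:)` overload, the `table#prices` selector, and the sample HTML are all hypothetical names chosen for illustration. It assumes the first row containing only `<th>` cells is the header row.

```swift
import SwiftSoup

/// Hypothetical container for one scraped table.
struct ScrapedTable {
    var headers: [String]
    var rows: [[String]]
}

/// Parses the first table matching `selector`; returns nil if nothing matches.
func parseTable(html: String, selector: String) throws -> ScrapedTable? {
    let document = try SwiftSoup.parse(html)
    guard let table = try document.select(selector).first() else {
        return nil
    }

    var result = ScrapedTable(headers: [], rows: [])
    for row in try table.select("tr").array() {
        // "th, td" collects header and data cells in document order.
        let cells = try row.select("th, td").array().map { try $0.text() }
        if cells.isEmpty { continue }

        // Assumption: the first row without any <td> cells is the header row.
        let dataCellCount = try row.select("td").array().count
        if dataCellCount == 0 && result.headers.isEmpty {
            result.headers = cells
        } else {
            result.rows.append(cells)
        }
    }
    return result
}

// Sample input, invented for this example.
let sampleHTML = """
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.90</td></tr>
</table>
"""

do {
    if let scraped = try parseTable(html: sampleHTML, selector: "table#prices") {
        print(scraped.headers)
        print(scraped.rows)
    } else {
        print("No table matched the selector.")
    }
} catch {
    print("Error parsing HTML: \(error.localizedDescription)")
}
```

Passing the selector as a parameter keeps the function reusable across pages. Note that `select("tr")` also matches rows of tables nested inside the selected one, so a page with nested tables would need a stricter selector.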

Please note that web scraping should be done responsibly. You should always check the website's robots.txt file and terms of service to ensure that you are allowed to scrape the data. Also, be respectful of the website's resources and do not make excessive requests that could impact the server's performance.
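
One simple way to avoid excessive requests is to fetch pages one at a time with a pause between them. The sketch below is an illustration under assumptions, not a complete politeness policy: `fetchPolitely` and its one-second default delay are hypothetical, and it relies on the async `URLSession.data(from:)` API, which requires a recent OS version (for example macOS 12 or iOS 15). It does not read robots.txt.

```swift
import Foundation

/// Hypothetical helper: downloads each URL in order, waiting `delaySeconds`
/// between requests, and returns the HTML of the pages that succeeded.
func fetchPolitely(urls: [URL], delaySeconds: Double = 1.0) async -> [URL: String] {
    var pages = [URL: String]()

    for (index, url) in urls.enumerated() {
        // Pause before every request except the first one.
        if index > 0 {
            let nanoseconds = UInt64(max(0, delaySeconds) * 1_000_000_000)
            try? await Task.sleep(nanoseconds: nanoseconds)
        }

        do {
            let (data, response) = try await URLSession.shared.data(from: url)
            // Keep only successful HTTP responses.
            guard let http = response as? HTTPURLResponse, http.statusCode == 200 else {
                continue
            }
            // Pages that are not valid UTF-8 are left out of the result.
            if let html = String(data: data, encoding: .utf8) {
                pages[url] = html
            }
        } catch {
            print("Request failed for \(url): \(error.localizedDescription)")
        }
    }
    return pages
}
```

Each returned HTML string can then be passed to the parsing function from the steps above.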
