SwiftSoup is a Swift library for parsing HTML and XML. It lets iOS (and other Swift) developers work with real-world HTML much as jsoup does for Java developers. To parse an HTML document fetched from a URL in a Swift application, follow these steps:
- Add SwiftSoup to your project: If you're using CocoaPods, add SwiftSoup to your Podfile:
    pod 'SwiftSoup'
  Then run pod install. If you're using Swift Package Manager, add SwiftSoup (https://github.com/scinfu/SwiftSoup) as a dependency in your Package.swift file, or add the package through Xcode's package dependency settings.
- Import SwiftSoup: In the Swift file where you intend to parse HTML, import the SwiftSoup library:
    import SwiftSoup
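- Fetch the HTML content: Use Foundation's URLSession to download the page. URLSession data tasks are asynchronous, so the request itself does not block the main thread; what you must avoid is synchronous loading (for example Data(contentsOf:)) on the main thread, because that freezes the UI.
- Parse the HTML document: Once you have the HTML as a String, pass it to SwiftSoup.parse(_:) to get a Document, then query and manipulate the DOM, for example with CSS selectors via select(_:).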
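For the Swift Package Manager route in the first step, a manifest might look like the sketch below. The package and target name MyScraper is hypothetical, and the version requirement "2.6.0" is an assumption; check the SwiftSoup repository's releases for the version you actually want.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyScraper", // hypothetical package name
    dependencies: [
        // SwiftSoup's official repository; the minimum version is an assumption
        .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0")
    ],
    targets: [
        .executableTarget(
            name: "MyScraper",
            // The product name matches the package name, so a plain string works here
            dependencies: ["SwiftSoup"]
        )
    ]
)
```

After editing the manifest, swift build resolves and fetches the dependency.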
Here’s an example of how you might implement these steps:
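    import Foundation
    import SwiftSoup

    // Fetches the page at `urlString` and prints the text and href of every link.
    func loadAndParseHTML(from urlString: String) {
        guard let url = URL(string: urlString) else {
            print("Invalid URL")
            return
        }

        // dataTask(with:) is asynchronous; its completion handler runs on a background queue.
        let task = URLSession.shared.dataTask(with: url) { data, response, error in
            // Transport-level failures (no connection, timeout, ATS rejection, ...)
            if let error = error {
                print("Error fetching HTML: \(error)")
                return
            }
            // Treat non-2xx HTTP responses as failures
            if let http = response as? HTTPURLResponse, !(200...299).contains(http.statusCode) {
                print("Unexpected HTTP status: \(http.statusCode)")
                return
            }
            // Ensure we received data that decodes as UTF-8 (avoids a force unwrap)
            guard let data = data, let html = String(data: data, encoding: .utf8) else {
                print("Did not receive UTF-8 data")
                return
            }
            // Parse HTML
            do {
                let doc: Document = try SwiftSoup.parse(html)
                // Example: finding elements by tag
                let links: Elements = try doc.select("a")
                for link in links {
                    let linkHref: String = try link.attr("href")
                    let linkText: String = try link.text()
                    print("\(linkText): \(linkHref)")
                }
            } catch Exception.Error(let type, let message) {
                // SwiftSoup reports parse/selector problems through its Exception enum
                print("SwiftSoup error of type \(type): \(message)")
            } catch {
                print("Unexpected error: \(error)")
            }
        }
        // Tasks are created suspended; resume() starts the request
        task.resume()
    }

    // Example usage (safe to call from the main thread, since the request runs asynchronously):
    loadAndParseHTML(from: "https://example.com")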
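Because parsing does not depend on the network, the SwiftSoup part of the example can be tried on an in-memory string first. This is a minimal sketch: extractLinks(from:) is a hypothetical helper name and the HTML snippet is made up for illustration. It uses the CSS selector a[href] so that anchors without an href attribute are skipped.

```swift
import SwiftSoup

// Hypothetical helper: returns (text, href) for every <a> element that has an href attribute.
func extractLinks(from html: String) throws -> [(text: String, href: String)] {
    let doc: Document = try SwiftSoup.parse(html)
    // "a[href]" matches <a> elements carrying an href attribute
    let links: Elements = try doc.select("a[href]")
    return try links.array().map { link in
        (text: try link.text(), href: try link.attr("href"))
    }
}

// Made-up sample document; the last <a> has no href and is therefore ignored
let sample = """
<html><body>
  <p>See <a href="/docs">the docs</a> or <a href="https://example.com">Example</a>.</p>
  <a name="anchor-without-href">ignored</a>
</body></html>
"""

do {
    for link in try extractLinks(from: sample) {
        print("\(link.text): \(link.href)")
    }
} catch {
    print("Parsing failed: \(error)")
}
```

Keeping parsing in a pure function like this also makes it easy to unit-test without any network access.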
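Note that dataTask(with:) is asynchronous: the call returns immediately and the completion handler runs later on a background queue, so calling loadAndParseHTML(from:) from the main thread is fine. iOS does not forbid network requests on the main thread as such, but any synchronous networking or heavy parsing there blocks the UI. If you use the parsed results to update the UI, switch back to the main thread first (for example with DispatchQueue.main.async). Alternatives are Combine's dataTaskPublisher(for:) or, when targeting iOS 15 or later, Swift concurrency with async/await and URLSession.data(from:). In a command-line tool or playground, keep the process alive until the task finishes, or it may exit before the completion handler runs.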
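As a sketch of the async/await option, the same fetch-and-parse flow can be written with Swift concurrency. It assumes iOS 15 / macOS 12 or later, where URLSession.data(from:) is available; the function name fetchLinkHrefs(from:) is hypothetical, and instead of printing it returns absolute link URLs using SwiftSoup's absUrl(_:), which resolves relative hrefs against the base URI passed to parse.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLSession lives here on Linux
#endif
import SwiftSoup

// Hypothetical helper; requires iOS 15 / macOS 12+ for URLSession.data(from:).
func fetchLinkHrefs(from url: URL) async throws -> [String] {
    // Suspends (rather than blocking the calling thread) while the request is in flight
    let (data, response) = try await URLSession.shared.data(from: url)
    // Only HTTP responses carry a status code; other schemes skip this check
    if let http = response as? HTTPURLResponse, !(200...299).contains(http.statusCode) {
        throw URLError(.badServerResponse)
    }
    guard let html = String(data: data, encoding: .utf8) else {
        throw URLError(.cannotDecodeContentData)
    }
    // The base URI lets absUrl(_:) turn relative hrefs into absolute URLs
    let doc = try SwiftSoup.parse(html, url.absoluteString)
    return try doc.select("a[href]").array().map { try $0.absUrl("href") }
}
```

From UIKit or SwiftUI code you would typically call this inside a Task and assign the result to UI state on the main actor, which avoids manual DispatchQueue hopping.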
Also handle errors in a way that suits your app (the example only prints them), and keep App Transport Security (ATS) in mind: by default ATS requires HTTPS connections, so to load a plain HTTP URL you must add an exception for that domain under the NSAppTransportSecurity key (for example via NSExceptionDomains) in your app's Info.plist file.