How can I use SwiftSoup to download and save images from a webpage?

SwiftSoup is a pure Swift library for parsing and manipulating HTML, modeled on jsoup for Java. SwiftSoup has no built-in method for downloading or saving images, but you can use it to extract the image URLs from a page and then download the images with URLSession or another networking API available in Swift.

Here's a step-by-step guide on how to use SwiftSoup to download and save images from a webpage:

Step 1: Parse the HTML document

Start by fetching the HTML content of the webpage and parsing it using SwiftSoup.

import Foundation
import SwiftSoup

func fetchHTML(from url: String, completion: @escaping (Result<Document, Error>) -> Void) {
    guard let url = URL(string: url) else {
        completion(.failure(NSError(domain: "", code: 0, userInfo: [NSLocalizedDescriptionKey: "Invalid URL"])))
        return
    }

    URLSession.shared.dataTask(with: url) { data, response, error in
        if let error = error {
            completion(.failure(error))
            return
        }

        guard let data = data, let html = String(data: data, encoding: .utf8) else {
            completion(.failure(NSError(domain: "", code: 0, userInfo: [NSLocalizedDescriptionKey: "Failed to decode HTML"])))
            return
        }

        do {
            // Pass the page URL as the base URI so relative links can be resolved later.
            let document = try SwiftSoup.parse(html, url.absoluteString)
            completion(.success(document))
        } catch {
            completion(.failure(error))
        }
    }.resume()
}
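fetchHTML fails with "Failed to decode HTML" whenever the response body is not valid UTF-8, which can happen on older pages. The following is a minimal sketch of a more forgiving decoder, assuming a simple UTF-8-then-Latin-1 fallback is acceptable for your pages; decodeHTML is a hypothetical helper name, not part of SwiftSoup or Foundation, and a more careful version would read the charset from the response headers instead of guessing.

```swift
import Foundation

// Hypothetical helper: decodes downloaded HTML bytes, trying UTF-8 first and
// falling back to ISO Latin-1, which accepts any byte sequence.
func decodeHTML(_ data: Data) -> String? {
    if let utf8 = String(data: data, encoding: .utf8) {
        return utf8
    }
    // Assumption: Latin-1 is a reasonable guess for non-UTF-8 pages.
    return String(data: data, encoding: .isoLatin1)
}
```

Inside fetchHTML, the result of decodeHTML(data) could stand in for the String(data:encoding:) call in the guard statement.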

Step 2: Extract image URLs

Once you have the parsed HTML Document, extract all the image URLs using SwiftSoup.

func extractImageURLs(from document: Document) -> [URL] {
    do {
        let imageElements = try document.select("img")
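        let srcs = imageElements.array().compactMap { element -> String? in
            // Prefer the absolute URL (requires a base URI on the document); otherwise use the raw src.
            let absolute = (try? element.absUrl("src")) ?? ""
            let raw = (try? element.attr("src")) ?? ""
            let src = (absolute.isEmpty ? raw : absolute).trimmingCharacters(in: .whitespacesAndNewlines)
            return src.isEmpty ? nil : src
        }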
        return srcs.compactMap { URL(string: $0) }
    } catch {
        print("Error extracting image URLs: \(error)")
        return []
    }
}
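The src values found on a real page are not always ready to download: they may be relative paths, inline data: URIs, or duplicates of an image used several times. Here is a minimal sketch, using only Foundation, of turning raw src strings into a unique list of absolute http(s) URLs. The function name normalizedImageURLs and its parameters are hypothetical and chosen for illustration.

```swift
import Foundation

// Hypothetical helper (not part of SwiftSoup): turns raw src strings into
// unique, absolute http(s) URLs, resolving relative paths against pageURL.
func normalizedImageURLs(from sources: [String], pageURL: URL) -> [URL] {
    var seen = Set<String>()
    var result: [URL] = []
    for source in sources {
        let trimmed = source.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty,
              let resolved = URL(string: trimmed, relativeTo: pageURL)?.absoluteURL,
              let scheme = resolved.scheme?.lowercased(),
              scheme == "http" || scheme == "https" else {
            continue // skips empty values, data: URIs, and other non-HTTP schemes
        }
        // insert(_:) reports inserted == false for URLs that were already seen
        if seen.insert(resolved.absoluteString).inserted {
            result.append(resolved)
        }
    }
    return result
}
```

The list keeps the order in which the images first appear on the page, so the output stays predictable from run to run.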

Step 3: Download and save images

With the URLs extracted, you can now download and save the images to the local file system.

func downloadImage(from url: URL, to directory: URL, completion: @escaping (Error?) -> Void) {
    URLSession.shared.dataTask(with: url) { data, response, error in
        if let error = error {
            completion(error)
            return
        }

        guard let data = data, let httpResponse = response as? HTTPURLResponse, (200...299).contains(httpResponse.statusCode) else {
            completion(NSError(domain: "", code: 0, userInfo: [NSLocalizedDescriptionKey: "Invalid response or data"]))
            return
        }

        let fileName = url.lastPathComponent
        let fileURL = directory.appendingPathComponent(fileName)

        do {
            try data.write(to: fileURL)
            completion(nil)
        } catch {
            completion(error)
        }
    }.resume()
}

// Example usage
let urlString = "https://example.com" // App Transport Security blocks plain http requests by default
let saveDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]

fetchHTML(from: urlString) { result in
    switch result {
    case .success(let document):
        let imageURLs = extractImageURLs(from: document)
        imageURLs.forEach { imageURL in
            downloadImage(from: imageURL, to: saveDirectory) { error in
                if let error = error {
                    print("Error downloading image: \(error)")
                } else {
                    print("Image downloaded: \(imageURL.lastPathComponent)")
                }
            }
        }
    case .failure(let error):
        print("Error fetching HTML: \(error)")
    }
}

In this Swift code:

  • We first define a function fetchHTML to fetch the HTML content of a webpage and parse it using SwiftSoup.
  • Then, extractImageURLs takes the parsed Document and extracts the src attribute of all img tags, returning an array of URL objects.
  • The downloadImage function then takes each URL, downloads the image data using URLSession, and writes it to the specified directory with Data's write(to:) method.
  • Finally, we use these functions together to download images from a given webpage URL.
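
downloadImage names each file after url.lastPathComponent, so two images that share a file name overwrite each other, and a URL with no file name produces no usable name. The sketch below shows one way to build a safer destination, assuming an index prefix is an acceptable naming scheme; destinationURL and its parameters are hypothetical names introduced for illustration.

```swift
import Foundation

// Hypothetical helper: builds a destination URL inside `directory`, prefixing
// the file name with `index` so images sharing a name do not overwrite each other.
func destinationURL(for imageURL: URL, index: Int, in directory: URL) -> URL {
    let last = imageURL.lastPathComponent
    // lastPathComponent is "" or "/" when the URL carries no file name
    let name = (last.isEmpty || last == "/") ? "image" : last
    return directory.appendingPathComponent("\(index)-\(name)")
}
```

In the example usage, iterating over imageURLs.enumerated() would supply the index for each image.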

Make sure you have the legal right to scrape the page's content and that you comply with the website's terms of service and robots.txt file.
