How do I handle file uploads when scraping with Swift?

File uploads come up in web scraping whenever you need to submit documents, images, or other files to a web service. Swift handles them through URLSession in the Foundation framework. This guide covers the main approaches: raw and multipart uploads, authentication, progress tracking, retries, and async/await.

Understanding File Upload Mechanisms

When dealing with file uploads in web scraping, you'll encounter several common scenarios:

  1. Multipart form data uploads - Most common for HTML forms
  2. Raw binary uploads - Direct file content submission
  3. Base64 encoded uploads - Files encoded as strings in JSON payloads
  4. Chunked uploads - Large files uploaded in segments
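
Of the four mechanisms, the Base64 variant is the simplest to sketch. The example below builds a JSON body that carries the file as a Base64 string. The payload shape (the field names filename and content) and the helper name makeBase64UploadBody are assumptions for illustration; a real endpoint defines its own field names.

```swift
import Foundation

// Hypothetical payload shape: "filename" and "content" are assumed field
// names. Use whatever the target endpoint documents.
struct Base64UploadPayload: Codable {
    let filename: String
    let content: String  // Base64-encoded file bytes
}

// Encodes the file bytes as Base64 and wraps them in a JSON document.
func makeBase64UploadBody(fileData: Data, fileName: String) throws -> Data {
    let payload = Base64UploadPayload(
        filename: fileName,
        content: fileData.base64EncodedString()
    )
    return try JSONEncoder().encode(payload)
}
```

The resulting Data would be sent as the request body with a Content-Type of application/json. Base64 inflates the payload by roughly a third, so this approach suits small files.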

Basic File Upload with URLSession

The foundation of file uploads in Swift is URLSession. Here's a basic implementation for uploading a file:

import Foundation

class FileUploadManager {
    private let session = URLSession.shared

    func uploadFile(fileURL: URL, to uploadURL: URL, completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }

            guard let data = data else {
                completion(.failure(URLError(.badServerResponse)))
                return
            }

            completion(.success(data))
        }

        uploadTask.resume()
    }
}
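
The completion handler above reports success whenever data arrives, even if the server answered with a 4xx or 5xx status. A minimal sketch of a status check that could be called from inside the handler follows. UploadResponseError and validatedBody are hypothetical names introduced here, not part of URLSession.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLResponse types live here on Linux
#endif

// Hypothetical error type for this sketch.
enum UploadResponseError: Error, Equatable {
    case notHTTP
    case unexpectedStatus(Int)
}

// Returns the response body only when the server replied with a 2xx status.
func validatedBody(data: Data?, response: URLResponse?) -> Result<Data, UploadResponseError> {
    guard let http = response as? HTTPURLResponse else {
        return .failure(.notHTTP)
    }
    guard (200...299).contains(http.statusCode) else {
        return .failure(.unexpectedStatus(http.statusCode))
    }
    // An empty body is still a valid success for many upload endpoints.
    return .success(data ?? Data())
}
```

Keeping the check in a pure function makes it easy to reuse across the uploaders in this guide and to test without a network connection.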

Multipart Form Data Uploads

Most web forms use multipart/form-data encoding for file uploads. Here's how to create and send multipart requests:

import Foundation

class MultipartFormUploader {
    private let boundary = UUID().uuidString
    private let session = URLSession.shared

    func uploadFile(fileURL: URL, 
                   fieldName: String, 
                   fileName: String, 
                   mimeType: String,
                   to uploadURL: URL,
                   additionalFields: [String: String] = [:],
                   completion: @escaping (Result<Data, Error>) -> Void) {

        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

        do {
            let httpBody = try createMultipartBody(
                fileURL: fileURL,
                fieldName: fieldName,
                fileName: fileName,
                mimeType: mimeType,
                additionalFields: additionalFields
            )

            let uploadTask = session.uploadTask(with: request, from: httpBody) { data, response, error in
                if let error = error {
                    completion(.failure(error))
                    return
                }

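                // Treat non-2xx responses as failures instead of reporting success
                guard let httpResponse = response as? HTTPURLResponse,
                      (200...299).contains(httpResponse.statusCode) else {
                    completion(.failure(URLError(.badServerResponse)))
                    return
                }

                completion(.success(data ?? Data()))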
            }

            uploadTask.resume()

        } catch {
            completion(.failure(error))
        }
    }

    private func createMultipartBody(fileURL: URL,
                                   fieldName: String,
                                   fileName: String,
                                   mimeType: String,
                                   additionalFields: [String: String]) throws -> Data {
        var body = Data()

        // Add additional form fields
        for (key, value) in additionalFields {
            body.append("--\(boundary)\r\n".data(using: .utf8)!)
            body.append("Content-Disposition: form-data; name=\"\(key)\"\r\n\r\n".data(using: .utf8)!)
            body.append("\(value)\r\n".data(using: .utf8)!)
        }

        // Add file data
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"\(fieldName)\"; filename=\"\(fileName)\"\r\n".data(using: .utf8)!)
        body.append("Content-Type: \(mimeType)\r\n\r\n".data(using: .utf8)!)

        // Note: this reads the whole file into memory, so it suits small files only
        let fileData = try Data(contentsOf: fileURL)
        body.append(fileData)
        body.append("\r\n".data(using: .utf8)!)

        // End boundary
        body.append("--\(boundary)--\r\n".data(using: .utf8)!)

        return body
    }
}
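
The body builder above holds the whole file in memory. For large files, one option is to write the multipart body to a temporary file in fixed-size chunks and upload that file instead. The following is a sketch under that assumption. The function name writeMultipartBodyToTempFile, the error type, and the 64 KB default chunk size are illustrative choices, not an established API.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum MultipartFileError: Error {
    case cannotCreateTempFile
}

// Writes a single-file multipart/form-data body to a temporary file,
// copying the source in chunks so it is never fully resident in memory.
func writeMultipartBodyToTempFile(fileURL: URL,
                                  fieldName: String,
                                  fileName: String,
                                  mimeType: String,
                                  boundary: String,
                                  chunkSize: Int = 64 * 1024) throws -> URL {
    let bodyURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("upload-\(UUID().uuidString).tmp")
    guard FileManager.default.createFile(atPath: bodyURL.path, contents: nil) else {
        throw MultipartFileError.cannotCreateTempFile
    }

    let output = try FileHandle(forWritingTo: bodyURL)
    defer { output.closeFile() }
    let input = try FileHandle(forReadingFrom: fileURL)
    defer { input.closeFile() }

    // Part headers for the file field
    var header = "--\(boundary)\r\n"
    header += "Content-Disposition: form-data; name=\"\(fieldName)\"; filename=\"\(fileName)\"\r\n"
    header += "Content-Type: \(mimeType)\r\n\r\n"
    output.write(Data(header.utf8))

    // Copy the file contents chunk by chunk
    while true {
        let chunk = input.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }
        output.write(chunk)
    }

    // Closing boundary
    output.write(Data("\r\n--\(boundary)--\r\n".utf8))
    return bodyURL
}
```

The returned URL can be passed to session.uploadTask(with:fromFile:), with the request's Content-Type header set to multipart/form-data using the same boundary. The temporary file should be deleted once the upload finishes.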

Handling Authentication with File Uploads

Many upload endpoints require authentication. Here's how to include authentication headers:

class AuthenticatedFileUploader {
    private let session: URLSession

    init(authToken: String) {
        let config = URLSessionConfiguration.default
        config.httpAdditionalHeaders = [
            "Authorization": "Bearer \(authToken)",
            "User-Agent": "Swift-WebScraper/1.0"
        ]
        self.session = URLSession(configuration: config)
    }

    func uploadWithAuthentication(fileURL: URL, 
                                to uploadURL: URL,
                                completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }

            guard let httpResponse = response as? HTTPURLResponse else {
                // Without this branch the completion handler would never fire
                completion(.failure(URLError(.badServerResponse)))
                return
            }

            switch httpResponse.statusCode {
            case 200...299:
                completion(.success(data ?? Data()))
            case 401:
                completion(.failure(URLError(.userAuthenticationRequired)))
            default:
                completion(.failure(URLError(.badServerResponse)))
            }
        }

        uploadTask.resume()
    }
}

Progress Tracking for Large Files

When uploading large files, progress tracking is essential:

class ProgressTrackingUploader: NSObject {
    private var session: URLSession!
    private var progressHandler: ((Double) -> Void)?

    override init() {
        super.init()
        let config = URLSessionConfiguration.default
        session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    }

    func uploadWithProgress(fileURL: URL,
                          to uploadURL: URL,
                          progress: @escaping (Double) -> Void,
                          completion: @escaping (Result<Data, Error>) -> Void) {

        self.progressHandler = progress

        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            completion(.success(data ?? Data()))
        }

        uploadTask.resume()
    }
}

extension ProgressTrackingUploader: URLSessionTaskDelegate {
    func urlSession(_ session: URLSession, 
                   task: URLSessionTask, 
                   didSendBodyData bytesSent: Int64, 
                   totalBytesSent: Int64, 
                   totalBytesExpectedToSend: Int64) {

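        // The expected total can be unknown (zero or negative), so avoid dividing by it
        guard totalBytesExpectedToSend > 0 else { return }
        let progress = Double(totalBytesSent) / Double(totalBytesExpectedToSend)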
        DispatchQueue.main.async {
            self.progressHandler?(progress)
        }
    }
}

Handling Different File Types

Different file types may require specific handling. Here's a utility to determine MIME types:

extension URL {
    var mimeType: String {
        let pathExtension = self.pathExtension.lowercased()

        switch pathExtension {
        case "jpg", "jpeg":
            return "image/jpeg"
        case "png":
            return "image/png"
        case "gif":
            return "image/gif"
        case "pdf":
            return "application/pdf"
        case "txt":
            return "text/plain"
        case "json":
            return "application/json"
        case "xml":
            return "application/xml"
        case "zip":
            return "application/zip"
        default:
            return "application/octet-stream"
        }
    }
}

// Usage example
let fileURL = URL(fileURLWithPath: "/path/to/document.pdf")
let mimeType = fileURL.mimeType
print("MIME type: \(mimeType)") // Output: application/pdf

Error Handling and Retry Logic

Robust file upload implementations should include proper error handling and retry mechanisms:

class RobustFileUploader {
    private let session = URLSession.shared
    private let maxRetries = 3

    func uploadWithRetry(fileURL: URL,
                        to uploadURL: URL,
                        retryCount: Int = 0,
                        completion: @escaping (Result<Data, Error>) -> Void) {

        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.timeoutInterval = 60.0

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { [weak self] data, response, error in

            if let error = error {
                if retryCount < self?.maxRetries ?? 0 {
                    print("Upload failed, retrying... (attempt \(retryCount + 1))")
                    DispatchQueue.global().asyncAfter(deadline: .now() + Double(retryCount + 1)) {
                        self?.uploadWithRetry(fileURL: fileURL, 
                                            to: uploadURL, 
                                            retryCount: retryCount + 1, 
                                            completion: completion)
                    }
                } else {
                    completion(.failure(error))
                }
                return
            }

            if let httpResponse = response as? HTTPURLResponse {
                if httpResponse.statusCode >= 400 {
                    let statusError = NSError(domain: "HTTPError", 
                                            code: httpResponse.statusCode, 
                                            userInfo: [NSLocalizedDescriptionKey: "HTTP \(httpResponse.statusCode)"])
                    completion(.failure(statusError))
                    return
                }
            }

            completion(.success(data ?? Data()))
        }

        uploadTask.resume()
    }
}

Working with Web Forms and File Inputs

When scraping websites that require file uploads through HTML forms, you first need to parse the form to learn which fields it expects: the name of the file input, any hidden fields such as CSRF tokens, and the form's action URL. Foundation has no built-in HTML parser, so the example below uses a simple regular expression; a dedicated parser such as SwiftSoup is more reliable on real pages.

struct FormField {
    let name: String
    let type: String
    let required: Bool
}

class FormUploadHandler {
    func parseFormFields(from html: String) -> [FormField] {
        // Basic HTML parsing logic
        // In production, consider using a proper HTML parser
        var fields: [FormField] = []

        let inputPattern = #"<input[^>]*name\s*=\s*["\']([^"\']*)["\'][^>]*>"#
        let regex = try! NSRegularExpression(pattern: inputPattern, options: .caseInsensitive)

        let matches = regex.matches(in: html, options: [], range: NSRange(html.startIndex..., in: html))

        for match in matches {
            if let nameRange = Range(match.range(at: 1), in: html) {
                let fieldName = String(html[nameRange])
                fields.append(FormField(name: fieldName, type: "input", required: false))
            }
        }

        return fields
    }
}
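
For uploads, the field that matters most is the file input, because its name attribute becomes the fieldName of the multipart part. The sketch below narrows the search to input tags with type="file". It is still regex-based and assumes quoted attribute values, so it will miss unusual markup; the function names captures and fileInputNames are illustrative.

```swift
import Foundation

// Returns the text captured by `group` for every match of `pattern` in `text`.
func captures(of pattern: String, in text: String, group: Int) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match -> String? in
        guard let range = Range(match.range(at: group), in: text) else { return nil }
        return String(text[range])
    }
}

// Returns the `name` attribute of every <input type="file"> tag in the HTML.
func fileInputNames(in html: String) -> [String] {
    let inputTags = captures(of: #"<input\b[^>]*>"#, in: html, group: 0)
    return inputTags.compactMap { tag -> String? in
        // Keep only tags whose type attribute is "file"
        let isFileInput = !captures(of: #"\stype\s*=\s*["']file["']"#, in: tag, group: 0).isEmpty
        guard isFileInput else { return nil }
        return captures(of: #"\sname\s*=\s*["']([^"']*)["']"#, in: tag, group: 1).first
    }
}
```

Each returned name can be passed as fieldName to the multipart uploader shown earlier.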

Async/Await Support (iOS 15+)

For modern Swift applications targeting iOS 15+ or macOS 12+, you can use async/await for cleaner code:

@available(iOS 15.0, macOS 12.0, *)
class AsyncFileUploader {
    private let session = URLSession.shared

    func uploadFile(fileURL: URL, to uploadURL: URL) async throws -> Data {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let (data, response) = try await session.upload(for: request, fromFile: fileURL)

        guard let httpResponse = response as? HTTPURLResponse else {
            throw URLError(.badServerResponse)
        }

        guard 200...299 ~= httpResponse.statusCode else {
            throw URLError(.badServerResponse)
        }

        return data
    }
}

// Usage (fileURL and uploadURL are placeholders you define for your own file and endpoint)
Task {
    do {
        let uploader = AsyncFileUploader()
        let result = try await uploader.uploadFile(fileURL: fileURL, to: uploadURL)
        print("Upload successful: \(result.count) bytes received")
    } catch {
        print("Upload failed: \(error)")
    }
}

Best Practices and Considerations

  1. File Size Limits: Always check the target server's file size limits before attempting uploads
  2. Memory Management: For large files, upload from a file URL with uploadTask(with:fromFile:) or from a stream rather than loading the entire file into a Data value
  3. Network Conditions: Implement proper timeout handling and retry logic for unreliable networks
  4. Security: Validate file types and sizes before uploading to prevent security issues
  5. Progress Feedback: Provide progress indicators for better user experience during long uploads
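
The size and type checks from the list can run locally before any request is made. A minimal sketch follows, assuming you already know the server's limits. The limit values, the error type, and the function name validateForUpload are hypothetical.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum UploadValidationError: Error, Equatable {
    case extensionNotAllowed(String)
    case fileTooLarge(bytes: Int, limit: Int)
    case sizeUnavailable
}

// Throws if the file has a disallowed extension or exceeds maxBytes.
func validateForUpload(fileURL: URL,
                       maxBytes: Int,
                       allowedExtensions: Set<String>) throws {
    let ext = fileURL.pathExtension.lowercased()
    guard allowedExtensions.contains(ext) else {
        throw UploadValidationError.extensionNotAllowed(ext)
    }
    // Ask the file system for the size instead of reading the file
    guard let size = try fileURL.resourceValues(forKeys: [.fileSizeKey]).fileSize else {
        throw UploadValidationError.sizeUnavailable
    }
    guard size <= maxBytes else {
        throw UploadValidationError.fileTooLarge(bytes: size, limit: maxBytes)
    }
}
```

An extension check is only a first filter, since a file's extension says nothing definite about its contents.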

Troubleshooting Common Issues

Issue: Upload Fails with 413 (Payload Too Large)

Solution: The file exceeds the server's size limit. Reduce or compress the file, or split it into chunks if the endpoint supports chunked or resumable uploads.
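
Chunked upload protocols are server-specific, so the following is only a sketch of the client-side bookkeeping: computing byte ranges, formatting a Content-Range style header value, and reading one chunk from disk. Whether the endpoint accepts Content-Range on uploads is an assumption to verify against its documentation, and all three function names are illustrative.

```swift
import Foundation

// Splits a file of `fileSize` bytes into consecutive half-open byte ranges.
func chunkRanges(fileSize: Int, chunkSize: Int) -> [Range<Int>] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    return stride(from: 0, to: fileSize, by: chunkSize).map { start in
        start..<min(start + chunkSize, fileSize)
    }
}

// Formats a non-empty range as an inclusive "bytes first-last/total" value.
func contentRangeHeader(for range: Range<Int>, totalSize: Int) -> String {
    "bytes \(range.lowerBound)-\(range.upperBound - 1)/\(totalSize)"
}

// Reads only the bytes of one chunk, leaving the rest of the file on disk.
func readChunk(from fileURL: URL, range: Range<Int>) throws -> Data {
    let handle = try FileHandle(forReadingFrom: fileURL)
    defer { handle.closeFile() }
    handle.seek(toFileOffset: UInt64(range.lowerBound))
    return handle.readData(ofLength: range.count)
}
```

Each chunk would then be sent as its own request, typically in order, with the server reassembling the file.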

Issue: Timeout Errors on Slow Networks

Solution: Increase timeout intervals and implement exponential backoff retry logic.
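
A minimal sketch of an exponential backoff schedule follows. The base delay, the 30-second cap, and the set of retryable status codes are assumptions to tune for the target server, and both function names are illustrative.

```swift
import Foundation

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelay.
func backoffDelay(attempt: Int,
                  base: TimeInterval = 1.0,
                  maxDelay: TimeInterval = 30.0) -> TimeInterval {
    let exponent = Double(max(0, attempt))
    return min(maxDelay, base * pow(2.0, exponent))
}

// Assumption: only timeouts, rate limiting, and server errors are worth retrying.
func isRetryableStatus(_ statusCode: Int) -> Bool {
    statusCode == 408 || statusCode == 429 || (500...599).contains(statusCode)
}
```

In uploadWithRetry, the delay passed to asyncAfter could become .now() + backoffDelay(attempt: retryCount), which doubles the wait on each attempt instead of growing it linearly.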

Issue: Authentication Failures

Solution: Ensure proper authentication headers are included and tokens are valid.

Conclusion

Handling file uploads in Swift web scraping requires understanding the upload mechanism the target endpoint expects and implementing robust error handling. The examples provided cover the most common scenarios you'll encounter, from basic uploads to multipart forms, authentication, progress tracking, and retries.

Remember to always test your upload implementations with different file types and sizes to ensure reliability across various scenarios. With these techniques, you'll be well-equipped to handle file uploads in your Swift web scraping projects.
