How do I handle file uploads when scraping with Swift?
File uploads come up in web scraping when you need to submit documents, images, or other files to a web service. Swift handles them through URLSession and the Foundation framework. This guide covers the common approaches: raw binary uploads, multipart forms, authenticated uploads, progress tracking, and retries.
Understanding File Upload Mechanisms
When dealing with file uploads in web scraping, you'll encounter several common scenarios:
- Multipart form data uploads - Most common for HTML forms
- Raw binary uploads - Direct file content submission
- Base64 encoded uploads - Files encoded as strings in JSON payloads
- Chunked uploads - Large files uploaded in segments
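Base64 uploads are the one mechanism in this list without a dedicated section later on, so here is a minimal sketch. The payload field names (filename, contentType, data) are hypothetical; a real API defines its own JSON shape. Keep in mind that Base64 encoding makes the payload roughly a third larger than the raw file.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

// Hypothetical payload shape; real services define their own field names.
struct Base64UploadPayload: Codable {
    let filename: String
    let contentType: String
    let data: String // Base64-encoded file contents
}

/// Builds a JSON POST request that carries the file bytes as a Base64 string.
func makeBase64UploadRequest(fileData: Data,
                             filename: String,
                             contentType: String,
                             to uploadURL: URL) throws -> URLRequest {
    let payload = Base64UploadPayload(filename: filename,
                                      contentType: contentType,
                                      data: fileData.base64EncodedString())
    var request = URLRequest(url: uploadURL)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(payload)
    return request
}
```

The resulting request can be sent with an ordinary data task, since the file travels inside the JSON body rather than as a separate upload stream.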
Basic File Upload with URLSession
The foundation of file uploads in Swift is URLSession. Here's a basic implementation for uploading a file:
import Foundation

class FileUploadManager {
    private let session = URLSession.shared

    func uploadFile(fileURL: URL, to uploadURL: URL, completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            guard let data = data else {
                completion(.failure(URLError(.badServerResponse)))
                return
            }
            completion(.success(data))
        }
        uploadTask.resume()
    }
}
Multipart Form Data Uploads
Most web forms use multipart/form-data encoding for file uploads. Here's how to create and send multipart requests:
import Foundation

class MultipartFormUploader {
    private let boundary = UUID().uuidString
    private let session = URLSession.shared

    func uploadFile(fileURL: URL,
                    fieldName: String,
                    fileName: String,
                    mimeType: String,
                    to uploadURL: URL,
                    additionalFields: [String: String] = [:],
                    completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")

        do {
            let httpBody = try createMultipartBody(
                fileURL: fileURL,
                fieldName: fieldName,
                fileName: fileName,
                mimeType: mimeType,
                additionalFields: additionalFields
            )

            let uploadTask = session.uploadTask(with: request, from: httpBody) { data, response, error in
                if let error = error {
                    completion(.failure(error))
                    return
                }
                if let httpResponse = response as? HTTPURLResponse {
                    print("Upload response status: \(httpResponse.statusCode)")
                }
                completion(.success(data ?? Data()))
            }
            uploadTask.resume()
        } catch {
            completion(.failure(error))
        }
    }

    private func createMultipartBody(fileURL: URL,
                                     fieldName: String,
                                     fileName: String,
                                     mimeType: String,
                                     additionalFields: [String: String]) throws -> Data {
        var body = Data()

        // Add additional form fields
        for (key, value) in additionalFields {
            body.append("--\(boundary)\r\n".data(using: .utf8)!)
            body.append("Content-Disposition: form-data; name=\"\(key)\"\r\n\r\n".data(using: .utf8)!)
            body.append("\(value)\r\n".data(using: .utf8)!)
        }

        // Add file data
        body.append("--\(boundary)\r\n".data(using: .utf8)!)
        body.append("Content-Disposition: form-data; name=\"\(fieldName)\"; filename=\"\(fileName)\"\r\n".data(using: .utf8)!)
        body.append("Content-Type: \(mimeType)\r\n\r\n".data(using: .utf8)!)
        let fileData = try Data(contentsOf: fileURL)
        body.append(fileData)
        body.append("\r\n".data(using: .utf8)!)

        // End boundary
        body.append("--\(boundary)--\r\n".data(using: .utf8)!)
        return body
    }
}
Handling Authentication with File Uploads
Many upload endpoints require authentication. Here's how to include authentication headers:
class AuthenticatedFileUploader {
    private let session: URLSession

    init(authToken: String) {
        let config = URLSessionConfiguration.default
        config.httpAdditionalHeaders = [
            "Authorization": "Bearer \(authToken)",
            "User-Agent": "Swift-WebScraper/1.0"
        ]
        self.session = URLSession(configuration: config)
    }

    func uploadWithAuthentication(fileURL: URL,
                                  to uploadURL: URL,
                                  completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            // Always call the completion handler, even when the response is not an HTTP response
            guard let httpResponse = response as? HTTPURLResponse else {
                completion(.failure(URLError(.badServerResponse)))
                return
            }
            switch httpResponse.statusCode {
            case 200...299:
                completion(.success(data ?? Data()))
            case 401:
                completion(.failure(URLError(.userAuthenticationRequired)))
            default:
                completion(.failure(URLError(.badServerResponse)))
            }
        }
        uploadTask.resume()
    }
}
Progress Tracking for Large Files
When uploading large files, progress tracking is essential:
class ProgressTrackingUploader: NSObject {
    private var session: URLSession!
    private var progressHandler: ((Double) -> Void)?

    override init() {
        super.init()
        let config = URLSessionConfiguration.default
        // The session keeps a strong reference to its delegate until it is invalidated,
        // so call finishTasksAndInvalidate() on it when this uploader is no longer needed.
        session = URLSession(configuration: config, delegate: self, delegateQueue: nil)
    }

    func uploadWithProgress(fileURL: URL,
                            to uploadURL: URL,
                            progress: @escaping (Double) -> Void,
                            completion: @escaping (Result<Data, Error>) -> Void) {
        self.progressHandler = progress
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            completion(.success(data ?? Data()))
        }
        uploadTask.resume()
    }
}

extension ProgressTrackingUploader: URLSessionTaskDelegate {
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didSendBodyData bytesSent: Int64,
                    totalBytesSent: Int64,
                    totalBytesExpectedToSend: Int64) {
        // The expected size can be unknown (a non-positive value); skip the update to avoid dividing by zero
        guard totalBytesExpectedToSend > 0 else { return }
        let progress = Double(totalBytesSent) / Double(totalBytesExpectedToSend)
        DispatchQueue.main.async {
            self.progressHandler?(progress)
        }
    }
}
Handling Different File Types
Different file types may require specific handling. Here's a utility to determine MIME types:
extension URL {
    var mimeType: String {
        let pathExtension = self.pathExtension.lowercased()
        switch pathExtension {
        case "jpg", "jpeg":
            return "image/jpeg"
        case "png":
            return "image/png"
        case "gif":
            return "image/gif"
        case "pdf":
            return "application/pdf"
        case "txt":
            return "text/plain"
        case "json":
            return "application/json"
        case "xml":
            return "application/xml"
        case "zip":
            return "application/zip"
        default:
            return "application/octet-stream"
        }
    }
}

// Usage example
let fileURL = URL(fileURLWithPath: "/path/to/document.pdf")
let mimeType = fileURL.mimeType
print("MIME type: \(mimeType)") // Output: MIME type: application/pdf
Error Handling and Retry Logic
Robust file upload implementations should include proper error handling and retry mechanisms:
class RobustFileUploader {
    private let session = URLSession.shared
    private let maxRetries = 3

    func uploadWithRetry(fileURL: URL,
                         to uploadURL: URL,
                         retryCount: Int = 0,
                         completion: @escaping (Result<Data, Error>) -> Void) {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"
        request.timeoutInterval = 60.0

        let uploadTask = session.uploadTask(with: request, fromFile: fileURL) { [weak self] data, response, error in
            if let error = error {
                if retryCount < self?.maxRetries ?? 0 {
                    print("Upload failed, retrying... (attempt \(retryCount + 1))")
                    DispatchQueue.global().asyncAfter(deadline: .now() + Double(retryCount + 1)) {
                        self?.uploadWithRetry(fileURL: fileURL,
                                              to: uploadURL,
                                              retryCount: retryCount + 1,
                                              completion: completion)
                    }
                } else {
                    completion(.failure(error))
                }
                return
            }

            if let httpResponse = response as? HTTPURLResponse {
                if httpResponse.statusCode >= 400 {
                    let statusError = NSError(domain: "HTTPError",
                                              code: httpResponse.statusCode,
                                              userInfo: [NSLocalizedDescriptionKey: "HTTP \(httpResponse.statusCode)"])
                    completion(.failure(statusError))
                    return
                }
            }
            completion(.success(data ?? Data()))
        }
        uploadTask.resume()
    }
}
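The retry example above waits 1, 2, and then 3 seconds between attempts, which is a linear schedule. If you prefer exponential backoff, the delay calculation could be sketched as follows. The function names and the default base and cap values are assumptions chosen for illustration, not part of any library.

```swift
import Foundation

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `maxDelay`.
func backoffDelay(attempt: Int,
                  base: TimeInterval = 1.0,
                  maxDelay: TimeInterval = 30.0) -> TimeInterval {
    let exponential = base * pow(2.0, Double(attempt))
    return min(exponential, maxDelay)
}

/// Same schedule with random jitter in 0...delay, so that many clients
/// retrying at once do not all hit the server at the same moment.
func jitteredBackoffDelay(attempt: Int,
                          base: TimeInterval = 1.0,
                          maxDelay: TimeInterval = 30.0) -> TimeInterval {
    let upperBound = backoffDelay(attempt: attempt, base: base, maxDelay: maxDelay)
    return TimeInterval.random(in: 0...upperBound)
}
```

To use it in uploadWithRetry, the deadline `.now() + Double(retryCount + 1)` would become `.now() + backoffDelay(attempt: retryCount)`.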
Working with Web Forms and File Inputs
When scraping websites that accept file uploads through HTML forms, parse the form first to learn which fields it expects. Foundation does not include an HTML parser, so the example below uses a regular expression to collect the name attribute of each input element. This is enough to discover field names on simple pages, but it will miss unquoted attributes and other markup variations.
struct FormField {
    let name: String
    let type: String
    let required: Bool
}

class FormUploadHandler {
    func parseFormFields(from html: String) -> [FormField] {
        // Basic HTML parsing logic
        // In production, consider using a proper HTML parser
        var fields: [FormField] = []
        let inputPattern = #"<input[^>]*name\s*=\s*["\']([^"\']*)["\'][^>]*>"#
        let regex = try! NSRegularExpression(pattern: inputPattern, options: .caseInsensitive)
        let matches = regex.matches(in: html, options: [], range: NSRange(html.startIndex..., in: html))

        for match in matches {
            if let nameRange = Range(match.range(at: 1), in: html) {
                let fieldName = String(html[nameRange])
                fields.append(FormField(name: fieldName, type: "input", required: false))
            }
        }
        return fields
    }
}
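The parser above records every input by name but does not tell you which one is the file field. A rough sketch of finding the `<input type="file">` elements specifically is shown here. The helper names are hypothetical, and the same caveat applies: this is regex-based, only handles quoted attribute values, and can be fooled by attributes such as data-type, so a real HTML parser is the safer choice for anything beyond simple pages.

```swift
import Foundation

/// Returns the quoted value of `attribute` inside a single HTML tag string, if present.
func attributeValue(_ attribute: String, in tag: String) -> String? {
    let escaped = NSRegularExpression.escapedPattern(for: attribute)
    let pattern = "\\b\(escaped)\\s*=\\s*[\"']([^\"']*)[\"']"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: .caseInsensitive),
          let match = regex.firstMatch(in: tag, options: [], range: NSRange(tag.startIndex..., in: tag)),
          let valueRange = Range(match.range(at: 1), in: tag) else {
        return nil
    }
    return String(tag[valueRange])
}

/// Names of all <input type="file"> elements in `html`, regardless of attribute order.
func fileInputNames(in html: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "<input\\b[^>]*>", options: .caseInsensitive) else {
        return []
    }
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap { match -> String? in
        guard let tagRange = Range(match.range, in: html) else { return nil }
        let tag = String(html[tagRange])
        // Keep only inputs whose type attribute is "file"
        guard attributeValue("type", in: tag)?.lowercased() == "file" else { return nil }
        return attributeValue("name", in: tag)
    }
}
```

The returned names are what you would pass as fieldName to a multipart upload, so the server sees the file under the field it expects.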
Async/Await Support (iOS 15+)
For modern Swift applications targeting iOS 15+ or macOS 12+, you can use async/await for cleaner code:
@available(iOS 15.0, macOS 12.0, *)
class AsyncFileUploader {
    private let session = URLSession.shared

    func uploadFile(fileURL: URL, to uploadURL: URL) async throws -> Data {
        var request = URLRequest(url: uploadURL)
        request.httpMethod = "POST"

        let (data, response) = try await session.upload(for: request, fromFile: fileURL)

        guard let httpResponse = response as? HTTPURLResponse else {
            throw URLError(.badServerResponse)
        }
        guard 200...299 ~= httpResponse.statusCode else {
            throw URLError(.badServerResponse)
        }
        return data
    }
}

// Usage (fileURL is the local file defined earlier; the upload URL is a placeholder)
let uploadURL = URL(string: "https://example.com/upload")!
Task {
    do {
        let uploader = AsyncFileUploader()
        let result = try await uploader.uploadFile(fileURL: fileURL, to: uploadURL)
        print("Upload successful: \(result.count) bytes received")
    } catch {
        print("Upload failed: \(error)")
    }
}
Best Practices and Considerations
- File Size Limits: Always check the target server's file size limits before attempting uploads
- Memory Management: For large files, use streaming uploads to avoid loading entire files into memory
- Network Conditions: Implement proper timeout handling and retry logic for unreliable networks
- Security: Validate file types and sizes before uploading to prevent security issues
- Progress Feedback: Provide progress indicators for better user experience during long uploads
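On the memory management point: the multipart example earlier builds its body with Data(contentsOf:), which loads the whole file into memory. One way around that, sketched here under the assumption that a temporary file on disk is acceptable, is to write the multipart body to a file in fixed-size pieces and then upload that file. The function name and the default 64 KB piece size are illustrative choices.

```swift
import Foundation

/// Writes a multipart/form-data body to a temporary file, copying the source
/// file in fixed-size pieces so it is never fully loaded into memory.
func writeMultipartBody(fileURL: URL,
                        fieldName: String,
                        fileName: String,
                        mimeType: String,
                        boundary: String,
                        chunkSize: Int = 64 * 1024) throws -> URL {
    let bodyURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("upload-\(UUID().uuidString).tmp")
    guard FileManager.default.createFile(atPath: bodyURL.path, contents: nil) else {
        throw CocoaError(.fileWriteUnknown)
    }

    let output = try FileHandle(forWritingTo: bodyURL)
    defer { output.closeFile() }
    let input = try FileHandle(forReadingFrom: fileURL)
    defer { input.closeFile() }

    // Part headers for the file field
    var header = "--\(boundary)\r\n"
    header += "Content-Disposition: form-data; name=\"\(fieldName)\"; filename=\"\(fileName)\"\r\n"
    header += "Content-Type: \(mimeType)\r\n\r\n"
    output.write(Data(header.utf8))

    // Copy the file contents piece by piece
    while true {
        let piece = input.readData(ofLength: chunkSize)
        if piece.isEmpty { break }
        output.write(piece)
    }

    // Closing boundary
    output.write(Data("\r\n--\(boundary)--\r\n".utf8))
    return bodyURL
}
```

The returned URL can be passed to uploadTask(with:fromFile:) together with a request whose Content-Type header names the same boundary. Remember to delete the temporary file once the upload finishes.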
Troubleshooting Common Issues
Issue: Upload Fails with 413 (Payload Too Large)
Solution: Check server configuration and split large files into chunks if necessary.
Issue: Timeout Errors on Slow Networks
Solution: Increase timeout intervals and implement exponential backoff retry logic.
Issue: Authentication Failures
Solution: Ensure proper authentication headers are included and tokens are valid.
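For the 413 case above, how chunks must be sent depends entirely on the target server, since there is no single standard protocol for chunked uploads. The splitting itself is generic, though, and might look like this sketch. The function names are hypothetical, and the Content-Range style header is only an assumption about what a server might accept.

```swift
import Foundation

/// Reads `fileURL` in pieces of at most `chunkSize` bytes and passes each
/// piece, together with its byte offset in the file, to `handler`.
func forEachChunk(of fileURL: URL,
                  chunkSize: Int,
                  handler: (_ chunk: Data, _ offset: UInt64) throws -> Void) throws {
    precondition(chunkSize > 0, "chunkSize must be positive")
    let handle = try FileHandle(forReadingFrom: fileURL)
    defer { handle.closeFile() }

    var offset: UInt64 = 0
    while true {
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }
        try handler(chunk, offset)
        offset += UInt64(chunk.count)
    }
}

/// Formats a byte range such as "bytes 0-3/10" for a chunk.
/// Whether the server expects this header format is an assumption.
func contentRange(offset: UInt64, length: Int, total: UInt64) -> String {
    "bytes \(offset)-\(offset + UInt64(length) - 1)/\(total)"
}
```

Inside the handler you would build one request per chunk and send it with uploadTask(with:from:), waiting for each response before sending the next piece if the server requires ordered delivery.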
Conclusion
Handling file uploads in Swift web scraping requires understanding the upload mechanism the server expects and implementing robust error handling. The examples provided cover the most common scenarios you'll encounter, from basic uploads to multipart forms with authentication.
Remember to always test your upload implementations with different file types and sizes to ensure reliability across various scenarios. With these techniques, you'll be well-equipped to handle file uploads in your Swift web scraping projects.