Table of contents

How do I scrape data from REST APIs using Swift?

Scraping data from REST APIs in Swift is a fundamental skill for iOS and macOS developers. Unlike traditional web scraping that involves parsing HTML, API scraping focuses on making HTTP requests and parsing structured data formats like JSON. Swift provides robust tools for API communication, making it an excellent choice for building data-driven applications.

Understanding REST API Data Scraping

REST API scraping involves making HTTP requests to web services and processing the returned data. This approach is more reliable than HTML scraping because APIs provide structured data formats and are designed for programmatic access. Swift's strong typing system and modern networking capabilities make it particularly well-suited for this task.

Using URLSession for Basic API Requests

Swift's built-in URLSession is the foundation for all network requests. Here's how to make a basic GET request to fetch data from a REST API:

import Foundation

func fetchAPIData(from urlString: String, completion: @escaping (Data?, Error?) -> Void) {
    guard let url = URL(string: urlString) else {
        completion(nil, URLError(.badURL))
        return
    }

    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        if let error = error {
            completion(nil, error)
            return
        }

        guard let httpResponse = response as? HTTPURLResponse,
              200...299 ~= httpResponse.statusCode else {
            completion(nil, URLError(.badServerResponse))
            return
        }

        completion(data, nil)
    }

    task.resume()
}

// Usage example
fetchAPIData(from: "https://api.example.com/users") { data, error in
    if let error = error {
        print("Error: \(error)")
        return
    }

    if let data = data {
        // Process the data
        print("Received \(data.count) bytes")
    }
}

Implementing Codable for JSON Parsing

Swift's Codable protocol makes JSON parsing straightforward and type-safe. Define your data models to match the API response structure:

struct User: Codable {
    let id: Int
    let name: String
    let email: String
    let company: Company?
}

struct Company: Codable {
    let name: String
    let catchPhrase: String
}

struct APIResponse<T: Codable>: Codable {
    let data: [T]
    let meta: Meta?
}

struct Meta: Codable {
    let total: Int
    let page: Int
    let perPage: Int
}

func parseJSON<T: Codable>(_ data: Data, as type: T.Type) -> T? {
    do {
        let decoder = JSONDecoder()
        decoder.keyDecodingStrategy = .convertFromSnakeCase
        return try decoder.decode(type, from: data)
    } catch {
        print("JSON parsing error: \(error)")
        return nil
    }
}

Creating a Comprehensive API Client

Build a reusable API client that handles common requirements like authentication, custom headers, and different HTTP methods:

import Foundation

class APIClient {
    private let baseURL: String
    private let session: URLSession
    private var defaultHeaders: [String: String] = [:]

    init(baseURL: String, timeout: TimeInterval = 30.0) {
        self.baseURL = baseURL

        let configuration = URLSessionConfiguration.default
        configuration.timeoutIntervalForRequest = timeout
        configuration.timeoutIntervalForResource = timeout * 2
        self.session = URLSession(configuration: configuration)
    }

    func setAuthToken(_ token: String) {
        defaultHeaders["Authorization"] = "Bearer \(token)"
    }

    func setAPIKey(_ key: String) {
        defaultHeaders["X-API-Key"] = key
    }

    enum HTTPMethod: String {
        case GET = "GET"
        case POST = "POST"
        case PUT = "PUT"
        case DELETE = "DELETE"
    }

    func request<T: Codable>(
        endpoint: String,
        method: HTTPMethod = .GET,
        parameters: [String: Any]? = nil,
        responseType: T.Type,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        guard let url = URL(string: baseURL + endpoint) else {
            completion(.failure(URLError(.badURL)))
            return
        }

        var request = URLRequest(url: url)
        request.httpMethod = method.rawValue

        // Add default headers
        defaultHeaders.forEach { request.setValue($1, forHTTPHeaderField: $0) }
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        // Add parameters for POST/PUT requests
        if let parameters = parameters, method != .GET {
            do {
                request.httpBody = try JSONSerialization.data(withJSONObject: parameters)
            } catch {
                completion(.failure(error))
                return
            }
        }

        let task = session.dataTask(with: request) { data, response, error in
            if let error = error {
                completion(.failure(error))
                return
            }

            guard let data = data else {
                completion(.failure(URLError(.badServerResponse)))
                return
            }

            do {
                let decoder = JSONDecoder()
                decoder.keyDecodingStrategy = .convertFromSnakeCase
                let result = try decoder.decode(responseType, from: data)
                completion(.success(result))
            } catch {
                completion(.failure(error))
            }
        }

        task.resume()
    }
}

Using Alamofire for Advanced Features

While URLSession is powerful, Alamofire provides additional convenience features for complex API interactions:

import Alamofire
import Foundation

class AlamofireAPIClient {
    private let baseURL: String
    private let session: Session

    init(baseURL: String) {
        self.baseURL = baseURL

        let configuration = URLSessionConfiguration.default
        configuration.timeoutIntervalForRequest = 30
        self.session = Session(configuration: configuration)
    }

    func fetchUsers(completion: @escaping (Result<[User], Error>) -> Void) {
        let url = baseURL + "/users"

        session.request(url)
            .validate(statusCode: 200..<300)
            .responseDecodable(of: [User].self) { response in
                switch response.result {
                case .success(let users):
                    completion(.success(users))
                case .failure(let error):
                    completion(.failure(error))
                }
            }
    }

    func uploadData<T: Codable>(
        endpoint: String,
        data: [String: Any],
        responseType: T.Type,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        let url = baseURL + endpoint

        session.request(url, method: .post, parameters: data, encoding: JSONEncoding.default)
            .validate()
            .responseDecodable(of: responseType) { response in
                completion(response.result)
            }
    }
}

Handling Authentication and Headers

Many APIs require authentication. Here's how to handle various authentication methods:

extension APIClient {
    // Basic Authentication
    func setBasicAuth(username: String, password: String) {
        let credentials = "\(username):\(password)"
        let base64 = Data(credentials.utf8).base64EncodedString()
        defaultHeaders["Authorization"] = "Basic \(base64)"
    }

    // OAuth Bearer Token
    func setBearerToken(_ token: String) {
        defaultHeaders["Authorization"] = "Bearer \(token)"
    }

    // Custom Headers
    func setCustomHeaders(_ headers: [String: String]) {
        headers.forEach { defaultHeaders[$0.key] = $0.value }
    }
}

// Usage example
let client = APIClient(baseURL: "https://api.example.com")
client.setAPIKey("your-api-key-here")
client.setCustomHeaders([
    "User-Agent": "MyApp/1.0",
    "Accept-Language": "en-US"
])

Error Handling and Retry Logic

Robust error handling is crucial for reliable API scraping. Implement retry logic for transient failures:

extension APIClient {
    func requestWithRetry<T: Codable>(
        endpoint: String,
        method: HTTPMethod = .GET,
        parameters: [String: Any]? = nil,
        responseType: T.Type,
        maxRetries: Int = 3,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        attemptRequest(
            endpoint: endpoint,
            method: method,
            parameters: parameters,
            responseType: responseType,
            attempt: 1,
            maxRetries: maxRetries,
            completion: completion
        )
    }

    private func attemptRequest<T: Codable>(
        endpoint: String,
        method: HTTPMethod,
        parameters: [String: Any]?,
        responseType: T.Type,
        attempt: Int,
        maxRetries: Int,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        request(
            endpoint: endpoint,
            method: method,
            parameters: parameters,
            responseType: responseType
        ) { result in
            switch result {
            case .success(let data):
                completion(.success(data))
            case .failure(let error):
                if attempt < maxRetries && self.shouldRetry(error: error) {
                    let delay = TimeInterval(attempt * 2) // Exponential backoff
                    DispatchQueue.global().asyncAfter(deadline: .now() + delay) {
                        self.attemptRequest(
                            endpoint: endpoint,
                            method: method,
                            parameters: parameters,
                            responseType: responseType,
                            attempt: attempt + 1,
                            maxRetries: maxRetries,
                            completion: completion
                        )
                    }
                } else {
                    completion(.failure(error))
                }
            }
        }
    }

    private func shouldRetry(error: Error) -> Bool {
        if let urlError = error as? URLError {
            switch urlError.code {
            case .timedOut, .networkConnectionLost, .notConnectedToInternet:
                return true
            default:
                return false
            }
        }
        return false
    }
}

Rate Limiting and Throttling

Implement rate limiting to respect API quotas and avoid being blocked:

class RateLimitedAPIClient: APIClient {
    private let requestQueue = DispatchQueue(label: "api.requests", qos: .utility)
    private let semaphore: DispatchSemaphore
    private let requestsPerSecond: Int

    init(baseURL: String, requestsPerSecond: Int = 10) {
        self.requestsPerSecond = requestsPerSecond
        self.semaphore = DispatchSemaphore(value: requestsPerSecond)
        super.init(baseURL: baseURL)
    }

    override func request<T: Codable>(
        endpoint: String,
        method: HTTPMethod = .GET,
        parameters: [String: Any]? = nil,
        responseType: T.Type,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        requestQueue.async {
            self.semaphore.wait()

            super.request(
                endpoint: endpoint,
                method: method,
                parameters: parameters,
                responseType: responseType
            ) { result in
                completion(result)

                // Release semaphore after delay
                DispatchQueue.global().asyncAfter(deadline: .now() + 1.0 / Double(self.requestsPerSecond)) {
                    self.semaphore.signal()
                }
            }
        }
    }
}

Pagination Handling

Many APIs use pagination for large datasets. Handle paginated responses efficiently:

struct PaginatedResponse<T: Codable>: Codable {
    let data: [T]
    let pagination: Pagination
}

struct Pagination: Codable {
    let currentPage: Int
    let totalPages: Int
    let perPage: Int
    let total: Int
}

extension APIClient {
    func fetchAllPages<T: Codable>(
        endpoint: String,
        responseType: T.Type,
        completion: @escaping (Result<[T], Error>) -> Void
    ) {
        var allData: [T] = []
        var currentPage = 1

        func fetchPage() {
            let paginatedEndpoint = "\(endpoint)?page=\(currentPage)"

            request(
                endpoint: paginatedEndpoint,
                responseType: PaginatedResponse<T>.self
            ) { result in
                switch result {
                case .success(let response):
                    allData.append(contentsOf: response.data)

                    if currentPage < response.pagination.totalPages {
                        currentPage += 1
                        fetchPage()
                    } else {
                        completion(.success(allData))
                    }
                case .failure(let error):
                    completion(.failure(error))
                }
            }
        }

        fetchPage()
    }
}

Working with Different Data Formats

While JSON is the most common format, APIs may return XML, CSV, or other formats:

// XML Parsing Example
import Foundation

class XMLAPIClient: APIClient {
    func parseXMLResponse(data: Data) -> [String: String] {
        var result: [String: String] = [:]
        let parser = XMLParser(data: data)
        parser.delegate = self as? XMLParserDelegate
        parser.parse()
        return result
    }
}

// CSV Parsing Example
extension APIClient {
    func parseCSVResponse(data: Data) -> [[String]] {
        guard let csvString = String(data: data, encoding: .utf8) else { return [] }

        return csvString.components(separatedBy: .newlines)
            .filter { !$0.isEmpty }
            .map { $0.components(separatedBy: ",") }
    }
}

Caching and Data Persistence

Implement caching to reduce API calls and improve performance:

import Foundation

class CachedAPIClient: APIClient {
    private let cache = NSCache<NSString, NSData>()
    private let cacheTimeout: TimeInterval = 300 // 5 minutes

    override func request<T: Codable>(
        endpoint: String,
        method: HTTPMethod = .GET,
        parameters: [String: Any]? = nil,
        responseType: T.Type,
        completion: @escaping (Result<T, Error>) -> Void
    ) {
        let cacheKey = "\(endpoint)-\(method.rawValue)" as NSString

        // Check cache for GET requests
        if method == .GET, let cachedData = cache.object(forKey: cacheKey) {
            do {
                let decoder = JSONDecoder()
                let result = try decoder.decode(responseType, from: cachedData as Data)
                completion(.success(result))
                return
            } catch {
                // Cache miss or corrupted data, continue with network request
            }
        }

        // Make network request
        super.request(
            endpoint: endpoint,
            method: method,
            parameters: parameters,
            responseType: responseType
        ) { result in
            switch result {
            case .success(let data):
                // Cache successful GET responses
                if method == .GET {
                    do {
                        let encoder = JSONEncoder()
                        let encodedData = try encoder.encode(data)
                        self.cache.setObject(encodedData as NSData, forKey: cacheKey)
                    } catch {
                        // Encoding failed, but don't fail the request
                    }
                }
                completion(.success(data))
            case .failure(let error):
                completion(.failure(error))
            }
        }
    }
}

Best Practices and Security Considerations

When scraping REST APIs with Swift, follow these best practices:

  1. Always validate SSL certificates in production environments
  2. Store API keys securely using Keychain Services
  3. Implement proper error logging for debugging and monitoring
  4. Cache responses when appropriate to reduce API calls
  5. Respect rate limits and implement exponential backoff
  6. Handle different HTTP status codes appropriately
// Secure API key storage
import Security

class KeychainHelper {
    static func store(key: String, data: Data) -> Bool {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrAccount as String: key,
            kSecValueData as String: data
        ]

        SecItemDelete(query as CFDictionary)
        return SecItemAdd(query as CFDictionary, nil) == errSecSuccess
    }

    static func load(key: String) -> Data? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrAccount as String: key,
            kSecReturnData as String: true
        ]

        var result: AnyObject?
        SecItemCopyMatching(query as CFDictionary, &result)
        return result as? Data
    }
}

// Usage
let apiKey = "your-api-key"
if let keyData = apiKey.data(using: .utf8) {
    KeychainHelper.store(key: "api_key", data: keyData)
}

Testing and Monitoring

Implement proper testing and monitoring for your API client:

import XCTest

class APIClientTests: XCTestCase {
    var apiClient: APIClient!

    override func setUp() {
        super.setUp()
        apiClient = APIClient(baseURL: "https://httpbin.org")
    }

    func testSuccessfulRequest() {
        let expectation = XCTestExpectation(description: "API request")

        apiClient.request(
            endpoint: "/get",
            responseType: HTTPBinResponse.self
        ) { result in
            switch result {
            case .success(let response):
                XCTAssertNotNil(response.url)
                expectation.fulfill()
            case .failure(let error):
                XCTFail("Request failed: \(error)")
            }
        }

        wait(for: [expectation], timeout: 10.0)
    }
}

struct HTTPBinResponse: Codable {
    let url: String
    let headers: [String: String]
}

Conclusion

Swift provides excellent tools for REST API data scraping through URLSession, third-party libraries like Alamofire, and the Codable protocol. By implementing proper error handling, authentication, rate limiting, and security measures, you can build robust applications that efficiently consume API data. Whether you're building iOS apps that need to fetch user data or macOS applications that process large datasets, Swift's networking capabilities make API integration both powerful and maintainable.

For more complex scenarios involving JavaScript-heavy content or browser automation, consider exploring how to handle authentication in Puppeteer or monitoring network requests in Puppeteer for complementary web scraping approaches.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Related Questions

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon