How do I use Swift for scraping data from GraphQL APIs?

GraphQL APIs offer a more flexible alternative to REST APIs, allowing clients to request exactly the data they need. When scraping data from GraphQL endpoints using Swift, you have several approaches ranging from simple HTTP requests to sophisticated GraphQL client libraries. This guide covers the most effective methods for extracting data from GraphQL APIs in Swift applications.

Understanding GraphQL Basics

GraphQL uses a single endpoint and accepts POST requests with JSON payloads containing queries, mutations, or subscriptions. Unlike REST APIs with multiple endpoints, GraphQL APIs typically expose one URL that handles all operations.
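Concretely, every operation travels as the same JSON shape: a `query` string plus an optional `variables` object, and the server replies with a `data`/`errors` envelope. A minimal sketch of both sides of the wire (the query and field names are illustrative):

```swift
import Foundation

// The request body every GraphQL POST carries
let payload: [String: Any] = [
    "query": "query GetUser($id: ID!) { user(id: $id) { name } }",
    "variables": ["id": "42"]
]
let body = try JSONSerialization.data(withJSONObject: payload)

// The response envelope every GraphQL server returns
let responseJSON = #"{"data": {"user": {"name": "Ada"}}}"#
let envelope = try JSONSerialization.jsonObject(with: Data(responseJSON.utf8)) as! [String: Any]
let userData = envelope["data"] as! [String: Any]
let user = userData["user"] as! [String: Any]
print(user["name"] as! String) // Ada
```

Understanding this envelope is all you need to talk to any compliant GraphQL server; the rest of this guide is layers of convenience on top of it.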

Method 1: Using URLSession for Simple GraphQL Requests

For basic GraphQL scraping, you can use Swift's built-in URLSession to send POST requests:

import Foundation

class GraphQLScraper {
    private let endpoint: URL
    private let session: URLSession

    init(endpoint: String) {
        self.endpoint = URL(string: endpoint)!
        self.session = URLSession.shared
    }

    func executeQuery(_ query: String, variables: [String: Any]? = nil) async throws -> Data {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        // Prepare GraphQL payload
        var payload: [String: Any] = ["query": query]
        if let variables = variables {
            payload["variables"] = variables
        }

        request.httpBody = try JSONSerialization.data(withJSONObject: payload)

        let (data, response) = try await session.data(for: request)

        guard let httpResponse = response as? HTTPURLResponse,
              200...299 ~= httpResponse.statusCode else {
            throw URLError(.badServerResponse)
        }

        return data
    }
}

// Usage example
let scraper = GraphQLScraper(endpoint: "https://api.example.com/graphql")

let query = """
    query GetUsers($limit: Int!) {
        users(limit: $limit) {
            id
            name
            email
            posts {
                title
                content
            }
        }
    }
"""

let variables = ["limit": 10]

do {
    let data = try await scraper.executeQuery(query, variables: variables)
    let json = try JSONSerialization.jsonObject(with: data)
    print(json)
} catch {
    print("Error: \(error)")
}
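Since `executeQuery` returns raw `Data`, you can navigate the envelope with `JSONSerialization` before investing in `Codable` models. A sketch, using a hypothetical response for the `GetUsers` query above:

```swift
import Foundation

// Hypothetical payload matching the GetUsers query above
let raw = Data(#"""
{"data": {"users": [{"id": "1", "name": "Ada", "email": "ada@example.com"}]}}
"""#.utf8)

// Walk the envelope: "data" first, then the root field the query selected
let envelope = try JSONSerialization.jsonObject(with: raw) as? [String: Any]
let users = (envelope?["data"] as? [String: Any])?["users"] as? [[String: Any]] ?? []

for user in users {
    print(user["name"] as? String ?? "?") // Ada
}
```

This dictionary-walking gets brittle quickly, which is the motivation for the typed client in Method 2.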

Method 2: Creating a Robust GraphQL Client

For more complex scraping operations, create a dedicated GraphQL client with error handling and response parsing:

import Foundation

struct GraphQLResponse<T: Codable>: Codable {
    let data: T?
    let errors: [GraphQLError]?
}

struct GraphQLError: Codable {
    let message: String
    let locations: [GraphQLLocation]?
    let path: [String]?
}

struct GraphQLLocation: Codable {
    let line: Int
    let column: Int
}

class AdvancedGraphQLScraper {
    private let endpoint: URL
    private let session: URLSession
    private var headers: [String: String] = [:]

    init(endpoint: String, headers: [String: String] = [:]) {
        self.endpoint = URL(string: endpoint)!

        let config = URLSessionConfiguration.default
        config.timeoutIntervalForRequest = 30
        config.timeoutIntervalForResource = 60

        self.session = URLSession(configuration: config)
        self.headers = headers
    }

    func setAuthenticationToken(_ token: String) {
        headers["Authorization"] = "Bearer \(token)"
    }

    func query<T: Codable>(
        _ query: String,
        variables: [String: Any]? = nil,
        responseType: T.Type
    ) async throws -> GraphQLResponse<T> {

        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")

        // Add custom headers
        for (key, value) in headers {
            request.setValue(value, forHTTPHeaderField: key)
        }

        // Prepare GraphQL payload
        var payload: [String: Any] = ["query": query]
        if let variables = variables {
            payload["variables"] = variables
        }

        request.httpBody = try JSONSerialization.data(withJSONObject: payload)

        let (data, response) = try await session.data(for: request)

        guard let httpResponse = response as? HTTPURLResponse else {
            throw URLError(.badServerResponse)
        }

        guard 200...299 ~= httpResponse.statusCode else {
            if let errorString = String(data: data, encoding: .utf8) {
                print("Server error: \(errorString)")
            }
            throw URLError(.badServerResponse)
        }

        let decoder = JSONDecoder()
        return try decoder.decode(GraphQLResponse<T>.self, from: data)
    }
}

Method 3: Using Apollo iOS for Type-Safe GraphQL

Apollo iOS provides compile-time type safety and automatic code generation for GraphQL operations:

// First, add Apollo iOS to your project via Swift Package Manager
// https://github.com/apollographql/apollo-ios

import Apollo
import Foundation

class ApolloGraphQLScraper {
    private let apollo: ApolloClient

    init(endpoint: String) {
        let url = URL(string: endpoint)!
        // ApolloClient(url:) is Apollo's convenience initializer; it wires up
        // a default network transport and normalized cache for you.
        self.apollo = ApolloClient(url: url)
    }

    func fetchData<Query: GraphQLQuery>(
        query: Query
    ) async throws -> GraphQLResult<Query.Data> {

        return try await withCheckedThrowingContinuation { continuation in
            apollo.fetch(query: query) { result in
                continuation.resume(with: result)
            }
        }
    }
}

// Example usage with generated types
// (This assumes you've run Apollo codegen on your GraphQL schema)
/*
let scraper = ApolloGraphQLScraper(endpoint: "https://api.example.com/graphql")

do {
    let result = try await scraper.fetchData(query: GetUsersQuery(limit: 10))
    if let users = result.data?.users {
        for user in users {
            print("User: \(user.name), Email: \(user.email)")
        }
    }

    if let errors = result.errors {
        for error in errors {
            print("GraphQL Error: \(error.localizedDescription)")
        }
    }
} catch {
    print("Network Error: \(error)")
}
*/

Handling Authentication and Headers

Many GraphQL APIs require authentication. Here's how to handle various authentication methods:

extension AdvancedGraphQLScraper {
    func setBasicAuth(username: String, password: String) {
        let credentials = "\(username):\(password)"
        let encodedCredentials = Data(credentials.utf8).base64EncodedString()
        headers["Authorization"] = "Basic \(encodedCredentials)"
    }

    func setCustomHeaders(_ customHeaders: [String: String]) {
        for (key, value) in customHeaders {
            headers[key] = value
        }
    }

    func setUserAgent(_ userAgent: String) {
        headers["User-Agent"] = userAgent
    }
}

// Usage
let scraper = AdvancedGraphQLScraper(endpoint: "https://api.github.com/graphql")
scraper.setAuthenticationToken("your_github_token")
scraper.setUserAgent("MySwiftScraper/1.0")
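The `setBasicAuth` helper builds a standard RFC 7617 header, which you can sanity-check without touching the network:

```swift
import Foundation

// Same encoding as setBasicAuth above: "user:password" base64-encoded
let credentials = "user:pass"
let encoded = Data(credentials.utf8).base64EncodedString()
let header = "Basic \(encoded)"
print(header) // Basic dXNlcjpwYXNz
```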

Implementing Rate Limiting and Retry Logic

When scraping GraphQL APIs, implement proper rate limiting and retry mechanisms:

import Foundation

class RateLimitedGraphQLScraper {
    private let scraper: AdvancedGraphQLScraper
    private let rateLimiter: RateLimiter
    private let retryManager: RetryManager

    init(endpoint: String, requestsPerSecond: Double = 1.0) {
        self.scraper = AdvancedGraphQLScraper(endpoint: endpoint)
        self.rateLimiter = RateLimiter(requestsPerSecond: requestsPerSecond)
        self.retryManager = RetryManager(maxRetries: 3, baseDelay: 1.0)
    }

    func query<T: Codable>(
        _ query: String,
        variables: [String: Any]? = nil,
        responseType: T.Type
    ) async throws -> GraphQLResponse<T> {

        return try await retryManager.executeWithRetry {
            await rateLimiter.waitIfNeeded()
            return try await scraper.query(query, variables: variables, responseType: responseType)
        }
    }
}

// An actor, so concurrent callers can't race on lastRequestTime
actor RateLimiter {
    private let interval: TimeInterval
    private var lastRequestTime: Date = .distantPast

    init(requestsPerSecond: Double) {
        self.interval = 1.0 / requestsPerSecond
    }

    func waitIfNeeded() async {
        let now = Date()
        let timeSinceLastRequest = now.timeIntervalSince(lastRequestTime)

        if timeSinceLastRequest < interval {
            let waitTime = interval - timeSinceLastRequest
            try? await Task.sleep(nanoseconds: UInt64(waitTime * 1_000_000_000))
        }

        lastRequestTime = Date()
    }
}

class RetryManager {
    private let maxRetries: Int
    private let baseDelay: TimeInterval

    init(maxRetries: Int, baseDelay: TimeInterval) {
        self.maxRetries = maxRetries
        self.baseDelay = baseDelay
    }

    func executeWithRetry<T>(operation: () async throws -> T) async throws -> T {
        var lastError: Error?

        for attempt in 0...maxRetries {
            do {
                return try await operation()
            } catch {
                lastError = error

                if attempt < maxRetries {
                    let delay = baseDelay * pow(2.0, Double(attempt))
                    try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
                }
            }
        }

        throw lastError!
    }
}
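With `maxRetries: 3` and `baseDelay: 1.0`, the `RetryManager` above makes up to four attempts and sleeps 1s, 2s, then 4s between them. The exponential-backoff schedule is easy to verify in isolation:

```swift
import Foundation

// The delay formula RetryManager uses between attempts: baseDelay * 2^attempt
func backoffDelays(maxRetries: Int, baseDelay: TimeInterval) -> [TimeInterval] {
    (0..<maxRetries).map { baseDelay * pow(2.0, Double($0)) }
}

print(backoffDelays(maxRetries: 3, baseDelay: 1.0)) // [1.0, 2.0, 4.0]
```

Doubling the delay after each failure gives a struggling server progressively more breathing room, which is why this pattern is the default choice for scraping retry logic.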

Parsing and Processing GraphQL Responses

Create models to handle GraphQL responses effectively:

struct User: Codable {
    let id: String
    let name: String
    let email: String
    let posts: [Post]?
}

struct Post: Codable {
    let title: String
    let content: String
    let publishedAt: String?
}

struct UsersResponse: Codable {
    let users: [User]
}

// Usage example
let query = """
    query GetUsers($limit: Int!) {
        users(limit: $limit) {
            id
            name
            email
            posts {
                title
                content
                publishedAt
            }
        }
    }
"""

let scraper = RateLimitedGraphQLScraper(endpoint: "https://api.example.com/graphql")

do {
    let response = try await scraper.query(
        query,
        variables: ["limit": 50],
        responseType: UsersResponse.self
    )

    if let users = response.data?.users {
        for user in users {
            print("Processing user: \(user.name)")
            if let posts = user.posts {
                print("  Found \(posts.count) posts")
            }
        }
    }

    if let errors = response.errors {
        print("GraphQL errors occurred:")
        for error in errors {
            print("  - \(error.message)")
        }
    }
} catch {
    print("Failed to fetch data: \(error)")
}

Best Practices for GraphQL Scraping

1. Optimize Your Queries

Request only the fields you need to minimize bandwidth and improve performance:

// Good - specific fields
let optimizedQuery = """
    query GetUserEmails {
        users {
            id
            email
        }
    }
"""

// Avoid - requesting unnecessary data
let inefficientQuery = """
    query GetAllUserData {
        users {
            id
            name
            email
            bio
            avatar
            posts {
                title
                content
                comments {
                    text
                    author {
                        name
                        email
                    }
                }
            }
        }
    }
"""

2. Handle Pagination Properly

Many GraphQL APIs use cursor-based pagination:

func fetchAllUsers() async throws -> [User] {
    var allUsers: [User] = []
    var cursor: String? = nil

    repeat {
        let query = """
            query GetUsers($cursor: String) {
                users(first: 100, after: $cursor) {
                    edges {
                        node {
                            id
                            name
                            email
                        }
                        cursor
                    }
                    pageInfo {
                        hasNextPage
                        endCursor
                    }
                }
            }
        """

        var variables: [String: Any] = [:]
        if let cursor = cursor {
            variables["cursor"] = cursor
        }

        let response = try await scraper.query(
            query,
            variables: variables,
            responseType: PaginatedUsersResponse.self
        )

        if let userData = response.data?.users {
            allUsers.append(contentsOf: userData.edges.map { $0.node })

            if userData.pageInfo.hasNextPage {
                cursor = userData.pageInfo.endCursor
            } else {
                break
            }
        } else {
            // No data (e.g. an errors-only payload): stop rather than loop forever
            break
        }
    } while true

    return allUsers
}
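The function above assumes a `PaginatedUsersResponse` model matching the Relay-style connection shape in the query. A sketch of what those `Codable` types could look like, with a small decoding check:

```swift
import Foundation

// User as defined earlier in this guide (posts omitted by the pagination query,
// so the optional decodes as nil)
struct User: Codable {
    let id: String
    let name: String
    let email: String
}

// Relay-style connection types matching the pagination query
struct PaginatedUsersResponse: Codable {
    let users: UserConnection
}

struct UserConnection: Codable {
    let edges: [UserEdge]
    let pageInfo: PageInfo
}

struct UserEdge: Codable {
    let node: User
    let cursor: String
}

struct PageInfo: Codable {
    let hasNextPage: Bool
    let endCursor: String?
}

let sample = Data(#"""
{"users": {"edges": [{"node": {"id": "1", "name": "Ada", "email": "ada@example.com"}, "cursor": "c1"}], "pageInfo": {"hasNextPage": false, "endCursor": "c1"}}}
"""#.utf8)
let page = try JSONDecoder().decode(PaginatedUsersResponse.self, from: sample)
print(page.users.pageInfo.hasNextPage) // false
```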

3. Error Handling and Logging

Implement comprehensive error handling:

extension AdvancedGraphQLScraper {
    func queryWithLogging<T: Codable>(
        _ query: String,
        variables: [String: Any]? = nil,
        responseType: T.Type
    ) async throws -> GraphQLResponse<T> {

        print("Executing GraphQL query...")

        do {
            let response = try await self.query(query, variables: variables, responseType: responseType)

            if let errors = response.errors, !errors.isEmpty {
                print("GraphQL errors:")
                for error in errors {
                    print("  - \(error.message)")
                }
            }

            if response.data != nil {
                print("Query executed successfully")
            }

            return response
        } catch {
            print("Query failed with error: \(error)")
            throw error
        }
    }
}

Working with Authentication Tokens

For APIs requiring authentication tokens, implement secure token management:

class SecureGraphQLScraper {
    private let scraper: AdvancedGraphQLScraper
    private let keychain: KeychainService

    init(endpoint: String) {
        self.scraper = AdvancedGraphQLScraper(endpoint: endpoint)
        self.keychain = KeychainService()
    }

    func setSecureToken(_ token: String, for key: String) {
        keychain.store(token, forKey: key)
        scraper.setAuthenticationToken(token)
    }

    func loadSecureToken(for key: String) -> String? {
        return keychain.retrieve(forKey: key)
    }
}

// Simple keychain wrapper (uses the Security framework)
import Security

class KeychainService {
    func store(_ value: String, forKey key: String) {
        let data = value.data(using: .utf8)!
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrAccount as String: key,
            kSecValueData as String: data
        ]

        SecItemDelete(query as CFDictionary)
        SecItemAdd(query as CFDictionary, nil)
    }

    func retrieve(forKey key: String) -> String? {
        let query: [String: Any] = [
            kSecClass as String: kSecClassGenericPassword,
            kSecAttrAccount as String: key,
            kSecReturnData as String: true,
            kSecMatchLimit as String: kSecMatchLimitOne
        ]

        var result: AnyObject?
        let status = SecItemCopyMatching(query as CFDictionary, &result)

        guard status == errSecSuccess,
              let data = result as? Data else { return nil }

        return String(data: data, encoding: .utf8)
    }
}

Handling Complex GraphQL Schemas

For APIs with complex schemas, use introspection to understand available fields:

func introspectSchema() async throws {
    let introspectionQuery = """
        query IntrospectionQuery {
            __schema {
                types {
                    name
                    fields {
                        name
                        type {
                            name
                        }
                    }
                }
            }
        }
    """

    let response = try await scraper.executeQuery(introspectionQuery)
    let json = try JSONSerialization.jsonObject(with: response)
    print("Schema introspection result: \(json)")
}
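Rather than printing the raw introspection result, you can decode the `data` portion of the envelope into typed models. A sketch of `Codable` types matching the introspection query's shape, checked against a small sample:

```swift
import Foundation

// Models matching the fields selected by the introspection query above;
// names are optional because scalar wrappers can return null
struct IntrospectionResponse: Codable {
    let __schema: Schema

    struct Schema: Codable {
        let types: [TypeInfo]
    }
    struct TypeInfo: Codable {
        let name: String?
        let fields: [Field]?
    }
    struct Field: Codable {
        let name: String
        let type: TypeRef
    }
    struct TypeRef: Codable {
        let name: String?
    }
}

let sample = Data(#"""
{"__schema": {"types": [{"name": "User", "fields": [{"name": "id", "type": {"name": "ID"}}]}]}}
"""#.utf8)
let schema = try JSONDecoder().decode(IntrospectionResponse.self, from: sample)
print(schema.__schema.types.first?.name ?? "?") // User
```

Note that many production APIs disable introspection, so treat a failed introspection query as expected rather than fatal.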

Performance Optimization

Optimize your GraphQL scraping for better performance:

class OptimizedGraphQLScraper {
    private let scraper: AdvancedGraphQLScraper
    private let cache = NSCache<NSString, NSData>()

    init(endpoint: String) {
        self.scraper = AdvancedGraphQLScraper(endpoint: endpoint)
        cache.countLimit = 100
    }

    func cachedQuery<T: Codable>(
        _ query: String,
        variables: [String: Any]? = nil,
        responseType: T.Type,
        cacheKey: String? = nil
    ) async throws -> GraphQLResponse<T> {

        // hashValue is only stable within a single process run, which is
        // acceptable for this in-memory NSCache
        let key = cacheKey ?? "\(query.hashValue)"

        // Check cache first
        if let cachedData = cache.object(forKey: NSString(string: key)) {
            let decoder = JSONDecoder()
            return try decoder.decode(GraphQLResponse<T>.self, from: cachedData as Data)
        }

        // Execute query
        let response = try await scraper.query(query, variables: variables, responseType: responseType)

        // Cache successful responses
        if response.data != nil {
            let encoder = JSONEncoder()
            if let encodedData = try? encoder.encode(response) {
                cache.setObject(NSData(data: encodedData), forKey: NSString(string: key))
            }
        }

        return response
    }
}
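Swift's `hashValue` is seeded per process, so keys derived from it only suit in-memory caches. If you later persist responses to disk, derive a launch-stable key from the query text and its variables instead. A minimal sketch, using `[String: String]` variables for simplicity:

```swift
import Foundation

// A launch-stable cache key: the query text plus variables in sorted order
func stableCacheKey(query: String, variables: [String: String] = [:]) -> String {
    let vars = variables
        .sorted { $0.key < $1.key }
        .map { "\($0.key)=\($0.value)" }
        .joined(separator: "&")
    return query + "|" + vars
}

print(stableCacheKey(query: "query { users { id } }", variables: ["limit": "10"]))
// query { users { id } }|limit=10
```

Sorting the variable pairs guarantees the same query with the same variables always maps to the same key, regardless of dictionary ordering.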

Conclusion

Swift provides excellent capabilities for scraping GraphQL APIs, from simple URLSession-based approaches to sophisticated Apollo client implementations. Choose the method that best fits your project's complexity and requirements. For simple one-off scraping tasks, URLSession might suffice, while complex applications benefit from type-safe Apollo implementations.

When working with GraphQL APIs, remember to respect rate limits, handle errors gracefully, and optimize your queries for the specific data you need. Similar to how you might handle dynamic content that loads after page navigation in browser automation tools, GraphQL scraping requires careful attention to data loading patterns and proper error handling.

For production applications, consider implementing monitoring, caching mechanisms, and proper authentication handling to ensure reliable and efficient data extraction from GraphQL endpoints. Additionally, when dealing with authentication challenges in web scraping, apply similar security principles to protect your GraphQL API credentials and maintain secure communication channels.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
