How do I handle CSRF tokens and anti-scraping measures with Alamofire?

Cross-Site Request Forgery (CSRF) tokens and anti-scraping measures are common security mechanisms that websites implement to protect against automated requests and malicious attacks. When working with Alamofire for web scraping or API interactions, you'll frequently encounter these challenges. This guide provides comprehensive strategies for handling CSRF tokens and navigating anti-scraping measures effectively.

Understanding CSRF Tokens

CSRF tokens are security tokens that prevent cross-site request forgery attacks. They're typically embedded in forms or returned in API responses and must be included in subsequent requests to validate authenticity. Websites generate unique tokens for each session or request, making automated scraping more challenging.

Basic CSRF Token Extraction and Usage

The most common approach involves extracting CSRF tokens from initial page loads or API responses and including them in subsequent requests.

Extracting CSRF Tokens from HTML

import Alamofire
import SwiftSoup

class CSRFHandler {
    private var csrfToken: String?
    private var sessionCookies: HTTPCookieStorage = HTTPCookieStorage.shared

    func extractCSRFToken(from html: String) -> String? {
        do {
            let doc = try SwiftSoup.parse(html)
            // Common CSRF token selectors
            if let token = try doc.select("meta[name=csrf-token]").first()?.attr("content") {
                return token
            }
            if let token = try doc.select("input[name=_token]").first()?.attr("value") {
                return token
            }
            if let token = try doc.select("input[name=csrfmiddlewaretoken]").first()?.attr("value") {
                return token
            }
        } catch {
            print("Error parsing HTML: \(error)")
        }
        return nil
    }

    func fetchInitialPage(completion: @escaping (String?) -> Void) {
        // GET the login page; cookies set by this response are stored and sent with later requests
        AF.request("https://example.com/login")
            .validate()
            .responseString { [weak self] response in
                switch response.result {
                case .success(let html):
                    let token = self?.extractCSRFToken(from: html)
                    self?.csrfToken = token
                    completion(token)
                case .failure(let error):
                    print("Failed to fetch initial page: \(error)")
                    completion(nil)
                }
            }
    }
}

Making Authenticated Requests with CSRF Tokens

// Add this method inside CSRFHandler; it reads the token stored by fetchInitialPage.
func submitFormWithCSRF(username: String, password: String) {
    guard let token = csrfToken else {
        print("No CSRF token available")
        return
    }

    let parameters: [String: String] = [
        "username": username,
        "password": password,
        "_token": token // or "csrfmiddlewaretoken" depending on the framework
    ]

    // URLEncoding sets the form Content-Type for POST bodies, so it is not repeated here
    let headers: HTTPHeaders = [
        "X-CSRF-TOKEN": token, // Some frameworks expect it in headers
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "https://example.com/login"
    ]

    AF.request("https://example.com/login",
               method: .post,
               parameters: parameters,
               encoding: URLEncoding.default,
               headers: headers)
        .validate()
        .responseString { response in // responseJSON is deprecated in recent Alamofire 5 releases
            switch response.result {
            case .success:
                // Avoid printing the body here; it may contain session data
                print("Login successful")
            case .failure(let error):
                print("Login failed: \(error)")
            }
        }
}
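
Because a token may be tied to a session or to a single request, it can help to track how old the stored token is before reusing it. The following is a minimal sketch using only Foundation; `CSRFTokenCache`, `maxAge`, and the method names are hypothetical and not part of Alamofire, and the right lifetime depends on the target site.

```swift
import Foundation

/// Hypothetical cache (names are illustrative): keeps a CSRF token only for a limited time.
struct CSRFTokenCache {
    private var token: String?
    private var fetchedAt: Date = .distantPast
    let maxAge: TimeInterval

    init(maxAge: TimeInterval) {
        self.maxAge = maxAge
    }

    /// Remember a freshly extracted token together with the time it was fetched.
    mutating func store(_ newToken: String, at now: Date = Date()) {
        token = newToken
        fetchedAt = now
    }

    /// Returns the cached token, or nil when nothing is stored or the token has reached maxAge.
    func validToken(at now: Date = Date()) -> String? {
        guard let token = token, now.timeIntervalSince(fetchedAt) < maxAge else {
            return nil
        }
        return token
    }

    /// Drop the token, for example after the server rejects it.
    mutating func invalidate() {
        token = nil
    }
}
```

When `validToken()` returns nil, fetch the page again and call `store(_:)` with the new token. Calling `invalidate()` after a rejected request (Laravel typically answers a token mismatch with status 419, Django with 403) forces a fresh fetch on the next attempt.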

Advanced Anti-Scraping Countermeasures

Modern websites employ sophisticated anti-scraping measures beyond simple CSRF protection. Here's how to handle them with Alamofire.

Session Management and Cookie Handling

class AdvancedSession {
    private let session: Session
    private var csrfToken: String?

    init() {
        let configuration = URLSessionConfiguration.default
        configuration.httpCookieAcceptPolicy = .always
        configuration.httpShouldSetCookies = true
        configuration.httpCookieStorage = HTTPCookieStorage.shared

        // Custom interceptor that adds browser-like headers and retries failed requests
        let interceptor = CSRFInterceptor()

        self.session = Session(
            configuration: configuration,
            interceptor: interceptor
        )
    }

    func makeRequest(url: String, parameters: [String: Any]? = nil) {
        session.request(url, 
                       method: .get, 
                       parameters: parameters)
            .validate()
            .responseData { response in
                // Handle response
            }
    }
}

class CSRFInterceptor: RequestInterceptor {
    func adapt(_ urlRequest: URLRequest, for session: Session, completion: @escaping (Result<URLRequest, Error>) -> Void) {
        var adaptedRequest = urlRequest

        // Browser-like headers. setValue overwrites a value the caller already supplied;
        // addValue would append a second, comma-separated value instead.
        adaptedRequest.setValue("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", forHTTPHeaderField: "User-Agent")
        adaptedRequest.setValue("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", forHTTPHeaderField: "Accept")
        adaptedRequest.setValue("gzip, deflate, br", forHTTPHeaderField: "Accept-Encoding")
        adaptedRequest.setValue("en-US,en;q=0.5", forHTTPHeaderField: "Accept-Language")
        adaptedRequest.setValue("1", forHTTPHeaderField: "DNT")
        // "Connection" is one of the headers URLSession documents as reserved, so it is not set here

        completion(.success(adaptedRequest))
    }

    func retry(_ request: Request, for session: Session, dueTo error: Error, completion: @escaping (RetryResult) -> Void) {
        // Implement retry logic for failed requests
        if request.retryCount < 3 {
            completion(.retryWithDelay(TimeInterval.random(in: 1...3)))
        } else {
            completion(.doNotRetry)
        }
    }
}

Rate Limiting and Request Throttling

class ThrottledSession {
    private let session: Session
    private let requestQueue = DispatchQueue(label: "request.queue", qos: .utility)
    private var lastRequestTime: Date = .distantPast // lets the first request start immediately
    private let minRequestInterval: TimeInterval = 2.0 // 2 seconds between requests

    init() {
        let configuration = URLSessionConfiguration.default
        configuration.timeoutIntervalForRequest = 30
        configuration.timeoutIntervalForResource = 60

        self.session = Session(configuration: configuration)
    }

    func throttledRequest(url: String, completion: @escaping (AFDataResponse<Data>) -> Void) {
        requestQueue.async {
            let timeSinceLastRequest = Date().timeIntervalSince(self.lastRequestTime)

            if timeSinceLastRequest < self.minRequestInterval {
                let delay = self.minRequestInterval - timeSinceLastRequest
                Thread.sleep(forTimeInterval: delay)
            }

            self.lastRequestTime = Date()

            DispatchQueue.main.async {
                self.session.request(url)
                    .validate()
                    .responseData { response in
                        completion(response)
                    }
            }
        }
    }
}
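
The waiting logic above can also be separated from the networking code so that it can be tested without sleeping. This is a minimal sketch under that assumption; `RequestPacer` and `reserveSlot(now:)` are hypothetical names, not Alamofire API.

```swift
import Foundation

/// Hypothetical helper (not part of Alamofire): tracks when the next request may start.
struct RequestPacer {
    let minInterval: TimeInterval
    private var nextAllowed: Date = .distantPast

    init(minInterval: TimeInterval) {
        self.minInterval = minInterval
    }

    /// Returns how long the caller should wait before sending a request at `now`,
    /// and reserves the following slot so later callers queue up behind it.
    mutating func reserveSlot(now: Date = Date()) -> TimeInterval {
        let start = max(now, nextAllowed)
        nextAllowed = start.addingTimeInterval(minInterval)
        return start.timeIntervalSince(now)
    }
}
```

On a serial queue such as `requestQueue`, the returned value could be passed to `Thread.sleep(forTimeInterval:)` before issuing the request. Because the pacer is a value type with mutable state, it should only be mutated from that one queue.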

Handling Dynamic CSRF Tokens

Some applications refresh CSRF tokens frequently or generate them dynamically through JavaScript. Here's how to handle these scenarios:

JavaScript-Generated Tokens

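import WebKit

class DynamicCSRFHandler {
    private var webView: WKWebView?

    // Call on the main thread: WKWebView must be created and used there
    func extractDynamicCSRF(from url: String, completion: @escaping (String?) -> Void) {
        guard let pageURL = URL(string: url) else {
            completion(nil)
            return
        }
        webView = WKWebView()
        webView?.load(URLRequest(url: pageURL))

        // Fixed wait for the page to load and run its JavaScript; a WKNavigationDelegate's
        // webView(_:didFinish:) callback is a more reliable signal than a timer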
        DispatchQueue.main.asyncAfter(deadline: .now() + 3.0) {
            let script = """
                (function() {
                    var token = document.querySelector('meta[name="csrf-token"]');
                    if (token) return token.getAttribute('content');

                    var input = document.querySelector('input[name="_token"]');
                    if (input) return input.value;

                    // Check if token is in JavaScript variables
                    if (window.csrfToken) return window.csrfToken;
                    if (window._token) return window._token;

                    return null;
                })();
            """

            self.webView?.evaluateJavaScript(script) { result, error in
                if let token = result as? String {
                    completion(token)
                } else {
                    completion(nil)
                }
            }
        }
    }
}
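
When the token is assigned to a JavaScript variable in an inline script that is already present in the downloaded HTML, a web view may not be needed at all. The sketch below searches the raw HTML with a regular expression. The function name and the default variable names (`csrfToken`, `_token`, mirroring the script above) are assumptions; real sites use many different names and may build the token at runtime, in which case this fallback finds nothing.

```swift
import Foundation

/// Hypothetical fallback: looks for an assignment such as `window.csrfToken = "abc"`
/// in raw HTML. The variable names are assumptions and differ between sites.
func inlineScriptToken(in html: String, variableNames: [String] = ["csrfToken", "_token"]) -> String? {
    for name in variableNames {
        let escaped = NSRegularExpression.escapedPattern(for: name)
        // Matches window.<name> = "value" or window.<name> = 'value'
        let pattern = "window\\.\(escaped)\\s*=\\s*[\"']([^\"']+)[\"']"
        guard let regex = try? NSRegularExpression(pattern: pattern) else { continue }

        let searchRange = NSRange(html.startIndex..., in: html)
        if let match = regex.firstMatch(in: html, range: searchRange),
           let tokenRange = Range(match.range(at: 1), in: html) {
            return String(html[tokenRange])
        }
    }
    return nil
}
```

This could be tried after the SwiftSoup selectors fail and before falling back to a `WKWebView`.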

Bypassing Common Anti-Scraping Measures

User Agent Rotation

class UserAgentRotator {
    private let userAgents = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15"
    ]

    func randomUserAgent() -> String {
        return userAgents.randomElement() ?? userAgents[0]
    }
}

let rotator = UserAgentRotator()

// Pass headers in the request call; Alamofire requests have no .headers(...) modifier
AF.request("https://example.com",
           headers: ["User-Agent": rotator.randomUserAgent()])
    .validate()
    .responseData { response in
        // Handle response
    }

Proxy Support and IP Rotation

class ProxySession {
    private let session: Session

    init(proxyHost: String, proxyPort: Int, username: String? = nil, password: String? = nil) {
        let configuration = URLSessionConfiguration.default

        var proxyDict: [String: Any] = [
            kCFNetworkProxiesHTTPEnable as String: true,
            kCFNetworkProxiesHTTPProxy as String: proxyHost,
            kCFNetworkProxiesHTTPPort as String: proxyPort,
            kCFNetworkProxiesHTTPSEnable as String: true,
            kCFNetworkProxiesHTTPSProxy as String: proxyHost,
            kCFNetworkProxiesHTTPSPort as String: proxyPort
        ]

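        if let username = username, let password = password {
            // CFNetwork's proxy credential keys. URLSession does not document support for
            // them in this dictionary, so verify the behavior against your proxy.
            proxyDict[kCFProxyUsernameKey as String] = username
            proxyDict[kCFProxyPasswordKey as String] = password
        }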

        configuration.connectionProxyDictionary = proxyDict

        self.session = Session(configuration: configuration)
    }

    func makeRequest(url: String) {
        session.request(url)
            .validate()
            .responseData { response in
                // Handle response
            }
    }
}

Handling Captcha Challenges

When encountering captcha challenges, you have several options:

Captcha Detection and Handling

class CaptchaHandler {
    func detectCaptcha(in response: String) -> Bool {
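        // Substring markers. Broad ones such as "cloudflare" also appear on ordinary pages
        // served through that CDN, so treat a match as a hint rather than proof.
        let captchaIndicators = [
            "recaptcha",
            "captcha",
            "hcaptcha",
            "cloudflare",
            "challenge-form"
        ]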

        let lowercaseResponse = response.lowercased()
        return captchaIndicators.contains { lowercaseResponse.contains($0) }
    }

    func handleCaptchaResponse(response: AFDataResponse<String>) {
        switch response.result {
        case .success(let html):
            if detectCaptcha(in: html) {
                print("Captcha detected. Consider:")
                print("1. Using a captcha solving service")
                print("2. Implementing manual intervention")
                print("3. Using browser automation tools for complex cases")

                // Handle captcha - could integrate with services like 2captcha
                handleCaptchaChallenge(html: html)
            } else {
                // Process normal response
                processNormalResponse(html: html)
            }
        case .failure(let error):
            print("Request failed: \(error)")
        }
    }

    private func handleCaptchaChallenge(html: String) {
        // Implement captcha solving logic
        // This might involve:
        // 1. Extracting captcha site key
        // 2. Sending to captcha solving service
        // 3. Waiting for solution
        // 4. Submitting solution
    }

    private func processNormalResponse(html: String) {
        // Process the successful response
    }
}
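
Substring matching alone can flag ordinary pages, so one option is to combine the HTTP status code with markers that normally appear only when a captcha widget is embedded. The following is a sketch under stated assumptions: `BlockSignal` and `classifyResponse` are hypothetical names, the marker list (`g-recaptcha`, `h-captcha`, `cf-turnstile`, `challenge-form`) is illustrative rather than exhaustive, and the status codes treated as blocking are a heuristic.

```swift
import Foundation

/// Illustrative classification of a response; the cases are assumptions, not a standard.
enum BlockSignal: Equatable {
    case noSignal
    case rateLimited
    case blockedByChallenge
    case captchaOnPage
}

func classifyResponse(statusCode: Int, body: String) -> BlockSignal {
    // 429 Too Many Requests signals rate limiting regardless of the body
    if statusCode == 429 {
        return .rateLimited
    }

    // Class names and ids commonly used by embedded captcha widgets (assumed list)
    let widgetMarkers = ["g-recaptcha", "h-captcha", "cf-turnstile", "challenge-form"]
    let lowercased = body.lowercased()
    guard widgetMarkers.contains(where: { lowercased.contains($0) }) else {
        return .noSignal
    }

    // A widget on a 403 or 503 page suggests the request itself was blocked;
    // on other statuses it may simply be part of a normal form.
    return (statusCode == 403 || statusCode == 503) ? .blockedByChallenge : .captchaOnPage
}
```

With Alamofire, the status code is available as `response.response?.statusCode`. Note that `.validate()` turns non-2xx responses into failures, so a blocked response would have to be inspected in the failure branch or with validation relaxed.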

Complete Example: Robust Web Scraping with CSRF Protection

import Alamofire
import SwiftSoup

class RobustWebScraper {
    private let session: Session
    private var csrfToken: String?
    private let userAgentRotator = UserAgentRotator()
    private let captchaHandler = CaptchaHandler()

    init() {
        let configuration = URLSessionConfiguration.default
        configuration.httpCookieAcceptPolicy = .always
        configuration.httpShouldSetCookies = true
        configuration.timeoutIntervalForRequest = 30

        let interceptor = AntiDetectionInterceptor(userAgentRotator: userAgentRotator)
        self.session = Session(configuration: configuration, interceptor: interceptor)
    }

    func scrapeProtectedContent(url: String, completion: @escaping (Result<String, Error>) -> Void) {
        // Step 1: Get initial page and extract CSRF token
        fetchInitialPage(url: url) { [weak self] result in
            switch result {
            case .success(let token):
                self?.csrfToken = token
                // Step 2: Make authenticated request
                self?.makeAuthenticatedRequest(url: url, completion: completion)
            case .failure(let error):
                completion(.failure(error))
            }
        }
    }

    private func fetchInitialPage(url: String, completion: @escaping (Result<String?, Error>) -> Void) {
        session.request(url)
            .validate()
            .responseString { [weak self] response in
                switch response.result {
                case .success(let html):
                    // Check for captcha
                    if self?.captchaHandler.detectCaptcha(in: html) == true {
                        // Handle captcha scenario
                        completion(.failure(ScrapingError.captchaDetected))
                        return
                    }

                    // Extract CSRF token
                    let token = self?.extractCSRFToken(from: html)
                    completion(.success(token))

                case .failure(let error):
                    completion(.failure(error))
                }
            }
    }

    private func makeAuthenticatedRequest(url: String, completion: @escaping (Result<String, Error>) -> Void) {
        var headers: HTTPHeaders = [
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate",
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1"
        ]

        if let token = csrfToken {
            headers["X-CSRF-Token"] = token
        }

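        session.request(url, headers: headers)
            .validate()
            .responseString { response in
                // response.result is Result<String, AFError>; widen the error type for the caller
                completion(response.result.mapError { $0 as Error })
            }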
    }

    private func extractCSRFToken(from html: String) -> String? {
        do {
            let doc = try SwiftSoup.parse(html)

            // Try multiple common selectors
            let selectors = [
                "meta[name=csrf-token]",
                "meta[name=_token]",
                "input[name=_token]",
                "input[name=csrfmiddlewaretoken]"
            ]

            for selector in selectors {
                if let element = try doc.select(selector).first() {
                    let token = selector.contains("meta") 
                        ? try element.attr("content") 
                        : try element.attr("value")

                    if !token.isEmpty {
                        return token
                    }
                }
            }
        } catch {
            print("Error parsing HTML for CSRF token: \(error)")
        }

        return nil
    }
}

class AntiDetectionInterceptor: RequestInterceptor {
    private let userAgentRotator: UserAgentRotator

    init(userAgentRotator: UserAgentRotator) {
        self.userAgentRotator = userAgentRotator
    }

    func adapt(_ urlRequest: URLRequest, for session: Session, completion: @escaping (Result<URLRequest, Error>) -> Void) {
        var request = urlRequest

        // Rotate user agent
        request.setValue(userAgentRotator.randomUserAgent(), forHTTPHeaderField: "User-Agent")

        // Add realistic headers
        request.setValue("same-origin", forHTTPHeaderField: "Sec-Fetch-Site")
        request.setValue("navigate", forHTTPHeaderField: "Sec-Fetch-Mode")
        request.setValue("document", forHTTPHeaderField: "Sec-Fetch-Dest")

        completion(.success(request))
    }
}

enum ScrapingError: Error {
    case captchaDetected
    case csrfTokenNotFound
    case rateLimited
}

Best Practices and Ethical Considerations

When implementing CSRF token handling and anti-scraping countermeasures, consider these best practices:

Performance Optimization

Cache CSRF tokens: Store tokens for reuse across multiple requests within the same session
Reuse connections: Share one Session instance across requests so the underlying URLSession can keep connections alive
Use appropriate delays: Respect rate limits and implement exponential backoff
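
Exponential backoff, mentioned in the last item, can be sketched as a small pure function. `backoffDelay` and its parameters are hypothetical names chosen for illustration; the base delay, cap, and jitter range should be tuned to the target site's limits.

```swift
import Foundation

/// Hypothetical helper: delay before retry number `attempt` (0-based), doubling each time
/// and capped at `maxDelay`. `jitter` is a multiplier the caller can randomise.
func backoffDelay(attempt: Int,
                  baseDelay: TimeInterval = 1.0,
                  maxDelay: TimeInterval = 60.0,
                  jitter: Double = 1.0) -> TimeInterval {
    // Clamp the exponent so negative or very large attempt counts stay well-behaved
    let exponent = Double(max(0, min(attempt, 30)))
    let uncapped = baseDelay * pow(2.0, exponent)
    return min(uncapped, maxDelay) * jitter
}
```

Inside a `RequestInterceptor`'s retry method, this could replace a fixed random delay, for example `completion(.retryWithDelay(backoffDelay(attempt: request.retryCount, jitter: Double.random(in: 0.5...1.0))))`.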

Security Considerations

Respect robots.txt: Always check and follow website guidelines
Implement proper error handling: Gracefully handle failures and edge cases
Use HTTPS: Ensure all communications are encrypted
Handle sensitive data properly: Never log or expose authentication tokens
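
To keep tokens out of logs, headers can be redacted before they are printed. This is a minimal sketch; `redactedHeaders` is a hypothetical helper and the default set of sensitive header names is an assumption that should be extended for your own requests.

```swift
import Foundation

/// Hypothetical helper: returns a copy of the headers that is safe to log.
/// Header names are compared case-insensitively.
func redactedHeaders(_ headers: [String: String],
                     sensitive: Set<String> = ["x-csrf-token", "cookie", "authorization"]) -> [String: String] {
    var safe: [String: String] = [:]
    for (name, value) in headers {
        safe[name] = sensitive.contains(name.lowercased()) ? "<redacted>" : value
    }
    return safe
}
```

For example, `print(redactedHeaders(urlRequest.allHTTPHeaderFields ?? [:]))` logs which headers were sent without exposing the token or cookie values.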

Monitoring and Maintenance

class ScrapingMonitor {
    private var successCount: Int = 0
    private var errorCount: Int = 0
    private var captchaCount: Int = 0

    func logSuccess() {
        successCount += 1
        printStats()
    }

    func logError(_ error: Error) {
        errorCount += 1
        print("Error: \(error)")
        printStats()
    }

    func logCaptcha() {
        captchaCount += 1
        printStats()
    }

    private func printStats() {
        let total = successCount + errorCount + captchaCount
        print("Stats - Success: \(successCount), Errors: \(errorCount), Captchas: \(captchaCount), Total: \(total)")

        if total > 0 {
            let successRate = Double(successCount) / Double(total) * 100
            print("Success Rate: \(String(format: "%.1f", successRate))%")
        }
    }
}

Advanced JavaScript Detection Techniques

Some websites use sophisticated JavaScript-based detection mechanisms. Here's how to handle them:

Handling JavaScript Fingerprinting

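import WebKit

class JavaScriptHandler {
    // Call on the main thread: WKWebView must be created and used there
    func executeJavaScriptWithDelay(url: String, completion: @escaping (String?) -> Void) {
        guard let pageURL = URL(string: url) else {
            completion(nil)
            return
        }
        let webView = WKWebView()
        webView.load(URLRequest(url: pageURL))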

        // Wait for JavaScript execution and dynamic content loading
        DispatchQueue.main.asyncAfter(deadline: .now() + 5.0) {
            webView.evaluateJavaScript("document.documentElement.outerHTML") { result, error in
                if let html = result as? String {
                    completion(html)
                } else {
                    completion(nil)
                }
            }
        }
    }
}

Handling Single Page Applications (SPAs)

For complex SPAs that heavily rely on JavaScript, consider integrating with browser automation tools that handle dynamic content loading more effectively than pure HTTP requests.

Conclusion

Handling CSRF tokens and anti-scraping measures with Alamofire requires a multi-faceted approach combining proper session management, token extraction, request throttling, and robust error handling. The key is to mimic legitimate browser behavior while respecting website terms of service and implementing proper fallback mechanisms.

For more complex scenarios involving JavaScript-heavy sites, consider integrating with browser automation solutions that can handle dynamic content loading and complex authentication flows more effectively.

Remember to always test your implementations thoroughly, monitor success rates, and be prepared to adapt to changing anti-scraping measures as websites evolve their protection mechanisms.
