How to Handle Pagination When Scraping Multiple Pages with Alamofire
Pagination is a common challenge when scraping data from multiple pages using Alamofire in iOS applications. Whether you're dealing with REST APIs that return paginated results or scraping HTML pages with numbered pagination, you need efficient strategies to traverse all pages while maintaining good performance and avoiding rate limits.
Understanding Pagination Patterns
Before implementing pagination handling, it's important to understand the different pagination patterns you might encounter:
URL-Based Pagination
This is the most common pattern where page numbers or offsets are included in the URL:
https://api.example.com/data?page=1
https://api.example.com/data?page=2&limit=50
https://api.example.com/data?offset=0&limit=20
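In practice you rarely build these query strings by hand. Alamofire can append them for you via the parameters argument; a small sketch (the endpoint URL is a placeholder):
// URLEncoding.default turns the dictionary into ?page=2&limit=50 for GET requests
let parameters: [String: Any] = ["page": 2, "limit": 50]
AF.request("https://api.example.com/data", parameters: parameters, encoding: URLEncoding.default)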
Token-Based Pagination
APIs often use pagination tokens or cursors:
{
  "data": [...],
  "next_page_token": "eyJwYWdlIjo....",
  "has_more": true
}
Link Header Pagination
Some APIs provide pagination links in HTTP headers:
Link: <https://api.example.com/data?page=2>; rel="next",
      <https://api.example.com/data?page=5>; rel="last"
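Alamofire does not parse Link headers for you, so you extract the rel="next" URL yourself. A minimal sketch, assuming the header format shown above (value(forHTTPHeaderField:) requires iOS 13+):
func nextPageURL(from response: HTTPURLResponse?) -> String? {
    guard let linkHeader = response?.value(forHTTPHeaderField: "Link") else { return nil }

    // Each comma-separated entry looks like: <https://...?page=2>; rel="next"
    for entry in linkHeader.components(separatedBy: ",") {
        let segments = entry.components(separatedBy: ";")
        guard segments.count >= 2, segments[1].contains("rel=\"next\"") else { continue }

        // Strip surrounding whitespace and the angle brackets around the URL
        return segments[0]
            .trimmingCharacters(in: .whitespaces)
            .trimmingCharacters(in: CharacterSet(charactersIn: "<>"))
    }
    return nil
}
Inside an Alamofire completion handler you would call this as nextPageURL(from: response.response) and stop paginating when it returns nil.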
Sequential Pagination Implementation
The most straightforward approach is to request pages one at a time, letting each response determine whether to fetch the next:
Basic Sequential Pagination
import Alamofire
import Foundation

class PaginationScraper {
    private let baseURL = "https://api.example.com/data"
    private var allData: [DataModel] = []

    func scrapeAllPages(completion: @escaping ([DataModel]) -> Void) {
        allData.removeAll() // Reset state so repeated calls don't duplicate results
        scrapePage(pageNumber: 1, completion: completion)
    }

    private func scrapePage(pageNumber: Int, completion: @escaping ([DataModel]) -> Void) {
        let url = "\(baseURL)?page=\(pageNumber)&limit=50"

        AF.request(url, method: .get, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { response in
                switch response.result {
                case .success(let paginatedData):
                    self.allData.append(contentsOf: paginatedData.data)

                    // Check if there are more pages
                    if paginatedData.hasMore && pageNumber < paginatedData.totalPages {
                        // Add delay to respect rate limits
                        DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
                            self.scrapePage(pageNumber: pageNumber + 1, completion: completion)
                        }
                    } else {
                        // All pages scraped
                        completion(self.allData)
                    }
                case .failure(let error):
                    print("Error scraping page \(pageNumber): \(error)")
                    // Implement retry logic or complete with current data
                    completion(self.allData)
                }
            }
    }

    private func getHeaders() -> HTTPHeaders {
        return [
            "User-Agent": "iOS-App/1.0",
            "Accept": "application/json"
        ]
    }
}

// Data models
// If the API returns snake_case keys (has_more, total_pages), decode with a
// JSONDecoder whose keyDecodingStrategy is .convertFromSnakeCase, or add CodingKeys.
struct PaginatedResponse: Codable {
    let data: [DataModel]
    let hasMore: Bool
    let totalPages: Int
    let currentPage: Int
}

struct DataModel: Codable {
    let id: String
    let title: String
    let content: String
}
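Calling it is a one-liner; a usage sketch:
let scraper = PaginationScraper()
scraper.scrapeAllPages { items in
    // Called once, on the main queue, after the last page arrives
    print("Fetched \(items.count) items across all pages")
}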
Token-Based Sequential Pagination
For APIs using pagination tokens:
class TokenPaginationScraper {
    private let baseURL = "https://api.example.com/data"
    private var allData: [DataModel] = []

    func scrapeAllPages(completion: @escaping ([DataModel]) -> Void) {
        allData.removeAll()
        scrapePage(token: nil, completion: completion)
    }

    private func scrapePage(token: String?, completion: @escaping ([DataModel]) -> Void) {
        var parameters: [String: Any] = ["limit": 50]
        if let token = token {
            parameters["page_token"] = token
        }

        AF.request(baseURL, method: .get, parameters: parameters, headers: getHeaders())
            .validate()
            .responseDecodable(of: TokenPaginatedResponse.self) { response in
                switch response.result {
                case .success(let paginatedData):
                    self.allData.append(contentsOf: paginatedData.data)

                    if let nextToken = paginatedData.nextPageToken {
                        // Add delay to respect rate limits
                        DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
                            self.scrapePage(token: nextToken, completion: completion)
                        }
                    } else {
                        completion(self.allData)
                    }
                case .failure(let error):
                    print("Error scraping page: \(error)")
                    completion(self.allData)
                }
            }
    }

    private func getHeaders() -> HTTPHeaders {
        return ["User-Agent": "iOS-App/1.0", "Accept": "application/json"]
    }
}

struct TokenPaginatedResponse: Codable {
    let data: [DataModel]
    let nextPageToken: String?

    // Map the snake_case key from the JSON example above
    enum CodingKeys: String, CodingKey {
        case data
        case nextPageToken = "next_page_token"
    }
}
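If your deployment target is iOS 13+ and you use Alamofire 5.5 or newer, the same token loop reads more naturally with async/await; a sketch under those assumptions:
func scrapeAllPagesAsync() async throws -> [DataModel] {
    var allData: [DataModel] = []
    var token: String? = nil

    repeat {
        var parameters: [String: Any] = ["limit": 50]
        if let token = token { parameters["page_token"] = token }

        // serializingDecodable is Alamofire's async/await response API (5.5+)
        let page = try await AF.request("https://api.example.com/data", parameters: parameters)
            .validate()
            .serializingDecodable(TokenPaginatedResponse.self)
            .value

        allData.append(contentsOf: page.data)
        token = page.nextPageToken

        if token != nil {
            // Pause between pages to respect rate limits (0.5s)
            try await Task.sleep(nanoseconds: 500_000_000)
        }
    } while token != nil

    return allData
}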
Concurrent Pagination Implementation
For better performance, you can scrape multiple pages concurrently, but this requires knowing the total number of pages upfront:
class ConcurrentPaginationScraper {
    private let baseURL = "https://api.example.com/data"
    private let maxConcurrentRequests = 5

    func scrapeAllPagesConcurrently(completion: @escaping ([DataModel]) -> Void) {
        // First, get the total number of pages
        getTotalPages { totalPages in
            guard totalPages > 0 else {
                completion([])
                return
            }
            self.scrapePagesConcurrently(totalPages: totalPages, completion: completion)
        }
    }

    private func getTotalPages(completion: @escaping (Int) -> Void) {
        let url = "\(baseURL)?page=1&limit=1"

        AF.request(url, method: .get, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { response in
                switch response.result {
                case .success(let data):
                    completion(data.totalPages)
                case .failure:
                    completion(0)
                }
            }
    }

    private func scrapePagesConcurrently(totalPages: Int, completion: @escaping ([DataModel]) -> Void) {
        let dispatchGroup = DispatchGroup()
        let semaphore = DispatchSemaphore(value: maxConcurrentRequests)
        var allData: [Int: [DataModel]] = [:]
        let dataQueue = DispatchQueue(label: "data.queue", attributes: .concurrent)

        // Run the loop off the main thread: semaphore.wait() blocks the caller
        // whenever maxConcurrentRequests pages are already in flight.
        DispatchQueue.global().async {
            for pageNumber in 1...totalPages {
                dispatchGroup.enter()
                semaphore.wait()

                self.scrapeSinglePage(pageNumber: pageNumber) { data in
                    // Write behind a barrier so concurrent pages never race, and
                    // only leave the group after the write has landed.
                    dataQueue.async(flags: .barrier) {
                        allData[pageNumber] = data
                        semaphore.signal()
                        dispatchGroup.leave()
                    }
                }
            }

            dispatchGroup.notify(queue: .main) {
                // Combine data in correct page order
                let sortedData = (1...totalPages).compactMap { allData[$0] }.flatMap { $0 }
                completion(sortedData)
            }
        }
    }

    private func scrapeSinglePage(pageNumber: Int, completion: @escaping ([DataModel]) -> Void) {
        let url = "\(baseURL)?page=\(pageNumber)&limit=50"

        AF.request(url, method: .get, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { response in
                switch response.result {
                case .success(let paginatedData):
                    completion(paginatedData.data)
                case .failure(let error):
                    print("Error scraping page \(pageNumber): \(error)")
                    completion([])
                }
            }
    }

    private func getHeaders() -> HTTPHeaders {
        return ["User-Agent": "iOS-App/1.0", "Accept": "application/json"]
    }
}
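As an alternative on iOS 13+ with Alamofire 5.5+, withThrowingTaskGroup gives you the same fan-out and in-order reassembly with less bookkeeping. A sketch reusing the PaginatedResponse model (note it does not cap in-flight requests the way the semaphore version does):
func scrapePagesConcurrently(totalPages: Int) async throws -> [DataModel] {
    try await withThrowingTaskGroup(of: (Int, [DataModel]).self) { group in
        for page in 1...totalPages {
            group.addTask {
                // Each child task fetches and decodes one page
                let response = try await AF.request("https://api.example.com/data?page=\(page)&limit=50")
                    .validate()
                    .serializingDecodable(PaginatedResponse.self)
                    .value
                return (page, response.data)
            }
        }

        // Results arrive out of order; index by page number, then reassemble
        var pages: [Int: [DataModel]] = [:]
        for try await (page, data) in group {
            pages[page] = data
        }
        return (1...totalPages).compactMap { pages[$0] }.flatMap { $0 }
    }
}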
Advanced Pagination Strategies
Implementing Retry Logic
Add resilience to your pagination scraper with retry logic:
// Note: this extension must live in the same file as PaginationScraper so it
// can access its private members (baseURL, allData, getHeaders()).
extension PaginationScraper {
    private func scrapePageWithRetry(pageNumber: Int, maxRetries: Int = 3, completion: @escaping ([DataModel]) -> Void) {
        scrapePageWithRetryHelper(pageNumber: pageNumber, currentAttempt: 1, maxRetries: maxRetries, completion: completion)
    }

    private func scrapePageWithRetryHelper(pageNumber: Int, currentAttempt: Int, maxRetries: Int, completion: @escaping ([DataModel]) -> Void) {
        let url = "\(baseURL)?page=\(pageNumber)&limit=50"

        AF.request(url, method: .get, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { response in
                switch response.result {
                case .success(let paginatedData):
                    self.allData.append(contentsOf: paginatedData.data)

                    if paginatedData.hasMore && pageNumber < paginatedData.totalPages {
                        let delay = self.calculateDelay(attempt: currentAttempt)
                        DispatchQueue.main.asyncAfter(deadline: .now() + delay) {
                            // A fresh page starts over at attempt 1
                            self.scrapePageWithRetry(pageNumber: pageNumber + 1, completion: completion)
                        }
                    } else {
                        completion(self.allData)
                    }
                case .failure(let error):
                    if currentAttempt < maxRetries {
                        let backoffDelay = self.calculateBackoffDelay(attempt: currentAttempt)
                        print("Retrying page \(pageNumber) in \(backoffDelay) seconds (attempt \(currentAttempt + 1)/\(maxRetries))")
                        DispatchQueue.main.asyncAfter(deadline: .now() + backoffDelay) {
                            self.scrapePageWithRetryHelper(
                                pageNumber: pageNumber,
                                currentAttempt: currentAttempt + 1,
                                maxRetries: maxRetries,
                                completion: completion
                            )
                        }
                    } else {
                        print("Failed to scrape page \(pageNumber) after \(maxRetries) attempts: \(error)")
                        completion(self.allData)
                    }
                }
            }
    }

    private func calculateDelay(attempt: Int) -> TimeInterval {
        // Fixed rate-limiting delay between successful pages
        return 0.5
    }

    private func calculateBackoffDelay(attempt: Int) -> TimeInterval {
        // Exponential backoff: 1s, 2s, 4s, 8s...
        return pow(2.0, Double(attempt - 1))
    }
}
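Before hand-rolling all of this, note that Alamofire ships a built-in RetryPolicy, a RequestInterceptor that applies exponential backoff for you on connection failures and retryable status codes; a minimal sketch:
// RetryPolicy retries idempotent requests on connection errors and on
// retryable status codes (408, 500, 502, 503, 504 by default)
let interceptor = RetryPolicy(retryLimit: 3)

AF.request("https://api.example.com/data?page=1&limit=50", interceptor: interceptor)
    .validate()
    .responseDecodable(of: PaginatedResponse.self) { response in
        // The result here has already been through the retry cycle
        debugPrint(response.result)
    }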
Progress Tracking
Implement progress tracking for long-running pagination tasks:
protocol PaginationProgressDelegate: AnyObject {
    func paginationProgress(_ progress: Float, currentPage: Int, totalPages: Int)
    func paginationCompleted(with data: [DataModel])
    func paginationFailed(with error: Error)
}

class ProgressTrackingPaginationScraper {
    weak var delegate: PaginationProgressDelegate?
    private let baseURL = "https://api.example.com/data"
    private var totalPages: Int = 0
    private var completedPages: Int = 0
    private var allData: [DataModel] = []

    func scrapeWithProgress() {
        getTotalPages { [weak self] totalPages in
            // Guard against zero pages so the progress division below is safe
            guard let self = self, totalPages > 0 else { return }
            self.totalPages = totalPages
            self.scrapePageWithProgress(pageNumber: 1)
        }
    }

    private func scrapePageWithProgress(pageNumber: Int) {
        let url = "\(baseURL)?page=\(pageNumber)&limit=50"

        AF.request(url, method: .get, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { [weak self] response in
                guard let self = self else { return }

                switch response.result {
                case .success(let paginatedData):
                    self.allData.append(contentsOf: paginatedData.data)
                    self.completedPages += 1

                    let progress = Float(self.completedPages) / Float(self.totalPages)
                    self.delegate?.paginationProgress(progress, currentPage: pageNumber, totalPages: self.totalPages)

                    if paginatedData.hasMore && pageNumber < paginatedData.totalPages {
                        DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
                            self.scrapePageWithProgress(pageNumber: pageNumber + 1)
                        }
                    } else {
                        self.delegate?.paginationCompleted(with: self.allData)
                    }
                case .failure(let error):
                    self.delegate?.paginationFailed(with: error)
                }
            }
    }

    // Same probe as in ConcurrentPaginationScraper
    private func getTotalPages(completion: @escaping (Int) -> Void) {
        let url = "\(baseURL)?page=1&limit=1"
        AF.request(url, headers: getHeaders())
            .validate()
            .responseDecodable(of: PaginatedResponse.self) { response in
                completion((try? response.result.get().totalPages) ?? 0)
            }
    }

    private func getHeaders() -> HTTPHeaders {
        return ["User-Agent": "iOS-App/1.0", "Accept": "application/json"]
    }
}
Best Practices for Pagination Scraping
1. Respect Rate Limits
Always implement delays between requests to avoid overwhelming the server:
private let rateLimiter = RateLimiter(requestsPerSecond: 2)

// Usage in your pagination method
rateLimiter.executeWithDelay {
    // Your Alamofire request here
}

class RateLimiter {
    private let semaphore: DispatchSemaphore
    private let delayBetweenRequests: TimeInterval

    init(requestsPerSecond: Double) {
        // A single slot forces requests through one at a time
        self.semaphore = DispatchSemaphore(value: 1)
        self.delayBetweenRequests = 1.0 / requestsPerSecond
    }

    func executeWithDelay(block: @escaping () -> Void) {
        DispatchQueue.global().async {
            // Block until the previous request's cooldown has elapsed
            self.semaphore.wait()
            DispatchQueue.main.async {
                block()
            }
            // Release the slot only after the minimum spacing has passed
            DispatchQueue.global().asyncAfter(deadline: .now() + self.delayBetweenRequests) {
                self.semaphore.signal()
            }
        }
    }
}
2. Handle Memory Management
For large datasets, consider processing data in chunks rather than keeping everything in memory:
class StreamingPaginationScraper {
    typealias DataProcessor = ([DataModel]) -> Void

    func scrapeAndProcess(processor: @escaping DataProcessor) {
        scrapePageAndProcess(pageNumber: 1, processor: processor)
    }

    private func scrapePageAndProcess(pageNumber: Int, processor: @escaping DataProcessor) {
        // Scrape the page and hand the chunk straight to the processor
        // instead of accumulating everything in memory
        scrapePage(pageNumber: pageNumber) { data in
            processor(data)

            // Continue to the next page
            if self.hasMorePages(pageNumber) {
                DispatchQueue.main.asyncAfter(deadline: .now() + 0.5) {
                    self.scrapePageAndProcess(pageNumber: pageNumber + 1, processor: processor)
                }
            }
        }
    }

    // scrapePage(pageNumber:completion:) works as in the sequential example, and
    // hasMorePages(_:) tracks the hasMore/totalPages fields of the last response.
    private func scrapePage(pageNumber: Int, completion: @escaping ([DataModel]) -> Void) { /* ... */ }
    private func hasMorePages(_ pageNumber: Int) -> Bool { /* ... */ false }
}
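A usage sketch in which each chunk is persisted as it arrives, so memory use stays flat no matter how many pages there are (persist(_:) is a placeholder for your own storage layer):
let scraper = StreamingPaginationScraper()
scraper.scrapeAndProcess { chunk in
    persist(chunk) // Hand off immediately instead of accumulating
}

func persist(_ items: [DataModel]) {
    // Placeholder: write to Core Data, SQLite, or an append-only file
}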
3. Error Recovery
Implement comprehensive error handling for different failure scenarios:
enum PaginationError: LocalizedError {
    case networkError(Error)
    case parseError(Error)
    case rateLimitExceeded
    case serverError(Int)
    case invalidPageNumber

    // Conforming to LocalizedError (rather than shadowing localizedDescription)
    // ensures these messages surface when the value is handled as a plain Error
    var errorDescription: String? {
        switch self {
        case .networkError(let error):
            return "Network error: \(error.localizedDescription)"
        case .parseError(let error):
            return "Parse error: \(error.localizedDescription)"
        case .rateLimitExceeded:
            return "Rate limit exceeded. Please wait and try again."
        case .serverError(let code):
            return "Server error with status code: \(code)"
        case .invalidPageNumber:
            return "Invalid page number requested"
        }
    }
}
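To route Alamofire failures into this enum, inspect the AFError and the HTTP status code in your .failure branch; a minimal sketch:
func mapToPaginationError(_ error: AFError, response: HTTPURLResponse?) -> PaginationError {
    // 429 Too Many Requests signals rate limiting
    if response?.statusCode == 429 {
        return .rateLimitExceeded
    }
    // Any 5xx status is treated as a server-side failure
    if let code = response?.statusCode, (500..<600).contains(code) {
        return .serverError(code)
    }
    // Decoding failures become parse errors; everything else is a network error
    if error.isResponseSerializationError {
        return .parseError(error)
    }
    return .networkError(error)
}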
Integration with Other Tools
When building complex scraping workflows, you might want to combine Alamofire's pagination capabilities with other tools. For example, if you're also working with web automation tools like Puppeteer, you can learn how to navigate to different pages using Puppeteer for scenarios where you need to handle JavaScript-heavy pagination.
Additionally, for applications that need to handle concurrent operations across multiple sessions, understanding how to run multiple pages in parallel with Puppeteer can provide insights into efficient parallel processing strategies.
Conclusion
Handling pagination with Alamofire requires careful consideration of the pagination pattern, rate limiting, error handling, and memory management. Whether you choose sequential or concurrent approaches depends on your specific requirements, server limitations, and the total amount of data you need to scrape.
The key to successful pagination handling is implementing robust error recovery, respecting rate limits, and providing progress feedback to users. By following the patterns and best practices outlined in this guide, you can build reliable and efficient pagination scrapers for your iOS applications.
Remember to always test your pagination logic thoroughly with different scenarios, including network failures, empty pages, and varying response times to ensure your implementation is robust and user-friendly.