How can I implement retry logic with HTTParty for failed requests?
Implementing retry logic is crucial for building resilient web scraping applications that can handle temporary network failures, server errors, and rate limiting. HTTParty doesn't include built-in retry functionality, but you can add robust retry behavior with a small helper gem such as retries or with your own wrapper methods.
Basic Retry Implementation
Simple Retry with the retries Gem
The most straightforward approach is to use the retries gem, which provides a clean retry syntax:
require 'httparty'
require 'retries'

class ScrapingClient
  include HTTParty
  base_uri 'https://api.example.com'

  def self.fetch_with_retry(path, options = {})
    with_retries(max_tries: 3, base_sleep_seconds: 1, max_sleep_seconds: 5) do
      response = get(path, options)
      raise "HTTP Error: #{response.code}" unless response.success?
      response
    end
  end
end

# Usage
begin
  response = ScrapingClient.fetch_with_retry('/data')
  puts response.body
rescue => e
  puts "Failed after retries: #{e.message}"
end
Manual Retry Implementation
For more control, implement retry logic manually:
require 'httparty'

class HTTPRetryClient
  include HTTParty

  MAX_RETRIES = 3
  RETRY_DELAY = 1
  BACKOFF_MULTIPLIER = 2

  def self.get_with_retry(url, options = {})
    retries = 0

    begin
      response = get(url, options)

      # Check if we should retry based on status code
      if should_retry?(response.code, retries)
        raise "Retryable error: HTTP #{response.code}"
      end

      return response
    rescue => e
      retries += 1

      if retries <= MAX_RETRIES && retryable_error?(e)
        delay = calculate_delay(retries)
        puts "Attempt #{retries} failed: #{e.message}. Retrying in #{delay}s..."
        sleep(delay)
        retry
      else
        raise e
      end
    end
  end

  def self.should_retry?(status_code, current_retries)
    return false if current_retries >= MAX_RETRIES

    # Retry on server errors and rate limiting
    [429, 500, 502, 503, 504].include?(status_code)
  end

  def self.retryable_error?(error)
    # Define which errors are worth retrying
    error.is_a?(Timeout::Error) ||
      error.is_a?(Net::OpenTimeout) ||
      error.is_a?(Net::ReadTimeout) ||
      error.is_a?(Errno::ECONNREFUSED) ||
      error.is_a?(Errno::ECONNRESET) ||
      error.message.include?("Retryable error")
  end

  def self.calculate_delay(attempt)
    # Exponential backoff with jitter
    base_delay = RETRY_DELAY * (BACKOFF_MULTIPLIER ** (attempt - 1))
    jitter = rand(0.1..0.3) * base_delay
    [base_delay + jitter, 30].min # Cap at 30 seconds
  end

  # `private` does not apply to class methods, so hide the helpers explicitly
  private_class_method :should_retry?, :retryable_error?, :calculate_delay
end
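For reference, here is a minimal usage sketch for the client above; the endpoint, headers, and timeout are placeholders, not part of the original example:
# Usage (illustrative endpoint)
begin
  response = HTTPRetryClient.get_with_retry(
    'https://api.example.com/items',
    headers: { 'User-Agent' => 'MyBot/1.0' },
    timeout: 10
  )
  puts "Fetched #{response.code}: #{response.body.bytesize} bytes"
rescue => e
  puts "Giving up: #{e.message}"
end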
Advanced Retry Strategies
Exponential Backoff with Jitter
Exponential backoff prevents thundering herd problems when multiple clients retry simultaneously:
class AdvancedRetryClient
  include HTTParty

  def self.robust_request(method, url, options = {})
    max_retries = options.delete(:max_retries) || 3
    base_delay = options.delete(:base_delay) || 1
    max_delay = options.delete(:max_delay) || 30

    (0..max_retries).each do |attempt|
      begin
        response = send(method, url, options)

        # Success conditions
        return response if response.success?

        # Don't retry client errors (4xx except 429)
        unless retryable_status?(response.code)
          raise "Non-retryable error: HTTP #{response.code}"
        end

        # Last attempt failed
        if attempt == max_retries
          raise "Max retries exceeded: HTTP #{response.code}"
        end

        delay = exponential_backoff_with_jitter(attempt, base_delay, max_delay)
        puts "HTTP #{response.code} - Retrying in #{delay.round(2)}s (attempt #{attempt + 1}/#{max_retries + 1})"
        sleep(delay)
      rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET => e
        if attempt == max_retries
          raise "Network error after #{max_retries + 1} attempts: #{e.message}"
        end

        delay = exponential_backoff_with_jitter(attempt, base_delay, max_delay)
        puts "Network error - Retrying in #{delay.round(2)}s: #{e.message}"
        sleep(delay)
      end
    end
  end

  def self.retryable_status?(code)
    # Retry on rate limiting and server errors
    [429, 500, 502, 503, 504, 520, 521, 522, 523, 524].include?(code)
  end

  def self.exponential_backoff_with_jitter(attempt, base_delay, max_delay)
    # Calculate exponential backoff
    delay = base_delay * (2 ** attempt)

    # Add jitter (random variation) to prevent thundering herd
    jitter = delay * (0.1 + rand * 0.1) # 10-20% jitter

    # Cap the delay
    [delay + jitter, max_delay].min
  end

  # `private` does not apply to class methods, so hide the helpers explicitly
  private_class_method :retryable_status?, :exponential_backoff_with_jitter
end
# Usage with custom retry settings
begin
  response = AdvancedRetryClient.robust_request(
    :get,
    'https://api.example.com/data',
    headers: { 'User-Agent' => 'MyBot/1.0' },
    timeout: 10,
    max_retries: 5,
    base_delay: 2,
    max_delay: 60
  )
  puts "Success: #{response.code}"
rescue => e
  puts "Failed: #{e.message}"
end
Conditional Retry Logic
Implement sophisticated retry conditions based on response content:
class ConditionalRetryClient
  include HTTParty

  def self.smart_retry(url, options = {})
    max_retries = options.delete(:max_retries) || 3
    retry_conditions = options.delete(:retry_if) || default_retry_conditions

    (0..max_retries).each do |attempt|
      begin
        response = get(url, options)

        # Check custom retry conditions
        should_retry = retry_conditions.any? { |condition| condition.call(response, attempt) }

        unless should_retry
          return block_given? ? yield(response) : response
        end

        if attempt == max_retries
          raise "Max retries exceeded. Last response: #{response.code}"
        end

        delay = 2 ** attempt + rand(0..1)
        puts "Retry condition met - waiting #{delay}s before attempt #{attempt + 2}"
        sleep(delay)
      rescue => e
        if attempt == max_retries
          raise "Request failed after #{max_retries + 1} attempts: #{e.message}"
        end

        sleep(2 ** attempt)
      end
    end
  end

  def self.default_retry_conditions
    # Note: because the conditions are combined with `any?`, each one should
    # describe the response itself; the attempt cap is enforced by the loop above.
    [
      # Retry on server errors
      ->(response, _attempt) { response.code >= 500 },
      # Retry on rate limiting
      ->(response, _attempt) { response.code == 429 },
      # Retry if response contains error indicators
      ->(response, _attempt) {
        response.body&.include?('temporarily unavailable') ||
          response.body&.include?('try again later')
      }
    ]
  end

  # `private` does not apply to class methods, so hide the helper explicitly
  private_class_method :default_retry_conditions
end
# Usage with custom retry conditions
custom_conditions = [
  ->(response, attempt) { response.code == 503 && attempt < 2 },
  ->(response, attempt) { response.headers['retry-after'] && attempt < 1 }
]

response = ConditionalRetryClient.smart_retry(
  'https://api.example.com/data',
  timeout: 15,
  retry_if: custom_conditions
) do |resp|
  JSON.parse(resp.body)
end
Handling Rate Limiting
Respect Retry-After Headers
Many APIs send a Retry-After header when they rate-limit clients:
class RateLimitAwareClient
  include HTTParty

  def self.respectful_request(url, options = {})
    max_retries = 3

    (0..max_retries).each do |attempt|
      response = get(url, options)

      case response.code
      when 200..299
        return response
      when 429
        # Rate limited - check for Retry-After header
        # (some APIs put an epoch timestamp in X-RateLimit-Reset rather than a
        # number of seconds, so verify the semantics for your target API)
        retry_after = response.headers['retry-after']&.to_i || response.headers['x-ratelimit-reset']&.to_i

        if retry_after && retry_after <= 300 # Don't wait more than 5 minutes
          puts "Rate limited. Waiting #{retry_after} seconds..."
          sleep(retry_after + 1) # Add 1 second buffer
          next
        else
          # No Retry-After, or too long a wait: fall back to exponential backoff
          delay = [2 ** attempt, 60].min
          puts "Rate limited. Waiting #{delay} seconds..."
          sleep(delay)
        end
      when 500..599
        if attempt == max_retries
          raise "Server error after #{max_retries + 1} attempts: #{response.code}"
        end

        delay = 2 ** attempt
        puts "Server error #{response.code}. Retrying in #{delay}s..."
        sleep(delay)
      else
        raise "HTTP Error: #{response.code} - #{response.message}"
      end
    end

    raise "Max retries exceeded"
  end
end
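A short usage sketch for the rate-limit-aware client above; the endpoint, query, and headers are placeholders:
begin
  response = RateLimitAwareClient.respectful_request(
    'https://api.example.com/search',
    query: { q: 'ruby', page: 1 },
    headers: { 'User-Agent' => 'MyBot/1.0' }
  )
  puts "Success: #{response.code}"
rescue => e
  puts "Request failed: #{e.message}"
end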
Production-Ready Retry Wrapper
Complete Implementation with Logging
require 'httparty'
require 'logger'
require 'securerandom'

class ProductionHTTPClient
  include HTTParty

  def initialize(options = {})
    @logger = options[:logger] || Logger.new(STDOUT)
    @max_retries = options[:max_retries] || 3
    @base_delay = options[:base_delay] || 1
    @max_delay = options[:max_delay] || 30
    @timeout = options[:timeout] || 30
  end

  def request(method, url, options = {})
    request_id = SecureRandom.hex(8)
    start_time = Time.now

    @logger.info("#{request_id}: Starting #{method.upcase} #{url}")

    (0..@max_retries).each do |attempt|
      attempt_start = Time.now

      begin
        # Set default timeout if not specified
        options[:timeout] ||= @timeout

        response = self.class.send(method, url, options)
        duration = ((Time.now - attempt_start) * 1000).round(2)

        @logger.info("#{request_id}: Attempt #{attempt + 1} - #{response.code} (#{duration}ms)")

        if response.success?
          total_duration = ((Time.now - start_time) * 1000).round(2)
          @logger.info("#{request_id}: Success after #{attempt + 1} attempts (#{total_duration}ms total)")
          return response
        elsif retryable_status?(response.code) && attempt < @max_retries
          delay = calculate_delay(attempt)
          @logger.warn("#{request_id}: HTTP #{response.code} - Retrying in #{delay}s")
          sleep(delay)
        else
          raise "HTTP #{response.code}: #{response.message}"
        end
      rescue Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET, SocketError => e
        duration = ((Time.now - attempt_start) * 1000).round(2)
        @logger.error("#{request_id}: Network error on attempt #{attempt + 1} (#{duration}ms): #{e.class} - #{e.message}")

        if attempt < @max_retries
          delay = calculate_delay(attempt)
          @logger.info("#{request_id}: Retrying in #{delay}s...")
          sleep(delay)
        else
          total_duration = ((Time.now - start_time) * 1000).round(2)
          @logger.error("#{request_id}: Failed after #{@max_retries + 1} attempts (#{total_duration}ms total)")
          raise
        end
      end
    end
  end

  private

  def retryable_status?(code)
    [429, 500, 502, 503, 504, 520, 521, 522, 523, 524].include?(code)
  end

  def calculate_delay(attempt)
    delay = @base_delay * (2 ** attempt)
    jitter = delay * (0.1 + rand * 0.1)
    [delay + jitter, @max_delay].min
  end
end
# Usage
client = ProductionHTTPClient.new(
  max_retries: 5,
  base_delay: 2,
  timeout: 15,
  logger: Logger.new('scraping.log')
)

begin
  response = client.request(:get, 'https://api.example.com/data', {
    headers: { 'User-Agent' => 'MyBot/1.0' },
    query: { limit: 100 }
  })

  data = JSON.parse(response.body)
  puts "Retrieved #{data.size} records"
rescue => e
  puts "Request failed: #{e.message}"
end
Best Practices
1. Choose Appropriate Retry Conditions
- Retry on transient failures (5xx errors, timeouts, connection issues)
- Don't retry on client errors (4xx except 429)
- Respect rate limiting with appropriate delays
2. Implement Exponential Backoff
- Use exponential backoff to reduce load on struggling servers
- Add jitter to prevent synchronized retry storms
- Cap maximum delay to avoid indefinite waits
3. Log Retry Attempts
- Track retry attempts for monitoring and debugging
- Include request IDs for tracing
- Log timing information for performance analysis
4. Set Reasonable Limits
- Limit maximum retry attempts (typically 3-5)
- Set maximum delay caps (usually 30-60 seconds)
- Implement overall request timeouts (see the sketch below)
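A minimal sketch of an overall deadline around a retry loop, using Ruby's Timeout module; the 60-second budget and the call to HTTPRetryClient (defined earlier) are illustrative assumptions:
require 'timeout'

begin
  # Overall deadline: give up after 60 seconds, regardless of per-attempt retries
  response = Timeout.timeout(60) do
    HTTPRetryClient.get_with_retry('https://api.example.com/data')
  end
  puts "Fetched #{response.code}"
rescue Timeout::Error
  puts "Gave up: overall 60-second budget exhausted"
end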
5. Handle Different Error Types
- Network errors: Connection refused, timeouts
- HTTP errors: Server errors, rate limiting
- Application errors: Invalid responses, parsing failures (see the sketch below)
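Application-level failures such as unparsable bodies can be retried with the same pattern. A minimal sketch, assuming a hypothetical fetch_json_with_retry helper and a JSON endpoint:
require 'httparty'
require 'json'

# Retry when the body isn't valid JSON yet (e.g. an HTML error page was returned)
def fetch_json_with_retry(url, max_retries: 3)
  attempts = 0
  begin
    response = HTTParty.get(url, timeout: 10)
    JSON.parse(response.body)
  rescue JSON::ParserError, Net::OpenTimeout, Net::ReadTimeout => e
    attempts += 1
    raise if attempts > max_retries
    sleep(2 ** attempts) # simple exponential backoff between attempts
    retry
  end
end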
When building web scraping applications, robust retry logic is essential for handling the unpredictable nature of web services. For more complex scenarios involving browser automation and error handling, consider implementing similar retry patterns with your chosen tools.
The retry implementations shown above provide a solid foundation for building resilient HTTParty-based scraping applications that can gracefully handle temporary failures while respecting server resources and rate limits.