How can I implement retry logic with HTTParty for failed requests?

Implementing retry logic is crucial for building resilient web scraping applications that can handle temporary network failures, server errors, and rate limiting. HTTParty doesn't include built-in retry functionality, but you can implement robust retry mechanisms using Ruby's standard libraries and custom wrapper methods.

Basic Retry Implementation

Simple Retry with the retries Gem

The most straightforward approach is to use the retries gem, which provides a clean retry syntax:

require 'httparty'
require 'retries'

class ScrapingClient
  include HTTParty
  base_uri 'https://api.example.com'

  def self.fetch_with_retry(path, options = {})
    # with_retries rescues StandardError by default; pass rescue: [...]
    # to limit which exception classes trigger a retry
    with_retries(max_tries: 3, base_sleep_seconds: 1, max_sleep_seconds: 5) do
      response = get(path, options)
      raise "HTTP Error: #{response.code}" unless response.success?
      response
    end
  end
end

# Usage
begin
  response = ScrapingClient.fetch_with_retry('/data')
  puts response.body
rescue => e
  puts "Failed after retries: #{e.message}"
end

Manual Retry Implementation

For more control, implement retry logic manually:

require 'httparty'

class HTTPRetryClient
  include HTTParty

  MAX_RETRIES = 3
  RETRY_DELAY = 1
  BACKOFF_MULTIPLIER = 2

  def self.get_with_retry(url, options = {})
    retries = 0

    begin
      response = get(url, options)

      # Raise (and retry below) on retryable status codes; once retries are
      # exhausted, the last failed response is returned as-is
      if should_retry?(response.code, retries)
        raise "Retryable error: HTTP #{response.code}"
      end

      return response

    rescue => e
      retries += 1

      if retries <= MAX_RETRIES && retryable_error?(e)
        delay = calculate_delay(retries)
        puts "Attempt #{retries} failed: #{e.message}. Retrying in #{delay}s..."
        sleep(delay)
        retry
      else
        raise e
      end
    end
  end

  # Helper methods (made private via private_class_method below)

  def self.should_retry?(status_code, current_retries)
    return false if current_retries >= MAX_RETRIES

    # Retry on server errors and rate limiting
    [429, 500, 502, 503, 504].include?(status_code)
  end

  def self.retryable_error?(error)
    # Define which errors are worth retrying
    error.is_a?(Net::OpenTimeout) ||
    error.is_a?(Net::ReadTimeout) ||
    error.is_a?(Errno::ECONNREFUSED) ||
    error.is_a?(Errno::ECONNRESET) ||
    error.message.include?("Retryable error")
  end

  def self.calculate_delay(attempt)
    # Exponential backoff with jitter
    base_delay = RETRY_DELAY * (BACKOFF_MULTIPLIER ** (attempt - 1))
    jitter = rand(0.1..0.3) * base_delay
    [base_delay + jitter, 30].min # Cap at 30 seconds
  end
  private_class_method :should_retry?, :retryable_error?, :calculate_delay
end
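
Usage mirrors the first example; a quick sketch (the URL is a placeholder):

begin
  response = HTTPRetryClient.get_with_retry('https://api.example.com/data')
  puts "Fetched #{response.body.bytesize} bytes (HTTP #{response.code})"
rescue => e
  puts "Giving up: #{e.message}"
end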

Advanced Retry Strategies

Exponential Backoff with Jitter

Exponential backoff reduces load on a struggling server, and adding jitter prevents the thundering herd problem of many clients retrying in lockstep:

class AdvancedRetryClient
  include HTTParty

  def self.robust_request(method, url, options = {})
    max_retries = options.delete(:max_retries) || 3
    base_delay = options.delete(:base_delay) || 1
    max_delay = options.delete(:max_delay) || 30

    (0..max_retries).each do |attempt|
      begin
        response = send(method, url, options)

        # Success conditions
        return response if response.success?

        # Don't retry client errors (4xx except 429)
        unless retryable_status?(response.code)
          raise "Non-retryable error: HTTP #{response.code}"
        end

        # Last attempt failed
        if attempt == max_retries
          raise "Max retries exceeded: HTTP #{response.code}"
        end

        delay = exponential_backoff_with_jitter(attempt, base_delay, max_delay)
        puts "HTTP #{response.code} - Retrying in #{delay.round(2)}s (attempt #{attempt + 1}/#{max_retries + 1})"
        sleep(delay)

      rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::ECONNRESET => e
        if attempt == max_retries
          raise "Network error after #{max_retries + 1} attempts: #{e.message}"
        end

        delay = exponential_backoff_with_jitter(attempt, base_delay, max_delay)
        puts "Network error - Retrying in #{delay.round(2)}s: #{e.message}"
        sleep(delay)
      end
    end
  end

  # Helper methods (made private via private_class_method below)

  def self.retryable_status?(code)
    # Retry on rate limiting and server errors
    [429, 500, 502, 503, 504, 520, 521, 522, 523, 524].include?(code)
  end

  def self.exponential_backoff_with_jitter(attempt, base_delay, max_delay)
    # Calculate exponential backoff
    delay = base_delay * (2 ** attempt)

    # Add jitter (random variation) to prevent thundering herd
    jitter = delay * (0.1 + rand * 0.1) # 10-20% jitter

    # Cap the delay
    [delay + jitter, max_delay].min
  end
  private_class_method :retryable_status?, :exponential_backoff_with_jitter
end

# Usage with custom retry settings
begin
  response = AdvancedRetryClient.robust_request(
    :get, 
    'https://api.example.com/data',
    headers: { 'User-Agent' => 'MyBot/1.0' },
    timeout: 10,
    max_retries: 5,
    base_delay: 2,
    max_delay: 60
  )

  puts "Success: #{response.code}"
rescue => e
  puts "Failed: #{e.message}"
end

Conditional Retry Logic

Implement sophisticated retry conditions based on response content:

class ConditionalRetryClient
  include HTTParty

  def self.smart_retry(url, options = {}, &block)
    max_retries = options.delete(:max_retries) || 3
    retry_conditions = options.delete(:retry_if) || default_retry_conditions

    (0..max_retries).each do |attempt|
      begin
        response = get(url, options)

        # Check custom retry conditions
        should_retry = retry_conditions.any? { |condition| condition.call(response, attempt) }

        unless should_retry
          return block_given? ? yield(response) : response
        end

        if attempt == max_retries
          raise "Max retries exceeded. Last response: #{response.code}"
        end

        delay = 2 ** attempt + rand(0..1)
        puts "Retry condition met - waiting #{delay}s before attempt #{attempt + 2}"
        sleep(delay)

      rescue => e
        if attempt == max_retries
          raise "Request failed after #{max_retries + 1} attempts: #{e.message}"
        end
        sleep(2 ** attempt)
      end
    end
  end

  # Helper methods (made private via private_class_method below)

  def self.default_retry_conditions
    [
      # Retry on server errors
      ->(response, attempt) { response.code >= 500 },

      # Retry on rate limiting
      ->(response, attempt) { response.code == 429 },

      # Retry if response contains error indicators
      ->(response, attempt) {
        response.body&.include?('temporarily unavailable') ||
        response.body&.include?('try again later')
      }
    ]
  end
  private_class_method :default_retry_conditions
end

# Usage with custom retry conditions
custom_conditions = [
  ->(response, attempt) { response.code == 503 && attempt < 2 },
  ->(response, attempt) { response.headers['retry-after'] && attempt < 1 }
]

response = ConditionalRetryClient.smart_retry(
  'https://api.example.com/data',
  timeout: 15,
  retry_if: custom_conditions
) do |resp|
  JSON.parse(resp.body)
end

Handling Rate Limiting

Respect Retry-After Headers

Many APIs include Retry-After headers when rate limiting:

class RateLimitAwareClient
  include HTTParty

  def self.respectful_request(url, options = {})
    max_retries = 3

    (0..max_retries).each do |attempt|
      response = get(url, options)

      case response.code
      when 200..299
        return response

      when 429
        # Rate limited - check the Retry-After header
        # (Retry-After may also be an HTTP date, and X-RateLimit-Reset is often
        # an absolute epoch timestamp rather than a delta; adjust for your API)
        retry_after = response.headers['retry-after']&.to_i || response.headers['x-ratelimit-reset']&.to_i

        if retry_after&.positive? && retry_after <= 300 # Don't wait more than 5 minutes
          puts "Rate limited. Waiting #{retry_after} seconds..."
          sleep(retry_after + 1) # Add 1 second buffer
          next
        else
          # No retry-after or too long, use exponential backoff
          delay = [2 ** attempt, 60].min
          puts "Rate limited. Waiting #{delay} seconds..."
          sleep(delay)
        end

      when 500..599
        if attempt == max_retries
          raise "Server error after #{max_retries + 1} attempts: #{response.code}"
        end

        delay = 2 ** attempt
        puts "Server error #{response.code}. Retrying in #{delay}s..."
        sleep(delay)

      else
        raise "HTTP Error: #{response.code} - #{response.message}"
      end
    end

    raise "Max retries exceeded"
  end
end
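
A usage sketch (the endpoint and header values are placeholders):

response = RateLimitAwareClient.respectful_request(
  'https://api.example.com/data',
  headers: { 'User-Agent' => 'MyBot/1.0' }
)
puts "Done: HTTP #{response.code}"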

Production-Ready Retry Wrapper

Complete Implementation with Logging

require 'httparty'
require 'logger'
require 'securerandom'

class ProductionHTTPClient
  include HTTParty

  def initialize(options = {})
    @logger = options[:logger] || Logger.new(STDOUT)
    @max_retries = options[:max_retries] || 3
    @base_delay = options[:base_delay] || 1
    @max_delay = options[:max_delay] || 30
    @timeout = options[:timeout] || 30
  end

  def request(method, url, options = {})
    request_id = SecureRandom.hex(8)
    start_time = Time.now

    @logger.info("#{request_id}: Starting #{method.upcase} #{url}")

    (0..@max_retries).each do |attempt|
      attempt_start = Time.now

      begin
        # Set default timeout if not specified
        options[:timeout] ||= @timeout

        response = self.class.send(method, url, options)
        duration = ((Time.now - attempt_start) * 1000).round(2)

        @logger.info("#{request_id}: Attempt #{attempt + 1} - #{response.code} (#{duration}ms)")

        if response.success?
          total_duration = ((Time.now - start_time) * 1000).round(2)
          @logger.info("#{request_id}: Success after #{attempt + 1} attempts (#{total_duration}ms total)")
          return response
        elsif retryable_status?(response.code) && attempt < @max_retries
          delay = calculate_delay(attempt)
          @logger.warn("#{request_id}: HTTP #{response.code} - Retrying in #{delay}s")
          sleep(delay)
        else
          raise "HTTP #{response.code}: #{response.message}"
        end

      rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED, Errno::ECONNRESET, SocketError => e
        duration = ((Time.now - attempt_start) * 1000).round(2)
        @logger.error("#{request_id}: Network error on attempt #{attempt + 1} (#{duration}ms): #{e.class} - #{e.message}")

        if attempt < @max_retries
          delay = calculate_delay(attempt)
          @logger.info("#{request_id}: Retrying in #{delay}s...")
          sleep(delay)
        else
          total_duration = ((Time.now - start_time) * 1000).round(2)
          @logger.error("#{request_id}: Failed after #{@max_retries + 1} attempts (#{total_duration}ms total)")
          raise
        end
      end
    end
  end

  private

  def retryable_status?(code)
    [429, 500, 502, 503, 504, 520, 521, 522, 523, 524].include?(code)
  end

  def calculate_delay(attempt)
    delay = @base_delay * (2 ** attempt)
    jitter = delay * (0.1 + rand * 0.1)
    [delay + jitter, @max_delay].min
  end
end

# Usage
client = ProductionHTTPClient.new(
  max_retries: 5,
  base_delay: 2,
  timeout: 15,
  logger: Logger.new('scraping.log')
)

begin
  response = client.request(:get, 'https://api.example.com/data', {
    headers: { 'User-Agent' => 'MyBot/1.0' },
    query: { limit: 100 }
  })

  data = JSON.parse(response.body)
  puts "Retrieved #{data.size} records"
rescue => e
  puts "Request failed: #{e.message}"
end

Best Practices

1. Choose Appropriate Retry Conditions

  • Retry on transient failures (5xx errors, timeouts, connection issues)
  • Don't retry on client errors (4xx except 429)
  • Respect rate limiting with appropriate delays
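
These conditions can be collapsed into a small predicate; a minimal sketch (retryable_response? is an illustrative name, and the status list mirrors the examples above - tune it per API):

def retryable_response?(response)
  return true if response.code == 429              # rate limited
  return true if (500..599).cover?(response.code)  # transient server errors
  false                                            # other 4xx etc.: don't retry
end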

2. Implement Exponential Backoff

  • Use exponential backoff to reduce load on struggling servers
  • Add jitter to prevent synchronized retry storms
  • Cap maximum delay to avoid indefinite waits
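
A common variant is "full jitter", where the sleep is drawn uniformly between zero and the capped exponential delay; a minimal sketch (full_jitter_delay is an illustrative name):

def full_jitter_delay(attempt, base: 1, cap: 30)
  # Sleep anywhere in [0, min(cap, base * 2^attempt))
  rand * [cap, base * (2 ** attempt)].min
end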

3. Log Retry Attempts

  • Track retry attempts for monitoring and debugging
  • Include request IDs for tracing
  • Log timing information for performance analysis

4. Set Reasonable Limits

  • Limit maximum retry attempts (typically 3-5)
  • Set maximum delay caps (usually 30-60 seconds)
  • Implement overall request timeouts
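
One way to enforce an overall budget across all attempts and sleeps is Ruby's standard Timeout module (use with care, as it interrupts the thread mid-operation); a sketch reusing the HTTPRetryClient defined earlier:

require 'timeout'

# Abort the entire retry loop, including sleeps, after 45 seconds
Timeout.timeout(45) do
  HTTPRetryClient.get_with_retry('https://api.example.com/data')
end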

5. Handle Different Error Types

  • Network errors: Connection refused, timeouts
  • HTTP errors: Server errors, rate limiting
  • Application errors: Invalid responses, parsing failures
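
A sketch of rescuing the three categories separately (JSON parsing stands in for application-level validation):

require 'json'

begin
  response = HTTPRetryClient.get_with_retry('https://api.example.com/data')
  data = JSON.parse(response.body)
rescue Net::OpenTimeout, Net::ReadTimeout, Errno::ECONNREFUSED => e
  # Network errors: re-raised by the client once its retries are exhausted
  warn "Network failure: #{e.message}"
rescue JSON::ParserError => e
  # Application errors: the response arrived but wasn't what we expected;
  # retrying rarely helps, so log and investigate instead
  warn "Unparseable response: #{e.message}"
end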

When building web scraping applications, robust retry logic is essential for handling the unpredictable nature of web services. For more complex scenarios involving browser automation and error handling, consider implementing similar retry patterns with your chosen tools.

The retry implementations shown above provide a solid foundation for building resilient HTTParty-based scraping applications that can gracefully handle temporary failures while respecting server resources and rate limits.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
