What are the best practices for handling HTTP errors and exceptions in HTTParty?

HTTParty is a popular Ruby gem that simplifies HTTP requests, but proper error handling is crucial for building reliable web scraping and API integration applications. This comprehensive guide covers the best practices for handling various types of HTTP errors and exceptions when using HTTParty.

Understanding HTTParty Error Types

HTTParty can encounter several types of errors during HTTP requests:

  1. Network-level errors (connection timeouts, DNS failures)
  2. HTTP status code errors (4xx client errors, 5xx server errors)
  3. Response parsing errors (invalid JSON, XML parsing issues)
  4. SSL/TLS certificate errors
  5. Redirect loop errors
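HTTParty generally lets the underlying Net::HTTP and socket exceptions propagate rather than wrapping them, so these categories map to concrete Ruby exception classes. The helper below is a hypothetical sketch (not an HTTParty API) showing which classes your rescue clauses will typically see for each category:

```ruby
require 'net/http'
require 'socket'
require 'openssl'
require 'json'

# Illustrative mapping from a raised exception to the categories above.
def classify_http_error(error)
  case error
  when Net::OpenTimeout, Net::ReadTimeout        then :timeout
  when SocketError                               then :dns
  when Errno::ECONNREFUSED, Errno::EHOSTUNREACH  then :connection
  when OpenSSL::SSL::SSLError                    then :ssl
  when JSON::ParserError                         then :parsing
  else :unknown
  end
end
```

Note that `Net::TimeoutError` is a legacy alias and should not be rescued in modern Ruby; use `Net::OpenTimeout` and `Net::ReadTimeout` instead.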

Essential Error Handling Patterns

1. Basic Exception Handling Structure

Start with a comprehensive rescue block that catches common exceptions:

require 'httparty'

class APIClient
  include HTTParty

  def fetch_data(url)
    response = self.class.get(url)
    handle_response(response)
  rescue Net::OpenTimeout, Net::ReadTimeout => e
    handle_timeout_error(e)
  rescue Errno::ECONNREFUSED => e
    handle_connection_error(e)
  rescue SocketError => e
    handle_dns_error(e)
  rescue OpenSSL::SSL::SSLError => e
    handle_ssl_error(e)
  rescue HTTParty::Error => e
    handle_httparty_error(e)
  rescue StandardError => e
    handle_generic_error(e)
  end

  private

  def handle_response(response)
    case response.code
    when 200..299
      response.parsed_response
    when 400..499
      handle_client_error(response)
    when 500..599
      handle_server_error(response)
    else
      handle_unknown_status(response)
    end
  end
end

2. Timeout Configuration and Handling

Configure appropriate timeouts to prevent hanging requests:

class RobustAPIClient
  include HTTParty

  # Set default timeouts
  default_timeout 30
  open_timeout 10
  read_timeout 30

  def self.safe_request(method, url, options = {})
    attempt = options.delete(:attempt) || 1

    # Override timeouts for specific requests if needed
    options[:timeout] ||= 15
    options[:open_timeout] ||= 5

    response = send(method, url, options)
    validate_response(response)
  rescue Net::OpenTimeout
    { error: 'Connection timeout', code: 'OPEN_TIMEOUT' }
  rescue Net::ReadTimeout
    retry_with_backoff(method, url, options, attempt)
  end

  def self.validate_response(response)
    response.success? ? response.parsed_response : { error: "HTTP #{response.code}", code: response.code }
  end

  def self.retry_with_backoff(method, url, options, attempt = 1)
    max_retries = options.fetch(:max_retries, 3)

    if attempt <= max_retries
      sleep(2 ** attempt) # Exponential backoff
      safe_request(method, url, options.merge(attempt: attempt + 1))
    else
      { error: 'Maximum retries exceeded', code: 'MAX_RETRIES' }
    end
  end

  # `private` has no effect on `def self.` methods, so mark helpers explicitly
  private_class_method :validate_response, :retry_with_backoff
end
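The `2 ** attempt` schedule above grows without bound; a common refinement is to cap the delay. This small helper is a way to inspect the schedule in isolation (the function name and cap value are our assumptions, not an HTTParty API):

```ruby
# Exponential backoff with a ceiling: delay doubles per attempt, capped.
def backoff_delay(attempt, base: 2, cap: 60)
  [base ** attempt, cap].min
end

# First six attempts: 2, 4, 8, 16, 32, then capped at 60 seconds.
schedule = (1..6).map { |n| backoff_delay(n) }
```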

3. HTTP Status Code Handling

Implement comprehensive status code handling:

module HTTPErrorHandler
  def handle_http_status(response)
    case response.code
    when 200
      success_response(response)
    when 201
      created_response(response)
    when 204
      no_content_response
    when 400
      bad_request_error(response)
    when 401
      unauthorized_error(response)
    when 403
      forbidden_error(response)
    when 404
      not_found_error(response)
    when 422
      validation_error(response)
    when 429
      rate_limit_error(response)
    when 500
      internal_server_error(response)
    when 502, 503, 504
      service_unavailable_error(response)
    else
      unexpected_status_error(response)
    end
  end

  private

  def rate_limit_error(response)
    retry_after = response.headers['retry-after']&.to_i || 60
    {
      error: 'Rate limit exceeded',
      code: 'RATE_LIMIT',
      retry_after: retry_after,
      headers: response.headers
    }
  end

  def validation_error(response)
    {
      error: 'Validation failed',
      code: 'VALIDATION_ERROR',
      details: parse_error_details(response)
    }
  end

  def parse_error_details(response)
    # Fall back to the raw body if the error payload could not be parsed
    response.parsed_response || response.body
  end
end
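One subtlety with `rate_limit_error`: per RFC 9110, the Retry-After header may be either delay-seconds or an HTTP-date, and the integer parse above only covers the first form. This sketch (the helper name and 60-second fallback are our choices) handles both:

```ruby
require 'time'

# Parse a Retry-After header value into a number of seconds to wait.
def retry_after_seconds(value, now: Time.now)
  return 60 if value.nil? || value.empty?        # fallback default
  return value.to_i if value.match?(/\A\d+\z/)   # delay-seconds form
  [(Time.httpdate(value) - now).ceil, 0].max     # HTTP-date form
rescue ArgumentError
  60                                             # unparseable date
end
```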

Advanced Error Handling Techniques

4. Circuit Breaker Pattern

Implement a circuit breaker to prevent cascading failures:

class CircuitBreakerOpenError < StandardError; end

class CircuitBreaker
  def initialize(failure_threshold = 5, recovery_timeout = 60)
    @failure_threshold = failure_threshold
    @recovery_timeout = recovery_timeout
    @failure_count = 0
    @last_failure_time = nil
    @state = :closed # :closed, :open, :half_open
  end

  def call(&block)
    case @state
    when :open
      if Time.now - @last_failure_time > @recovery_timeout
        @state = :half_open
        attempt_request(&block)
      else
        raise CircuitBreakerOpenError, "Circuit breaker is open"
      end
    when :half_open, :closed
      attempt_request(&block)
    end
  end

  private

  # The block must be forwarded explicitly from #call; a bare `yield` in
  # #call's helper would otherwise have no block to run.
  def attempt_request
    result = yield
    reset_failure_count
    result
  rescue StandardError => e
    record_failure
    raise e
  end

  def record_failure
    @failure_count += 1
    @last_failure_time = Time.now

    if @failure_count >= @failure_threshold
      @state = :open
    end
  end

  def reset_failure_count
    @failure_count = 0
    @state = :closed
  end
end
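To see the state transitions concretely, here is a stripped-down, self-contained variant that tracks only the failure count (the recovery timer from the full class is deliberately omitted; this is an illustration, not a replacement):

```ruby
# Minimal breaker: opens after `threshold` consecutive failures,
# closes again on the next success.
class TinyBreaker
  attr_reader :state

  def initialize(threshold)
    @threshold = threshold
    @failures = 0
    @state = :closed
  end

  def record_failure
    @failures += 1
    @state = :open if @failures >= @threshold
  end

  def record_success
    @failures = 0
    @state = :closed
  end
end
```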

5. Response Validation and Parsing

Validate responses before processing:

class ResponseError < StandardError; end
class ContentTypeError < ResponseError; end
class ResponseTooLargeError < ResponseError; end
class ParseError < ResponseError; end

class ResponseValidator
  MAX_RESPONSE_BYTES = 10 * 1024 * 1024 # 10 MB (10.megabytes needs ActiveSupport)

  class << self
    def validate_and_parse(response)
      validate_response_present(response)
      validate_content_type(response)
      validate_response_size(response)
      parse_response_safely(response)
    end

    private

    # `private` does not apply to `def self.` methods, so the helpers
    # live inside `class << self` instead.
    def validate_response_present(response)
      raise ResponseError, "No response received" if response.nil?
    end

    def validate_content_type(response)
      content_type = response.headers['content-type']
      unless content_type&.include?('application/json')
        raise ContentTypeError, "Unexpected content type: #{content_type}"
      end
    end

    def validate_response_size(response)
      if response.body.bytesize > MAX_RESPONSE_BYTES
        raise ResponseTooLargeError, "Response too large"
      end
    end

    def parse_response_safely(response)
      JSON.parse(response.body)
    rescue JSON::ParserError => e
      raise ParseError, "Invalid JSON response: #{e.message}"
    end
  end
end
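The parse step is worth isolating: at call sites that treat bad JSON as a soft failure rather than an exception, returning a `[result, error]` pair can be more convenient. A minimal sketch (helper name is ours):

```ruby
require 'json'

# Parse JSON, returning [data, nil] on success or [nil, message] on failure.
def parse_json_safely(body)
  [JSON.parse(body), nil]
rescue JSON::ParserError => e
  [nil, "Invalid JSON response: #{e.message}"]
end
```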

6. Retry Logic with Exponential Backoff

Implement sophisticated retry mechanisms:

module RetryableHTTP
  def with_retry(max_attempts: 3, base_delay: 1, max_delay: 60)
    attempt = 1

    begin
      yield
    rescue Net::OpenTimeout,
           Net::ReadTimeout,
           Errno::ECONNREFUSED,
           Errno::EHOSTUNREACH,
           SocketError => e

      if attempt < max_attempts
        delay = [base_delay * (2 ** (attempt - 1)), max_delay].min

        Rails.logger.warn(
          "Request failed (attempt #{attempt}/#{max_attempts}): #{e.message}. " \
          "Retrying in #{delay} seconds..."
        )

        sleep(delay)
        attempt += 1
        retry
      else
        raise e
      end
    end
  end
end

# Usage example
class APIService
  include RetryableHTTP

  def fetch_user_data(user_id)
    with_retry(max_attempts: 5, base_delay: 2) do
      HTTParty.get("https://api.example.com/users/#{user_id}")
    end
  end
end

Logging and Monitoring

7. Comprehensive Error Logging

Implement detailed logging for debugging and monitoring:

class HTTPLogger
  def self.log_request(method, url, options = {})
    Rails.logger.info({
      event: 'http_request_start',
      method: method.upcase,
      url: sanitize_url(url),
      headers: sanitize_headers(options[:headers]),
      timestamp: Time.current.iso8601
    }.to_json)
  end

  def self.log_response(response, duration)
    Rails.logger.info({
      event: 'http_request_complete',
      status: response.code,
      duration_ms: (duration * 1000).round(2),
      response_size: response.body&.bytesize,
      timestamp: Time.current.iso8601
    }.to_json)
  end

  def self.log_error(error, context = {})
    Rails.logger.error({
      event: 'http_request_error',
      error_class: error.class.name,
      error_message: error.message,
      context: context,
      backtrace: error.backtrace&.first(5),
      timestamp: Time.current.iso8601
    }.to_json)
  end

  def self.sanitize_url(url)
    # Remove sensitive parameters
    uri = URI.parse(url)
    uri.query = nil if uri.query&.include?('password')
    uri.to_s
  end

  def self.sanitize_headers(headers)
    return {} unless headers

    headers.reject { |key, _| key.to_s.downcase.include?('authorization') }
  end

  # `private` does not apply to `def self.` methods; mark them explicitly
  private_class_method :sanitize_url, :sanitize_headers
end
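`sanitize_url` above drops the entire query string whenever it sees the word 'password'; a finer-grained option is to redact only the sensitive values while preserving the rest for debugging. The parameter blocklist below is an assumption — extend it for the APIs you call:

```ruby
require 'uri'

# Assumed blocklist of query parameter names to redact.
SENSITIVE_PARAMS = %w[password api_key token secret].freeze

# Redact sensitive query values, keeping the remaining parameters intact.
def redact_query_params(url)
  uri = URI.parse(url)
  return url unless uri.query

  pairs = URI.decode_www_form(uri.query).map do |key, value|
    [key, SENSITIVE_PARAMS.include?(key.downcase) ? '[REDACTED]' : value]
  end
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end
```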

Production-Ready Error Handling Class

Here's a complete, production-ready HTTP client with comprehensive error handling:

class RobustHTTPClient
  include HTTParty

  # Configure HTTParty defaults
  default_timeout 30
  format :json

  class << self
    def safe_get(url, options = {})
      make_request(:get, url, options)
    end

    def safe_post(url, options = {})
      make_request(:post, url, options)
    end

    private

    def make_request(method, url, options = {})
      start_time = Time.current
      HTTPLogger.log_request(method, url, options)

      response = with_timeout_and_retry(options) do
        send(method, url, prepare_options(options))
      end

      duration = Time.current - start_time
      HTTPLogger.log_response(response, duration)

      process_response(response)

    rescue => e
      HTTPLogger.log_error(e, { method: method, url: url })
      handle_exception(e)
    end

    def with_timeout_and_retry(options)
      max_retries = options.fetch(:max_retries, 3)
      current_attempt = 1

      begin
        yield
      rescue Net::OpenTimeout,
             Net::ReadTimeout,
             Errno::ECONNREFUSED => e

        if current_attempt <= max_retries
          delay = calculate_backoff_delay(current_attempt)
          sleep(delay)
          current_attempt += 1
          retry
        else
          raise e
        end
      end
    end

    def prepare_options(options)
      {
        timeout: options.fetch(:timeout, 30),
        open_timeout: options.fetch(:open_timeout, 10),
        read_timeout: options.fetch(:read_timeout, 30),
        headers: default_headers.merge(options.fetch(:headers, {})),
        follow_redirects: options.fetch(:follow_redirects, true),
        limit: options.fetch(:redirect_limit, 5)
      }.merge(options.except(:max_retries, :timeout, :open_timeout, :read_timeout))
    end

    def default_headers
      {
        'User-Agent' => 'RobustHTTPClient/1.0',
        'Accept' => 'application/json',
        'Content-Type' => 'application/json'
      }
    end

    def process_response(response)
      case response.code
      when 200..299
        { success: true, data: response.parsed_response, status: response.code }
      when 400..499
        { success: false, error: 'Client error', status: response.code, details: response.parsed_response }
      when 500..599
        { success: false, error: 'Server error', status: response.code, details: response.parsed_response }
      else
        { success: false, error: 'Unknown error', status: response.code }
      end
    end

    def handle_exception(exception)
      case exception
      when Net::OpenTimeout, Net::ReadTimeout, Timeout::Error
        { success: false, error: 'Request timeout', exception: exception.class.name }
      when Errno::ECONNREFUSED
        { success: false, error: 'Connection refused', exception: exception.class.name }
      when SocketError
        { success: false, error: 'Network error', exception: exception.class.name }
      when OpenSSL::SSL::SSLError
        { success: false, error: 'SSL error', exception: exception.class.name }
      else
        { success: false, error: 'Unexpected error', exception: exception.class.name, message: exception.message }
      end
    end

    def calculate_backoff_delay(attempt)
      [1, 2, 4, 8, 16][attempt - 1] || 30
    end
  end
end
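The status bucketing in `process_response` is pure logic, so it can be exercised without a network. Below, a simple Struct stands in for `HTTParty::Response` (both names in this sketch are ours):

```ruby
# Stand-in for HTTParty::Response exposing only what the bucketing needs.
FakeResponse = Struct.new(:code, :parsed_response)

# Same status-range logic as process_response above.
def bucket_response(response)
  case response.code
  when 200..299
    { success: true, data: response.parsed_response, status: response.code }
  when 400..499
    { success: false, error: 'Client error', status: response.code }
  when 500..599
    { success: false, error: 'Server error', status: response.code }
  else
    { success: false, error: 'Unknown error', status: response.code }
  end
end
```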

Integration with Background Jobs

When using HTTParty in background jobs, consider implementing additional error handling:

class APIDataFetchJob < ApplicationJob
  queue_as :default

  retry_on Net::OpenTimeout, Net::ReadTimeout, wait: :exponentially_longer, attempts: 5
  retry_on Errno::ECONNREFUSED, wait: 30.seconds, attempts: 3

  def perform(url, options = {})
    response = RobustHTTPClient.safe_get(url, options)

    if response[:success]
      process_successful_response(response[:data])
    else
      handle_failed_response(response)
    end
  rescue => e
    Rails.logger.error("Job failed: #{e.message}")
    raise e
  end

  private

  def process_successful_response(data)
    # Process the successful response
  end

  def handle_failed_response(response)
    case response[:status]
    when 404
      Rails.logger.warn("Resource not found: #{response}")
    when 429
      # Schedule retry after rate limit reset
      self.class.set(wait: 1.hour).perform_later(*arguments)
    else
      Rails.logger.error("API request failed: #{response}")
    end
  end
end

Best Practices Summary

  1. Always handle timeouts - Set appropriate connection and read timeouts
  2. Implement retry logic - Use exponential backoff for transient failures
  3. Validate responses - Check status codes and content types before processing
  4. Log comprehensively - Include timing, status codes, and error details
  5. Use circuit breakers - Prevent cascading failures in distributed systems
  6. Handle rate limits gracefully - Respect Retry-After headers
  7. Sanitize logs - Remove sensitive information from request/response logs
  8. Monitor and alert - Set up monitoring for error rates and response times

By following these best practices, you'll build robust applications that gracefully handle HTTP errors and exceptions when using HTTParty. Similar error handling principles apply when handling errors in Puppeteer for browser automation or implementing timeout handling in Puppeteer for web scraping scenarios.

Remember that proper error handling is not just about catching exceptions—it's about building resilient systems that can recover from failures and provide meaningful feedback to users and developers.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
