How do I handle HTTP status codes and response validation in HTTParty?

When you build web scrapers or API integrations with HTTParty, reliable behavior depends on checking HTTP status codes and validating responses. This guide covers how to read status codes, validate response content, and handle errors.

Understanding HTTP Status Codes in HTTParty

HTTParty provides several ways to access and handle HTTP status codes from server responses. The response object includes methods to check both the numeric status code and the response success state.

Basic Status Code Access

require 'httparty'

response = HTTParty.get('https://api.example.com/users')

# Access status code
puts response.code          # Returns: 200 (as integer)
puts response.response.code # Returns: "200" (as string)
puts response.message       # Returns: "OK"

# Check response success
puts response.success?      # Returns: true for 2xx codes
puts response.ok?           # Returns: true only for 200 OK (not an alias for success?)

Checking Specific Status Codes

response = HTTParty.get('https://api.example.com/data')

case response.code
when 200
  puts "Success: Data retrieved successfully"
  process_data(response.parsed_response)
when 201
  puts "Created: Resource created successfully"
when 204
  puts "No Content: Request successful but no data"
when 400
  puts "Bad Request: Check your parameters"
when 401
  puts "Unauthorized: Authentication required"
when 403
  puts "Forbidden: Access denied"
when 404
  puts "Not Found: Resource doesn't exist"
when 429
  puts "Rate Limited: Too many requests"
when 500..599
  puts "Server Error: #{response.message}"
else
  puts "Unexpected status: #{response.code}"
end
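
When the same status ranges are checked in several places, the `case` statement above can be condensed into small helpers. The following is a minimal sketch; `status_category` and `retryable_status?` are hypothetical names used for illustration, not part of HTTParty, and the choice to treat only 429 and 5xx as retryable follows the handling used later in this guide.

```ruby
# Hypothetical helper (not an HTTParty method): maps a numeric status
# code to a coarse category symbol.
def status_category(code)
  case code
  when 100..199 then :informational
  when 200..299 then :success
  when 300..399 then :redirection
  when 400..499 then :client_error
  when 500..599 then :server_error
  else :unknown
  end
end

# Assumption: only rate limiting (429) and server errors are worth retrying.
def retryable_status?(code)
  code == 429 || status_category(code) == :server_error
end
```

With an HTTParty response, the helpers would be called as `status_category(response.code)` and `retryable_status?(response.code)`.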

Advanced Response Validation Techniques

Custom Response Validation Class

Create a reusable response validator for consistent error handling across your application:

class ResponseValidator
  def self.validate(response)
    case response.code
    when 200..299
      { success: true, data: response.parsed_response }
    when 400
      { success: false, error: 'Bad Request', details: parse_error_details(response) }
    when 401
      { success: false, error: 'Unauthorized', action: 'refresh_token' }
    when 403
      { success: false, error: 'Forbidden', action: 'check_permissions' }
    when 404
      { success: false, error: 'Not Found', retryable: false }
    when 429
      { success: false, error: 'Rate Limited', retry_after: response.headers['retry-after'] }
    when 500..599
      { success: false, error: 'Server Error', retryable: true }
    else
      { success: false, error: 'Unknown Status', code: response.code }
    end
  end

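  def self.parse_error_details(response)
    return response.parsed_response if response.parsed_response.is_a?(Hash)
    response.body
  rescue StandardError
    'Unable to parse error details'
  end

  # A bare `private` does not affect `def self.` methods,
  # so the helper is hidden explicitly.
  private_class_method :parse_error_details
end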

# Usage
response = HTTParty.get('https://api.example.com/data')
result = ResponseValidator.validate(response)

if result[:success]
  puts "Data: #{result[:data]}"
else
  puts "Error: #{result[:error]}"
  handle_error(result)
end

Response Content Validation

Beyond status codes, validate the actual response content structure and data integrity:

def validate_api_response(response)
  # First check HTTP status
  unless response.success?
    raise "HTTP Error: #{response.code} - #{response.message}"
  end

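  # Parse the body; malformed JSON raises JSON::ParserError here
  data = response.parsed_response

  # Guard against arrays, strings, or nil before calling Hash methods
  unless data.is_a?(Hash)
    raise "Unexpected response structure: expected a JSON object, got #{data.class}"
  end

  # Validate required fields
  required_fields = ['id', 'name', 'email']
  missing_fields = required_fields - data.keys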

  if missing_fields.any?
    raise "Missing required fields: #{missing_fields.join(', ')}"
  end

  # Validate data types
  unless data['id'].is_a?(Integer)
    raise "Invalid data type for 'id': expected Integer, got #{data['id'].class}"
  end

  # Validate email format (to_s guards against nil or non-string values)
  unless data['email'].to_s.match?(/\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i)
    raise "Invalid email format: #{data['email']}"
  end

  data
rescue JSON::ParserError => e
  raise "Invalid JSON response: #{e.message}"
end

# Usage
begin
  response = HTTParty.get('https://api.example.com/user/123')
  user_data = validate_api_response(response)
  puts "Valid user data: #{user_data}"
rescue => e
  puts "Validation failed: #{e.message}"
end
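
The validator above raises on the first problem it finds. When it is more useful to report every problem at once, the same checks can be collected into a list. This is a sketch under that assumption; `collect_validation_errors` and `EMAIL_PATTERN` are hypothetical names, and the required fields mirror the ones used above.

```ruby
# Same email pattern as in validate_api_response above.
EMAIL_PATTERN = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+\z/i

# Hypothetical helper: returns an array of error messages
# (empty when the data passes every check) instead of raising.
def collect_validation_errors(data, required_fields = %w[id name email])
  return ['Response body is not a JSON object'] unless data.is_a?(Hash)

  errors = []
  missing = required_fields - data.keys
  errors << "Missing required fields: #{missing.join(', ')}" if missing.any?

  if data.key?('id') && !data['id'].is_a?(Integer)
    errors << "Invalid data type for 'id': expected Integer, got #{data['id'].class}"
  end

  if data.key?('email') && !data['email'].to_s.match?(EMAIL_PATTERN)
    errors << "Invalid email format: #{data['email']}"
  end

  errors
end
```

It would typically be called with `response.parsed_response` after the status check has passed.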

Implementing Robust Error Handling

Retry Logic with Exponential Backoff

Retry transient failures (429 and 5xx responses) with a delay that doubles after each attempt:

class HTTPClient
  include HTTParty

  MAX_RETRIES = 3
  BASE_DELAY = 1

  def self.get_with_retry(url, options = {})
    retries = 0

    begin
      response = get(url, options)

      case response.code
      when 200..299
        return response
      when 429, 500..599
        raise RetryableError.new("Retryable error: #{response.code}")
      else
        raise NonRetryableError.new("Non-retryable error: #{response.code}")
      end

    rescue RetryableError => e
      retries += 1

      if retries <= MAX_RETRIES
        delay = BASE_DELAY * (2 ** (retries - 1))
        puts "Retry #{retries}/#{MAX_RETRIES} after #{delay}s: #{e.message}"
        sleep(delay)
        retry
      else
        raise "Max retries exceeded: #{e.message}"
      end
    end
  end
end

class RetryableError < StandardError; end
class NonRetryableError < StandardError; end

# Usage
begin
  response = HTTPClient.get_with_retry('https://api.example.com/data')
  puts "Success: #{response.parsed_response}"
rescue => e
  puts "Failed after retries: #{e.message}"
end
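
For 429 responses, a server may send a `Retry-After` header, which can hold either a number of seconds or an HTTP date. The sketch below shows one way to turn that header into a wait time, plus a capped version of the backoff calculation used above. `retry_after_seconds` and `backoff_delay` are hypothetical helper names, and the 60-second default and 30-second cap are assumptions, not HTTParty behavior.

```ruby
require 'time'

# Hypothetical helper: converts a Retry-After header value into seconds.
# Accepts delta-seconds ("120") or an HTTP date; falls back to `default`
# when the header is missing or cannot be parsed.
def retry_after_seconds(header_value, now: Time.now, default: 60)
  return default if header_value.nil? || header_value.strip.empty?
  return header_value.to_i if header_value.strip.match?(/\A\d+\z/)

  # A date in the past yields 0 rather than a negative delay
  [(Time.httpdate(header_value) - now).ceil, 0].max
rescue ArgumentError
  default
end

# Exponential backoff with an upper bound: 1s, 2s, 4s, ... up to `cap`.
def backoff_delay(attempt, base: 1, cap: 30)
  [base * (2**(attempt - 1)), cap].min
end
```

Inside `get_with_retry`, the delay for a 429 could then be taken from `retry_after_seconds(response.headers['retry-after'])`, with `backoff_delay(retries)` used for server errors.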

Comprehensive Error Handler

module HTTPErrorHandler
  def handle_response(response, context = {})
    case response.code
    when 200..299
      yield(response) if block_given?
      response.parsed_response
    when 400
      handle_bad_request(response, context)
    when 401
      handle_unauthorized(response, context)
    when 403
      handle_forbidden(response, context)
    when 404
      handle_not_found(response, context)
    when 429
      handle_rate_limit(response, context)
    when 500..599
      handle_server_error(response, context)
    else
      handle_unknown_error(response, context)
    end
  end

  private

  def handle_bad_request(response, context)
    error_details = response.parsed_response rescue response.body
    log_error("Bad Request", response.code, error_details, context)
    raise ArgumentError, "Invalid request parameters: #{error_details}"
  end

  def handle_unauthorized(response, context)
    log_error("Unauthorized", response.code, "Authentication failed", context)
    # Trigger token refresh if applicable
    refresh_authentication if respond_to?(:refresh_authentication)
    raise StandardError, "Authentication required"
  end

  def handle_forbidden(response, context)
    log_error("Forbidden", response.code, "Access denied", context)
    raise StandardError, "Insufficient permissions"
  end

  def handle_not_found(response, context)
    log_error("Not Found", response.code, "Resource not found", context)
    return nil if context[:allow_not_found]
    raise StandardError, "Resource not found"
  end

  def handle_rate_limit(response, context)
    retry_after = response.headers['retry-after']&.to_i || 60
    log_error("Rate Limited", response.code, "Retry after #{retry_after}s", context)

    if context[:auto_retry]
      sleep(retry_after)
      raise RetryableError, "Rate limited, retrying after #{retry_after}s"
    else
      raise StandardError, "Rate limit exceeded"
    end
  end

  def handle_server_error(response, context)
    log_error("Server Error", response.code, response.message, context)
    raise RetryableError, "Server error: #{response.message}"
  end

  def handle_unknown_error(response, context)
    log_error("Unknown Error", response.code, response.message, context)
    raise StandardError, "Unexpected response: #{response.code}"
  end

  def log_error(type, code, message, context)
    puts "[#{Time.now}] #{type} (#{code}): #{message} | Context: #{context}"
  end
end

class APIClient
  include HTTParty
  include HTTPErrorHandler

  # Relative paths such as "/users/1" need a base URI
  base_uri 'https://api.example.com'

  def fetch_user(user_id)
    response = self.class.get("/users/#{user_id}")
    handle_response(response, { resource: 'user', id: user_id, allow_not_found: true })
  end
end

Status Code-Specific Handling Strategies

Handling Different Response Types

def process_api_response(url, expected_types = ['application/json'])
  response = HTTParty.get(url)

  # Validate status code
  unless response.success?
    puts "Request failed with status: #{response.code}"
    return handle_error_response(response)
  end

  # Validate content type
  content_type = response.headers['content-type']
  unless expected_types.any? { |type| content_type&.include?(type) }
    puts "Unexpected content type: #{content_type}"
    return nil
  end

  # JSON is parsed; every other accepted type is returned as raw text
  if content_type.include?('application/json')
    response.parsed_response
  else
    response.body
  end
rescue => e
  puts "Processing error: #{e.message}"
  nil
end

def handle_error_response(response)
  case response.code
  when 400..499
    puts "Client error: #{response.parsed_response rescue response.body}"
  when 500..599
    puts "Server error: #{response.message}"
  end
  nil
end
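
The content-type check above uses substring matching, which also accepts values that merely contain the expected string. A stricter option is to compare the bare media type with any parameters such as `charset` removed. The helper below is a sketch of that idea; `media_type` is a hypothetical name, not an HTTParty method.

```ruby
# Hypothetical helper: extracts the lowercase media type from a
# Content-Type header value, dropping parameters such as charset.
# Returns nil when the header is missing or empty.
def media_type(content_type_header)
  return nil if content_type_header.nil?

  type = content_type_header.split(';').first.to_s.strip.downcase
  type.empty? ? nil : type
end
```

The validation step could then read `expected_types.include?(media_type(response.headers['content-type']))`.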

Best Practices for Production Applications

1. Implement Circuit Breaker Pattern

class CircuitBreaker
  FAILURE_THRESHOLD = 5
  RECOVERY_TIMEOUT = 30

  def initialize
    @failure_count = 0
    @last_failure_time = nil
    @state = :closed # :closed, :open, :half_open
  end

  def call(&block)
    case @state
    when :closed
      execute_request(&block)
    when :open
      if Time.now - @last_failure_time > RECOVERY_TIMEOUT
        @state = :half_open
        execute_request(&block)
      else
        raise "Circuit breaker is open"
      end
    when :half_open
      execute_request(&block)
    end
  end

  private

  def execute_request(&block)
    result = yield
    on_success
    result
  rescue => e
    on_failure
    raise e
  end

  def on_success
    @failure_count = 0
    @state = :closed
  end

  def on_failure
    @failure_count += 1
    @last_failure_time = Time.now
    @state = :open if @failure_count >= FAILURE_THRESHOLD
  end
end

2. Comprehensive Logging

class HTTPLogger
  def self.log_request(method, url, options = {})
    puts "[REQUEST] #{method.upcase} #{url}"
    puts "[HEADERS] #{options[:headers]}" if options[:headers]
    puts "[BODY] #{options[:body]}" if options[:body]
  end

  def self.log_response(response, duration)
    puts "[RESPONSE] #{response.code} #{response.message} (#{duration}ms)"
    puts "[HEADERS] #{response.headers}"
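    # Print at most 500 characters and add "..." only when the body was cut
    puts "[BODY] #{response.body[0, 500]}#{'...' if response.body.length > 500}" if response.body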
  end
end
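
`log_response` expects a duration in milliseconds, but the logger does not show how to measure it. One option is a small timing wrapper around the request, sketched below with Ruby's monotonic clock. `measure_ms` is a hypothetical helper name.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed_ms].
# The monotonic clock is used so system clock adjustments do not skew timings.
def measure_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [result, (elapsed * 1000).round(1)]
end

# Usage (assumes the httparty gem and the HTTPLogger class above):
#   response, duration = measure_ms { HTTParty.get(url) }
#   HTTPLogger.log_response(response, duration)
```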

Conclusion

Status code handling and response validation are what make HTTParty-based scrapers and API integrations dependable. With error handling, retry logic, and content validation in place, an application can recover from transient failures and report the remaining ones clearly.

Validate both the HTTP status code and the response content, retry only transient errors, and log enough detail to debug and monitor requests. For scenarios that involve browser automation, see how to handle timeouts in Puppeteer and how to handle errors in Puppeteer for complementary strategies.
