How Can I Implement Custom Response Parsing with HTTParty?

HTTParty provides several powerful mechanisms for implementing custom response parsing, allowing you to process API responses in ways that go beyond the default JSON and XML parsing. Whether you're dealing with custom data formats, need specialized validation, or want to transform responses into domain-specific objects, HTTParty's flexible parsing system has you covered.

Understanding HTTParty's Response Parsing System

HTTParty automatically detects and parses common response formats like JSON and XML based on the Content-Type header. However, when working with APIs that return custom formats or when you need specialized processing, you'll need to implement custom parsing logic.
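
For example, a JSON endpoint needs no configuration at all. A quick illustration (the endpoint is hypothetical):

require 'httparty'

response = HTTParty.get('https://api.example.com/users/1')
response.content_type    # => "application/json"
response.parsed_response # => a Hash, parsed automatically from the JSON body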

Method 1: Using Custom Parser Classes

The most robust approach is to create custom parser classes that implement HTTParty's parser interface. This method gives you complete control over how responses are processed.

Creating a Custom Parser

# Custom CSV parser
class CSVParser < HTTParty::Parser
  SupportedFormats = {
    'text/csv' => :csv,
    'application/csv' => :csv
  }.freeze

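  # HTTParty dispatches to the method named after the format symbol (:csv)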
  def csv
    require 'csv'
    CSV.parse(body, headers: true, header_converters: :symbol)
  end
end

# Custom XML parser with additional processing
class CustomXMLParser < HTTParty::Parser
  SupportedFormats = {
    'application/xml' => :custom_xml,
    'text/xml' => :custom_xml
  }.freeze

  def custom_xml
    require 'nokogiri'
    doc = Nokogiri::XML(body)

    # Custom processing logic
    {
      metadata: extract_metadata(doc),
      items: extract_items(doc),
      parsed_at: Time.now # Time.current would require ActiveSupport
    }
  end

  private

  def extract_metadata(doc)
    {
      version: doc.at('//version')&.text,
      timestamp: doc.at('//timestamp')&.text
    }
  end

  def extract_items(doc)
    doc.xpath('//item').map do |item|
      {
        id: item['id'],
        name: item.at('name')&.text,
        value: item.at('value')&.text&.to_f
      }
    end
  end
end

Registering and Using Custom Parsers

HTTParty registers one parser per class (each parser call replaces the previous one), so give each format its own client, or merge both SupportedFormats maps into a single parser class:

class CSVClient
  include HTTParty

  base_uri 'https://api.example.com'

  parser CSVParser

  def self.fetch_csv_data
    get('/data.csv')
  end
end

class XMLClient
  include HTTParty

  base_uri 'https://api.example.com'

  parser CustomXMLParser

  def self.fetch_xml_report
    get('/reports/latest.xml')
  end
end

# Usage
csv_data = CSVClient.fetch_csv_data
puts csv_data.parsed_response.class # => CSV::Table

xml_data = XMLClient.fetch_xml_report
puts xml_data.parsed_response[:metadata] # => custom parsed metadata

Method 2: Post-Processing Parsed Responses

HTTParty has no built-in after-request callback, so for simpler needs the idiomatic approach is to let the default parsers run and post-process the result in a thin wrapper method.

class APIClient
  include HTTParty

  base_uri 'https://api.example.com'

  def self.fetch_user(id)
    post_process(get("/users/#{id}"))
  end

  # Annotate parsed JSON responses with request metadata
  def self.post_process(response)
    data = response.parsed_response

    if data.is_a?(Hash) && response.content_type == 'application/json'
      data['processed_at'] = Time.now
      data['status_code'] = response.code
    end

    data
  end
  private_class_method :post_process
end
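
A quick usage sketch (the user id is hypothetical):

user = APIClient.fetch_user(42)
user['processed_at'] # => timestamp added by post_process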

Method 3: One-Off Custom Response Processing

A block passed to HTTParty's get streams body fragments rather than yielding the full response, so for one-off custom parsing the simplest approach is to branch on the content type after the request completes:

class DataProcessor
  include HTTParty

  base_uri 'https://data.example.com'

  def self.fetch_and_process_metrics
    response = get('/metrics')

    case response.content_type
    when /json/
      process_json_metrics(response.parsed_response)
    when /xml/
      process_xml_metrics(response.body)
    when %r{text/plain}
      process_text_metrics(response.body)
    else
      { error: "Unsupported format: #{response.content_type}" }
    end
  end

  def self.process_json_metrics(data)
    {
      total_requests: data['requests'],
      avg_response_time: data['metrics']['avg_response_time'],
      success_rate: (data['successful'] / data['total'].to_f) * 100
    }
  end

  def self.process_xml_metrics(xml_body)
    require 'nokogiri'
    doc = Nokogiri::XML(xml_body)

    {
      total_requests: doc.at('//requests')&.text&.to_i,
      avg_response_time: doc.at('//avg_response_time')&.text&.to_f,
      errors: doc.xpath('//error').map(&:text)
    }
  end

  def self.process_text_metrics(text_body)
    metrics = {}

    text_body.each_line do |line|
      key, value = line.strip.split(': ')
      metrics[key.downcase.tr(' ', '_')] = value if key && value
    end

    metrics
  end

  # `private` has no effect on singleton methods, so mark them explicitly
  private_class_method :process_json_metrics, :process_xml_metrics,
                       :process_text_metrics
end
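
Usage is then a single call; the keys below assume the JSON branch was taken:

metrics = DataProcessor.fetch_and_process_metrics
puts metrics[:total_requests]
puts metrics[:success_rate]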

Method 4: Creating Response Wrapper Classes

For complex applications, you might want to wrap responses in custom classes that provide domain-specific methods:

class APIResponse
  attr_reader :raw_response, :data, :metadata

  def initialize(httparty_response)
    @raw_response = httparty_response
    @data = parse_data
    @metadata = extract_metadata
  end

  def success?
    @raw_response.success?
  end

  def error_message
    @data.dig('error', 'message') if @data.is_a?(Hash)
  end

  def paginated?
    # Object#present? is ActiveSupport; a plain nil check works everywhere
    !@metadata[:pagination].nil?
  end

  def next_page_url
    @metadata.dig(:pagination, :next_url)
  end

  private

  def parse_data
    case @raw_response.content_type
    when /json/
      parsed = @raw_response.parsed_response
      # Custom JSON processing
      transform_json_keys(parsed)
    when /xml/
      # Custom XML processing
      transform_xml_to_hash(@raw_response.body)
    else
      @raw_response.body
    end
  end

  def extract_metadata
    {
      status_code: @raw_response.code,
      content_type: @raw_response.content_type,
      response_time: @raw_response.headers['x-response-time'],
      pagination: extract_pagination_info
    }
  end

  def transform_json_keys(hash)
    return hash unless hash.is_a?(Hash)

    hash.transform_keys { |key| underscore(key.to_s) }
        .transform_values { |value| value.is_a?(Hash) ? transform_json_keys(value) : value }
  end

  # Minimal snake_case helper so the example doesn't depend on ActiveSupport
  def underscore(str)
    str.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
       .gsub(/([a-z\d])([A-Z])/, '\1_\2')
       .tr('-', '_')
       .downcase
  end

  def transform_xml_to_hash(xml_body)
    require 'nokogiri'
    doc = Nokogiri::XML(xml_body)
    # Custom XML to hash conversion logic
    xml_to_hash(doc.root)
  end

  def xml_to_hash(node)
    if node.children.any? { |child| child.element? }
      result = {}
      node.children.each do |child|
        next unless child.element?
        key = underscore(child.name)
        result[key] = xml_to_hash(child)
      end
      result
    else
      node.text
    end
  end

  def extract_pagination_info
    if @data.is_a?(Hash) && @data['pagination']
      {
        current_page: @data['pagination']['current_page'],
        total_pages: @data['pagination']['total_pages'],
        next_url: @data['pagination']['next_url'],
        prev_url: @data['pagination']['prev_url']
      }
    end
  end
end

# Usage with wrapper class
class EnhancedAPIClient
  include HTTParty

  base_uri 'https://api.example.com'

  def self.fetch_users(params = {})
    response = get('/users', query: params)
    APIResponse.new(response)
  end
end

# Using the enhanced client
users_response = EnhancedAPIClient.fetch_users(page: 1, limit: 10)

if users_response.success?
  puts "Found #{users_response.data['users'].length} users"

  if users_response.paginated?
    puts "Next page: #{users_response.next_page_url}"
  end
else
  puts "Error: #{users_response.error_message}"
end

Advanced Parsing Techniques

Handling Binary Data

Binary responses should bypass parsing entirely: fetch the raw body and inspect the headers. (Passing a block to get streams body fragments rather than yielding the full response, so the check happens after the request.)

class FileDownloader
  include HTTParty

  def self.download_image(url)
    response = get(url)

    if response.content_type&.start_with?('image/')
      {
        content_type: response.content_type,
        size: response.body.bytesize,
        data: response.body,
        filename: extract_filename(response.headers['content-disposition'])
      }
    else
      { error: 'Not an image file' }
    end
  end

  def self.extract_filename(content_disposition)
    return nil unless content_disposition

    content_disposition[/filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/, 1]&.delete('"')
  end
  private_class_method :extract_filename
end
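
A usage sketch (the image URL is hypothetical):

image = FileDownloader.download_image('https://example.com/logo.png')

if image[:error]
  warn image[:error]
else
  File.binwrite(image[:filename] || 'image.bin', image[:data])
end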

Conditional Parsing Based on Response Headers

HTTParty parsers receive only the response body and format, not the headers, so any decision based on a response header has to happen after the request returns:

class SmartAPIClient
  include HTTParty

  base_uri 'https://api.example.com'

  def self.fetch(path)
    response = get(path)

    case response.headers['x-api-version']
    when '1.0'
      # Transform the legacy v1 format into the standardized shape
      transform_v1_to_standard(response.parsed_response)
    else
      # v2 (and unknown versions) are already in standard format
      response.parsed_response
    end
  end

  def self.transform_v1_to_standard(data)
    return data unless data.is_a?(Hash)

    {
      version: '1.0',
      data: data['payload'],
      metadata: {
        timestamp: data['timestamp'],
        request_id: data['req_id']
      }
    }
  end
  private_class_method :transform_v1_to_standard
end
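
When the decision can be made from the body alone, HTTParty does accept any object responding to call(body, format) as a parser. A minimal Proc-based sketch; the fall-back-to-raw-body behavior is an assumption for illustration:

require 'json'

class LenientClient
  include HTTParty

  base_uri 'https://api.example.com'
  format :json # a Proc parser cannot infer the format from Content-Type, so pin it

  # Proc parsers receive (body, format), not the response object
  parser(proc do |body, format|
    begin
      format == :json ? JSON.parse(body) : body
    rescue JSON::ParserError
      body # return the raw body when the JSON is malformed
    end
  end)
end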

Error Handling in Custom Parsers

class RobustParser < HTTParty::Parser
  SupportedFormats = {
    'application/json' => :safe_json,
    'text/xml' => :safe_xml
  }.freeze

  def safe_json
    JSON.parse(body)
  rescue JSON::ParserError => e
    {
      error: 'JSON parsing failed',
      message: e.message,
      raw_body: body[0, 500] # First 500 chars for debugging
    }
  end

  def safe_xml
    require 'nokogiri'
    Nokogiri::XML(body) do |config|
      config.strict.nonet
    end
  rescue Nokogiri::XML::SyntaxError => e
    {
      error: 'XML parsing failed',
      message: e.message,
      line: e.line,
      column: e.column
    }
  end
end
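
To use it, register the parser on a client and branch on the error hash; the /flaky-endpoint path is hypothetical:

class ResilientClient
  include HTTParty

  base_uri 'https://api.example.com'

  parser RobustParser
end

data = ResilientClient.get('/flaky-endpoint').parsed_response
warn "Parse failure: #{data[:message]}" if data.is_a?(Hash) && data[:error]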

Testing Custom Parsers

# spec/parsers/csv_parser_spec.rb
RSpec.describe CSVParser do
  let(:csv_body) { "name,age,city\nJohn,30,NYC\nJane,25,LA" }

  describe '#csv' do
    # HTTParty::Parser.new takes the body and a format symbol, not a MIME type
    subject { described_class.new(csv_body, :csv) }

    it 'parses CSV data correctly' do
      result = subject.csv

      expect(result).to be_a(CSV::Table)
      expect(result.first.to_h).to include(name: 'John', age: '30', city: 'NYC')
    end
  end
end
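
To exercise the parser through a full request cycle, stub the HTTP layer. A sketch assuming the webmock gem and the CSVClient defined earlier:

# spec/clients/csv_client_spec.rb
require 'webmock/rspec'

RSpec.describe CSVClient do
  it 'routes text/csv responses through the registered parser' do
    stub_request(:get, 'https://api.example.com/data.csv')
      .to_return(
        body: "name,age\nJohn,30",
        headers: { 'Content-Type' => 'text/csv' }
      )

    result = described_class.fetch_csv_data

    expect(result.parsed_response).to be_a(CSV::Table)
    expect(result.parsed_response.first[:name]).to eq('John')
  end
end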

Best Practices for Custom Response Parsing

  1. Handle Errors Gracefully: Always include error handling in your custom parsers to prevent application crashes from malformed data.

  2. Validate Input: Check content types and response codes before attempting to parse.

  3. Performance Considerations: For large responses, consider streaming parsers or processing data in chunks (see the streaming sketch after this list).

  4. Memory Management: Be mindful of memory usage when parsing large datasets.

  5. Testing: Write comprehensive tests for your custom parsers with various input scenarios.

  6. Documentation: Document your custom parsing logic and expected input/output formats.
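
On the performance point, HTTParty's stream_body option yields body fragments to a block as they arrive, so a large payload never has to sit fully in memory. A minimal sketch with a hypothetical export endpoint:

require 'httparty'

class StreamingClient
  include HTTParty

  base_uri 'https://api.example.com'

  def self.stream_to_file(path, destination)
    File.open(destination, 'wb') do |file|
      get(path, stream_body: true) do |fragment|
        file.write(fragment) # each fragment is a chunk of the response body
      end
    end
  end
end

StreamingClient.stream_to_file('/large-export.csv', 'export.csv')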

Custom response parsing with HTTParty provides the flexibility needed for complex data processing scenarios. Whether you're working with proprietary formats, need specialized validation, or want to create domain-specific response objects, these techniques will help you build robust and maintainable parsing solutions.

When implementing custom parsing, consider combining it with proper error handling techniques and authentication strategies for a complete web scraping solution.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
