How do I set custom headers when making requests with HTTParty?

HTTParty is a popular Ruby gem that simplifies making HTTP requests for web scraping and API integration. Setting custom headers is essential in many of these scenarios, including authentication, mimicking browser behavior, and accessing APIs that require specific headers.

Basic Header Configuration

The most straightforward way to set custom headers in HTTParty is using the :headers option in your request methods:

require 'httparty'

response = HTTParty.get('https://api.example.com/data', 
  headers: {
    'User-Agent' => 'MyApp/1.0',
    'Authorization' => 'Bearer your-token-here',
    'Content-Type' => 'application/json'
  }
)

puts response.body

Setting Headers at the Class Level

For applications that make multiple requests with the same headers, you can set default headers at the class level. HTTParty merges these defaults with any headers passed to an individual request, with per-request values taking precedence:

class APIClient
  include HTTParty

  base_uri 'https://api.example.com'

  headers({
    'User-Agent' => 'MyApp/1.0',
    'Accept' => 'application/json',
    'Content-Type' => 'application/json'
  })

  def self.get_user_data(user_id)
    get("/users/#{user_id}")
  end

  def self.create_user(user_data)
    post('/users', 
      body: user_data.to_json,
      headers: { 'Authorization' => "Bearer #{ENV['API_TOKEN']}" }
    )
  end
end

# Usage
user = APIClient.get_user_data(123)

Common Header Use Cases

1. User-Agent Headers for Web Scraping

Setting a realistic User-Agent header is crucial for successful web scraping, as many websites block requests from automated tools:

# Common browser user agents
user_agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
]

response = HTTParty.get('https://example.com', 
  headers: {
    'User-Agent' => user_agents.sample,
    'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Accept-Encoding' => 'gzip, deflate', # note: setting this manually disables Net::HTTP's automatic decompression
    'DNT' => '1',
    'Connection' => 'keep-alive',
    'Upgrade-Insecure-Requests' => '1'
  }
)

2. API Authentication Headers

Different APIs require various authentication methods through headers:

# Bearer token authentication
response = HTTParty.get('https://api.example.com/protected', 
  headers: {
    'Authorization' => 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...',
    'Accept' => 'application/json'
  }
)

# API key authentication
response = HTTParty.get('https://api.example.com/data', 
  headers: {
    'X-API-Key' => 'your-api-key-here',
    'X-RapidAPI-Host' => 'example.p.rapidapi.com',
    'X-RapidAPI-Key' => 'your-rapidapi-key'
  }
)

# Basic authentication by hand (HTTParty's built-in basic_auth option is shown below)
require 'base64'
credentials = Base64.strict_encode64('username:password') # strict_encode64 avoids the trailing newline
response = HTTParty.get('https://api.example.com/data', 
  headers: {
    'Authorization' => "Basic #{credentials}"
  }
)
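
HTTParty also accepts a :basic_auth option that builds the Authorization header for you, so the manual encoding above is rarely necessary:

# Built-in basic auth (equivalent to the manual header above)
response = HTTParty.get('https://api.example.com/data',
  basic_auth: { username: 'username', password: 'password' }
)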

3. Content-Type Headers for POST Requests

When sending data to APIs, proper Content-Type headers are essential:

# JSON data
user_data = { name: 'John Doe', email: 'john@example.com' }
response = HTTParty.post('https://api.example.com/users',
  body: user_data.to_json,
  headers: {
    'Content-Type' => 'application/json',
    'Accept' => 'application/json'
  }
)

# Form data
form_data = { username: 'johndoe', password: 'secret123' }
response = HTTParty.post('https://example.com/login',
  body: form_data,
  headers: {
    'Content-Type' => 'application/x-www-form-urlencoded',
    'User-Agent' => 'Mozilla/5.0 (compatible; MyBot/1.0)'
  }
)

# Multipart form data (HTTParty 0.16+ detects File/IO objects in the body
# and builds a multipart/form-data request automatically)
response = HTTParty.post('https://api.example.com/upload',
  body: {
    file: File.open('/path/to/file.pdf'),
    description: 'Important document'
  },
  headers: {
    'Authorization' => 'Bearer your-token'
    # Content-Type, including the multipart boundary, is set automatically
  }
)

Dynamic Header Management

For more complex scenarios, you might need to set headers dynamically:

class WebScraper
  include HTTParty

  def initialize(options = {})
    @default_headers = {
      'User-Agent' => options[:user_agent] || default_user_agent,
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.5'
    }
  end

  def fetch_page(url, additional_headers = {})
    headers = @default_headers.merge(additional_headers)

    self.class.get(url, headers: headers)
  end

  def fetch_with_auth(url, token)
    auth_headers = { 'Authorization' => "Bearer #{token}" }
    fetch_page(url, auth_headers)
  end

  private

  def default_user_agent
    'Mozilla/5.0 (compatible; WebScraper/1.0; +http://example.com/bot)'
  end
end

# Usage
scraper = WebScraper.new
response = scraper.fetch_with_auth('https://api.example.com/data', 'your-token')

Header Rotation for Anti-Detection

To avoid detection during extensive web scraping, you can rotate headers:

class HeaderRotator
  BROWSER_HEADERS = [
    {
      'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.9'
    },
    {
      'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-GB,en;q=0.9'
    },
    {
      'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.8'
    }
  ].freeze

  def self.random_headers
    BROWSER_HEADERS.sample.merge({
      'Accept-Encoding' => 'gzip, deflate', # avoid advertising 'br' unless you can decode brotli
      'DNT' => ['1', '0'].sample,
      'Connection' => 'keep-alive',
      'Upgrade-Insecure-Requests' => '1',
      'Sec-Fetch-Dest' => 'document',
      'Sec-Fetch-Mode' => 'navigate',
      'Sec-Fetch-Site' => 'none'
    })
  end
end

# Usage with rotation
urls = ['https://example1.com', 'https://example2.com', 'https://example3.com']

urls.each do |url|
  response = HTTParty.get(url, headers: HeaderRotator.random_headers)
  puts "Scraped #{url}: #{response.code}"
  sleep(rand(1..3)) # Random delay
end

Debugging Header Issues

When working with headers, debugging is often necessary. HTTParty's debug_output option prints the raw request and response, and an echo service such as httpbin.org shows exactly which headers the server received:

# Enable debug output
HTTParty.get('https://httpbin.org/headers', 
  headers: { 'Custom-Header' => 'test-value' },
  debug_output: $stdout
)

# Check what headers were actually sent
response = HTTParty.get('https://httpbin.org/headers', 
  headers: { 
    'User-Agent' => 'TestBot/1.0',
    'Custom-Header' => 'debug-test'
  }
)

# httpbin.org echoes back the headers it received
puts JSON.pretty_generate(JSON.parse(response.body))
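
You can also inspect the request object HTTParty attaches to the response; its options hash should include the headers that were merged into the request (a quick local check, assuming a recent HTTParty version):

# Inspect the headers HTTParty merged into the outgoing request
puts response.request.options[:headers].inspect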

Advanced Header Techniques

Conditional Headers

def fetch_with_conditional_headers(url, options = {})
  headers = { 'User-Agent' => 'MyApp/1.0' }

  # Add authorization only if token is provided
  headers['Authorization'] = "Bearer #{options[:token]}" if options[:token]

  # Add custom content type for API requests
  if options[:api_request]
    headers.merge!({
      'Accept' => 'application/json',
      'Content-Type' => 'application/json'
    })
  end

  # Add referer for web scraping
  headers['Referer'] = options[:referer] if options[:referer]

  HTTParty.get(url, headers: headers)
end
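
A usage sketch; the token and URLs are placeholders:

# Plain page fetch vs. authenticated API call via the same helper
page = fetch_with_conditional_headers('https://example.com')
api  = fetch_with_conditional_headers('https://api.example.com/data',
                                      token: ENV['API_TOKEN'], api_request: true)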

Header Validation

class SafeHeaderManager
  ALLOWED_HEADERS = %w[
    Authorization User-Agent Accept Content-Type 
    Accept-Language Accept-Encoding X-API-Key
  ].freeze

  def self.sanitize_headers(headers)
    headers.select { |key, _| ALLOWED_HEADERS.include?(key) }
  end

  def self.safe_request(url, headers)
    clean_headers = sanitize_headers(headers)
    HTTParty.get(url, headers: clean_headers)
  end
end
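
For example, a header outside the allowlist (the X-Debug header below is hypothetical) is silently dropped:

headers = { 'User-Agent' => 'MyApp/1.0', 'X-Debug' => 'true' }
SafeHeaderManager.sanitize_headers(headers)
# => { 'User-Agent' => 'MyApp/1.0' }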

Best Practices

  1. Always set User-Agent: Many websites block requests without proper User-Agent headers
  2. Use realistic headers: Copy headers from actual browser requests using developer tools
  3. Respect robots.txt: Check website policies before extensive scraping
  4. Implement rate limiting: Avoid overwhelming servers with rapid requests (see the sketch below)
  5. Handle errors gracefully: Always check response status and handle failures
  6. Keep credentials secure: Use environment variables for API keys and tokens
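
A minimal rate-limiting sketch, assuming a fixed minimum interval between requests (the one-second value is arbitrary; tune it for the target site):

class RateLimitedClient
  include HTTParty

  MIN_INTERVAL = 1.0 # seconds between requests

  def initialize
    @last_request_at = nil
  end

  def get(url, options = {})
    # Sleep just long enough to keep at least MIN_INTERVAL between requests
    wait = @last_request_at ? MIN_INTERVAL - (Time.now - @last_request_at) : 0
    sleep(wait) if wait.positive?
    @last_request_at = Time.now
    self.class.get(url, options)
  end
end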

Error Handling with Custom Headers

def safe_request_with_headers(url, headers)
  begin
    response = HTTParty.get(url, 
      headers: headers,
      timeout: 30,
      follow_redirects: true
    )

    case response.code
    when 200
      response
    when 401
      puts "Authentication failed - check your headers"
      nil
    when 403
      puts "Access forbidden - headers might be detected as bot"
      nil
    when 429
      puts "Rate limited - consider adding delays"
      nil
    else
      puts "Request failed with status: #{response.code}"
      nil
    end
  rescue HTTParty::Error => e
    puts "HTTParty error: #{e.message}"
    nil
  rescue StandardError => e
    puts "General error: #{e.message}"
    nil
  end
end
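
Usage follows the same pattern as the earlier examples:

# Usage
response = safe_request_with_headers('https://httpbin.org/headers',
                                     'User-Agent' => 'TestBot/1.0')
puts response.body if response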

Conclusion

Setting custom headers with HTTParty is straightforward and essential for successful web scraping and API integration. Whether you're authenticating with APIs, mimicking browser behavior, or implementing anti-detection measures, proper header management will significantly improve your success rate.

Remember to always respect website terms of service, implement appropriate delays between requests, and use headers responsibly. For complex scraping scenarios that require JavaScript execution, consider complementing HTTParty with a headless browser such as Puppeteer, which can manage full browser sessions and handle authentication flows.

The key to successful header management is understanding your target's requirements, testing thoroughly, and implementing robust error handling to ensure your applications remain reliable and efficient.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
