How Do I Set Custom Headers When Making Requests with HTTParty?
HTTParty is a popular Ruby gem that simplifies HTTP requests and makes web scraping and API interactions more straightforward. Setting custom headers is essential for many web scraping scenarios, including authentication, mimicking browser behavior, and accessing APIs that require specific headers.
Basic Header Configuration
The most straightforward way to set custom headers in HTTParty is to pass the :headers option to a request method:
require 'httparty'

response = HTTParty.get('https://api.example.com/data',
  headers: {
    'User-Agent' => 'MyApp/1.0',
    'Authorization' => 'Bearer your-token-here',
    'Content-Type' => 'application/json'
  }
)
puts response.body
Setting Headers at the Class Level
For applications that make multiple requests with the same headers, you can set default headers at the class level:
class APIClient
  include HTTParty
  base_uri 'https://api.example.com'

  # Default headers sent with every request made through this class
  headers({
    'User-Agent' => 'MyApp/1.0',
    'Accept' => 'application/json',
    'Content-Type' => 'application/json'
  })

  def self.get_user_data(user_id)
    get("/users/#{user_id}")
  end

  def self.create_user(user_data)
    post('/users',
      body: user_data.to_json,
      headers: { 'Authorization' => "Bearer #{ENV['API_TOKEN']}" }
    )
  end
end

# Usage
user = APIClient.get_user_data(123)
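In recent HTTParty versions, per-request headers are combined with the class-level defaults rather than replacing them, and the per-request value wins when both define the same header. The sketch below imitates that behaviour with plain Hash#merge so it runs without network access. DEFAULT_HEADERS and effective_headers are hypothetical names used only for illustration, and HTTParty's internal merging may differ in detail.

```ruby
# Illustrative only: approximates how class-level defaults and
# per-request headers combine. Not HTTParty's actual implementation.
DEFAULT_HEADERS = {
  'User-Agent' => 'MyApp/1.0',
  'Accept'     => 'application/json'
}.freeze

# Returns a new hash; per-request values override defaults on conflict.
def effective_headers(request_headers = {})
  DEFAULT_HEADERS.merge(request_headers)
end

merged = effective_headers({
  'Authorization' => 'Bearer example-token',
  'Accept'        => 'text/csv'
})
puts merged
```

If a request must not inherit a default at all, it is usually simpler to keep that header out of the class-level set and add it only where needed.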
Common Header Use Cases
1. User-Agent Headers for Web Scraping
Setting a realistic User-Agent header is crucial for successful web scraping, as many websites block requests from automated tools:
# Common browser user agents
user_agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
]

response = HTTParty.get('https://example.com',
  headers: {
    'User-Agent' => user_agents.sample,
    'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language' => 'en-US,en;q=0.5',
    'Accept-Encoding' => 'gzip, deflate',
    'DNT' => '1',
    'Connection' => 'keep-alive',
    'Upgrade-Insecure-Requests' => '1'
  }
)
2. API Authentication Headers
Different APIs require various authentication methods through headers:
# Bearer token authentication
response = HTTParty.get('https://api.example.com/protected',
  headers: {
    'Authorization' => 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...',
    'Accept' => 'application/json'
  }
)

# API key authentication
response = HTTParty.get('https://api.example.com/data',
  headers: {
    'X-API-Key' => 'your-api-key-here',
    'X-RapidAPI-Host' => 'example.p.rapidapi.com',
    'X-RapidAPI-Key' => 'your-rapidapi-key'
  }
)

# Basic authentication (alternative to HTTParty's basic_auth)
require 'base64'
# strict_encode64 never inserts line feeds; encode64 wraps long output,
# which would corrupt the header value for long credentials
credentials = Base64.strict_encode64("username:password")
response = HTTParty.get('https://api.example.com/data',
  headers: {
    'Authorization' => "Basic #{credentials}"
  }
)
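The difference between the two Base64 encoders only shows up with longer credentials: Base64.encode64 inserts a line feed after every 60 encoded characters, and chomp removes only the trailing one. The sketch below demonstrates this with a hypothetical basic_auth_header helper (not part of HTTParty) and an 80-character password.

```ruby
require 'base64'

# Hypothetical helper: builds a Basic Authorization header value.
def basic_auth_header(username, password)
  # strict_encode64 never inserts line feeds, so the value stays on one line.
  "Basic #{Base64.strict_encode64("#{username}:#{password}")}"
end

long_password = 'p' * 80

# encode64 wraps its output; chomp strips only the final line feed.
wrapped = Base64.encode64("user:#{long_password}").chomp

puts "encode64 output contains a line feed: #{wrapped.include?("\n")}"
puts "helper output contains a line feed:   #{basic_auth_header('user', long_password).include?("\n")}"
```

For most cases HTTParty's own basic_auth option avoids building the header by hand at all.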
3. Content-Type Headers for POST Requests
When sending data to APIs, proper Content-Type headers are essential:
# JSON data
user_data = { name: 'John Doe', email: 'john@example.com' }
response = HTTParty.post('https://api.example.com/users',
  body: user_data.to_json,
  headers: {
    'Content-Type' => 'application/json',
    'Accept' => 'application/json'
  }
)

# Form data (a Hash body is form-encoded by HTTParty)
form_data = { username: 'johndoe', password: 'secret123' }
response = HTTParty.post('https://example.com/login',
  body: form_data,
  headers: {
    'Content-Type' => 'application/x-www-form-urlencoded',
    'User-Agent' => 'Mozilla/5.0 (compatible; MyBot/1.0)'
  }
)

# Multipart form data (no extra require is needed for file uploads)
response = HTTParty.post('https://api.example.com/upload',
  body: {
    file: File.open('/path/to/file.pdf'),
    description: 'Important document'
  },
  headers: {
    'Authorization' => 'Bearer your-token'
    # HTTParty automatically sets Content-Type for multipart
  }
)
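The Content-Type header has to agree with how the body is actually serialized. As a rough illustration, the sketch below prints the same payload in both encodings using only the standard library. URI.encode_www_form stands in for HTTParty's own form encoder here, which is an assumption for illustration; the two may differ for nested hashes or arrays.

```ruby
require 'json'
require 'uri'

payload = { name: 'John Doe', email: 'john@example.com' }

# Body for Content-Type: application/json
json_body = payload.to_json

# Approximate body for Content-Type: application/x-www-form-urlencoded
form_body = URI.encode_www_form(payload)

puts json_body
puts form_body
```

Sending the JSON string with a form-encoded Content-Type (or the reverse) is a common cause of 400 and 415 responses.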
Dynamic Header Management
For more complex scenarios, you might need to set headers dynamically:
class WebScraper
  include HTTParty

  def initialize(options = {})
    @default_headers = {
      'User-Agent' => options[:user_agent] || default_user_agent,
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.5'
    }
  end

  def fetch_page(url, additional_headers = {})
    headers = @default_headers.merge(additional_headers)
    self.class.get(url, headers: headers)
  end

  def fetch_with_auth(url, token)
    auth_headers = { 'Authorization' => "Bearer #{token}" }
    fetch_page(url, auth_headers)
  end

  private

  def default_user_agent
    'Mozilla/5.0 (compatible; WebScraper/1.0; +http://example.com/bot)'
  end
end

# Usage
scraper = WebScraper.new
response = scraper.fetch_with_auth('https://api.example.com/data', 'your-token')
Header Rotation for Anti-Detection
To avoid detection during extensive web scraping, you can rotate headers:
class HeaderRotator
  BROWSER_HEADERS = [
    {
      'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.9'
    },
    {
      'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-GB,en;q=0.9'
    },
    {
      'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language' => 'en-US,en;q=0.8'
    }
  ].freeze

  def self.random_headers
    BROWSER_HEADERS.sample.merge({
      # Advertising 'br' invites Brotli-compressed responses; the body may
      # arrive undecoded unless Brotli support is available, so drop 'br'
      # if responses look like binary data
      'Accept-Encoding' => 'gzip, deflate, br',
      'DNT' => ['1', '0'].sample,
      'Connection' => 'keep-alive',
      'Upgrade-Insecure-Requests' => '1',
      'Sec-Fetch-Dest' => 'document',
      'Sec-Fetch-Mode' => 'navigate',
      'Sec-Fetch-Site' => 'none'
    })
  end
end

# Usage with rotation
urls = ['https://example1.com', 'https://example2.com', 'https://example3.com']
urls.each do |url|
  response = HTTParty.get(url, headers: HeaderRotator.random_headers)
  puts "Scraped #{url}: #{response.code}"
  sleep(rand(1..3)) # Random delay
end
Debugging Header Issues
When working with headers, debugging is often necessary. HTTParty provides several options to help:
# Enable debug output
HTTParty.get('https://httpbin.org/headers',
  headers: { 'Custom-Header' => 'test-value' },
  debug_output: $stdout
)

# Check what headers were actually sent
response = HTTParty.get('https://httpbin.org/headers',
  headers: {
    'User-Agent' => 'TestBot/1.0',
    'Custom-Header' => 'debug-test'
  }
)
# httpbin.org echoes back the headers it received
puts JSON.pretty_generate(JSON.parse(response.body))
Advanced Header Techniques
Conditional Headers
def fetch_with_conditional_headers(url, options = {})
  headers = { 'User-Agent' => 'MyApp/1.0' }

  # Add authorization only if token is provided
  headers['Authorization'] = "Bearer #{options[:token]}" if options[:token]

  # Add custom content type for API requests
  if options[:api_request]
    headers.merge!({
      'Accept' => 'application/json',
      'Content-Type' => 'application/json'
    })
  end

  # Add referer for web scraping
  headers['Referer'] = options[:referer] if options[:referer]

  HTTParty.get(url, headers: headers)
end
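An alternative way to express optional headers is to list them all in one hash and drop the ones that were not supplied with Hash#compact. The build_headers helper and its keyword arguments below are hypothetical names chosen for this sketch; it only builds the hash, which would then be passed as the headers: option.

```ruby
# Hypothetical helper: every optional header appears inline, and any
# header whose value is nil is removed by Hash#compact.
def build_headers(token: nil, referer: nil, api_request: false)
  {
    'User-Agent'    => 'MyApp/1.0',
    'Authorization' => (token ? "Bearer #{token}" : nil),
    'Referer'       => referer,
    'Accept'        => (api_request ? 'application/json' : nil)
  }.compact
end

puts build_headers.inspect
puts build_headers(token: 'abc123', api_request: true).inspect
# e.g. HTTParty.get(url, headers: build_headers(token: ENV['API_TOKEN']))
```

This keeps the full set of possible headers visible in one place, at the cost of a slightly denser hash literal.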
Header Validation
class SafeHeaderManager
  ALLOWED_HEADERS = %w[
    Authorization User-Agent Accept Content-Type
    Accept-Language Accept-Encoding X-API-Key
  ].freeze

  def self.sanitize_headers(headers)
    headers.select { |key, _| ALLOWED_HEADERS.include?(key) }
  end

  def self.safe_request(url, headers)
    clean_headers = sanitize_headers(headers)
    HTTParty.get(url, headers: clean_headers)
  end
end
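The allow-list above compares names exactly, so a caller passing 'user-agent' or a symbol key would have that header silently dropped, even though HTTP header names are case-insensitive. A minimal sketch of a case-insensitive variant follows; ALLOWED_HEADER_NAMES and sanitize_headers_ci are hypothetical names introduced for this example.

```ruby
# Hypothetical case-insensitive allow-list, stored in lowercase.
ALLOWED_HEADER_NAMES = %w[
  authorization user-agent accept content-type
  accept-language accept-encoding x-api-key
].freeze

# Keeps a header when its name, lowercased, is on the allow-list.
# Works for both String and Symbol keys; original keys are preserved.
def sanitize_headers_ci(headers)
  headers.select { |key, _| ALLOWED_HEADER_NAMES.include?(key.to_s.downcase) }
end

clean = sanitize_headers_ci({
  'user-agent' => 'MyApp/1.0',
  :Accept      => 'application/json',
  'X-Evil'     => 'dropped'
})
puts clean.inspect
```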
Best Practices
- Always set User-Agent: Many websites block requests without proper User-Agent headers
- Use realistic headers: Copy headers from actual browser requests using developer tools
- Respect robots.txt: Check website policies before extensive scraping
- Implement rate limiting: Avoid overwhelming servers with rapid requests
- Handle errors gracefully: Always check response status and handle failures
- Keep credentials secure: Use environment variables for API keys and tokens
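The rate-limiting advice can be sketched as a small wrapper that enforces a minimum interval between calls. Throttle is a hypothetical class, not part of HTTParty; the clock and sleeper arguments exist only so the behaviour can be exercised without real waiting. The blocks below return strings where a real scraper would call HTTParty.get.

```ruby
# Hypothetical helper: enforces a minimum interval between calls.
class Throttle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_call = nil
  end

  # Waits if the previous call was too recent, then runs the block
  # and returns its result.
  def call
    if @last_call
      wait = @min_interval - (@clock.call - @last_call)
      @sleeper.call(wait) if wait.positive?
    end
    @last_call = @clock.call
    yield
  end
end

throttle = Throttle.new(0.05) # at most one call every 50 ms
results = %w[/a /b /c].map do |path|
  throttle.call { "fetched #{path}" } # e.g. HTTParty.get(base + path, headers: ...)
end
puts results
```

A fixed interval is the simplest policy; adding random jitter, as in the rotation example earlier, is a small extension.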
Error Handling with Custom Headers
def safe_request_with_headers(url, headers)
  response = HTTParty.get(url,
    headers: headers,
    timeout: 30,
    follow_redirects: true
  )

  case response.code
  when 200
    response
  when 401
    puts "Authentication failed - check your headers"
    nil
  when 403
    puts "Access forbidden - headers might be detected as bot"
    nil
  when 429
    puts "Rate limited - consider adding delays"
    nil
  else
    puts "Request failed with status: #{response.code}"
    nil
  end
rescue HTTParty::Error => e
  puts "HTTParty error: #{e.message}"
  nil
rescue StandardError => e
  puts "General error: #{e.message}"
  nil
end
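For the 429 case, a common follow-up is to retry after a delay instead of giving up. The sketch below is one possible approach under stated assumptions: with_retries is a hypothetical helper, the response object only needs to respond to code and headers, and FakeResponse stands in for real HTTParty responses so the example runs offline. It honours a numeric Retry-After header when present and otherwise doubles the delay on each attempt.

```ruby
# Hypothetical helper: retries the given block while it returns HTTP 429,
# up to max_attempts, and returns the last response.
def with_retries(max_attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  loop do
    attempt += 1
    response = yield
    return response unless response.code == 429 && attempt < max_attempts

    # Prefer the server's Retry-After hint when it is a whole number of seconds;
    # otherwise back off exponentially: base_delay, 2x, 4x, ...
    retry_after = response.headers['retry-after'].to_s
    delay = retry_after.match?(/\A\d+\z/) ? retry_after.to_i : base_delay * (2**(attempt - 1))
    sleeper.call(delay)
  end
end

# Offline demonstration with stand-in responses.
FakeResponse = Struct.new(:code, :headers)
responses = [
  FakeResponse.new(429, { 'retry-after' => '2' }),
  FakeResponse.new(429, {}),
  FakeResponse.new(200, {})
]
waits = []
final = with_retries(sleeper: ->(s) { waits << s }) { responses.shift }
puts "final status: #{final.code}, waits: #{waits.inspect}"
# Real use: with_retries { HTTParty.get(url, headers: headers) }
```

Retry-After may also be an HTTP date; this sketch ignores that form and falls back to the exponential delay.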
Conclusion
Setting custom headers with HTTParty is straightforward and essential for successful web scraping and API integration. Whether you're authenticating with APIs, mimicking browser behavior, or implementing anti-detection measures, proper header management will significantly improve your success rate.
Remember to always respect website terms of service, implement appropriate delays between requests, and use headers responsibly. For complex scraping scenarios that require JavaScript execution, consider complementing HTTParty with a headless-browser tool such as Puppeteer, which can manage browser sessions and authentication.
The key to successful header management is understanding your target's requirements, testing thoroughly, and implementing robust error handling to ensure your applications remain reliable and efficient.