How do I handle HTTPS websites with custom certificates in Ruby?
When web scraping HTTPS websites with custom certificates, self-signed certificates, or internal Certificate Authorities (CAs), Ruby's default SSL configuration may reject these connections. This comprehensive guide shows you how to properly configure Ruby to handle custom certificates while maintaining security best practices.
Understanding Custom Certificate Scenarios
Custom certificates are commonly encountered in:
- Internal corporate networks with private Certificate Authorities
- Development environments using self-signed certificates
- Staging servers with temporary certificates
- Legacy systems with outdated certificate chains
- API endpoints requiring client certificate authentication
- Load balancers with custom SSL termination
Unlike completely disabling SSL verification, properly handling custom certificates maintains security while enabling access to these protected resources.
Method 1: Using Custom CA Certificate Bundles
The most secure approach is to specify custom Certificate Authority (CA) bundles that include your required certificates.
With Net::HTTP (Built-in Library)
require 'net/http'
require 'openssl'
require 'uri'
class CustomCertScraper
  MAX_REDIRECTS = 5
  def initialize(ca_bundle_path = nil)
    @ca_bundle_path = ca_bundle_path
    @default_headers = {
      'User-Agent' => 'Ruby CustomCert Scraper 1.0',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    }
  end
  def fetch_with_custom_ca(url_string, redirects_left = MAX_REDIRECTS)
    url = URI.parse(url_string)
    http = Net::HTTP.new(url.host, url.port)
    http.use_ssl = true
    if @ca_bundle_path && File.exist?(@ca_bundle_path)
      # Trust only the CAs in the custom bundle
      http.ca_file = @ca_bundle_path
      puts "Using custom CA bundle: #{@ca_bundle_path}"
    else
      # Fall back to the system trust store OpenSSL was configured with
      store = OpenSSL::X509::Store.new
      store.set_default_paths
      http.cert_store = store
    end
    # Verify the certificate chain (Net::HTTP also checks the hostname)
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.verify_depth = 5
    # Optional: refuse protocols older than TLS 1.2 while still allowing TLS 1.3
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    # Connect/read timeouts (ssl_timeout sets the SSL session lifetime, not a timeout)
    http.open_timeout = 10
    http.read_timeout = 30
    request = Net::HTTP::Get.new(url.request_uri)
    @default_headers.each { |key, value| request[key] = value }
    response = http.request(request)
    handle_response(response, url, redirects_left)
  rescue OpenSSL::SSL::SSLError => e
    puts "SSL Error: #{e.message}"
    puts "Certificate verification failed. Check your CA bundle."
    nil
  rescue StandardError => e
    puts "Request failed: #{e.message}"
    nil
  end
  private
  def handle_response(response, url, redirects_left)
    case response
    when Net::HTTPSuccess
      response.body
    when Net::HTTPRedirection
      location = response['location']
      puts "Redirected to: #{location}"
      return nil if location.nil? || redirects_left <= 0
      # Location may be relative, so resolve it against the current URL
      fetch_with_custom_ca(URI.join(url.to_s, location).to_s, redirects_left - 1)
    else
      puts "HTTP Error: #{response.code} #{response.message}"
      nil
    end
  end
end
# Usage with custom CA bundle
scraper = CustomCertScraper.new('/path/to/custom-ca-bundle.pem')
content = scraper.fetch_with_custom_ca('https://internal.company.com/api/data')
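What VERIFY_PEER plus a custom CA file changes can be reproduced offline with OpenSSL::X509::Store, the same store type that Net::HTTP's cert_store accepts. This is a minimal sketch, assuming a throwaway private CA and server certificate generated in memory; the build_cert helper and the internal.example.test name are hypothetical and exist only for the demonstration. Hostname matching is a separate check and is not exercised here.

```ruby
require 'openssl'

# Hypothetical helper: build a certificate, self-signed unless an issuer is given
def build_cert(subject, key, issuer_cert: nil, issuer_key: nil, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = rand(1..2**31)
  cert.subject = OpenSSL::X509::Name.parse(subject)
  cert.issuer = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer_cert || cert
  cert.add_extension(ef.create_extension('basicConstraints', ca ? 'CA:TRUE' : 'CA:FALSE', true))
  usage = ca ? 'keyCertSign,cRLSign' : 'digitalSignature,keyEncipherment'
  cert.add_extension(ef.create_extension('keyUsage', usage, true))
  cert.sign(issuer_key || key, OpenSSL::Digest.new('SHA256'))
  cert
end

ca_key = OpenSSL::PKey::RSA.new(2048)
ca_cert = build_cert('/CN=Example Internal Root CA', ca_key, ca: true)
server_key = OpenSSL::PKey::RSA.new(2048)
server_cert = build_cert('/CN=internal.example.test', server_key,
                         issuer_cert: ca_cert, issuer_key: ca_key)

# System store only: the private CA is unknown, so the chain cannot be completed
default_store = OpenSSL::X509::Store.new
default_store.set_default_paths
trusted_by_default = default_store.verify(server_cert)
puts "System store: #{trusted_by_default} (#{default_store.error_string})"

# Store containing the private CA, equivalent to pointing ca_file at a bundle with it
custom_store = OpenSSL::X509::Store.new
custom_store.add_cert(ca_cert)
trusted_by_custom = custom_store.verify(server_cert)
puts "Custom store: #{trusted_by_custom}"
```

The only difference between the two checks is which CA certificates the store holds, which is exactly what the ca_file setting controls in CustomCertScraper.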
Creating Custom CA Bundles
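To create a custom CA bundle, combine your organization's root certificates with the system certificates. When ca_file is set, Net::HTTP trusts only the certificates in that file, so include the system roots if the same client must also reach public HTTPS sites:
# Obtain your organization's root certificate through a trusted channel (for example, from your PKI team).
# If this URL is itself served with a certificate from the private CA, curl cannot verify it yet.
curl -o company-root-ca.crt https://internal.company.com/ca/root.crt
# Check the fingerprint against the value published by your PKI team before trusting the file
openssl x509 -in company-root-ca.crt -noout -fingerprint -sha256
# Combine with the system CA bundle (macOS)
cat /etc/ssl/cert.pem company-root-ca.crt > custom-ca-bundle.pem
# On Ubuntu/Debian
cat /etc/ssl/certs/ca-certificates.crt company-root-ca.crt > custom-ca-bundle.pem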
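A bundle like this can also be assembled from Ruby, which makes it easy to drop unparsable entries and duplicates before pointing ca_file at the result. A minimal sketch under stated assumptions: build_ca_bundle is a hypothetical helper, the inputs are PEM files (DER files would need converting first), and the demo uses a throwaway self-signed certificate in a temporary directory instead of your real system bundle.

```ruby
require 'openssl'
require 'tmpdir'

PEM_CERT_PATTERN = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Hypothetical helper: merge PEM files into one bundle, skipping duplicates and bad entries
def build_ca_bundle(source_paths, output_path)
  certs = []
  source_paths.each do |path|
    File.read(path).scan(PEM_CERT_PATTERN).each do |pem|
      begin
        certs << OpenSSL::X509::Certificate.new(pem)
      rescue OpenSSL::X509::CertificateError
        warn "Skipping unparsable certificate in #{path}"
      end
    end
  end
  unique = certs.uniq(&:to_der) # identical certificates have identical DER encodings
  File.write(output_path, unique.map(&:to_pem).join)
  unique.size
end

# Demo: a throwaway self-signed root standing in for the organization's CA
key = OpenSSL::PKey::RSA.new(2048)
root = OpenSSL::X509::Certificate.new
root.version = 2
root.serial = 1
root.subject = root.issuer = OpenSSL::X509::Name.parse('/CN=Demo Company Root CA')
root.public_key = key
root.not_before = Time.now - 60
root.not_after = Time.now + 3600
root.sign(key, OpenSSL::Digest.new('SHA256'))

bundle_count = nil
Dir.mktmpdir do |dir|
  system_like = File.join(dir, 'system.pem')
  company = File.join(dir, 'company-root-ca.crt')
  File.write(system_like, root.to_pem)
  # Same root again plus one corrupt block, to exercise dedupe and skipping
  File.write(company, root.to_pem + "-----BEGIN CERTIFICATE-----\nnot base64\n-----END CERTIFICATE-----\n")
  bundle_count = build_ca_bundle([system_like, company], File.join(dir, 'custom-ca-bundle.pem'))
end
puts "Certificates in bundle: #{bundle_count}"
```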
Method 2: Client Certificate Authentication
Some HTTPS websites require client certificates for mutual TLS authentication:
require 'net/http'
require 'openssl'
require 'uri'
class ClientCertScraper
  def initialize(client_cert_path, client_key_path, ca_bundle_path = nil, key_password = nil)
    @client_cert_path = client_cert_path
    @client_key_path = client_key_path
    @ca_bundle_path = ca_bundle_path
    @key_password = key_password
  end
  def fetch_with_client_cert(url_string)
    url = URI.parse(url_string)
    http = Net::HTTP.new(url.host, url.port)
    http.use_ssl = true
    # Load the client certificate and private key
    # (PKey.read handles RSA and EC keys and decrypts the key when a password is given)
    http.cert = OpenSSL::X509::Certificate.new(File.read(@client_cert_path))
    http.key = OpenSSL::PKey.read(File.read(@client_key_path), @key_password)
    # Trust a custom CA bundle if provided; otherwise the system defaults are used
    http.ca_file = @ca_bundle_path if @ca_bundle_path && File.exist?(@ca_bundle_path)
    # Verify the server certificate as well (mutual TLS)
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.open_timeout = 10
    http.read_timeout = 30
    request = Net::HTTP::Get.new(url.request_uri)
    request['User-Agent'] = 'Ruby ClientCert Scraper'
    response = http.request(request)
    if response.is_a?(Net::HTTPSuccess)
      puts "Client certificate authentication successful"
      response.body
    else
      puts "Request failed: #{response.code} #{response.message}"
      nil
    end
  rescue OpenSSL::SSL::SSLError => e
    puts "SSL/Client Certificate Error: #{e.message}"
    nil
  rescue StandardError => e
    puts "Request Error: #{e.message}"
    nil
  end
end
# Usage
scraper = ClientCertScraper.new(
'/path/to/client.crt',
'/path/to/client.key',
'/path/to/ca-bundle.pem'
)
content = scraper.fetch_with_client_cert('https://secure-api.company.com/data')
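A mismatched certificate and key is a common mutual TLS mistake. A small sketch of a pre-flight check, assuming PEM input: load_client_credentials is a hypothetical helper that uses OpenSSL::X509::Certificate#check_private_key to reject a mismatch with a clear message before any connection is attempted. The demo generates a throwaway client certificate in memory.

```ruby
require 'openssl'

# Hypothetical helper: load client credentials and reject a cert/key mismatch up front
def load_client_credentials(cert_pem, key_pem, password = nil)
  cert = OpenSSL::X509::Certificate.new(cert_pem)
  key = OpenSSL::PKey.read(key_pem, password) # RSA or EC, optionally encrypted
  raise ArgumentError, 'client key does not match client certificate' unless cert.check_private_key(key)
  [cert, key]
end

# Demo with a throwaway self-signed client certificate
client_key = OpenSSL::PKey::RSA.new(2048)
client_cert = OpenSSL::X509::Certificate.new
client_cert.version = 2
client_cert.serial = 2
client_cert.subject = client_cert.issuer = OpenSSL::X509::Name.parse('/CN=demo-client')
client_cert.public_key = client_key
client_cert.not_before = Time.now - 60
client_cert.not_after = Time.now + 3600
client_cert.sign(client_key, OpenSSL::Digest.new('SHA256'))

cert, key = load_client_credentials(client_cert.to_pem, client_key.to_pem)
puts "Loaded matching pair for #{cert.subject}"

other_key = OpenSSL::PKey::RSA.new(2048)
mismatch_detected = begin
  load_client_credentials(client_cert.to_pem, other_key.to_pem)
  false
rescue ArgumentError
  true
end
puts "Mismatch detected: #{mismatch_detected}"
```

With real files, the returned pair can be assigned directly: http.cert, http.key = load_client_credentials(File.read(cert_path), File.read(key_path)).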
Method 3: HTTParty with Custom Certificates
HTTParty provides a cleaner interface for SSL configuration:
require 'httparty'
class HTTPartyCustomSSL
  include HTTParty
  def initialize(ssl_options = {})
    @ssl_options = {
      verify: true,
      ca_file: ssl_options[:ca_file],
      cert: ssl_options[:cert],
      key: ssl_options[:key],
      key_password: ssl_options[:key_password]
    }.compact
    # Set default headers (stored at the class level, so shared by all instances)
    self.class.headers({
      'User-Agent' => 'HTTParty Custom SSL Scraper',
      'Accept' => 'application/json, text/html'
    })
  end
  def fetch(url, additional_options = {})
    options = {
      verify: @ssl_options[:verify],
      ssl_ca_file: @ssl_options[:ca_file],
      timeout: 30
    }.compact.merge(additional_options)
    # HTTParty takes the client certificate and key as one PEM string via :pem
    if @ssl_options[:cert] && @ssl_options[:key]
      options[:pem] = [File.read(@ssl_options[:cert]), File.read(@ssl_options[:key])].join("\n")
      options[:pem_password] = @ssl_options[:key_password] if @ssl_options[:key_password]
    end
    response = self.class.get(url, options)
    if response.success?
      response.body
    else
      puts "Request failed: #{response.code} #{response.message}"
      nil
    end
  rescue StandardError => e
    puts "HTTParty SSL Error: #{e.message}"
    nil
  end
end
# Usage with custom CA bundle
ssl_config = {
ca_file: '/path/to/custom-ca-bundle.pem'
}
scraper = HTTPartyCustomSSL.new(ssl_config)
content = scraper.fetch('https://internal-api.company.com/endpoint')
# Usage with client certificate
ssl_config = {
ca_file: '/path/to/ca-bundle.pem',
cert: '/path/to/client.crt',
key: '/path/to/client.key'
}
secure_scraper = HTTPartyCustomSSL.new(ssl_config)
secure_content = secure_scraper.fetch('https://mutual-tls-api.company.com/data')
Method 4: Faraday with Advanced SSL Configuration
Faraday offers the most flexible SSL configuration options:
require 'faraday'
require 'faraday/follow_redirects' # Faraday 2.x: provided by the faraday-follow_redirects gem
require 'openssl'
class FaradayCustomSSL
  def initialize
    @connection = Faraday.new do |conn|
      # Configure SSL settings
      conn.ssl.verify = true
      conn.ssl.ca_file = '/path/to/custom-ca-bundle.pem'
      # Optional: require TLS 1.2 or newer (TLS 1.3 stays allowed) and restrict ciphers
      conn.ssl.min_version = :TLS1_2
      conn.ssl.ciphers = 'HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA'
      # Check that the certificate matches the requested hostname
      # (SNI is sent automatically by the default Net::HTTP adapter)
      conn.ssl.verify_hostname = true
      # Configure timeouts
      conn.options.timeout = 30
      conn.options.open_timeout = 10
      # Add response middleware
      conn.response :follow_redirects, limit: 3
      conn.response :raise_error
      # Set headers
      conn.headers = {
        'User-Agent' => 'Faraday Custom SSL Scraper',
        'Accept' => 'text/html,application/json'
      }
      conn.adapter Faraday.default_adapter
    end
  end
  def fetch(url)
    response = @connection.get(url)
    response.body
  rescue Faraday::SSLError => e
    puts "Faraday SSL Error: #{e.message}"
    handle_ssl_error(e, url)
  rescue Faraday::Error => e
    puts "Faraday Error: #{e.message}"
    nil
  end
  def fetch_with_client_cert(url, cert_path, key_path, key_password = nil)
    # Create a new connection that presents a client certificate
    cert_connection = Faraday.new do |conn|
      conn.ssl.verify = true
      conn.ssl.ca_file = '/path/to/custom-ca-bundle.pem'
      # Load the client certificate and key (PKey.read handles RSA and EC keys)
      conn.ssl.client_cert = OpenSSL::X509::Certificate.new(File.read(cert_path))
      conn.ssl.client_key = OpenSSL::PKey.read(File.read(key_path), key_password)
      conn.headers = {
        'User-Agent' => 'Faraday Client Cert Scraper'
      }
      conn.adapter Faraday.default_adapter
    end
    response = cert_connection.get(url)
    response.body
  rescue StandardError => e
    puts "Client certificate authentication failed: #{e.message}"
    nil
  end
  private
  def handle_ssl_error(error, url)
    puts "SSL Error Details:"
    puts "  URL: #{url}"
    puts "  Error: #{error.message}"
    puts "  Suggestions:"
    puts "  - Verify the CA bundle includes the required certificates"
    puts "  - Check whether the server requires SNI (Server Name Indication)"
    puts "  - Ensure the certificate chain is complete"
    nil
  end
end
# Usage
scraper = FaradayCustomSSL.new
content = scraper.fetch('https://internal.company.com/api')
# With client certificate
auth_content = scraper.fetch_with_client_cert(
'https://secure.company.com/api',
'/path/to/client.crt',
'/path/to/client.key'
)
Method 5: Environment-Based SSL Configuration
Create flexible SSL configurations that adapt to different environments:
require 'httparty'
class EnvironmentAwareSSLScraper
  include HTTParty
  def initialize
    configure_ssl_for_environment
    setup_headers
  end
  def fetch(url)
    response = self.class.get(url)
    if response.success?
      response.body
    else
      puts "Request failed: #{response.code} #{response.message}"
      nil
    end
  rescue => e
    puts "SSL Configuration Error: #{e.message}"
    puts "Environment: #{environment}"
    puts "Check SSL configuration for your environment"
    nil
  end
  private
  def configure_ssl_for_environment
    case environment
    when 'production'
      # Production: Use system CA bundle with strict verification
      self.class.default_options.update(
        verify: true,
        ssl_ca_file: system_ca_bundle_path
      )
      puts "Using production SSL configuration"
    when 'staging'
      # Staging: Use custom CA bundle for staging certificates
      self.class.default_options.update(
        verify: true,
        ssl_ca_file: staging_ca_bundle_path
      )
      puts "Using staging SSL configuration"
    when 'development'
      # Development: Use custom CA bundle but allow more flexibility
      if development_ca_bundle_exists?
        self.class.default_options.update(
          verify: true,
          ssl_ca_file: development_ca_bundle_path
        )
        puts "Using development SSL configuration with custom CA bundle"
      else
        puts "Warning: No development CA bundle found, using default system certificates"
      end
    else
      puts "Unknown environment: #{environment}, using default SSL settings"
    end
  end
  def setup_headers
    self.class.headers({
      'User-Agent' => "SSL Scraper (#{environment})",
      'Accept' => 'application/json, text/html',
      'Accept-Encoding' => 'gzip, deflate'
    })
  end
  def environment
    ENV['RAILS_ENV'] || ENV['RACK_ENV'] || 'development'
  end
  def system_ca_bundle_path
    # Try common system CA bundle locations
    candidates = [
      '/etc/ssl/certs/ca-certificates.crt', # Ubuntu/Debian
      '/etc/ssl/cert.pem',                  # macOS
      '/etc/pki/tls/certs/ca-bundle.crt'    # RHEL/CentOS
    ]
    candidates.find { |path| File.exist?(path) } || OpenSSL::X509::DEFAULT_CERT_FILE
  end
  def staging_ca_bundle_path
    'config/ssl/staging-ca-bundle.pem'
  end
  def development_ca_bundle_path
    'config/ssl/development-ca-bundle.pem'
  end
  def development_ca_bundle_exists?
    File.exist?(development_ca_bundle_path)
  end
end
# Usage
scraper = EnvironmentAwareSSLScraper.new
content = scraper.fetch('https://api.company.com/data')
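Environment-specific trust can also come from the process environment rather than from code. OpenSSL consults the SSL_CERT_FILE variable (and SSL_CERT_DIR) when a store loads its default paths, so a staging process could be started as SSL_CERT_FILE=config/ssl/staging-ca-bundle.pem ruby scraper.rb. Ruby's built-in default store is created when openssl is loaded, so set the variable before the process starts. The sketch below only reports which file a newly created default store would read; effective_default_ca_file is a hypothetical helper.

```ruby
require 'openssl'

# Hypothetical helper: the CA file that a store built with set_default_paths will load.
# OpenSSL checks the SSL_CERT_FILE environment variable before its compiled-in default.
def effective_default_ca_file
  ENV.fetch(OpenSSL::X509::DEFAULT_CERT_FILE_ENV, OpenSSL::X509::DEFAULT_CERT_FILE)
end

puts "Variable consulted:  #{OpenSSL::X509::DEFAULT_CERT_FILE_ENV}"
puts "Compiled-in default: #{OpenSSL::X509::DEFAULT_CERT_FILE}"
puts "Effective CA file:   #{effective_default_ca_file}"
```

Logging this at startup makes it obvious which trust anchors a given deployment is actually using.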
Debugging SSL Certificate Issues
When dealing with custom certificates, debugging is crucial. Here's a comprehensive debugging helper:
require 'net/http'
require 'openssl'
require 'socket'
class SSLDebugger
  def self.analyze_certificate(hostname, port = 443)
    puts "=== SSL Certificate Analysis for #{hostname}:#{port} ==="
    tcp_socket = TCPSocket.new(hostname, port)
    ssl_context = OpenSSL::SSL::SSLContext.new
    # For debugging only: finish the handshake even if verification fails,
    # then report the verification result against the system CA store below
    ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE
    ssl_context.cert_store = OpenSSL::X509::Store.new.tap(&:set_default_paths)
    ssl_socket = OpenSSL::SSL::SSLSocket.new(tcp_socket, ssl_context)
    ssl_socket.hostname = hostname # Send SNI
    ssl_socket.connect
    cert = ssl_socket.peer_cert
    cert_chain = ssl_socket.peer_cert_chain || []
    # Display certificate information
    puts "\n--- Certificate Details ---"
    puts "Subject: #{cert.subject}"
    puts "Issuer: #{cert.issuer}"
    puts "Serial: #{cert.serial}"
    puts "Not Before: #{cert.not_before}"
    puts "Not After: #{cert.not_after}"
    puts "Version: #{cert.version}"
    # Check validity period, hostname and chain
    puts "\n--- Certificate Validation ---"
    now = Time.now
    if cert.not_before <= now && now <= cert.not_after
      puts "✓ Certificate is within valid date range"
    else
      puts "✗ Certificate is expired or not yet valid"
    end
    if OpenSSL::SSL.verify_certificate_identity(cert, hostname)
      puts "✓ Certificate matches hostname #{hostname}"
    else
      puts "✗ Certificate does not match hostname #{hostname}"
    end
    if ssl_socket.verify_result == OpenSSL::X509::V_OK
      puts "✓ Chain verifies against the system CA store"
    else
      puts "✗ Chain does not verify against the system CA store (OpenSSL code #{ssl_socket.verify_result})"
    end
    # Display the certificate chain as sent by the server
    puts "\n--- Certificate Chain ---"
    cert_chain.each_with_index do |chain_cert, index|
      puts "#{index}: #{chain_cert.subject}"
    end
    # Display the negotiated protocol and cipher
    puts "\n--- SSL Configuration ---"
    puts "SSL Version: #{ssl_socket.ssl_version}"
    puts "Cipher: #{ssl_socket.cipher[0]}"
  rescue StandardError => e
    puts "Error analyzing certificate: #{e.message}"
  ensure
    ssl_socket&.close
    tcp_socket&.close
  end
  def self.test_custom_ca(hostname, ca_bundle_path, port = 443)
    puts "=== Testing Custom CA Bundle ==="
    puts "Hostname: #{hostname}:#{port}"
    puts "CA Bundle: #{ca_bundle_path}"
    unless File.exist?(ca_bundle_path)
      puts "✗ CA bundle file not found: #{ca_bundle_path}"
      return false
    end
    begin
      http = Net::HTTP.new(hostname, port)
      http.use_ssl = true
      http.ca_file = ca_bundle_path
      http.verify_mode = OpenSSL::SSL::VERIFY_PEER
      # Connect/read timeouts (ssl_timeout would only set the SSL session lifetime)
      http.open_timeout = 10
      http.read_timeout = 10
      request = Net::HTTP::Head.new('/')
      response = http.request(request)
      puts "✓ SSL connection successful with custom CA bundle"
      puts "Response code: #{response.code}"
      true
    rescue OpenSSL::SSL::SSLError => e
      puts "✗ SSL verification failed: #{e.message}"
      false
    rescue StandardError => e
      puts "✗ Connection failed: #{e.message}"
      false
    end
  end
end
# Usage for debugging
SSLDebugger.analyze_certificate('internal.company.com')
SSLDebugger.test_custom_ca('internal.company.com', '/path/to/custom-ca-bundle.pem')
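Two details are often what you end up comparing when a connection fails: the SHA-256 fingerprint (to match against the value your PKI team publishes) and the Subject Alternative Names (the names hostname verification actually checks). A minimal sketch; certificate_summary is a hypothetical helper, and in practice you could pass it ssl_socket.peer_cert from analyze_certificate. The demo certificate is generated in memory.

```ruby
require 'openssl'

# Hypothetical helper: fields commonly compared when diagnosing certificate problems
def certificate_summary(cert)
  san = cert.extensions.find { |ext| ext.oid == 'subjectAltName' }
  {
    subject: cert.subject.to_s,
    # Uppercase, colon-separated, like `openssl x509 -noout -fingerprint -sha256`
    sha256_fingerprint: OpenSSL::Digest::SHA256.hexdigest(cert.to_der).upcase.scan(/../).join(':'),
    subject_alt_names: san ? san.value.split(', ') : [],
    expires: cert.not_after
  }
end

# Demo certificate with two DNS names
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 3
cert.subject = cert.issuer = OpenSSL::X509::Name.parse('/CN=internal.example.test')
cert.public_key = key
cert.not_before = Time.now - 60
cert.not_after = Time.now + 3600
ef = OpenSSL::X509::ExtensionFactory.new
ef.subject_certificate = cert
ef.issuer_certificate = cert
cert.add_extension(ef.create_extension('subjectAltName', 'DNS:internal.example.test,DNS:api.example.test', false))
cert.sign(key, OpenSSL::Digest.new('SHA256'))

summary = certificate_summary(cert)
summary.each { |field, value| puts "#{field}: #{value}" }
```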
Best Practices for Custom Certificate Handling
1. Certificate Validation
Keep VERIFY_PEER enabled rather than disabling verification. Application-level checks such as these can supplement, but never replace, OpenSSL's chain verification:
require 'openssl'
# Supplementary checks on a certificate that has already passed chain verification
def validate_custom_certificate(cert, expected_hostname)
  # Check certificate validity period
  now = Time.now
  unless cert.not_before <= now && now <= cert.not_after
    raise "Certificate expired or not yet valid"
  end
  # Verify the hostname matches the certificate (SAN entries, wildcards included)
  unless OpenSSL::SSL.verify_certificate_identity(cert, expected_hostname)
    raise "Certificate hostname mismatch"
  end
  # Add any further checks your organization requires (issuer, key size, etc.)
  true
end
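OpenSSL::SSL.verify_certificate_identity applies RFC 6125-style wildcard rules: a wildcard covers exactly one left-most label. A small sketch with a throwaway certificate whose only SAN is the hypothetical *.internal.example.test shows which hostnames would pass the hostname check in validate_custom_certificate.

```ruby
require 'openssl'

# Throwaway certificate whose only DNS name is a wildcard
key = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 4
cert.subject = cert.issuer = OpenSSL::X509::Name.parse('/CN=*.internal.example.test')
cert.public_key = key
cert.not_before = Time.now - 60
cert.not_after = Time.now + 3600
ef = OpenSSL::X509::ExtensionFactory.new
ef.subject_certificate = cert
ef.issuer_certificate = cert
cert.add_extension(ef.create_extension('subjectAltName', 'DNS:*.internal.example.test', false))
cert.sign(key, OpenSSL::Digest.new('SHA256'))

hosts = %w[api.internal.example.test internal.example.test a.b.internal.example.test]
results = hosts.map { |host| [host, OpenSSL::SSL.verify_certificate_identity(cert, host)] }.to_h
results.each { |host, ok| puts "#{host}: #{ok ? 'matches' : 'does not match'}" }
```

Only the single-label subdomain matches; neither the bare parent domain nor a deeper subdomain is covered by the wildcard.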
2. Secure Certificate Storage
Certificates are public, but private keys must be protected: keep key files readable only by the owning user and prefer password-protected keys:
require 'openssl'
class SecureCertificateManager
  CERT_DIR = 'config/certificates'
  def self.load_certificate(cert_name)
    cert_path = File.join(CERT_DIR, "#{cert_name}.pem")
    raise "Certificate not found: #{cert_path}" unless File.exist?(cert_path)
    OpenSSL::X509::Certificate.new(File.read(cert_path))
  end
  def self.load_private_key(key_name, password = nil)
    key_path = File.join(CERT_DIR, "#{key_name}.key")
    raise "Private key not found: #{key_path}" unless File.exist?(key_path)
    # Private keys should not be accessible by group or others (POSIX permissions)
    if File.stat(key_path).mode & 0o077 != 0
      puts "Warning: #{key_path} is accessible by group or others; consider chmod 600"
    end
    # PKey.read handles RSA and EC keys, encrypted or not
    OpenSSL::PKey.read(File.read(key_path), password)
  end
end
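The counterpart to the permission check is writing keys safely in the first place. A minimal sketch, assuming a POSIX filesystem: save_private_key is a hypothetical helper that exports the key encrypted with AES-256-CBC and creates the file with mode 0600; the demo writes to a temporary directory and reads the key back with OpenSSL::PKey.read.

```ruby
require 'openssl'
require 'tmpdir'

# Hypothetical helper: store a private key encrypted and readable only by its owner
def save_private_key(key, path, password)
  pem = key.export(OpenSSL::Cipher.new('aes-256-cbc'), password)
  # The mode applies only when the file is created; chmod also tightens an existing file
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC, 0o600) { |f| f.write(pem) }
  File.chmod(0o600, path)
end

key = OpenSSL::PKey::RSA.new(2048)
mode = nil
round_trip_ok = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'client.key')
  save_private_key(key, path, 'demo-password')
  mode = File.stat(path).mode & 0o777
  loaded = OpenSSL::PKey.read(File.read(path), 'demo-password')
  round_trip_ok = loaded.to_der == key.to_der
end
puts format('Key file mode: %o, round trip ok: %s', mode, round_trip_ok)
```

A key saved this way passes the permission check in load_private_key and is loaded by passing the same password.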
3. Error Recovery and Fallbacks
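When certificate verification fails, fall back only to other trusted CA sources. Every strategy below keeps VERIFY_PEER enabled, so a failure never silently downgrades to an unverified connection:
require 'net/http'
require 'openssl'
require 'uri'
class ResilientSSLScraper
  def initialize(custom_ca_path: nil, mozilla_ca_path: nil)
    @custom_ca_path = custom_ca_path
    # For example, a copy of curl's Mozilla-derived cacert.pem downloaded in advance
    @mozilla_ca_path = mozilla_ca_path
    @fallback_strategies = [
      :use_custom_ca_bundle,
      :use_system_ca_bundle,
      :use_mozilla_ca_bundle
    ]
  end
  def fetch_with_fallback(url)
    @fallback_strategies.each do |strategy|
      begin
        body = send(strategy, url)
        return body if body
      rescue OpenSSL::SSL::SSLError => e
        puts "Strategy #{strategy} failed: #{e.message}"
      end
    end
    puts "All SSL strategies failed for #{url}"
    nil
  end
  private
  def use_custom_ca_bundle(url)
    fetch_with_ca_file(url, @custom_ca_path)
  end
  def use_system_ca_bundle(url)
    store = OpenSSL::X509::Store.new
    store.set_default_paths
    fetch_with_store(url, store)
  end
  def use_mozilla_ca_bundle(url)
    fetch_with_ca_file(url, @mozilla_ca_path)
  end
  # Skip the strategy (return nil) when the bundle is not configured or missing
  def fetch_with_ca_file(url, ca_path)
    return nil unless ca_path && File.exist?(ca_path)
    store = OpenSSL::X509::Store.new
    store.add_file(ca_path)
    fetch_with_store(url, store)
  end
  def fetch_with_store(url, store)
    uri = URI.parse(url)
    Net::HTTP.start(uri.host, uri.port, use_ssl: true,
                    verify_mode: OpenSSL::SSL::VERIFY_PEER, cert_store: store) do |http|
      response = http.get(uri.request_uri)
      response.body if response.is_a?(Net::HTTPSuccess)
    end
  end
end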
Summary
Handling HTTPS websites with custom certificates in Ruby requires understanding SSL/TLS configuration and proper certificate management. The key approaches include:
- Custom CA Bundles: The most secure method for handling internal or self-signed certificates
- Client Certificates: For mutual TLS authentication scenarios
- Environment-Specific Configuration: Adapting SSL settings based on deployment environment
- Proper Debugging: Using tools to analyze and troubleshoot certificate issues
When a site's authentication flow depends on browser behavior (JavaScript-driven logins or multi-step session handling), a plain HTTP client may not be enough; consider complementing your Ruby scraper with a browser automation tool for those steps. For sites with sophisticated session management, understanding how the site's browser sessions work will help you reproduce them correctly.
Always prioritize security by validating certificates properly rather than simply disabling SSL verification. Custom certificate handling allows you to maintain strong security while accessing the internal or specialized HTTPS resources your scraping project requires.