How do I handle HTTPS websites with custom certificates in Ruby?

When scraping HTTPS websites that use custom certificates, self-signed certificates, or internal Certificate Authorities (CAs), Ruby's default SSL configuration may reject the connection. This guide shows how to configure Ruby to trust such certificates while keeping certificate verification enabled.

Understanding Custom Certificate Scenarios

Custom certificates are commonly encountered in:

- Internal corporate networks with private Certificate Authorities
- Development environments using self-signed certificates
- Staging servers with temporary certificates
- Legacy systems with outdated certificate chains
- API endpoints requiring client certificate authentication
- Load balancers with custom SSL termination

Unlike completely disabling SSL verification, properly handling custom certificates maintains security while enabling access to these protected resources.

Method 1: Using Custom CA Certificate Bundles

The most secure approach is to specify custom Certificate Authority (CA) bundles that include your required certificates.

With Net::HTTP (Built-in Library)

require 'net/http'
require 'openssl'

class CustomCertScraper
  def initialize(ca_bundle_path = nil)
    @ca_bundle_path = ca_bundle_path
    @default_headers = {
      'User-Agent' => 'Ruby CustomCert Scraper 1.0',
      'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    }
  end

  def fetch_with_custom_ca(url_string)
    url = URI.parse(url_string)

    http = Net::HTTP.new(url.host, url.port)
    http.use_ssl = true

    # Set custom CA file; without one, Net::HTTP falls back to the
    # system default certificate store
    if @ca_bundle_path && File.exist?(@ca_bundle_path)
      http.ca_file = @ca_bundle_path
      puts "Using custom CA bundle: #{@ca_bundle_path}"
    end

    # Enable proper certificate verification
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.verify_depth = 5

    # Optional: require TLS 1.2 or newer (pinning ssl_version to
    # :TLSv1_2 would also rule out TLS 1.3)
    http.min_version = OpenSSL::SSL::TLS1_2_VERSION
    http.ssl_timeout = 30

    request = Net::HTTP::Get.new(url.request_uri)
    @default_headers.each { |key, value| request[key] = value }

    response = http.request(request)
    handle_response(response)

  rescue OpenSSL::SSL::SSLError => e
    puts "SSL Error: #{e.message}"
    puts "Certificate verification failed. Check your CA bundle."
    nil
  rescue => e
    puts "Request failed: #{e.message}"
    nil
  end

  private

  def handle_response(response)
    case response
    when Net::HTTPSuccess
      response.body
    when Net::HTTPRedirection
      puts "Redirected to: #{response['location']}"
      fetch_with_custom_ca(response['location']) if response['location']
    else
      puts "HTTP Error: #{response.code} #{response.message}"
      nil
    end
  end
end

# Usage with custom CA bundle
scraper = CustomCertScraper.new('/path/to/custom-ca-bundle.pem')
content = scraper.fetch_with_custom_ca('https://internal.company.com/api/data')

Creating Custom CA Bundles

To create a custom CA bundle, combine your organization's root certificates with system certificates:

# Download your organization's root certificate
curl -o company-root-ca.crt https://internal.company.com/ca/root.crt

# Combine with system CA bundle (on macOS)
cat /etc/ssl/cert.pem company-root-ca.crt > custom-ca-bundle.pem

# On Ubuntu/Debian
cat /etc/ssl/certs/ca-certificates.crt company-root-ca.crt > custom-ca-bundle.pem
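
As an alternative to concatenating files on disk, the same combination can be sketched in Ruby with an OpenSSL::X509::Store that loads the system defaults and then adds the organization's root certificate. The helper name build_cert_store and the file path in the usage comment are assumptions for illustration, not part of any library.

```ruby
require 'openssl'

# Hypothetical helper: trust the system CAs plus any extra PEM files.
def build_cert_store(extra_ca_files = [])
  store = OpenSSL::X509::Store.new
  store.set_default_paths # load the system CA file and directory
  extra_ca_files.each do |path|
    # Skip missing files so an optional bundle does not abort startup
    store.add_file(path) if File.exist?(path)
  end
  store
end

# Usage sketch (host and path are placeholders):
#   http = Net::HTTP.new('internal.company.com', 443)
#   http.use_ssl = true
#   http.cert_store = build_cert_store(['company-root-ca.crt'])
#   http.verify_mode = OpenSSL::SSL::VERIFY_PEER
```

Building the store in code avoids maintaining a merged bundle that goes stale whenever the system certificates are updated.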

Method 2: Client Certificate Authentication

Some HTTPS websites require client certificates for mutual TLS authentication:

require 'net/http'
require 'openssl'

class ClientCertScraper
  def initialize(client_cert_path, client_key_path, ca_bundle_path = nil)
    @client_cert_path = client_cert_path
    @client_key_path = client_key_path
    @ca_bundle_path = ca_bundle_path
  end

  def fetch_with_client_cert(url_string)
    url = URI.parse(url_string)

    http = Net::HTTP.new(url.host, url.port)
    http.use_ssl = true

    # Load client certificate and private key
    # (OpenSSL::PKey.read accepts RSA as well as EC keys)
    client_cert = OpenSSL::X509::Certificate.new(File.read(@client_cert_path))
    client_key = OpenSSL::PKey.read(File.read(@client_key_path))

    # Configure client certificate authentication
    http.cert = client_cert
    http.key = client_key

    # Set CA bundle if provided
    if @ca_bundle_path && File.exist?(@ca_bundle_path)
      http.ca_file = @ca_bundle_path
    end

    # Enable certificate verification
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER

    request = Net::HTTP::Get.new(url.request_uri)
    request['User-Agent'] = 'Ruby ClientCert Scraper'

    response = http.request(request)

    if response.code.to_i == 200
      puts "Client certificate authentication successful"
      response.body
    else
      puts "Authentication failed: #{response.code} #{response.message}"
      nil
    end

  rescue OpenSSL::SSL::SSLError => e
    puts "SSL/Client Certificate Error: #{e.message}"
    nil
  rescue => e
    puts "Request Error: #{e.message}"
    nil
  end
end

# Usage
scraper = ClientCertScraper.new(
  '/path/to/client.crt',
  '/path/to/client.key', 
  '/path/to/ca-bundle.pem'
)
content = scraper.fetch_with_client_cert('https://secure-api.company.com/data')
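
A certificate paired with the wrong private key is a frequent cause of mutual TLS handshake failures, and the resulting error message is often unhelpful. The sketch below, under the assumption that both files are PEM encoded, checks the pair before any connection is made. The helper name load_client_identity is hypothetical.

```ruby
require 'openssl'

# Hypothetical helper: parse a client certificate and key, and fail early
# if the key does not belong to the certificate.
def load_client_identity(cert_pem, key_pem, key_password = nil)
  cert = OpenSSL::X509::Certificate.new(cert_pem)
  key = OpenSSL::PKey.read(key_pem, key_password) # RSA, EC, and other key types

  unless cert.check_private_key(key)
    raise ArgumentError, 'Private key does not match the client certificate'
  end

  [cert, key]
end

# Usage sketch (paths are placeholders):
#   cert, key = load_client_identity(File.read('/path/to/client.crt'),
#                                    File.read('/path/to/client.key'))
#   http.cert = cert
#   http.key = key
```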

Method 3: HTTParty with Custom Certificates

HTTParty provides a cleaner interface for SSL configuration:

require 'httparty'

class HTTPartyCustomSSL
  include HTTParty

  def initialize(ssl_options = {})
    @ssl_options = {
      verify: true,
      ca_file: ssl_options[:ca_file],
      cert: ssl_options[:cert],
      key: ssl_options[:key]
    }.compact

    # Set default headers
    self.class.headers({
      'User-Agent' => 'HTTParty Custom SSL Scraper',
      'Accept' => 'application/json, text/html'
    })
  end

  def fetch(url, additional_options = {})
    options = {
      verify: @ssl_options[:verify],
      ssl_ca_file: @ssl_options[:ca_file],
      timeout: 30
    }.merge(additional_options)

    # Add client certificate if provided. HTTParty expects the certificate
    # and its private key together as one PEM string in the :pem option.
    if @ssl_options[:cert] && @ssl_options[:key]
      options[:pem] = File.read(@ssl_options[:cert]) + "\n" + File.read(@ssl_options[:key])
    end

    response = self.class.get(url, options)

    if response.success?
      response.body
    else
      puts "Request failed: #{response.code} #{response.message}"
      nil
    end

  rescue => e
    puts "HTTParty SSL Error: #{e.message}"
    nil
  end
end

# Usage with custom CA bundle
ssl_config = {
  ca_file: '/path/to/custom-ca-bundle.pem'
}
scraper = HTTPartyCustomSSL.new(ssl_config)
content = scraper.fetch('https://internal-api.company.com/endpoint')

# Usage with client certificate
ssl_config = {
  ca_file: '/path/to/ca-bundle.pem',
  cert: '/path/to/client.crt',
  key: '/path/to/client.key'
}
secure_scraper = HTTPartyCustomSSL.new(ssl_config)
secure_content = secure_scraper.fetch('https://mutual-tls-api.company.com/data')

Method 4: Faraday with Advanced SSL Configuration

Faraday offers the most flexible SSL configuration options:

require 'faraday'
require 'faraday/follow_redirects'  # faraday-follow_redirects gem (needed on Faraday 2.x)
require 'openssl'

class FaradayCustomSSL
  def initialize
    @connection = Faraday.new do |conn|
      # Configure SSL settings
      conn.ssl.verify = true
      conn.ssl.ca_file = '/path/to/custom-ca-bundle.pem'

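      # Optional: require TLS 1.2 or newer and restrict cipher suites
      # (ssl.ciphers is only supported by recent Faraday 2.x releases)
      conn.ssl.min_version = OpenSSL::SSL::TLS1_2_VERSION
      conn.ssl.ciphers = 'HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA'

      # Verify that the certificate matches the requested hostname
      conn.ssl.verify_hostname = true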

      # Configure timeouts
      conn.options.timeout = 30
      conn.options.open_timeout = 10

      # Add response middleware
      conn.response :follow_redirects, limit: 3
      conn.response :raise_error

      # Set headers
      conn.headers = {
        'User-Agent' => 'Faraday Custom SSL Scraper',
        'Accept' => 'text/html,application/json'
      }

      conn.adapter Faraday.default_adapter
    end
  end

  def fetch(url)
    response = @connection.get(url)
    response.body
  rescue Faraday::SSLError => e
    puts "Faraday SSL Error: #{e.message}"
    handle_ssl_error(e, url)
  rescue Faraday::Error => e
    puts "Faraday Error: #{e.message}"
    nil
  end

  def fetch_with_client_cert(url, cert_path, key_path, key_password = nil)
    # Create new connection with client certificate
    cert_connection = Faraday.new do |conn|
      conn.ssl.verify = true
      conn.ssl.ca_file = '/path/to/custom-ca-bundle.pem'

      # Load client certificate
      conn.ssl.client_cert = OpenSSL::X509::Certificate.new(File.read(cert_path))
      conn.ssl.client_key = OpenSSL::PKey::RSA.new(File.read(key_path), key_password)

      conn.headers = {
        'User-Agent' => 'Faraday Client Cert Scraper'
      }

      conn.adapter Faraday.default_adapter
    end

    response = cert_connection.get(url)
    response.body
  rescue => e
    puts "Client certificate authentication failed: #{e.message}"
    nil
  end

  private

  def handle_ssl_error(error, url)
    puts "SSL Error Details:"
    puts "  URL: #{url}"
    puts "  Error: #{error.message}"
    puts "  Suggestions:"
    puts "  - Verify the CA bundle includes the required certificates"
    puts "  - Check if the server uses SNI (Server Name Indication)"
    puts "  - Ensure certificate chain is complete"
    nil
  end
end

# Usage
scraper = FaradayCustomSSL.new
content = scraper.fetch('https://internal.company.com/api')

# With client certificate
auth_content = scraper.fetch_with_client_cert(
  'https://secure.company.com/api',
  '/path/to/client.crt',
  '/path/to/client.key'
)

Method 5: Environment-Based SSL Configuration

Create flexible SSL configurations that adapt to different environments:

require 'httparty'

class EnvironmentAwareSSLScraper
  include HTTParty

  def initialize
    configure_ssl_for_environment
    setup_headers
  end

  private

  def configure_ssl_for_environment
    case environment
    when 'production'
      # Production: Use system CA bundle with strict verification
      self.class.default_options.update(
        verify: true,
        ssl_ca_file: system_ca_bundle_path
      )
      puts "Using production SSL configuration"

    when 'staging'
      # Staging: Use custom CA bundle for staging certificates
      self.class.default_options.update(
        verify: true,
        ssl_ca_file: staging_ca_bundle_path
      )
      puts "Using staging SSL configuration"

    when 'development'
      # Development: Use custom CA bundle but allow more flexibility
      if development_ca_bundle_exists?
        self.class.default_options.update(
          verify: true,
          ssl_ca_file: development_ca_bundle_path
        )
        puts "Using development SSL configuration with custom CA bundle"
      else
        puts "Warning: No development CA bundle found, using default system certificates"
      end

    else
      puts "Unknown environment: #{environment}, using default SSL settings"
    end
  end

  def setup_headers
    self.class.headers({
      'User-Agent' => "SSL Scraper (#{environment})",
      'Accept' => 'application/json, text/html',
      'Accept-Encoding' => 'gzip, deflate'
    })
  end

  def environment
    ENV['RAILS_ENV'] || ENV['RACK_ENV'] || 'development'
  end

  def system_ca_bundle_path
    # Try common system CA bundle locations
    candidates = [
      '/etc/ssl/certs/ca-certificates.crt',  # Ubuntu/Debian
      '/etc/ssl/cert.pem',                   # macOS
      '/etc/pki/tls/certs/ca-bundle.crt'    # RHEL/CentOS
    ]

    candidates.find { |path| File.exist?(path) } || OpenSSL::X509::DEFAULT_CERT_FILE
  end

  def staging_ca_bundle_path
    'config/ssl/staging-ca-bundle.pem'
  end

  def development_ca_bundle_path
    'config/ssl/development-ca-bundle.pem'
  end

  def development_ca_bundle_exists?
    File.exist?(development_ca_bundle_path)
  end

  public

  def fetch(url)
    response = self.class.get(url)

    if response.success?
      response.body
    else
      puts "Request failed: #{response.code} #{response.message}"
      nil
    end
  rescue => e
    puts "SSL Configuration Error: #{e.message}"
    puts "Environment: #{environment}"
    puts "Check SSL configuration for your environment"
    nil
  end
end

# Usage
scraper = EnvironmentAwareSSLScraper.new
content = scraper.fetch('https://api.company.com/data')
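
The environment lookup can also be kept separate from the HTTP client, which makes it easy to test and to override per deployment. The following is a minimal sketch: the CA_BUNDLES table, the resolve_ca_bundle helper, and the SCRAPER_CA_BUNDLE variable are all assumptions introduced for illustration, and the paths mirror the ones used above.

```ruby
# Hypothetical lookup table of per-environment CA bundles.
CA_BUNDLES = {
  'staging'     => 'config/ssl/staging-ca-bundle.pem',
  'development' => 'config/ssl/development-ca-bundle.pem'
}.freeze

# Returns the CA bundle path to use, or nil to fall back to system defaults.
# SCRAPER_CA_BUNDLE is an assumed override variable, not a Ruby or OpenSSL standard.
def resolve_ca_bundle(env = ENV)
  override = env['SCRAPER_CA_BUNDLE']
  return override if override && !override.empty?

  name = env['RAILS_ENV'] || env['RACK_ENV'] || 'development'
  CA_BUNDLES[name]
end

# Usage sketch:
#   bundle = resolve_ca_bundle
#   http.ca_file = bundle if bundle
```

Passing the environment in as an argument lets the function accept a plain Hash in tests instead of the real ENV.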

Debugging SSL Certificate Issues

When dealing with custom certificates, debugging is crucial. Here's a comprehensive debugging helper:

require 'net/http'
require 'openssl'

class SSLDebugger
  def self.analyze_certificate(hostname, port = 443)
    puts "=== SSL Certificate Analysis for #{hostname}:#{port} ==="

    begin
      # Establish SSL connection to get certificate
      tcp_socket = TCPSocket.new(hostname, port)
      ssl_context = OpenSSL::SSL::SSLContext.new
      ssl_context.verify_mode = OpenSSL::SSL::VERIFY_NONE  # For debugging only

      ssl_socket = OpenSSL::SSL::SSLSocket.new(tcp_socket, ssl_context)
      ssl_socket.hostname = hostname  # Enable SNI
      ssl_socket.connect

      cert = ssl_socket.peer_cert
      cert_chain = ssl_socket.peer_cert_chain

      # Display certificate information
      puts "\n--- Certificate Details ---"
      puts "Subject: #{cert.subject}"
      puts "Issuer: #{cert.issuer}"
      puts "Serial: #{cert.serial}"
      puts "Not Before: #{cert.not_before}"
      puts "Not After: #{cert.not_after}"
      puts "Version: #{cert.version}"

      # Check if certificate is valid
      puts "\n--- Certificate Validation ---"
      now = Time.now
      if cert.not_before <= now && now <= cert.not_after
        puts "✓ Certificate is within valid date range"
      else
        puts "✗ Certificate is expired or not yet valid"
      end

      # Display certificate chain
      puts "\n--- Certificate Chain ---"
      cert_chain.each_with_index do |chain_cert, index|
        puts "#{index}: #{chain_cert.subject}"
      end

      # Display the negotiated protocol version and cipher suite
      puts "\n--- SSL Configuration ---"
      puts "SSL Version: #{ssl_socket.ssl_version}"
      puts "Cipher: #{ssl_socket.cipher[0]}"

      ssl_socket.close
      tcp_socket.close

    rescue => e
      puts "Error analyzing certificate: #{e.message}"
    end
  end

  def self.test_custom_ca(hostname, ca_bundle_path, port = 443)
    puts "=== Testing Custom CA Bundle ==="
    puts "Hostname: #{hostname}:#{port}"
    puts "CA Bundle: #{ca_bundle_path}"

    unless File.exist?(ca_bundle_path)
      puts "✗ CA bundle file not found: #{ca_bundle_path}"
      return false
    end

    begin
      http = Net::HTTP.new(hostname, port)
      http.use_ssl = true
      http.ca_file = ca_bundle_path
      http.verify_mode = OpenSSL::SSL::VERIFY_PEER
      http.ssl_timeout = 10

      request = Net::HTTP::Head.new('/')
      response = http.request(request)

      puts "✓ SSL connection successful with custom CA bundle"
      puts "Response code: #{response.code}"
      true

    rescue OpenSSL::SSL::SSLError => e
      puts "✗ SSL verification failed: #{e.message}"
      false
    rescue => e
      puts "✗ Connection failed: #{e.message}"
      false
    end
  end
end

# Usage for debugging
SSLDebugger.analyze_certificate('internal.company.com')
SSLDebugger.test_custom_ca('internal.company.com', '/path/to/custom-ca-bundle.pem')
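
When the server is unreachable, or only a certificate file is at hand, a similar summary can be produced offline. This is a rough sketch that assumes a PEM-encoded certificate; the helper name summarize_certificate and the returned keys are hypothetical.

```ruby
require 'openssl'

# Hypothetical helper: summarize a PEM certificate without opening a connection.
def summarize_certificate(pem, now = Time.now)
  cert = OpenSSL::X509::Certificate.new(pem)
  {
    subject: cert.subject.to_s,
    issuer: cert.issuer.to_s,
    # Identical subject and issuer names indicate a self-issued certificate
    self_issued: cert.subject.to_s == cert.issuer.to_s,
    # Negative values mean the certificate has already expired
    days_remaining: ((cert.not_after - now) / 86_400).floor
  }
end

# Usage sketch (path is a placeholder):
#   info = summarize_certificate(File.read('company-root-ca.crt'))
#   puts "#{info[:subject]} expires in #{info[:days_remaining]} days"
```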

Best Practices for Custom Certificate Handling

1. Certificate Validation

Always validate certificates properly rather than disabling verification entirely:

def validate_custom_certificate(cert, expected_hostname)
  # Check certificate validity period
  now = Time.now
  unless cert.not_before <= now && now <= cert.not_after
    raise "Certificate expired or not yet valid"
  end

  # Verify hostname matches certificate
  unless OpenSSL::SSL.verify_certificate_identity(cert, expected_hostname)
    raise "Certificate hostname mismatch"
  end

  # Additional custom validations
  # Check specific certificate properties your organization requires
  true
end
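
One example of an additional check, offered as an assumption about what an organization might require, is comparing the certificate's SHA-256 fingerprint against an allow-list. The helper names below are hypothetical.

```ruby
require 'openssl'

# Hypothetical helper: SHA-256 fingerprint of a certificate's DER encoding,
# as lowercase hex without separators.
def certificate_fingerprint(cert)
  OpenSSL::Digest::SHA256.hexdigest(cert.to_der)
end

# Hypothetical check: true when the certificate matches one of the
# fingerprints the organization has chosen to trust. Accepts fingerprints
# written with or without colons, in either case.
def fingerprint_trusted?(cert, allowed_fingerprints)
  normalized = allowed_fingerprints.map { |fp| fp.delete(':').downcase }
  normalized.include?(certificate_fingerprint(cert))
end
```

A fingerprint check ties the connection to specific certificates, so the allow-list has to be updated whenever a certificate is renewed.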

2. Secure Certificate Storage

Store certificates securely and manage them properly:

class SecureCertificateManager
  CERT_DIR = 'config/certificates'

  def self.load_certificate(cert_name)
    cert_path = File.join(CERT_DIR, "#{cert_name}.pem")

    unless File.exist?(cert_path)
      raise "Certificate not found: #{cert_path}"
    end

    # Check file permissions (should not be readable by group or others)
    stat = File.stat(cert_path)
    if stat.mode & 0o044 != 0
      puts "Warning: Certificate file has loose permissions"
    end

    OpenSSL::X509::Certificate.new(File.read(cert_path))
  end

  def self.load_private_key(key_name, password = nil)
    key_path = File.join(CERT_DIR, "#{key_name}.key")

    unless File.exist?(key_path)
      raise "Private key not found: #{key_path}"
    end

    OpenSSL::PKey::RSA.new(File.read(key_path), password)
  end
end
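
Permissions matter most for private keys, since certificates themselves are public. A minimal sketch of a stricter check for key files follows; the helper name owner_only? is hypothetical and the check assumes a Unix-style permission model.

```ruby
# Hypothetical helper: true when no group or other permission bits are set,
# i.e. only the file's owner can read, write, or execute it.
def owner_only?(path)
  (File.stat(path).mode & 0o077).zero?
end

# Usage sketch (path is a placeholder):
#   warn 'Private key is accessible to other users' unless owner_only?('/path/to/client.key')
```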

3. Error Recovery and Fallbacks

Implement graceful handling when certificate validation fails:

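require 'net/http'
require 'openssl'
require 'uri'

class ResilientSSLScraper
  # Placeholder paths; adjust them for your deployment
  CUSTOM_CA_BUNDLE = 'config/ssl/custom-ca-bundle.pem'
  MOZILLA_CA_BUNDLE = 'config/ssl/cacert.pem'

  def initialize
    @fallback_strategies = [
      :use_custom_ca_bundle,
      :use_system_ca_bundle,
      :use_mozilla_ca_bundle
    ]
  end

  def fetch_with_fallback(url)
    @fallback_strategies.each do |strategy|
      begin
        return send(strategy, url)
      rescue OpenSSL::SSL::SSLError => e
        puts "Strategy #{strategy} failed: #{e.message}"
        next
      end
    end

    puts "All SSL strategies failed for #{url}"
    nil
  end

  private

  def use_custom_ca_bundle(url)
    fetch_with_ca(url, CUSTOM_CA_BUNDLE)
  end

  def use_system_ca_bundle(url)
    # No ca_file: Net::HTTP uses the system default certificate store
    fetch_with_ca(url)
  end

  def use_mozilla_ca_bundle(url)
    # Assumes a copy of Mozilla's CA bundle saved at MOZILLA_CA_BUNDLE
    fetch_with_ca(url, MOZILLA_CA_BUNDLE)
  end

  # Every strategy keeps verification enabled; only the trust anchors differ
  def fetch_with_ca(url, ca_file = nil)
    uri = URI.parse(url)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
    http.ca_file = ca_file if ca_file
    http.request(Net::HTTP::Get.new(uri.request_uri)).body
  end
end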

Summary

Handling HTTPS websites with custom certificates in Ruby requires understanding SSL/TLS configuration and proper certificate management. The key approaches include:

  1. Custom CA Bundles: The most secure method for handling internal or self-signed certificates
  2. Client Certificates: For mutual TLS authentication scenarios
  3. Environment-Specific Configuration: Adapting SSL settings based on deployment environment
  4. Proper Debugging: Using tools to analyze and troubleshoot certificate issues

When dealing with complex authentication flows that require browser-like behavior, consider complementing your Ruby scraping with tools that can handle authentication processes more naturally. For websites that require sophisticated session management, understanding how browser sessions work can provide valuable insights.

Always prioritize security by validating certificates properly rather than simply disabling SSL verification. Custom certificate handling allows you to maintain strong security while accessing the internal or specialized HTTPS resources your scraping project requires.
