How to Handle Cookies with Specific Domain or Path Settings in Mechanize
Managing cookies with specific domain and path settings is crucial for successful web scraping with Mechanize, especially when dealing with complex authentication systems, subdomain navigation, or applications that use path-based session management. This guide covers creating, inspecting, modifying, persisting, and debugging cookies with explicit domain and path settings in Mechanize.
Understanding Cookie Domain and Path Attributes
Before diving into implementation, it's essential to understand how cookie domains and paths work:
- Domain: Specifies which hosts can receive the cookie. A cookie with domain=.example.com can be sent to www.example.com, api.example.com, etc.
- Path: Defines the URL path that must exist in the requested URL for the cookie to be sent. A cookie with path=/admin will only be sent to URLs starting with /admin.
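The domain rule can be illustrated with a small standalone sketch. This is not Mechanize's own code: domain_match? is a hypothetical helper that follows the host-suffix rule described in RFC 6265 and, as a simplification, ignores IP addresses and public-suffix checks.

```ruby
# Sketch of RFC 6265 domain matching (hypothetical helper, not a Mechanize API).
def domain_match?(host, cookie_domain)
  host = host.downcase
  # A leading dot in the Domain attribute is ignored
  domain = cookie_domain.downcase.sub(/\A\./, '')
  return true if host == domain
  # A suffix only counts when it starts on a label boundary
  host.end_with?(".#{domain}")
end

puts domain_match?('www.example.com', '.example.com') # => true
puts domain_match?('api.example.com', 'example.com')  # => true
puts domain_match?('badexample.com', 'example.com')   # => false
```

The last call shows why a plain string suffix check is not enough: badexample.com ends with example.com but is a different registrable domain.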
Basic Cookie Management in Mechanize
Mechanize provides robust cookie handling through its built-in cookie jar functionality:
require 'mechanize'
require 'logger' # Logger is used for the debug output below
# Create a new Mechanize agent
agent = Mechanize.new
# Access the cookie jar
cookie_jar = agent.cookie_jar
# Enable verbose logging to help debug cookie handling
agent.log = Logger.new($stdout)
agent.log.level = Logger::DEBUG
Setting Cookies with Specific Domain Settings
Creating Domain-Specific Cookies
require 'mechanize'
agent = Mechanize.new
# Create a cookie with specific domain settings
cookie = Mechanize::Cookie.new(
  name: 'session_token',
  value: 'abc123xyz789',
  domain: '.example.com', # Will work for all subdomains
  path: '/',
  secure: true,
  httponly: true
)
# Add the cookie to the jar
agent.cookie_jar.add(URI('https://www.example.com'), cookie)
# Verify cookie was added
puts "Cookies for example.com:"
agent.cookie_jar.cookies(URI('https://www.example.com')).each do |c|
  puts "#{c.name}: #{c.value} (Domain: #{c.domain}, Path: #{c.path})"
end
Handling Subdomain Cookies
# Set up cookies for different subdomains
subdomains = ['www', 'api', 'admin']
subdomains.each do |subdomain|
  cookie = Mechanize::Cookie.new(
    name: "#{subdomain}_session",
    value: "#{subdomain}_token_123",
    domain: "#{subdomain}.example.com", # Specific subdomain
    path: '/',
    expires: Time.now + 3600 # 1 hour from now
  )
  agent.cookie_jar.add(URI("https://#{subdomain}.example.com"), cookie)
end
# Test cookie availability across subdomains
['www.example.com', 'api.example.com', 'shop.example.com'].each do |host|
  uri = URI("https://#{host}")
  cookies = agent.cookie_jar.cookies(uri)
  puts "#{host}: #{cookies.length} cookies available"
end
Managing Path-Specific Cookies
Setting Cookies for Specific Paths
# Create cookies for different application sections
sections = [
  { path: '/admin', name: 'admin_session', value: 'admin_token_456' },
  { path: '/api/v1', name: 'api_key', value: 'api_secret_789' },
  { path: '/user/profile', name: 'profile_prefs', value: 'theme=dark' }
]
sections.each do |section|
  cookie = Mechanize::Cookie.new(
    name: section[:name],
    value: section[:value],
    domain: 'example.com',
    path: section[:path],
    secure: true
  )
  agent.cookie_jar.add(URI('https://example.com'), cookie)
end
# Test path-specific cookie behavior
test_urls = [
  'https://example.com/',
  'https://example.com/admin',
  'https://example.com/admin/users',
  'https://example.com/api/v1/data',
  'https://example.com/user/profile/settings'
]
test_urls.each do |url|
  uri = URI(url)
  cookies = agent.cookie_jar.cookies(uri)
  puts "#{url}: #{cookies.map(&:name).join(', ')}"
end
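Which cookies the test loop reports for each URL depends on how the cookie path is compared with the request path. The following standalone sketch shows the path-matching rule from RFC 6265; path_match? is a hypothetical helper written for illustration, not a Mechanize method, and the /administrator URL is an extra case added to show the segment-boundary check.

```ruby
require 'uri'

# Sketch of RFC 6265 path matching (hypothetical helper, not a Mechanize API).
def path_match?(request_path, cookie_path)
  request_path = '/' if request_path.nil? || request_path.empty?
  return true if request_path == cookie_path
  return false unless request_path.start_with?(cookie_path)
  # A prefix only matches on a path-segment boundary
  cookie_path.end_with?('/') || request_path[cookie_path.length] == '/'
end

cookie_paths = {
  'admin_session' => '/admin',
  'api_key' => '/api/v1',
  'profile_prefs' => '/user/profile'
}

%w[
  https://example.com/
  https://example.com/admin
  https://example.com/admin/users
  https://example.com/administrator
  https://example.com/api/v1/data
].each do |url|
  path = URI(url).path
  names = cookie_paths.select { |_name, cpath| path_match?(path, cpath) }.keys
  puts "#{url}: #{names.empty? ? '(none)' : names.join(', ')}"
end
```

Note that /administrator does not match a cookie scoped to /admin, because the character after the prefix is not a slash.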
Advanced Cookie Configuration
Creating Cookies with All Attributes
def create_advanced_cookie(agent, options = {})
  defaults = {
    name: 'advanced_session',
    value: 'secure_token_xyz',
    domain: '.example.com',
    path: '/',
    secure: true,
    httponly: true,
    expires: Time.now + 86400, # 24 hours
    max_age: 86400, # listed for reference; not passed to the cookie below
    same_site: 'Strict' # listed for reference; not passed to the cookie below
  }
  config = defaults.merge(options)
  cookie = Mechanize::Cookie.new(
    name: config[:name],
    value: config[:value],
    domain: config[:domain],
    path: config[:path],
    secure: config[:secure],
    httponly: config[:httponly],
    expires: config[:expires]
  )
  # Strip the leading dot so the domain can be used as a URI host
  agent.cookie_jar.add(URI("https://#{config[:domain].sub(/^\./, '')}"), cookie)
  cookie
end
# Usage examples
agent = Mechanize.new
# Production session cookie
create_advanced_cookie(agent, {
  name: 'prod_session',
  domain: '.myapp.com',
  path: '/',
  secure: true
})
# Development cookie with different settings
create_advanced_cookie(agent, {
  name: 'dev_session',
  domain: 'localhost',
  path: '/dev',
  secure: false,
  expires: Time.now + 3600
})
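The defaults above carry both an expires time and a max_age value. When a server sends both attributes, RFC 6265 gives Max-Age precedence over Expires, and a cookie with neither is a session cookie. A minimal sketch of that rule, using a hypothetical effective_expiry helper that is independent of Mechanize:

```ruby
# Resolve the effective expiry as RFC 6265 describes (hypothetical helper).
# Returns nil for a session cookie that has neither attribute.
def effective_expiry(expires: nil, max_age: nil, now: Time.now)
  return now + max_age if max_age # Max-Age wins when both are present
  expires
end

now = Time.at(1_700_000_000)
both = effective_expiry(expires: now + 86_400, max_age: 3600, now: now)
puts both == now + 3600 # => true
```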
Extracting and Manipulating Existing Cookies
Reading Cookies from Server Responses
agent = Mechanize.new
# Make a request to get cookies from server
page = agent.get('https://example.com/login')
# Examine all cookies received
agent.cookie_jar.each do |cookie|
  puts "Cookie: #{cookie.name}"
  puts " Value: #{cookie.value}"
  puts " Domain: #{cookie.domain}"
  puts " Path: #{cookie.path}"
  puts " Secure: #{cookie.secure?}"
  puts " HttpOnly: #{cookie.httponly?}"
  puts " Expires: #{cookie.expires}"
  puts "---"
end
Modifying Existing Cookies
# Find and modify a specific cookie
session_cookie = agent.cookie_jar.find do |cookie|
  cookie.name == 'JSESSIONID' && cookie.domain.include?('example.com')
end
if session_cookie
  # Create a modified version
  modified_cookie = Mechanize::Cookie.new(
    name: session_cookie.name,
    value: session_cookie.value,
    domain: '.example.com', # Change to allow subdomains
    path: '/', # Broaden the path
    secure: true,
    httponly: true,
    expires: Time.now + 7200 # Extend expiration
  )
  # Remove old cookie and add modified one
  agent.cookie_jar.delete(session_cookie)
  agent.cookie_jar.add(URI('https://example.com'), modified_cookie)
end
Cookie Persistence and Management
Saving and Loading Cookie Files
require 'json' # needed by the JSON helpers below
# Save cookies to file (the jar's default format is YAML, so request cookies.txt explicitly)
# Session cookies without an expiry are skipped unless session: true is also passed
agent.cookie_jar.save('cookies.txt', format: :cookiestxt)
# Load cookies from file in a new session
new_agent = Mechanize.new
new_agent.cookie_jar.load('cookies.txt', format: :cookiestxt)
# Custom cookie serialization
def export_cookies_to_json(agent, filename)
  cookies_data = agent.cookie_jar.map do |cookie|
    {
      name: cookie.name,
      value: cookie.value,
      domain: cookie.domain,
      path: cookie.path,
      secure: cookie.secure?,
      httponly: cookie.httponly?,
      expires: cookie.expires&.to_i
    }
  end
  File.write(filename, JSON.pretty_generate(cookies_data))
end
def import_cookies_from_json(agent, filename)
  cookies_data = JSON.parse(File.read(filename))
  cookies_data.each do |cookie_data|
    cookie = Mechanize::Cookie.new(
      name: cookie_data['name'],
      value: cookie_data['value'],
      domain: cookie_data['domain'],
      path: cookie_data['path'],
      secure: cookie_data['secure'],
      httponly: cookie_data['httponly'],
      expires: cookie_data['expires'] ? Time.at(cookie_data['expires']) : nil
    )
    agent.cookie_jar.add(URI("https://#{cookie_data['domain'].sub(/^\./, '')}"), cookie)
  end
end
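A JSON export can sit on disk long enough for some entries to expire before they are imported again. One possible refinement, sketched here with a hypothetical live_cookie_entries helper, is to drop expired entries before rebuilding cookies. The sketch assumes the JSON layout produced by the export helper above, where expires is a Unix timestamp or null.

```ruby
require 'json'

# Keep only entries that are session cookies or have not yet expired
# (hypothetical helper operating on the exported JSON text).
def live_cookie_entries(json_text, now: Time.now)
  JSON.parse(json_text).reject do |entry|
    entry['expires'] && Time.at(entry['expires']) <= now
  end
end

sample = JSON.generate([
  { name: 'old', value: '1', domain: 'example.com', path: '/', expires: 1_000 },
  { name: 'session', value: '2', domain: 'example.com', path: '/', expires: nil }
])
puts live_cookie_entries(sample).map { |entry| entry['name'] }.inspect # => ["session"]
```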
Debugging Cookie Issues
Cookie Inspection and Troubleshooting
def debug_cookie_behavior(agent, url)
  uri = URI(url)
  puts "Debugging cookies for: #{url}"
  puts "Host: #{uri.host}"
  puts "Path: #{uri.path}"
  puts ""
  # Show all cookies in jar
  puts "All cookies in jar:"
  agent.cookie_jar.each_with_index do |cookie, index|
    puts "#{index + 1}. #{cookie.name} = #{cookie.value}"
    puts " Domain: #{cookie.domain} | Path: #{cookie.path}"
    puts " Secure: #{cookie.secure?} | HttpOnly: #{cookie.httponly?}"
    puts ""
  end
  # Show cookies that would be sent to this URL
  applicable_cookies = agent.cookie_jar.cookies(uri)
  puts "Cookies that would be sent to #{url}:"
  if applicable_cookies.empty?
    puts " None"
  else
    applicable_cookies.each do |cookie|
      puts " #{cookie.name} = #{cookie.value}"
    end
  end
  puts "=" * 50
end
# Usage
agent = Mechanize.new
debug_cookie_behavior(agent, 'https://www.example.com/admin/dashboard')
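When a cookie appears in the jar but not in the list for a URL, domain and path are not the only possible causes. A Secure cookie is withheld from plain HTTP URLs, and an expired cookie is no longer sent. The sketch below checks those two conditions on plain values; send_blockers is a hypothetical helper, not part of Mechanize.

```ruby
require 'uri'

# List reasons a cookie would be withheld from a URL, beyond domain and path
# (hypothetical helper working on plain values rather than Mechanize objects).
def send_blockers(url, secure:, expires: nil, now: Time.now)
  reasons = []
  reasons << 'Secure cookie on a non-HTTPS URL' if secure && URI(url).scheme != 'https'
  reasons << 'cookie has expired' if expires && expires <= now
  reasons
end

puts send_blockers('http://www.example.com/admin', secure: true).inspect
# => ["Secure cookie on a non-HTTPS URL"]
```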
Integration with Authentication Systems
Multi-Domain Authentication Flow
class MultiDomainAuthenticator
  def initialize
    @agent = Mechanize.new
    @agent.user_agent_alias = 'Mac Safari'
  end
  def authenticate_main_domain(username, password)
    # Login to main domain
    login_page = @agent.get('https://auth.example.com/login')
    login_form = login_page.form_with(action: /login/)
    login_form.username = username
    login_form.password = password
    result = @agent.submit(login_form)
    # Extract authentication token from response
    auth_token = extract_auth_token(result)
    # Nothing to share across subdomains if no token was found
    return nil unless auth_token
    # Set cross-domain authentication cookie
    auth_cookie = Mechanize::Cookie.new(
      name: 'auth_token',
      value: auth_token,
      domain: '.example.com', # Available to all subdomains
      path: '/',
      secure: true,
      httponly: true,
      expires: Time.now + 3600
    )
    @agent.cookie_jar.add(URI('https://example.com'), auth_cookie)
  end
  def access_protected_resource(subdomain, path)
    url = "https://#{subdomain}.example.com#{path}"
    # Mechanize does not execute JavaScript; JavaScript-heavy authentication
    # flows may need a browser automation tool such as Puppeteer instead
    @agent.get(url)
  end
  private
  def extract_auth_token(response)
    # Extract the token from the response body; returns nil when it is absent
    response.body[/auth_token['"]:['"]([^'"]+)/, 1]
  end
end
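The regular expression in extract_auth_token is sensitive to whitespace and quoting in the response. If the login endpoint returns a JSON body, which is an assumption and not something the example above guarantees, parsing it is usually more reliable. A minimal sketch with a hypothetical extract_auth_token_from_json helper:

```ruby
require 'json'

# Read the token from a JSON response body (hypothetical alternative to the regex).
# Returns nil when the body is not JSON or the key is missing.
def extract_auth_token_from_json(body, key: 'auth_token')
  data = JSON.parse(body)
  data.is_a?(Hash) ? data[key] : nil
rescue JSON::ParserError
  nil
end

puts extract_auth_token_from_json('{"auth_token": "abc123"}') # => abc123
```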
Best Practices and Security Considerations
Secure Cookie Handling
require 'base64'
require 'securerandom'
class SecureCookieManager
  def initialize(agent)
    @agent = agent
  end
  def create_secure_session_cookie(domain, session_id)
    # Always use secure settings for production
    cookie = Mechanize::Cookie.new(
      name: 'secure_session',
      value: encrypt_session_id(session_id),
      domain: domain,
      path: '/',
      secure: true, # Only send over HTTPS
      httponly: true, # Hide the cookie from client-side scripts
      expires: Time.now + 1800 # 30 minutes
    )
    @agent.cookie_jar.add(URI("https://#{domain}"), cookie)
  end
  def rotate_session_cookies
    # Find existing session cookies
    session_cookies = @agent.cookie_jar.select do |cookie|
      cookie.name.include?('session')
    end
    session_cookies.each do |old_cookie|
      # Create new cookie with updated value
      new_cookie = Mechanize::Cookie.new(
        name: old_cookie.name,
        value: generate_new_session_value,
        domain: old_cookie.domain,
        path: old_cookie.path,
        secure: old_cookie.secure?,
        httponly: old_cookie.httponly?,
        expires: Time.now + 1800
      )
      # Replace old with new
      @agent.cookie_jar.delete(old_cookie)
      @agent.cookie_jar.add(URI("https://#{old_cookie.domain}"), new_cookie)
    end
  end
  private
  def encrypt_session_id(session_id)
    # Placeholder only: Base64 is an encoding, not encryption.
    # Implement your encryption logic here
    Base64.strict_encode64(session_id)
  end
  def generate_new_session_value
    SecureRandom.hex(32)
  end
end
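The encrypt_session_id placeholder above only encodes the value. As one possible way to fill it in, the following sketch uses AES-256-GCM from Ruby's OpenSSL bindings and hex-encodes the result so it is safe as a cookie value. The key handling and the token layout (IV, then authentication tag, then ciphertext) are assumptions made for this example, not something the original code prescribes.

```ruby
require 'openssl'

# Encrypt a non-empty session id with AES-256-GCM; returns a hex string
# laid out as 12-byte IV + 16-byte auth tag + ciphertext (layout is an assumption).
def encrypt_session_id(session_id, key)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(session_id) + cipher.final
  (iv + cipher.auth_tag + ciphertext).unpack1('H*')
end

# Reverse of encrypt_session_id; raises OpenSSL::Cipher::CipherError on tampering.
def decrypt_session_id(token, key)
  raw = [token].pack('H*')
  decipher = OpenSSL::Cipher.new('aes-256-gcm')
  decipher.decrypt
  decipher.key = key
  decipher.iv = raw[0, 12]
  decipher.auth_tag = raw[12, 16]
  (decipher.update(raw[28..-1]) + decipher.final).force_encoding('UTF-8')
end

key = OpenSSL::Random.random_bytes(32) # in practice, load a persistent secret instead
token = encrypt_session_id('session-42', key)
puts decrypt_session_id(token, key) # => session-42
```

A random key generated at startup, as in the usage lines, only works within a single process; a long-running scraper would need to store the key somewhere safe.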
Conclusion
Handling cookies with specific domain and path settings in Mechanize requires understanding both the HTTP cookie specification and Mechanize's cookie jar implementation. By properly configuring domain and path attributes, you can create robust web scraping applications that maintain session state across complex multi-domain architectures.
Key takeaways for effective cookie management:
- Use domain prefixes (.example.com) for subdomain compatibility
- Set appropriate path restrictions for security
- Always use secure flags for production environments
- Implement proper cookie persistence for long-running scrapers
- Debug cookie behavior thoroughly when troubleshooting authentication issues
For applications that require JavaScript execution alongside cookie management, consider complementing Mechanize with a browser automation tool such as Puppeteer, which can manage full browser sessions.