How do you save and load cookies to maintain session state across script runs?
Maintaining session state across multiple script runs is crucial for web scraping scenarios where you need to stay logged in or preserve user preferences. Mechanize provides several methods to save and load cookies, allowing you to persist authentication sessions and avoid repeated login processes.
Understanding Cookie Persistence in Mechanize
When you run a Mechanize script, cookies are stored in memory and lost when the script terminates. To maintain session state across runs, you need to serialize cookies to disk and reload them in subsequent executions.
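The underlying pattern is independent of any particular library: dump the cookie data to disk before the script exits, then read it back on the next startup. A minimal stand-alone sketch of that round trip (a plain Hash stands in for a real cookie jar here; a real jar also tracks domain, path, and expiry):

```ruby
require 'yaml'
require 'tempfile'

# Stand-in for a cookie jar: name => value pairs
cookies = { 'session_id' => 'abc123', 'csrf_token' => 'xyz789' }

file = Tempfile.new(['cookies', '.yml'])
File.write(file.path, YAML.dump(cookies))        # save before exit

restored = YAML.safe_load(File.read(file.path))  # load on next run
puts restored['session_id']                      # => "abc123"
```

The sections below apply this same save-then-reload cycle to Mechanize's actual cookie jar object.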
Basic Cookie Jar Management
Saving Cookies to a File
The most straightforward approach is to save the entire cookie jar to a file using YAML serialization:
require 'mechanize'
require 'yaml'
# Create a new Mechanize agent
agent = Mechanize.new
# Perform login or other authentication
agent.get('https://example.com/login')
form = agent.page.forms.first
form.username = 'your_username'
form.password = 'your_password'
agent.submit(form)
# Save cookies to file (newer Mechanize versions also provide a
# built-in alternative: agent.cookie_jar.save('cookies.yml', session: true))
File.open('cookies.yml', 'w') do |file|
YAML.dump(agent.cookie_jar, file)
end
puts "Cookies saved successfully"
Loading Cookies from a File
To restore the session in a new script run:
require 'mechanize'
require 'yaml'
# Create a new Mechanize agent
agent = Mechanize.new
# Load cookies from file if it exists
if File.exist?('cookies.yml')
# Psych 4+ restricts YAML.load_file to basic types; a serialized
# cookie jar needs unsafe_load_file (only load files you wrote yourself)
agent.cookie_jar = YAML.unsafe_load_file('cookies.yml')
puts "Cookies loaded successfully"
else
puts "No cookie file found, starting fresh session"
end
# Now you can access protected pages without re-authenticating
agent.get('https://example.com/dashboard')
Advanced Cookie Management Techniques
Using a Custom Cookie Store Class
For more control over cookie persistence, you can create a custom cookie store:
require 'mechanize'
require 'json'
require 'time' # for Time.parse when restoring expiry dates
class PersistentCookieStore
attr_accessor :file_path
def initialize(file_path = 'cookies.json')
@file_path = file_path
end
def save_cookies(cookie_jar)
cookies_data = []
cookie_jar.each do |cookie|
cookies_data << {
name: cookie.name,
value: cookie.value,
domain: cookie.domain,
path: cookie.path,
expires: cookie.expires&.to_s,
secure: cookie.secure,
httponly: cookie.httponly
}
end
File.write(@file_path, JSON.pretty_generate(cookies_data))
end
def load_cookies(agent)
return unless File.exist?(@file_path)
cookies_data = JSON.parse(File.read(@file_path))
cookies_data.each do |cookie_data|
cookie = Mechanize::Cookie.new(
cookie_data['name'],
cookie_data['value']
)
cookie.domain = cookie_data['domain']
cookie.path = cookie_data['path']
cookie.expires = Time.parse(cookie_data['expires']) if cookie_data['expires']
cookie.secure = cookie_data['secure']
cookie.httponly = cookie_data['httponly']
agent.cookie_jar.add(cookie)
end
end
end
# Usage example
cookie_store = PersistentCookieStore.new('session_cookies.json')
agent = Mechanize.new
# Load existing cookies
cookie_store.load_cookies(agent)
# Perform web scraping operations
agent.get('https://example.com/protected-page')
# Save cookies before script ends
cookie_store.save_cookies(agent.cookie_jar)
Session Management with Automatic Cookie Persistence
Create a wrapper class that automatically handles cookie persistence:
require 'mechanize'
require 'yaml'
class PersistentMechanizeAgent
attr_reader :agent
def initialize(cookie_file = 'mechanize_cookies.yml')
@cookie_file = cookie_file
@agent = Mechanize.new
load_cookies
# Set up automatic cookie saving on exit
at_exit { save_cookies }
end
def get(url, parameters = [], referer = nil, headers = {})
result = @agent.get(url, parameters, referer, headers)
save_cookies # Save after each request
result
end
def post(url, query = {}, headers = {})
result = @agent.post(url, query, headers)
save_cookies # Save after each request
result
end
def submit(form, button = nil)
result = @agent.submit(form, button)
save_cookies # Save after form submission
result
end
private
def load_cookies
if File.exist?(@cookie_file)
@agent.cookie_jar = YAML.unsafe_load_file(@cookie_file) # Psych 4+ needs unsafe_load_file for serialized objects
puts "Loaded cookies from #{@cookie_file}"
end
rescue => e
puts "Error loading cookies: #{e.message}"
# Continue with empty cookie jar
end
def save_cookies
File.open(@cookie_file, 'w') do |file|
YAML.dump(@agent.cookie_jar, file)
end
rescue => e
puts "Error saving cookies: #{e.message}"
end
end
# Usage
scraper = PersistentMechanizeAgent.new('my_session.yml')
# First run - login and save session
scraper.get('https://example.com/login')
form = scraper.agent.page.forms.first
form.username = 'user@example.com'
form.password = 'password123'
scraper.submit(form)
# Subsequent runs will automatically load the saved session
scraper.get('https://example.com/dashboard')
Handling Cookie Expiration and Validation
Checking Cookie Validity
Before using saved cookies, verify they haven't expired:
require 'mechanize'
require 'yaml'
def cookies_valid?(cookie_jar)
return false if cookie_jar.to_a.empty? # cookie jars have no bare empty?; materialize with to_a first
# Check if any critical cookies have expired
cookie_jar.each do |cookie|
if cookie.expires && cookie.expires < Time.now
puts "Cookie #{cookie.name} has expired"
return false
end
end
true
end
def test_session_validity(agent)
# Try to access a protected page to verify session
begin
response = agent.get('https://example.com/dashboard')
# Check if we're redirected to login page
return !response.uri.to_s.include?('/login')
rescue => e
puts "Session test failed: #{e.message}"
return false
end
end
# Load and validate cookies
agent = Mechanize.new
if File.exist?('cookies.yml')
agent.cookie_jar = YAML.unsafe_load_file('cookies.yml') # Psych 4+ needs unsafe_load_file for serialized objects
if cookies_valid?(agent.cookie_jar) && test_session_validity(agent)
puts "Valid session restored"
else
puts "Session expired or invalid, need to re-authenticate"
# Clear expired cookies and re-login
agent.cookie_jar.clear
# Perform login process here
end
else
puts "No saved session found"
end
Working with Different Cookie Formats
Browser Cookie Import/Export
Sometimes you need to import cookies from a browser or export them for use in other tools:
# Import cookies from Netscape format (used by browsers)
def import_netscape_cookies(agent, file_path)
File.readlines(file_path).each do |line|
next if line.start_with?('#') || line.strip.empty?
parts = line.strip.split("\t")
next unless parts.length >= 7
domain, _include_subdomains, path, secure, expires, name, value = parts
cookie = Mechanize::Cookie.new(name, value)
cookie.domain = domain
cookie.path = path
cookie.secure = secure == 'TRUE'
cookie.expires = Time.at(expires.to_i) if expires.to_i > 0
agent.cookie_jar.add(cookie)
end
end
# Export cookies to Netscape format
def export_netscape_cookies(cookie_jar, file_path)
File.open(file_path, 'w') do |file|
file.puts "# Netscape HTTP Cookie File"
cookie_jar.each do |cookie|
expires = cookie.expires ? cookie.expires.to_i : 0
secure = cookie.secure ? 'TRUE' : 'FALSE'
file.puts [
cookie.domain,
'TRUE',
cookie.path,
secure,
expires,
cookie.name,
cookie.value
].join("\t")
end
end
end
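A quick self-contained round trip (no Mechanize required) confirms the tab-separated field order the two helpers above agree on:

```ruby
# Build one cookies.txt line, then parse it back. Field order:
# domain, include-subdomains flag, path, secure, expires, name, value
fields = ['example.com', 'TRUE', '/', 'FALSE', '2147483647', 'session_id', 'abc123']
line = fields.join("\t")

domain, _flag, path, secure, expires, name, value = line.strip.split("\t")
puts name    # => "session_id"
puts domain  # => "example.com"
```

Note that the `http-cookie` gem underlying modern Mechanize also understands this format natively, so the hand-rolled helpers are mainly useful when you need custom field handling.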
Security Considerations
Encrypting Cookie Files
For sensitive applications, encrypt your cookie files:
require 'openssl'
require 'base64'
require 'digest'
require 'yaml' # needed for YAML.dump / unsafe_load below
class EncryptedCookieStore
def initialize(password, file_path = 'encrypted_cookies.dat')
@password = password
@file_path = file_path
end
def save_cookies(cookie_jar)
data = YAML.dump(cookie_jar)
encrypted_data = encrypt(data)
File.write(@file_path, encrypted_data)
end
def load_cookies
return nil unless File.exist?(@file_path)
encrypted_data = File.read(@file_path)
decrypted_data = decrypt(encrypted_data)
YAML.unsafe_load(decrypted_data) # Psych 4+ needs unsafe_load for serialized objects
rescue => e
puts "Error loading encrypted cookies: #{e.message}"
nil
end
private
def encrypt(data)
cipher = OpenSSL::Cipher.new('AES-256-CBC')
cipher.encrypt
# Note: a plain SHA-256 of the password is simple but weak; for
# production, prefer OpenSSL::PKCS5.pbkdf2_hmac with a random salt
cipher.key = Digest::SHA256.digest(@password)
iv = cipher.random_iv
encrypted = cipher.update(data) + cipher.final
Base64.encode64(iv + encrypted)
end
def decrypt(encrypted_data)
data = Base64.decode64(encrypted_data)
iv = data[0..15]
encrypted = data[16..-1]
cipher = OpenSSL::Cipher.new('AES-256-CBC')
cipher.decrypt
cipher.key = Digest::SHA256.digest(@password)
cipher.iv = iv
cipher.update(encrypted) + cipher.final
end
end
Setting Secure File Permissions
Protect your cookie files with appropriate permissions:
# Make cookie files readable only by owner
chmod 600 cookies.yml
# Create a secure directory for cookies
mkdir -p ~/.mechanize_cookies
chmod 700 ~/.mechanize_cookies
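The same permissions can also be set from inside the script on POSIX systems; `0o600` and `0o700` mirror the shell commands above (the temp-dir path here is just for illustration, in practice point it at something like `~/.mechanize_cookies`):

```ruby
require 'fileutils'
require 'tmpdir'

# Demo location; substitute your real cookie directory
dir = File.join(Dir.tmpdir, 'mechanize_cookies_demo')
FileUtils.mkdir_p(dir)
File.chmod(0o700, dir)                  # directory: owner-only access

cookie_file = File.join(dir, 'cookies.yml')
File.write(cookie_file, '')             # placeholder contents
File.chmod(0o600, cookie_file)          # file: owner read/write only

puts format('%o', File.stat(cookie_file).mode & 0o777)  # => "600"
```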
Integration with Other Tools
When working with web scraping workflows that involve multiple tools, you might need to share cookie data between different libraries. For instance, when combining Mechanize with browser automation tools for handling JavaScript-heavy sites, you can export cookies from Mechanize and import them into browser automation tools.
Converting Cookies Between Tools
# Convert Mechanize cookies to Puppeteer format
def mechanize_to_puppeteer_cookies(cookie_jar)
cookie_jar.map do |cookie|
{
name: cookie.name,
value: cookie.value,
domain: cookie.domain,
path: cookie.path,
expires: cookie.expires&.to_i,
httpOnly: cookie.httponly,
secure: cookie.secure
}
end
end
# Save cookies in Puppeteer-compatible format
def save_puppeteer_cookies(cookie_jar, file_path)
puppeteer_cookies = mechanize_to_puppeteer_cookies(cookie_jar)
File.write(file_path, JSON.pretty_generate(puppeteer_cookies))
end
Best Practices for Cookie Management
1. File Organization
Store cookie files in a dedicated directory:
# Create cookies directory if it doesn't exist
Dir.mkdir('cookies') unless Dir.exist?('cookies')
# Use descriptive filenames
cookie_file = "cookies/#{website_name}_#{username}.yml"
2. Error Handling
Always include proper error handling:
def safe_load_cookies(agent, file_path)
return unless File.exist?(file_path)
begin
agent.cookie_jar = YAML.unsafe_load_file(file_path) # Psych 4+ needs unsafe_load_file for serialized objects
puts "Cookies loaded from #{file_path}"
rescue Psych::SyntaxError => e
puts "Invalid YAML in cookie file: #{e.message}"
File.delete(file_path) # Remove corrupted file
rescue => e
puts "Error loading cookies: #{e.message}"
end
end
3. Cookie Cleanup
Implement cookie cleanup for expired or invalid cookies:
def cleanup_expired_cookies(cookie_jar)
# Snapshot with to_a to avoid mutating the jar mid-iteration;
# HTTP::CookieJar-based jars also offer a built-in #cleanup method
cookie_jar.to_a.each do |cookie|
cookie_jar.delete(cookie) if cookie.expires && cookie.expires < Time.now
end
end
4. Session Validation
Always validate sessions before proceeding:
def validate_session(agent, test_url)
begin
response = agent.get(test_url)
return response.code == '200' && !response.body.include?('login')
rescue
return false
end
end
Troubleshooting Common Issues
Issue: Cookies Not Persisting
- Ensure you're saving cookies after authentication
- Check file permissions in the target directory
- Verify the cookie file isn't being deleted between runs
Issue: Session Still Invalid After Loading Cookies
- Test if the target website has additional session validation
- Check if the website requires specific user-agent strings
- Verify that all required cookies are being saved
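When chasing down a missing cookie, it helps to dump exactly what the jar holds right after login. A sketch of that check (a Struct stands in for Mechanize cookie objects here, since they expose the same `name`/`domain`/`expires` readers; with a real agent you would iterate `agent.cookie_jar` the same way):

```ruby
require 'time'

# Hypothetical stand-ins for jar entries
Cookie = Struct.new(:name, :domain, :expires)
jar = [
  Cookie.new('session_id', 'example.com', Time.now + 3600),
  Cookie.new('tracking',   'example.com', nil) # session cookie, no expiry
]

# Print each cookie's name, domain, and expiry status
jar.each do |cookie|
  status = cookie.expires.nil? ? 'session' : "expires #{cookie.expires.iso8601}"
  puts "#{cookie.name} (#{cookie.domain}): #{status}"
end
```

If the cookie your target site sets at login never shows up in this listing, the server may be setting it via JavaScript, which Mechanize does not execute.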
Issue: Permission Errors
# Check if directory is writable
cookie_dir = 'cookies'
if Dir.exist?(cookie_dir) && File.writable?(cookie_dir)
# Safe to save cookies
else
puts "Warning: Cannot write to cookie directory"
end
Issue: Cookie Format Corruption
def validate_cookie_file(file_path)
return false unless File.exist?(file_path)
begin
YAML.unsafe_load_file(file_path) # Psych 4+ needs unsafe_load_file for serialized objects
true
rescue Psych::SyntaxError
puts "Cookie file is corrupted, removing..."
File.delete(file_path)
false
end
end
Performance Considerations
For high-volume scraping operations, consider these optimizations:
Lazy Cookie Loading
Only load cookies when needed:
class LazyMechanizeAgent
def initialize(cookie_file)
@cookie_file = cookie_file
@agent = Mechanize.new
@cookies_loaded = false
end
def get(url, *args)
load_cookies_if_needed
@agent.get(url, *args)
end
private
def load_cookies_if_needed
return if @cookies_loaded
if File.exist?(@cookie_file)
@agent.cookie_jar = YAML.unsafe_load_file(@cookie_file) # Psych 4+ needs unsafe_load_file for serialized objects
end
@cookies_loaded = true
end
end
Batch Cookie Operations
Save cookies in batches rather than after every request:
class BatchCookieSaver
def initialize(agent, cookie_file, batch_size = 10)
@agent = agent
@cookie_file = cookie_file
@batch_size = batch_size
@request_count = 0
end
def after_request
@request_count += 1
if @request_count >= @batch_size
save_cookies
@request_count = 0
end
end
def save_cookies
File.open(@cookie_file, 'w') do |file|
YAML.dump(@agent.cookie_jar, file)
end
end
end
By implementing proper cookie persistence in your Mechanize scripts, you can maintain session state across multiple runs, reduce server load from repeated authentication, and create more efficient web scraping workflows. Remember to handle edge cases like expired cookies and implement appropriate security measures for sensitive cookie data.