How do you save and load cookies to maintain session state across script runs?

Maintaining session state across multiple script runs is crucial for web scraping scenarios where you need to stay logged in or preserve user preferences. Mechanize provides several methods to save and load cookies, allowing you to persist authentication sessions and avoid repeated login processes.

Understanding Cookie Persistence in Mechanize

When you run a Mechanize script, cookies are stored in memory and lost when the script terminates. To maintain session state across runs, you need to serialize cookies to disk and reload them in subsequent executions.
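
A minimal round trip, assuming Mechanize 2.x (whose cookie jar is an HTTP::CookieJar from the http-cookie gem), looks like this:

require 'mechanize'

agent = Mechanize.new
# ... authenticate, browse ...

# Persist the jar (YAML by default); session: true keeps session
# cookies, which is what most login cookies are
agent.cookie_jar.save('cookies.yml', session: true)

# In a later run, reload the same file into a fresh agent
agent = Mechanize.new
agent.cookie_jar.load('cookies.yml') if File.exist?('cookies.yml')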

Basic Cookie Jar Management

Saving Cookies to a File

The most straightforward approach is to save the entire cookie jar to a file with the jar's built-in save method, which writes YAML by default:

require 'mechanize'

# Create a new Mechanize agent
agent = Mechanize.new

# Perform login or other authentication
agent.get('https://example.com/login')
form = agent.page.forms.first
form.username = 'your_username'
form.password = 'your_password'
agent.submit(form)

# Save cookies to file. Pass session: true so session cookies
# (which most login cookies are) get persisted too; they are
# skipped by default.
agent.cookie_jar.save('cookies.yml', session: true)

puts "Cookies saved successfully"

Loading Cookies from a File

To restore the session in a new script run:

require 'mechanize'

# Create a new Mechanize agent
agent = Mechanize.new

# Load cookies from file if it exists
if File.exist?('cookies.yml')
  agent.cookie_jar.load('cookies.yml')
  puts "Cookies loaded successfully"
else
  puts "No cookie file found, starting fresh session"
end

# Now you can access protected pages without re-authenticating
agent.get('https://example.com/dashboard')

Advanced Cookie Management Techniques

Using a Custom Cookie Store Class

For more control over cookie persistence, you can create a custom cookie store:

require 'mechanize'
require 'json'
require 'time' # for Time.parse when restoring expiry timestamps

class PersistentCookieStore
  attr_accessor :file_path

  def initialize(file_path = 'cookies.json')
    @file_path = file_path
  end

  def save_cookies(cookie_jar)
    cookies_data = []
    cookie_jar.each do |cookie|
      cookies_data << {
        name: cookie.name,
        value: cookie.value,
        domain: cookie.domain,
        path: cookie.path,
        expires: cookie.expires&.to_s,
        secure: cookie.secure,
        httponly: cookie.httponly
      }
    end

    File.write(@file_path, JSON.pretty_generate(cookies_data))
  end

  def load_cookies(agent)
    return unless File.exist?(@file_path)

    cookies_data = JSON.parse(File.read(@file_path))
    cookies_data.each do |cookie_data|
      # HTTP::Cookie is the cookie class used by Mechanize 2.x (http-cookie gem)
      cookie = HTTP::Cookie.new(
        cookie_data['name'],
        cookie_data['value']
      )
      cookie.domain = cookie_data['domain']
      cookie.path = cookie_data['path']
      cookie.expires = Time.parse(cookie_data['expires']) if cookie_data['expires']
      cookie.secure = cookie_data['secure']
      cookie.httponly = cookie_data['httponly']

      agent.cookie_jar.add(cookie)
    end
  end
end

# Usage example
cookie_store = PersistentCookieStore.new('session_cookies.json')
agent = Mechanize.new

# Load existing cookies
cookie_store.load_cookies(agent)

# Perform web scraping operations
agent.get('https://example.com/protected-page')

# Save cookies before script ends
cookie_store.save_cookies(agent.cookie_jar)

Session Management with Automatic Cookie Persistence

Create a wrapper class that automatically handles cookie persistence:

require 'mechanize'

class PersistentMechanizeAgent
  attr_reader :agent

  def initialize(cookie_file = 'mechanize_cookies.yml')
    @cookie_file = cookie_file
    @agent = Mechanize.new
    load_cookies

    # Set up automatic cookie saving on exit
    at_exit { save_cookies }
  end

  def get(url, parameters = [], referer = nil, headers = {})
    result = @agent.get(url, parameters, referer, headers)
    save_cookies # Save after each request
    result
  end

  def post(url, query = {}, headers = {})
    result = @agent.post(url, query, headers)
    save_cookies # Save after each request
    result
  end

  def submit(form, button = nil)
    result = @agent.submit(form, button)
    save_cookies # Save after form submission
    result
  end

  private

  def load_cookies
    if File.exist?(@cookie_file)
      @agent.cookie_jar.load(@cookie_file)
      puts "Loaded cookies from #{@cookie_file}"
    end
  rescue => e
    puts "Error loading cookies: #{e.message}"
    # Continue with an empty cookie jar
  end

  def save_cookies
    @agent.cookie_jar.save(@cookie_file, session: true)
  rescue => e
    puts "Error saving cookies: #{e.message}"
  end
end

# Usage
scraper = PersistentMechanizeAgent.new('my_session.yml')

# First run - login and save session
scraper.get('https://example.com/login')
form = scraper.agent.page.forms.first
form.username = 'user@example.com'
form.password = 'password123'
scraper.submit(form)

# Subsequent runs will automatically load the saved session
scraper.get('https://example.com/dashboard')

Handling Cookie Expiration and Validation

Checking Cookie Validity

Before using saved cookies, verify they haven't expired:

require 'mechanize'

def cookies_valid?(cookie_jar)
  return false if cookie_jar.to_a.empty? # to_a works via Enumerable on any jar

  # Check if any critical cookies have expired
  cookie_jar.each do |cookie|
    if cookie.expires && cookie.expires < Time.now
      puts "Cookie #{cookie.name} has expired"
      return false
    end
  end

  true
end

def test_session_validity(agent)
  # Try to access a protected page to verify session
  begin
    response = agent.get('https://example.com/dashboard')
    # Check if we're redirected to login page
    return !response.uri.to_s.include?('/login')
  rescue => e
    puts "Session test failed: #{e.message}"
    return false
  end
end

# Load and validate cookies
agent = Mechanize.new

if File.exist?('cookies.yml')
  agent.cookie_jar.load('cookies.yml')

  if cookies_valid?(agent.cookie_jar) && test_session_validity(agent)
    puts "Valid session restored"
  else
    puts "Session expired or invalid, need to re-authenticate"
    # Clear expired cookies and re-login
    agent.cookie_jar.clear
    # Perform login process here
  end
else
  puts "No saved session found"
end

Working with Different Cookie Formats

Browser Cookie Import/Export

Sometimes you need to import cookies from a browser or export them for use in other tools:

# Import cookies from Netscape format (used by browsers)
def import_netscape_cookies(agent, file_path)
  File.readlines(file_path).each do |line|
    next if line.start_with?('#') || line.strip.empty?

    parts = line.strip.split("\t")
    next unless parts.length >= 7

    domain, flag, path, secure, expires, name, value = parts

    cookie = HTTP::Cookie.new(name, value)
    cookie.domain = domain
    cookie.for_domain = (flag == 'TRUE') # second column: valid for subdomains?
    cookie.path = path
    cookie.secure = (secure == 'TRUE')
    cookie.expires = Time.at(expires.to_i) if expires.to_i > 0

    agent.cookie_jar.add(cookie)
  end
end

# Export cookies to Netscape format
def export_netscape_cookies(cookie_jar, file_path)
  File.open(file_path, 'w') do |file|
    file.puts "# Netscape HTTP Cookie File"

    cookie_jar.each do |cookie|
      expires = cookie.expires ? cookie.expires.to_i : 0
      secure = cookie.secure ? 'TRUE' : 'FALSE'

      file.puts [
        cookie.domain,
        cookie.for_domain? ? 'TRUE' : 'FALSE', # subdomain flag
        cookie.path,
        secure,
        expires,
        cookie.name,
        cookie.value
      ].join("\t")
    end
  end
end
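
The http-cookie gem behind Mechanize 2.x already understands this format, so the hand-rolled conversion above can often be replaced with the jar's built-in :cookiestxt serializer:

require 'mechanize'

agent = Mechanize.new

# Export the jar as a Netscape-format cookies.txt
agent.cookie_jar.save('cookies.txt', :cookiestxt, session: true)

# Import a browser-exported cookies.txt
agent.cookie_jar.load('cookies.txt', :cookiestxt)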

Security Considerations

Encrypting Cookie Files

For sensitive applications, encrypt your cookie files:

require 'mechanize'
require 'openssl'
require 'base64'
require 'digest'
require 'stringio'

class EncryptedCookieStore
  def initialize(password, file_path = 'encrypted_cookies.dat')
    @password = password
    @file_path = file_path
  end

  def save_cookies(cookie_jar)
    io = StringIO.new
    cookie_jar.save(io, session: true) # serialize the jar to YAML in memory
    File.write(@file_path, encrypt(io.string))
  end

  # Loads the stored cookies into an existing jar (e.g. agent.cookie_jar)
  def load_cookies(cookie_jar)
    return unless File.exist?(@file_path)

    decrypted_data = decrypt(File.read(@file_path))
    cookie_jar.load(StringIO.new(decrypted_data))
  rescue => e
    puts "Error loading encrypted cookies: #{e.message}"
  end

  private

  def encrypt(data)
    cipher = OpenSSL::Cipher.new('AES-256-CBC')
    cipher.encrypt
    cipher.key = Digest::SHA256.digest(@password) # simple key derivation; prefer PBKDF2 in production
    iv = cipher.random_iv

    encrypted = cipher.update(data) + cipher.final
    Base64.encode64(iv + encrypted)
  end

  def decrypt(encrypted_data)
    data = Base64.decode64(encrypted_data)
    iv = data[0..15]
    encrypted = data[16..-1]

    cipher = OpenSSL::Cipher.new('AES-256-CBC')
    cipher.decrypt
    cipher.key = Digest::SHA256.digest(@password)
    cipher.iv = iv

    cipher.update(encrypted) + cipher.final
  end
end

Setting Secure File Permissions

Protect your cookie files with appropriate permissions:

# Make cookie files readable only by owner
chmod 600 cookies.yml

# Create a secure directory for cookies
mkdir -p ~/.mechanize_cookies
chmod 700 ~/.mechanize_cookies
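
The same can be done from Ruby at the point the file is written; a short sketch, reusing the cookies.yml path from earlier:

require 'mechanize'

agent = Mechanize.new
cookie_file = 'cookies.yml'

# Write the jar, then restrict the file to the owner (0600)
agent.cookie_jar.save(cookie_file, session: true)
File.chmod(0o600, cookie_file)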

Integration with Other Tools

When working with web scraping workflows that involve multiple tools, you might need to share cookie data between different libraries. For instance, when combining Mechanize with browser automation tools for handling JavaScript-heavy sites, you can export cookies from Mechanize and import them into browser automation tools.

Converting Cookies Between Tools

require 'mechanize'
require 'json'

# Convert Mechanize cookies to Puppeteer format
def mechanize_to_puppeteer_cookies(cookie_jar)
  cookie_jar.map do |cookie|
    {
      name: cookie.name,
      value: cookie.value,
      domain: cookie.domain,
      path: cookie.path,
      expires: cookie.expires&.to_i,
      httpOnly: cookie.httponly,
      secure: cookie.secure
    }
  end
end

# Save cookies in Puppeteer-compatible format
def save_puppeteer_cookies(cookie_jar, file_path)
  puppeteer_cookies = mechanize_to_puppeteer_cookies(cookie_jar)
  File.write(file_path, JSON.pretty_generate(puppeteer_cookies))
end
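
The reverse direction works the same way. A sketch of importing a Puppeteer-style JSON export back into a Mechanize jar (load_puppeteer_cookies is a hypothetical helper name):

require 'mechanize'
require 'json'

def load_puppeteer_cookies(agent, file_path)
  JSON.parse(File.read(file_path)).each do |data|
    cookie = HTTP::Cookie.new(data['name'], data['value'])
    cookie.domain = data['domain']
    cookie.path = data['path']
    # Puppeteer stores expiry as epoch seconds; -1 or nil means a session cookie
    cookie.expires = Time.at(data['expires']) if data['expires'].to_i > 0
    cookie.secure = data['secure']
    cookie.httponly = data['httpOnly']
    agent.cookie_jar.add(cookie)
  end
end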

Best Practices for Cookie Management

1. File Organization

Store cookie files in a dedicated directory:

# Create cookies directory if it doesn't exist
Dir.mkdir('cookies') unless Dir.exist?('cookies')

# Use descriptive filenames
cookie_file = "cookies/#{website_name}_#{username}.yml"

2. Error Handling

Always include proper error handling:

def safe_load_cookies(agent, file_path)
  return unless File.exist?(file_path)

  begin
    agent.cookie_jar.load(file_path)
    puts "Cookies loaded from #{file_path}"
  rescue Psych::SyntaxError => e
    puts "Invalid YAML in cookie file: #{e.message}"
    File.delete(file_path) # Remove corrupted file
  rescue => e
    puts "Error loading cookies: #{e.message}"
  end
end

3. Cookie Cleanup

Implement cookie cleanup for expired or invalid cookies:

def cleanup_expired_cookies(cookie_jar)
  # HTTP::CookieJar ships a #cleanup method that removes expired cookies
  # in place; the jar does not support Array methods like #select!
  cookie_jar.cleanup
end

4. Session Validation

Always validate sessions before proceeding:

def validate_session(agent, test_url)
  response = agent.get(test_url)
  # Heuristic: a 200 response whose body lacks a login marker; tune per site
  response.code == '200' && !response.body.include?('login')
rescue
  false
end

Troubleshooting Common Issues

Issue: Cookies Not Persisting

  • Ensure you're saving cookies after authentication
  • Check file permissions in the target directory
  • Verify the cookie file isn't being deleted between runs (the inspection sketch below shows what the jar actually holds)
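
A quick way to check what will be persisted is to dump the jar's contents:

require 'mechanize'

agent = Mechanize.new
# ... after authenticating ...

# List every cookie currently held by the jar
agent.cookie_jar.each do |cookie|
  expiry = cookie.expires || 'session'
  puts "#{cookie.domain}#{cookie.path} #{cookie.name}=#{cookie.value} (expires: #{expiry})"
end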

Issue: Session Still Invalid After Loading Cookies

  • Test if the target website has additional session validation
  • Check if the website requires a specific User-Agent string (see the sketch below)
  • Verify that all required cookies are being saved
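
Mechanize identifies itself in its default User-Agent header, and some sites tie a session to the browser that created it. Setting a browser-like alias before reusing saved cookies can help:

require 'mechanize'

agent = Mechanize.new
# Pick one of Mechanize's built-in aliases, e.g. 'Mac Safari'
agent.user_agent_alias = 'Mac Safari'
agent.cookie_jar.load('cookies.yml') if File.exist?('cookies.yml')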

Issue: Permission Errors

# Check if directory is writable
cookie_dir = 'cookies'
if Dir.exist?(cookie_dir) && File.writable?(cookie_dir)
  # Safe to save cookies
else
  puts "Warning: Cannot write to cookie directory"
end

Issue: Cookie Format Corruption

def validate_cookie_file(file_path)
  return false unless File.exist?(file_path)

  HTTP::CookieJar.new.load(file_path) # parse into a throwaway jar
  true
rescue StandardError
  puts "Cookie file is corrupted, removing..."
  File.delete(file_path)
  false
end

Performance Considerations

For high-volume scraping operations, consider these optimizations:

Lazy Cookie Loading

Only load cookies when needed:

class LazyMechanizeAgent
  def initialize(cookie_file)
    @cookie_file = cookie_file
    @agent = Mechanize.new
    @cookies_loaded = false
  end

  def get(url, *args)
    load_cookies_if_needed
    @agent.get(url, *args)
  end

  private

  def load_cookies_if_needed
    return if @cookies_loaded

    if File.exist?(@cookie_file)
      @agent.cookie_jar.load(@cookie_file)
    end
    @cookies_loaded = true
  end
end

Batch Cookie Operations

Save cookies in batches rather than after every request:

class BatchCookieSaver
  def initialize(agent, cookie_file, batch_size = 10)
    @agent = agent
    @cookie_file = cookie_file
    @batch_size = batch_size
    @request_count = 0
  end

  def after_request
    @request_count += 1

    if @request_count >= @batch_size
      save_cookies
      @request_count = 0
    end
  end

  def save_cookies
    @agent.cookie_jar.save(@cookie_file, session: true)
  end
end
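
One way to wire this in without touching every call site is Mechanize's post-connect hooks, which run after each response (the splat keeps the block arity-agnostic):

agent = Mechanize.new
saver = BatchCookieSaver.new(agent, 'cookies.yml', 25)

# Flush cookies to disk after every 25th completed request
agent.post_connect_hooks << lambda { |*_args| saver.after_request }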

By implementing proper cookie persistence in your Mechanize scripts, you can maintain session state across multiple runs, reduce server load from repeated authentication, and create more efficient web scraping workflows. Remember to handle edge cases like expired cookies and implement appropriate security measures for sensitive cookie data.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"
