How to handle authentication in Puppeteer?

Authentication in Puppeteer can be handled through several methods depending on the authentication type used by the target website. Here are the most common approaches:

1. HTTP Authentication

Use page.authenticate() for HTTP Basic authentication, where the browser would normally show a native login dialog. Call it before navigating to the protected URL:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set HTTP authentication credentials
  await page.authenticate({
    username: 'your-username',
    password: 'your-password',
  });

  await page.goto('https://example.com/protected');
  await browser.close();
})();
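
The example above does not check whether the credentials were accepted. A minimal sketch of one way to do that, assuming the server answers 401 when it rejects them; gotoWithHttpAuth is a hypothetical helper name, and page is a Puppeteer page created as shown above:

```javascript
// Hypothetical helper: applies HTTP credentials, navigates, and
// fails loudly if the server still answers 401 Unauthorized.
async function gotoWithHttpAuth(page, url, credentials) {
  await page.authenticate(credentials);
  const response = await page.goto(url);

  // page.goto() can resolve with null (for example, a same-URL hash navigation)
  if (response && response.status() === 401) {
    throw new Error(`HTTP authentication rejected for ${url}`);
  }
  return response;
}
```

It could be called as await gotoWithHttpAuth(page, 'https://example.com/protected', { username, password }) in place of the separate authenticate() and goto() calls.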

2. Form-Based Authentication

For websites with login forms, use form interaction methods:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Navigate to login page
  await page.goto('https://example.com/login');

  // Wait for form elements to load
  await page.waitForSelector('#username');
  await page.waitForSelector('#password');

  // Fill in credentials
  await page.type('#username', 'your-username');
  await page.type('#password', 'your-password');

  // Submit form and wait for navigation
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button')
  ]);

  // Now access protected content
  await page.goto('https://example.com/dashboard');

  await browser.close();
})();
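
The form steps above can be wrapped in a reusable function that also reports whether the login appears to have worked. This is only a sketch: loginWithForm is a hypothetical name, the selectors are the placeholder IDs from the example, and the success check assumes a failed login leaves the browser on a path starting with /login, which will not hold for every site.

```javascript
// Hypothetical helper: fills the login form and reports whether the
// browser ended up somewhere other than the login page.
async function loginWithForm(page, { username, password }) {
  await page.waitForSelector('#username');
  await page.type('#username', username);
  await page.type('#password', password);

  // Start waiting for the navigation before clicking to avoid a race
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button'),
  ]);

  // Assumption: a failed login keeps the user on a /login path
  return !new URL(page.url()).pathname.startsWith('/login');
}
```

A caller might throw or retry when the function returns false instead of continuing to the protected pages.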

3. Cookie-Based Authentication

If you have authentication cookies, set them before navigating:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set authentication cookies
  await page.setCookie({
    name: 'session_token',
    value: 'your-session-token',
    domain: 'example.com',
    httpOnly: true,
    secure: true
  });

  await page.goto('https://example.com/protected');
  await browser.close();
})();
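
Session cookies are often copied from a browser's developer tools as a single Cookie header string rather than as separate objects. The sketch below converts such a string into the objects page.setCookie() accepts. cookieHeaderToCookies is a hypothetical helper, and it only fills in name, value, and domain; flags such as httpOnly and secure are not part of a Cookie header and would have to be added separately.

```javascript
// Hypothetical helper: turns "name=value; other=value" into cookie
// objects suitable for page.setCookie().
function cookieHeaderToCookies(header, domain) {
  return header
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('='))
    .map((pair) => {
      const separator = pair.indexOf('=');
      return {
        name: pair.slice(0, separator),
        value: pair.slice(separator + 1), // values may themselves contain "="
        domain,
      };
    });
}
```

Usage would look like await page.setCookie(...cookieHeaderToCookies(header, 'example.com')) before calling page.goto().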

4. Token-Based Authentication

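For APIs or applications that expect a bearer token, set the Authorization header with page.setExtraHTTPHeaders(). The header is attached to every request the page makes, including requests to third-party hosts, so only use it on pages you trust: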

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Set authorization header
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer your-jwt-token'
  });

  await page.goto('https://api.example.com/protected');
  await browser.close();
})();
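
If the page loads resources from other hosts, the token can be limited to a single origin with request interception instead. The following is a sketch under assumptions rather than a drop-in solution: addBearerTokenForOrigin is a hypothetical helper, and enabling interception means every request must be continued explicitly, which can affect caching and performance.

```javascript
// Hypothetical helper: adds the Authorization header only to requests
// whose origin matches allowedOrigin, using request interception.
async function addBearerTokenForOrigin(page, allowedOrigin, token) {
  await page.setRequestInterception(true);

  page.on('request', (request) => {
    if (new URL(request.url()).origin === allowedOrigin) {
      request.continue({
        headers: { ...request.headers(), authorization: `Bearer ${token}` },
      });
    } else {
      request.continue(); // leave requests to other origins untouched
    }
  });
}
```

It would be called once, before page.goto(), for example await addBearerTokenForOrigin(page, 'https://api.example.com', token).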

5. Handling Two-Factor Authentication

For 2FA, you might need to handle additional input fields:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  await page.goto('https://example.com/login');

  // Initial login
  await page.type('#username', 'your-username');
  await page.type('#password', 'your-password');
  await page.click('#login-button');

  // Wait for 2FA prompt
  await page.waitForSelector('#two-factor-code', { timeout: 30000 });

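  // Enter 2FA code. getTwoFactorCode() is a placeholder: implement it to
  // fetch the code from your authenticator, email, or SMS source.
  const twoFactorCode = await getTwoFactorCode();
  await page.type('#two-factor-code', twoFactorCode);

  // Start waiting for navigation before clicking to avoid a race
  await Promise.all([
    page.waitForNavigation(),
    page.click('#verify-button')
  ]);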
  await browser.close();
})();
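
The getTwoFactorCode() function is left to the reader. When the account uses an authenticator app (TOTP) and you hold its base32 secret, the code can be computed locally. The sketch below follows RFC 6238 using only Node's built-in crypto module. It assumes the common defaults (HMAC-SHA1, 30-second step, 6 digits), and TOTP_SECRET is a hypothetical environment variable name. Codes delivered by SMS or email need a different retrieval mechanism.

```javascript
const crypto = require('crypto');

// Decode an RFC 4648 base32 string (the format authenticator secrets use)
function base32Decode(input) {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
  const clean = input.toUpperCase().replace(/[\s=]/g, '');
  const bytes = [];
  let bits = 0;
  let value = 0;
  for (const char of clean) {
    const index = alphabet.indexOf(char);
    if (index === -1) throw new Error(`Invalid base32 character: ${char}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// Compute a TOTP code for the given time (milliseconds since the epoch)
function generateTotp(base32Secret, nowMs = Date.now(), stepSeconds = 30, digits = 6) {
  const counter = Buffer.alloc(8);
  counter.writeBigUInt64BE(BigInt(Math.floor(nowMs / 1000 / stepSeconds)));

  const hmac = crypto
    .createHmac('sha1', base32Decode(base32Secret))
    .update(counter)
    .digest();

  // Dynamic truncation as described in RFC 4226
  const offset = hmac[hmac.length - 1] & 0x0f;
  const binary = hmac.readUInt32BE(offset) & 0x7fffffff;
  return String(binary % 10 ** digits).padStart(digits, '0');
}

// One possible getTwoFactorCode(); TOTP_SECRET is an assumed variable name
async function getTwoFactorCode() {
  const secret = process.env.TOTP_SECRET;
  if (!secret) throw new Error('TOTP_SECRET is not set');
  return generateTotp(secret);
}
```

Because a code is only valid for a short window, it is safest to generate it immediately before typing it into the form.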

6. Session Persistence

To maintain authentication across multiple runs, save and restore cookies:

const puppeteer = require('puppeteer');
const fs = require('fs').promises;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Try to load saved cookies
  try {
    const cookies = JSON.parse(await fs.readFile('cookies.json', 'utf8'));
    await page.setCookie(...cookies);
  } catch (error) {
    console.log('No saved cookies found, will login');
  }

  await page.goto('https://example.com/login');

  // Check if already logged in
  try {
    await page.waitForSelector('#dashboard', { timeout: 3000 });
    console.log('Already logged in');
  } catch {
    // Perform login; start waiting for navigation before clicking to avoid a race
    await page.type('#username', 'your-username');
    await page.type('#password', 'your-password');
    await Promise.all([
      page.waitForNavigation(),
      page.click('#login-button')
    ]);

    // Save cookies for next time
    const cookies = await page.cookies();
    await fs.writeFile('cookies.json', JSON.stringify(cookies));
  }

  await browser.close();
})();
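
Saved cookies eventually expire, so it can help to filter the file's contents before restoring them. A minimal sketch, assuming the cookie objects come from page.cookies(), where expires is a Unix timestamp in seconds and session cookies carry a negative value; dropExpiredCookies is a hypothetical helper name:

```javascript
// Hypothetical helper: keeps session cookies and cookies that have not
// yet expired, and drops everything else.
function dropExpiredCookies(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter((cookie) => {
    // Assumption: a missing or negative "expires" marks a session cookie
    if (cookie.expires === undefined || cookie.expires < 0) return true;
    return cookie.expires > nowSeconds;
  });
}
```

If the filtered list is empty, the script can skip page.setCookie() and go straight to the login step.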

Best Practices

  1. Wait for elements: Always use waitForSelector() before interacting with form elements
  2. Handle errors: Wrap authentication in try-catch blocks to handle failures gracefully
  3. Respect rate limits: Add delays between attempts to avoid being blocked
  4. Debug with a visible browser: Run with headless: false while developing so you can see what's happening, then switch back to headless mode
  5. Secure credentials: Never hardcode credentials; use environment variables or secure storage
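
Points 2 and 3 above can be combined in a small retry wrapper that catches failures and waits longer between each attempt. This is a sketch, not a recommendation for specific values: withRetry is a hypothetical helper, and the retry count and delays are arbitrary defaults that should be tuned to the target site.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs an async function, retrying on failure with
// exponentially growing delays (baseDelayMs, 2x, 4x, ...).
async function withRetry(attempt, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < retries; i++) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one
      if (i < retries - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}
```

A login routine could then be run as await withRetry(() => performLogin(page)), where performLogin stands for whichever authentication steps the script uses.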

Security Considerations

  • Store credentials securely using environment variables
  • Use HTTPS whenever possible
  • Implement proper error handling to avoid credential leakage
  • Respect website terms of service and rate limits
  • Consider using official APIs instead of scraping when available
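
As an illustration of the first point, credentials can be read from environment variables at startup instead of appearing in the script. The variable names SCRAPER_USERNAME and SCRAPER_PASSWORD and the helper loadCredentials are assumptions made for this sketch; the error message deliberately names the variables without printing their values.

```javascript
// Hypothetical helper: reads credentials from the environment and fails
// early, without echoing any secret, when one is missing.
function loadCredentials(env = process.env) {
  const { SCRAPER_USERNAME: username, SCRAPER_PASSWORD: password } = env;
  if (!username || !password) {
    throw new Error('Set SCRAPER_USERNAME and SCRAPER_PASSWORD before running');
  }
  return { username, password };
}
```

The returned object has the shape page.authenticate() expects, and its fields can also be passed to page.type() for form logins.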

Remember that web scraping should always comply with the website's robots.txt file and terms of service.
