Authentication in Puppeteer can be handled through several methods depending on the authentication type used by the target website. Here are the most common approaches:
1. HTTP Authentication
Use page.authenticate() for basic HTTP authentication (when the browser shows a login dialog):
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set HTTP authentication credentials
  await page.authenticate({
    username: 'your-username',
    password: 'your-password',
  });
  await page.goto('https://example.com/protected');
  await browser.close();
})();
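The credentials in the example above are placeholders. As a minimal sketch, assuming they are supplied through environment variables, a small helper can build the object that page.authenticate() expects. The variable names HTTP_AUTH_USER and HTTP_AUTH_PASS and the helper name credentialsFromEnv are hypothetical, chosen only for this illustration.

```javascript
// Reads HTTP authentication credentials from environment variables.
// HTTP_AUTH_USER and HTTP_AUTH_PASS are hypothetical names used for this sketch.
function credentialsFromEnv(env = process.env) {
  const username = env.HTTP_AUTH_USER;
  const password = env.HTTP_AUTH_PASS;
  if (!username || !password) {
    throw new Error('HTTP_AUTH_USER and HTTP_AUTH_PASS must both be set');
  }
  // Same shape that page.authenticate() accepts.
  return { username, password };
}

// Usage inside the script above:
//   await page.authenticate(credentialsFromEnv());
```

Failing early when a variable is missing avoids sending an empty username or password to the server.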
2. Form-Based Authentication
For websites with login forms, use form interaction methods:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // Navigate to login page
  await page.goto('https://example.com/login');
  // Wait for form elements to load
  await page.waitForSelector('#username');
  await page.waitForSelector('#password');
  // Fill in credentials
  await page.type('#username', 'your-username');
  await page.type('#password', 'your-password');
  // Submit form and wait for navigation
  await Promise.all([
    page.waitForNavigation(),
    page.click('#login-button')
  ]);
  // Now access protected content
  await page.goto('https://example.com/dashboard');
  await browser.close();
})();
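The form example does not check whether the login actually worked. The sketch below wraps the same steps in a reusable function and reports success or failure. It assumes the site performs a full page navigation after the form is submitted and shows an element matching '#login-error' when the login fails. That selector, the default selectors, and the function name submitLogin are all hypothetical.

```javascript
// Fills a login form and reports whether the login appears to have succeeded.
// All selectors are hypothetical defaults; '#login-error' is an assumed
// element that the site renders after a failed login.
async function submitLogin(page, credentials, selectors = {}) {
  const {
    username = '#username',
    password = '#password',
    submit = '#login-button',
    error = '#login-error',
  } = selectors;

  await page.waitForSelector(username);
  await page.type(username, credentials.username);
  await page.type(password, credentials.password);

  // Start waiting for navigation before clicking so the event is not missed.
  await Promise.all([page.waitForNavigation(), page.click(submit)]);

  // page.$() resolves to null when no element matches the selector.
  const errorElement = await page.$(error);
  return errorElement === null;
}
```

A caller might use it as `if (!(await submitLogin(page, creds))) throw new Error('Login failed');` before visiting protected pages.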
3. Cookie-Based Authentication
If you have authentication cookies, set them before navigating:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set authentication cookies
  await page.setCookie({
    name: 'session_token',
    value: 'your-session-token',
    domain: 'example.com',
    httpOnly: true,
    secure: true
  });
  await page.goto('https://example.com/protected');
  await browser.close();
})();
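Session cookies are often available only as a raw Cookie header string, for example one copied from the browser's developer tools. As a sketch under that assumption, the hypothetical helper below converts such a string into objects that can be passed to page.setCookie().

```javascript
// Converts a Cookie header string such as 'a=1; b=2' into objects
// suitable for page.setCookie(). cookiesFromHeader is a hypothetical helper.
function cookiesFromHeader(header, domain) {
  return header
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('='))
    .map((pair) => {
      // Split on the first '=' only, because values may contain '=' themselves.
      const index = pair.indexOf('=');
      return {
        name: pair.slice(0, index).trim(),
        value: pair.slice(index + 1).trim(),
        domain,
      };
    });
}

// Usage:
//   await page.setCookie(...cookiesFromHeader(headerString, 'example.com'));
```

A Cookie header carries only names and values, so attributes such as httpOnly or secure would have to be added separately if the site requires them.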
4. Token-Based Authentication
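For APIs or applications using bearer tokens, set the Authorization header with page.setExtraHTTPHeaders(). Note that the header is sent with every request the page initiates, including requests to third-party hosts: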
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Set authorization header
  await page.setExtraHTTPHeaders({
    'Authorization': 'Bearer your-jwt-token'
  });
  await page.goto('https://api.example.com/protected');
  await browser.close();
})();
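Because page.setExtraHTTPHeaders() applies to every request the page makes, the token can reach hosts other than the one it was issued for. One possible alternative, sketched below, uses request interception to attach the header only to requests for a single origin. The function name authorizeOrigin and its parameters are hypothetical. It also assumes no other request handler on the page resolves the same requests.

```javascript
// Adds a bearer token only to requests whose origin matches allowedOrigin.
// authorizeOrigin is a hypothetical helper, not part of Puppeteer.
async function authorizeOrigin(page, allowedOrigin, token) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    const sameOrigin = new URL(request.url()).origin === allowedOrigin;
    if (!sameOrigin) {
      request.continue(); // leave other origins untouched
      return;
    }
    // Puppeteer reports header names in lower case, so use the same casing.
    request.continue({
      headers: { ...request.headers(), authorization: `Bearer ${token}` },
    });
  });
}

// Usage:
//   await authorizeOrigin(page, 'https://api.example.com', 'your-jwt-token');
//   await page.goto('https://api.example.com/protected');
```

Request interception disables the page cache, so this approach trades some speed for tighter control over where the token is sent.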
5. Handling Two-Factor Authentication
For 2FA, you might need to handle additional input fields:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://example.com/login');
  // Initial login
  await page.type('#username', 'your-username');
  await page.type('#password', 'your-password');
  await page.click('#login-button');
  // Wait for 2FA prompt
  await page.waitForSelector('#two-factor-code', { timeout: 30000 });
  // Enter 2FA code. getTwoFactorCode() is not part of Puppeteer; it is a
  // custom function you must implement to retrieve the code
  const twoFactorCode = await getTwoFactorCode();
  await page.type('#two-factor-code', twoFactorCode);
  // Start waiting for navigation before clicking so a fast redirect is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.click('#verify-button')
  ]);
  await browser.close();
})();
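How getTwoFactorCode() retrieves the code depends on the site. As one possible sketch, assuming the site uses standard time-based one-time passwords (TOTP, RFC 6238) with a base32 secret, six digits, and a 30-second step, the code can be computed with Node's built-in crypto module. The function names base32Decode, hotp, and totp, and the TOTP_SECRET environment variable, are hypothetical. Sites that send codes by SMS or email need a different approach.

```javascript
const crypto = require('crypto');

// Decodes an RFC 4648 base32 string (the usual format of TOTP secrets).
function base32Decode(input) {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
  const clean = input.toUpperCase().replace(/[\s=]/g, '');
  const bytes = [];
  let bits = 0;
  let value = 0;
  for (const ch of clean) {
    const index = alphabet.indexOf(ch);
    if (index === -1) throw new Error(`Invalid base32 character: ${ch}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// HMAC-based one-time password (RFC 4226) for a given counter.
function hotp(key, counter, digits = 6) {
  const message = Buffer.alloc(8);
  message.writeBigUInt64BE(BigInt(counter));
  const hmac = crypto.createHmac('sha1', key).update(message).digest();
  const offset = hmac[hmac.length - 1] & 0x0f; // dynamic truncation
  const binary = hmac.readUInt32BE(offset) & 0x7fffffff;
  return String(binary % 10 ** digits).padStart(digits, '0');
}

// Time-based variant: the counter is the number of elapsed time steps.
function totp(secretBase32, nowMs = Date.now(), stepSeconds = 30) {
  const counter = Math.floor(nowMs / 1000 / stepSeconds);
  return hotp(base32Decode(secretBase32), counter);
}

// A possible getTwoFactorCode(), assuming the secret is in TOTP_SECRET:
//   const getTwoFactorCode = async () => totp(process.env.TOTP_SECRET);
```

Storing the TOTP secret next to the password removes much of the protection that 2FA provides, so this is best limited to test accounts.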
6. Session Persistence
To maintain authentication across multiple runs, save and restore cookies:
const puppeteer = require('puppeteer');
const fs = require('fs').promises;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Try to load saved cookies
  try {
    const cookies = JSON.parse(await fs.readFile('cookies.json', 'utf8'));
    await page.setCookie(...cookies);
  } catch (error) {
    console.log('No saved cookies found, will login');
  }
  await page.goto('https://example.com/login');
  // Check if already logged in
  try {
    await page.waitForSelector('#dashboard', { timeout: 3000 });
    console.log('Already logged in');
  } catch {
    // Perform login
    await page.type('#username', 'your-username');
    await page.type('#password', 'your-password');
    // Start waiting for navigation before clicking so it is not missed
    await Promise.all([
      page.waitForNavigation(),
      page.click('#login-button')
    ]);
    // Save cookies for next time
    const cookies = await page.cookies();
    await fs.writeFile('cookies.json', JSON.stringify(cookies));
  }
  await browser.close();
})();
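Saved cookies can expire between runs. A small filter, sketched below, can drop stale entries before they are restored. It relies on Puppeteer reporting a cookie's expires field as Unix time in seconds, with -1 for session cookies. The function name dropExpiredCookies is hypothetical, and keeping session cookies is a design choice that may not suit every site.

```javascript
// Drops cookies whose expiry time has passed.
// Puppeteer reports `expires` as Unix time in seconds, with -1 for session cookies.
function dropExpiredCookies(cookies, nowMs = Date.now()) {
  const nowSeconds = nowMs / 1000;
  return cookies.filter((cookie) => {
    const isSession = cookie.expires === undefined || cookie.expires === -1;
    // Session cookies have no expiry date, so they are kept as-is.
    return isSession || cookie.expires > nowSeconds;
  });
}

// Usage when restoring:
//   await page.setCookie(...dropExpiredCookies(cookies));
```

The selector check after loading the login page is still needed, because a server can invalidate a session before the cookie's own expiry time.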
Best Practices
- Wait for elements: Always use waitForSelector() before interacting with form elements
- Handle errors: Wrap authentication in try-catch blocks to handle failures gracefully
- Respect rate limits: Add delays between attempts to avoid being blocked
- Use headless mode carefully: For debugging, run with headless: false to see what's happening
- Secure credentials: Never hardcode credentials; use environment variables or secure storage
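The error-handling and rate-limit advice can be combined in a retry wrapper. The sketch below retries a failing async task with exponentially growing delays. The names withRetries and sleep, and the default of three attempts with a one-second base delay, are assumptions chosen for illustration.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs an async task, retrying on failure with exponential backoff.
// withRetries is a hypothetical helper; the defaults are illustrative only.
async function withRetries(task, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Delay grows as baseDelayMs, 2x, 4x, ... between attempts.
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}

// Usage, where login is your own async function that performs the login steps:
//   await withRetries(() => login(page), { attempts: 3, baseDelayMs: 2000 });
```

Keeping the attempt count low matters for logins, since repeated failures can lock an account.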
Security Considerations
- Store credentials securely using environment variables
- Use HTTPS whenever possible
- Implement proper error handling to avoid credential leakage
- Respect website terms of service and rate limits
- Consider using official APIs instead of scraping when available
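One way credentials can leak is through log output, for example when a script logs an error message that includes a typed value or a token. As a minimal sketch, the hypothetical helper below replaces known secret values in a message before it is logged.

```javascript
// Replaces every occurrence of each known secret with a placeholder.
// redactSecrets is a hypothetical helper for illustration.
function redactSecrets(message, secrets) {
  return secrets
    .filter((secret) => typeof secret === 'string' && secret.length > 0)
    .reduce(
      (text, secret) => text.split(secret).join('[REDACTED]'),
      String(message)
    );
}

// Usage in an error handler:
//   console.error(redactSecrets(error.message, [process.env.HTTP_AUTH_PASS]));
```

This only catches exact matches, so encoded or truncated forms of a secret would still pass through.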
Remember that web scraping should always comply with the website's robots.txt file and terms of service.