How to deal with CAPTCHAs in Puppeteer?

Dealing with CAPTCHAs in Puppeteer is tricky because they are designed specifically to block automation and bots. There are, however, a few ways to get past them.

Using a Third-party Service

The most common approach is to use a third-party service such as 2Captcha or Anti-Captcha. These services expose APIs: you send them the CAPTCHA, they solve it, and they return a response that you submit to the website.

Here is an example of how you can use the 2Captcha service to solve a Google reCAPTCHA with Puppeteer in JavaScript:

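const puppeteer = require('puppeteer');

const API_KEY = 'YOUR_2CAPTCHA_KEY'; // replace with your 2Captcha key
const PAGE_URL = 'https://www.example.com'; // replace with the URL of the page
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto(PAGE_URL);

        // Read the site key from the reCAPTCHA widget's data-sitekey attribute
        const siteKey = await page.$eval('[data-sitekey]', (el) => el.getAttribute('data-sitekey'));

        // Submit the CAPTCHA to 2Captcha (uses the global fetch available in Node.js 18+)
        const submitted = await fetch('https://2captcha.com/in.php', {
            method: 'POST',
            body: new URLSearchParams({
                method: 'userrecaptcha',
                key: API_KEY,
                googlekey: siteKey,
                pageurl: PAGE_URL,
                json: '1'
            })
        }).then((res) => res.json());
        if (submitted.status !== 1) throw new Error(`2Captcha rejected the task: ${submitted.request}`);

        // Poll for the result; the answer is not ready immediately
        let token = null;
        for (let attempt = 0; attempt < 24 && !token; attempt++) {
            await sleep(5000);
            const result = await fetch(
                `https://2captcha.com/res.php?key=${API_KEY}&action=get&id=${submitted.request}&json=1`
            ).then((res) => res.json());
            if (result.status === 1) token = result.request;
            else if (result.request !== 'CAPCHA_NOT_READY') throw new Error(`2Captcha error: ${result.request}`);
        }
        if (!token) throw new Error('Timed out waiting for the CAPTCHA solution');

        // Write the token into the hidden response field and submit the form
        await page.evaluate((value) => {
            document.getElementById('g-recaptcha-response').value = value;
        }, token);
        await page.click('#submit'); // click the submit button
    } finally {
        await browser.close(); // always release the browser, even on errors
    }
})();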

Replace 'YOUR_2CAPTCHA_KEY' with your actual 2Captcha API key.

Note that these services are not free: you have to pay for the CAPTCHAs they solve.
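Because solves are billed, it can help to keep the "wait for the answer" logic separate from the HTTP calls so it can be exercised against a stub without spending credits. The following is a minimal sketch, not part of any official client: `pollForToken`, its option names, and the default interval and attempt count are hypothetical. It assumes the reply shape 2Captcha returns when `json=1` is set, that is `{ status: 1, request: '<token>' }` once solved and `{ status: 0, request: 'CAPCHA_NOT_READY' }` while pending.

```javascript
// Hypothetical helper: calls `fetchResult` until it reports a solved token.
// `fetchResult` is any async function that returns the parsed JSON reply of
// 2Captcha's res.php endpoint (assumed shape: { status, request }).
async function pollForToken(fetchResult, { intervalMs = 5000, maxAttempts = 24 } = {}) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const reply = await fetchResult();
        if (reply.status === 1) {
            return reply.request; // solved: `request` holds the token
        }
        if (reply.request !== 'CAPCHA_NOT_READY') {
            // Anything other than "not ready" is treated as a permanent failure
            throw new Error(`2Captcha error: ${reply.request}`);
        }
        if (attempt < maxAttempts) {
            await new Promise((resolve) => setTimeout(resolve, intervalMs));
        }
    }
    throw new Error(`CAPTCHA not solved after ${maxAttempts} attempts`);
}
```

In a real script, `fetchResult` would wrap the request to `res.php`; in a test it can simply return canned replies.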

Using CAPTCHA Solving Libraries

Another option is a library designed to solve CAPTCHAs. Such libraries generally handle only simple CAPTCHAs and may fail on complex ones.
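For a plain distorted-text image, one such approach is OCR. The sketch below is only an illustration under several assumptions: it assumes the `tesseract.js` package is installed, that the CAPTCHA is a single image element, and that its answer is alphanumeric. `solveImageCaptcha`, `normalizeCaptchaText`, and the selector passed in are hypothetical names, and recognition accuracy on real CAPTCHAs may be poor.

```javascript
// Remove whitespace and any character outside the expected alphabet
// (assumed here: letters and digits only).
function normalizeCaptchaText(rawText, disallowed = /[^A-Za-z0-9]/g) {
    return rawText.replace(disallowed, '');
}

// Hypothetical helper: screenshots the CAPTCHA image and runs OCR on it.
// `page` is a Puppeteer Page; `imageSelector` points at the CAPTCHA image.
async function solveImageCaptcha(page, imageSelector) {
    const Tesseract = require('tesseract.js'); // assumed dependency, loaded on demand
    const imageHandle = await page.$(imageSelector);
    if (!imageHandle) {
        throw new Error(`No element matches ${imageSelector}`);
    }
    const png = Buffer.from(await imageHandle.screenshot());
    const { data } = await Tesseract.recognize(png, 'eng');
    return normalizeCaptchaText(data.text);
}
```

OCR output often contains stray spaces or punctuation, which is why the recognized text is normalized before it is typed into the form.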

Manual CAPTCHA Solving

If you run your Puppeteer scripts in non-headless mode, you can also solve the CAPTCHAs manually. This does not suit large-scale scraping but can work for small tasks.

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: false}); // Launch Puppeteer in non-headless mode
    const page = await browser.newPage();
    await page.goto('https://www.example.com'); // Navigate to the page with CAPTCHA

    // Wait for you to manually solve the CAPTCHA and for the resulting navigation.
    // timeout: 0 disables the default 30-second limit, which a human may exceed.
    await page.waitForNavigation({ timeout: 0 });

    // Continue with your scraping tasks

    await browser.close();
})();

In this case, Puppeteer will open a browser window where you can manually solve the CAPTCHA and then continue with the rest of the script.
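The script above relies on the page navigating once the CAPTCHA is solved. Where the form does not navigate by itself, an alternative is to wait until the reCAPTCHA response field has been filled in. A minimal sketch, assuming the page uses the standard `g-recaptcha-response` field; `waitForManualSolve` and its five-minute default timeout are hypothetical choices:

```javascript
// Hypothetical helper: resolves with the token once a human has solved the
// reCAPTCHA in the visible browser window. `page` is a Puppeteer Page.
async function waitForManualSolve(page, { timeoutMs = 5 * 60 * 1000 } = {}) {
    const selector = '#g-recaptcha-response'; // assumed: standard reCAPTCHA response field
    // Runs the predicate repeatedly inside the page until it returns true
    await page.waitForFunction(
        (sel) => {
            const field = document.querySelector(sel);
            return Boolean(field && field.value.length > 0);
        },
        { timeout: timeoutMs },
        selector
    );
    // Read the token back so the script can log or reuse it
    return page.$eval(selector, (field) => field.value);
}
```

It would be called after `page.goto(...)`, for example `const token = await waitForManualSolve(page);`, before continuing with the scraping tasks.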

Keep in mind that using bots to bypass CAPTCHAs may violate the terms of service of some websites.
