Dealing with CAPTCHAs in Puppeteer is tricky because CAPTCHAs are specifically designed to block automation and bots. There are, however, a few ways to get past them in Puppeteer.
Using a Third-party Service
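The most common approach is to use a third-party solving service such as 2Captcha or Anti-Captcha. You send the CAPTCHA to the service through its API, the service solves it and returns a response (for reCAPTCHA, a token), and you then enter that response into the website. Solving takes some time, so the client typically submits the CAPTCHA, receives a task ID, and then polls the service until the result is ready.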
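Because the answer is not returned immediately, the client needs a polling loop. Below is a minimal, service-agnostic sketch of that loop with an overall timeout. pollForSolution and the checkOnce callback are hypothetical names, and checkOnce is assumed to resolve to { ready: true, token } once the service has a solution and to { ready: false } otherwise.

```javascript
// Sleep helper used between polling attempts.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: repeatedly call checkOnce() (one request to the solving
// service's "get result" endpoint) until it reports a solution or time runs out.
async function pollForSolution(checkOnce, { intervalMs = 5000, timeoutMs = 180000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await checkOnce();
    if (result.ready) return result.token; // solution available
    await sleep(intervalMs); // not solved yet: wait before asking again
  }
  throw new Error(`No CAPTCHA solution within ${timeoutMs} ms`);
}
```

In practice checkOnce would wrap a single HTTP request to the service's result endpoint. The timeout keeps a stuck task from hanging the script forever.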
Here is an example of how you can use the 2Captcha service to solve a Google reCAPTCHA with Puppeteer in JavaScript:
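const puppeteer = require('puppeteer');

const API_KEY = 'YOUR_2CAPTCHA_KEY'; // replace with your 2Captcha key
const PAGE_URL = 'https://www.example.com'; // replace with the URL of the page

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(PAGE_URL);

  // Read the site key from the reCAPTCHA widget's data-sitekey attribute
  const siteKey = await page.$eval('[data-sitekey]', (el) => el.getAttribute('data-sitekey'));

  // Submit the CAPTCHA to 2Captcha (the global fetch used here requires Node.js 18+)
  const submitRes = await fetch('http://2captcha.com/in.php', {
    method: 'POST',
    body: new URLSearchParams({
      key: API_KEY,
      method: 'userrecaptcha',
      googlekey: siteKey,
      pageurl: PAGE_URL,
      json: '1',
    }),
  });
  const submitted = await submitRes.json(); // { status: 1, request: '<task id>' } on success
  if (submitted.status !== 1) throw new Error(`2Captcha error: ${submitted.request}`);

  // Poll res.php until the token is ready; it answers CAPCHA_NOT_READY until then
  let token;
  while (!token) {
    await sleep(5000);
    const res = await fetch(
      `http://2captcha.com/res.php?key=${API_KEY}&action=get&id=${submitted.request}&json=1`
    );
    const result = await res.json();
    if (result.status === 1) token = result.request;
    else if (result.request !== 'CAPCHA_NOT_READY') throw new Error(`2Captcha error: ${result.request}`);
  }

  // Put the token into the hidden g-recaptcha-response textarea, then submit
  await page.evaluate((t) => {
    document.getElementById('g-recaptcha-response').value = t;
  }, token);
  await page.click('#submit'); // click the submit button (the selector depends on the site)
  await browser.close();
})();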
Replace 'YOUR_2CAPTCHA_KEY' with your actual 2Captcha API key.
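To keep the 2Captcha request details in one place, the request parameters can be built, and the res.php replies interpreted, by small helpers. This is a sketch: buildSubmitParams, buildResultUrl and interpretResult are hypothetical names. The in.php parameters mirror those in the example, and json=1 is assumed so that 2Captcha replies with JSON of the form { status, request }.

```javascript
// Hypothetical helper: form body for http://2captcha.com/in.php (reCAPTCHA task).
function buildSubmitParams({ apiKey, siteKey, pageUrl }) {
  return new URLSearchParams({
    key: apiKey,
    method: 'userrecaptcha',
    googlekey: siteKey,
    pageurl: pageUrl,
    json: '1', // ask for a JSON reply instead of plain text
  });
}

// Hypothetical helper: URL for fetching the result of a submitted task.
function buildResultUrl(apiKey, requestId) {
  const params = new URLSearchParams({ key: apiKey, action: 'get', id: requestId, json: '1' });
  return `http://2captcha.com/res.php?${params}`;
}

// Hypothetical helper: interpret a parsed res.php JSON reply.
// status 1 carries the token; CAPCHA_NOT_READY means "poll again"; anything else is an error.
function interpretResult(reply) {
  if (reply.status === 1) return { ready: true, token: reply.request };
  if (reply.request === 'CAPCHA_NOT_READY') return { ready: false };
  throw new Error(`2Captcha error: ${reply.request}`);
}
```

Separating request building from the HTTP calls also makes the parameter handling easy to check without spending money on real solves.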
Note that these services are not free: you pay for the CAPTCHAs they solve.
Using CAPTCHA Solving Libraries
Another option is to use libraries designed to solve CAPTCHAs locally, such as OCR-based tools for simple distorted-text images. However, these libraries can only handle simple CAPTCHAs and usually fail on more complex ones.
Manual CAPTCHA Solving
If you run your Puppeteer scripts in non-headless (headful) mode, you can also solve CAPTCHAs by hand. This does not scale to large-scale scraping, but it works for small tasks.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false }); // launch a visible browser window
  const page = await browser.newPage();
  await page.goto('https://www.example.com'); // navigate to the page with the CAPTCHA

  // Wait, with no time limit (the default is 30 seconds), while you solve the CAPTCHA by hand.
  // This assumes the page navigates once the CAPTCHA is solved or the form is submitted.
  await page.waitForNavigation({ timeout: 0 });

  // Continue with your scraping tasks
  await browser.close();
})();
In this case, Puppeteer opens a browser window where you solve the CAPTCHA manually; once the page navigates, the script continues with the rest of its tasks.
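If solving the CAPTCHA does not trigger a navigation (for example, a checkbox CAPTCHA on a page that stays put), page.waitForNavigation() will not resolve on its own. A simple alternative, sketched below with Node's built-in readline module, is to pause the script until you press Enter in the terminal. waitForEnter is a hypothetical helper name; the input and output streams are parameters only so the helper can be exercised without a real terminal.

```javascript
const readline = require('node:readline');

// Hypothetical helper: print a prompt and resolve once a line (Enter) is read.
function waitForEnter(
  prompt = 'Solve the CAPTCHA in the browser, then press Enter... ',
  input = process.stdin,
  output = process.stdout
) {
  return new Promise((resolve) => {
    const rl = readline.createInterface({ input, output });
    rl.question(prompt, () => {
      rl.close(); // release the input stream so the script can go on
      resolve();
    });
  });
}
```

In the manual-solving script, you would call await waitForEnter(); in place of the page.waitForNavigation() line, then continue scraping after you confirm in the terminal.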
Keep in mind that using bots to bypass CAPTCHAs may violate the terms of service of some websites.