Infinite scroll is a common feature found on many websites where more content is loaded as you scroll down the page. This can pose a challenge when you're trying to scrape data from the website because you need to simulate this scrolling behaviour to load all the content.
Here's how you can deal with infinite scrolling in Puppeteer:
First, you need to launch Puppeteer and navigate to the page.
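const puppeteer = require('puppeteer');
async function scrapeInfiniteScroll() {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto('http://example.com');
    // ...
  } finally {
    await browser.close(); // Always shut the browser down, even if scraping throws
  }
}
scrapeInfiniteScroll().catch(console.error);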
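Next, you can use the page.evaluate function to run JavaScript in the context of the page. The snippet below scrolls the window down by one viewport height; to jump straight to the bottom of the page instead, call window.scrollTo(0, document.body.scrollHeight).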
await page.evaluate(() => {
window.scrollBy(0, window.innerHeight);
});
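Puppeteer doesn't provide an event that fires when new content has finished loading after a scroll, so you'll need to pause briefly to give it time to load. Older Puppeteer versions offered page.waitForTimeout for this, but it was deprecated and has been removed in newer releases; a plain promise-based delay works in every version:
await new Promise((resolve) => setTimeout(resolve, 1000));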
To scroll multiple times, you can put the scroll and delay in a loop. Here's an example of scrolling 10 times.
for (let i = 0; i < 10; i++) {
  await page.evaluate(() => {
    window.scrollBy(0, window.innerHeight);
  });
  await new Promise((resolve) => setTimeout(resolve, 1000)); // Let new content load
}
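If you need the fixed-count approach in several places, the loop can be wrapped in a small function. This is a sketch rather than a Puppeteer API: `scrollNTimes` is a hypothetical helper name, `page` is assumed to be a Puppeteer `Page`, and the pause uses a plain `setTimeout` promise instead of `page.waitForTimeout`.

```javascript
// Hypothetical helper: scroll down by one viewport `times` times,
// pausing `delayMs` after each scroll so lazily loaded content can render.
// `page` is assumed to be a Puppeteer Page (anything with an async evaluate()).
async function scrollNTimes(page, times = 10, delayMs = 1000) {
  for (let i = 0; i < times; i++) {
    await page.evaluate(() => {
      window.scrollBy(0, window.innerHeight);
    });
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

For example, calling `await scrollNTimes(page, 5, 1500);` inside `scrapeInfiniteScroll` would scroll five times with a 1.5-second pause after each scroll.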
However, hardcoding the number of scrolls is fragile. Instead, you can watch the page height (document.body.scrollHeight) and stop once it no longer increases after a scroll, which usually means there's no more content to load.
let previousHeight;
while (true) {
  previousHeight = await page.evaluate('document.body.scrollHeight');
  await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
  try {
    // Wait up to 5 s for new content to make the page taller
    await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`, { timeout: 5000 });
  } catch (err) {
    break; // The height didn't grow in time: assume there's no more content
  }
  await new Promise((resolve) => setTimeout(resolve, 1000)); // Extra pause for safe measure
}
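Another way to detect the end of the feed is to poll the height after each scroll rather than waiting on `page.waitForFunction`, which throws a `TimeoutError` (after 30 seconds by default) if the height never grows. The sketch below assumes `page` is a Puppeteer `Page`; `scrollUntilStable` and its `delayMs`, `maxScrolls`, and `stableRounds` options are hypothetical names chosen for illustration, and the defaults are guesses you would tune for each site.

```javascript
// Hypothetical helper: keep scrolling to the bottom until the page height has
// stayed the same for `stableRounds` consecutive checks, or until `maxScrolls`
// scrolls have been made. Returns the number of scrolls performed.
async function scrollUntilStable(page, { delayMs = 1000, maxScrolls = 50, stableRounds = 2 } = {}) {
  let lastHeight = await page.evaluate('document.body.scrollHeight');
  let unchanged = 0; // consecutive checks with no height increase
  let scrolls = 0;
  while (scrolls < maxScrolls && unchanged < stableRounds) {
    await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
    scrolls++;
    // Give the site time to fetch and render the next batch
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    const height = await page.evaluate('document.body.scrollHeight');
    if (height > lastHeight) {
      lastHeight = height;
      unchanged = 0; // new content arrived, so reset the counter
    } else {
      unchanged++;
    }
  }
  return scrolls;
}
```

Requiring the height to stay unchanged for more than one check tolerates a single slow batch, and the `maxScrolls` cap keeps a feed that never really ends from looping forever.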
After the loop, you can continue with the rest of your scraping.
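As one example of that next step, `page.$$eval` runs a function over every element matching a selector inside the page and returns the result to your script. In this sketch the `.item` selector and the `collectItemText` name are hypothetical placeholders; substitute a selector for the elements you actually want.

```javascript
// Hypothetical helper: return the trimmed text of every element matching
// `selector`. '.item' is a placeholder selector, not one from any real site.
async function collectItemText(page, selector = '.item') {
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```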
Keep in mind that this is a basic example, and you might need to adjust the code depending on the specifics of the website you're scraping. For example, some websites might require you to scroll in a specific way, or they might have a "load more" button that you need to click instead of scrolling.
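For sites that use a "load more" button, a similar loop can click the button until it disappears. This is a sketch under assumptions: `clickLoadMore` is a hypothetical helper, the selector you pass is site-specific, and it treats a missing or unclickable button as the end of the content. `page.$` returns `null` when nothing matches, and clicking an element that is hidden or detached throws.

```javascript
// Hypothetical helper: click the element matching `selector` until it is gone,
// can't be clicked, or `maxClicks` is reached. Returns the number of clicks.
async function clickLoadMore(page, selector, { delayMs = 1000, maxClicks = 50 } = {}) {
  let clicks = 0;
  while (clicks < maxClicks) {
    const button = await page.$(selector); // null when no element matches
    if (!button) break;
    try {
      await button.click();
    } catch {
      break; // e.g. the button is hidden or was removed from the page
    }
    clicks++;
    // Wait for the next batch of content to load
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return clicks;
}
```

If the site leaves a disabled button in the page once everything is loaded, only the `maxClicks` cap would stop the loop, so a more specific selector such as `button.load-more:not([disabled])` (a hypothetical class name) may be needed.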