How do I deal with pagination on a website when using Nightmare?

When dealing with pagination in Nightmare, you create a loop or a recursive function that navigates through each page and collects the data you need. Nightmare is a high-level browser automation library for Node.js, built on Electron, which makes it well suited to scraping tasks that require interacting with pages or executing JavaScript.

Here is a general approach to handle pagination with Nightmare:

  1. Define a function that will scrape the data from a single page.
  2. Check if there's a next page button or link and whether it's enabled.
  3. Click the next page button if it exists and is enabled, then wait for the new page to load.
  4. Repeat this process until you've reached the end of the pagination or collected all the required data.
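Before bringing Nightmare into the picture, the control flow of those four steps can be sketched on its own with a mocked "site", so the loop logic is easy to see in isolation. The pages array and the three helper functions below are stand-ins for the real Nightmare calls (evaluate, the "Next" button check, and click + wait):

```javascript
// Mocked pagination: each inner array is one page of results.
const pages = [
  ['item 1', 'item 2'],
  ['item 3', 'item 4'],
  ['item 5'], // last page: no "next" link
];

let current = 0;
const scrapeCurrentPage = () => pages[current];       // stands in for nightmare.evaluate()
const hasNextPage = () => current < pages.length - 1; // stands in for the "Next" button check
const goToNextPage = () => { current += 1; };         // stands in for nightmare.click() + wait()

const results = [];
while (true) {
  results.push(...scrapeCurrentPage()); // step 1: scrape the current page
  if (!hasNextPage()) break;            // steps 2 & 4: stop when there is no next page
  goToNextPage();                       // step 3: advance to the next page
}

console.log(results); // ['item 1', 'item 2', 'item 3', 'item 4', 'item 5']
```

The full Nightmare version below follows exactly this shape, with the helpers replaced by real browser actions.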

Below is an example of a Node.js script using Nightmare to handle pagination:

const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: false });

// Function to scrape data from the current page
const scrapeCurrentPage = async () => {
  // Define the logic to scrape data from the current page
  // Replace the selector with the actual selector for the items you are scraping
  const data = await nightmare.evaluate(() => {
    const items = document.querySelectorAll('.item-selector');
    const results = [];
    items.forEach(item => {
      // Extract the necessary information from each item
      results.push({
        title: item.querySelector('.title-selector').innerText,
        url: item.querySelector('.url-selector').href
      });
    });
    return results;
  });
  return data;
};

// Recursive function to navigate and scrape each page
const scrapePages = async (results = []) => {
  // Scrape the current page
  const dataFromCurrentPage = await scrapeCurrentPage();
  results = results.concat(dataFromCurrentPage);

  // Check if the next page exists and is clickable
  const hasNextPage = await nightmare.evaluate(() => {
    // Replace '.next-page-selector' with the actual selector for the "Next" button or link
    const nextPage = document.querySelector('.next-page-selector');
    return nextPage && !nextPage.classList.contains('disabled'); // Adjust this according to how the site indicates a disabled "Next" button
  });

  // If there is a next page, click it and continue scraping
  if (hasNextPage) {
    await nightmare.click('.next-page-selector') // Use the actual selector for the "Next" button or link
      .wait('body'); // Adjust the wait condition as necessary

    return scrapePages(results); // Recursively call to scrape the next page
  } else {
    return results; // No more pages, return the results
  }
};

// Start the scraping process by loading the first page
nightmare
  .goto('https://example.com/products') // Replace with the URL of the first page
  .then(() => scrapePages())
  .then(results => {
    console.log('Scraped Data:', results);
    return nightmare.end();
  })
  .catch(error => {
    console.error('Scraping failed:', error);
    return nightmare.end(); // Close the Electron process even when scraping fails
  });

Please note the following when using the above script:

  • Replace .item-selector, .title-selector, .url-selector, and .next-page-selector with the actual CSS selectors for the items you want to scrape and the "Next" button or link.
  • The wait('body') call is only a placeholder: since the body element exists on every page, it resolves almost immediately and does not guarantee the new content has loaded. Prefer waiting for an element that only appears once the next page has rendered (e.g., wait('.item-selector')), or fall back to a millisecond delay such as wait(1000).
  • The hasNextPage logic might need to be adjusted depending on how the website disables or hides the "Next" button on the last page.
  • Since web scraping can affect the performance of the website and the experience of its users, always ensure you comply with the website's terms of service and robots.txt file to avoid any legal issues.
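As the notes above mention, sites signal the last page in different ways: the "Next" element may be missing entirely, carry a disabled attribute, or carry a "disabled" CSS class. A hedged sketch of the hasNextPage decision as a pure function makes those cases easy to test; here the element returned by document.querySelector is modelled as a plain object with a classList array (a real DOMTokenList would use .contains instead of .includes):

```javascript
// Pure-function version of the hasNextPage check.
// `next` models document.querySelector('.next-page-selector'):
// either null or { disabled, classList }.
const hasNextPage = (next) =>
  Boolean(
    next &&                              // button/link must exist at all
    !next.disabled &&                    // some sites set the disabled attribute
    !next.classList.includes('disabled') // others add a "disabled" CSS class
  );

console.log(hasNextPage(null));                                         // false: no button on last page
console.log(hasNextPage({ disabled: false, classList: [] }));           // true: enabled "Next" link
console.log(hasNextPage({ disabled: false, classList: ['disabled'] })); // false: class-based disabling
```

Adapt the checks inside the function to match whichever convention the target site actually uses.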

Remember that web scraping can be legally complex, and the legality of scraping a particular website or data set can depend on the website's terms of service, copyright law, and other legal considerations. Always ensure you have the legal right to scrape the data you are targeting.
