How to navigate to different pages using Puppeteer?

Navigating to different pages using Puppeteer is straightforward. Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol; by default it runs the browser in headless mode.

Here is a basic guide on how to navigate to different pages using Puppeteer.

We'll start by installing Puppeteer. Use npm (Node Package Manager) to install it:

npm i puppeteer

Once Puppeteer is installed, you can navigate to a page with the page.goto() method.

Here is a simple example in JavaScript where we navigate to Google's homepage:

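const puppeteer = require('puppeteer');

async function run() {
    // Launch a browser instance (headless by default)
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // Navigate to Google's homepage and wait for it to load
        await page.goto('https://www.google.com');
        await page.screenshot({ path: 'google.png' });
    } finally {
        // Always close the browser, even if navigation fails
        await browser.close();
    }
}

run().catch(console.error);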

In this script, we first require the Puppeteer module and launch a new browser instance. We then create a new page in that browser, navigate to Google's homepage with page.goto(), save a screenshot to google.png, and close the browser.
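page.goto() resolves to the response for the page's main resource, which lets you check whether the navigation actually succeeded. Puppeteer does not reject the navigation on HTTP error statuses such as 404 or 500, so an explicit check can be useful. This is a minimal sketch: gotoOrThrow is a hypothetical helper name, and it expects a page created with browser.newPage() as in the script above.

```javascript
/**
 * Navigate with page.goto() and throw if the main response is missing
 * or has a non-2xx status. `gotoOrThrow` is a hypothetical helper name.
 * @param {object} page - a Puppeteer page from browser.newPage()
 * @param {string} url - the address to open
 * @param {object} [options] - passed through to page.goto() (e.g. waitUntil)
 */
async function gotoOrThrow(page, url, options = {}) {
    const response = await page.goto(url, options);

    // page.goto() resolves to null for some navigations (e.g. about:blank
    // or a same-document hash change); this sketch treats that as a failure.
    if (response === null) {
        throw new Error(`No response received for ${url}`);
    }

    // response.ok() is true for status codes 200-299. page.goto() itself
    // does not reject on 404 or 500, so check the status explicitly.
    if (!response.ok()) {
        throw new Error(`Navigation to ${url} failed with HTTP ${response.status()}`);
    }
    return response;
}
```

In the script above you could replace the plain page.goto() call with `await gotoOrThrow(page, 'https://www.google.com');` so that an error page does not silently end up in the screenshot.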

You can also navigate to several pages in sequence. Here's how you could navigate from Google's homepage to Google's about page:

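const puppeteer = require('puppeteer');

async function run() {
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // Each goto() is awaited, so the second navigation starts
        // only after the homepage has loaded
        await page.goto('https://www.google.com');
        await page.goto('https://www.google.com/about');
        await page.screenshot({ path: 'google_about.png' });
    } finally {
        // Always close the browser, even if a navigation fails
        await browser.close();
    }
}

run().catch(console.error);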

In this script, we first navigate to Google's homepage and then to Google's about page. Because each page.goto() call is awaited, the screenshot is taken only after the about page has loaded.
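Navigation does not have to go through page.goto(). When clicking a link triggers a page load, a common Puppeteer pattern is to start page.waitForNavigation() before the click and await both together, so the navigation is not missed. This is a sketch: clickAndWait is a hypothetical helper, and the selector you pass is an assumption about the target page's markup that you would need to verify.

```javascript
/**
 * Click an element that triggers a page load and wait for that navigation.
 * `clickAndWait` is a hypothetical helper name.
 * @param {object} page - a Puppeteer page from browser.newPage()
 * @param {string} selector - CSS selector of the link to click (assumed to exist)
 * @param {object} [navOptions] - passed to page.waitForNavigation()
 */
async function clickAndWait(page, selector, navOptions = { waitUntil: 'load' }) {
    // Register the wait before clicking; if the click came first, a fast
    // navigation could finish before waitForNavigation() starts listening.
    const [response] = await Promise.all([
        page.waitForNavigation(navOptions),
        page.click(selector),
    ]);
    // The first element is the main resource response of the new page
    return response;
}
```

Once you have moved through several pages this way, page.goBack() and page.goForward() step through the page's history, much like the browser's Back and Forward buttons.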

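Remember that page.goto() is asynchronous: it returns a Promise that, by default, resolves to the main resource's response once the page's load event fires. Await it before taking any actions on the page. The Promise rejects if the navigation fails, for example on a network error or when the default 30-second navigation timeout is exceeded.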
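Because the Promise returned by page.goto() rejects when navigation fails (for example on a network error or a timeout), you can catch the error and retry. The sketch below is one possible approach, not a built-in Puppeteer feature: gotoWithRetry and its retries and delayMs parameters are hypothetical, while timeout (in milliseconds, 0 disables it) and waitUntil are real page.goto() options.

```javascript
// Pause between attempts
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Try page.goto() up to `retries + 1` times before giving up.
 * `gotoWithRetry`, `retries` and `delayMs` are hypothetical names;
 * `timeout` and `waitUntil` are standard page.goto() options.
 */
async function gotoWithRetry(page, url, { retries = 2, delayMs = 1000, timeout = 30000, waitUntil = 'load' } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            // Rejects on network errors or once `timeout` ms have elapsed
            return await page.goto(url, { timeout, waitUntil });
        } catch (err) {
            lastError = err;
            // Wait before the next attempt, but not after the last one
            if (attempt < retries) {
                await sleep(delayMs);
            }
        }
    }
    // All attempts failed: surface the most recent error
    throw lastError;
}
```

Retrying only makes sense for transient failures; a malformed URL will fail on every attempt, so keep retries small.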

You can also control when navigation counts as finished with the waitUntil option, which accepts 'load' (the default), 'domcontentloaded', 'networkidle0', or 'networkidle2':

await page.goto('https://www.google.com', {waitUntil: 'networkidle2'});

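With 'networkidle2', navigation is considered finished when there have been no more than 2 network connections for at least 500 ms; 'networkidle0' requires no network connections for at least 500 ms.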
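The network-idle events can take longer, or time out, on pages that keep making requests (for example, periodic polling). A common alternative is to navigate with waitUntil: 'domcontentloaded' and then wait for the specific element you need with page.waitForSelector(). This is a sketch: gotoAndWaitFor is a hypothetical helper, and the selector passed to it is an assumption about the target page's markup.

```javascript
/**
 * Navigate, then wait for a specific element instead of for the network
 * to go idle. `gotoAndWaitFor` is a hypothetical helper name.
 * @param {object} page - a Puppeteer page from browser.newPage()
 * @param {string} url - the address to open
 * @param {string} selector - CSS selector assumed to appear on the target page
 * @param {number} [timeout=30000] - limit in ms for each step
 */
async function gotoAndWaitFor(page, url, selector, timeout = 30000) {
    // Resolve once the initial HTML has been parsed (DOMContentLoaded),
    // before images and other subresources finish loading
    const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout });
    // Then wait until the element we actually need is in the DOM
    await page.waitForSelector(selector, { timeout });
    return response;
}
```

waitUntil also accepts an array such as ['domcontentloaded', 'networkidle2'], in which case navigation is considered successful only after all of the listed events have fired.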

Please refer to the Puppeteer API documentation for more detailed information.
