Nightmare is a high-level browser automation library for Node.js, which uses Electron under the hood. When performing web scraping tasks, it's common to encounter dynamic content that loads asynchronously, such as AJAX calls that populate data after the initial page load. To handle these scenarios, you need to wait for certain conditions to be met before proceeding with your scraping task.
Here are some ways to handle wait conditions in Nightmare:
1. Wait for a specific amount of time
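You can wait for a predetermined amount of time using .wait(ms), where ms is the time in milliseconds. A fixed delay is the simplest option but also the least reliable: too short and the content is not there yet, too long and every run is slower than it needs to be.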
const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: true });

nightmare
  .goto('https://example.com')
  .wait(3000) // wait for 3 seconds
  .evaluate(() => {
    // Perform the actions after the wait
  })
  .end()
  .then(/* ... */)
  .catch(error => {
    console.error('An error occurred:', error);
  });
2. Wait for a specific selector
You can wait for a specific DOM element to appear using .wait(selector).
nightmare
  .goto('https://example.com')
  .wait('.some-class') // wait until an element with class 'some-class' appears
  .evaluate(() => {
    // Perform actions after the element appears
  })
  // ...
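A selector wait does not block forever. Nightmare's waitTimeout option (30 seconds by default) bounds how long .wait(selector) and .wait(fn) keep polling, and the chain rejects once it elapses. The following is a minimal sketch of handling that case. The helper name textWhenPresent and the shape of the result object are assumptions made for illustration, not part of Nightmare.

```javascript
// Sketch only: `nightmare` is an instance created by the caller, for example
//   const Nightmare = require('nightmare');
//   const nightmare = Nightmare({ show: false, waitTimeout: 10000 }); // 10 s instead of the default
// `textWhenPresent` is a hypothetical helper name, not part of Nightmare.
function textWhenPresent(nightmare, url, selector) {
  return nightmare
    .goto(url)
    .wait(selector) // rejects if the selector has not appeared within waitTimeout
    .evaluate(sel => document.querySelector(sel).textContent.trim(), selector)
    .end()
    // Success: wrap the scraped text
    .then(text => ({ ok: true, text }))
    // Failure (timeout, navigation error, ...): report the reason instead of throwing
    .catch(error => ({ ok: false, reason: String((error && error.message) || error) }));
}
```

Called as textWhenPresent(nightmare, 'https://example.com', '.some-class'), this resolves to an object either way, so the caller can branch on ok rather than wrapping every scrape in its own error handler.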
3. Wait for a JavaScript condition
You can also wait for a certain JavaScript condition to be true using .wait(fn).
nightmare
  .goto('https://example.com')
  .wait(() => {
    // Wait until this JavaScript condition returns true
    return document.querySelectorAll('.dynamic-content').length > 0;
  })
  // ...
4. Custom wait function
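For more complex wait conditions, you can define the check as a named function and pass it to .wait(). The function runs inside the page, so it can use document but cannot see variables from your Node.js script.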
function customWaitFunction() {
  // Return true when the condition is met
  return document.querySelector('div.loaded') !== null;
}

nightmare
  .goto('https://example.com')
  .wait(customWaitFunction)
  // ...
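Because the wait function is executed in the page rather than in Node.js, it cannot close over values such as a selector held in a script variable. Nightmare's .wait(fn, arg1, arg2, ...) form passes extra arguments through to the function instead. A small sketch, where hasAtLeast and the selector are illustrative names:

```javascript
// Runs inside the page, not in Node.js. `hasAtLeast` is a hypothetical name.
// Returns true once at least `minCount` elements match `selector`.
function hasAtLeast(selector, minCount) {
  return document.querySelectorAll(selector).length >= minCount;
}

// Usage with an existing Nightmare instance; the values after the function
// are handed to it as arguments when it runs in the page:
//   nightmare
//     .goto('https://example.com')
//     .wait(hasAtLeast, '.ajax-list-item', 2)
```

Keeping the predicate free of outside references also makes it easy to test on its own with a stub document object.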
5. Handling AJAX or dynamically loaded content
With AJAX or other dynamically loaded content, the container element often exists before its data arrives. In that case, pass .wait() a function that checks the content itself, not just the presence of the element.
nightmare
  .goto('https://example.com')
  .wait(() => {
    // Runs in the page: true once the element exists and contains text
    const content = document.querySelector('.ajax-content');
    return content !== null && content.innerText.trim().length > 0;
  })
  // ...
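When a page fills a list in several batches, checking that content exists may fire too early. One heuristic, sketched here under assumptions, is to wait until the number of matching elements is non-zero and unchanged between two consecutive polls. The function name countIsStable and the window.__waitCounts property are invented for this example, and "unchanged across two polls" is only an approximation of "finished loading".

```javascript
// Runs inside the page. Hypothetical helper: true when the number of elements
// matching `selector` is non-zero and equal to the count seen on the previous
// call. State is kept on `window` because each poll is a separate invocation.
function countIsStable(selector) {
  const count = document.querySelectorAll(selector).length;
  const seen = window.__waitCounts || (window.__waitCounts = {});
  const previous = seen[selector];
  seen[selector] = count; // remember for the next poll
  return count > 0 && count === previous;
}

// Usage with an existing Nightmare instance:
//   nightmare
//     .goto('https://example.com')
//     .wait(countIsStable, '.ajax-list-item')
```

The time between the two compared counts is whatever interval Nightmare uses between .wait() checks (its pollInterval option), so a slow backend may need a longer interval or a stricter condition.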
Example Scenario: Scraping Content After AJAX Load
const Nightmare = require('nightmare');
const nightmare = Nightmare({ show: true });

nightmare
  .goto('https://example.com')
  .wait(() => {
    // This is a custom condition that checks if an AJAX-loaded
    // list has more than one item
    const listItems = document.querySelectorAll('.ajax-list-item');
    return listItems.length > 1;
  })
  .evaluate(() => {
    // Now that we've waited for the list to load, we can scrape its contents
    const listItems = document.querySelectorAll('.ajax-list-item');
    return Array.from(listItems).map(item => item.textContent.trim());
  })
  .end()
  .then(listContents => {
    console.log('AJAX-loaded List Contents:', listContents);
  })
  .catch(error => {
    console.error('Scraping failed:', error);
  });
In this example, Nightmare waits until the AJAX-populated list contains more than one item, then scrapes the text content of each item.
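The extraction step can be factored out in the same way as the wait condition. Like .wait(fn), .evaluate(fn, arg1, ...) runs the function in the page and passes the extra arguments to it, and the return value has to be plain serializable data (strings, numbers, arrays, objects) rather than DOM nodes. A sketch, with textsOf as an illustrative name:

```javascript
// Runs inside the page. Hypothetical helper: returns the trimmed text of every
// element matching `selector`, skipping elements whose text is empty.
function textsOf(selector) {
  return Array.from(document.querySelectorAll(selector))
    .map(item => item.textContent.trim())
    .filter(text => text.length > 0);
}

// Usage with an existing Nightmare instance:
//   nightmare
//     .goto('https://example.com')
//     .wait('.ajax-list-item')
//     .evaluate(textsOf, '.ajax-list-item')
//     .end()
//     .then(items => console.log(items));
```

Dropping empty strings is a design choice for this sketch; remove the filter if blank items carry meaning on the page being scraped.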
When scraping with Nightmare or any similar tool, it's important to be aware of the legal and ethical considerations. Always check the website's terms of service and robots.txt file to understand what's allowed and respect rate limits to avoid putting undue strain on the server.