When scraping a website like ZoomInfo, handling timeouts and retries is essential so that your scraper can recover from transient issues such as network problems or server overload. Note that scraping ZoomInfo, or any other site, should be done in compliance with its terms of service and applicable law.
Here's how you can handle timeouts and retries in both Python and JavaScript:
Python
You can use the `requests` library in Python to manage timeouts, and the `retrying` or `backoff` library to implement retries with exponential backoff.
First, install the necessary packages if you haven't already:
```shell
pip install requests backoff
```
Then, you can write a function to scrape data with timeouts and retries:
```python
import requests
import backoff

# Retry on any requests exception, with exponential backoff, up to 8 attempts
@backoff.on_exception(backoff.expo, requests.exceptions.RequestException, max_tries=8)
def fetch_data(url):
    try:
        # Timeout is a (connect, read) tuple in seconds
        response = requests.get(url, timeout=(5, 14))
        response.raise_for_status()
        return response.content
    except requests.exceptions.HTTPError as e:
        # HTTP errors (e.g., 4xx, 5xx)
        print(f"HTTP error: {e}")
        raise
    except requests.exceptions.ConnectionError as e:
        # Connection-related errors
        print(f"Connection error: {e}")
        raise
    except requests.exceptions.Timeout as e:
        # Connect or read timeouts
        print(f"Timeout error: {e}")
        raise
    except requests.exceptions.RequestException as e:
        # Any other request error
        print(f"Request error: {e}")
        raise

# Use the function
data = fetch_data("https://www.zoominfo.com/")
```

Each handler re-raises the exception after logging it; this matters because the `backoff` decorator only retries when the exception propagates out of the function. If the handlers swallowed the error, no retries would ever happen.
JavaScript
For Node.js, you can use the `axios` library along with `axios-retry` to manage timeouts and retries.
First, install the necessary packages:
```shell
npm install axios axios-retry
```
Then, create a script to handle timeouts and retries:
```javascript
const axios = require('axios');
const axiosRetry = require('axios-retry');

// Retry up to 3 times, with an exponential delay between attempts
axiosRetry(axios, { retries: 3, retryDelay: axiosRetry.exponentialDelay });

async function fetchData(url) {
  try {
    const response = await axios.get(url, { timeout: 5000 }); // 5-second timeout
    return response.data;
  } catch (error) {
    if (error.code === 'ECONNABORTED') {
      console.error('Timeout error:', error.message);
    } else if (axiosRetry.isNetworkOrIdempotentRequestError(error)) {
      console.error('Network or idempotent request error:', error.message);
    } else {
      console.error('Error:', error.message);
    }
    // Re-throw so callers can handle the failure further up the chain
    throw error;
  }
}

// Use the function
fetchData('https://www.zoominfo.com/')
  .then(data => console.log(data))
  .catch(error => console.error(error));
```
Both scripts above demonstrate how to perform a web request with a timeout and implement retries using an exponential backoff strategy, which gradually increases the delay between retries to avoid overwhelming the server.
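To make the backoff strategy concrete, here is a minimal sketch of the delay schedule such libraries compute: the delay doubles on each attempt, is capped, and gets random jitter so many clients don't retry in lockstep. The names `backoff_delays`, `base`, and `cap` are illustrative, not part of `backoff` or `axios-retry`:

```python
import random

def backoff_delays(max_tries=5, base=1.0, cap=60.0):
    """Yield one randomized delay (in seconds) per retry attempt."""
    for attempt in range(max_tries):
        # Exponential growth: base * 2^attempt (1, 2, 4, 8, ...), capped at `cap`,
        # with "full jitter": the actual sleep is uniform in [0, computed delay]
        yield random.uniform(0.0, min(cap, base * 2 ** attempt))

# Example: show the planned delays for 5 attempts
for i, delay in enumerate(backoff_delays(), start=1):
    print(f"attempt {i}: sleep up to {delay:.2f}s")
```

A real retry loop would call `time.sleep(delay)` between attempts; the libraries above do this for you.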
Remember to use these techniques responsibly and respect ZoomInfo's terms of service, which may prohibit or limit automated scraping of its data. If you are accessing ZoomInfo data legitimately for integration purposes, consider using their official API if one is available, as that is a more reliable and lawful way to get the data you need.