What user-agent should I use when scraping Realestate.com?

When scraping websites like Realestate.com, it's important to abide by the site's robots.txt file and terms of service. The robots.txt file specifies which parts of the site web crawlers are allowed to access.

Regarding the user-agent, many websites inspect the User-Agent header to determine what kind of visitor they are dealing with. Some sites block, or serve different content to, user-agents that look like bots or scrapers.

Before selecting a user-agent for web scraping purposes, you should consider the following points:

  1. Check robots.txt: Look at Realestate.com's robots.txt file (usually found at https://www.realestate.com.au/robots.txt) to see if there are any specific instructions about crawling or user-agents (a programmatic check is sketched right after this list).

  2. Follow the site’s terms of service: Also, read the terms of service to check if they have any restrictions on web scraping or the use of automated tools.

  3. Use a browser-like user-agent: If you decide to proceed with scraping, it is often a good practice to set your user-agent to a common web browser's user-agent. This can be the user-agent of Chrome, Firefox, or any other mainstream browser.

  4. Avoid overly generic or bot-like user-agents: Some sites may block user-agents that are non-standard or look like they belong to a bot.

  5. Respect the website's load: Make sure your scraping activities do not put excessive load on the website. It's considerate to scrape during off-peak hours and to space out your requests (a simple throttling sketch follows the Python example below).
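If you want to check robots.txt programmatically before fetching a page, Python's standard-library urllib.robotparser handles the parsing for you. A minimal sketch (the user-agent string and target URL here are only illustrative):

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt
parser = RobotFileParser('https://www.realestate.com.au/robots.txt')
parser.read()

# Ask whether a given user-agent is allowed to fetch a given URL
user_agent = 'my-scraper'  # illustrative; use whatever user-agent you actually send
if parser.can_fetch(user_agent, 'https://www.realestate.com.au/buy'):
    print('Allowed by robots.txt')
else:
    print('Disallowed by robots.txt')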

To set a user-agent in Python using the requests library, you can do the following:

import requests

url = 'https://www.realestate.com.au/buy'

# Send a common desktop Chrome user-agent; update the string to match a current browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

response = requests.get(url, headers=headers)
print(response.text)
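If you are fetching more than one page, it's also worth throttling your requests (point 5 above). Here is a minimal sketch building on the example just shown; the list of URLs is hypothetical, so substitute the pages you actually need:

import time

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Hypothetical list of pages to fetch
urls = [
    'https://www.realestate.com.au/buy/list-1',
    'https://www.realestate.com.au/buy/list-2',
]

for url in urls:
    response = requests.get(url, headers=headers)
    print(url, response.status_code)
    time.sleep(5)  # pause between requests so the scrape doesn't hammer the server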

In JavaScript, using node-fetch (or the built-in fetch available in Node 18+), you can set a user-agent like this:

const fetch = require('node-fetch'); // works with node-fetch v2; on Node 18+ you can use the built-in fetch instead

const url = 'https://www.realestate.com.au/buy';
const headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
};

fetch(url, { headers })
    .then(response => response.text())
    .then(body => console.log(body))
    .catch(err => console.error('Request failed:', err));

Remember that the user-agent strings in these examples are just one option, and they date quickly; use a string that matches a current version of the browser you are emulating. Up-to-date user-agent strings are easy to find with a quick web search or in your browser's developer tools.

Always remember to scrape responsibly and ethically, respecting the website's rules and the legal implications of web scraping.
