Web scraping can be performed using a variety of programming languages. Each language has its own set of libraries or tools that can be leveraged to extract data from websites like Yellow Pages. Below are some of the popular programming languages used for web scraping, along with examples of how they can be used for scraping Yellow Pages:
Python
Python is one of the most popular languages for web scraping due to its ease of use and powerful libraries. Two commonly used libraries for web scraping in Python are BeautifulSoup and Scrapy.
Example with BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# Find business names
for business in soup.find_all('div', class_='info'):
    # Guard against listing blocks that have no business-name link,
    # which would otherwise raise an AttributeError on .text
    name_tag = business.find('a', class_='business-name')
    if name_tag:
        print(name_tag.text)
Example with Scrapy:
import scrapy
class YellowPagesSpider(scrapy.Spider):
    name = "yellowpages"
    start_urls = [
        'https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY',
    ]

    def parse(self, response):
        for business in response.css('div.info'):
            yield {
                'name': business.css('a.business-name::text').get(),
                # Add more fields to scrape as needed
            }
# To run the spider, you would save this as a file and use the Scrapy command line interface:
# scrapy runspider yellowpages_spider.py
JavaScript (Node.js)
Node.js can be used for web scraping with the help of libraries like axios for HTTP requests and cheerio for parsing HTML.
Example with axios and cheerio:
const axios = require('axios');
const cheerio = require('cheerio');
const url = 'https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY';
axios.get(url)
  .then(response => {
    const $ = cheerio.load(response.data);
    $('.info').each((index, element) => {
      const name = $(element).find('.business-name').text();
      console.log(name);
    });
  })
  .catch(error => {
    console.error(error);
  });
Ruby
Ruby also has libraries for web scraping, such as Nokogiri.
Example with Nokogiri:
require 'nokogiri'
require 'open-uri'
url = 'https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY'
document = Nokogiri::HTML(URI.open(url))
document.css('div.info').each do |business|
  name = business.css('a.business-name').text
  puts name
end
PHP
PHP offers several libraries for web scraping, such as Goutte.
Example with Goutte:
<?php
require_once 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY');
$crawler->filter('div.info')->each(function ($node) {
    $name = $node->filter('a.business-name')->text();
    echo $name . "\n";
});
Java
Java can be used for web scraping using libraries like Jsoup.
Example with Jsoup:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class YellowPagesScraper {
    public static void main(String[] args) throws Exception {
        String url = "https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=New+York%2C+NY";
        Document doc = Jsoup.connect(url).get();
        Elements businesses = doc.select("div.info");
        for (Element business : businesses) {
            String name = business.select("a.business-name").text();
            System.out.println(name);
        }
    }
}
Whichever language you use, always check the website's robots.txt file and terms of service first to confirm that you are permitted to collect its data. Be considerate of the site's resources as well: rate-limit your requests rather than sending many of them in a short period of time.
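Checking robots.txt can itself be automated. Below is a minimal sketch in Python using the standard library's urllib.robotparser; the robots.txt content and the user-agent name ('MyScraper') are illustrative placeholders, not Yellow Pages' actual rules, so in practice you would load the site's real file instead.

```python
import time
import urllib.robotparser

# Illustrative robots.txt rules -- fetch the target site's real file in practice.
robots_lines = """User-agent: *
Disallow: /private/
Crawl-delay: 2""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_lines)

# Honor the site's requested delay between requests, defaulting to 1 second.
delay = parser.crawl_delay('MyScraper') or 1

urls = [
    'https://example.com/search?q=plumber',
    'https://example.com/private/admin',
]
for url in urls:
    if parser.can_fetch('MyScraper', url):
        print(f'Allowed: {url}')
        time.sleep(delay)  # pause so we do not overload the server
    else:
        print(f'Blocked by robots.txt: {url}')
```

With the rules above, the first URL is allowed and the second is blocked by the `Disallow: /private/` directive; the `Crawl-delay` value drives the pause between requests.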