How can I extract Google Search related queries and suggestions?
Google Search provides several types of related queries and suggestions that can be valuable for SEO research, content strategy, and market analysis. These include autocomplete suggestions, "People also ask" questions, and related searches at the bottom of search results pages. This guide will show you how to extract these elements using various web scraping techniques.
## Understanding Google's Suggestion Types

Google offers several types of related queries and suggestions:

- **Autocomplete suggestions**: appear as you type in the search box
- **People also ask**: expandable questions related to your search
- **Related searches**: keywords shown at the bottom of search results
- **Search refinements**: alternative query suggestions displayed alongside results
## Method 1: Using Python with Requests and BeautifulSoup

Here's how to extract related searches and "People also ask" questions from Google search results pages:
```python
import requests
from bs4 import BeautifulSoup
import time
import random

def get_google_suggestions(query):
    """Extract related searches and 'People also ask' questions from Google search results"""
    # Set up headers to mimic a real browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1'
    }

    # Construct the search URL
    search_url = f"https://www.google.com/search?q={query.replace(' ', '+')}"

    try:
        # Add a random delay to avoid rate limiting
        time.sleep(random.uniform(1, 3))

        response = requests.get(search_url, headers=headers, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract "People also ask" questions.
        # NOTE: Google's class names change frequently; verify these selectors in DevTools.
        people_also_ask = []
        paa_elements = soup.find_all('div', {'class': lambda x: x and 'related-question-pair' in x})
        for element in paa_elements:
            question = element.find('span')
            if question:
                people_also_ask.append(question.get_text().strip())

        # Extract related searches (bottom of page)
        related_searches = []
        related_elements = soup.find_all('div', {'class': lambda x: x and 'BNeawe' in str(x)})
        for element in related_elements:
            text = element.get_text().strip()
            if text and len(text.split()) <= 6:  # Keep suggestions of reasonable length
                related_searches.append(text)

        return {
            'query': query,
            'people_also_ask': people_also_ask[:5],  # Limit to first 5
            'related_searches': list(set(related_searches))[:10]  # Deduplicate, limit to 10
        }
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data: {e}")
        return None

# Example usage
suggestions = get_google_suggestions("web scraping python")
if suggestions:  # get_google_suggestions returns None on request errors
    print("People Also Ask:")
    for question in suggestions['people_also_ask']:
        print(f"- {question}")

    print("\nRelated Searches:")
    for search in suggestions['related_searches']:
        print(f"- {search}")
```
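Scraped suggestion strings often differ only in casing or stray whitespace, so deduplicating with `set()` alone can miss near-duplicates and scrambles order. A small normalization helper (a sketch; the function name is my own) cleans each string and deduplicates while preserving the original order:

```python
def normalize_suggestions(raw_suggestions):
    """Lowercase, collapse whitespace, and deduplicate while preserving order."""
    seen = set()
    cleaned = []
    for text in raw_suggestions:
        norm = " ".join(text.lower().split())  # collapse runs of whitespace
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

print(normalize_suggestions(["Web  Scraping", "web scraping", " Python "]))
# → ['web scraping', 'python']
```

Running scraped results through a helper like this before storing them makes comparisons across crawls much more stable.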
## Method 2: Using the Google Autocomplete API

Google provides an unofficial autocomplete endpoint that you can use to get search suggestions:
```python
import requests
import json

def get_autocomplete_suggestions(query, num_suggestions=10):
    """Get Google autocomplete suggestions from the (unofficial) suggest endpoint"""
    url = "https://suggestqueries.google.com/complete/search"
    params = {
        'client': 'firefox',  # The 'firefox' client returns plain JSON
        'q': query,
        'hl': 'en'  # Language
    }

    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()

        # The response is a JSON array of the form [query, [suggestion, ...]]
        suggestions_data = json.loads(response.text)
        suggestions = suggestions_data[1][:num_suggestions]

        return {
            'query': query,
            'autocomplete_suggestions': suggestions
        }
    except Exception as e:
        print(f"Error getting autocomplete suggestions: {e}")
        return None

# Example usage
autocomplete = get_autocomplete_suggestions("machine learning")
if autocomplete:  # None is returned on errors
    print("Autocomplete Suggestions:")
    for suggestion in autocomplete['autocomplete_suggestions']:
        print(f"- {suggestion}")
```
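A single call to the endpoint returns only about ten suggestions. A common way to widen coverage (sometimes called the "alphabet soup" technique) is to issue one request per seed variation: the plain query, then the query followed by each letter. Generating the variations is pure string work; the helper below is a sketch, and each variant would be fed to `get_autocomplete_suggestions` with a delay between calls:

```python
from string import ascii_lowercase

def expand_seed_queries(query):
    """Build the seed plus one variant per letter a-z, e.g. 'seo', 'seo a', 'seo b', ..."""
    return [query] + [f"{query} {letter}" for letter in ascii_lowercase]

variants = expand_seed_queries("machine learning")
print(len(variants))   # → 27
print(variants[:3])    # → ['machine learning', 'machine learning a', 'machine learning b']
```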
## Method 3: Using Puppeteer (JavaScript)

For dynamic content and more reliable extraction, use Puppeteer to render the page completely:
```javascript
const puppeteer = require('puppeteer');

async function getGoogleSuggestions(query) {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox']
  });

  try {
    const page = await browser.newPage();

    // Set a user agent to reduce the chance of bot detection
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');

    // Navigate to Google search
    const searchUrl = `https://www.google.com/search?q=${encodeURIComponent(query)}`;
    await page.goto(searchUrl, { waitUntil: 'networkidle2' });

    // Extract "People also ask" questions
    const peopleAlsoAsk = await page.evaluate(() => {
      const questions = [];
      const paaElements = document.querySelectorAll('[data-initq]');
      paaElements.forEach(element => {
        const questionText = element.textContent?.trim();
        if (questionText) {
          questions.push(questionText);
        }
      });
      return questions;
    });

    // Extract related searches
    const relatedSearches = await page.evaluate(() => {
      const searches = [];
      const relatedElements = document.querySelectorAll('a[data-ved] span');
      relatedElements.forEach(element => {
        const searchText = element.textContent?.trim();
        if (searchText && searchText.length > 3 && searchText.length < 100) {
          searches.push(searchText);
        }
      });
      // Remove duplicates and return unique searches
      return [...new Set(searches)];
    });

    return {
      query,
      peopleAlsoAsk: peopleAlsoAsk.slice(0, 5),
      relatedSearches: relatedSearches.slice(0, 10)
    };
  } catch (error) {
    console.error('Error extracting suggestions:', error);
    return null;
  } finally {
    await browser.close();
  }
}

// Example usage
(async () => {
  const suggestions = await getGoogleSuggestions('web scraping tools');
  if (!suggestions) return; // getGoogleSuggestions returns null on errors

  console.log('People Also Ask:');
  suggestions.peopleAlsoAsk.forEach(question => {
    console.log(`- ${question}`);
  });

  console.log('\nRelated Searches:');
  suggestions.relatedSearches.forEach(search => {
    console.log(`- ${search}`);
  });
})();
```
## Method 4: Extracting Autocomplete Suggestions with Puppeteer

To capture real-time autocomplete suggestions as users type:
```javascript
const puppeteer = require('puppeteer');

async function getAutocompleteSuggestions(query) {
  const browser = await puppeteer.launch({ headless: false }); // Set to true for production
  const page = await browser.newPage();

  try {
    await page.goto('https://www.google.com');

    // Wait for the search box and focus it.
    // NOTE: Google has used both <input name="q"> and <textarea name="q">;
    // matching on the name attribute alone covers both.
    await page.waitForSelector('[name="q"]');
    const searchBox = await page.$('[name="q"]');

    // Type the query character by character to trigger autocomplete
    await searchBox.type(query, { delay: 100 });

    // Wait for suggestions to appear (class names change often; verify in DevTools)
    await page.waitForSelector('.wM6W7d', { timeout: 5000 });

    // Extract autocomplete suggestions
    const suggestions = await page.evaluate(() => {
      const suggestionElements = document.querySelectorAll('.wM6W7d span');
      return Array.from(suggestionElements).map(el => el.textContent.trim());
    });

    return suggestions.filter(s => s.length > 0);
  } catch (error) {
    console.error('Error getting autocomplete:', error);
    return [];
  } finally {
    await browser.close();
  }
}
```
## Best Practices and Considerations

### 1. Rate Limiting and Respectful Scraping

Always implement proper rate limiting to avoid being blocked:
```python
import time
import random

def respectful_delay():
    """Add a random delay between requests"""
    delay = random.uniform(2, 5)  # 2-5 seconds
    time.sleep(delay)

# Use between requests
respectful_delay()
```
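For longer crawls, a fixed random sleep can be replaced by a limiter that enforces a minimum spacing between requests regardless of how long each request itself took. Below is a minimal sketch (the class name is my own):

```python
import time

class RateLimiter:
    """Enforce a minimum interval (in seconds) between successive calls."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        # Sleep only for however much of the interval is still remaining
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

limiter = RateLimiter(min_interval=2.0)
# Call limiter.wait() before each request
```

Unlike a flat `time.sleep()`, this accounts for time already spent on the previous request, so slow responses don't stack extra delay on top.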
### 2. Rotating User Agents and Headers

Use different user agents to make your requests look more like organic traffic:
```python
import random

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
]

headers = {
    'User-Agent': random.choice(USER_AGENTS),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive'
}
```
### 3. Error Handling and Resilience

Implement robust error handling for production use:
```python
import time

def safe_extract_suggestions(query, max_retries=3):
    """Extract suggestions with retry logic (wraps get_google_suggestions from Method 1)"""
    for attempt in range(max_retries):
        try:
            result = get_google_suggestions(query)
            if result:
                return result
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        # Back off whether the attempt raised or simply returned nothing
        if attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
    return None
```
## Advanced Techniques

### Extracting Search Refinements

Google often shows search refinements and filters. Here's how to extract them:
```python
def extract_search_refinements(soup):
    """Extract search refinement suggestions from a parsed results page (a BeautifulSoup object)"""
    refinements = []

    # Look for refinement chips or buttons.
    # NOTE: this class-name match is approximate and may need updating.
    refinement_elements = soup.find_all('div', {'class': lambda x: x and 'refinement' in str(x).lower()})
    for element in refinement_elements:
        refinement_text = element.get_text().strip()
        if refinement_text:
            refinements.append(refinement_text)

    return refinements
```
### Handling JavaScript-Heavy Content

For pages with dynamic content, use browser automation such as Puppeteer so that all suggestions have loaded before you extract them. You may also need to watch for the AJAX requests that populate suggestion data.
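When monitoring network traffic in a browser-automation session, autocomplete AJAX calls can usually be recognized by their URL path. A small predicate helps filter captured request URLs; this is a sketch, and the path fragment is based on the public suggest endpoint shown in Method 2:

```python
from urllib.parse import urlparse

def is_suggest_request(url):
    """Return True if the URL looks like a Google autocomplete/suggest request."""
    parsed = urlparse(url)
    # Match any google.* host and the /complete/search path used by the suggest endpoint
    return "google." in parsed.netloc and parsed.path.startswith("/complete/search")

print(is_suggest_request("https://www.google.com/complete/search?q=web+scraping&client=firefox"))  # → True
print(is_suggest_request("https://www.google.com/search?q=web+scraping"))  # → False
```

A function like this can be plugged into whatever request/response hook your automation tool exposes to capture only the suggestion payloads.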
## Legal and Ethical Considerations

When scraping Google search suggestions:

- **Respect robots.txt**: check Google's robots.txt file before crawling
- **Use reasonable request rates**: don't overwhelm Google's servers
- **Consider Google's Custom Search API**: for commercial use, prefer official APIs
- **Review the terms of service**: ensure compliance with Google's terms
## Troubleshooting Common Issues

### Issue 1: Empty Results

- Verify that CSS selectors are current (Google frequently updates its HTML structure)
- Check whether JavaScript is required to load the content
- Ensure proper headers are set

### Issue 2: Getting Blocked

- Implement longer delays between requests
- Use proxy rotation
- Vary request patterns and headers

### Issue 3: Inconsistent Data

- Google personalizes results based on location and search history
- Use incognito mode or clear cookies between requests
- Consider a VPN service for consistent geographic results
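The proxy rotation mentioned under Issue 2 can be as simple as cycling through a pool with `itertools.cycle`. The sketch below uses placeholder proxy URLs; the dict it returns matches the shape that `requests` expects in its `proxies` argument:

```python
from itertools import cycle

# Placeholder proxy URLs -- substitute your own pool
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

proxy_pool = cycle(PROXIES)

def next_proxy_config():
    """Return a requests-style proxies dict using the next proxy in the pool."""
    proxy = next(proxy_pool)
    return {"http": proxy, "https": proxy}

# Example: requests.get(search_url, proxies=next_proxy_config(), timeout=10)
print(next_proxy_config()["http"])  # → http://proxy1.example.com:8080
```

Pairing this with the user-agent rotation shown earlier varies two fingerprinting signals at once.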
## Conclusion
Extracting Google search suggestions and related queries can provide valuable insights for SEO and content strategy. The methods shown above range from simple HTTP requests to sophisticated browser automation. Choose the approach that best fits your needs while respecting Google's terms of service and implementing proper rate limiting.
For complex scenarios requiring browser session management or handling dynamic content, Puppeteer provides the most reliable solution, though it requires more resources than simple HTTP requests.
Remember to always test your scraping code thoroughly and monitor for changes in Google's page structure, as these can break your extraction logic without warning.