Table of contents

How to Handle Geolocation-Based Content with Selenium

Many modern web applications provide location-specific content based on the user's geographic position. When web scraping such applications, you need to control the browser's geolocation settings to access content from different regions. This guide demonstrates how to handle geolocation-based content effectively using Selenium WebDriver.

Understanding Geolocation in Web Browsers

Web browsers use the Geolocation API to determine a user's location through various methods including GPS, IP address, WiFi networks, and cellular towers. Websites can request location permissions and display content based on the user's geographic position, such as:

  • Local business listings and reviews
  • Weather information
  • Regional pricing and availability
  • Location-specific news and events
  • Delivery zones and shipping options

Setting Up Geolocation with Chrome WebDriver

Python Implementation

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

def create_driver_with_geolocation(latitude, longitude):
    """Create Chrome driver with specific geolocation coordinates"""
    chrome_options = Options()

    # Set geolocation coordinates
    prefs = {
        "profile.default_content_setting_values.geolocation": 1,  # Allow location access
        "profile.managed_default_content_settings.geolocation": 1
    }
    chrome_options.add_experimental_option("prefs", prefs)

    # Initialize driver
    driver = webdriver.Chrome(options=chrome_options)

    # Override geolocation using Chrome DevTools Protocol
    driver.execute_cdp_cmd("Emulation.setGeolocationOverride", {
        "latitude": latitude,
        "longitude": longitude,
        "accuracy": 100
    })

    return driver

# Example: Set location to New York City
driver = create_driver_with_geolocation(40.7128, -74.0060)

try:
    # Navigate to a location-aware website
    driver.get("https://www.google.com/maps")

    # Wait for page to load and detect location
    time.sleep(3)

    # Your scraping logic here
    print("Location set successfully!")

finally:
    driver.quit()

JavaScript/Node.js Implementation

const { Builder, By, until } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

async function createDriverWithGeolocation(latitude, longitude) {
    const options = new chrome.Options();

    // Set geolocation preferences
    options.setUserPreferences({
        'profile.default_content_setting_values.geolocation': 1,
        'profile.managed_default_content_settings.geolocation': 1
    });

    const driver = await new Builder()
        .forBrowser('chrome')
        .setChromeOptions(options)
        .build();

    // Override geolocation
    await driver.executeCdpCommand('Emulation.setGeolocationOverride', {
        latitude: latitude,
        longitude: longitude,
        accuracy: 100
    });

    return driver;
}

// Example usage
async function scrapeLocationContent() {
    const driver = await createDriverWithGeolocation(51.5074, -0.1278); // London

    try {
        await driver.get('https://www.weather.com');

        // Wait for location-based content to load
        await driver.sleep(3000);

        // Extract location-specific data
        const locationElement = await driver.findElement(By.css('.location-name'));
        const location = await locationElement.getText();
        console.log(`Current location: ${location}`);

    } finally {
        await driver.quit();
    }
}

scrapeLocationContent();

Advanced Geolocation Handling

Managing Location Permissions

def setup_geolocation_permissions(driver, allow_location=True):
    """Set up location permissions for the current session"""

    # Navigate to chrome://settings/content/location (requires special handling)
    driver.get("chrome://settings/content/location")

    # Alternative: Use CDP commands to manage permissions
    if allow_location:
        driver.execute_cdp_cmd("Browser.grantPermissions", {
            "permissions": ["geolocation"],
            "origin": driver.current_url
        })
    else:
        driver.execute_cdp_cmd("Browser.revokePermissions", {
            "permissions": ["geolocation"],
            "origin": driver.current_url
        })

Dynamic Location Changes

def change_location_during_session(driver, new_lat, new_lng):
    """Change geolocation coordinates during an active session"""

    driver.execute_cdp_cmd("Emulation.setGeolocationOverride", {
        "latitude": new_lat,
        "longitude": new_lng,
        "accuracy": 100
    })

    # Refresh the page to apply new location
    driver.refresh()

    # Wait for location update
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "body"))
    )

# Example: Test different locations
locations = [
    {"name": "New York", "lat": 40.7128, "lng": -74.0060},
    {"name": "London", "lat": 51.5074, "lng": -0.1278},
    {"name": "Tokyo", "lat": 35.6762, "lng": 139.6503}
]

driver = create_driver_with_geolocation(0, 0)

for location in locations:
    print(f"Testing location: {location['name']}")
    change_location_during_session(driver, location['lat'], location['lng'])
    # Perform scraping for this location
    time.sleep(2)

Handling Location-Based Content Patterns

Waiting for Location Detection

def wait_for_location_detection(driver, timeout=30):
    """Wait for the website to detect and process location"""

    try:
        # Wait for location indicator or content to appear
        WebDriverWait(driver, timeout).until(
            EC.any_of(
                EC.presence_of_element_located((By.CSS_SELECTOR, ".location-detected")),
                EC.presence_of_element_located((By.CSS_SELECTOR, "[data-location]")),
                EC.text_to_be_present_in_element((By.TAG_NAME, "body"), "location")
            )
        )
        return True
    except:
        print("Location detection timed out")
        return False

Extracting Location-Specific Data

def scrape_location_dependent_content(driver, url):
    """Scrape content that varies by location"""

    driver.get(url)

    # Wait for location-based content to load
    if wait_for_location_detection(driver):

        # Extract location information
        try:
            location_element = driver.find_element(By.CSS_SELECTOR, ".current-location")
            current_location = location_element.text
            print(f"Detected location: {current_location}")
        except:
            print("Could not extract location information")

        # Extract location-specific content
        results = []
        content_elements = driver.find_elements(By.CSS_SELECTOR, ".location-content")

        for element in content_elements:
            results.append({
                'title': element.find_element(By.TAG_NAME, "h3").text,
                'description': element.find_element(By.CSS_SELECTOR, ".description").text,
                'location': current_location
            })

        return results

    return []

Working with Firefox WebDriver

from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.firefox.service import Service

def create_firefox_driver_with_geolocation(latitude, longitude):
    """Create Firefox driver with geolocation settings"""

    options = FirefoxOptions()

    # Set geolocation preferences
    options.set_preference("geo.enabled", True)
    options.set_preference("geo.prompt.testing", True)
    options.set_preference("geo.prompt.testing.allow", True)

    driver = webdriver.Firefox(options=options)

    # Firefox requires JavaScript execution for geolocation override
    geolocation_script = f"""
    navigator.geolocation.getCurrentPosition = function(success, error) {{
        success({{
            coords: {{
                latitude: {latitude},
                longitude: {longitude},
                accuracy: 100
            }}
        }});
    }};
    """

    driver.execute_script(geolocation_script)

    return driver

Best Practices and Considerations

Performance Optimization

When dealing with geolocation-based content, consider these performance optimizations:

# Use headless mode for faster execution
chrome_options.add_argument("--headless")

# Disable unnecessary features
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Set appropriate timeouts
driver.set_page_load_timeout(30)
driver.implicitly_wait(10)

Error Handling

def robust_geolocation_scraping(locations, target_url):
    """Robust scraping with proper error handling"""

    results = []

    for location in locations:
        driver = None
        try:
            driver = create_driver_with_geolocation(
                location['lat'], 
                location['lng']
            )

            driver.get(target_url)

            # Wait for content with timeout
            if wait_for_location_detection(driver, timeout=15):
                data = scrape_location_dependent_content(driver, target_url)
                results.extend(data)
            else:
                print(f"Location detection failed for {location['name']}")

        except Exception as e:
            print(f"Error processing location {location['name']}: {str(e)}")

        finally:
            if driver:
                driver.quit()

    return results

Integration with WebScraping.AI

For more complex geolocation-based scraping scenarios, you might want to consider using specialized tools. Similar to how you might handle authentication in Puppeteer for secure content, geolocation handling can be simplified with cloud-based solutions.

WebScraping.AI provides built-in geolocation support through proxy networks, allowing you to scrape location-dependent content without complex browser configurations. This is particularly useful when you need to handle dynamic content that loads after page navigation, as location-based content often loads asynchronously.

Troubleshooting Common Issues

Location Not Detected

def troubleshoot_location_detection(driver):
    """Debug location detection issues"""

    # Check if geolocation is available
    geo_available = driver.execute_script("""
        return navigator.geolocation !== undefined;
    """)

    if not geo_available:
        print("Geolocation API not available")
        return False

    # Check current position
    position = driver.execute_script("""
        return new Promise((resolve) => {
            navigator.geolocation.getCurrentPosition(
                (pos) => resolve(pos.coords),
                (err) => resolve(null)
            );
        });
    """)

    if position:
        print(f"Current position: {position['latitude']}, {position['longitude']}")
        return True
    else:
        print("Could not get current position")
        return False

Content Not Loading

# Wait for specific location-based elements
WebDriverWait(driver, 30).until(
    EC.any_of(
        EC.presence_of_element_located((By.CSS_SELECTOR, "[data-lat]")),
        EC.presence_of_element_located((By.CSS_SELECTOR, ".geo-content")),
        EC.text_to_be_present_in_element((By.TAG_NAME, "body"), "your location")
    )
)

Conclusion

Handling geolocation-based content with Selenium requires careful configuration of browser settings, proper permission management, and robust error handling. By using the Chrome DevTools Protocol for geolocation override and implementing appropriate waiting strategies, you can effectively scrape location-dependent content from various geographic perspectives.

Remember to respect websites' terms of service and rate limits when scraping geolocation-based content, and consider using specialized tools for large-scale operations that require handling multiple locations simultaneously.

Try WebScraping.AI for Your Web Scraping Needs

Looking for a powerful web scraping solution? WebScraping.AI provides an LLM-powered API that combines Chromium JavaScript rendering with rotating proxies for reliable data extraction.

Key Features:

  • AI-powered extraction: Ask questions about web pages or extract structured data fields
  • JavaScript rendering: Full Chromium browser support for dynamic content
  • Rotating proxies: Datacenter and residential proxies from multiple countries
  • Easy integration: Simple REST API with SDKs for Python, Ruby, PHP, and more
  • Reliable & scalable: Built for developers who need consistent results

Getting Started:

Get page content with AI analysis:

curl "https://api.webscraping.ai/ai/question?url=https://example.com&question=What is the main topic?&api_key=YOUR_API_KEY"

Extract structured data:

curl "https://api.webscraping.ai/ai/fields?url=https://example.com&fields[title]=Page title&fields[price]=Product price&api_key=YOUR_API_KEY"

Try in request builder

Related Questions

Get Started Now

WebScraping.AI provides rotating proxies, Chromium rendering and built-in HTML parser for web scraping
Icon