How do I rotate user agents in Scrapy?

Rotating user agents is a common way to reduce the chance of being blocked while scraping websites with Scrapy. Here is how to do it:

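  • First, create a list of user agents in your project's settings.py, because the middleware reads it from the Scrapy settings. You can compile the list yourself or take one from a public source. The strings below come from old Chrome releases, so in practice replace them with current ones. Here is a sample list: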
USER_AGENT_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    ...
]
  • Next, create a downloader middleware that rotates the user agents. Add the following code to your project's middlewares.py (create the file if it does not exist):
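import random
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware


class RotateUserAgentMiddleware(UserAgentMiddleware):
    """Downloader middleware that sets a random User-Agent on each request."""

    def __init__(self, user_agent=''):
        # Fallback user agent, filled from the USER_AGENT setting by from_crawler().
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Settings objects have no getattr(); getlist() returns [] if the key is unset.
        user_agents = spider.settings.getlist('USER_AGENT_LIST')
        # random.choice() raises IndexError on an empty list, so guard against it.
        ua = random.choice(user_agents) if user_agents else self.user_agent
        if ua:
            # setdefault() keeps any User-Agent already set on the request.
            request.headers.setdefault('User-Agent', ua)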

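This code defines a middleware class that inherits from UserAgentMiddleware. For each request, process_request reads USER_AGENT_LIST from the spider's settings, picks one entry at random, and sets it as the User-Agent header. Because it uses setdefault, a User-Agent that is already set on the request is left unchanged.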

  • Finally, you need to enable this middleware in your settings. You can do this by adding the following lines to your settings.py file:
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RotateUserAgentMiddleware': 110,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}

In DOWNLOADER_MIDDLEWARES, setting the built-in UserAgentMiddleware to None disables it, and the entry for the custom middleware enables it. Replace myproject with the name of your project's package. The number 110 is the middleware's order: middlewares with lower numbers have their process_request method called earlier.
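
To make the effect of the order numbers and the None entry concrete, here is a simplified illustration in plain Python. It is not Scrapy's actual implementation, only a sketch of the idea that the project setting is merged over the built-in defaults, disabled entries are dropped, and the rest are sorted by order. The BASE dict is an abbreviated excerpt with values assumed from Scrapy's defaults (check DOWNLOADER_MIDDLEWARES_BASE for your version), and effective_order is a hypothetical helper name.

```python
# Abbreviated excerpt of the built-in defaults (assumed values; verify against
# DOWNLOADER_MIDDLEWARES_BASE in your Scrapy version).
BASE = {
    "scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware": 100,
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}

# The project-level setting from settings.py.
PROJECT = {
    "myproject.middlewares.RotateUserAgentMiddleware": 110,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def effective_order(base, project):
    """Merge both dicts, drop entries set to None, and sort by order value."""
    merged = {**base, **project}  # project entries override the defaults
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


# Print the middlewares in the order their process_request would be called.
for path in effective_order(BASE, PROJECT):
    print(path)
```

In this sketch the custom middleware lands right after the robots.txt middleware, and the built-in UserAgentMiddleware no longer appears at all.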

Now Scrapy picks a user agent at random from your list for each request. Because the choice is random, consecutive requests can occasionally get the same one.
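
Since random.choice can return the same entry twice in a row, one alternative, if you want strict alternation, is to cycle through the list in order. The following is a minimal sketch under that assumption, not part of Scrapy: RoundRobinUserAgents and its pick method are hypothetical names, and the "agent-a" style strings are placeholders for real user agent strings.

```python
import itertools


class RoundRobinUserAgents:
    """Hands out user agents in a fixed repeating order instead of at random."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        # itertools.cycle repeats the sequence endlessly.
        self._cycle = itertools.cycle(list(user_agents))

    def pick(self):
        """Return the next user agent in the cycle."""
        return next(self._cycle)


# Placeholder strings stand in for real user agent strings.
rotation = RoundRobinUserAgents(["agent-a", "agent-b", "agent-c"])
picks = [rotation.pick() for _ in range(4)]
print(picks)  # ['agent-a', 'agent-b', 'agent-c', 'agent-a']
```

Inside the middleware, process_request could call rotation.pick() where it currently calls random.choice. With concurrent requests the order in which requests reach the middleware is not fixed, so this guarantees an even spread across the list rather than a particular sequence per page.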
