How do I write a Scrapy spider?

Scrapy is an open-source web crawling framework for Python that lets you write spiders to extract data from websites and use it in your applications. Here's how to write a Scrapy spider:
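If you don't have Scrapy installed yet, it is available from PyPI (the exact command may vary with your environment, for example inside a virtualenv):

pip install scrapy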

To create a Scrapy project, navigate to the directory where you want the project to live and run the following command:

scrapy startproject myproject

This will create a new Scrapy project named "myproject". Now, navigate to the "spiders" directory inside the new project:

cd myproject/myproject/spiders
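Alternatively, instead of creating the spider file by hand, you can have Scrapy generate a skeleton with the genspider command, run from the project root (the name and domain below are just placeholders); the class shown next can then replace the generated stub:

scrapy genspider my_spider example.com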

A Scrapy spider is a Python class that subclasses scrapy.Spider. Save the following basic example as a file (for instance, my_spider.py) inside the spiders directory:

import scrapy

class MySpider(scrapy.Spider):
    # Unique name used to launch the spider with "scrapy crawl"
    name = 'my_spider'

    # URLs the spider starts crawling from; the selectors below target
    # the quote markup used by quotes.toscrape.com
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        self.log('Visited %s' % response.url)
        # Yield one item (a plain dict) per quote block on the page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('span small::text').get(),
            }

In this example, name uniquely identifies the spider and is the value you pass to scrapy crawl. start_urls is the list of URLs the spider begins crawling from. parse() is the callback Scrapy invokes with the downloaded response for each request; besides yielding items, it can also yield new requests to follow links, as in the sketch below.
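Here is a sketch of that link-following pattern, again assuming a site structured like quotes.toscrape.com (the li.next a selector and the quotes_pagination name are illustrative, not part of the example above). response.follow resolves the relative href and schedules a new request handled by the same parse() callback:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes_pagination'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        # Scrape every quote on the current page
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }
        # Follow the pagination link, if the page has one (selector assumed
        # from quotes.toscrape.com's markup)
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)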

To run the spider, use the scrapy crawl command with the spider's name, from anywhere inside the project directory:

scrapy crawl my_spider
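You can also have Scrapy write the scraped items straight to a file using a feed export. For example, the command below (the file name quotes.json is just an illustration) stores the yielded dictionaries as JSON; Scrapy infers the format from the extension and also supports CSV, JSON Lines, and XML:

scrapy crawl my_spider -o quotes.json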
