In Scrapy, items are simple containers for collecting scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields.
Handling different item types in Scrapy is straightforward: define a separate Item class for each type of data you want to scrape.
Here is an example in Python:
import scrapy

class BookItem(scrapy.Item):
    title = scrapy.Field()
    author = scrapy.Field()
    price = scrapy.Field()

class MovieItem(scrapy.Item):
    title = scrapy.Field()
    director = scrapy.Field()
    rating = scrapy.Field()
In the above example, BookItem and MovieItem are two different item types. Each type has its own fields.
In your spider, you can use these items like this:
def parse(self, response):
    for book in response.css('div.book'):
        item = BookItem()
        item['title'] = book.css('h1 ::text').get()
        item['author'] = book.css('h2 ::text').get()
        item['price'] = book.css('p.price ::text').get()
        yield item

    for movie in response.css('div.movie'):
        item = MovieItem()
        item['title'] = movie.css('h1 ::text').get()
        item['director'] = movie.css('h2 ::text').get()
        item['rating'] = movie.css('div.rating ::text').get()
        yield item
In this scenario, each item passes through your item pipeline individually. If each item type needs different processing logic, you can check the item type in your pipeline like this:
def process_item(self, item, spider):
    if isinstance(item, BookItem):
        # Process a book item
        pass
    elif isinstance(item, MovieItem):
        # Process a movie item
        pass
    # Return the item so later pipeline stages receive it
    return item
The isinstance() function checks whether the item is an instance of BookItem or MovieItem. You can implement the desired processing logic in the corresponding if or elif block.
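As one way to fill in those branches, the sketch below gives each item type its own helper method and has process_item route to it. The `CatalogPipeline` name, the `_process_book` and `_process_movie` helpers, and the price and rating clean-up rules are all assumptions for illustration, not part of Scrapy. The two item classes here are plain dict subclasses standing in for the scrapy.Item classes above, so the snippet runs without Scrapy installed.

```python
class BookItem(dict):
    """Stand-in for the scrapy.Item subclass of the same name."""


class MovieItem(dict):
    """Stand-in for the scrapy.Item subclass of the same name."""


class CatalogPipeline:  # hypothetical pipeline name
    def process_item(self, item, spider):
        # Route each item to the helper that matches its type.
        if isinstance(item, BookItem):
            return self._process_book(item)
        if isinstance(item, MovieItem):
            return self._process_movie(item)
        # Unknown item types pass through unchanged.
        return item

    def _process_book(self, item):
        # Assumed rule: prices arrive as text such as "$12.50".
        item["price"] = float(item["price"].strip().lstrip("$"))
        return item

    def _process_movie(self, item):
        # Assumed rule: ratings arrive as text such as "8.1/10".
        item["rating"] = float(item["rating"].split("/")[0])
        return item


# Exercise the pipeline directly with made-up sample items.
pipeline = CatalogPipeline()
book = pipeline.process_item(
    BookItem(title="A", author="B", price=" $12.50 "), spider=None
)
movie = pipeline.process_item(
    MovieItem(title="C", director="D", rating="8.1/10"), spider=None
)
other = pipeline.process_item({"kind": "unknown"}, spider=None)
```

Every branch returns the item, so the next pipeline component still receives it. Keeping the per-type logic in separate helpers also keeps process_item short as more item types are added.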
As a tip, it's good practice to define your item fields as clearly as possible. In a real project, you may want to design them to match the data schema of your storage system (for example, your database tables).
Remember, you can always refer to Scrapy's official documentation for more advanced usage of items.