Scrapy uses settings to customize the behavior of all Scrapy components, including the core, extensions, pipelines and spiders themselves.
Settings are typically defined in a Python module. Scrapy loads them into a scrapy.settings.Settings object, and internal Scrapy components read their configuration from that object.
Scrapy looks for configuration parameters in the settings.py file located in your project's directory. You can define any of your settings in this file.
Here is an example of settings defined in this file:
# settings.py
BOT_NAME = 'my_bot'
SPIDER_MODULES = ['my_project.spiders']
NEWSPIDER_MODULE = 'my_project.spiders'
You can also configure settings in the spider itself, using the custom_settings attribute. This allows you to set (or override) settings directly in your spider.
# my_spider.py
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'
    # Per-spider overrides, applied on top of the project settings
    custom_settings = {
        'DOWNLOAD_DELAY': 2.0,
    }
In this example, the DOWNLOAD_DELAY setting is overridden for this spider only.
If you need to access the settings in your spider, you can use the self.settings attribute:
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def parse(self, response):
        some_setting = self.settings.get('SOME_SETTING')
        # ...
In addition, you can override your settings using command line options when running your spider:
scrapy crawl my_spider -s DOWNLOAD_DELAY=2
In this case, the DOWNLOAD_DELAY setting will be overridden for this particular run.
Remember that setting names are case-sensitive and that Scrapy's built-in settings are written in all capital letters, so use uppercase names for your own settings as well. You can find a list of available settings in the Scrapy documentation.
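To see why the capitalization matters, here is a simplified, standard-library-only model of reading settings from a module by keeping only its uppercase names. It is a sketch of the idea, not Scrapy's actual loader; fake_settings and collect_settings are hypothetical names used only for illustration:

```python
import types

# Hypothetical stand-in for a project's settings.py module.
fake_settings = types.ModuleType("fake_settings")
fake_settings.BOT_NAME = "my_bot"
fake_settings.DOWNLOAD_DELAY = 2.0
fake_settings.download_delay = 5.0  # lowercase name: not treated as a setting


def collect_settings(module):
    """Return the module's all-uppercase attributes as a dict."""
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.isupper()
    }


loaded = collect_settings(fake_settings)
print(loaded)
```

In this model the lowercase download_delay is silently skipped, which is the kind of mistake that is easy to miss in a real project.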
In terms of priority, settings are applied from the following sources, listed from highest to lowest precedence:
- Command line options
- Settings defined in the spider
- Settings defined in the settings.py file
- Default settings
This means that if you define a setting in your spider, it will override the same setting defined in the settings.py file. Similarly, a command line option will override the same setting defined both in the spider and in the settings.py file.
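The precedence rules above can be sketched with a small toy model. This is not Scrapy's implementation: PrioritySettings is a hypothetical class, and the numeric levels in PRIORITIES are assumptions chosen for illustration, where only their relative order matters:

```python
# Illustrative priority levels: a higher number wins.
PRIORITIES = {"default": 0, "project": 20, "spider": 30, "cmdline": 40}


class PrioritySettings:
    """Toy settings store that keeps the value from the highest-priority source."""

    def __init__(self):
        self._values = {}  # name -> (value, numeric priority)

    def set(self, name, value, source):
        priority = PRIORITIES[source]
        current = self._values.get(name)
        # Replace the stored value only if the new source ranks at least as high.
        if current is None or priority >= current[1]:
            self._values[name] = (value, priority)

    def get(self, name, default=None):
        entry = self._values.get(name)
        return entry[0] if entry is not None else default


settings = PrioritySettings()
settings.set("DOWNLOAD_DELAY", 0, "default")
settings.set("DOWNLOAD_DELAY", 1.0, "project")  # settings.py
settings.set("DOWNLOAD_DELAY", 2.0, "spider")   # custom_settings
delay_without_cli = settings.get("DOWNLOAD_DELAY")

settings.set("DOWNLOAD_DELAY", 5.0, "cmdline")  # -s DOWNLOAD_DELAY=5
settings.set("DOWNLOAD_DELAY", 1.5, "project")  # ignored: lower priority
delay_with_cli = settings.get("DOWNLOAD_DELAY")

print(delay_without_cli, delay_with_cli)
```

Storing the priority next to each value means the order in which sources are loaded does not matter: a later write from a lower-priority source cannot replace a value set by a higher-priority one.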