Run All Scrapy Spiders Together and Fix Video Download Errors

This guide shows how to create a custom Scrapy command to launch every spider at once, separate each spider's settings for better modularity, and resolve video download problems by adjusting request headers and handling file saving correctly.

FunTester

Dec 15, 2020

1. Launch All Spiders with a Custom Command

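Define a new Scrapy command, crawlall, that looks up every spider registered in the project, schedules each one, and then starts the crawler process. The loop only schedules the spiders; once start() is called they run concurrently in the same process rather than one after another. Save the command as crawlall.py inside a Python package (a directory with an __init__.py) and point COMMANDS_MODULE in settings.py at that package. Scrapy takes the command name from the file name.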

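from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):
    # The command only makes sense inside a Scrapy project.
    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # spider_loader replaces the older crawler_process.spiders attribute.
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            # Schedule each spider; nothing runs until start() is called.
            self.crawler_process.crawl(name, **opts.__dict__)
        # Start the reactor; all scheduled spiders run in this one process.
        self.crawler_process.start()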

Execute the command with a small wrapper script:

from scrapy.cmdline import execute
execute('scrapy crawlall'.split())
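
For Scrapy to discover crawlall, the project settings have to name the package that holds crawlall.py. A minimal sketch of that setting follows, assuming a hypothetical project package called myproject with a commands subpackage. Both names are placeholders, not taken from the original article.

```python
# settings.py (fragment)
# Assumed layout; every name here is hypothetical:
#   myproject/
#       settings.py
#       commands/
#           __init__.py   # empty file, makes the directory a package
#           crawlall.py   # contains the Command class
COMMANDS_MODULE = "myproject.commands"
```

With this in place, crawlall should show up among the available commands when scrapy -h is run from the project directory.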

2. Separate Settings for Each Spider

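Move spider-specific configuration into a custom_settings class attribute on each spider. Values defined there override the project-wide settings.py for that spider only. The example sets request headers, the Redis-based scheduler and dupefilter from scrapy_redis, a download delay, and the Redis connection URL.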

custom_settings = {
    'DEFAULT_REQUEST_HEADERS': {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh,zh-CN;q=0.9",
        "Cache-Control": "max-age=0",
        "Connection": "keep-alive",
        "Host": "www.baikemy.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
    },
    'DOWNLOADER_MIDDLEWARES': {},
    'SCHEDULER': "scrapy_redis.scheduler.Scheduler",
    'DUPEFILTER_CLASS': "scrapy_redis.dupefilter.RFPDupeFilter",
    'REDIS_URL': "redis://@192.168.2.196:6379",
    'SCHEDULER_QUEUE_CLASS': "scrapy_redis.queue.SpiderPriorityQueue",
    'DOWNLOAD_DELAY': 0.3,
}
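
The reason custom_settings works for per-spider configuration is that Scrapy assigns each settings source a priority, and spider-level values rank above project-level ones. The sketch below is a simplified model of that precedence, not Scrapy's actual implementation. The resolve_settings helper and the sample values are hypothetical, and only a subset of Scrapy's named priorities is listed.

```python
# Simplified model of settings precedence; not Scrapy's real implementation.
# Higher number wins, mirroring a subset of Scrapy's named priorities.
PRIORITIES = {"default": 0, "command": 10, "project": 20, "spider": 30, "cmdline": 40}


def resolve_settings(layers):
    """Merge {source_name: {setting: value}} so higher-priority sources win."""
    resolved = {}
    # Apply sources from lowest to highest priority; later updates overwrite.
    for source in sorted(layers, key=PRIORITIES.__getitem__):
        resolved.update(layers[source])
    return resolved


# Hypothetical values for illustration only.
layers = {
    "project": {"DOWNLOAD_DELAY": 1.0, "ROBOTSTXT_OBEY": True},
    "spider": {"DOWNLOAD_DELAY": 0.3},  # what custom_settings would contribute
}
effective = resolve_settings(layers)
print(effective)
```

In this model the spider keeps its own DOWNLOAD_DELAY while still inheriting every project setting it does not override. A value passed on the command line would take precedence over both.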

3. Fix Video Download That Results in Unplayable Files

The issue is solved by adding cache-bypass headers (Pragma: no-cache and Cache-Control: no-cache) to the video request and by writing the raw response body to a .mp4 file in binary mode.

headers = {
    "Accept": "*/*",
    "Accept-Encoding": "identity;q=1, *;q=0",
    "Accept-Language": "zh,zh-CN;q=0.9",
    "Connection": "keep-alive",
    "Cache-Control": "no-cache",
    "Host": "v.baikemy.com",
    "Pragma": "no-cache",
    "Range": "bytes=0-",
    "Referer": meta["video_source"],
    "Sec-Fetch-Mode": "no-cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
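
Because these headers send Range: bytes=0-, a server that honors range requests will typically reply with 206 Partial Content and a Content-Range header of the form bytes start-end/total. One possible sanity check before saving, which is not part of the original article, is to compare the body length against that header. The helper name is_complete_body is hypothetical, and the sketch assumes the server actually sends Content-Range.

```python
import re

# Matches e.g. "bytes 0-1023/1024"; the total may be "*" when unknown.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def is_complete_body(content_range, body_length):
    """Check a Content-Range header value against the received body size.

    Returns True when the range starts at 0, ends at the last byte, and
    body_length equals the declared total. Returns False on a mismatch and
    None when the header is missing or cannot be interpreted.
    """
    if not content_range:
        return None
    match = _CONTENT_RANGE.fullmatch(content_range.strip())
    if match is None or match.group(3) == "*":
        return None
    start, end, total = (int(match.group(i)) for i in (1, 2, 3))
    return start == 0 and end == total - 1 and body_length == total
```

In a Scrapy callback the header value arrives as bytes, so it would need decoding first, for example response.headers.get('Content-Range', b'').decode(). A False result suggests a truncated download that is better retried than saved.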

import logging
import os

# meta and response come from the surrounding spider callback.
video_dir = r'e:\baikemy\video'
# Relative path: first_level\second_level\disease_name\title.mp4
meta['video_location'] = meta['first_level'] + '\\' + meta['second_level'] + '\\' + meta['disease_name'] + '\\' + meta['title'] + '.mp4'
video_filepath = os.path.join(video_dir, meta['video_location'])

if os.path.isfile(video_filepath):
    logging.info('[Video] already exists')
else:
    # Create the target directory tree if it is missing.
    os.makedirs(os.path.dirname(video_filepath), exist_ok=True)
    data = response.body
    # Binary mode keeps the video bytes unchanged on disk.
    with open(video_filepath, 'wb') as f:
        logging.info('[Video][downloading]: ' + meta['title'])
        f.write(data)
        logging.info('[Video][download complete]: ' + meta['title'] + '\n')
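
Two things can still go wrong when saving: a title may contain characters that Windows forbids in file names, and an interrupted write can leave a partial file that later passes the os.path.isfile check. The following is a sketch of one way to guard against both, not the author's method. The helper names safe_name and save_video are hypothetical.

```python
import os
import re
import tempfile

# Characters Windows does not allow in file names, plus control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(text):
    """Replace characters that are invalid in Windows file names with '_'."""
    cleaned = _INVALID.sub("_", text).strip(" .")
    return cleaned or "untitled"


def save_video(body, base_dir, folders, title):
    """Write body to base_dir/<folders...>/<title>.mp4 and return the path.

    The data goes to a temporary file first and is renamed at the end, so an
    interrupted write never leaves a half-written .mp4 at the final path.
    """
    target_dir = os.path.join(base_dir, *(safe_name(p) for p in folders))
    os.makedirs(target_dir, exist_ok=True)
    final_path = os.path.join(target_dir, safe_name(title) + ".mp4")
    if os.path.isfile(final_path):
        return final_path  # already downloaded, keep the existing file
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(body)
        os.replace(tmp_path, final_path)  # rename within the same directory
    except BaseException:
        os.remove(tmp_path)  # clean up the partial file, then re-raise
        raise
    return final_path
```

In the spider callback this could be called as save_video(response.body, video_dir, [meta['first_level'], meta['second_level'], meta['disease_name']], meta['title']).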

These steps ensure all spiders are started with a single command, each spider maintains its own configuration, and downloaded video files are saved correctly and are playable.
