How to Schedule Python Web Crawlers: 3 Simple Methods Explained

This article demonstrates three practical ways to schedule Python web-crawling tasks: an infinite while loop, threading.Timer, and the sched module. It provides code snippets and usage tips, notes the limits of each approach when several runs per day are needed, and closes with an example that uses the third-party schedule library.

Python Crawling & Data Mining

Mar 10, 2021

Introduction

The author previously relied on Windows Task Scheduler to run crawlers on a timetable. This article records ways to do the scheduling from within Python itself.

Method 1: while True loop

Use a simple infinite loop that calculates the next execution time and sleeps until then.

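import time
from datetime import datetime

def One_Plan():
    # Period between runs: 24 hours, in seconds
    Second_update_time = 24 * 60 * 60
    now_Time = datetime.now()
    # Today's target time: 09:00:00
    plan_Time = now_Time.replace(hour=9, minute=0, second=0, microsecond=0)
    delta = plan_Time - now_Time
    # If 09:00 has already passed, delta is negative and the modulo
    # wraps it around to the time remaining until 09:00 tomorrow
    first_plan_Time = delta.total_seconds() % Second_update_time
    print("Sleeping %d seconds until the next run" % first_plan_Time)
    return first_plan_Time

while True:
    s1 = One_Plan()
    time.sleep(s1)
    # exe_file and D_list are the author's crawler entry point and its
    # input list; they are not defined in this snippet
    print("Running the update program")
    exe_file(D_list)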

This works for a single daily run, but it becomes awkward when the crawler must run several times a day, because the sleep calculation only knows about one target time.
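For several runs per day, one option is to compute the delay to the nearest of a list of daily times rather than to a single one. The sketch below is an illustration under stated assumptions, not the author's code: the helper name seconds_until_next and the RUN_TIMES list are hypothetical, and it uses naive local datetimes, so daylight-saving changes are ignored.

```python
from datetime import datetime, time as dtime, timedelta

def seconds_until_next(now, run_times):
    """Return the seconds from `now` until the earliest upcoming time in `run_times`.

    `run_times` is a list of datetime.time values (a hypothetical daily schedule).
    """
    candidates = []
    for t in run_times:
        target = datetime.combine(now.date(), t)
        if target <= now:
            # This time has already passed today, so use tomorrow's occurrence
            target += timedelta(days=1)
        candidates.append(target)
    return (min(candidates) - now).total_seconds()

# Hypothetical schedule: 09:00 and 18:00 every day
RUN_TIMES = [dtime(9, 0), dtime(18, 0)]

if __name__ == "__main__":
    delay = seconds_until_next(datetime.now(), RUN_TIMES)
    print("Next run in %d seconds" % delay)
```

The returned value can be passed to time.sleep() inside the same while True loop. A time that matches the current moment exactly is treated as already passed, which avoids an immediate second run when the task finishes within the same instant.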

Method 2: threading.Timer

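Use threading.Timer to run a function once, on a separate thread, after a delay. The loop below starts a new Timer every 5 seconds so that the task repeats.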

from datetime import datetime
from threading import Timer
import time

def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def timedTask():
    '''
    first argument: delay in seconds
    second argument: function to execute
    third argument: tuple of arguments
    '''
    Timer(5, task, ()).start()

while True:
    timedTask()
    time.sleep(5)
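An alternative to driving the Timer from a while True loop is to let each run schedule the next one, which also gives a clean way to stop. The following is a minimal sketch of that pattern; the RepeatingTimer class is a hypothetical helper, not part of the threading module, and the short interval is only there to keep the demo quick.

```python
import threading
import time

class RepeatingTimer:
    """Call `func` every `interval` seconds using threading.Timer (hypothetical helper)."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self._timer = None
        self._stopped = threading.Event()

    def _run(self):
        if self._stopped.is_set():
            return
        self.func()
        # Queue the next run only after this one has finished
        self._schedule()

    def _schedule(self):
        if self._stopped.is_set():
            return
        self._timer = threading.Timer(self.interval, self._run)
        self._timer.daemon = True  # do not keep the process alive at exit
        self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        self._stopped.set()
        if self._timer is not None:
            self._timer.cancel()

if __name__ == "__main__":
    runs = []
    timer = RepeatingTimer(0.05, lambda: runs.append(time.time()))
    timer.start()
    time.sleep(0.3)
    timer.stop()
    print("task ran %d times" % len(runs))
```

Because the next Timer is created after the task returns, the effective period is the interval plus the task's own running time, so long crawls never overlap but the schedule drifts.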

Method 3: sched module

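The standard-library sched module provides an event scheduler: enter() queues a function to run after a delay with a given priority, and run() blocks until the queue is empty. The example below runs task once, 5 seconds after the scheduler starts.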

from datetime import datetime
import sched
import time

def timedTask():
    scheduler = sched.scheduler(time.time, time.sleep)
    scheduler.enter(5, 1, task)
    scheduler.run()

def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

if __name__ == '__main__':
    timedTask()
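Since scheduler.run() returns once its queue is empty, a repeating job has to re-enter itself. One way to sketch this, assuming a hypothetical helper named run_periodically and a fixed number of repeats so the demo terminates:

```python
import sched
import time

def run_periodically(scheduler, interval, action, repeats):
    """Queue `action` to run `repeats` times, `interval` seconds apart (hypothetical helper)."""
    def wrapper(remaining):
        action()
        if remaining > 1:
            # Re-enter the event so the scheduler always has the next run queued
            scheduler.enter(interval, 1, wrapper, argument=(remaining - 1,))
    scheduler.enter(interval, 1, wrapper, argument=(repeats,))

if __name__ == "__main__":
    s = sched.scheduler(time.time, time.sleep)
    run_periodically(s, 1, lambda: print(time.strftime("%H:%M:%S")), repeats=3)
    s.run()  # blocks until all three runs have finished
```

Passing the time and delay functions to sched.scheduler explicitly, as the original example does, also makes it possible to substitute a fake clock when testing the schedule without waiting.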

A further option is the third-party schedule library (installed with pip install schedule), which can run tasks at specific times each day.

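import schedule
import time

def hellow():
    print('hellow')

def Timer():
    # Register the same job at two fixed times each day
    schedule.every().day.at("09:00").do(hellow)
    schedule.every().day.at("18:00").do(hellow)

    while True:
        # Run any job whose scheduled time has arrived
        schedule.run_pending()
        # Polling interval in seconds; the original leaves this value
        # to the reader, so 1 is only a placeholder
        time.sleep(1)

Timer()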

These approaches allow flexible scheduling of Python crawlers, from simple loops to dedicated scheduling libraries.
