How to Schedule Python Web Crawlers: 3 Simple Methods Explained
This article demonstrates three practical ways to schedule Python web-crawling tasks: an infinite while loop, the threading.Timer class, and the sched module. It closes with the third-party schedule library, and provides code snippets, usage tips, and considerations for handling multiple runs and resource constraints.
Introduction
The author previously used Windows Task Scheduler to run crawlers on a timer. This article records ways to handle the scheduling from within Python instead.
Method 1: while True loop
Use a simple infinite loop that calculates the next execution time and sleeps until then.
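import time
from datetime import datetime

def One_Plan():
    # Period between runs: 24 hours, in seconds
    Second_update_time = 24 * 60 * 60
    now_Time = datetime.now()
    # Target time: 09:00:00 today
    plan_Time = now_Time.replace(hour=9, minute=0, second=0, microsecond=0)
    delta = plan_Time - now_Time
    # If 09:00 has already passed, delta is negative and the modulo
    # wraps it around to the time remaining until 09:00 tomorrow
    first_plan_Time = delta.total_seconds() % Second_update_time
    print("Sleeping %d seconds until the next run" % first_plan_Time)
    return first_plan_Time

while True:
    s1 = One_Plan()
    time.sleep(s1)
    print("Running the update program")
    # exe_file and D_list are the author's crawler entry point and its
    # argument; they are defined elsewhere and not shown in this article
    exe_file(D_list)

This works for a single daily run but becomes limited when multiple runs per day are needed.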
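For several runs per day, one option is to compute the wait until the nearest upcoming slot rather than a fixed 09:00. The following is a minimal sketch of that idea, not the author's code: the helper `seconds_until_next_run` and the `RUN_TIMES` list are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta

def seconds_until_next_run(now, run_times):
    """Return the seconds from `now` until the nearest upcoming (hour, minute) slot."""
    candidates = []
    for hour, minute in run_times:
        slot = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if slot <= now:
            # This slot has already passed today, so use tomorrow's
            slot += timedelta(days=1)
        candidates.append(slot)
    return (min(candidates) - now).total_seconds()

# Hypothetical schedule: 09:00 and 18:00 every day
RUN_TIMES = [(9, 0), (18, 0)]

print(seconds_until_next_run(datetime(2024, 1, 1, 8, 30), RUN_TIMES))  # 1800.0
print(seconds_until_next_run(datetime(2024, 1, 1, 12, 0), RUN_TIMES))  # 21600.0
```

In the loop above, the value returned by `seconds_until_next_run(datetime.now(), RUN_TIMES)` could be passed to `time.sleep` in place of `One_Plan()`. Passing `now` as a parameter keeps the calculation easy to check against fixed timestamps.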
Method 2: threading.Timer
Use threading.Timer to schedule a task after a delay.
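A Timer runs its function once, on its own thread, after the delay has elapsed, and it can be cancelled while it is still waiting. The sketch below illustrates both behaviours with short delays; `crawl` and `results` are hypothetical stand-ins, not part of the original article.

```python
from threading import Timer

results = []

def crawl(label):
    # Hypothetical stand-in for a real crawl job
    results.append(label)

# Timer(delay_in_seconds, function, args=...) calls the function once
kept = Timer(0.1, crawl, args=("kept",))
dropped = Timer(0.5, crawl, args=("dropped",))
kept.start()
dropped.start()

# Cancelling while the timer is still waiting prevents the call
dropped.cancel()

# Timer is a Thread subclass, so join() waits for each one to finish
kept.join()
dropped.join()
print(results)  # ['kept']
```

Because each Timer fires only once, a repeating schedule has to keep creating new timers, which is what the loop in the listing that follows does.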
from datetime import datetime
from threading import Timer
import time

def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

def timedTask():
    '''
    Timer arguments:
    first argument: delay in seconds
    second argument: function to execute
    third argument: tuple of arguments
    '''
    Timer(5, task, ()).start()

while True:
    # Start a new one-shot timer every 5 seconds
    timedTask()
    time.sleep(5)

Method 3: sched module
Leverage the sched scheduler for more control.
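A scheduler's `run()` returns once its queue is empty, so a job that should repeat has to put itself back on the queue. The following is a minimal sketch of that pattern under stated assumptions: `crawl`, `run_log`, and the short 0.05 second interval are hypothetical and chosen only so the example finishes quickly.

```python
import sched
import time

scheduler = sched.scheduler(time.monotonic, time.sleep)
run_log = []

def crawl(runs_left, interval):
    # Hypothetical stand-in for a real crawl job
    run_log.append(runs_left)
    if runs_left > 1:
        # Re-enter the job so the queue is not empty when this call returns
        scheduler.enter(interval, 1, crawl, argument=(runs_left - 1, interval))

# enter(delay, priority, action, argument=...) queues the first run
scheduler.enter(0.05, 1, crawl, argument=(3, 0.05))
scheduler.run()  # blocks until the queue is empty
print(run_log)  # [3, 2, 1]
```

The counter exists only to make the sketch terminate; a long-running crawler could re-enter itself unconditionally. The listing that follows schedules a single run.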
from datetime import datetime
import sched
import time

def timedTask():
    # time.time supplies the clock and time.sleep does the waiting
    scheduler = sched.scheduler(time.time, time.sleep)
    # Queue task to run once, 5 seconds from now, with priority 1
    scheduler.enter(5, 1, task)
    # Block until every queued event has run
    scheduler.run()

def task():
    print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

if __name__ == '__main__':
    timedTask()

An additional example uses the third-party schedule library to run tasks at specific times each day.
import schedule
import time

def hellow():
    print('hellow')

def Timer():
    schedule.every().day.at("09:00").do(hellow)
    schedule.every().day.at("18:00").do(hellow)
    while True:
        schedule.run_pending()
        # How often to check for due jobs, in seconds; adjust as needed
        time.sleep(1)

Timer()

These approaches allow flexible scheduling of Python crawlers, from simple loops to dedicated scheduling libraries.
Python Crawling & Data Mining
