Optimizing feapder Spider with Gevent: Reducing CPU Usage and Thread Count
This article demonstrates how adding two gevent monkey‑patch lines to a feapder spider reduces CPU usage from 121% to 99% while changing the effective thread count from 36 to 12, and discusses the underlying principle, performance trade‑offs, and future directions for coroutine support.
Test Code
The original spider runs 32 threads to send 10,000 requests to Baidu.
import time

import feapder
from feapder.utils.log import log


class TestSpider(feapder.AirSpider):
    def start_requests(self):
        # Queue 10,000 requests to the same host
        for i in range(10000):
            yield feapder.Request(f"https://baidu.com#{i}")

    def parse(self, request, response):
        log.debug(response)

    def start_callback(self):
        self.start_time = time.time()

    def end_callback(self):
        self.end_time = time.time()
        log.debug(f"Elapsed: {self.end_time - self.start_time}")


if __name__ == "__main__":
    TestSpider(thread_count=32).start()

The test shows a CPU usage of 121% and a total runtime of 288 seconds.
CPU Usage Before Optimization
CPU usage: 121%, duration: 288 seconds.
CPU Usage After Optimization
CPU usage: 99%, duration: 317 seconds.
Two Lines to Add
Insert the following two lines at the very top of the file:
from gevent import monkey
monkey.patch_all(os=False, subprocess=False, signal=False)

Full Optimized Code
# The patch must run before any other import so later modules see the patched stdlib
from gevent import monkey
monkey.patch_all(os=False, subprocess=False, signal=False)

import time

import feapder
from feapder.utils.log import log


class TestSpider(feapder.AirSpider):
    def start_requests(self):
        # Queue 10,000 requests to the same host
        for i in range(10000):
            yield feapder.Request(f"https://baidu.com#{i}")

    def parse(self, request, response):
        log.debug(response)

    def start_callback(self):
        self.start_time = time.time()

    def end_callback(self):
        self.end_time = time.time()
        log.debug(f"Elapsed: {self.end_time - self.start_time}")


if __name__ == "__main__":
    TestSpider(thread_count=32).start()

Principle
Before optimization the process ran 36 threads in total: the 32 spider threads plus the framework's own scheduler threads. After applying the two gevent lines the count drops to 12, a figure that includes any native threads gevent itself creates. The drop happens because monkey.patch_all replaces the standard threading and blocking I/O modules with cooperative equivalents, so the spider's worker "threads" run as greenlets inside a single OS thread instead of each needing a separate one.
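The article does not say how the 36 and 12 figures were measured. One way to check comparable numbers is to count kernel-level threads directly, since greenlets never show up as OS threads. The sketch below is an assumption about how such a measurement could be done, not the author's method; `native_thread_count` and `count_with_workers` are hypothetical helper names, and the `/proc` path only exists on Linux.

```python
import os
import threading


def native_thread_count():
    """Return the number of OS-level threads in the current process."""
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        # Linux exposes one directory entry per kernel thread
        return len(os.listdir(task_dir))
    # Fallback for other platforms: Python-level thread objects
    return threading.active_count()


def count_with_workers(n):
    """Start n idle worker threads and report the thread count while they run."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(n)]
    for worker in workers:
        worker.start()
    try:
        return native_thread_count()
    finally:
        # Release the workers and wait for them to exit
        stop.set()
        for worker in workers:
            worker.join()


if __name__ == "__main__":
    print("idle:", native_thread_count())
    print("with 32 workers:", count_with_workers(32))
```

Calling `native_thread_count()` from inside a running spider, once without and once with the two gevent lines, should show whether the worker threads have stopped being real OS threads.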
Gevent’s monkey‑patch replaces thread‑based blocking I/O with coroutine‑based non‑blocking I/O, reducing context‑switch overhead and CPU consumption.
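To confirm that the patch actually took effect, gevent exposes `monkey.is_module_patched`. The following is a minimal sketch; `gevent_patch_status` and `CHECKED_MODULES` are hypothetical names chosen for illustration, and the module list is an assumption about which standard-library modules matter for a requests-based spider.

```python
import importlib.util

# Standard-library modules that the article's patch_all(...) call leaves enabled
CHECKED_MODULES = ("socket", "ssl", "threading", "time", "select")


def gevent_patch_status(modules=CHECKED_MODULES):
    """Return {module_name: is_patched}, or None if gevent is not installed."""
    if importlib.util.find_spec("gevent") is None:
        return None
    from gevent import monkey

    # is_module_patched reports whether gevent has replaced the named module
    return {name: monkey.is_module_patched(name) for name in modules}


if __name__ == "__main__":
    print(gevent_patch_status())
```

Run after the two patch lines, every entry should be True; a False entry usually means the module was imported and used before the patch ran.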
Summary and Thoughts
Summary
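In this test, gevent lowered CPU usage from 121% to 99%, a drop of 22 percentage points or about 18% in relative terms, while the runtime rose from 288 to 317 seconds (roughly 10%), a difference the test attributes to network variability.
Coroutines are scheduled cooperatively in user space, so switching between them costs less than switching between OS threads.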
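The figures above can be put on a common footing with a little arithmetic. This sketch uses only the numbers reported in the article; treating the CPU percentage as an average over the whole run is an assumption, and the helper names are illustrative.

```python
REQUESTS = 10_000

# Measurements reported in the article
before = {"cpu_percent": 121, "seconds": 288}
after = {"cpu_percent": 99, "seconds": 317}


def relative_change(old, new):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


def cpu_seconds(run):
    """Total CPU time, assuming cpu_percent is an average over the run."""
    return run["cpu_percent"] / 100 * run["seconds"]


def throughput(run):
    """Requests completed per second of wall-clock time."""
    return REQUESTS / run["seconds"]


if __name__ == "__main__":
    cpu = relative_change(before["cpu_percent"], after["cpu_percent"])
    wall = relative_change(before["seconds"], after["seconds"])
    total = relative_change(cpu_seconds(before), cpu_seconds(after))
    print(f"CPU usage change: {cpu:+.1%}")
    print(f"Runtime change: {wall:+.1%}")
    print(f"Total CPU time change: {total:+.1%}")
    print(f"Throughput: {throughput(before):.1f} -> {throughput(after):.1f} req/s")
```

Under that averaging assumption, total CPU time falls from roughly 348 to 314 CPU-seconds, about 10% less, so part of the headline CPU saving is offset by the longer run.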
Reflection
Why doesn’t feapder use coroutines by default? The framework was originally built five years ago when the author was unfamiliar with Python’s async ecosystem; threads and the requests library were simpler to adopt.
Switching to asyncio would require extensive refactoring and introduce async / await syntax, increasing the learning curve without significant speed gains, as tests show comparable crawl speeds between threads and coroutines.
Future plans may include integrating gevent or Twisted for asynchronous execution, pending community feedback and stability testing.
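The refactoring cost of asyncio mentioned above can be illustrated with a toy comparison. This is a sketch using only the standard library, not feapder code: `fetch_sync`, `fetch_async`, and the `DELAY` value are hypothetical stand-ins for a network request. The thread version keeps ordinary functions, while the asyncio version needs `async def` and `await` on every function in the call path.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # simulated network latency in seconds (assumed value)


def fetch_sync(i):
    """Blocking stand-in for a request; usable from plain threads."""
    time.sleep(DELAY)
    return i * 2


def crawl_with_threads(n, workers=8):
    # Thread-style code: ordinary functions, no new syntax
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_sync, range(n)))


async def fetch_async(i):
    """Coroutine stand-in; every caller must now await it."""
    await asyncio.sleep(DELAY)
    return i * 2


async def _crawl_async(n):
    # gather preserves the order of the submitted coroutines
    return await asyncio.gather(*(fetch_async(i) for i in range(n)))


def crawl_with_asyncio(n):
    return list(asyncio.run(_crawl_async(n)))


if __name__ == "__main__":
    print(crawl_with_threads(8))
    print(crawl_with_asyncio(8))
```

Both functions return the same results for I/O-bound work like this. The gevent approach differs from both in that the thread-style code stays unchanged and only the two patch lines are added.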
Conclusion
For now, adding the two gevent monkey‑patch lines is a quick way to lower CPU usage when using feapder; if the approach proves stable and memory‑safe, the framework may eventually embed gevent support.
IT Services Circle