Optimizing feapder Spider with Gevent: Reducing CPU Usage and Thread Count

This article demonstrates how adding two gevent monkey‑patch lines to a feapder spider reduces CPU usage from 121% to 99% while changing the effective thread count from 36 to 12, and discusses the underlying principle, performance trade‑offs, and future directions for coroutine support.

IT Services Circle

Jul 5, 2022

Test Code

The original spider runs 32 threads to send 10,000 requests to Baidu.

import time

import feapder
from feapder.utils.log import log


class TestSpider(feapder.AirSpider):
    def start_requests(self):
        # Queue 10,000 requests; only the URL fragment differs between them.
        for i in range(10000):
            yield feapder.Request(f"https://baidu.com#{i}")

    def parse(self, request, response):
        log.debug(response)

    def start_callback(self):
        # Record the start time when the spider starts.
        self.start_time = time.time()

    def end_callback(self):
        # Log the total elapsed time when the spider finishes.
        self.end_time = time.time()
        log.debug(f"Elapsed: {self.end_time - self.start_time} s")


if __name__ == "__main__":
    TestSpider(thread_count=32).start()

The test shows a CPU usage of 121% and a total runtime of 288 seconds.
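The source does not say how the CPU figure was collected. As a rough in-process alternative, the sketch below divides the CPU time consumed by the process by the wall-clock time that elapsed. The helper name `cpu_percent_of` is hypothetical and is not part of feapder. Values above 100% mean that, on average, more than one core was busy.

```python
import time


def cpu_percent_of(func, *args, **kwargs):
    """Run func and return (result, cpu_percent, wall_seconds).

    cpu_percent is process CPU time divided by wall-clock time. It covers
    every thread in the process, so it can exceed 100 on multi-core machines.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    percent = 100.0 * cpu_used / wall_used if wall_used > 0 else 0.0
    return result, percent, wall_used


if __name__ == "__main__":
    # Sleeping resembles waiting on the network: time passes, little CPU is used.
    _, idle_pct, idle_wall = cpu_percent_of(time.sleep, 0.2)
    # A busy loop is CPU-bound: CPU time tracks wall time closely.
    _, busy_pct, busy_wall = cpu_percent_of(
        lambda: sum(i * i for i in range(2_000_000))
    )
    print(f"idle: {idle_pct:.0f}% over {idle_wall:.2f}s")
    print(f"busy: {busy_pct:.0f}% over {busy_wall:.2f}s")
```

Wrapping the spider's `start()` call in such a helper would give one number per run, which makes before/after comparisons easier to repeat than reading a system monitor by eye.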

CPU Usage Before Optimization

CPU usage: 121%, duration: 288 seconds.

CPU Usage After Optimization

CPU usage: 99%, duration: 317 seconds.
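To put the two measurements side by side, the short calculation below derives the absolute and relative changes. The last figure, total CPU time, assumes that each CPU percentage is an average over the whole run, which the source does not state explicitly.

```python
# Figures reported above.
before = {"cpu_percent": 121.0, "seconds": 288.0}
after = {"cpu_percent": 99.0, "seconds": 317.0}


def relative_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


# Change in CPU usage, in percentage points and in relative terms.
cpu_points = after["cpu_percent"] - before["cpu_percent"]
cpu_rel = relative_change(before["cpu_percent"], after["cpu_percent"])

# Change in wall-clock runtime.
time_rel = relative_change(before["seconds"], after["seconds"])

# Total CPU time in core-seconds (assumes the percentages are run averages).
core_s_before = before["cpu_percent"] / 100.0 * before["seconds"]
core_s_after = after["cpu_percent"] / 100.0 * after["seconds"]
core_s_rel = relative_change(core_s_before, core_s_after)

print(f"CPU usage: {cpu_points:+.0f} points ({cpu_rel:+.1f}%)")   # -22 points (-18.2%)
print(f"Runtime:   {time_rel:+.1f}%")                              # +10.1%
print(f"CPU time:  {core_s_before:.1f} -> {core_s_after:.1f} core-seconds ({core_s_rel:+.1f}%)")
```

Under that assumption the patched run spends about 10% less total CPU time, even though it takes about 10% longer to finish.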

Two Lines to Add

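Insert the following two lines at the very top of the file, before feapder or any other module is imported, so that those modules load the patched versions of the standard library: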

from gevent import monkey
monkey.patch_all(os=False, subprocess=False, signal=False)
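Why the position matters can be illustrated without gevent. Monkey patching swaps attributes on a module at runtime, so any code that copied a reference before the swap keeps the original. The sketch below uses a made-up module called `fake_io` as a stand-in for something like `socket`; all names in it are hypothetical and none of this is gevent's actual implementation.

```python
import types

# Hypothetical stand-in for a standard-library module such as socket.
fake_io = types.ModuleType("fake_io")


def blocking_fetch(url):
    return f"blocking:{url}"


def cooperative_fetch(url):
    return f"cooperative:{url}"


fake_io.fetch = blocking_fetch

# Code that runs BEFORE the patch copies the original function object,
# just as `from fake_io import fetch` would.
early_fetch = fake_io.fetch


def patch_fake_io():
    """Replace the module attribute in place (the essence of monkey patching)."""
    fake_io.fetch = cooperative_fetch


patch_fake_io()

# Code that runs AFTER the patch picks up the replacement.
late_fetch = fake_io.fetch

print(early_fetch("a"))    # blocking:a
print(late_fetch("a"))     # cooperative:a
print(fake_io.fetch("a"))  # cooperative:a
```

The same reasoning applies to the real patch: a library imported before `monkey.patch_all` runs may already hold references to the unpatched blocking functions, which is why gevent's documentation recommends patching as early as possible.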

Full Optimized Code

# Patch first, before any other import, so later imports see the patched modules.
from gevent import monkey
monkey.patch_all(os=False, subprocess=False, signal=False)

import time

import feapder
from feapder.utils.log import log


class TestSpider(feapder.AirSpider):
    def start_requests(self):
        # Queue 10,000 requests; only the URL fragment differs between them.
        for i in range(10000):
            yield feapder.Request(f"https://baidu.com#{i}")

    def parse(self, request, response):
        log.debug(response)

    def start_callback(self):
        # Record the start time when the spider starts.
        self.start_time = time.time()

    def end_callback(self):
        # Log the total elapsed time when the spider finishes.
        self.end_time = time.time()
        log.debug(f"Elapsed: {self.end_time - self.start_time} s")


if __name__ == "__main__":
    TestSpider(thread_count=32).start()

Principle

Before optimization the process ran 36 threads: the 32 spider threads plus the framework's own scheduler threads. After the two gevent lines are applied the count drops to 12, because monkey.patch_all replaces the standard library's blocking primitives, including threading, with cooperative versions, so the spider's workers run as greenlets instead of separate OS threads. The 12 that remain include the threads created by monkey.patch_all.
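The source does not say how the thread counts were read. For the unpatched case, one way to observe them from inside the process is the standard library's `threading.active_count()`, as in the sketch below. The helper name `count_extra_threads` is hypothetical, and the workers simply wait on an event as a stand-in for blocked network calls.

```python
import threading


def count_extra_threads(worker_count):
    """Start worker_count threads that block on an Event and return how many
    additional live threads the interpreter reports while they wait."""
    release = threading.Event()
    baseline = threading.active_count()
    workers = [
        threading.Thread(target=release.wait, daemon=True)
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    # Every worker is alive here because each one is parked on the event.
    extra = threading.active_count() - baseline
    # Let the workers finish and wait for them to exit.
    release.set()
    for worker in workers:
        worker.join()
    return extra


if __name__ == "__main__":
    print(count_extra_threads(32))  # 32
```

Note that `threading.active_count()` only sees threads started through Python's `threading` module, so an OS-level tool can report a different number for the same process.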

Gevent’s monkey‑patch replaces thread‑based blocking I/O with coroutine‑based non‑blocking I/O, reducing context‑switch overhead and CPU consumption.

Summary and Thoughts

Summary

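Using gevent lowered CPU usage from 121% to 99%, a drop of 22 percentage points (roughly 18% in relative terms). Runtime rose from 288 to 317 seconds, an increase of about 10% that may simply reflect network variability.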

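Coroutines are scheduled in user space, so switching between them costs less than switching between OS threads, which makes them the more efficient choice for I/O‑bound work such as crawling.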

Reflection

Why doesn’t feapder use coroutines by default? The framework was originally built five years ago when the author was unfamiliar with Python’s async ecosystem; threads and the requests library were simpler to adopt.

Switching to asyncio would require extensive refactoring and introduce async / await syntax, increasing the learning curve without significant speed gains, as tests show comparable crawl speeds between threads and coroutines.
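To give a sense of what the async / await style involves, here is a minimal standard-library sketch. It is not feapder code: `fake_request`, `crawl`, and `run_demo` are hypothetical names, and `asyncio.sleep` stands in for a real network call.

```python
import asyncio
import threading
import time


async def fake_request(i, delay=0.1):
    """Stand-in for a network request; awaiting yields control to the event loop."""
    await asyncio.sleep(delay)
    return i


async def crawl(n):
    # All n coroutines wait concurrently on a single thread.
    return await asyncio.gather(*(fake_request(i) for i in range(n)))


def run_demo(n=32):
    """Run n simulated requests and report results, elapsed time, and new threads."""
    threads_before = threading.active_count()
    started = time.perf_counter()
    results = asyncio.run(crawl(n))
    elapsed = time.perf_counter() - started
    extra_threads = threading.active_count() - threads_before
    return results, elapsed, extra_threads


if __name__ == "__main__":
    results, elapsed, extra_threads = run_demo()
    print(f"{len(results)} requests in {elapsed:.2f}s, {extra_threads} extra threads")
```

The 32 simulated requests overlap instead of running one after another, and no extra threads are started. The cost is that every function between the event loop and the I/O call has to become `async def` and be awaited, which is the refactoring burden described above. Gevent avoids that by patching the blocking calls underneath unchanged synchronous code.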

Future plans may include integrating gevent or Twisted for asynchronous execution, pending community feedback and stability testing.

Conclusion

For now, adding the two gevent monkey‑patch lines is a quick way to lower CPU usage when using feapder; if the approach proves stable and memory‑safe, the framework may eventually embed gevent support.
