21CTO
Oct 1, 2015 · Backend Development
How to Scrape 1.1 Million Zhihu Users with PHP cURL, Multi‑Threading, and Redis
This tutorial walks through collecting more than 1.1 million Zhihu user profiles with PHP on Ubuntu. It covers handling cookies, bypassing image hot‑link protection, issuing concurrent requests with curl_multi, de‑duplicating MySQL inserts, and coordinating multiple pcntl worker processes through Redis.
Linux · Multi‑processing · PHP
15 min read
