Tagged articles
1 article
21CTO
Oct 1, 2015 · Backend Development

How to Scrape 1.1 Million Zhihu Users with PHP cURL, Multi‑Threading, and Redis

This tutorial walks through collecting over a million Zhihu user profiles with PHP on Ubuntu. It covers handling cookies, bypassing image hot‑link protection, issuing concurrent requests with curl_multi, de‑duplicating MySQL inserts, and coordinating multiple worker processes with Redis and pcntl.

Linux · Multi‑processing · PHP
15 min read