What 1.38 Million Zhihu Followers Reveal: A Python Scraping & Visualization Journey
This article documents a Python‑based web‑scraping project that harvested over 1.38 million Zhihu followers, filtered high‑impact users, and visualized insights such as follower distribution, gender ratio, top influencers, geographic spread, education, industry, and certification details, highlighting challenges and lessons learned.
Introduction
The author, a Python enthusiast, explains why they chose to crawl the followers of the Zhihu user with the largest follower count (Zhang Gongzi, 1.38 M+ followers) and recaps earlier small‑scale crawling experiments, such as movie rankings and literary collections.
The author fetched follower data by paging through Zhihu's followers API, increasing the offset value on each request, and extracted fields such as nickname, user ID, gender, signature, and follow counts. To reduce the workload, only users with more than 100 followers were kept, which left about 41 k higher‑impact profiles for further analysis.
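The paging‑and‑filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: `fetch_page` is a hypothetical callable standing in for the HTTP request, and the response shape (`data`, `paging.is_end`) and the `follower_count` field name are assumptions. A fake fetcher is included so the sketch runs offline.

```python
from typing import Any, Callable, Dict, Iterator, List

# Assumed page shape: {"data": [...], "paging": {"is_end": bool}}.
Page = Dict[str, Any]


def iter_followers(fetch_page: Callable[[int, int], Page], limit: int = 20) -> Iterator[dict]:
    """Yield follower records, advancing the offset until the last page."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        users = page.get("data", [])
        yield from users
        # Stop on an explicit end flag or an empty page, whichever comes first.
        if page.get("paging", {}).get("is_end", True) or not users:
            break
        offset += limit


def keep_influential(users, min_followers: int = 100) -> List[dict]:
    """Keep users with strictly more than `min_followers` followers."""
    return [u for u in users if u.get("follower_count", 0) > min_followers]


def make_fake_fetcher(records: List[dict]) -> Callable[[int, int], Page]:
    """Stand-in for the real HTTP call, slicing an in-memory list."""
    def fetch_page(offset: int, limit: int) -> Page:
        chunk = records[offset:offset + limit]
        return {"data": chunk, "paging": {"is_end": offset + limit >= len(records)}}
    return fetch_page
```

Passing the fetcher in as a parameter keeps the paging logic separate from networking concerns such as headers, rate limiting, and retries, which a real crawl of this size would also need.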
Data Visualization
1. Followers Count Distribution
A pyramid chart shows the number of users in different follower‑count ranges, highlighting that the majority of users have fewer than 10 k followers, while a tiny fraction exceeds 100 k.
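Before a chart like this can be drawn, the follower counts have to be grouped into ranges. The sketch below shows one way to do that; the bucket edges and labels are illustrative assumptions, since the article does not list the exact ranges used.

```python
from bisect import bisect_right
from collections import Counter

# Illustrative bucket edges (assumed, not taken from the article).
EDGES = [1_000, 10_000, 100_000]
LABELS = ["<1k", "1k-10k", "10k-100k", "100k+"]


def bucket_label(follower_count: int) -> str:
    """Map a follower count to its range; lower bounds are inclusive."""
    return LABELS[bisect_right(EDGES, follower_count)]


def bucket_counts(follower_counts) -> dict:
    """Count users per range, keeping the labels in ascending order."""
    counts = Counter(bucket_label(n) for n in follower_counts)
    return {label: counts.get(label, 0) for label in LABELS}
```

The resulting dictionary can be handed to a charting library as the category and value series.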
2. Gender Ratio
Among users with over 100 followers, the gender split remains roughly 2:1 (male:female), similar to the overall platform distribution.
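A ratio like this comes down to counting one field. The sketch below assumes the gender field is encoded as 1 for male and 0 for female, with any other value treated as unspecified; that encoding is an assumption and should be checked against the actual data.

```python
from collections import Counter

# Assumed encoding: 1 = male, 0 = female, anything else = unspecified.
GENDER_NAMES = {1: "male", 0: "female"}


def gender_shares(users) -> dict:
    """Return the share of each gender label among the given users."""
    counts = Counter(GENDER_NAMES.get(u.get("gender"), "unknown") for u in users)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Keeping the unspecified group as its own label makes it explicit whether a quoted male:female ratio is computed over all users or only over those who declared a gender.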
3. Top 100 k+ Influencers
A word‑cloud visualizes the most followed users (over 100 k followers), featuring names like Ma Boyong, Hangzhou‑based accounts, and well‑known medical and tech personalities.
4. Geographic Distribution
Using the subset of users with more than 1 k followers, the author maps domestic city frequencies, with Beijing, Shanghai, Shenzhen, Hangzhou, Guangzhou, Chengdu, and Nanjing ranking highest. Overlapping points on the map indicate the need for better visual separation.
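Free‑text location fields usually need cleaning before city frequencies can be counted. The sketch below shows one simple approach under stated assumptions: the whitelist `KNOWN_CITIES` is hypothetical and deliberately short, and substring matching is a stand‑in for whatever manual cleanup the author actually performed.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical whitelist; a real run would need a much fuller city list.
KNOWN_CITIES = ["北京", "上海", "深圳", "杭州", "广州", "成都", "南京"]


def normalize_city(raw: Optional[str]) -> Optional[str]:
    """Return the first city from KNOWN_CITIES that appears in the text, else None."""
    if not raw:
        return None
    text = raw.strip()
    for city in KNOWN_CITIES:
        if city in text:
            return city
    return None


def city_frequencies(locations: Iterable[Optional[str]]) -> List[Tuple[str, int]]:
    """Count recognized cities, most frequent first; unrecognized values are dropped."""
    counts = Counter(c for c in map(normalize_city, locations) if c)
    return counts.most_common()
```

Substring matching folds variants such as "北京市" and "北京海淀区" into one city, but it silently drops anything outside the whitelist, so the share of discarded values is worth checking.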
5. Top‑20 Profiles
Analysis of the top‑20 users by follower count reveals that most attended elite Chinese universities (985/211), work in internet‑related industries (software, finance, education), and hold positions such as programmers, product managers, founders, and CEOs.
6. Certification Information
Among the 41 k+ filtered users, 208 have verified certifications, including PhDs, post‑docs, doctors, professors, CFA/CPA holders, and executives from notable companies.
7. Outstanding Answerers
The dataset identifies 468 outstanding answerers (including Zhang Jiawei) across 257 topics, contributing a total of 768 answerer tags. Their answer counts and Zhihu‑recorded answers are visualized, showing a few super‑prolific users and many with modest contributions.
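The three figures differ because one user can be an outstanding answerer in several topics, and one topic can have several outstanding answerers. A minimal sketch of how such totals could be derived is shown below; the input structure, a mapping from user to a list of topic names, is an assumption made for illustration.

```python
from typing import Dict, List


def summarize_answerers(badges: Dict[str, List[str]]) -> Dict[str, int]:
    """Summarize a user -> topics mapping of outstanding-answerer tags."""
    distinct_topics = {topic for topics in badges.values() for topic in topics}
    return {
        # Users holding at least one tag.
        "answerers": sum(1 for topics in badges.values() if topics),
        # Distinct topics across all users.
        "topics": len(distinct_topics),
        # Every (user, topic) pair counts as one tag.
        "tags": sum(len(topics) for topics in badges.values()),
    }
```

With toy input such as `{"a": ["NBA", "History"], "b": ["NBA"], "c": []}`, this gives 2 answerers, 2 topics, and 3 tags.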
Conclusion
The project marks the author’s first experience handling million‑scale data scraping and visualization with Python and ECharts. While the workflow required many manual adjustments—especially for messy location fields and map rendering—the resulting charts provide valuable insights into Zhihu’s follower ecosystem and highlight areas for future improvement.
MaGe Linux Operations
