
What 1.38 Million Zhihu Followers Reveal: A Python Scraping & Visualization Journey

This article documents a Python‑based web‑scraping project that harvested over 1.38 million Zhihu followers, filtered high‑impact users, and visualized insights such as follower distribution, gender ratio, top influencers, geographic spread, education, industry, and certification details, highlighting challenges and lessons learned.

MaGe Linux Operations

Introduction

The author, a Python enthusiast, explains why they chose to crawl the followers of the Zhihu user with the largest follower count (Zhang Gongzi, 1.38 M+ followers) and recaps earlier small‑scale crawling experiments, such as movie rankings and literary collections.

They fetched the follower data by paging through Zhihu's followers API, increasing the offset value on each request, and extracted fields such as nickname, user ID, gender, signature, and follow counts. To reduce the workload, they kept only users with more than 100 followers, which left about 41 k higher‑impact profiles for further analysis.
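The offset paging and the 100‑follower filter can be sketched roughly as follows. This is a minimal illustration, not the author's code: `fetch_page` is a hypothetical stand‑in for the real HTTP request, and the field names `name` and `follower_count` are assumptions rather than the actual API schema.

```python
from typing import Callable, Iterable, Iterator

Page = list[dict]


def iter_followers(fetch_page: Callable[[int, int], Page], limit: int = 20) -> Iterator[dict]:
    """Yield follower records, advancing the offset until a page comes back empty."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        yield from page
        offset += len(page)


def keep_influential(followers: Iterable[dict], min_followers: int = 100) -> list[dict]:
    """Keep only users with more than `min_followers` followers."""
    return [f for f in followers if f.get("follower_count", 0) > min_followers]


# Hypothetical in-memory data standing in for the remote API.
FAKE_DATA = [{"name": f"user{i}", "follower_count": i * 30} for i in range(10)]


def fake_fetch_page(offset: int, limit: int) -> Page:
    """Return one slice of FAKE_DATA, the way a paged endpoint would."""
    return FAKE_DATA[offset:offset + limit]


selected = keep_influential(iter_followers(fake_fetch_page, limit=3))
print(len(selected), "users kept")
```

Passing the fetch function in as a parameter keeps the paging logic testable without network access. A real crawl at this scale would also need request throttling and error handling, which are omitted here.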

Data Visualization

1. Followers Count Distribution

A pyramid chart shows the number of users in different follower‑count ranges, highlighting that the majority of users have fewer than 10 k followers, while a tiny fraction exceeds 100 k.
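The counts behind such a chart come from bucketing each user's follower count into a range. A small sketch of that step, assuming illustrative bucket edges at 1 k, 10 k, and 100 k (the article does not state the exact ranges the author used):

```python
from bisect import bisect_right
from collections import Counter

# Assumed bucket edges; the original chart's ranges are not given.
BOUNDS = [1_000, 10_000, 100_000]
LABELS = ["under 1k", "1k-10k", "10k-100k", "100k+"]


def bucket_counts(follower_counts):
    """Count how many users fall into each follower-count range."""
    counts = Counter(LABELS[bisect_right(BOUNDS, n)] for n in follower_counts)
    # Return every label, in order, even when a bucket is empty.
    return {label: counts.get(label, 0) for label in LABELS}


sample = [150, 999, 1_000, 5_000, 20_000, 250_000]
dist = bucket_counts(sample)
print(dist)
```

With `bisect_right`, a value equal to an edge (for example exactly 1,000) lands in the higher bucket.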

2. Gender Ratio

Among users with over 100 followers, the gender split remains roughly 2:1 (male:female), similar to the overall platform distribution.
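The ratio itself is a simple count over the gender field. A brief sketch, assuming the field is encoded as 1 for male, 0 for female, and any other value for unknown (an assumption about the data, not something the article confirms):

```python
from collections import Counter


def gender_ratio(genders):
    """Return the male:female ratio, or None when no female users are present."""
    counts = Counter(genders)
    male, female = counts.get(1, 0), counts.get(0, 0)  # assumed encoding
    return male / female if female else None


sample = [1, 1, 0, -1, 1, 0, 1]  # -1 marks an unknown gender and is ignored
print(gender_ratio(sample))
```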

3. Top 100 k+ Influencers

A word‑cloud visualizes the most followed users (over 100 k followers), featuring names like Ma Boyong, Hangzhou‑based accounts, and well‑known medical and tech personalities.

4. Geographic Distribution

Using the subset of users with more than 1 k followers, the author maps domestic city frequencies, with Beijing, Shanghai, Shenzhen, Hangzhou, Guangzhou, Chengdu, and Nanjing ranking highest. Overlapping points on the map indicate the need for better visual separation.
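Before cities can be plotted, free‑text location values have to be normalized and counted. The sketch below shows one possible way to do that; the `location` and `follower_count` field names and the single normalization rule (dropping a trailing "市") are assumptions, and real data would need many more rules.

```python
from collections import Counter


def normalize_city(raw):
    """Trim whitespace and drop a trailing '市' (city) suffix."""
    city = raw.strip()
    if city.endswith("市"):
        city = city[:-1]
    return city


def city_frequencies(users, min_followers=1000):
    """Count cities among users with more than `min_followers` followers."""
    counts = Counter()
    for user in users:
        location = user.get("location")
        if location and user.get("follower_count", 0) > min_followers:
            counts[normalize_city(location)] += 1
    return counts.most_common()


# Hypothetical sample records.
users = [
    {"location": "北京市", "follower_count": 5000},
    {"location": " 北京", "follower_count": 2000},
    {"location": "上海", "follower_count": 1500},
    {"location": "杭州", "follower_count": 500},   # below the threshold
    {"location": "", "follower_count": 9000},      # empty location is skipped
]
freq = city_frequencies(users)
print(freq)
```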

5. Top‑20 Profiles

Analysis of the top‑20 users by follower count reveals that most attended elite Chinese universities (985/211), work in internet‑related industries (software, finance, education), and hold positions such as programmers, product managers, founders, and CEOs.

6. Certification Information

Among the 41 k+ filtered users, 208 have verified certifications, including PhDs, post‑docs, doctors, professors, CFA/CPA holders, and executives from notable companies.

7. Outstanding Answerers

The dataset identifies 468 outstanding answerers (including Zhang Jiawei) across 257 topics, contributing a total of 768 answerer tags. Their answer counts and Zhihu‑recorded answers are visualized, showing a few super‑prolific users and many with modest contributions.
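The three figures differ because one answerer can hold tags in several topics, and one topic can have several answerers. A small sketch of how such totals could be derived from a user‑to‑topics mapping (the mapping shape and sample names are hypothetical):

```python
def summarize_answerers(topics_by_user):
    """Count tagged users, distinct topics, and total (user, topic) tags."""
    tagged = {user: topics for user, topics in topics_by_user.items() if topics}
    distinct_topics = set()
    for topics in tagged.values():
        distinct_topics.update(topics)
    total_tags = sum(len(set(topics)) for topics in tagged.values())
    return {
        "answerers": len(tagged),
        "topics": len(distinct_topics),
        "tags": total_tags,
    }


# Hypothetical sample: user "c" holds no tags and is not counted.
sample = {"a": ["history", "food"], "b": ["food"], "c": []}
summary = summarize_answerers(sample)
print(summary)
```

Here two answerers share two topics but hold three tags in total, the same kind of relationship as the 468 answerers, 257 topics, and 768 tags reported above.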

Conclusion

The project marks the author’s first experience handling million‑scale data scraping and visualization with Python and ECharts. While the workflow required many manual adjustments—especially for messy location fields and map rendering—the resulting charts provide valuable insights into Zhihu’s follower ecosystem and highlight areas for future improvement.
