What Happens When Most Language Learners Quit? A Data‑Driven Dive into Shanbay Users
Using Python’s Scrapy, pandas, and seaborn, the author scraped and cleaned public Shanbay user data, stored it in PostgreSQL, and analyzed registration and study habits to reveal that over 68% of users abandon word‑learning on day one, with only a tiny fraction persisting beyond 100 days.
0x00 Introduction
The author wonders how many people give up on a vocabulary‑learning app on the very first day, and how many keep studying words over time.
0x01 Problem Definition and Task Breakdown
Key questions include: (1) How many users keep up with word learning (defined as >100 days)? (2) How many dreams are lost due to lack of persistence? (3) Does the amount of learned vocabulary follow a normal distribution?
0x02 Task 1 – Data Crawling
Public user data from Shanbay (e.g., http://www.shanbay.com/bdc/review/progress/2 ) was scraped using Python 2 and Scrapy. The site’s anti‑scraping measures required using proxy servers and disabling cookies.
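The two countermeasures mentioned above map onto standard Scrapy mechanisms: the `COOKIES_ENABLED` setting and the `proxy` key in `request.meta`. The following is a minimal sketch rather than the author's actual configuration. The proxy addresses, the delay value, the middleware class name, and the `shanbay_crawler` module path are all hypothetical, since the original proxy settings were not published.

```python
import random

# Hypothetical proxy pool; the real proxy list is not part of the published code.
PROXIES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]

# Scrapy settings (these would normally live in the project's settings.py).
COOKIES_ENABLED = False   # do not send or store cookies between requests
DOWNLOAD_DELAY = 1.0      # assumed value: seconds to wait between requests

# Hypothetical module path; registers the middleware defined below.
DOWNLOADER_MIDDLEWARES = {
    "shanbay_crawler.middlewares.RandomProxyMiddleware": 350,
}


class RandomProxyMiddleware:
    """Downloader middleware that assigns a random proxy to every request."""

    def process_request(self, request, spider):
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        request.meta["proxy"] = random.choice(PROXIES)
        return None  # let Scrapy continue processing the request
```

Rotating proxies per request spreads the load across addresses, while disabling cookies keeps the site from tying successive requests to one session.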
0x03 Task 2 – Cleaning and Storage
Collected records were stored in a PostgreSQL database. Basic cleaning was performed with SQL statements and pandas operations; further cleaning was treated as optional.
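A basic pandas cleaning pass of the kind described could look like the sketch below. The column names (`user_id`, `checkin_days`, `growth`, `words`) and the specific rules (deduplicate by user, drop unparseable or negative counts) are assumptions for illustration, not the author's actual schema or steps.

```python
import pandas as pd

# Hypothetical column names; the article does not show the real schema.
NUMERIC_COLS = ["checkin_days", "growth", "words"]


def clean_users(raw: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate users and keep only rows with valid, non-negative counts."""
    # Keep the most recently scraped row for each user.
    df = raw.drop_duplicates(subset="user_id", keep="last").copy()
    # Turn unparseable values into NaN so they can be dropped.
    for col in NUMERIC_COLS:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    df = df.dropna(subset=NUMERIC_COLS)
    # Negative counts cannot be real, so discard those rows.
    df = df[(df[NUMERIC_COLS] >= 0).all(axis=1)]
    df = df.astype({col: int for col in NUMERIC_COLS})
    return df.reset_index(drop=True)


# Made-up records showing a duplicate, a bad value, and a negative count.
raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "checkin_days": ["0", "15", "15", "n/a", "-3"],
    "growth": [0, 120, 120, 5, 7],
    "words": [0, 300, 300, 10, 20],
})
cleaned = clean_users(raw)
print(cleaned)
```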
0x04 Task 3 – Analysis
Analysis was carried out in an IPython notebook (Python 3, Anaconda). Visualizations were created with seaborn.
Further segmented histograms for ranges 0‑20, 20‑100, 100‑500, and 500‑2000 days are also included.
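The segmentation into ranges can be sketched with `pandas.cut`. The range edges come from the text above. Treating each range as closed on the left and open on the right is an assumption, and the sample values are made up.

```python
import pandas as pd

# Range edges from the article; [low, high) boundaries are an assumption.
EDGES = [0, 20, 100, 500, 2000]
LABELS = ["0-20", "20-100", "100-500", "500-2000"]


def bucket_counts(checkin_days):
    """Count how many users fall into each check-in-day range."""
    binned = pd.cut(pd.Series(checkin_days), bins=EDGES, labels=LABELS, right=False)
    return {label: int((binned == label).sum()) for label in LABELS}


sample = [0, 0, 3, 19, 20, 99, 100, 499, 500, 1830]  # made-up values
counts = bucket_counts(sample)
print(counts)
```

Each range can then be plotted on its own axis with a seaborn histogram function such as `histplot`, so that the long tail stays visible instead of being flattened by the spike near zero.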
0x05 Conclusions
Highest check‑in days: chainyu – 1830 days
Highest growth value: Lerystal – 28,767
Highest word count: chenmaoboss – 38,313
Average metrics per user:
Average check‑in days: 14.18 (11.69% exceed average)
Average growth value: 121.79 (11.42% exceed average)
Average learned words: 78.92 (≈2.19% exceed average)
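The "average plus share of users above the average" figures can be computed as in this small sketch. It uses made-up values and is not the author's notebook code.

```python
from statistics import mean


def mean_and_share_above(values):
    """Return the mean and the percentage of values strictly above it."""
    avg = mean(values)
    above = sum(1 for v in values if v > avg)
    return avg, 100.0 * above / len(values)


checkin_days = [0, 0, 0, 1, 2, 5, 30, 200]  # made-up, heavily skewed sample
avg, pct_above = mean_and_share_above(checkin_days)
print(avg, pct_above)
```

In a heavily skewed distribution a few long-term users pull the mean upward, which is why only a small share of users ends up above it.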
Key findings from the sample (≈600 k users), given as cumulative shares:
68.15% have abandoned word learning by day 0.
76.40% have abandoned by day 1.
79.31% have abandoned by day 2.
83.52% have abandoned by day 5.
86.95% have abandoned by day 10.
90.28% have abandoned by day 20.
94.28% have abandoned by day 50.
96.69% have abandoned by day 100.
98.36% have abandoned before day 200.
98.81% have abandoned before day 263.
Thus, only a very small fraction of users (under 2%) persist beyond 200 days.
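Percentages of this kind can be derived from per-user check-in totals, as sketched below. The sketch assumes that "abandoning on day N" means a total check-in count of at most N, which the article does not state explicitly. The sample data is made up.

```python
def abandoned_by(checkin_days, thresholds):
    """Percentage of users whose total check-in days is <= each threshold."""
    total = len(checkin_days)
    return {
        t: 100.0 * sum(1 for d in checkin_days if d <= t) / total
        for t in thresholds
    }


sample = [0, 0, 0, 0, 0, 0, 1, 2, 7, 150]  # made-up check-in totals
shares = abandoned_by(sample, [0, 1, 2, 5, 10, 100])
for day, pct in shares.items():
    print(f"{pct:.2f}% of users have stopped by day {day}")
```

Because each threshold includes everyone counted at the smaller thresholds, the percentages can only stay level or rise as the day count grows.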
0x06 Reflections and Limitations
The sample is biased toward early‑registered users (IDs 1‑1,111,111). Later users have lower chances of long‑term persistence. Anti‑scraping defenses limited crawling speed; proxy handling and cookie disabling were necessary.
0x07 Code
The scraping and analysis code and the IPython notebook are available on GitHub: https://github.com/twocucao/DataScience/ . The repository does not include proxy settings.
MaGe Linux Operations
