
What Do Beijing’s 2020 Points‑Based Household Registration Data Reveal? A Shell‑Powered Analysis

The article examines the 2020 Beijing points‑based household registration dataset, showing how shell commands like awk, sort, uniq, and grep extract score, company, surname, full‑name, and age distributions, and highlights the top companies and common surnames.

Liangxu Linux

Background

On the day the annual Beijing points‑based household registration (积分落户) results were announced for 2020, the official website of the Beijing Human Resources and Social Security Bureau released a public CSV file containing 6,032 records.

Data source and format

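The CSV includes fields for name, birth date, employer, and points score, with one applicant per row. The awk commands in this article rely on awk's default whitespace splitting and read the name from field 2, the birth date from field 3, and the score from field 5, so the file is evidently whitespace‑delimited despite its .csv extension.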
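
Before trusting any column number, it can help to confirm how the file actually splits into fields. The following is a minimal sketch and not part of the original analysis: the function name field_counts is hypothetical, and it assumes the file is whitespace‑delimited, as the awk commands in this article imply.

```shell
# Hypothetical helper: report how many whitespace-separated fields each
# row has, so an unexpected layout shows up before any column is trusted.
field_counts() {
    # $1: path to the data file
    awk '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Example usage (10000.csv is the file name used in this article):
#   head -n 3 10000.csv      # eyeball the first rows
#   wc -l < 10000.csv        # total number of rows
#   field_counts 10000.csv   # prints "<fields per row> <number of rows>"
```

If every row reports the same field count, the positional commands are probably safe; rows with a different count usually mean a value contains embedded whitespace.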

Score distribution

Most applicants scored between 97 and 102 points. Small differences of 0.01 points can separate dozens of people (e.g., 98.17 points has 39 applicants, 98.16 points has 21).

➜  积分落户2020数据分析  git:(master) ✗ awk '{print $5}' 10000.csv | sort | uniq -c | sort -nr -k 1 | head -n 10
  98 97.50
  84 97.25
  80 97.33
  73 97.17
  72 97.21
  67 98.50
  66 98.00
  61 97.46
  57 98.46
  54 97.13
➜  积分落户2020数据分析  git:(master) ✗ awk '{print $5}' 10000.csv | sort | uniq -c | sort -nr -k 1 | grep -F '98.17'
  39 98.17
➜  积分落户2020数据分析  git:(master) ✗ awk '{print $5}' 10000.csv | sort | uniq -c | sort -nr -k 1 | grep -F '98.16'
  21 98.16
Score distribution chart
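
The claim that most scores fall between 97 and 102 could be checked by grouping scores into whole‑point buckets rather than listing exact values. This is a sketch under the same assumption as the commands above (score in field 5); the function name score_buckets is hypothetical.

```shell
# Hypothetical helper: count applicants per whole-point bucket
# (97.00-97.99 -> 97, and so on). Assumes the score is in field 5,
# as in the awk commands in this article.
score_buckets() {
    # $1: path to the data file
    awk '{ c[int($5)]++ } END { for (b in c) print b, c[b] }' "$1" | sort -n
}

# Example usage:
#   score_buckets 10000.csv   # prints "<whole points> <number of applicants>"
```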

Top 10 companies receiving household‑registration slots

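A pipeline of grep, cut, sort, uniq -c, and head -n 10 counts the employer ("unit") field and yields the companies with the most slots. The command shown reads a JSON file, jifenluohu.json, from a directory named 首批积分落户 ("first batch of points‑based household registration"), not the 10000.csv used for the score analysis.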

➜  首批积分落户  > grep 'unit' jifenluohu.json| cut -f2 -d: | sort | uniq -c | sort -nr -k 1 | head -n 10
 137 "北京华为数字技术有限公司"
  73 "中央电视台"
  57 "北京首钢建设集团有限公司"
  55 "百度在线网络技术(北京)有限公司"
  48 "联想(北京)有限公司"
  40 "北京外企人力资源服务有限公司"
  40 "中国民生银行股份有限公司"
  39 "国际商业机器(中国)投资有限公司"
  29 "中国国际技术智力合作有限公司"
  27 "华为技术有限公司北京研究所"

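The list shows Huawei leading again (137 slots, plus 27 at its Beijing research institute), followed by CCTV with 73. Baidu is fourth with 55; Tencent does not appear in this top 10, so any rise for Tencent is not visible in the output shown here.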

Company distribution chart
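
The same count‑and‑rank pattern recurs throughout this article, so it could be wrapped in one small function and pointed at the CSV as well. This is only a sketch: the function name top_values is hypothetical, and treating column 4 as the employer column is an assumption, since the article's commands only confirm columns 2 (name), 3 (birth date), and 5 (score).

```shell
# Hypothetical helper: print the N most frequent values in one
# whitespace-separated column, most frequent first.
top_values() {
    # $1: file, $2: 1-based column number, $3: how many rows to show
    awk -v col="$2" '{ print $col }' "$1" | sort | uniq -c | sort -nr -k 1 | head -n "$3"
}

# Example usage. Column 4 as the employer column is an ASSUMPTION;
# verify it against the first rows of the file before relying on it.
#   top_values 10000.csv 4 10
#   top_values 10000.csv 5 10   # same idea for the most common scores
```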

Most common surnames among applicants

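Extracting the first character of the name field and counting occurrences shows that 张 (Zhang) and 王 (Wang) dominate, followed by 李 (Li) and 刘 (Liu). Taking only the first character miscounts two‑character compound surnames such as 欧阳 (Ouyang), though these are rare.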

➜  首批积分落户  > grep '"name":' jifenluohu.json| sed 's|"name": "||g' | sed 's| ||g' | cut -c 1 | sort | uniq -c | sort -nr -k 1 | head -n 10
 541 张
 531 王
 462 李
 376 刘
 205 陈
 193 杨
 166 赵
 132 孙
  95 郭
  95 徐
Surname distribution chart

Most common full names

Counting identical full names (surname plus given name) shows that 王鹏 (Wang Peng) is the most frequent, with 9 occurrences.

➜  积分落户2020数据分析  git:(master) ✗ awk '{print $2}' 10000.csv  | sort | uniq -c | sort -nr -k 1 | head -n 10
   9 王鹏
   6 王伟
   6 张颖
   5 赵静
   5 石磊
   5 王琳
   5 王燕
   5 王涛
   5 王勇
   5 孙涛

Age distribution

By extracting the birth year and computing an approximate age (2020 minus birth year, ignoring month and day), the ages span 32 to 59, and 5,539 of the 6,032 applicants (about 92%) fall between 38 and 47.

# Using awk on the CSV
awk '{print $3}' 10000.csv | cut -f1 -d"-" | awk '{print 2020-$1}' | sort | uniq -c
   1 32
   3 35
  30 36
  83 37
 290 38
 468 39
 644 40
 741 41
 808 42
 751 43
 636 44
 507 45
 365 46
 329 47
 108 48
 107 49
  85 50
  27 51
   6 52
  10 53
   9 54
   8 55
   6 56
   5 57
   3 58
   2 59
Age distribution chart
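
The share of applicants inside a given age window could be computed in a single awk pass instead of summing the table by hand. This is a sketch, not the author's method: the function name age_share is hypothetical, and it assumes the birth date sits in field 3 and begins with a four‑digit year followed by "-", as the cut -d"-" step above implies.

```shell
# Hypothetical helper: share of rows whose approximate age
# (reference year minus birth year) falls within [lo, hi].
age_share() {
    # $1: file, $2: reference year, $3: lowest age, $4: highest age
    awk -v year="$2" -v lo="$3" -v hi="$4" '
        {
            split($3, d, "-")      # d[1] is the birth year
            age = year - d[1]
            total++
            if (age >= lo && age <= hi) in_range++
        }
        END {
            if (total > 0)
                printf "%d of %d (%.1f%%)\n", in_range, total, 100 * in_range / total
        }' "$1"
}

# Example usage:
#   age_share 10000.csv 2020 38 47
```

Passing the reference year as a parameter keeps the function reusable for other years' releases.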

Conclusion

The analysis shows Huawei holding the most household‑registration slots, a narrow age window (mostly 38 to 47), and a surname ranking that matches the most common surnames nationally. It can be reproduced with simple shell one‑liners, and the full list of 6,032 records is available by replying “2020积分落户” ("2020 points‑based household registration") to the original source.
