How to Quickly Analyze Beijing Residency Data with Shell Commands
This tutorial shows how to use standard Unix shell tools such as grep, cut, sort, uniq, awk, and join to extract insights—top companies, most common surnames, popular given names, age distribution, and hometown statistics—from a JSON dataset of over 6,000 Beijing residency applicants.
The article demonstrates a practical workflow for analyzing a JSON file containing information about the first batch of Beijing points‑based residency applicants. The dataset, which can be downloaded in a single request, includes fields like name, ID card, score, company, and other attributes.
Problem Description
The input is a JSON array named rows. Each element represents a candidate with properties such as name, idCard, unit (company), score, and more. The goal is to answer several analytical questions as fast as possible using only shell commands.
{
"id": 62981,
"idCard": "32092219721222****",
"unit": "北京利德华福电气技术有限公司",
"name": "杨效丰",
"ranking": 1,
"score": 122.59,
...
}
Tasks
Top 10 companies that obtained the most residency slots
Most frequent surname among successful applicants
Most popular given name (the characters after the surname)
Age distribution of the applicants
Top 10 hometowns (based on ID card region code)
Additional optional analyses such as zodiac or constellation frequencies
Solutions
Top 10 Companies
Extract the unit field, count occurrences, sort by count descending, and keep the first ten lines.
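Before running the command on the real file, the same four steps can be tried on a tiny sample. This is only a minimal sketch: the company names and the temporary file are made up for illustration, and it assumes the JSON is pretty-printed with one field per line, which is what a line-oriented grep needs.

```shell
# Hypothetical sample: one "unit" field per line, as in pretty-printed JSON.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"unit": "Acme Ltd",
"unit": "Globex Inc",
"unit": "Acme Ltd",
"unit": "Initech",
"unit": "Acme Ltd",
"unit": "Globex Inc",
EOF

# Keep the value after the colon, count duplicates, rank by count (highest first).
top_units=$(grep '"unit":' "$sample" \
  | cut -d: -f2 \
  | sort | uniq -c | sort -nr -k1,1 \
  | head -n 10)
printf '%s\n' "$top_units"
rm -f "$sample"
```

Matching on '"unit":' rather than the bare word unit avoids counting lines where the word only appears inside another value.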
grep 'unit' jifenluohu.json | cut -f2 -d: | sort | uniq -c | sort -nr -k1 | head -n 10
Sample output:
137 "北京华为数字技术有限公司"
73 "中央电视台"
57 "北京首钢建设集团有限公司"
55 "百度在线网络技术(北京)有限公司"
48 "联想(北京)有限公司"
40 "北京外企人力资源服务有限公司"
40 "中国民生银行股份有限公司"
39 "国际商业机器(中国)投资有限公司"
29 "中国国际技术智力合作有限公司"
27 "华为技术有限公司北京研究所"
Most Frequent Surname
Extract the name field, strip the "name": " prefix so that each line starts with the name itself, keep the first character as the surname, then count and sort.
grep '"name":' jifenluohu.json | sed 's|^.*"name": "||' | cut -c 1 | sort | uniq -c | sort -nr -k1 | head -n 10
The result shows surnames such as 张 (Zhang), 王 (Wang), 李 (Li), 刘 (Liu), 陈 (Chen), etc.
Most Popular Given Name
After removing the surname (the first character) and the closing quote, treat the remaining characters as the given name and count frequencies.
grep '"name":' jifenluohu.json | sed 's|^.*"name": "||; s|".*$||' | cut -c 2- | sort | uniq -c | sort -nr -k1 | head -n 10
Typical popular given names include 伟 (Wei), 静 (Jing), 浩 (Hao), 勇 (Yong), 军 (Jun), 敏 (Min), 颖 (Ying), 鹏 (Peng), etc.
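Both name pipelines can also be sketched in a way that does not depend on how cut -c counts characters, since on some systems cut counts bytes and would split a multi-byte UTF-8 character. The sketch below is an assumption-laden variant, not the article's original method: it assumes UTF-8 input in which every name character is a 3-byte CJK character, so the surname is bytes 1-3 and the given name starts at byte 4. The sample names and the rank helper are made up for illustration, and compound surnames such as 欧阳 would be cut down to their first character.

```shell
# Hypothetical sample names, one "name" field per line (UTF-8 encoded).
sample=$(mktemp)
cat > "$sample" <<'EOF'
"name": "张伟",
"name": "王静",
"name": "张浩然",
"name": "李伟",
"name": "张敏",
"name": "王伟",
EOF

# Drop everything up to the opening quote and from the closing quote onward.
names=$(grep '"name":' "$sample" | sed 's/.*"name": "//; s/".*//')

# rank (hypothetical helper): count identical lines, highest count first.
# LC_ALL=C forces byte-wise comparison so the result is locale-independent.
rank() { LC_ALL=C sort | LC_ALL=C uniq -c | LC_ALL=C sort -nr -k1,1 | awk '{print $1, $2}'; }

# Assumption: each character is 3 bytes, so bytes 1-3 are the surname.
surnames=$(printf '%s\n' "$names" | cut -b 1-3 | rank)
# Everything from byte 4 onward is the given name (one or more characters).
given=$(printf '%s\n' "$names" | cut -b 4- | rank)

printf '%s\n' "$surnames"
printf '%s\n' "$given"
rm -f "$sample"
```

Counting bytes is a deliberate trade-off here: it is brittle for names containing non-CJK characters, but it behaves the same regardless of the locale or the cut implementation.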
Age Distribution
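Extract the birth year from the ID card, compute age as 2019 - birth_year, then count occurrences. The birth year is digits 7-10 of the ID number; after cut -f2 -d: the field still starts with a space and an opening quote, so those digits sit at characters 9-12.
grep '"idCard":' jifenluohu.json | cut -f2 -d: | cut -c 9-12 | awk '{print 2019-$1}' | sort | uniq -c
The output lists ages from 34 to 61 with corresponding counts.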
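The age calculation can be sketched with the reference year as a parameter instead of a hard-coded constant. The first ID below is the one from the sample record; the other three, the ref_year variable, and the age:count output format are made up for illustration. Note that this computes a plain year difference, so it ignores whether the birthday has already passed in the reference year.

```shell
# Sample IDs: the first comes from the article, the rest are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"idCard": "32092219721222****",
"idCard": "11010119800101****",
"idCard": "13010219721111****",
"idCard": "12010119850505****",
EOF

ref_year=2019   # reference year used in the article

# Strip the prefix so the line starts with the ID, then take digits 7-10.
ages=$(grep '"idCard":' "$sample" \
  | sed 's/.*"idCard": "//' \
  | cut -c 7-10 \
  | awk -v ref="$ref_year" '{print ref - $1}' \
  | sort -n | uniq -c | awk '{print $2 ":" $1}')
printf '%s\n' "$ages"
rm -f "$sample"
```

Stripping the prefix with sed first means the column numbers match the positions in the ID itself, and sort -n keeps the ages in numeric order even if they do not all have the same number of digits.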
Top 10 Hometowns
Use the first four digits of the ID card to obtain the region code, then join with a city code table to translate codes to city names.
# Extract region codes and count
grep '"idCard":' jifenluohu.json | cut -f2 -d: | cut -c 3-6 | sort | uniq -c | sort -nr -k1 > topcity.code
# Join with city.code4 (presumably city.csv pre-processed into "code name" lines, sorted by code)
join -1 1 -2 2 city.code4 <(head -n 10 topcity.code | sort -k2)
The result shows cities such as 天津市 (Tianjin), 河北省石家庄市 (Shijiazhuang, Hebei), etc., with the number of applicants from each.
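The join step is the least obvious part, because join requires both inputs to be sorted on the join field and its output comes back ordered by code rather than by count. A minimal sketch on hypothetical data follows: the counts, the romanized city names, and the intermediate file names are illustrative, and temporary files stand in for the process substitution so the sketch also runs in a plain POSIX shell.

```shell
workdir=$(mktemp -d)

# Hypothetical code table in "code name" form (names romanized for the sketch).
cat > "$workdir/city.code4" <<'EOF'
1201 Tianjin
1301 Shijiazhuang
3209 Yancheng
EOF

# Hypothetical counts in "count code" form, as produced by uniq -c.
cat > "$workdir/topcity.code" <<'EOF'
5 1301
9 1201
2 3209
EOF

# join needs both inputs sorted on the join field (the code) in the same order.
LC_ALL=C sort -k1,1 "$workdir/city.code4" > "$workdir/city.sorted"
LC_ALL=C sort -k2,2 "$workdir/topcity.code" > "$workdir/top.sorted"

# Output columns: code, name, count. Re-sort by count to restore the ranking.
hometowns=$(LC_ALL=C join -1 1 -2 2 "$workdir/city.sorted" "$workdir/top.sorted" \
  | sort -k3,3nr)
printf '%s\n' "$hometowns"
rm -rf "$workdir"
```

The final sort on the third column is an addition to the article's command; without it the joined rows appear in code order.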
Further Explorations
The same approach can be extended to compute the most common zodiac signs, constellations, or birthdays by extracting the relevant parts of the ID card and applying similar cut / sort / uniq pipelines.
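As one possible sketch of the zodiac case, the birth year can be mapped to a Chinese zodiac animal with (year - 4) mod 12, where 0 corresponds to the Rat. This is a simplification that assumes the zodiac year follows the calendar year; the real boundary is the Lunar New Year, so people born in January or early February may be assigned the wrong animal. The first ID is the one from the sample record, the others are made up.

```shell
# Sample IDs: the first comes from the article, the rest are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"idCard": "32092219721222****",
"idCard": "11010119840615****",
"idCard": "13010219800311****",
"idCard": "12010119850505****",
EOF

# Take digits 7-10 (birth year) and map them to an animal by calendar year.
zodiac=$(grep '"idCard":' "$sample" \
  | sed 's/.*"idCard": "//' \
  | cut -c 7-10 \
  | awk 'BEGIN { split("Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig", z, " ") }
         { print z[($1 - 4) % 12 + 1] }' \
  | sort | uniq -c | sort -nr -k1,1 | awk '{print $1, $2}')
printf '%s\n' "$zodiac"
rm -f "$sample"
```

Constellations or birthdays would follow the same shape, using digits 11-14 of the ID (month and day) instead of the year.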
All commands rely only on standard Unix utilities, making the analysis fast (often completing in seconds) and reproducible.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
