
How to Quickly Analyze Beijing Residency Data with Shell Commands

This tutorial shows how to use standard Unix shell tools such as grep, cut, sort, uniq, awk, and join to extract insights—top companies, most common surnames, popular given names, age distribution, and hometown statistics—from a JSON dataset of over 6,000 Beijing residency applicants.

Liangxu Linux

Aug 19, 2020


The article demonstrates a practical workflow for analyzing a JSON file containing information about the first batch of Beijing points‑based residency applicants. The dataset, which can be downloaded in a single request, includes fields like name, ID card, score, company, and other attributes.

Problem Description

The input is a JSON array named rows. Each element represents a candidate with properties such as name, idCard, unit (company), score, and more. The goal is to answer several analytical questions as fast as possible using only shell commands.

{
  "id": 62981,
  "idCard": "32092219721222****",
  "unit": "北京利德华福电气技术有限公司",
  "name": "杨效丰",
  "ranking": 1,
  "score": 122.59,
  ...
}

Tasks

Top 10 companies that obtained the most residency slots

Most frequent surname among successful applicants

Most popular given name (first two characters after the surname)

Age distribution of the applicants

Top 10 hometowns (based on ID card region code)

Additional optional analyses such as zodiac or constellation frequencies

Solutions

Top 10 Companies

Extract the unit field, count occurrences, sort by count descending, and keep the first ten lines.

grep '"unit":' jifenluohu.json | cut -f2 -d: | sort | uniq -c | sort -nr -k1 | head -n 10

Sample output:

137 "北京华为数字技术有限公司"
 73 "中央电视台"
 57 "北京首钢建设集团有限公司"
 55 "百度在线网络技术（北京）有限公司"
 48 "联想（北京）有限公司"
 40 "北京外企人力资源服务有限公司"
 40 "中国民生银行股份有限公司"
 39 "国际商业机器（中国）投资有限公司"
 29 "中国国际技术智力合作有限公司"
 27 "华为技术有限公司北京研究所"
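The same extract, count, and rank pattern recurs in every task below, so it can be wrapped in a small helper. The following is a minimal sketch, not the author's code: the function name top_values and the sample records are made up for illustration, and it assumes the file holds one "key": value pair per line. It strips the key with sed instead of cut -d:, so a value that itself contains a colon is not truncated.

```shell
#!/bin/sh
# Sketch: count the values of one JSON field, assuming one "key": value pair per line.
# Usage: top_values FIELD FILE [N]   (hypothetical helper, not from the article)
top_values() {
    field=$1; file=$2; n=${3:-10}
    grep "\"$field\":" "$file" |
        sed 's/^[^:]*: *//; s/,[[:space:]]*$//' |   # drop the key and a trailing comma
        sort | uniq -c | sort -k1,1nr | head -n "$n"
}

# Made-up sample data, only to exercise the function.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  "unit": "Company A",
  "unit": "Company B",
  "unit": "Company A",
EOF
top_companies=$(top_values unit "$sample" 1)
echo "$top_companies"
rm -f "$sample"
```

Against the real file, top_values unit jifenluohu.json 10 should produce the same ranking as the one-liner above, without the leading space and trailing comma in each value.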

Most Frequent Surname

Extract the name field, strip the key together with any leading indentation so that each line starts with the name itself, keep the first character, then count and sort.

grep '"name":' jifenluohu.json | sed 's|.*"name": "||' | cut -c 1 | sort | uniq -c | sort -nr -k1 | head -n 10

Result shows surnames such as 张, 王, 李, 刘, 陈, etc.
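On some systems (for example, unpatched GNU coreutils builds) cut -c counts bytes rather than characters, which would split a UTF-8 Chinese character. A possible workaround, sketched below, is to select bytes explicitly. It assumes the file is UTF-8 and that every surname is a single CJK character, which occupies three bytes; the sample names are placeholders, not records from the dataset.

```shell
#!/bin/sh
# Sketch: extract surnames byte-wise, assuming UTF-8 input and
# single-character CJK surnames (3 bytes each in UTF-8).
sample=$(mktemp)   # made-up records for illustration
cat > "$sample" <<'EOF'
  "name": "张三",
  "name": "李四",
  "name": "张五",
EOF
top_surname=$(grep '"name":' "$sample" |
    sed 's/.*"name": "//' |      # strip the key and leading indentation
    cut -b 1-3 |                 # first 3 bytes = first CJK character
    LC_ALL=C sort | LC_ALL=C uniq -c | sort -k1,1nr | head -n 1)
echo "$top_surname"
rm -f "$sample"
```

Sorting in the C locale keeps identical byte sequences adjacent regardless of the system's collation settings, which is all that uniq -c needs.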

Most Popular Given Name

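Skip the surname (the first character) and take the characters that follow. Given names are one or two characters long, so the range 2-4 also picks up the closing quote and, for one-character names, the trailing comma. The ranking is unaffected because a given name always carries the same suffix.

grep '"name":' jifenluohu.json | sed 's|.*"name": "||' | cut -c 2-4 | sort | uniq -c | sort -nr -k1 | head -n 10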

Typical popular given names include 伟, 静, 浩, 勇, 军, 敏, 颖, 鹏, etc.
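If cleaner output is wanted, one option is to isolate the text between the quotes first and then drop the surname, so that no punctuation ends up in the counts. This is a sketch under the same assumptions as before (UTF-8 input, one-character surnames of three bytes); compound surnames such as 欧阳 would be split incorrectly, and the sample names are made up.

```shell
#!/bin/sh
# Sketch: isolate the name value first, then drop the 3-byte surname.
# Assumes UTF-8 input and single-character surnames.
sample=$(mktemp)   # made-up records for illustration
cat > "$sample" <<'EOF'
  "name": "张伟",
  "name": "王伟",
  "name": "李小明",
EOF
top_given=$(grep '"name":' "$sample" |
    sed 's/.*"name": "\([^"]*\)".*/\1/' |   # keep only the text between the quotes
    cut -b 4- |                              # skip the first character (3 bytes)
    LC_ALL=C sort | LC_ALL=C uniq -c | sort -k1,1nr | head -n 1)
echo "$top_given"
rm -f "$sample"
```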

Age Distribution

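Extract the birth year from the ID number, compute age as 2019 - birth_year, then count occurrences. The birth year occupies digits 7‑10 of the ID number, which are characters 9‑12 of the field produced by the first cut because that field begins with a space and an opening quote. For the sample record above, the age is 2019 - 1972 = 47.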

grep '"idCard":' jifenluohu.json | cut -f2 -d: | cut -c 9-12 | awk '{print 2019-$1}' | sort | uniq -c

The output lists ages from 34 to 61 with corresponding counts.
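A per-year listing can be long, so grouping ages into bins may make the distribution easier to read. The sketch below uses 5-year bins; the bin width is an arbitrary choice, and apart from the first ID (taken from the sample record) the IDs are made up. Only the reference year 2019 comes from the article.

```shell
#!/bin/sh
# Sketch: group ages into 5-year bins. The bin width and the last two
# sample IDs are illustrative assumptions.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  "idCard": "32092219721222****",
  "idCard": "32092219740101****",
  "idCard": "32092219800101****",
EOF
age_bins=$(grep '"idCard":' "$sample" |
    sed 's/.*"idCard": "//' |        # strip the key so the line starts with the ID
    cut -c 7-10 |                    # digits 7-10 hold the birth year
    awk -v ref=2019 '{ age = ref - $1; lo = int(age / 5) * 5; n[lo]++ }
        END { for (lo in n) printf "%d-%d %d\n", lo, lo + 4, n[lo] }' |
    sort -n)
echo "$age_bins"
rm -f "$sample"
```

Each output line has the form "low-high count". Because the key is stripped with sed first, the cut positions match the digit positions of the ID number directly.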

Top 10 Hometowns

Use the first four digits of the ID number as the region code (characters 3‑6 of the extracted field, after the leading space and opening quote), then join with a city code table to translate codes to city names.

# Extract region codes and count
grep '"idCard":' jifenluohu.json | cut -f2 -d: | cut -c 3-6 | sort | uniq -c | sort -nr -k1 > topcity.code
# Join with city.code4 (city.csv pre‑processed to "code name" lines, sorted by code).
# Requires bash for <( ); the joined output is ordered by region code, not by count.
join -1 1 -2 2 city.code4 <(head -n 10 topcity.code | sort -b -k2,2)

Result shows cities such as 天津市, 河北省石家庄市, etc., with the number of applicants from each.
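The join step is the least obvious part of the workflow, because join expects both inputs sorted on the join field. The sketch below replays it on a tiny scale with temporary files instead of process substitution, so it also runs in a plain POSIX shell. The counts and the three-row code table are made up for illustration and are not taken from the dataset.

```shell
#!/bin/sh
# Sketch of the join step using temporary files.
# The counts and the code table below are illustrative, not real results.
counts=$(mktemp); codes=$(mktemp); sorted_counts=$(mktemp)
printf '%s\n' '  5 1301' '  9 1201' '  2 3209' > "$counts"                   # "count code", as uniq -c prints it
printf '%s\n' '1301 Shijiazhuang' '1201 Tianjin' '3209 Yancheng' > "$codes"  # "code name"

# join needs both inputs sorted on the join field, in the same collation order.
LC_ALL=C sort -b -k2,2 "$counts" > "$sorted_counts"
hometowns=$(LC_ALL=C sort -k1,1 "$codes" |
    LC_ALL=C join -1 1 -2 2 -o 2.1,1.2 - "$sorted_counts" |   # print count, then name
    sort -k1,1nr)                                             # re-rank by count after the join
echo "$hometowns"
rm -f "$counts" "$codes" "$sorted_counts"
```

The final sort restores the ranking by count, since join emits rows in the order of the join key.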

Further Explorations

The same approach can be extended to compute the most common zodiac signs, constellations, or birthdays by extracting the relevant parts of the ID card and applying similar cut / sort / uniq pipelines.
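As one possible sketch of the zodiac idea, the birth year can be mapped to an animal with (year - 4) mod 12. The helper name zodiac_of is hypothetical. The mapping uses the Gregorian year only, so people born in January or February before the lunar new year are assigned to the following animal; treat the counts as approximate.

```shell
#!/bin/sh
# Sketch: map birth years (one per line on stdin) to Chinese zodiac animals.
# Uses the Gregorian year only; lunar new year boundaries are ignored.
zodiac_of() {
    awk '{
        split("Rat Ox Tiger Rabbit Dragon Snake Horse Goat Monkey Rooster Dog Pig", z, " ")
        print z[($1 - 4) % 12 + 1]
    }'
}
zodiac=$(echo 1972 | zodiac_of)
echo "$zodiac"
```

Feeding it the birth years extracted in the age step and appending sort | uniq -c | sort -nr would give a frequency table of animals.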

All commands rely only on standard Unix utilities, making the analysis fast (often completing in seconds) and reproducible.

