What 13,966 Ops Job Listings Reveal About Salary, Skills, and Hot Cities
This article analyzes 13,966 Chinese operations engineering job postings collected from 51job, detailing scraping methods, data cleaning steps, and visualizations that uncover top hiring industries, city demand, salary ranges, required education, company size distribution, and keyword trends for the ops market.
The postings cover operations (运维) roles. The author used XPath for web scraping, Pandas for data cleaning, and Pyecharts for visualization.
1. Web Scraping
The scraper extracts job name, company name, location, salary, release date, experience, education, company type, size, and industry using XPath expressions.
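As a minimal, self-contained sketch of this kind of XPath extraction, the example below runs against a hand-written HTML fragment rather than a live 51job page. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, instead of the lxml-style dom.xpath calls the author uses. The sample markup, the job and company names, and the helper parse_rows are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment imitating a listing table; not real 51job markup.
SAMPLE_HTML = """
<html><body>
<div class="dw_table">
  <div class="el">
    <p><span><a target="_blank" title="Linux运维工程师" href="#">Linux运维工程师</a></span></p>
    <span class="t2"><a target="_blank" title="Example Tech Co." href="#">Example Tech Co.</a></span>
  </div>
  <div class="el">
    <p><span><a target="_blank" title="数据库运维" href="#">数据库运维</a></span></p>
    <span class="t2"><a target="_blank" title="Sample Data Ltd." href="#">Sample Data Ltd.</a></span>
  </div>
</div>
</body></html>
"""


def parse_rows(html_text):
    """Return one dict per listing row with the job and company titles."""
    root = ET.fromstring(html_text.strip())
    rows = []
    # Select each listing row, then read the title attribute of its links.
    for row in root.findall(".//div[@class='dw_table']/div[@class='el']"):
        job_link = row.find(".//p/span/a[@target='_blank']")
        company_link = row.find("span[@class='t2']/a[@target='_blank']")
        rows.append({
            "job_name": job_link.get("title") if job_link is not None else None,
            "company_name": company_link.get("title") if company_link is not None else None,
        })
    return rows


rows = parse_rows(SAMPLE_HTML)
for r in rows:
    print(r["job_name"], "|", r["company_name"])
```

Iterating row by row, rather than pulling each field as a separate flat list, keeps the fields of one posting together even when a row is missing a value.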
# 1. Job title
job_name = dom.xpath('//div[@class="dw_table"]/div[@class="el"]//p/span/a[@target="_blank"]/@title')
# 2. Company name
company_name = dom.xpath('//div[@class="dw_table"]/div[@class="el"]/span[@class="t2"]/a[@target="_blank"]/@title')
# ... (other fields omitted for brevity)
2. Data Cleaning
The data is loaded with pandas.read_csv and re-indexed, and duplicate records are removed. Columns are renamed, salary strings are parsed into numeric ranges, locations and company sizes are standardized, and education levels are extracted with regular expressions.
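The source does not show the location and education steps, so the following is only a sketch of how they might look. It assumes that location strings have the form "city-district" and that the combined experience-and-education field contains a degree keyword somewhere in the text. The keyword list, the sample strings, and the helper names city_of and extract_education are hypothetical.

```python
import re

# Assumed set of degree keywords; the author's actual pattern is not shown.
EDUCATION_PATTERN = re.compile(r"博士|硕士|本科|大专|中专|高中|初中")


def city_of(location):
    """Keep only the city part of a 'city-district' string."""
    return location.split("-")[0].strip()


def extract_education(text):
    """Return the first degree keyword found in the text, or None."""
    match = EDUCATION_PATTERN.search(text)
    return match.group(0) if match else None


print(city_of("上海-浦东新区"))
print(extract_education("3-4年经验 | 本科 | 招1人"))
print(extract_education("无需经验 | 招若干人"))
```

With pandas, the same helpers could be applied column-wise through Series.apply.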
# Read data
import pandas as pd, numpy as np, re, jieba
df = pd.read_csv("only_yun_wei.csv", encoding="gbk", header=None)
# Set index and columns
df.index = range(len(df))
df.columns = ["岗位名","公司名","工作地点","工资","发布日期","经验与学历","公司类型","公司规模","行业","工作描述"]
# Remove duplicates
df.drop_duplicates(subset=["公司名","岗位名","工作地点"], inplace=True)
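# Parse salary strings such as "1-1.5万/月", "6-8千/月" or "10-20万/年" into [min, max] in RMB per month
def get_money_max_min(x):
    try:
        if x[-3] == "万":
            z = [float(i) * 10000 for i in re.findall(r"[0-9]+\.?[0-9]*", x)]
        elif x[-3] == "千":
            z = [float(i) * 1000 for i in re.findall(r"[0-9]+\.?[0-9]*", x)]
        else:
            # Unrecognized unit: treat the salary as missing
            return [np.nan, np.nan]
        if x[-1] == "年":
            # Convert annual figures to monthly
            z = [i / 12 for i in z]
        return z
    except (TypeError, IndexError):
        # Missing or too-short values
        return [np.nan, np.nan]
# The original switches from df to job_info at this point; job_info is assumed to be the de-duplicated frame
job_info = df
salary = job_info["工资"].apply(get_money_max_min)
job_info["最低工资"] = salary.str[0]
job_info["最高工资"] = salary.str[1]
job_info["工资水平"] = job_info[["最低工资", "最高工资"]].mean(axis=1)
3. Data Visualization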
Several visualizations illustrate the findings:
Top 10 hiring industries (e.g., computer software, internet, telecom).
Top 10 cities by job count (Beijing, Shanghai, Guangzhou, Shenzhen).
Provincial distribution of positions, highlighting Guangdong, Jiangsu, Shanghai, and Beijing.
Company size demand, showing 50‑500 employees as the most sought‑after range.
Average salaries for the top 10 positions, with DevOps, application ops, database ops, and Linux ops exceeding 10k RMB.
Education requirements, dominated by associate and bachelor degrees.
Word‑cloud of job‑posting keywords, emphasizing terms like "运维", "能力", "系统", "维护", "经验".
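Rankings like these rest on two simple aggregations: counting postings per category and averaging salary per job title. The sketch below illustrates both with the standard library on a handful of made-up records. The data, the helper names top_n and mean_salary_by_group, and the use of Counter in place of the author's Pandas and Pyecharts pipeline are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical cleaned records: (city, job title, monthly salary in RMB).
records = [
    ("北京", "运维工程师", 12000.0),
    ("上海", "运维工程师", 11000.0),
    ("北京", "DevOps", 20000.0),
    ("深圳", "数据库运维", 15000.0),
    ("北京", "DevOps", 18000.0),
]


def top_n(values, n=10):
    """Return the n most frequent values with their counts."""
    return Counter(values).most_common(n)


def mean_salary_by_group(rows):
    """Average the salary of all rows that share a job title."""
    buckets = defaultdict(list)
    for _city, title, salary_value in rows:
        buckets[title].append(salary_value)
    return {title: mean(values) for title, values in buckets.items()}


top_cities = top_n(city for city, _title, _salary in records)
avg_salary = mean_salary_by_group(records)
print(top_cities)
print(avg_salary)
```

In pandas, the equivalent steps would typically be value_counts for the rankings and groupby followed by mean for the salary averages, with the results passed to a charting library.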
In summary, the analysis shows which industries, cities, and company sizes have the highest demand for operations engineers, the average salaries for key roles, the prevalent education requirements, and the most frequent keywords in job postings, providing valuable guidance for job seekers and recruiters in the ops field.
Python Crawling & Data Mining