How to Scrape Python Job Listings and Visualize Trends with pyecharts
This article walks through collecting Python job postings from Lagou by handling anti‑scraping measures, parsing POST requests, storing results in Excel, and then using pyecharts to create bar, map, and pie visualizations that reveal city distribution, salary ranges, and experience requirements.
Background
The author decided to collect Python job postings from a recruitment website to analyze salary and location distribution.
Handling Anti‑Scraping Measures
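Requests must carry browser-like headers (User-Agent, Referer, Content-Type); otherwise the site returns "Your request is too frequent, please try later". For every page, the script creates a fresh session, applies these headers, and sends a GET to the listing page before making the POST, presumably so that the session picks up the cookies the POST needs. This combination is what gets past the site's anti‑scraping checks.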
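As a rough illustration of spotting that throttling reply in code, the sketch below checks a parsed JSON response. It assumes a throttled reply has a false status field and no content key, which matches the error shown under Request Analysis; the helper name is_throttled and the shape of the normal payload are illustrative assumptions, not part of the original script.

```python
def is_throttled(payload):
    """Return True when a parsed JSON reply looks like the throttling error."""
    # Assumption: a throttled reply has status == False and no 'content' key.
    return payload.get("status") is False or "content" not in payload


# Sample payloads: the first mirrors the error reply, the second is a
# trimmed-down guess at a successful reply.
throttled = {"status": False, "msg": "您操作太频繁,请稍后再访问", "state": 2402}
normal = {"content": {"positionResult": {"result": []}}}

print(is_throttled(throttled))  # True
print(is_throttled(normal))     # False
```

A check like this could run before indexing into the response, so that a throttled page is retried after a longer pause instead of failing with a KeyError.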
Request Analysis
Inspecting the network panel reveals the POST URL:
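https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false
When requests are throttled, the response contains an error like the following (the message reads "You are operating too frequently, please try again later"):
{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"124.77.161.207","state":2402}
Each page is fetched with a POST that carries the form parameters below, preceded by a GET request to the listing page.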
datas = {
    'first': 'false',
    'pn': x,         # page number
    'kd': 'python',  # search keyword
}
Python Script for Crawling Lagou
#!/usr/bin/env python3.4
# encoding: utf-8
"""
Created on 19-5-05
@title: ''
@author: Xusl
"""
import json
import requests
import xlwt
import time


def get_json(url, datas):
    """POST the search form for one page and return one row per job."""
    my_headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
        "Referer": "https://www.lagou.com/jobs/list_Python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=",
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"
    }
    # Pause between pages to reduce the chance of being throttled.
    time.sleep(5)
    # Fresh session per page: GET the listing page first, then POST the search form.
    ses = requests.session()
    ses.headers.update(my_headers)
    ses.get("https://www.lagou.com/jobs/list_python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=")
    content = ses.post(url=url, data=datas)
    result = content.json()
    # A throttled reply has no 'content' key, so this lookup raises KeyError,
    # which main() catches and reports.
    info = result['content']['positionResult']['result']
    info_list = []
    for job in info:
        # Field order must match the header row built in main().
        information = []
        information.append(job['positionId'])
        information.append(job['city'])
        information.append(job['companyFullName'])
        information.append(job['companyLabelList'])
        information.append(job['district'])
        information.append(job['education'])
        information.append(job['firstType'])
        information.append(job['formatCreateTime'])
        information.append(job['positionName'])
        information.append(job['salary'])
        information.append(job['workYear'])
        info_list.append(information)
    return info_list
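

def main():
    # Prompt: "Enter the total number of pages to scrape:"
    page = int(input('请输入你要抓取的页码总数:'))
    info_result = []
    # Header row: position ID, city, company full name, benefits, district,
    # education, job type, posting time, position name, salary, years of experience
    title = ['岗位id','城市','公司全名','福利待遇','工作地点','学历要求','工作类型','发布时间','职位名称','薪资','工作年限']
    info_result.append(title)
    for x in range(1, page+1):
        url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
        datas = {'first':'false','pn':x,'kd':'python'}
        try:
            info = get_json(url, datas)
            info_result = info_result + info
            print("第%s页正常采集" % x)  # "page x collected normally"
        except Exception as msg:
            # "page x ran into a problem"; the exception is printed to aid diagnosis
            print("第%s页出现问题: %s" % (x, msg))
    # Write the header and all collected rows to an .xls workbook.
    workbook = xlwt.Workbook(encoding='utf-8')
    worksheet = workbook.add_sheet('lagouzp', cell_overwrite_ok=True)
    for i, row in enumerate(info_result):
        for j, col in enumerate(row):
            worksheet.write(i, j, col)
    workbook.save('lagouzp.xls')


if __name__ == '__main__':
    main()
Data Visualization with pyecharts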
pyecharts combines Python with ECharts and supports many chart types such as Bar, Bar3D, Boxplot, EffectScatter, Funnel, Gauge, Geo, Graph, HeatMap, Kline, Line, Line3D, Liquid, Map, Parallel, Pie, Polar, Radar, Sankey, Scatter, Scatter3D, ThemeRiver, WordCloud, and custom classes like Grid, Overlap, Page, Timeline.
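From version 0.3.2 onward, pyecharts no longer bundles the map JS files; they must be installed separately. The examples here use the pre‑1.0 import style (from pyecharts import Bar); pyecharts 1.0 and later moved the chart classes into pyecharts.charts and changed the API, so the code below needs an older release to run as written.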
pip install echarts-countries-pypkg
pip install echarts-china-provinces-pypkg
pip install echarts-china-cities-pypkg
Visualization Examples
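The charts in this section use hard-coded city counts. As a minimal sketch of how such counts could be produced from the scraped rows, the helper below tallies postings per city. It assumes rows shaped like the script's info_result (a header row followed by one list per job, with the city in the second column); the name count_by_city and the sample rows are illustrative and do not come from the original article.

```python
from collections import Counter

CITY_COLUMN = 1  # '城市' (city) is the second column in the script's header row


def count_by_city(rows):
    """Count postings per city, skipping the header row; most common first."""
    counts = Counter(row[CITY_COLUMN] for row in rows[1:])
    return counts.most_common()


# Tiny made-up sample in the same layout as info_result.
sample_rows = [
    ["岗位id", "城市"],
    [1, "北京"],
    [2, "上海"],
    [3, "北京"],
]

pairs = count_by_city(sample_rows)
city_names = [name for name, _ in pairs]
city_counts = [count for _, count in pairs]
print(pairs)  # [('北京', 2), ('上海', 1)]
```

The (city, count) pairs have the shape the Geo example expects, while the two separate lists match what the Bar and Pie examples take; slicing the pairs to the first ten entries would give a top-ten list.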
Bar chart of Python job counts per city:
from pyecharts import Bar
city_nms = ['北京','上海','深圳','成都','杭州','广州','武汉','南京','苏州','郑州','天津','西安','东莞','珠海','合肥','厦门','宁波','南宁','重庆','佛山','大连','哈尔滨','长沙','福州','中山']
city_nums = [149,95,77,22,17,17,16,13,7,5,4,4,3,2,2,2,1,1,1,1,1,1,1,1,1]
# Title: "Python jobs"; subtitle: "count per city"; series name: "count"
bar = Bar('Python岗位','各城市数量')
bar.add('数量', city_nms, city_nums, is_more_utils=True)
bar.render('Python岗位各城市数量.html')
Geo map of city distribution:
from pyecharts import Geo
city_datas = [('北京',149),('上海',95),('深圳',77),('成都',22),('杭州',17),('广州',17),('武汉',16),('南京',13),('苏州',7),('郑州',5),('天津',4),('西安',4),('东莞',3),('珠海',2),('合肥',2),('厦门',2),('宁波',1),('南宁',1),('重庆',1),('佛山',1),('大连',1),('哈尔滨',1),('长沙',1),('福州',1),('中山',1)]
# Title: "Map of Python job distribution by city"; subtitle: "data source: Lagou"
geo = Geo('Python岗位城市分布地图','数据来源拉勾',title_color='#fff',title_pos='center',width=1200,height=600,background_color='#404a59')
# Split the (city, count) pairs into separate name and value lists.
attr, value = geo.cast(city_datas)
geo.add('', attr, value, visual_range=[0,200], visual_text_color='#fff', symbol_size=15, is_visualmap=True)
geo.render('Python岗位城市分布地图_scatter.html')
Pie chart of top cities:
from pyecharts import Pie
city_nms_top10 = ['北京','上海','深圳','成都','广州','杭州','武汉','南京','苏州','郑州']
city_nums_top10 = [149,95,77,22,17,17,16,13,7,5]
pie = Pie()
pie.add('', city_nms_top10, city_nums_top10, is_label_show=True)
pie.render('Python岗位各城市分布饼图.html')
Insights
First‑tier cities dominate Python job postings: Beijing (149), Shanghai (95), and Shenzhen (77) account for 321 of the 444 postings collected, roughly 72%, which is consistent with these cities offering higher salaries and more opportunities. Most positions require a bachelor's degree and 1‑5 years of experience, with salaries typically ranging from 10k to 20k RMB.
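To back a salary claim like this with numbers, the salary column first has to be turned into something numeric. The sketch below is one way to do that, assuming salary strings take the form "10k-20k" (the article does not show the raw values, so this format is an assumption); salary_midpoint_k is a hypothetical helper name.

```python
import re


def salary_midpoint_k(salary):
    """Return the midpoint of a 'low-high' salary string in thousands of RMB, or None."""
    # Assumed format: two integers, each followed by 'k' or 'K', joined by a hyphen.
    match = re.fullmatch(r"\s*(\d+)[kK]\s*-\s*(\d+)[kK]\s*", salary)
    if match is None:
        return None  # anything else (e.g. free text) is left unparsed
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


print(salary_midpoint_k("10k-20k"))  # 15.0
print(salary_midpoint_k("面议"))      # None
```

Rows that return None can be counted separately rather than dropped silently, so the share of unparsed salaries stays visible.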
MaGe Linux Operations
