
How to Scrape Python Job Listings and Visualize Trends with pyecharts

This article walks through collecting Python job postings from Lagou by handling anti‑scraping measures, parsing POST requests, storing results in Excel, and then using pyecharts to create bar, map, and pie visualizations that reveal city distribution, salary ranges, and experience requirements.

MaGe Linux Operations

Background

The author decided to collect Python job postings from a recruitment website to analyze salary and location distribution.

Handling Anti‑Scraping Measures

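Requests must carry browser-like headers (User-Agent, Referer, Content-Type); otherwise the site responds with "Your request is too frequent, please try again later". The script therefore creates a fresh session for every page request, updates it with those headers, and uses it to get past the site's other anti-scraping checks.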

Request Analysis

Inspecting the network panel reveals the POST URL:

https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false

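When requests arrive too quickly, the response is an error such as the following, whose msg field reads "You are operating too frequently, please try again later":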

{"status":false,"msg":"您操作太频繁,请稍后再访问","clientIp":"124.77.161.207","state":2402}
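
One way to act on this response is to check for the error shape before indexing into the payload, and to wait longer after each throttled attempt. The sketch below is only an illustration: the names is_throttled and backoff_delays, the 5-second base delay, and the retry count are hypothetical, and it assumes a throttled payload always carries "status": false and no "content" key, as in the sample above.

```python
import json


def is_throttled(payload):
    """Return True when the payload looks like the throttling error shown above."""
    # Assumption: a throttled response has "status": false and no "content" key.
    return payload.get("status") is False and "content" not in payload


def backoff_delays(base=5, retries=3):
    """Yield waiting times in seconds, doubling after each failed attempt."""
    for attempt in range(retries):
        yield base * 2 ** attempt


# The sample error from the article, parsed the same way requests' .json() would.
sample = json.loads(
    '{"status":false,"msg":"您操作太频繁,请稍后再访问",'
    '"clientIp":"124.77.161.207","state":2402}'
)

if is_throttled(sample):
    print("throttled; suggested waits:", list(backoff_delays()))
```

A caller could sleep for each delay in turn and repeat the request, giving up once the delays are exhausted.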

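Each page of results takes two requests: a GET to the listing page, followed by a POST to the URL above with these form parameters:

datas = {
    'first': 'false',
    'pn': x,         # page number (x is the loop counter, starting at 1)
    'kd': 'python',  # search keyword
}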

Python Script for Crawling Lagou

#!/usr/bin/env python3.4
# encoding: utf-8
"""
Created on 19-5-05
@title: ''
@author: Xusl
"""
import json
import requests
import xlwt
import time

def get_json(url, datas):
    """Fetch one page of search results and return it as a list of rows."""
    my_headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
        "Referer": "https://www.lagou.com/jobs/list_Python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=",
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"
    }
    # Pause before every page so the requests are spaced out.
    time.sleep(5)
    # Use a fresh session per page and visit the listing page before the POST.
    ses = requests.session()
    ses.headers.update(my_headers)
    ses.get("https://www.lagou.com/jobs/list_python?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput=")
    content = ses.post(url=url, data=datas)
    result = content.json()
    # A throttled response has no 'content' key, so this line raises KeyError
    # and main() reports the page as failed.
    info = result['content']['positionResult']['result']
    info_list = []
    for job in info:
        information = []
        information.append(job['positionId'])
        information.append(job['city'])
        information.append(job['companyFullName'])
        # companyLabelList is a list (or None); join it into one comma-separated string.
        information.append(','.join(job['companyLabelList'] or []))
        information.append(job['district'])
        information.append(job['education'])
        information.append(job['firstType'])
        information.append(job['formatCreateTime'])
        information.append(job['positionName'])
        information.append(job['salary'])
        information.append(job['workYear'])
        info_list.append(information)
    return info_list

def main():
    page = int(input('Enter the total number of pages to scrape: '))
    info_result = []
    # Column headers, in the same order as the fields collected in get_json.
    title = ['Position ID', 'City', 'Company full name', 'Benefits', 'District', 'Education', 'Job type', 'Posted', 'Position name', 'Salary', 'Experience']
    info_result.append(title)
    url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
    for x in range(1, page+1):
        datas = {'first':'false','pn':x,'kd':'python'}
        try:
            info = get_json(url, datas)
            info_result = info_result + info
            print("Page %s collected" % x)
        except Exception as msg:
            # Report the reason instead of discarding it, then continue with the next page.
            print("Page %s failed: %r" % (x, msg))
    # Write the workbook once, after all pages have been fetched.
    workbook = xlwt.Workbook(encoding='utf-8')
    worksheet = workbook.add_sheet('lagouzp', cell_overwrite_ok=True)
    for i, row in enumerate(info_result):
        for j, col in enumerate(row):
            worksheet.write(i, j, col)
    workbook.save('lagouzp.xls')

if __name__ == '__main__':
    main()
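
The charts in the next section use hard-coded city counts. A minimal sketch of how such counts could be derived from the collected rows follows, assuming each row keeps the column order used in get_json (city in the second column) and that the header row has been dropped. The helper name count_by_city and the sample rows are hypothetical.

```python
from collections import Counter

CITY_COLUMN = 1  # position of job['city'] in each row built by get_json


def count_by_city(rows):
    """Return (names, counts) as parallel lists, largest count first."""
    counter = Counter(row[CITY_COLUMN] for row in rows)
    ranked = counter.most_common()
    names = [name for name, _ in ranked]
    counts = [count for _, count in ranked]
    return names, counts


# Hypothetical rows; only the first two columns are filled in here.
rows = [
    [1, '北京'], [2, '上海'], [3, '北京'],
    [4, '深圳'], [5, '北京'], [6, '上海'],
]
city_nms, city_nums = count_by_city(rows)
print(city_nms, city_nums)
```

The two lists have the same shape as city_nms and city_nums in the bar chart example.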

Data Visualization with pyecharts

pyecharts combines Python with ECharts and supports many chart types such as Bar, Bar3D, Boxplot, EffectScatter, Funnel, Gauge, Geo, Graph, HeatMap, Kline, Line, Line3D, Liquid, Map, Parallel, Pie, Polar, Radar, Sankey, Scatter, Scatter3D, ThemeRiver, WordCloud, and custom classes like Grid, Overlap, Page, Timeline.

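Since version 0.3.2, pyecharts no longer bundles the map JS files; install the map packages you need separately. The examples below use the pyecharts 0.x API (from pyecharts import Bar). Version 1.0 and later import chart classes from pyecharts.charts and use different method names, so the examples will not run unchanged there.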

pip install echarts-countries-pypkg
pip install echarts-china-provinces-pypkg
pip install echarts-china-cities-pypkg

Visualization Examples

Bar chart of Python job counts per city:

from pyecharts import Bar
city_nms = ['北京','上海','深圳','成都','杭州','广州','武汉','南京','苏州','郑州','天津','西安','东莞','珠海','合肥','厦门','宁波','南宁','重庆','佛山','大连','哈尔滨','长沙','福州','中山']
city_nums = [149,95,77,22,17,17,16,13,7,5,4,4,3,2,2,2,1,1,1,1,1,1,1,1,1]
# Title and subtitle of the chart.
bar = Bar('Python jobs', 'Postings per city')
bar.add('Count', city_nms, city_nums, is_more_utils=True)
bar.render('python_jobs_by_city.html')

Geo map of city distribution:

from pyecharts import Geo
city_datas = [('北京',149),('上海',95),('深圳',77),('成都',22),('杭州',17),('广州',17),('武汉',16),('南京',13),('苏州',7),('郑州',5),('天津',4),('西安',4),('东莞',3),('珠海',2),('合肥',2),('厦门',2),('宁波',1),('南宁',1),('重庆',1),('佛山',1),('大连',1),('哈尔滨',1),('长沙',1),('福州',1),('中山',1)]
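# The city names above stay in Chinese because Geo looks up coordinates by name.
geo = Geo('Python job distribution by city','Data source: Lagou',title_color='#fff',title_pos='center',width=1200,height=600,background_color='#404a59')
# cast splits the (city, count) pairs into two parallel lists.
attr, value = geo.cast(city_datas)
geo.add('', attr, value, visual_range=[0,200], visual_text_color='#fff', symbol_size=15, is_visualmap=True)
geo.render('python_jobs_city_map_scatter.html')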
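
In the 0.x API, geo.cast turns a list of (name, value) pairs into two parallel lists. The plain-Python sketch below shows the same transformation without pyecharts, which may help when preparing data outside the library. The name split_pairs is hypothetical, and the data is a shortened copy of city_datas.

```python
def split_pairs(pairs):
    """Split [(name, value), ...] into ([name, ...], [value, ...])."""
    if not pairs:
        return [], []
    names, values = zip(*pairs)
    return list(names), list(values)


attr, value = split_pairs([('北京', 149), ('上海', 95), ('深圳', 77)])
print(attr, value)
```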

Pie chart of the top 10 cities:

from pyecharts import Pie
city_nms_top10 = ['北京','上海','深圳','成都','广州','杭州','武汉','南京','苏州','郑州']
city_nums_top10 = [149,95,77,22,17,17,16,13,7,5]
pie = Pie()
pie.add('', city_nms_top10, city_nums_top10, is_label_show=True)
pie.render('python_jobs_city_pie.html')
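
The pie chart plots only the ten largest cities, so postings from the other cities are left out of the total. A possible alternative, sketched here using the full per-city counts from the bar chart example, folds the remaining cities into a single slice. The helper name top_n_with_other and the label 'Other' are hypothetical.

```python
def top_n_with_other(names, counts, n=10, other_label='Other'):
    """Keep the n largest entries and fold the rest into one extra slice."""
    ranked = sorted(zip(names, counts), key=lambda pair: pair[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    out_names = [name for name, _ in top]
    out_counts = [count for _, count in top]
    remainder = sum(count for _, count in rest)
    if remainder:
        out_names.append(other_label)
        out_counts.append(remainder)
    return out_names, out_counts


# Full per-city data, copied from the bar chart example.
city_nms = ['北京','上海','深圳','成都','杭州','广州','武汉','南京','苏州','郑州','天津','西安','东莞','珠海','合肥','厦门','宁波','南宁','重庆','佛山','大连','哈尔滨','长沙','福州','中山']
city_nums = [149,95,77,22,17,17,16,13,7,5,4,4,3,2,2,2,1,1,1,1,1,1,1,1,1]

pie_nms, pie_nums = top_n_with_other(city_nms, city_nums)
print(pie_nms[-1], pie_nums[-1], sum(pie_nums))
```

The resulting lists can be passed to pie.add in place of city_nms_top10 and city_nums_top10.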

Insights

First‑tier cities dominate Python job postings: Beijing, Shanghai, and Shenzhen account for 321 of the 444 postings collected (about 72%), which points to more opportunities and higher salaries there. Most positions require a bachelor's degree and 1‑5 years of experience, with salaries typically ranging from 10k to 20k RMB.
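
The salary figures can be checked against the salary column the script collects. The sketch below assumes salaries are stored as strings of the form '10k-20k'; that format is an assumption about the site's data, not something shown in this article. The names parse_salary and midpoint and the sample values are hypothetical.

```python
import re

# Assumed format: "<low>k-<high>k", case-insensitive, optional spaces.
SALARY_RE = re.compile(r'^\s*(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]\s*$')


def parse_salary(text):
    """Return (low, high) in thousands of RMB, or None if the format differs."""
    match = SALARY_RE.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def midpoint(text):
    """Return the middle of the salary range, or None if it cannot be parsed."""
    bounds = parse_salary(text)
    if bounds is None:
        return None
    low, high = bounds
    return (low + high) / 2


for sample in ['10k-20k', '15K-25K', 'negotiable']:
    print(sample, parse_salary(sample), midpoint(sample))
```

Values that do not match the assumed format come back as None, so they can be counted separately rather than skewing the averages.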

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.


