Scrape Lagou Python Job Data and Visualize Trends with Python
This guide walks you through extracting Python job listings from Lagou.com with Python's requests library, parsing the JSON response, storing the results in CSV, and visualizing key insights such as education requirements, work experience, salary distribution, word clouds, and geographic salary heatmaps using pandas, matplotlib, and pyecharts.
Introduction
This article demonstrates how to collect Python job postings from Lagou.com using Python's requests library, handle the site's POST API, and then visualize the extracted data.
Web Scraping
The target URL is https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0. The required POST parameters are kd (the search keyword) and pn (the page number). The script sets request headers that mimic a browser session, then iterates over the result pages, pausing a random 2 to 5 seconds between requests.
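Because the endpoint answers with JSON, one page of results can also be unpacked with a JSON parser. The following is a minimal offline sketch, not the article's own method: the field names come from the regular expression used in this article, while the nesting content -> positionResult -> result, the helper name extract_rows, and the sample payload are assumptions made for illustration.

```python
import json

# Field names taken from the regular expression used in this article.
FIELDS = ["positionName", "workYear", "education", "jobNature", "financeStage",
          "city", "salary", "positionAdvantage", "companyShortName"]

def extract_rows(response_text, path=("content", "positionResult", "result")):
    """Return one tuple per posting; `path` is an ASSUMED nesting, adjust as needed."""
    node = json.loads(response_text)
    for key in path:
        # Walk down the nested dicts, falling back to {} when a key is missing.
        node = node.get(key, {}) if isinstance(node, dict) else {}
    postings = node if isinstance(node, list) else []
    return [tuple(p.get(f, "") for f in FIELDS) for p in postings]

# Made-up sample payload, not real Lagou data.
sample = json.dumps({"content": {"positionResult": {"result": [
    {"positionName": "Python Developer", "workYear": "1-3年", "education": "本科",
     "jobNature": "全职", "financeStage": "A轮", "city": "北京", "salary": "10k-20k",
     "positionAdvantage": "flexible hours", "companyShortName": "ExampleCo"}]}}})
rows = extract_rows(sample)
print(rows[0][0], rows[0][6])
```

In a live run, html.text from requests.post would take the place of sample, provided the response really has the assumed shape.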
import requests
import re
import time
import random
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0'
header = {
    'Host': 'www.lagou.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.lagou.com/jobs/list_Python?labelWords=&fromSearch=true&suginput=',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Anit-Forge-Token': 'None',
    'X-Anit-Forge-Code': '0',
    'Cookie': '... (omitted for brevity) ...',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache'
}
for n in range(1, 31):  # 30 pages; the site's pager is numbered from 1
    # kd is the search keyword, pn the page number
    form = {'first': 'false', 'kd': 'Python', 'pn': str(n)}
    time.sleep(random.randint(2, 5))  # pause 2 to 5 seconds between requests
    html = requests.post(url, data=form, headers=header)
    # Each match is a tuple of the nine captured fields for one posting
    data = re.findall('{"companyId":.*?"positionName":"(.*?)","workYear":"(.*?)","education":"(.*?)","jobNature":"(.*?)","financeStage":"(.*?)","companyLogo":".*?","industryField":".*?","city":"(.*?)","salary":"(.*?)","positionId":.*?,"positionAdvantage":"(.*?)","companyShortName":"(.*?)","district"', html.text)
    # Convert to DataFrame, save CSV, etc.
Data Visualization
After saving the CSV, pandas reads the data and several plots are generated: education level bar chart, work experience bar chart, salary distribution pie chart, word cloud of job titles, and a geographic heat map of average salaries using pyecharts.
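The hand-off from scraper to plots can be sketched with the standard library alone. This is an illustration under stated assumptions: '学历要求' (education requirement) is the one column name that appears in this article's plotting code, while the other header names, the file name lagou_python.csv, the helper names, and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

# '学历要求' is the column name used in this article's plotting code;
# the other header names are hypothetical placeholders.
COLUMNS = ["position", "experience", "学历要求", "job_nature", "finance_stage",
           "city", "salary", "advantage", "company"]

def save_rows(rows, path):
    """Write scraped tuples to a UTF-8 CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

def load_rows(path):
    """Read the CSV back as a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Made-up rows standing in for the tuples returned by re.findall.
rows = [
    ("Python Developer", "1-3年", "本科", "全职", "A轮", "北京", "10k-20k",
     "flexible hours", "ExampleCo"),
    ("Data Engineer", "3-5年", "硕士", "全职", "B轮", "上海", "20k-35k",
     "stock options", "SampleSoft"),
]
path = os.path.join(tempfile.mkdtemp(), "lagou_python.csv")  # hypothetical file name
save_rows(rows, path)
records = load_rows(path)
print(records[1]["学历要求"])
```

A file written this way can then be loaded with pd.read_csv(path) for the plots that follow.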
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud
from pyecharts import Geo
# Load the CSV written by the scraper (hypothetical file name; use your own path)
data = pd.read_csv('lagou_python.csv')
# Example: plot education distribution ('学历要求' is the education-requirement column)
data['学历要求'].value_counts().plot(kind='barh')
plt.show()
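For the salary distribution chart, the salary strings first have to become numbers. The sketch below is one possible approach, not the article's own: it assumes salaries look like the "10k-20k" values captured by the scraper, takes the midpoint of each range, and counts postings per bucket. The bucket edges, helper names, and sample values are made up for illustration.

```python
import re
from collections import Counter

def salary_midpoint(text):
    """Return the midpoint in thousand RMB of a range like '10k-20k', or None."""
    numbers = [int(n) for n in re.findall(r"(\d+)\s*[kK]", text)]
    if not numbers:
        return None
    bounds = numbers[:2]  # lower and upper bound, or a single value
    return sum(bounds) / len(bounds)

def salary_bucket(mid, edges=(10, 20, 30)):
    """Label a midpoint using an ASSUMED set of bucket edges (thousand RMB)."""
    lower = 0
    for edge in edges:
        if mid < edge:
            return f"{lower}k-{edge}k"
        lower = edge
    return f"{lower}k+"

# Made-up salary strings; unparseable entries are skipped.
salaries = ["10k-20k", "8k-12k", "25K-40K", "15k以上", "negotiable"]
mids = [m for m in map(salary_midpoint, salaries) if m is not None]
counts = Counter(salary_bucket(m) for m in mids)
print(counts)
```

The bucket counts can then be drawn with plt.pie(list(counts.values()), labels=list(counts.keys())).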
MaGe Linux Operations