
Scrape Lagou Python Job Data and Visualize Trends with Python

This guide walks through collecting Python job listings from Lagou.com with Python's requests library, extracting fields from the JSON response, saving the results to CSV, and visualizing education requirements, work experience, salary distribution, a word cloud of job titles, and a geographic salary heat map with pandas, matplotlib, and pyecharts.

MaGe Linux Operations

Introduction

This article demonstrates how to collect Python job postings from Lagou.com using Python's requests library, handle the site's POST API, and then visualize the extracted data.

Web Scraping

The target endpoint is

https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0

It accepts POST requests whose form fields include kd (the search keyword), pn (the page number), and first. The script sends headers that mimic a browser's XHR request, including the Referer of the search page and a session Cookie, then walks through the result pages with a random pause between requests.

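import re
import time
import random

import requests
import pandas as pd

url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false&isSchoolJob=0'
header = {
    'Host': 'www.lagou.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,en-US;q=0.7,en;q=0.3',
    # 'br' dropped: requests can only decode Brotli if the brotli package is installed
    'Accept-Encoding': 'gzip, deflate',
    'Referer': 'https://www.lagou.com/jobs/list_Python?labelWords=&fromSearch=true&suginput=',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Anit-Forge-Token': 'None',
    'X-Anit-Forge-Code': '0',
    'Cookie': '... (omitted for brevity) ...',
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache'
}

# Captures, in order: positionName, workYear, education, jobNature, financeStage,
# city, salary, positionAdvantage, companyShortName (relies on Lagou's key order)
pattern = re.compile('{"companyId":.*?"positionName":"(.*?)","workYear":"(.*?)","education":"(.*?)","jobNature":"(.*?)","financeStage":"(.*?)","companyLogo":".*?","industryField":".*?","city":"(.*?)","salary":"(.*?)","positionId":.*?,"positionAdvantage":"(.*?)","companyShortName":"(.*?)","district"')
rows = []
for n in range(1, 31):  # Lagou page numbers start at 1
    form = {'first': 'true' if n == 1 else 'false', 'kd': 'Python', 'pn': str(n)}
    time.sleep(random.randint(2, 5))  # random 2-5 second pause between requests
    html = requests.post(url, data=form, headers=header, timeout=10)
    rows.extend(pattern.findall(html.text))

# Column names are illustrative; '学历要求' (education requirement) is the one
# the plotting code below relies on
columns = ['职位名称', '工作经验', '学历要求', '工作性质', '融资阶段',
           '城市', '薪资', '职位福利', '公司简称']
data = pd.DataFrame(rows, columns=columns)
data.to_csv('lagou_python.csv', index=False, encoding='utf-8-sig')  # file name is illustrative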
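Matching fields with a regular expression breaks as soon as Lagou reorders the keys in its response. A sturdier option is to decode the body with `Response.json()` and pick fields by name. The sketch below assumes the listings sit at `payload['content']['positionResult']['result']`; that nesting is an assumption about the endpoint, not something the original article shows, and `extract_positions` and `fetch_positions` are hypothetical helper names.

```python
# Field names taken from the regex in the scraping script
FIELDS = ['positionName', 'workYear', 'education', 'jobNature', 'financeStage',
          'city', 'salary', 'positionAdvantage', 'companyShortName']

URL = ('https://www.lagou.com/jobs/positionAjax.json'
       '?needAddtionalResult=false&isSchoolJob=0')


def extract_positions(payload):
    """Return one dict per listing from a decoded positionAjax.json response.

    Assumes listings live at payload['content']['positionResult']['result'];
    a missing or null level yields an empty list instead of a KeyError.
    """
    content = payload.get('content') or {}
    result = (content.get('positionResult') or {}).get('result') or []
    # Keep only the wanted fields; absent fields become None
    return [{field: item.get(field) for field in FIELDS} for item in result]


def fetch_positions(session, keyword, page, headers):
    """POST one result page and return its listings (session: a requests.Session)."""
    form = {'first': 'true' if page == 1 else 'false', 'kd': keyword, 'pn': str(page)}
    resp = session.post(URL, data=form, headers=headers, timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return extract_positions(resp.json())
```

In the loop above, this would replace the `findall` call: create one `requests.Session()`, call `fetch_positions(session, 'Python', n, header)` for each page, and pass the collected dicts to `pd.DataFrame`. A response that carries no listings, such as an error message, simply produces an empty list.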

Data Visualization

After the CSV is saved, pandas loads it and the script produces five charts: a bar chart of education requirements, a bar chart of required work experience, a pie chart of the salary distribution, a word cloud of job titles (segmented with jieba), and a pyecharts geographic heat map of average salary by city.
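Both the salary pie chart and the city heat map need a number rather than a string such as '15k-25k'. The sketch below assumes Lagou salaries follow that 'NNk-NNk' pattern (monthly, in thousands of RMB), takes the midpoint of each range, bins the midpoints with `pd.cut` for the pie chart, and averages them per city for the map. The helper names, the band edges, and the column names '城市' (city) and '薪资' (salary) are hypothetical choices, not taken from the original article.

```python
import re

import pandas as pd

# Numbers immediately followed by k or K, e.g. "15k" and "25k" in "15k-25k"
_K_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*[kK]')


def parse_salary(text):
    """Midpoint of a salary range in thousands of RMB per month.

    A single bound such as '10k以上' ("10k and above") is returned as-is;
    strings without a k-value (e.g. '面议', "negotiable") give NaN.
    """
    numbers = [float(n) for n in _K_PATTERN.findall(str(text))]
    return sum(numbers) / len(numbers) if numbers else float('nan')


def salary_bins(salaries, edges=(0, 10, 20, 30, 50, float('inf'))):
    """Count salary midpoints per band [lo, hi); NaN midpoints are left out."""
    midpoints = salaries.map(parse_salary)
    labels = [f'{int(lo)}-{int(hi)}k' if hi != float('inf') else f'{int(lo)}k+'
              for lo, hi in zip(edges[:-1], edges[1:])]
    bands = pd.cut(midpoints, bins=list(edges), labels=labels, right=False)
    return bands.value_counts(sort=False)  # keeps band order, including empty bands


def city_average(df, city_col='城市', salary_col='薪资'):
    """Mean salary midpoint per city as [city, value] pairs, rounded to 0.1k."""
    avg = (df.assign(mid=df[salary_col].map(parse_salary))
             .groupby(city_col)['mid'].mean()
             .dropna()
             .round(1))
    return [[city, float(value)] for city, value in avg.items()]
```

For instance, `salary_bins(data['薪资']).plot(kind='pie', autopct='%1.1f%%')` would draw the pie chart, and the pairs from `city_average(data)` have the `[name, value]` shape that pyecharts' `Geo.add` accepts as `data_pair` (the city names must be ones pyecharts has coordinates for).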

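import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import jieba                        # Chinese word segmentation for the word cloud
from wordcloud import WordCloud
from pyecharts.charts import Geo    # pyecharts >= 1.0; 0.5.x used `from pyecharts import Geo`
# numpy, jieba, WordCloud and Geo are used by the charts not shown here

# Load the CSV saved by the scraper (file name is illustrative)
data = pd.read_csv('lagou_python.csv', encoding='utf-8-sig')

# Chinese labels need a CJK font; 'SimHei' is an assumption, use one installed locally
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# Example: plot education distribution ('学历要求' = education requirement)
data['学历要求'].value_counts().plot(kind='barh')
plt.show()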
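The word cloud step is not shown in the original code. One way to build it, sketched here as an assumption, is to segment every job title with jieba, count the words with `collections.Counter`, and hand the counts to `WordCloud.generate_from_frequencies`. Both libraries are imported inside the functions so the counting can be tried with any tokenizer; the stop list and the function names are illustrative.

```python
from collections import Counter

# Punctuation tokens to drop; this stop list is illustrative, not from the article
STOPWORDS = {'/', '-', '+', '(', ')', '（', '）', '、', ','}


def title_frequencies(titles, tokenize=None, stopwords=STOPWORDS):
    """Segment job titles into words and count them.

    tokenize defaults to jieba.lcut (Chinese word segmentation); jieba is
    imported lazily so the counting logic also works with another tokenizer.
    """
    if tokenize is None:
        import jieba
        tokenize = jieba.lcut
    counts = Counter()
    for title in titles:
        for token in tokenize(str(title)):
            token = token.strip()
            if token and token not in stopwords:  # skip blanks and punctuation
                counts[token] += 1
    return counts


def render_wordcloud(frequencies, font_path, out_path='wordcloud.png'):
    """Draw word counts as a PNG word cloud and return the output path.

    font_path must point to a font with CJK glyphs; without one,
    Chinese words render as empty boxes.
    """
    from wordcloud import WordCloud
    cloud = WordCloud(font_path=font_path, width=800, height=600,
                      background_color='white')
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return out_path
```

For example, `render_wordcloud(title_frequencies(data['职位名称']), font_path='simhei.ttf')`, where both the column name and the font path are placeholders for your own.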
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
