How to Scrape 51Job Listings with Python: A Complete Guide
This article walks through a Python scraper that extracts job listings from 51job.com. It covers the required header and cookie settings, pagination logic, JSON parsing, and CSV output, and includes the full script along with tips for handling site changes and further data analysis.
1. Introduction
The author's Python crawler for 51job.com stopped working after the website's structure changed. This article shares an updated version that restores the crawler's functionality while keeping the original logic intact.
2. Implementation
The revised script sets comprehensive request headers and cookies, iterates over pages 1‑35, builds query parameters for each request, and parses the JSON response to extract fields such as job title, city, salary, education requirement, company name, industry, work experience, benefits, company type, and company size. Extracted data are written to a CSV file for later analysis.
import requests
import time
# Request headers sent with every search call.
headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Connection": "keep-alive",
    "From-Domain": "51job_web",
    "Origin": "https://we.51job.com",
    "Referer": "https://we.51job.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
    "account-id": "",
    "partner": "",
    "property": "...",
    "sec-ch-ua": "\"Microsoft Edge\";v=\"113\", \"Chromium\";v=\"113\", \"Not-A.Brand\";v=\"24\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sign": "aeed648e6141b18dd1c49117a9338bb8c44ce3803b98e0b973d632346da193e8",
    "user-token": "",
    "uuid": "064dee39ac7e7feee763f449b69faa9a"
}
# Session cookies; these may need refreshing when the search changes.
cookies = {
    "guid": "064dee39ac7e7feee763f449b69faa9a",
    "sajssdk_2015_cross_new_user": "1",
    "sensorsdata2015jssdkcross": "...",
    "nsearch": "jobarea%3D%26|%26ord_field%3D%26|%26recentSearch0%3D%26|%26recentSearch1%3D%26|%26recentSearch2%3D%26|%26recentSearch3%3D%26|%26recentSearch4%3D%26|%26collapse_expansion%3D",
    "search": "...",
    "acw_tc": "ac11000116841593187302414e00e0549c125d5ad342ec325b0211a4e1ec13",
    "uid": "wKhJRWRiO1dsWBM3SVrMAg==",
    "JSESSIONID": "00DEC3853A738DF9B12A498C25728575",
    "ssxmod_itna": "...",
    "ssxmod_itna2": "..."
}
url = "https://cupidjob.51job.com/open/noauth/search-pc"
# Search keyword. The original snippet referenced an undefined variable `v` here;
# "python" is only a placeholder value, so replace it with your own search term.
keyword = "python"
# Append to job.csv; the with-block closes the file even if a request fails.
with open('job.csv', mode='a', encoding='utf-8') as f:
    for page in range(1, 36):
        print(f"Fetching page {page}...")
        time.sleep(3)  # pause between requests
        params = {
            "api_key": "51job",
            "timestamp": "1684159452",
            "keyword": keyword,
            "searchType": "2",
            "jobArea": "000000",
            "sortType": "0",
            "pageNum": f"{page}",
            "requestId": "0266bbd1054b9bb1ec7a0066e6e5060c",
            "pageSize": "20",
            "source": "1",
            "pageCode": "sou|sou|soulb"
        }
        response = requests.get(url, headers=headers, cookies=cookies, params=params)
        rows = response.json()["resultbody"]
        # Each item in rows["job"]["items"] describes one job posting.
        for row in rows["job"]["items"]:
            job_name = row["jobName"]
            city = row["jobAreaString"]
            salary = row["provideSalaryString"]
            education = row["degreeString"]
            company_name = row["fullCompanyName"]
            field = row["industryType1Str"]
            working_years = row["workYearString"]
            jobwelf_list = "|".join(row["jobTags"])  # benefits, pipe-separated
            companytype_text = row["companyTypeString"]
            companysize_text = row["companySizeString"]
            print(working_years, jobwelf_list, companytype_text, companysize_text)
            # One comma-separated line per job, terminated by a newline.
            f.write(f"{job_name}, {city}, {salary}, {education}, {company_name}, {field}, {working_years}, {jobwelf_list}, {companytype_text}, {companysize_text}\n")
Running the script produces a CSV file containing thousands of job records. Sample output screenshots are shown below.
Another image illustrates the volume of data collected for a single position.
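The script joins fields with commas by hand, so a job title or company name that itself contains a comma would shift the columns in the output file. The following is a minimal sketch of one way to avoid that with Python's standard csv module. The helper names (FIELDS, row_to_record, write_rows) and the sample record are illustrative assumptions, not part of the original script; only the JSON keys are taken from it.

```python
import csv
import io

# JSON keys in the same column order as the f.write call in the script.
FIELDS = [
    "jobName", "jobAreaString", "provideSalaryString", "degreeString",
    "fullCompanyName", "industryType1Str", "workYearString",
    "jobTags", "companyTypeString", "companySizeString",
]

def row_to_record(row):
    """Flatten one job item (a dict) into a list of strings."""
    record = []
    for key in FIELDS:
        value = row.get(key, "")
        if key == "jobTags":
            # The tag list is joined with "|" as in the script.
            value = "|".join(str(tag) for tag in (value or []))
        record.append("" if value is None else str(value))
    return record

def write_rows(rows, fileobj):
    """Write job items to an open text file, quoting fields where needed."""
    writer = csv.writer(fileobj)
    for row in rows:
        writer.writerow(row_to_record(row))

# Made-up sample record, used only to show the quoting behaviour.
sample = {"jobName": "Engineer, Backend", "jobTags": ["tag1", "tag2"]}
buffer = io.StringIO()
write_rows([sample], buffer)
print(buffer.getvalue())
```

When writing to a real file rather than an in-memory buffer, the csv documentation recommends opening it with newline="" so that line endings are not translated twice.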
If you need to scrape a different job category, you may have to update the cookie values; otherwise the request may not return the expected information.
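When the cookies are stale, indexing the response with ["resultbody"] directly may raise a KeyError instead of giving a clear signal. The sketch below shows one defensive way to pull out the item list. It assumes, without confirmation from the site, that a failed request still returns JSON but without the resultbody/job/items nesting; the function name extract_items and both sample payloads are hypothetical.

```python
def extract_items(payload):
    """Return the list of job items, or [] if the expected keys are missing."""
    if not isinstance(payload, dict):
        return []
    body = payload.get("resultbody")
    if not isinstance(body, dict):
        return []
    job = body.get("job")
    if not isinstance(job, dict):
        return []
    items = job.get("items")
    return items if isinstance(items, list) else []

# Shape used by the script when the request succeeds.
good = {"resultbody": {"job": {"items": [{"jobName": "Example"}]}}}
# Hypothetical shape of a rejected request; the real error format is unknown.
bad = {"message": "placeholder error"}

print(len(extract_items(good)), len(extract_items(bad)))  # prints: 1 0
```

In the scraping loop, an empty result on the first page would then be a reasonable cue to refresh the cookie values before continuing.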
3. Conclusion
The provided Python script demonstrates a practical approach to harvesting job data from 51job.com, which can be stored in a database, visualized through a web interface, or analyzed for research purposes. It also serves as a solid foundation for academic assignments or small‑scale data‑science projects.
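As one possible way to store the results in a database, the sketch below loads records into SQLite using Python's built-in sqlite3 module and runs a simple count per city. The table name jobs, the column names, and the sample records are assumptions made for illustration; the article does not prescribe a schema.

```python
import sqlite3

# Illustrative column names, one per field the script extracts.
COLUMNS = [
    "job_name", "city", "salary", "education", "company_name",
    "field", "working_years", "benefits", "company_type", "company_size",
]

def create_table(conn):
    """Create the jobs table with one TEXT column per field."""
    cols = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS jobs ({cols})")

def insert_records(conn, records):
    """Insert an iterable of 10-value tuples using a parameterized query."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO jobs VALUES ({placeholders})", records)
    conn.commit()

# In-memory database with made-up records, for demonstration only.
conn = sqlite3.connect(":memory:")
create_table(conn)
insert_records(conn, [
    ("Example Job A", "CityA", "", "", "Example Co", "", "", "", "", ""),
    ("Example Job B", "CityA", "", "", "Example Co", "", "", "", "", ""),
    ("Example Job C", "CityB", "", "", "Sample Ltd", "", "", "", "", ""),
])

# Count postings per city.
query = "SELECT city, COUNT(*) FROM jobs GROUP BY city ORDER BY city"
for city, count in conn.execute(query):
    print(city, count)
```

Replacing ":memory:" with a file path would persist the table between runs.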
