
How to Scrape 51Job Listings with Python: A Complete Guide

This article walks through a Python web scraper that extracts job listings from 51job.com. It covers the required header and cookie settings, the pagination logic, parsing of the JSON response, and CSV output, and includes the full script, a tip for handling site changes, and ideas for further use of the data.

Python Crawling & Data Mining

Jun 13, 2023

1. Introduction

The author's original Python crawler for 51job.com stopped working after the website's structure changed. This article shares an updated script that restores the crawler's functionality while keeping the original logic intact.

2. Implementation

The revised script sets comprehensive request headers and cookies, iterates over pages 1‑35, builds query parameters for each request, and parses the JSON response to extract fields such as job title, city, salary, education requirement, company name, industry, work experience, benefits, company type, and company size. Extracted data are written to a CSV file for later analysis.

import requests
import time

headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Connection": "keep-alive",
    "From-Domain": "51job_web",
    "Origin": "https://we.51job.com",
    "Referer": "https://we.51job.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
    "account-id": "",
    "partner": "",
    "property": "...",
    "sec-ch-ua": "\"Microsoft Edge\";v=\"113\", \"Chromium\";v=\"113\", \"Not-A.Brand\";v=\"24\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sign": "aeed648e6141b18dd1c49117a9338bb8c44ce3803b98e0b973d632346da193e8",
    "user-token": "",
    "uuid": "064dee39ac7e7feee763f449b69faa9a"
}

cookies = {
    "guid": "064dee39ac7e7feee763f449b69faa9a",
    "sajssdk_2015_cross_new_user": "1",
    "sensorsdata2015jssdkcross": "...",
    "nsearch": "jobarea%3D%26|%26ord_field%3D%26|%26recentSearch0%3D%26|%26recentSearch1%3D%26|%26recentSearch2%3D%26|%26recentSearch3%3D%26|%26recentSearch4%3D%26|%26collapse_expansion%3D",
    "search": "...",
    "acw_tc": "ac11000116841593187302414e00e0549c125d5ad342ec325b0211a4e1ec13",
    "uid": "wKhJRWRiO1dsWBM3SVrMAg==",
    "JSESSIONID": "00DEC3853A738DF9B12A498C25728575",
    "ssxmod_itna": "...",
    "ssxmod_itna2": "..."
}

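url = "https://cupidjob.51job.com/open/noauth/search-pc"
# Search keyword. The original script referenced an undefined variable `v` here;
# "python" is only a placeholder value.
keyword = "python"
f = open('job.csv', mode='a', encoding='utf-8')
for page in range(1, 36):
    print(f"Scraping page {page}...")
    time.sleep(3)  # pause between requests to keep the load on the server low
    params = {
        "api_key": "51job",
        "timestamp": "1684159452",  # hard-coded Unix timestamp from the original script
        "keyword": keyword,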
        "searchType": "2",
        "jobArea": "000000",
        "sortType": "0",
        "pageNum": f"{page}",
        "requestId": "0266bbd1054b9bb1ec7a0066e6e5060c",
        "pageSize": "20",
        "source": "1",
        "pageCode": "sou|sou|soulb"
    }
    response = requests.get(url, headers=headers, cookies=cookies, params=params, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    rows = response.json()["resultbody"]
    for row in rows["job"]["items"]:
        job_name = row["jobName"]
        city = row["jobAreaString"]
        salary = row["provideSalaryString"]
        education = row["degreeString"]
        company_name = row["fullCompanyName"]
        field = row["industryType1Str"]
        working_years = row["workYearString"]
        jobwelf_list = "|".join(row["jobTags"])
        companytype_text = row["companyTypeString"]
        companysize_text = row["companySizeString"]
        print(working_years, jobwelf_list, companytype_text, companysize_text)
        f.write(f"{job_name}, {city}, {salary}, {education}, {company_name}, {field}, {working_years}, {jobwelf_list}, {companytype_text}, {companysize_text}\n")

f.close()  # flush and release the file once all pages are written
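Because the script above joins the fields with a plain comma, any value that itself contains a comma (a company name, for instance) would shift the columns, and a missing key would raise `KeyError`. The following is a minimal sketch of a more defensive variant, assuming the response keeps the `resultbody` / `job` / `items` nesting used above. The helper names `extract_rows` and `write_rows` and the sample payload are hypothetical and not part of the original script.

```python
import csv
import io

# Keys read by the original script, in the order they are written out.
FIELDS = [
    "jobName", "jobAreaString", "provideSalaryString", "degreeString",
    "fullCompanyName", "industryType1Str", "workYearString", "jobTags",
    "companyTypeString", "companySizeString",
]


def extract_rows(payload):
    """Turn one decoded JSON response into a list of flat rows.

    Missing keys become empty strings instead of raising KeyError,
    which helps when the site drops or renames a field.
    """
    body = payload.get("resultbody") or {}
    job = body.get("job") or {}
    items = job.get("items") or []
    rows = []
    for item in items:
        row = []
        for key in FIELDS:
            value = item.get(key, "")
            if isinstance(value, list):  # jobTags is a list of strings
                value = "|".join(str(v) for v in value)
            row.append("" if value is None else str(value))
        rows.append(row)
    return rows


def write_rows(file_obj, rows):
    """Write rows with csv.writer, which quotes embedded commas."""
    writer = csv.writer(file_obj)
    writer.writerows(rows)


# Hypothetical sample payload shaped like the response parsed above.
sample = {
    "resultbody": {
        "job": {
            "items": [
                {
                    "jobName": "Python Developer",
                    "jobAreaString": "Shanghai",
                    "fullCompanyName": "Example Tech Co., Ltd.",
                    "jobTags": ["five insurances", "annual bonus"],
                }
            ]
        }
    }
}

buffer = io.StringIO()
write_rows(buffer, extract_rows(sample))
print(buffer.getvalue())
```

When writing to a real file instead of the in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends.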

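Running the script appends the results to job.csv: up to 700 records per keyword (35 pages of 20 results each), which adds up to thousands of job records once several keywords have been scraped. The original post showed screenshots of the sample output and of the volume of data collected for a single position.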

If you need to scrape a different job category, you may have to update the cookie values; otherwise the request may not return the expected information.
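One way to refresh the cookies is to copy the `Cookie` request header from the browser's developer tools and convert it into the dictionary that `requests` expects. This is a minimal sketch; the helper name `parse_cookie_header` and the sample string are hypothetical.

```python
def parse_cookie_header(raw):
    """Convert a raw 'name=value; name2=value2' string into a dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only: values may contain '=' themselves
        # (for example base64 padding such as 'AAAA==').
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Made-up example string, not real session data.
raw_header = "guid=abc123; uid=AAAA==; JSESSIONID=XYZ"
cookies = parse_cookie_header(raw_header)
print(cookies)
```

The resulting dictionary can be passed to `requests.get` through the `cookies=` argument, in place of the hard-coded `cookies` dictionary in the script.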

3. Conclusion

The provided Python script demonstrates a practical approach to harvesting job data from 51job.com, which can be stored in a database, visualized through a web interface, or analyzed for research purposes. It also serves as a solid foundation for academic assignments or small‑scale data‑science projects.
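As one illustration of the database option, the scraped rows could be loaded into SQLite with Python's standard library. This is a sketch under assumptions: the table name `jobs`, the column names, and the sample rows are hypothetical, and an in-memory database stands in for a real file.

```python
import sqlite3

# Hypothetical column names, in the same order as the CSV fields.
COLUMNS = [
    "job_name", "city", "salary", "education", "company_name",
    "field", "working_years", "benefits", "company_type", "company_size",
]


def save_rows(conn, rows):
    """Create the table if needed and insert the given rows."""
    column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS jobs ({column_defs})")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO jobs VALUES ({placeholders})", rows)
    conn.commit()


def count_by_city(conn):
    """Return (city, count) pairs, most frequent city first."""
    query = (
        "SELECT city, COUNT(*) AS n FROM jobs "
        "GROUP BY city ORDER BY n DESC, city"
    )
    return conn.execute(query).fetchall()


# Made-up rows in the same column order as the CSV fields.
sample_rows = [
    ("Python Developer", "Shanghai", "", "", "Example A", "", "", "", "", ""),
    ("Data Analyst", "Shanghai", "", "", "Example B", "", "", "", "", ""),
    ("Crawler Engineer", "Beijing", "", "", "Example C", "", "", "", "", ""),
]

conn = sqlite3.connect(":memory:")  # use a file path to persist the data
save_rows(conn, sample_rows)
print(count_by_city(conn))
```

A simple grouped count like this is also a starting point for the analysis and visualization uses mentioned above.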
