How Data Mining and Machine Learning Can Pinpoint Your Ideal Beijing Rental

An Alibaba engineer shows how to scrape Ziroom listings, use random‑forest feature importance to see what drives rent, and apply grey relational analysis to rank shared rooms by size, price and commute distance, dramatically narrowing the search for a good rental in Beijing.

Alibaba Cloud Developer

An Alibaba engineer, himself a recent arrival in Beijing, describes his frustration with having to hunt for rental apartments again and again, and explains how he turned the problem into a data‑driven selection process.

He crawled all Ziroom listings with Python's Scrapy framework, using Splash to render the JavaScript‑generated pages, and collected 7,907 rental entries. Two sample JSON records are shown below:

{"floorTotal": "6", "rooms": "2", "lng": "116.422213", "direction": "南", "floorLoc": "5", "halls": "1", "rentType": "整", "time_unit": "每月", "title": "青年沟2居室", "privateBathroom": "0", "confStatus": "1", "district": "东城", "lat": "39.968073", "area": "64.17", "privateBalcony": "0", "price": "6590", "nearestSubWayDist": "367"}
{"floorTotal": "18", "rooms": "3", "lng": "116.400737", "direction": "西", "floorLoc": "6", "halls": "1", "rentType": "合", "time_unit": "每月", "title": "望陶园小区3居室-02卧", "privateBathroom": "0", "confStatus": "1", "district": "东城", "lat": "39.870957", "area": "10.7", "privateBalcony": "0", "confType": "布丁", "price": "2490", "nearestSubWayDist": "517"}

Chinese field values: 南 = south, 西 = west, 整 = whole unit, 合 = shared room, 每月 = per month, 东城 = Dongcheng district; 布丁 ("Pudding") is a value of the configuration type field. The titles read "Qingniangou, 2 bedrooms" and "Wangtaoyuan compound, 3 bedrooms, bedroom 02".
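The sample shows that numeric fields arrive as strings. A minimal loading sketch, assuming the crawl output was saved as JSON Lines (one record per line); the helper names `parse_listing` and `load_listings` and the choice of which fields count as numeric are hypothetical, not the author's code.

```python
import json

# Assumption: these fields hold numbers encoded as strings (as in the samples above).
NUMERIC_FIELDS = {
    "floorTotal": int, "floorLoc": int, "rooms": int, "halls": int,
    "privateBathroom": int, "privateBalcony": int, "confStatus": int,
    "lng": float, "lat": float, "area": float, "price": float,
    "nearestSubWayDist": float,
}


def parse_listing(line: str) -> dict:
    """Parse one JSON record and convert its string-encoded numbers."""
    record = json.loads(line)
    for field, cast in NUMERIC_FIELDS.items():
        if field in record and record[field] != "":
            record[field] = cast(record[field])
    return record


def load_listings(path: str) -> list:
    """Read a JSON Lines file (one listing per line), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_listing(line) for line in f if line.strip()]
```

Non-numeric fields such as `direction`, `rentType` and `district` are left as Chinese strings so they can be one-hot encoded later.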

After dropping whole‑unit listings, 4,762 shared‑room records remain. Their mean and median rents are almost identical, which suggests a roughly symmetric price distribution; the histogram (Image 1) shows the distribution.
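The filtering and summary step can be sketched as follows, assuming shared rooms are the records whose `rentType` is "合" and whole units are "整" (as in the samples). The helper names are hypothetical. Note that a mean close to the median is consistent with symmetry but does not prove it, which is why the histogram is still worth plotting.

```python
import statistics


def shared_rooms(listings):
    """Keep only shared-room listings (rentType "合"); whole units are "整"."""
    return [r for r in listings if r.get("rentType") == "合"]


def rent_summary(listings):
    """Return (mean, median) of the monthly price of the given listings."""
    prices = [float(r["price"]) for r in listings]
    return statistics.mean(prices), statistics.median(prices)
```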

The spatial rent distribution is shown on a map (Image 2), where red indicates rentals above ¥3,000/month, green ¥2,000‑3,000, and purple below ¥2,000. The map reveals that northern districts tend to be pricier than southern ones, and eastern districts are generally more expensive than western ones.
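The colour coding on the map is a simple three-way banding of monthly rent. A sketch of that rule; how the article treated exactly ¥2,000 and ¥3,000 is not stated, so putting both boundaries in the green band is an assumption, and `price_band`/`band_counts` are hypothetical names.

```python
from collections import Counter


def price_band(monthly_rent: float) -> str:
    """Map a monthly rent in CNY to the map's colour band.

    Assumption: exactly 2,000 and 3,000 fall in the green band.
    """
    if monthly_rent > 3000:
        return "red"
    if monthly_rent >= 2000:
        return "green"
    return "purple"


def band_counts(prices):
    """Count how many listings fall into each colour band."""
    return Counter(price_band(p) for p in prices)
```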

To understand what drives rent, the author trained a random‑forest regression model on 14 features: area, configuration version, configuration type, orientation, floor level, total floors, distance to the nearest subway station, private balcony, private bathroom, number of rooms, number of halls, district, azimuth relative to Tiananmen, and distance to Tiananmen. Categorical features were one‑hot encoded, which expanded the feature set to 41 dimensions. With two‑thirds of the records used for training and one‑third for testing, the model reached R² = 0.86. The ten most important features, which together account for 86% of the total importance, are:
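Two of the features, distance and azimuth relative to Tiananmen, have to be derived from each listing's `lat`/`lng`, and the categorical features have to be expanded into indicator columns. A minimal sketch, assuming great-circle (haversine) distance, the initial compass bearing measured from Tiananmen, and approximate coordinates for Tiananmen; the article specifies none of these, and all helper names are hypothetical. It also assumes the listings and the reference point use the same coordinate system.

```python
import math

# Assumption: approximate coordinates of Tiananmen; the article gives no reference point.
TIANANMEN_LAT, TIANANMEN_LNG = 39.9075, 116.3972
EARTH_RADIUS_M = 6_371_000


def distance_to_tiananmen(lat: float, lng: float) -> float:
    """Great-circle (haversine) distance from Tiananmen, in metres."""
    phi1, phi2 = math.radians(TIANANMEN_LAT), math.radians(lat)
    dphi = phi2 - phi1
    dlmb = math.radians(lng - TIANANMEN_LNG)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def azimuth_from_tiananmen(lat: float, lng: float) -> float:
    """Initial bearing from Tiananmen to the listing, degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(TIANANMEN_LAT), math.radians(lat)
    dlmb = math.radians(lng - TIANANMEN_LNG)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360


def one_hot(values, prefix):
    """Expand a categorical column into 0/1 indicator columns, one per distinct value."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]
```

A raw bearing wraps around (359° and 1° are neighbours), which tree models can still split on; encoding it as sine and cosine is a common alternative.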

Distance to Tiananmen – 34.87%

Room area – 10.48%

Distance to nearest subway – 10.35%

Azimuth relative to Tiananmen – 9.05%

Private bathroom – 8.17%

Located in Chaoyang district – 3.43%

Located in Haidian district – 2.77%

Total floors of the building – 2.52%

North‑facing – 2.21%

Private balcony – 2.15%
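The R² quoted for the model is the coefficient of determination, 1 - SS_res / SS_tot. The sketch below shows a shuffled 2/3 : 1/3 split and the R² computation in plain Python; it does not reproduce the author's random forest, and the seed and helper names are assumptions.

```python
import random


def train_test_split_2_1(rows, seed=0):
    """Shuffle a copy of rows and split it 2/3 for training, 1/3 for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

An R² of 0.86 means the model's predictions leave about 14% of the rent variance in the evaluated data unexplained.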

The author then narrowed the analysis to the three attributes that matter most to him (room area, rent and distance to his workplace) and ranked rooms on them with grey relational analysis. Each attribute was normalized to the [0, 1] range by deviation (min‑max) standardization and weighted according to a target‑optimization matrix: area 1/6, rent 1/3, distance 1/2. The relational coefficients of each room are calculated and summed with these weights; the table below shows the resulting scores for five filtered rooms (area ≤ 8 m², rent ≥ ¥2,200).
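The ranking step can be sketched with the textbook form of grey relational analysis: compare each room's normalized attributes with an ideal reference room, turn each deviation Δ into a relational coefficient ξ = (Δmin + ρΔmax) / (Δ + ρΔmax), and take the weighted sum. The article does not state the resolution coefficient ρ, the reference sequence, or how rent and distance (where smaller is better) were oriented, so ρ = 0.5, an all-ones ideal room, and flipping those two attributes during normalization are assumptions. The function names are hypothetical.

```python
# Weights from the article: area 1/6, rent 1/3, distance to workplace 1/2.
WEIGHTS = {"area": 1 / 6, "rent": 1 / 3, "distance": 1 / 2}
# Assumption: larger area is better; lower rent and shorter distance are better.
LARGER_IS_BETTER = {"area": True, "rent": False, "distance": False}


def min_max(values, larger_is_better=True):
    """Deviation (min-max) standardization to [0, 1], oriented so that 1 is best."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant attribute: every room is equally good on it
        return [1.0] * len(values)
    if larger_is_better:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]


def grey_relational_scores(rooms, weights=WEIGHTS, rho=0.5):
    """Weighted grey relational grade of each room against an ideal room.

    rooms: list of dicts with numeric 'area', 'rent' and 'distance' values.
    rho: resolution coefficient (0.5 is the customary choice; assumed here).
    """
    normalized = {
        k: min_max([r[k] for r in rooms], LARGER_IS_BETTER[k]) for k in weights
    }
    # The ideal reference is 1.0 on every normalized attribute,
    # so each deviation is simply the distance from 1.
    deltas = {k: [1.0 - x for x in col] for k, col in normalized.items()}
    all_deltas = [d for col in deltas.values() for d in col]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every room already matches the ideal
        return [sum(weights.values())] * len(rooms)
    scores = []
    for i in range(len(rooms)):
        grade = sum(
            w * (d_min + rho * d_max) / (deltas[k][i] + rho * d_max)
            for k, w in weights.items()
        )
        scores.append(grade)
    return scores
```

Because Δmin and Δmax are taken over the whole candidate set, the scores are relative: rerunning on a fresh snapshot of listings can reorder rooms even if their own attributes are unchanged.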

Sorting all rooms by the combined relational score and selecting the top ten dramatically reduces the search space, as illustrated in the final image.
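Keeping only the ten best rooms is a partial sort. A small sketch using `heapq.nlargest`, assuming rooms and their scores are held in parallel lists; `top_k` is a hypothetical helper, and the result is the same as sorting all scores and slicing the first k.

```python
import heapq


def top_k(rooms, scores, k=10):
    """Return the k rooms with the highest scores, best first."""
    best = heapq.nlargest(k, range(len(rooms)), key=lambda i: scores[i])
    return [rooms[i] for i in best]
```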

The author concludes that the workflow can be rerun for every new rental search; because listings turn over quickly, it should be run shortly before the search begins.

Readers could reply with the keyword “租房” (“renting”) to the Alibaba Tech WeChat official account to receive the full scraping and analysis code.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
