What Shanghai’s 2020 Second‑Hand Housing Data Reveals About Prices and Market Trends
This article analyzes a Scrapy‑Redis crawl of Lianjia data from July 2020, cleans and enriches the dataset, and uses pandas profiling and visualizations to uncover Shanghai’s second‑hand housing price distribution, regional hotspots, housing types, and market dynamics during the first half of 2020.
Project Background
During a job interview I was asked about Shanghai’s second-hand housing market, which prompted this analysis. Since the 2016 "housing is for living in, not for speculation" policy, housing prices have cooled, and further measures in 2019 affected supply, financing, rental-sale balance, residency permits, and subsidies for migrant workers.
Economic data shows Shanghai’s per-capita disposable income in H1 2020 was ¥36,577, up 3.64% year over year, while demographic data shows the birth rate at its lowest level since 1949.
Analysis Purpose
Track the overall listing volume and average price trend of Shanghai second‑hand houses in H1 2020.
Identify price ranges and characteristics of current listings.
Determine which districts have the highest selling pressure.
Data Cleaning
After crawling, the dataset contains 37,491 rows and 20 columns with a few duplicate rows. Cleaning steps include:
Remove duplicate rows.
Replace None values with a placeholder.
Split fields such as region, layout, floor, and mortgage information into separate columns.
Convert data types (e.g., price to float, dates to datetime).
Strip unit symbols from area and convert to float.
Normalize price units (values below 20 are multiplied by 10,000, apparently because they were quoted in 亿 rather than 万).
Calculate per‑square‑meter price.
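The field-splitting step relies on pandas `str.extract` with named regex groups, where each group name becomes a column. The sketch below runs the same patterns on a few made-up rows. The sample strings are hypothetical and only assume the formats that the patterns imply; real Lianjia values may differ (for example in spacing or bracket style).

```python
import pandas as pd

# Hypothetical sample rows; formats are assumed from the regex patterns, not taken from the crawl.
sample = pd.DataFrame({
    "地区": ["徐汇 衡山路 内环内", "嘉定 安亭 外环外"],
    "房屋户型": ["2室1厅1厨1卫", "3室2厅1厨2卫"],
    "所在楼层": ["中楼层(共6层)", "高楼层(共18层)"],
})

# Each named group (?P<name>...) becomes a column in the extracted frame.
region = sample["地区"].str.extract(r"(?P<区>.*?)\s(?P<镇>.*?)\s(?P<环>.*)")
layout = sample["房屋户型"].str.extract(r"(?P<室>\d+)室(?P<厅>\d+)厅(?P<厨>\d+)厨(?P<卫>\d+)卫")
floor = sample["所在楼层"].str.extract(r"(?P<所处楼层>.+)\(共(?P<总层数>\d+)层\)")

# Extracted values are strings; cast the counts to integers before doing arithmetic.
layout = layout.astype(int)
floor["总层数"] = floor["总层数"].astype(int)

parts = pd.concat([region, layout, floor], axis=1)
print(parts)
```

Rows that do not match a pattern come back as missing values, so it is worth checking for them before casting types on the full dataset.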
import pandas as pd
import pandas_profiling

# `data` is the DataFrame of crawled listings, loaded beforehand
# Profile the raw data
pandas_profiling.ProfileReport(data).to_file("./report/html")
# Remove duplicates
data.drop_duplicates(keep='first', inplace=True)
# Replace the string 'None' with a placeholder ('暂无数据' means "no data yet")
data = data.replace('None', '暂无数据')
# Split columns (raw strings keep \s and \d from being read as string escapes)
data = pd.concat([
    data,
    data['地区'].str.extract(r'(?P<区>.*?)\s(?P<镇>.*?)\s(?P<环>.*)'),
    data['房屋户型'].str.extract(r'(?P<室>\d+)室(?P<厅>\d+)厅(?P<厨>\d+)厨(?P<卫>\d+)卫'),
    data['所在楼层'].str.extract(r'(?P<所处楼层>.+)\(共(?P<总层数>\d+)层\)'),
    data['抵押信息'].str.strip().str.extract(r'(?P<有无抵押>.?)抵押(?P<抵押情况>.*)?'),
], axis=1)
# Drop the source columns that were split
data.drop(['地区', '所在楼层', '抵押信息'], axis=1, inplace=True)
# Convert area: drop the trailing unit character, then cast to float
data['建筑面积'] = data['建筑面积'].map(lambda x: float(x[:-1]))
# Convert price
data['价格'] = data['价格'].astype(float)
# Convert dates; unparseable last-transaction dates become NaT
data['挂牌时间'] = pd.to_datetime(data['挂牌时间'])
data['上次交易'] = pd.to_datetime(data['上次交易'], errors='coerce')
# Normalize price unit: values below 20 are scaled by 10,000
data['价格'] = data['价格'].map(lambda x: x * 10000 if x < 20 else x)
# Compute unit price in yuan per m² (价格 is in 万元)
data['均价'] = (data['价格'] / data['建筑面积'] * 10000).round(2)
Descriptive Analysis
After cleaning, 37,483 records remain covering 2013‑01‑18 to 2020‑07‑24. Key statistics:
Total listings: 37,483.
Area range: 13 m² – 1,663.1 m².
Highest unit price: ¥319,960.62/m²; overall mean unit price: ¥56,466.26/m².
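As a rough illustration of how figures like these can be derived, the sketch below computes the per-square-meter price and a few summary statistics on made-up rows. The numbers are hypothetical, and the price column is assumed to be in 万元, which is what the factor of 10,000 in the unit-price formula implies.

```python
import pandas as pd

# Hypothetical listings: total price in 万元 (assumed unit), floor area in m².
listings = pd.DataFrame({
    "价格": [350.0, 1200.0, 95.0],
    "建筑面积": [70.0, 150.0, 38.0],
})

# Unit price in yuan per m²: (万元 / m²) * 10,000.
listings["均价"] = (listings["价格"] / listings["建筑面积"] * 10000).round(2)

# Min, max, and mean for area and unit price in one table.
summary = listings[["建筑面积", "均价"]].agg(["min", "max", "mean"])
print(summary)
```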
High‑Price Listings (> ¥300,000 /m²)
Four garden‑style villas exceed ¥300,000 /m², located near the historic Xinguo Road and the iconic Wukang Building.
Hot Business Districts
Listing counts by business district show Zhongshan Park (674 listings, ¥72,750/m²) as relatively affordable, while outer districts such as Jiading, Minhang, and Baoshan carry the most listings and therefore show higher selling pressure.
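A comparison like this comes down to a group-by on the area column. The sketch below counts listings and averages the unit price per district on made-up rows; the column names follow the cleaning code, while the values and the output names `n_listings` and `mean_unit_price` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: district and unit price (yuan per m²).
listings = pd.DataFrame({
    "区": ["嘉定", "嘉定", "闵行", "长宁", "嘉定", "闵行"],
    "均价": [32000.0, 35000.0, 52000.0, 73000.0, 38000.0, 48000.0],
})

# Listing count (used here as a proxy for selling pressure) and mean unit price per district.
by_district = (
    listings.groupby("区")
    .agg(n_listings=("均价", "count"), mean_unit_price=("均价", "mean"))
    .sort_values("n_listings", ascending=False)
)
print(by_district)
```

Grouping by `镇` instead of `区` would give the same table at the finer level that the regex split produces.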
Housing Layout Distribution
Two‑room units dominate the market; one‑room units account for only 15.65% of listings.
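Shares like the 15.65% figure can be read off a normalized value count of the room column. A minimal sketch on made-up data, assuming the `室` column has already been cast to integers:

```python
import pandas as pd

# Hypothetical room counts for ten listings.
rooms = pd.Series([2, 2, 1, 3, 2, 1, 2, 3, 2, 2], name="室")

# Percentage of listings per room count.
share = rooms.value_counts(normalize=True).mul(100).round(2)
print(share)
```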
Price Segmentation
Listings are grouped into six price bands: under ¥1 M, ¥1-3 M, ¥3-5 M, ¥5-8 M, ¥8-10 M, and over ¥10 M. Most fall in the ¥1-3 M band (≈13,000 listings).
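One way to build such bands is `pd.cut` with explicit bin edges. The sketch below uses made-up prices in 万元 (so ¥1 M is 100). Which band a boundary value falls into is an assumption here, since the article does not say; with `right=False`, a listing at exactly ¥3 M lands in the ¥3-5 M band.

```python
import pandas as pd

# Hypothetical total prices in 万元 (100 万元 = ¥1 M).
prices = pd.Series([80, 150, 220, 299, 300, 450, 799, 950, 1500], name="价格")

bins = [0, 100, 300, 500, 800, 1000, float("inf")]
labels = ["<1M", "1-3M", "3-5M", "5-8M", "8-10M", ">10M"]

# right=False makes each band include its lower edge and exclude its upper edge.
bands = pd.cut(prices, bins=bins, labels=labels, right=False)

# sort=False keeps the bands in price order rather than by frequency.
counts = bands.value_counts(sort=False)
print(counts)
```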
Ring Road Distribution
Properties outside the outer ring constitute the largest share, reflecting higher supply in peripheral districts.
Average Price Map (2020 H1)
The map shows central districts with significantly higher average prices than outer areas.
Listing Volume Over Time
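Listing counts rise year over year, but part of this is an artifact of the data: the crawl (cut off at 23 July 2020) only captures homes that were still listed, so units listed in earlier years that have since sold are missing. Within 2020, the pandemic suppressed listings in January and February, volume rebounded from March onward, and June saw the peak.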
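A monthly series like this can be built by converting listing dates to monthly periods and counting them. The sketch below uses made-up dates and assumes the `挂牌时间` column is already a datetime; months with no listings are filled with zero so gaps stay visible.

```python
import pandas as pd

# Hypothetical listing dates.
listed = pd.Series(pd.to_datetime([
    "2019-12-31", "2020-01-05", "2020-02-11", "2020-03-02",
    "2020-03-20", "2020-06-01", "2020-06-15", "2020-06-30",
]), name="挂牌时间")

# Keep H1 2020 only.
h1 = listed[(listed >= pd.Timestamp("2020-01-01")) & (listed < pd.Timestamp("2020-07-01"))]

# Count listings per calendar month, including months with no listings.
months = pd.period_range("2020-01", "2020-06", freq="M")
monthly = h1.dt.to_period("M").value_counts().reindex(months, fill_value=0)
print(monthly)
print("Peak month:", monthly.idxmax())
```

Because the data is a snapshot of what was still on the market at crawl time, counts for earlier months undercount homes that were listed and already sold.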
Summary
Listing volume dipped in January and February 2020 and recovered from March, with the average price hovering around ¥55,100/m².
Except for Hongkou, most districts saw price stabilization in Q2.
Properties under ¥1 M are scarce; the market concentrates in the ¥1‑3 M band.
Outer‑ring districts (Jiading, Minhang, Baoshan) dominate listings, indicating higher selling pressure.
One‑room units now represent only 15.65% of the market.
Python Crawling & Data Mining
