
What Shanghai’s 2020 Second‑Hand Housing Data Reveals About Prices and Market Trends

This article analyzes a Scrapy‑Redis crawl of Lianjia data from July 2020, cleans and enriches the dataset, and uses pandas profiling and visualizations to uncover Shanghai’s second‑hand housing price distribution, regional hotspots, housing types, and market dynamics during the first half of 2020.

Python Crawling & Data Mining

Project Background

During a job interview I was asked about Shanghai’s second‑hand housing market, which prompted this analysis. Since the 2016 "no speculation" policy, housing prices have cooled, and subsequent measures in 2019 further affected supply, financing, rental‑sale balance, residency permits, and subsidies for migrant workers.

Economic data show that Shanghai’s per‑capita disposable income in H1 2020 was ¥36,577, up 3.64% year over year, while demographic data show the birth rate at its lowest level since 1949.

Analysis Purpose

Track the overall listing volume and average price trend of Shanghai second‑hand houses in H1 2020.

Identify price ranges and characteristics of current listings.

Determine which districts have the highest selling pressure.

Data Cleaning

After crawling, the dataset contains 37,491 rows and 20 columns with a few duplicate rows. Cleaning steps include:

Remove duplicate rows.

Replace None values with a placeholder.

Split fields such as region, layout, floor, and mortgage information into separate columns.

Convert data types (e.g., price to float, dates to datetime).

Strip unit symbols from area and convert to float.

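Normalize price units: prices are expressed in ten‑thousands of yuan (万), so values below 20, which are presumably quoted in hundred‑millions of yuan (亿), are multiplied by 10,000.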

Calculate per‑square‑meter price.

import pandas as pd
import pandas_profiling

# `data` is the DataFrame built from the crawl output (loading step not shown)
# Generate an exploratory profile report of the raw data
pandas_profiling.ProfileReport(data).to_file("./report.html")
# Remove duplicates
data.drop_duplicates(keep='first', inplace=True)
# Replace the literal string 'None' with a placeholder ('暂无数据' means "no data yet")
data = data.replace('None', '暂无数据')
# Split compound columns into named parts (raw strings keep \s and \d intact)
data = pd.concat([
    data,
    data['地区'].str.extract(r'(?P<区>.*?)\s(?P<镇>.*?)\s(?P<环>.*)'),
    data['房屋户型'].str.extract(r'(?P<室>\d+)室(?P<厅>\d+)厅(?P<厨>\d+)厨(?P<卫>\d+)卫'),
    data['所在楼层'].str.extract(r'(?P<所处楼层>.+)\(共(?P<总层数>\d+)层\)'),
    data['抵押信息'].str.strip().str.extract(r'(?P<有无抵押>.?)抵押(?P<抵押情况>.*)?'),
], axis=1)
# Drop the source columns that have been split
data.drop(['地区', '所在楼层', '抵押信息'], axis=1, inplace=True)
# Convert area: strip the trailing unit symbol, then cast to float
data['建筑面积'] = data['建筑面积'].str[:-1].astype(float)
# Convert price
data['价格'] = data['价格'].astype(float)
# Convert dates (unparseable last-transaction dates become NaT)
data['挂牌时间'] = pd.to_datetime(data['挂牌时间'])
data['上次交易'] = pd.to_datetime(data['上次交易'], errors='coerce')
# Normalize price unit: values below 20 are presumably in 亿 (1 亿 = 10,000 万)
data['价格'] = data['价格'].map(lambda x: x * 10000 if x < 20 else x)
# Compute unit price in yuan per m² (价格 is in 万 yuan)
data['均价'] = (data['价格'] / data['建筑面积'] * 10000).round(2)
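
To make the column‑splitting step easier to follow, here is a minimal sketch that runs the same three regular expressions on a couple of made‑up rows. The sample strings are assumptions about what the crawled fields look like, not values taken from the dataset.

```python
import pandas as pd

# Made-up sample rows (assumed to match the shape of the crawled columns).
sample = pd.DataFrame({
    "地区": ["浦东 张江 中环至外环", "徐汇 衡山路 内环内"],
    "房屋户型": ["2室1厅1厨1卫", "3室2厅1厨2卫"],
    "所在楼层": ["中楼层(共6层)", "高楼层(共18层)"],
})

# Each named group becomes one output column.
region = sample["地区"].str.extract(r"(?P<区>.*?)\s(?P<镇>.*?)\s(?P<环>.*)")
layout = sample["房屋户型"].str.extract(
    r"(?P<室>\d+)室(?P<厅>\d+)厅(?P<厨>\d+)厨(?P<卫>\d+)卫"
)
floor = sample["所在楼层"].str.extract(r"(?P<所处楼层>.+)\(共(?P<总层数>\d+)层\)")

parts = pd.concat([region, layout, floor], axis=1)
print(parts)
```

The extracted values are still strings, so numeric columns such as 室 or 总层数 would need an explicit cast before any arithmetic.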

Descriptive Analysis

After cleaning, 37,483 records remain covering 2013‑01‑18 to 2020‑07‑24. Key statistics:

Total listings: 37,483.

Area range: 13 m² – 1,663.1 m².

Highest unit price: ¥319,960.62/m²; overall mean unit price: ¥56,466.26/m².
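
Statistics of this kind can be derived from the cleaned columns roughly as follows. This is a sketch on three made‑up rows, assuming 价格 is in ten‑thousands of yuan and 建筑面积 in m²; the `summary` dictionary and its keys are illustrative names, not part of the original analysis.

```python
import pandas as pd

# Made-up cleaned rows: 价格 in 万 yuan, 建筑面积 in m².
data = pd.DataFrame({
    "建筑面积": [50.0, 80.0, 120.0],
    "价格": [300.0, 400.0, 900.0],
})
# Unit price in yuan per m², as in the cleaning step.
data["均价"] = (data["价格"] / data["建筑面积"] * 10000).round(2)

summary = {
    "listings": len(data),
    "area_min": data["建筑面积"].min(),
    "area_max": data["建筑面积"].max(),
    "unit_price_max": data["均价"].max(),
    "unit_price_mean": round(data["均价"].mean(), 2),
}
print(summary)
```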

High‑Price Listings (> ¥300,000 /m²)

Four garden‑style villas exceed ¥300,000/m²; they are located near the historic Xingguo Road and the iconic Wukang Building.

Hot Business Districts

Listing counts by business district show Zhongshan Park (674 listings, average ¥72,750/m²) as relatively affordable, while at the administrative‑district level Jiading, Minhang, and Baoshan exhibit higher selling pressure.
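
A count‑and‑average table per business district can be sketched with a single groupby. The example below uses made‑up rows and assumes that the 镇 column extracted during cleaning holds the business‑district name; the output column names `listings` and `mean_unit_price` are illustrative.

```python
import pandas as pd

# Made-up rows; 镇 is assumed to hold the business-district name.
data = pd.DataFrame({
    "镇": ["中山公园", "中山公园", "张江", "张江", "张江"],
    "均价": [70000.0, 75000.0, 60000.0, 62000.0, 64000.0],
})

by_area = (
    data.groupby("镇")["均价"]
    .agg(listings="count", mean_unit_price="mean")  # named aggregation
    .sort_values("listings", ascending=False)       # busiest areas first
)
print(by_area)
```

Grouping on 区 instead of 镇 would give the same table at the administrative‑district level.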

Housing Layout Distribution

Two‑room units dominate the market; one‑room units account for only 15.65% of listings.
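
Percentage shares like the one quoted above are a normalized value count over the room column. A minimal sketch on made‑up data, assuming the 室 column still holds strings as produced by the regex extraction:

```python
import pandas as pd

# Made-up room counts; values are strings because they come from str.extract.
data = pd.DataFrame({"室": ["1", "2", "2", "3", "2"]})

# Share of listings per room count, in percent.
share = (data["室"].value_counts(normalize=True) * 100).round(2)
print(share)
```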

Price Segmentation

Listings are grouped into six price bands: under ¥1 M, ¥1‑3 M, ¥3‑5 M, ¥5‑8 M, ¥8‑10 M, and over ¥10 M. The majority fall within the ¥1‑3 M range (≈13,000 listings).
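
One way to build such bands is `pd.cut` over the total price. The sketch below uses made‑up prices in ten‑thousands of yuan. The band labels and the handling of boundary values (left‑closed intervals, so exactly ¥3 M lands in the ¥3‑5 M band) are assumptions, since the article does not say how edges were treated.

```python
import pandas as pd

# Made-up total prices in 万 yuan (100 万 = ¥1 M).
prices = pd.Series([80.0, 150.0, 299.0, 300.0, 650.0, 1200.0], name="价格")

bins = [0, 100, 300, 500, 800, 1000, float("inf")]
labels = ["<1M", "1-3M", "3-5M", "5-8M", "8-10M", ">10M"]

# right=False makes each band include its lower edge and exclude its upper edge.
bands = pd.cut(prices, bins=bins, labels=labels, right=False)
counts = bands.value_counts(sort=False)
print(counts)
```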

Ring Road Distribution

Properties outside the outer ring constitute the largest share, reflecting higher supply in peripheral districts.

Average Price Map (2020 H1)

Visualization shows central districts with significantly higher average prices compared to outer areas.

Listing Volume Over Time

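Listing counts rise year over year, but this partly reflects how the data were collected: the crawl (cut off at 23 July 2020) captures only listings still on the market, so units listed in earlier years that have since sold are missing. Within 2020, the pandemic suppressed listings in January and February, volume rebounded from March onward, and June saw the peak.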
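
Monthly listing volume can be tallied from the 挂牌时间 column by grouping on a monthly period. This is a sketch on made‑up dates; the variable names `h1`, `monthly`, and `peak_month` are illustrative.

```python
import pandas as pd

# Made-up listing dates, already parsed to datetime as in the cleaning step.
data = pd.DataFrame({
    "挂牌时间": pd.to_datetime([
        "2020-01-15", "2020-03-02", "2020-03-20",
        "2020-06-01", "2020-06-11", "2020-06-30",
    ])
})

# Keep H1 2020 only, then count listings per calendar month.
h1 = data[(data["挂牌时间"] >= "2020-01-01") & (data["挂牌时间"] < "2020-07-01")]
monthly = h1.groupby(h1["挂牌时间"].dt.to_period("M")).size()
peak_month = monthly.idxmax()
print(monthly)
print(peak_month)
```

Months with no listings do not appear in `monthly`; reindexing against a full `pd.period_range` would make such gaps explicit.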

Summary

Listing volume dipped in January and February 2020, then rose from March to a peak in June, with the average price hovering around ¥55,100/m².

Prices stabilized in Q2 in most districts, with Hongkou the exception.

Properties under ¥1 M are scarce; the market concentrates in the ¥1‑3 M band.

Outer‑ring districts (Jiading, Minhang, Baoshan) dominate listings, indicating higher selling pressure.

One‑room units now represent only 15.65% of the market.
