How to Scrape and Analyze Beijing Dank Apartment Data with Python
This article demonstrates how to crawl 6,025 Beijing Dank Apartment listings using Python, clean and enrich the data with Pandas, and visualize distribution, price, size, floor, and subway proximity through charts, revealing key market insights and correlation patterns.
Introduction
The rapid collapse of Dank Apartment caused widespread tenant and landlord disputes in Beijing. To provide a data‑driven perspective, 6,025 apartment records from the Beijing region were scraped, cleaned, and visualized.
Data Acquisition
The website has a simple structure, so pagination URLs can be generated programmatically. A small number of pages return 404 and are filtered out. The core crawler uses requests to fetch each page and XPath expressions (via lxml) to extract fields such as price, area, ID, layout, floor, location, and subway information.
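The pagination and 404 filtering steps are described but not shown. A minimal sketch, assuming a hypothetical URL template (the real site's URL pattern is not given in this article) and an injectable status-check function so the logic can be tried without network access:

```python
from typing import Callable, Iterable, List

# Hypothetical template: the real listing URL pattern is not given in the article.
URL_TEMPLATE = "https://example.com/room/bj?page={page}"


def build_page_urls(first: int, last: int, template: str = URL_TEMPLATE) -> List[str]:
    """Generate one listing-page URL per page number, inclusive of both ends."""
    return [template.format(page=n) for n in range(first, last + 1)]


def keep_reachable(urls: Iterable[str], get_status: Callable[[str], int]) -> List[str]:
    """Keep only URLs whose HTTP status is 200, dropping 404s and other errors."""
    return [u for u in urls if get_status(u) == 200]
```

In a real run, get_status could be something like lambda u: requests.get(u, headers=headers).status_code. Passing it in as a parameter keeps the filtering logic separate from the network call.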
import random
import time

import requests
from lxml import etree

# `headers` (for example a User-Agent dict) is assumed to be defined elsewhere in the script.
def get_danke(href):
    time.sleep(random.uniform(0, 1))  # avoid overloading the server
    response = requests.get(url=href, headers=headers)
    if response.status_code == 200:
        res = response.content.decode('utf-8')
        div = etree.HTML(res)
        items = div.xpath("/html/body/div[3]/div[1]/div[2]/div[2]")
        for item in items:
            house_price = item.xpath("./div[3]/div[2]/div/span/div/text()")[0]
            house_area = item.xpath("./div[4]/div[1]/div[1]/label/text()")[0].replace('建筑面积:约', '').replace('㎡(以现场勘察为准)', '')
            house_id = item.xpath("./div[4]/div[1]/div[2]/label/text()")[0].replace('编号:', '')
            house_type = item.xpath("./div[4]/div[1]/div[3]/label/text()")[0].replace('\n', '').replace(' ', '').replace('户型:', '')
            house_floor = item.xpath("./div[4]/div[2]/div[3]/label/text()")[0].replace('楼层:', '')
            house_position_1 = item.xpath("./div[4]/div[2]/div[4]/label/div/a[1]/text()")[0]
            house_position_2 = item.xpath("./div[4]/div[2]/div[4]/label/div/a[2]/text()")[0]
            house_position_3 = item.xpath("./div[4]/div[2]/div[4]/label/div/a[3]/text()")[0]
            house_subway = item.xpath("./div[4]/div[2]/div[5]/label/text()")[0]
    else:
        # Page did not load: record every field as missing
        house_price = house_area = house_id = house_type = house_floor = house_position_1 = house_position_2 = house_position_3 = house_subway = None
Data Processing
All CSV files generated by the crawler are concatenated with pandas.concat. Duplicate rows are removed, and the price and area columns, which are read in as strings, are cast to float64. Floor information is split into the current floor and the total number of floors. The subway count is derived by counting occurrences of “号线” (“Line”) in the subway field, and the distance to the nearest subway station is extracted with a regular expression.
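The concatenation and de-duplication step is described but not shown. A minimal sketch of how it might look, using small in-memory CSVs as stand-ins for the crawler's output files (the helper name combine_csvs, the 编号 column, and the sample rows are illustrative assumptions, not the article's data):

```python
import io

import pandas as pd


def combine_csvs(sources):
    """Read each CSV source, stack the frames, and drop exact duplicate rows."""
    frames = [pd.read_csv(src) for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates().reset_index(drop=True)


# Illustrative in-memory files standing in for the crawler's CSV output.
part_a = io.StringIO("编号,价格,面积\nA1,2500,12\nA2,3100,18\n")
part_b = io.StringIO("编号,价格,面积\nA2,3100,18\nA3,1900,9\n")
df_demo = combine_csvs([part_a, part_b])
```

With real files, sources would be a list of paths, for example from glob.glob over the output directory.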
import re

# Drop rows that merely repeat the column header, then convert price and area to numeric types
jg = df['价格'] != '价格'
df = df.loc[jg, :].copy()
df['价格'] = df['价格'].astype('float64')
df['面积'] = df['面积'].astype('float64')
# Extract floor numbers
df = df[df['楼层'].notnull()].copy()
df['所在楼层'] = df['楼层'].apply(lambda x: x.split('/')[0]).astype('int32')
df['总楼层'] = df['楼层'].apply(lambda x: x.split('/')[1]).str.replace('层', '').astype('int32')
# Subway utilities
def get_subway_num(row):
    return row.count('号线')

def get_subway_distance(row):
    m = re.search(r'\d+(?=米)', row)
    return int(m.group()) if m else -1  # -1 marks "no distance found"

df['地铁数'] = df['地铁'].apply(get_subway_num)
df['距离地铁距离'] = df['地铁'].apply(get_subway_distance).astype('int32')
Data Visualization
Using matplotlib, seaborn, and pyecharts, several charts were produced:
Bar chart of apartment counts per district (Chaoyang 1,877, Tongzhou 1,027).
Top‑10 residential complexes by apartment count.
Rent distribution showing that more than 50% of units are priced between 2,000 and 3,000 CNY/month.
Floor‑level distribution (73.9% below 10 floors).
Area distribution (86.8% below 20 m²).
Word‑cloud of commercial circles highlighting popular neighborhoods.
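The percentage figures behind charts like these come from binning a numeric column and counting the share in each bin. A minimal sketch with pandas, using made-up rents rather than the scraped data (the band edges and labels are assumptions chosen for illustration):

```python
import pandas as pd

# Made-up rents (CNY/month) purely for illustration; not the article's data.
rents = pd.Series([1800, 2100, 2450, 2600, 2990, 3200, 3500, 4100])

# Left-closed bands: [0, 2000), [2000, 3000), [3000, 4000), [4000, inf)
bands = pd.cut(
    rents,
    bins=[0, 2000, 3000, 4000, float("inf")],
    right=False,
    labels=["<2000", "2000-3000", "3000-4000", ">=4000"],
)

# Share of listings per band, in percent.
share = bands.value_counts(normalize=True, sort=False) * 100
```

Calling share.plot(kind="bar") would then draw the distribution with matplotlib. The same pattern applies to the floor and area columns.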
Correlation Analysis
The correlation matrix shows that price correlates most strongly with apartment area (0.81) and, more weakly, with the number of nearby subway lines (0.36), while floor level has little relationship with price.
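To make the reading of such coefficients concrete, here is a toy example on synthetic values (not the article's data). Area rises with price, so its coefficient comes out close to 1, while the floor values are arranged so that their coefficient is 0:

```python
import pandas as pd

# Synthetic values for illustration only; not taken from the scraped data.
demo = pd.DataFrame({
    "价格": [2000.0, 2500.0, 3000.0, 3500.0],
    "面积": [10.0, 12.0, 15.0, 17.0],
    "所在楼层": [3, 1, 1, 3],
})

# Pearson correlation of every other column with price.
price_corr = demo.corr()["价格"].drop("价格")
```

A coefficient describes how closely two columns move together linearly. It does not by itself show that one causes the other.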
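import seaborn as sns

color_map = sns.light_palette('orange', as_cmap=True)
# numeric_only=True (pandas 1.5+) skips text columns such as 楼层 and 地铁
df.corr(numeric_only=True).style.background_gradient(color_map)
Conclusion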
The analysis reveals that most Dank Apartments in Beijing are small, low‑rise units concentrated in Chaoyang and Tongzhou, with rent tracking unit size closely and subway accessibility more weakly. These insights help tenants and landlords understand market dynamics during the crisis.
