How to Scrape and Analyze 11 Years of Beijing Weather Data with Python
This article demonstrates how to collect historical Beijing weather data using Python web‑scraping, clean and transform it with pandas, visualize overall and monthly patterns, and extract insights such as the first winter snowfall each year and annual snow‑day counts.
Introduction
Using Python we scrape historical weather data for Beijing, explore whether the 2021 winter snow arrived earlier than usual, and perform a full data‑analysis workflow.
1. First Winter Snow Dates (Past 11 Years)
From 2011 to 2021 the first winter snowfall typically occurred in late November. In 2021 the first snow fell on November 6, earlier than usual, and it was a heavy snowfall compared with the first snows of previous years.
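One way to put a number on "earlier than usual" is to compare the day of the year of each first snowfall. The sketch below assumes a cleaned DataFrame with a datetime 日期 (date) column and a 天气 (weather description) column; the helper name `first_snow_by_year` is hypothetical, and apart from November 6, 2021 the sample dates are made-up placeholders, not real observations.

```python
import pandas as pd

def first_snow_by_year(df: pd.DataFrame) -> pd.Series:
    """Earliest snowy date on or after September 1, per calendar year."""
    snowy = df[df['天气'].str.contains('雪') & (df['日期'].dt.month >= 9)]
    return snowy.groupby(snowy['日期'].dt.year)['日期'].min()

# Placeholder rows for illustration only (not real observations,
# except that the source reports first snow on 2021-11-06)
sample = pd.DataFrame({
    '日期': pd.to_datetime(['2019-02-12', '2019-11-29', '2019-12-16',
                           '2020-11-21', '2021-11-06', '2021-11-07']),
    '天气': ['小雪', '小雪', '中雪', '雨夹雪', '大雪', '小雪'],
})

first = first_snow_by_year(sample)
day_of_year = first.dt.dayofyear

# How many days earlier 2021 was than the median of the other years
days_early = day_of_year.drop(2021).median() - day_of_year.loc[2021]
print(first)
print(f'2021 first snow was {days_early} days earlier than the median')
```

Day-of-year shifts by one in leap years, so this comparison is only approximate; it is good enough for a difference measured in weeks.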
2. 2021 Beijing Weather Data Overview
2.1 Overall Weather Distribution
Out of 304 days (January 1 to October 31, 2021), more than 73% were sunny or partly cloudy, about 18% were overcast or smoggy, and roughly 8% were rainy.
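Shares like these can be obtained by bucketing each day's 天气 description into a few coarse categories and normalizing the counts. The following is a minimal sketch: the `categorize` helper and its keyword rules are assumptions (the article does not state how descriptions were grouped), and the sample rows are made up.

```python
import pandas as pd

def categorize(desc: str) -> str:
    """Map a raw weather description to a coarse bucket (hypothetical rules)."""
    if '雪' in desc:                       # snow
        return 'snow'
    if '雨' in desc:                       # rain
        return 'rain'
    if any(ch in desc for ch in '阴霾雾'):  # overcast, haze, fog
        return 'overcast/smog'
    return 'sunny/cloudy'

# Made-up sample rows standing in for the real 2021 data
sample = pd.DataFrame({'天气': ['晴', '多云', '阴', '小雨', '霾', '晴', '多云', '小雪']})

# Percentage of days in each bucket
shares = (sample['天气'].map(categorize)
          .value_counts(normalize=True)
          .mul(100)
          .round(1))
print(shares)
```

Checking for 雪 before 雨 means a mixed description such as 雨夹雪 (sleet) is counted as snow; a different ordering would be an equally reasonable choice.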
2.2 Monthly Weather Distribution
Rainy days concentrate in May‑August, while haze is most common in February‑March.
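A month-by-category table makes this kind of pattern easy to read off. Here is a small sketch using `pd.crosstab`; the `category` column is assumed to hold a coarse weather bucket per day (it is not a column in the scraped data), and the rows are made up for illustration.

```python
import pandas as pd

# Made-up rows: month number and a coarse weather category per day
sample = pd.DataFrame({
    '月份': [2, 2, 3, 5, 7, 7, 8],
    'category': ['haze', 'sunny', 'haze', 'rain', 'rain', 'rain', 'sunny'],
})

# Rows are months, columns are categories, cells are day counts
monthly = pd.crosstab(sample['月份'], sample['category'])
print(monthly)

# Month with the most rainy days in this toy sample
rainiest = monthly['rain'].idxmax()
print('Rainiest month:', rainiest)
```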
2.3 Monthly Temperature Trends
June‑August are the high‑temperature months, and January can drop to –20 °C.
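Monthly temperature extremes can be summarized with a grouped aggregation. The sketch below assumes the cleaned integer columns 最高气温 (daily high) and 最低气温 (daily low) plus a 月份 (month) column; the output column names and the sample values are made up for illustration.

```python
import pandas as pd

# Made-up daily highs and lows in °C for two months
sample = pd.DataFrame({
    '月份': [1, 1, 7, 7],
    '最高气温': [2, -3, 35, 33],
    '最低气温': [-8, -12, 24, 23],
})

# One row per month: hottest high, coldest low, and average high
monthly_temp = sample.groupby('月份').agg(
    highest=('最高气温', 'max'),
    lowest=('最低气温', 'min'),
    mean_high=('最高气温', 'mean'),
)
print(monthly_temp)
```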
3. Data Collection
Data is fetched from the Historical Weather pages at lishi.tianqi.com, one page per month, using a simple XPath-based scraper built on requests and lxml.
import requests
from lxml import etree
import pandas as pd


def get_html(month):
    """Download one month's history page (month as YYYYMM) and parse it."""
    headers = {
        "Accept-Encoding": "gzip",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
    }
    url = f'https://lishi.tianqi.com/beijing/{month}.html'
    r = requests.get(url, headers=headers)
    return etree.HTML(r.text)


# Months from January 2011 through October 2021, formatted as YYYYMM
month_list = pd.period_range('201101', '202110', freq='M').strftime('%Y%m')

# Collect one dict per day; DataFrame.append was removed in pandas 2.0,
# so the rows are gathered in a list and converted once at the end.
rows = []
for i, month in enumerate(month_list):
    r_html = get_html(month)
    div = r_html.xpath('.//div[@class="tian_three"]')[0]
    for li in div.xpath('.//li'):
        cells = li.xpath('./div[@class="th140"]/text()')
        rows.append({
            '日期': li.xpath('./div[@class="th200"]/text()')[0],  # date (with weekday)
            '最高气温': cells[0],  # daily high
            '最低气温': cells[1],  # daily low
            '天气': cells[2],      # weather description
            '风向': cells[3],      # wind direction
        })
    print(f'{i+1}/{len(month_list)} months data collected')

df = pd.DataFrame(rows, columns=['日期', '最高气温', '最低气温', '天气', '风向'])
df.to_excel(r'北京历史天气数据.xlsx', index=False)
4. Data Processing
We load the Excel file with pandas and perform several cleaning steps:
Split the 日期 (date) column into separate date and weekday (星期) columns.
Remove the ℃ symbol from the temperature columns 最高气温 (daily high) and 最低气温 (daily low).
Add a flag column 是否有雪 ("has snow"), set to 是 (yes) when the weather description in 天气 contains the character “雪” (snow) and 否 (no) otherwise.
Convert data types: 日期 to datetime, temperature columns to integers.
Extract year, month, and day into separate columns (年份, 月份, 日).
import pandas as pd

df = pd.read_excel('北京历史天气数据.xlsx')

# Split "date weekday" into two columns
df[['日期', '星期']] = df['日期'].str.split(' ', expand=True, n=1)
# Remove the ℃ unit so the temperatures can be cast to integers
df[['最高气温', '最低气温']] = df[['最高气温', '最低气温']].apply(lambda x: x.str.replace('℃', ''))
# Snow flag: 是 (yes) if the description mentions 雪, otherwise 否 (no).
# Only this column is filled, so missing values elsewhere are not masked.
has_snow = df['天气'].str.contains('雪', na=False)
df['是否有雪'] = has_snow.map({True: '是', False: '否'})
# Type conversion
df['日期'] = pd.to_datetime(df['日期'])
df[['最高气温', '最低气温']] = df[['最高气温', '最低气温']].astype(int)
# Year, month, day
df['年份'] = df['日期'].dt.year
df['月份'] = df['日期'].dt.month
df['日'] = df['日期'].dt.day
5. Finding the First Winter Snow Each Year
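# Keep only snowy days, sorted by date so that first() returns the earliest one
snowData = df[df['是否有雪'] == '是'].sort_values('日期')
# Earliest snowy day from September onward in each calendar year
firstSnow = snowData[snowData['月份'] >= 9].groupby('年份').first().reset_index()
6. Snow Days per Year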
snow_counts = snowData.groupby('年份')['日期'].count().to_frame('下雪天数').reset_index()
Example result (2011-2021):
2011: 11 days
2012: 13 days
2013: 15 days
2014: 6 days
2015: 15 days
2016: 6 days
2017: 6 days
2018: 2 days
2019: 2 days
2020: 6 days
2021: 1 day
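To make the trend in these counts easier to see, the figures listed above can be averaged over two five-year windows. This is a small sketch in plain Python; the `mean_over` helper is hypothetical, and 2021 is left out of the averages because the scraped range ends in October 2021, so that year is incomplete.

```python
# Snow-day counts per year, copied from the result list above
snow_days = {
    2011: 11, 2012: 13, 2013: 15, 2014: 6, 2015: 15,
    2016: 6, 2017: 6, 2018: 2, 2019: 2, 2020: 6, 2021: 1,
}

def mean_over(years):
    """Average number of snow days across the given years."""
    years = list(years)
    return sum(snow_days[y] for y in years) / len(years)

early = mean_over(range(2011, 2016))  # 2011-2015
late = mean_over(range(2016, 2021))   # 2016-2020 (2021 is a partial year)

print(f'2011-2015: {early:.1f} snow days per year')
print(f'2016-2020: {late:.1f} snow days per year')
```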
