Uncovering Tianjin’s Bus Network: From Raw GPS Data to Complex Network Insights
This article walks through acquiring Tianjin bus line data via the Gaode Map API, cleaning and converting geographic coordinates, visualizing station distributions with matplotlib and Baidu maps, and then applying complex‑network analysis to reveal degree distributions, clustering coefficients, and small‑world characteristics of the city’s public‑transport system.
1. Data Viewing and Preprocessing
Data obtained from the Gaode Map API includes bus line names, direction (0 for up, 1 for down), station order, station name, and longitude/latitude expressed in minutes. The original file contains 30,396 records; five station names, one latitude, and 38 longitude entries are missing and are removed for simplicity.
import pandas as pd
df = pd.read_excel('site_information.xlsx')
print(df.head())
Field description:
线路名称 – bus line name
上下行 – 0 for up, 1 for down
站序号 – station sequence number
站名称 – station name
经度(分) – longitude (minutes)
纬度(分) – latitude (minutes)
After cleaning, the dataset contains 618 bus lines and 4,851 stations.
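The cleaning step itself is not shown in the source. The following is a minimal sketch of how it might look, assuming rows with any missing field are simply dropped with `dropna()`. The toy rows stand in for the real spreadsheet (station names and coordinate values are made up for illustration), and the `lon_deg`/`lat_deg` column names are hypothetical.

```python
import pandas as pd

# Toy stand-in for site_information.xlsx; all values are illustrative only.
raw = pd.DataFrame({
    '线路名称': ['1路', '1路', '2路', '2路'],
    '上下行': [0, 0, 0, 1],
    '站序号': [1, 2, 1, 1],
    '站名称': ['A站', None, 'B站', 'C站'],
    '经度(分)': [7032.0, 7033.2, None, 7030.8],
    '纬度(分)': [2347.8, 2348.4, 2346.6, 2349.0],
})

# Count missing values per column before deciding what to drop.
missing_per_column = raw.isna().sum()

# Drop every row that has at least one missing field.
cleaned = raw.dropna().reset_index(drop=True)

# Convert arc minutes to decimal degrees (1 degree = 60 minutes).
cleaned['lon_deg'] = cleaned['经度(分)'] / 60
cleaned['lat_deg'] = cleaned['纬度(分)'] / 60

print(missing_per_column)
print(cleaned[['站名称', 'lon_deg', 'lat_deg']])
```

In this toy table, 7032 minutes of longitude becomes 117.2 degrees, which is the kind of value expected for Tianjin.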
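# Drop rows with any missing field (assumed here to be a plain dropna())
df1 = df.dropna()
# Convert coordinates from minutes to decimal degrees; column names are kept unchanged
df2 = df1.copy()
df2['经度(分)'] = df1['经度(分)'].apply(float) / 60
df2['纬度(分)'] = df1['纬度(分)'].apply(float) / 60
print(df2.head())
df2.to_excel('处理后数据.xlsx', index=False)
2. Data Analysis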
Using matplotlib to plot a scatter diagram of longitude versus latitude reveals clear hotspots in Heping and Nankai districts.
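The hotspots can also be checked numerically, independently of the plotting script that follows, by counting stations per grid cell. This is a minimal sketch, not the author's method: the helper name `densest_cells`, the 0.01-degree cell size, and the sample points are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def densest_cells(points, cell_deg=0.01, top=3):
    """Bin (lon, lat) points into square cells and return the busiest cells."""
    counts = Counter(
        (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        for lon, lat in points
    )
    return counts.most_common(top)

# Illustrative coordinates in decimal degrees, not real Tianjin stations.
points = [
    (117.201, 39.131), (117.203, 39.134), (117.207, 39.138),  # one dense cell
    (117.251, 39.081),                                        # isolated point
    (117.305, 39.205), (117.306, 39.206),                     # second cluster
]

top_cells = densest_cells(points, top=2)
for (ix, iy), n in top_cells:
    # Report the south-west corner of each cell in degrees.
    print(f"cell at ({ix * 0.01:.2f}, {iy * 0.01:.2f}) holds {n} stations")
```

On the real data, `points` would be built from the converted longitude and latitude columns; cells with the highest counts correspond to the densest patches of the scatter plot.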
# -*- coding: UTF-8 -*-
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import random
df = pd.read_excel('处理后数据.xlsx')
x_data = df['经度(分)']
y_data = df['纬度(分)']
colors = [random.choice(['#FF0000', '#0000CD', '#00BFFF', '#008000', '#FF1493', '#FFD700', '#FF4500', '#00FA9A', '#191970', '#9932CC']) for _ in range(len(x_data))]
mpl.rcParams['font.family'] = 'SimHei'
plt.style.use('ggplot')
plt.figure(figsize=(12, 6), dpi=200)
plt.scatter(x_data, y_data, marker='o', s=9., c=colors)
plt.xlabel('经度')
plt.ylabel('纬度')
plt.title('天津市公交站点分布情况')
plt.savefig('经纬度散点图.png')
plt.show()
Mapping the same data on an actual map with pyecharts BMap shows dense line networks in the same districts.
# -*- coding: UTF-8 -*-
import pandas as pd
from pyecharts.charts import BMap
from pyecharts import options as opts
from pyecharts.globals import CurrentConfig
CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'
df = pd.read_excel('处理后数据.xlsx')
df.drop_duplicates(subset='站名称', inplace=True)
longitude = list(df['经度(分)'])
latitude = list(df['纬度(分)'])
datas = [list(zip(longitude, latitude))]
BAIDU_MAP_AK = '改成你的百度地图AK'
c = (BMap(init_opts=opts.InitOpts(width='1200px', height='800px'))
     .add_schema(baidu_ak=BAIDU_MAP_AK, center=[117.20, 39.13], zoom=10, is_roam=True)
     .add('', type_='lines', is_polyline=True, data_pair=datas,
          linestyle_opts=opts.LineStyleOpts(opacity=0.2, width=0.5, color='red'),
          progressive=200, progressive_threshold=500))
c.render('公交网络地图.html')
Network Degree Analysis
In the line network, each bus line is a node, and two lines are linked when they share at least one station, so a line's degree equals the number of other lines reachable via a single transfer. The network contains 618 lines; the maximum degree is 175, the minimum is 0, and the average degree is 55.41. Most degrees lie between 7 and 26.
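Before the full script, the degree definition can be illustrated on a toy network. This is a minimal sketch: the line names, station names, and the helper `line_degrees` are hypothetical and only mirror the shared-station rule described above.

```python
from itertools import combinations

# Hypothetical toy data: line name -> stations served (one direction only).
toy_lines = {
    'L1': ['A', 'B', 'C'],
    'L2': ['C', 'D'],
    'L3': ['D', 'E'],
    'L4': ['X', 'Y'],  # shares no station with any other line
}

def line_degrees(lines):
    """Return {line: number of other lines sharing at least one station}."""
    station_sets = {name: set(stops) for name, stops in lines.items()}
    degree = {name: 0 for name in lines}
    for a, b in combinations(lines, 2):
        # A non-empty intersection means a transfer between a and b is possible.
        if station_sets[a] & station_sets[b]:
            degree[a] += 1
            degree[b] += 1
    return degree

degrees = line_degrees(toy_lines)
average_degree = sum(degrees.values()) / len(degrees)
print(degrees)
print(average_degree)
```

Here L2 has degree 2 because it shares station C with L1 and station D with L3, while the isolated L4 has degree 0, which is how a minimum degree of 0 can arise in the real network.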
# -*- coding: UTF-8 -*-
import xlrd
import pandas as pd
import collections
import matplotlib.pyplot as plt
import matplotlib as mpl
# Load raw data
df = pd.read_excel('site_information.xlsx')
line_names = df['线路名称'].unique()
line_list = list(line_names)
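# Build station dictionary for each line (upward direction only)
# Note: xlrd 2.0 and later read only .xls files, so this step needs xlrd < 2.0
data = xlrd.open_workbook('site_information.xlsx')
table = data.sheets()[0]
site_dic = {k: [] for k in line_list}
for i in range(1, table.nrows):
    row = table.row_values(i)
    # Assumes the direction column is stored as text; xlrd returns numeric cells as floats
    if row[1] == '0':  # up direction
        site_dic[row[0]].append(row[3])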
# Initialize degree list
node_count = [0] * len(line_list)
sites = list(site_dic.values())
# Compute degree by checking shared stations between lines
for j in range(len(sites)):
    for k in range(j + 1, len(sites)):
        if set(sites[j]) & set(sites[k]):
            node_count[j] += 1
            node_count[k] += 1
print(f"公交网络共有 {len(line_list)} 条线路")
print(f"线路网络的度的最大值为:{max(node_count)}")
print(f"线路网络的度的最小值为:{min(node_count)}")
print(f"线路网络的度的平均值为:{sum(node_count) / len(node_count)}")
# Plot degree distribution
node_number = list(range(len(node_count)))
mpl.rcParams['font.family'] = 'SimHei'
plt.figure(figsize=(10, 6), dpi=150)
plt.bar(node_number, node_count, color='purple')
plt.xlabel('节点编号n')
plt.ylabel('节点的度数K')
plt.title('线路网络中各节点的度的大小分布')
plt.savefig('线路网络中各节点的度的大小.png')
plt.show()
Clustering Coefficient
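The clustering coefficient measures how tightly a node’s neighbors are connected to one another. The average clustering coefficient of Tianjin’s bus network is 0.0906, far higher than that of a comparable random network (≈0.00044). High clustering relative to a random network is one of the two hallmarks of a small‑world network (the other is a short average path length), so this result is consistent with a small‑world property.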
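For a node with k neighbors and E links among those neighbors, the local clustering coefficient is 2E / (k(k − 1)), and it is taken as 0 when k is less than 2. The toy example below is a minimal sketch of that formula ahead of the full script; the adjacency data and the helper `local_clustering` are hypothetical.

```python
from itertools import combinations

def local_clustering(adj):
    """Return {node: 2*E / (k*(k-1))}, or 0.0 for nodes with fewer than 2 neighbors."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        # Count how many pairs of neighbors are themselves linked.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        result[node] = 2 * links / (k * (k - 1))
    return result

# Hypothetical toy line network: a triangle (L1, L2, L3) plus a pendant L4.
adj = {
    'L1': {'L2', 'L3'},
    'L2': {'L1', 'L3', 'L4'},
    'L3': {'L1', 'L2'},
    'L4': {'L2'},
}

coeffs = local_clustering(adj)
average = sum(coeffs.values()) / len(coeffs)
print(coeffs)
print(round(average, 4))
```

L1 and L3 each score 1.0 because their two neighbors are linked, L2 scores 1/3 because only one of its three neighbor pairs is linked, and L4 scores 0, giving an average of 7/12.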
# Compute clustering coefficient for each node
Ei = [] # actual number of edges among neighbors
for a in range(len(sites)):
    neighbor = []
    if node_count[a] <= 1:
        Ei.append(0)
        continue
    for b in range(len(sites)):
        if a == b:
            continue
        if set(sites[a]) & set(sites[b]):
            neighbor.append(b)
    # Count edges among neighbors
    count = 0
    for i in range(len(neighbor)):
        for j in range(i + 1, len(neighbor)):
            if set(sites[neighbor[i]]) & set(sites[neighbor[j]]):
                count += 1
    Ei.append(count)
# Clustering coefficient per node
Ci = []
for m in range(len(node_count)):
    if node_count[m] <= 1:
        Ci.append(0)
    else:
        Ci.append(2 * Ei[m] / (node_count[m] * (node_count[m] - 1)))
print("天津市公交线路网络平均聚类系数为:{:.4f}".format(sum(Ci) / len(Ci)))
# Plot clustering coefficient distribution
mpl.rcParams['font.family'] = 'SimHei'
plt.figure(figsize=(10, 6), dpi=150)
plt.bar(range(len(Ci)), Ci, color='blue')
plt.xlabel('节点编号n')
plt.ylabel('节点的聚类系数')
plt.title('线路网络中各节点的聚类系数分布')
plt.savefig('聚类系数分布.png')
plt.show()
Reference: "Complex Network Analysis of Tianjin Public Transport" and related studies on urban bus network topology.