Uncovering Tianjin’s Bus Network: From Raw GPS Data to Complex Network Insights

This article walks through acquiring Tianjin bus line data via the Gaode Map API, cleaning and converting geographic coordinates, visualizing station distributions with matplotlib and Baidu maps, and then applying complex‑network analysis to reveal degree distributions, clustering coefficients, and small‑world characteristics of the city’s public‑transport system.

Python Crawling & Data Mining

Mar 3, 2021

1. Data Viewing and Preprocessing

Data obtained from the Gaode Map API includes the bus line name, direction (0 for the up direction, 1 for the down direction), station sequence number, station name, and longitude and latitude expressed in arc-minutes. The original file contains 30,396 records. Five station names, one latitude value, and 38 longitude values are missing; for simplicity, the affected rows are dropped.

import pandas as pd

df = pd.read_excel('site_information.xlsx')
print(df.head())

Field description:

线路名称 – bus line name

上下行 – 0 for up, 1 for down

站序号 – station sequence number

站名称 – station name

经度（分） – longitude (minutes)

纬度（分） – latitude (minutes)
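The missing-value counts quoted earlier can be checked before any rows are dropped. The following is a minimal sketch, assuming the column names listed here; the three-row DataFrame is made-up illustration data, not the article's file.

```python
import pandas as pd

# Toy stand-in for site_information.xlsx (hypothetical values)
toy = pd.DataFrame({
    '线路名称': ['1路', '1路', '2路'],
    '上下行': [0, 0, 1],
    '站序号': [1, 2, 1],
    '站名称': ['A站', None, 'B站'],
    '经度（分）': [7032.0, 7033.2, None],
    '纬度（分）': [2347.8, 2348.4, 2349.0],
})

# Number of missing values in each column
missing = toy.isnull().sum()
print(missing)

# Drop every row that has at least one missing field
cleaned = toy.dropna().reset_index(drop=True)
print(len(toy), '->', len(cleaned))
```

Running the same two calls on the real file should reproduce the per-column counts before the rows are discarded.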

After cleaning, the dataset contains 618 bus lines and 4,851 stations.

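# Drop rows with any missing field
df1 = df.dropna()

# Convert arc-minutes to decimal degrees (the column names are kept unchanged)
df2 = df1.copy()
df2['经度（分）'] = df1['经度（分）'].astype(float) / 60
df2['纬度（分）'] = df1['纬度（分）'].astype(float) / 60
print(df2.head())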

df2.to_excel('处理后数据.xlsx', index=False)
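As a quick sanity check on the division by 60: one degree is 60 arc-minutes, so a longitude of 7032 minutes corresponds to 117.2 degrees. The sketch below illustrates this; the helper name `minutes_to_degrees` and the sample values are hypothetical and chosen only to land near the map center used later in the article.

```python
def minutes_to_degrees(minutes):
    """Convert an angle given in arc-minutes to decimal degrees."""
    return float(minutes) / 60

lon = minutes_to_degrees(7032)       # numeric input
lat = minutes_to_degrees('2347.8')   # string input is converted first

print(round(lon, 2), round(lat, 2))
```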

2. Data Analysis

Using matplotlib to plot a scatter diagram of longitude versus latitude reveals clear hotspots in Heping and Nankai districts.

# -*- coding: UTF-8 -*-
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import random

df = pd.read_excel('处理后数据.xlsx')
x_data = df['经度（分）']
y_data = df['纬度（分）']
colors = [random.choice(['#FF0000', '#0000CD', '#00BFFF', '#008000', '#FF1493', '#FFD700', '#FF4500', '#00FA9A', '#191970', '#9932CC']) for _ in range(len(x_data))]
mpl.rcParams['font.family'] = 'SimHei'
plt.style.use('ggplot')
plt.figure(figsize=(12, 6), dpi=200)
plt.scatter(x_data, y_data, marker='o', s=9., c=colors)
plt.xlabel('经度')
plt.ylabel('纬度')
plt.title('天津市公交站点分布情况')
plt.savefig('经纬度散点图.png')
plt.show()

Mapping the same data on an actual map with pyecharts BMap shows dense line networks in the same districts.

# -*- coding: UTF-8 -*-
import pandas as pd
from pyecharts.charts import BMap
from pyecharts import options as opts
from pyecharts.globals import CurrentConfig

CurrentConfig.ONLINE_HOST = 'D:/python/pyecharts-assets-master/assets/'

df = pd.read_excel('处理后数据.xlsx')
df.drop_duplicates(subset='站名称', inplace=True)
longitude = list(df['经度（分）'])
latitude = list(df['纬度（分）'])

datas = [list(zip(longitude, latitude))]
BAIDU_MAP_AK = '改成你的百度地图AK'

c = (BMap(init_opts=opts.InitOpts(width='1200px', height='800px'))
     .add_schema(baidu_ak=BAIDU_MAP_AK, center=[117.20, 39.13], zoom=10, is_roam=True)
     .add('', type_='lines', is_polyline=True, data_pair=datas,
          linestyle_opts=opts.LineStyleOpts(opacity=0.2, width=0.5, color='red'),
          progressive=200, progressive_threshold=500))

c.render('公交网络地图.html')

Network Degree Analysis

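In the line network, each bus line is a node, and two lines are connected when they share at least one station. The degree of a node is therefore the number of other lines reachable with a single transfer. The network contains 618 lines; the maximum degree is 175, the minimum is 0, and the average degree is 55.41. Most degrees lie between 7 and 26.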

# -*- coding: UTF-8 -*-
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

# Load raw data with pandas (xlrd 2.0+ no longer reads .xlsx files)
df = pd.read_excel('site_information.xlsx')
line_list = list(df['线路名称'].dropna().unique())

# Build the set of station names for each line (up direction only).
# to_numeric handles a direction flag stored as 0, 0.0, or '0'.
up = df[pd.to_numeric(df['上下行'], errors='coerce') == 0]
up = up.dropna(subset=['线路名称', '站名称'])
site_dic = {k: set() for k in line_list}
for line, station in zip(up['线路名称'], up['站名称']):
    site_dic[line].add(station)

# Initialize degree list
node_count = [0] * len(line_list)
sites = list(site_dic.values())

# Compute degree: two lines are connected if they share at least one station
for j in range(len(sites)):
    for k in range(j + 1, len(sites)):
        if sites[j] & sites[k]:
            node_count[j] += 1
            node_count[k] += 1

print(f"公交网络共有 {len(line_list)} 条线路")
print(f"线路网络的度的最大值为：{max(node_count)}")
print(f"线路网络的度的最小值为：{min(node_count)}")
print(f"线路网络的度的平均值为：{sum(node_count) / len(node_count)}")

# Plot degree distribution
node_number = list(range(len(node_count)))
mpl.rcParams['font.family'] = 'SimHei'
plt.figure(figsize=(10, 6), dpi=150)
plt.bar(node_number, node_count, color='purple')
plt.xlabel('节点编号n')
plt.ylabel('节点的度数K')
plt.title('线路网络中各节点的度的大小分布')
plt.savefig('线路网络中各节点的度的大小.png')
plt.show()
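The degree script compares every pair of lines. An alternative sketch, not the article's method, first builds a station-to-lines index and then links only the lines that actually meet at a station. The toy `site_dic` and the helper name `line_degrees` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def line_degrees(site_dic):
    """Return {line: number of other lines sharing at least one station}."""
    # Invert the mapping: station -> set of lines that stop there
    station_lines = defaultdict(set)
    for line, stations in site_dic.items():
        for station in stations:
            station_lines[station].add(line)

    # Two lines are neighbours if they appear together at any station
    neighbours = {line: set() for line in site_dic}
    for lines in station_lines.values():
        for a, b in combinations(sorted(lines), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {line: len(adj) for line, adj in neighbours.items()}

# Toy data (hypothetical): line name -> station names
toy = {
    '1路': ['A', 'B', 'C'],
    '2路': ['C', 'D'],
    '3路': ['D', 'E'],
    '4路': [],  # a line with no recorded stations ends up with degree 0
}
degrees = line_degrees(toy)
print(degrees)
```

Storing neighbours as sets means a pair of lines that share several stations is still counted once, which matches the definition of degree used in this section.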

Clustering Coefficient

The clustering coefficient measures how tightly a node’s neighbors are connected to one another. The average clustering coefficient of Tianjin’s bus line network is 0.0906, far higher than that of a comparable random network (≈0.00044). High clustering relative to a random network is one of the hallmarks of a small‑world network.

# Compute clustering coefficient for each node
# (continues the degree script: reuses `sites` and `node_count`)
Ei = []  # actual number of edges among neighbors
for a in range(len(sites)):
    neighbor = []
    if node_count[a] <= 1:
        Ei.append(0)
        continue
    for b in range(len(sites)):
        if a == b:
            continue
        if set(sites[a]) & set(sites[b]):
            neighbor.append(b)
    # Count edges among neighbors
    count = 0
    for i in range(len(neighbor)):
        for j in range(i + 1, len(neighbor)):
            if set(sites[neighbor[i]]) & set(sites[neighbor[j]]):
                count += 1
    Ei.append(count)

# Clustering coefficient per node
Ci = []
for m in range(len(node_count)):
    if node_count[m] <= 1:
        Ci.append(0)
    else:
        Ci.append(2 * Ei[m] / (node_count[m] * (node_count[m] - 1)))

print("天津市公交线路网络平均聚类系数为：{:.4f}".format(sum(Ci) / len(Ci)))

# Plot clustering coefficient distribution
mpl.rcParams['font.family'] = 'SimHei'
plt.figure(figsize=(10, 6), dpi=150)
plt.bar(range(len(Ci)), Ci, color='blue')
plt.xlabel('节点编号n')
plt.ylabel('节点的聚类系数')
plt.title('线路网络中各节点的聚类系数分布')
plt.savefig('聚类系数分布.png')
plt.show()
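The per-node value computed in this section is C_i = 2·E_i / (k_i·(k_i − 1)), where k_i is the node's degree and E_i is the number of edges among its neighbours. The toy check below uses a hand-built graph; the adjacency data and the helper name `clustering` are hypothetical, not Tianjin data.

```python
from itertools import combinations

def clustering(adj):
    """Local clustering coefficient for each node of an undirected graph."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k <= 1:
            # Same convention as the script: fewer than two neighbours -> 0
            result[node] = 0.0
            continue
        # Count the edges that exist among this node's neighbours
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        result[node] = 2 * links / (k * (k - 1))
    return result

# Triangle a-b-c plus a pendant node d attached to c
adj = {
    'a': {'b', 'c'},
    'b': {'a', 'c'},
    'c': {'a', 'b', 'd'},
    'd': {'c'},
}
C = clustering(adj)
average = sum(C.values()) / len(C)
print(C, round(average, 4))
```

Nodes a and b sit in a closed triangle and score 1, node c has only one of its three possible neighbour pairs connected and scores 1/3, and the pendant node d scores 0.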

Reference: "Complex Network Analysis of Tianjin Public Transport" and related studies on urban bus network topology.
