
Using TransBigData for Python Transportation Data Analysis and Visualization

This article demonstrates how to install the TransBigData Python package and use it for preprocessing, grid‑based aggregation, OD extraction, and both static and interactive visualizations of taxi GPS data, showcasing code examples and detailed explanations for each step.

Python Programming Learning Circle

1. Introduction

The TransBigData library is an open‑source Python package designed for efficient processing, analysis, and visualization of transportation spatio‑temporal big data such as taxi GPS, bike‑share, and bus GPS records. It provides concise, high‑performance functions that simplify complex data‑handling tasks.

2. Installation

Install the library from PyPI with pip:

<code>pip install -U transbigdata</code>

or from conda-forge with conda:

<code>conda install -c conda-forge transbigdata</code>

After installation, import the package in Python:

<code>import transbigdata as tbd</code>

3. Data Pre‑processing

Read raw taxi GPS data with pandas, load the study area shapefile with GeoPandas, and then apply built‑in cleaning functions:

<code>import pandas as pd
import geopandas as gpd

data = pd.read_csv('TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum','time','lon','lat','OpenStatus','Speed']

sz = gpd.read_file(r'sz/sz.shp')

# Remove points outside the study area
data = tbd.clean_outofshape(data, sz, col=['lon','lat'], accuracy=500)

# Remove instantaneous status changes
data = tbd.clean_taxi_status(data, col=['VehicleNum','time','OpenStatus'])</code>
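For intuition about the second cleaning step, here is a minimal pure-Python sketch of removing instantaneous status changes. It assumes an "instantaneous" change means a single record whose OpenStatus differs from both neighbouring records of the same vehicle while those neighbours agree with each other (a 0, 1, 0 blip). This illustrates the idea only and is not TransBigData's source; the function name drop_instant_status_flips is hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (vehicle_id, time, open_status),
# assumed to be sorted by vehicle, then by time.
Record = Tuple[str, str, int]


def drop_instant_status_flips(records: List[Record]) -> List[Record]:
    """Drop a record whose status differs from both neighbours of the same
    vehicle while those neighbours agree (e.g. statuses 0, 1, 0)."""
    kept = []
    for i, rec in enumerate(records):
        if 0 < i < len(records) - 1:
            prev, nxt = records[i - 1], records[i + 1]
            same_vehicle = prev[0] == rec[0] == nxt[0]
            # Chained comparison: neighbours agree AND differ from this record
            if same_vehicle and prev[2] == nxt[2] != rec[2]:
                continue  # isolated one-record flip: treat as noise
        kept.append(rec)
    return kept
```

Comparing against the original neighbours rather than the already-filtered list keeps each decision local to a three-record window.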

4. Grid Generation

Define a rectangular grid over the study region and obtain its parameters. Here bounds is [min lon, min lat, max lon, max lat], and accuracy is the cell size in meters:

<code>bounds = [113.75, 22.4, 114.62, 22.86]
params = tbd.area_to_params(bounds, accuracy=1000)
</code>
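The accuracy is a length in meters, but the grid is laid out in degrees, so a conversion is needed. A minimal sketch of that conversion, assuming a spherical Earth of radius 6,371,004 m and using the latitude at the middle of bounds; the function rect_grid_params and its key names are hypothetical, and the dict returned by tbd.area_to_params uses its own keys.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed mean Earth radius in meters


def rect_grid_params(bounds, accuracy):
    """Approximate cell size in degrees for a rectangular grid whose cells
    are roughly `accuracy` meters on each side (spherical-Earth model)."""
    lon1, lat1, lon2, lat2 = bounds
    mid_lat = math.radians((lat1 + lat2) / 2)
    # One degree of latitude spans about 2*pi*R/360 meters everywhere...
    meters_per_deg_lat = 2 * math.pi * EARTH_RADIUS_M / 360
    # ...but a degree of longitude shrinks with cos(latitude).
    meters_per_deg_lon = meters_per_deg_lat * math.cos(mid_lat)
    return {
        "lon0": lon1,  # grid origin: south-west corner of bounds
        "lat0": lat1,
        "dlon": accuracy / meters_per_deg_lon,
        "dlat": accuracy / meters_per_deg_lat,
    }


sketch_params = rect_grid_params([113.75, 22.4, 114.62, 22.86], accuracy=1000)
```

At about 22.6° N a degree of longitude is roughly 8% shorter than a degree of latitude, so the longitude step comes out slightly larger than the latitude step.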

Map each GPS point to a grid cell and aggregate counts:

<code># Assign grid indices
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grids(data['lon'], data['lat'], params)

# Count GPS records (not distinct vehicles) per grid cell
grid_agg = data.groupby(['LONCOL','LATCOL'])['VehicleNum'].count().reset_index()
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['LONCOL'], grid_agg['LATCOL']], params)
grid_gdf = gpd.GeoDataFrame(grid_agg)
</code>
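Conceptually, assigning grid indices is a floor division of each coordinate's offset from the grid origin by the cell size, and the groupby(...).count() line then tallies records per index pair. A dependency-free sketch, assuming the origin is the south-west corner of the grid (TransBigData's own anchoring convention may differ) and using hypothetical helper names:

```python
import math
from collections import Counter


def to_cell(lon, lat, lon0, lat0, dlon, dlat):
    """Integer (column, row) index of the cell containing a point, counted
    from the grid origin (lon0, lat0) in steps of (dlon, dlat)."""
    return math.floor((lon - lon0) / dlon), math.floor((lat - lat0) / dlat)


def count_per_cell(points, lon0, lat0, dlon, dlat):
    """Number of GPS records in each cell (records, not distinct vehicles)."""
    return Counter(to_cell(lon, lat, lon0, lat0, dlon, dlat) for lon, lat in points)


counts = count_per_cell(
    [(113.751, 22.401), (113.752, 22.402), (113.775, 22.401)],
    lon0=113.75, lat0=22.4, dlon=0.01, dlat=0.01,
)
```

Like the .count() call in the snippet, this counts GPS records, so a vehicle that reports many times from one cell is counted many times. For distinct vehicles per cell, pandas' .nunique() is the drop-in replacement for .count().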

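Visualize the rasterized result with Matplotlib, drawing the study-area boundary underneath and adding a colorbar, scale bar, and compass:

<code>import matplotlib.pyplot as plt
fig = plt.figure(1, (8,8), dpi=300)
ax = plt.subplot(111)
# Study-area boundary as a light backdrop
sz.plot(ax=ax, edgecolor='k', facecolor=(0,0,0,0.1), linewidths=0.5)
# Grid cells colored by record count; legend=True draws the colorbar
grid_gdf.plot(column='VehicleNum', cmap='autumn_r', ax=ax, legend=True)
# Scale bar (accuracy = segment length in meters) and compass
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06,0.03], zorder=10)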
plt.axis('off')
plt.show()
</code>

5. OD Extraction and Aggregation

Extract origin-destination (OD) trips from the GPS data. Trips are delimited by changes in the OpenStatus (occupancy) field:

<code>oddata = tbd.taxigps_to_od(data, col=['VehicleNum','time','lon','lat','OpenStatus'])
</code>
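Trip extraction from taxi GPS rests on the occupancy flag: a trip begins where OpenStatus switches from empty to occupied and ends where it switches back. A minimal sketch of that logic in plain Python, assuming 1 = occupied, 0 = empty, and records pre-sorted by vehicle then time; extract_od is a hypothetical name, and the edge-case handling of TransBigData's taxigps_to_od may differ.

```python
from itertools import groupby
from operator import itemgetter


def extract_od(records):
    """Turn (vehicle, time, lon, lat, status) records into trip dicts.

    A trip starts at the first occupied record after an empty one and ends
    at the last occupied record before the taxi becomes empty again.
    """
    trips = []
    for vehicle, recs in groupby(records, key=itemgetter(0)):
        prev = None
        start = None
        for rec in recs:
            if prev is not None:
                if prev[4] == 0 and rec[4] == 1:
                    start = rec  # empty -> occupied: pick-up
                elif prev[4] == 1 and rec[4] == 0 and start is not None:
                    # occupied -> empty: destination is the last occupied fix
                    trips.append({
                        "vehicle": vehicle,
                        "stime": start[1], "slon": start[2], "slat": start[3],
                        "etime": prev[1], "elon": prev[2], "elat": prev[3],
                    })
                    start = None
            prev = rec
    return trips
```

Trips already under way when a vehicle's data starts, or still open when it ends, are dropped because one of their endpoints is unknown.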

Aggregate OD counts on a 2 km × 2 km grid:

<code>params_od = tbd.area_to_params(bounds, accuracy=2000)
od_gdf = tbd.odagg_grid(oddata, params_od)
</code>
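Grid-based OD aggregation amounts to snapping both trip endpoints to cells and counting trips per (origin cell, destination cell) pair. A sketch assuming each trip is a dict with slon, slat, elon, elat keys (the column names odagg_grid reads by default) and that cells are indexed from a south-west origin; od_grid_counts is hypothetical:

```python
import math
from collections import Counter


def od_grid_counts(trips, lon0, lat0, dlon, dlat):
    """Count trips per (origin cell, destination cell) pair on a rectangular grid."""

    def cell(lon, lat):
        # Floor-divide the offset from the grid origin by the cell size
        return math.floor((lon - lon0) / dlon), math.floor((lat - lat0) / dlat)

    return Counter(
        (cell(t["slon"], t["slat"]), cell(t["elon"], t["elat"])) for t in trips
    )
```

Direction is preserved in this sketch: a trip from cell A to cell B and one from B to A are counted as separate pairs.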

Alternatively, aggregate OD counts to administrative polygons:

<code># Without explicit grid parameters (direct spatial join)
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)

# With grid parameters for faster processing
od_gdf = tbd.odagg_shape(oddata, sz, params=params_od)
</code>
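Matching OD endpoints to polygons is a point-in-polygon problem. To show what that join computes without GeoPandas, here is a sketch using the textbook ray-casting test, swapped in for illustration and not TransBigData's implementation. It assumes simple polygons without holes, given as vertex lists, and the helper names are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon with vertices `ring`?
    Points lying exactly on an edge may be classified either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_of(x, y, zones):
    """Name of the first zone containing the point, or None."""
    for name, ring in zones.items():
        if point_in_polygon(x, y, ring):
            return name
    return None


def od_zone_counts(trips, zones):
    """Count trips per (origin zone, destination zone); trips with an
    endpoint outside every zone are skipped."""
    counts = Counter()
    for t in trips:
        o = zone_of(t["slon"], t["slat"], zones)
        d = zone_of(t["elon"], t["elat"], zones)
        if o is not None and d is not None:
            counts[(o, d)] += 1
    return counts
```

Testing every endpoint against every polygon is costly on real data, which is presumably why passing params speeds things up: endpoints are first snapped to grid cells, so far fewer distinct locations need a polygon test.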

6. Interactive Visualization

TransBigData wraps Kepler.gl for interactive maps (the keplergl package must be installed separately). To visualize point density:

<code>tbd.visualization_data(data, col=['lon','lat'], accuracy=1000, height=500)
</code>

Visualize OD flows as arcs:

<code>tbd.visualization_od(oddata, accuracy=2000, height=500)
</code>

Animate individual vehicle trajectories:

<code>tbd.visualization_trip(data, col=['lon','lat','VehicleNum','time'], height=500)
</code>
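An animated trip layer needs each vehicle's fixes as one time-ordered path. A minimal sketch of that grouping step in plain Python, assuming time values sort chronologically as given (for example zero-padded "HH:MM:SS" strings or datetime objects); build_trajectories is a hypothetical name, and how visualization_trip builds its Kepler.gl layer internally is not shown here.

```python
from collections import defaultdict


def build_trajectories(records):
    """Group (vehicle, time, lon, lat) records into per-vehicle paths
    ordered by time."""
    paths = defaultdict(list)
    for vehicle, time, lon, lat in records:
        paths[vehicle].append((time, lon, lat))
    for points in paths.values():
        # Stable sort on time only, so ties keep their input order
        points.sort(key=lambda p: p[0])
    return dict(paths)
```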

All visualizations are rendered as interactive web maps that can be explored directly in a Jupyter notebook.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
