Master TransBigData: Python Toolkit for Transportation Big Data
TransBigData is a Python library that streamlines the preprocessing, gridding, visualization, and OD extraction of transportation spatiotemporal datasets such as taxi GPS, bike sharing, and bus data, offering concise, efficient functions for data cleaning, rasterization, interactive mapping, and analytical workflows.
1. TransBigData Overview
TransBigData is a Python package designed for processing, analyzing, and visualizing transportation spatiotemporal big data (e.g., taxi GPS, shared bike, and bus GPS data). It provides fast, concise, and flexible methods for each stage of analysis, enabling complex tasks with simple code.
Key functionalities include data preprocessing, gridding, visualization, trajectory handling, basemap loading, and coordinate conversion, along with specialized tasks such as extracting trip origins and destinations (order start and end points), identifying home and work locations from mobile signaling data, and building subway network topologies.
2. Data Preprocessing
TransBigData integrates seamlessly with pandas and geopandas. Example code reads a taxi GPS CSV, assigns column names, and loads a study area shapefile.
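import pandas as pd
import geopandas as gpd
import transbigdata as tbd
# Read data
data = pd.read_csv('TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum','time','lon','lat','OpenStatus','Speed']
# Load the study area boundary (the file path here is an assumption; point it at your own shapefile)
sz = gpd.read_file('sz.shp')
data.head()
Preprocessing methods such as tbd.clean_outofshape remove points outside the study area, while tbd.clean_taxi_status filters out instantaneous changes in passenger status.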
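To make "instantaneous status changes" concrete, here is a minimal pandas sketch of the idea. It is not the library's implementation. It assumes that a flicker is a record whose OpenStatus differs from both the previous and the next record of the same vehicle while those two neighbors agree. The helper name drop_status_flickers and the sample rows are hypothetical.

```python
import pandas as pd

def drop_status_flickers(df):
    """Drop records whose status disagrees with both neighbors of the same vehicle."""
    df = df.sort_values(['VehicleNum', 'time'])
    grouped = df.groupby('VehicleNum')['OpenStatus']
    prev_status = grouped.shift(1)    # status of the previous record, NaN at the start
    next_status = grouped.shift(-1)   # status of the next record, NaN at the end
    # A flicker: neighbors agree with each other but not with the current record
    flicker = (prev_status == next_status) & (df['OpenStatus'] != prev_status)
    return df[~flicker]

# Hypothetical sample: one vehicle, a single isolated "occupied" record
sample_status = pd.DataFrame({
    'VehicleNum': [1, 1, 1, 1, 1],
    'time': ['08:00:00', '08:00:15', '08:00:30', '08:00:45', '08:01:00'],
    'OpenStatus': [0, 0, 1, 0, 0],
})
cleaned_status = drop_status_flickers(sample_status)
print(cleaned_status)
```

The library calls shown next do this work for you; the sketch only illustrates the kind of record that gets dropped.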
# Data preprocessing
# Remove points outside the study area (points are matched on a grid; accuracy is the grid size in meters)
data = tbd.clean_outofshape(data, sz, col=['lon','lat'], accuracy=500)
# Remove abrupt status changes
data = tbd.clean_taxi_status(data, col=['VehicleNum','time','OpenStatus'])
3. Data Gridding
Gridding converts point data into uniform spatial cells. Define the study area bounds and generate grid parameters with tbd.area_to_params.
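The arithmetic behind gridding can be sketched by hand. The snippet below is an illustration rather than the library's code. It assumes a spherical Earth with a radius of 6,371,004 m, evaluates the longitude step at the mid-latitude of the bounds, and treats cell (0, 0) as centered on the grid origin. Under those assumptions it reproduces the deltalon and deltalat values printed in the params dictionary in this section. The function names rect_grid_params and point_to_cell are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed spherical Earth radius, in meters

def rect_grid_params(bounds, accuracy):
    """Approximate cell size in degrees for square cells `accuracy` meters wide."""
    lon1, lat1, lon2, lat2 = bounds
    lat_mid = (lat1 + lat2) / 2
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    deltalat = accuracy / meters_per_degree
    # A degree of longitude shrinks with the cosine of the latitude
    deltalon = accuracy / (meters_per_degree * math.cos(math.radians(lat_mid)))
    return {'slon': lon1, 'slat': lat1, 'deltalon': deltalon, 'deltalat': deltalat}

def point_to_cell(lon, lat, params):
    """Map a point to (column, row) indices; assumes cell (0, 0) is centered on the origin."""
    loncol = math.floor((lon - (params['slon'] - params['deltalon'] / 2)) / params['deltalon'])
    latcol = math.floor((lat - (params['slat'] - params['deltalat'] / 2)) / params['deltalat'])
    return loncol, latcol

sketch_params = rect_grid_params([113.75, 22.4, 114.62, 22.86], accuracy=1000)
origin_cell = point_to_cell(113.75, 22.4, sketch_params)
city_cell = point_to_cell(114.05, 22.55, sketch_params)
print(sketch_params)
print(origin_cell, city_cell)
```

The library performs these steps with its own functions, shown next, and also supports triangular and hexagonal grids that this sketch does not cover.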
# Define study area bounds
bounds = [113.75, 22.4, 114.62, 22.86]
# Get grid parameters (accuracy in meters)
params = tbd.area_to_params(bounds, accuracy=1000)
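params
The resulting params dictionary contains the origin coordinates (slon, slat), the cell size in degrees (deltalon, deltalat), the rotation angle (theta), the grid shape (method: rect, tri, or hexa), and the grid size in meters (gridsize).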
{'slon': 113.75,
'slat': 22.4,
'deltalon': 0.00974336289289822,
'deltalat': 0.008993210412845813,
'theta': 0,
'method': 'rect',
'gridsize': 1000}
Map GPS points to grid cells using tbd.GPS_to_grid, then aggregate counts and create a GeoDataFrame for plotting.
# Map GPS points to grid cells
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grid(data['lon'], data['lat'], params)
# Aggregate counts per grid
grid_agg = data.groupby(['LONCOL','LATCOL'])['VehicleNum'].count().reset_index()
# Generate geometry
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['LONCOL'], grid_agg['LATCOL']], params)
grid_agg = gpd.GeoDataFrame(grid_agg)
# Plot
grid_agg.plot(column='VehicleNum', cmap='autumn_r')
A more polished figure adds the study area outline as a backdrop, a colorbar, and a scale bar with compass via tbd.plotscale.
import matplotlib.pyplot as plt
fig = plt.figure(1, (8,8), dpi=300)
ax = plt.subplot(111)
plt.sca(ax)
# Study area outline as a backdrop
sz.plot(ax=ax, edgecolor=(0,0,0,0), facecolor=(0,0,0,0.1), linewidths=0.5)
# Colorbar
cax = plt.axes([0.04, 0.33, 0.02, 0.3])
plt.title('Data count')
# Plot grid
grid_agg.plot(column='VehicleNum', cmap='autumn_r', ax=ax, cax=cax, legend=True)
# Scale and compass
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06,0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.show()
4. OD Extraction and Aggregation
TransBigData can directly extract origin‑destination (OD) points from taxi GPS data.
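As a rough illustration of what OD extraction involves, the following pandas sketch pairs each change of OpenStatus from 0 to 1 (assumed here to mean a passenger boarding) with the next change from 1 to 0 for the same vehicle. It is a simplified stand-in, not the library's algorithm. The function od_from_status and the sample rows are hypothetical.

```python
import pandas as pd

def od_from_status(df):
    """Pair each 0->1 status change with the next 1->0 change of the same vehicle."""
    df = df.sort_values(['VehicleNum', 'time']).reset_index(drop=True)
    prev_status = df.groupby('VehicleNum')['OpenStatus'].shift(1)
    pickups = df[(prev_status == 0) & (df['OpenStatus'] == 1)]
    dropoffs = df[(prev_status == 1) & (df['OpenStatus'] == 0)]
    trips = []
    for _, start in pickups.iterrows():
        later = dropoffs[(dropoffs['VehicleNum'] == start['VehicleNum'])
                         & (dropoffs['time'] > start['time'])]
        if later.empty:
            continue  # trip still in progress when the data ends
        end = later.iloc[0]
        trips.append({'VehicleNum': start['VehicleNum'],
                      'stime': start['time'], 'slon': start['lon'], 'slat': start['lat'],
                      'etime': end['time'], 'elon': end['lon'], 'elat': end['lat']})
    return pd.DataFrame(trips)

# Hypothetical sample: one vehicle, one complete trip
sample_gps = pd.DataFrame({
    'VehicleNum': [1, 1, 1, 1, 1],
    'time': ['08:00:00', '08:01:00', '08:05:00', '08:09:00', '08:10:00'],
    'lon': [114.00, 114.01, 114.03, 114.05, 114.05],
    'lat': [22.50, 22.51, 22.52, 22.54, 22.54],
    'OpenStatus': [0, 1, 1, 0, 0],
})
od_sketch = od_from_status(sample_gps)
print(od_sketch)
```

The library call below produces a comparable table of trip start and end points in a single step.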
# Extract OD from GPS data
oddata = tbd.taxigps_to_od(data, col=['VehicleNum','time','lon','lat','OpenStatus'])
oddata
Define a coarser grid (e.g., 2 km) and aggregate OD counts.
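Once trip origins and destinations carry grid indices, aggregation reduces to counting rows per origin-destination cell pair. The following is a minimal pandas sketch of that counting step, with made-up column names and cell indices. It is not the library's implementation.

```python
import pandas as pd

# Hypothetical trips, already mapped to origin and destination grid cells
od_cells = pd.DataFrame({
    'SLONCOL': [0, 0, 1, 0],
    'SLATCOL': [0, 0, 2, 0],
    'ELONCOL': [3, 3, 0, 1],
    'ELATCOL': [1, 1, 0, 1],
})

# Count trips for each (origin cell, destination cell) pair
cell_cols = ['SLONCOL', 'SLATCOL', 'ELONCOL', 'ELATCOL']
od_counts = od_cells.groupby(cell_cols).size().reset_index(name='count')
busiest = od_counts.sort_values('count', ascending=False).iloc[0]
print(od_counts)
```

The library call that follows also attaches line geometry to each cell pair, so the result can be plotted directly.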
# Define 2 km grid
params = tbd.area_to_params(bounds, accuracy=2000)
# Grid and aggregate OD
od_gdf = tbd.odagg_grid(oddata, params)
od_gdf.plot(column='count')
OD can also be aggregated directly to administrative polygons, optionally using a grid for speed.
# OD aggregation to polygons
# Method 1: direct lat/lon match
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
# Method 2: grid‑first for large datasets
od_gdf = tbd.odagg_shape(oddata, sz, params=params)
od_gdf.plot(column='count')
5. Interactive Visualization
Using the built‑in tbd.visualization_data, tbd.visualization_od, and tbd.visualization_trip functions, users can create interactive maps in Jupyter notebooks powered by the keplergl library.
# Visualize point density
tbd.visualization_data(data, col=['lon','lat'], accuracy=1000, height=500)
# Visualize OD arcs
tbd.visualization_od(oddata, accuracy=2000, height=500)
# Visualize trajectories with timestamps
tbd.visualization_trip(data, col=['lon','lat','VehicleNum','time'], height=500)