Master TransBigData: Python Toolkit for Transportation Big Data

TransBigData is a Python library that streamlines the preprocessing, gridding, visualization, and OD extraction of transportation spatiotemporal datasets such as taxi GPS, bike sharing, and bus data, offering concise, efficient functions for data cleaning, rasterization, interactive mapping, and analytical workflows.

MaGe Linux Operations

Sep 28, 2022

1. TransBigData Overview

TransBigData is a Python package designed for processing, analyzing, and visualizing transportation spatiotemporal big data (e.g., taxi GPS, shared bike, and bus GPS data). It provides fast, concise, and flexible methods for each stage of analysis, enabling complex tasks with simple code.

Key functionalities include data preprocessing, gridding, visualization, trajectory handling, map basemap and coordinate conversion, and specialized processing such as extracting order start‑end points, identifying home/work locations from signaling data, and building subway network topologies.

2. Data Preprocessing

TransBigData integrates seamlessly with pandas and geopandas. Example code reads a taxi GPS CSV, assigns column names, and loads a study area shapefile.

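import pandas as pd
import geopandas as gpd
import transbigdata as tbd
# Read data (the CSV has no header row)
data = pd.read_csv('TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum','time','lon','lat','OpenStatus','Speed']
# Load the study-area boundary used as `sz` below
# ('sz.shp' is a placeholder path; point it at your own shapefile)
sz = gpd.read_file('sz.shp')
data.head()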

Two preprocessing methods are used here: tbd.clean_outofshape removes points that fall outside the study area, and tbd.clean_taxi_status filters out instantaneous changes in passenger status.

# Data preprocessing
# Remove points outside the study area (accuracy is the grid size in meters used for the match)
data = tbd.clean_outofshape(data, sz, col=['lon','lat'], accuracy=500)
# Remove abrupt status changes
data = tbd.clean_taxi_status(data, col=['VehicleNum','time','OpenStatus'])
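
To make the second cleaning step concrete, here is a minimal pure-Python sketch of one way to drop an instantaneous status change: a record is discarded when its status differs from both the previous and the next record of the same vehicle. The helper name drop_status_blips and the sample records are hypothetical, and the sketch illustrates the idea rather than reproducing the internals of tbd.clean_taxi_status.

```python
from itertools import groupby
from operator import itemgetter

def drop_status_blips(records):
    """Drop records whose status differs from both neighbours of the same vehicle.

    `records` is a list of (vehicle, time, status) tuples.
    Hypothetical helper, not part of TransBigData.
    """
    ordered = sorted(records, key=itemgetter(0, 1))  # by vehicle, then time
    kept = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        for i, row in enumerate(rows):
            has_both_neighbours = 0 < i < len(rows) - 1
            if (has_both_neighbours
                    and row[2] != rows[i - 1][2]
                    and row[2] != rows[i + 1][2]):
                continue  # a one-record flip such as 0, 1, 0
            kept.append(row)
    return kept

# Made-up sample: the third record is an isolated flip to status 1
sample = [
    ("A", "08:00:00", 0),
    ("A", "08:00:15", 0),
    ("A", "08:00:30", 1),
    ("A", "08:00:45", 0),
    ("A", "08:01:00", 0),
    ("A", "08:01:15", 1),
    ("A", "08:01:30", 1),
]
cleaned = drop_status_blips(sample)
print([status for _, _, status in cleaned])
```

The isolated flip at 08:00:30 is removed, while the sustained change starting at 08:01:15 is kept because the following record agrees with it.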

3. Data Gridding

Gridding converts point data into uniform spatial cells. Define the study area bounds and generate grid parameters with tbd.area_to_params.

# Define study area bounds
bounds = [113.75, 22.4, 114.62, 22.86]
# Get grid parameters (accuracy in meters)
params = tbd.area_to_params(bounds, accuracy=1000)
params

The resulting params dictionary contains origin coordinates, cell size, rotation, shape (rect, tri, hexa), and grid size.

{'slon': 113.75,
 'slat': 22.4,
 'deltalon': 0.00974336289289822,
 'deltalat': 0.008993210412845813,
 'theta': 0,
 'method': 'rect',
 'gridsize': 1000}
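
The deltalon and deltalat values above are consistent with a simple spherical-Earth conversion from meters to degrees, evaluated at the central latitude of the bounds. The sketch below reconstructs them under two assumptions that are not stated in the article: an Earth radius of 6,371,004 m and the use of the mid-latitude for the longitude spacing. The function name grid_deltas is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed spherical-Earth radius in meters

def grid_deltas(bounds, accuracy):
    """Convert a cell size in meters to degrees of longitude and latitude."""
    lon1, lat1, lon2, lat2 = bounds
    mid_lat = (lat1 + lat2) / 2  # longitude spacing is evaluated at the centre latitude
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    deltalat = accuracy / meters_per_degree
    # A degree of longitude shrinks by cos(latitude) away from the equator
    deltalon = accuracy / (meters_per_degree * math.cos(math.radians(mid_lat)))
    return deltalon, deltalat

deltalon, deltalat = grid_deltas([113.75, 22.4, 114.62, 22.86], accuracy=1000)
print(deltalon, deltalat)
```

Because both values scale linearly with accuracy, a 2000 m grid over the same bounds has cells twice as large in each direction.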

Map GPS points to grid cells using tbd.GPS_to_grids, then aggregate counts and create a GeoDataFrame for plotting.

# Map GPS points to grid cells
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grids(data['lon'], data['lat'], params)
# Aggregate counts per grid
grid_agg = data.groupby(['LONCOL','LATCOL'])['VehicleNum'].count().reset_index()
# Generate geometry
grid_agg['geometry'] = tbd.grid_to_polygon([grid_agg['LONCOL'], grid_agg['LATCOL']], params)
grid_agg = gpd.GeoDataFrame(grid_agg)
# Plot
grid_agg.plot(column='VehicleNum', cmap='autumn_r')
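
Conceptually, the mapping from a GPS point to a cell is a floor division by the cell size. The following pure-Python sketch shows the idea using the params printed earlier. It assumes that (slon, slat) is the centre of cell (0, 0), which is an assumption about the grid convention, and the helper name point_to_cell and the sample points are hypothetical.

```python
import math
from collections import Counter

params = {
    "slon": 113.75,
    "slat": 22.4,
    "deltalon": 0.00974336289289822,
    "deltalat": 0.008993210412845813,
}

def point_to_cell(lon, lat, params):
    """Return (LONCOL, LATCOL), assuming (slon, slat) is the centre of cell (0, 0)."""
    west_edge = params["slon"] - params["deltalon"] / 2
    south_edge = params["slat"] - params["deltalat"] / 2
    loncol = math.floor((lon - west_edge) / params["deltalon"])
    latcol = math.floor((lat - south_edge) / params["deltalat"])
    return loncol, latcol

# Made-up points: the first two fall in the same cell
points = [(113.75, 22.4), (113.751, 22.401), (113.80, 22.45)]
cells = [point_to_cell(lon, lat, params) for lon, lat in points]
counts = Counter(cells)  # number of points per cell, like the groupby above
print(counts)
```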

A more polished figure adds the study-area outline as a background, a colorbar, and a scale bar with compass drawn by tbd.plotscale.

import matplotlib.pyplot as plt
fig = plt.figure(1, (8,8), dpi=300)
ax = plt.subplot(111)
plt.sca(ax)
# Study-area outline as background
sz.plot(ax=ax, edgecolor=(0,0,0,0), facecolor=(0,0,0,0.1), linewidths=0.5)
# Colorbar
cax = plt.axes([0.04, 0.33, 0.02, 0.3])
plt.title('Data count')
# Plot grid
grid_agg.plot(column='VehicleNum', cmap='autumn_r', ax=ax, cax=cax, legend=True)
# Scale and compass
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06,0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.show()

4. OD Extraction and Aggregation

TransBigData can directly extract origin‑destination (OD) points from taxi GPS data.

# Extract OD from GPS data
oddata = tbd.taxigps_to_od(data, col=['VehicleNum','time','lon','lat','OpenStatus'])
oddata
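
The idea behind OD extraction from status data can be sketched in a few lines: within each vehicle's time-ordered records, a change from vacant to occupied marks an origin and the next change back to vacant marks the destination. The sketch below assumes that status 1 means occupied and 0 means vacant. The function name, the output keys, and the sample records are hypothetical, and this is not the implementation of tbd.taxigps_to_od.

```python
def status_changes_to_od(records):
    """Pair each 0 -> 1 status change (pickup) with the next 1 -> 0 change (drop-off).

    `records` is a list of (vehicle, time, lon, lat, status) tuples.
    Assumes status 1 = occupied and 0 = vacant.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))  # by vehicle, then time
    trips = []
    origin = None
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0] != cur[0]:
            origin = None  # new vehicle: discard any unfinished trip
            continue
        if prev[4] == 0 and cur[4] == 1:
            origin = cur  # first occupied record is the pickup
        elif prev[4] == 1 and cur[4] == 0 and origin is not None:
            trips.append({
                "vehicle": cur[0],
                "stime": origin[1], "slon": origin[2], "slat": origin[3],
                "etime": cur[1], "elon": cur[2], "elat": cur[3],
            })
            origin = None
    return trips

# Made-up records: vehicle B starts occupied, so its pickup is never observed
records = [
    ("A", "08:00:00", 114.00, 22.50, 0),
    ("A", "08:01:00", 114.01, 22.51, 1),
    ("A", "08:05:00", 114.05, 22.55, 1),
    ("A", "08:09:00", 114.08, 22.58, 0),
    ("B", "09:00:00", 114.10, 22.60, 1),
    ("B", "09:04:00", 114.12, 22.62, 0),
]
trips = status_changes_to_od(records)
print(trips)
```

Only vehicle A yields a trip here, because a drop-off without an observed pickup is ignored.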

Define a coarser grid (e.g., 2 km) and aggregate OD counts.

# Define 2 km grid
params = tbd.area_to_params(bounds, accuracy=2000)
# Grid and aggregate OD
od_gdf = tbd.odagg_grid(oddata, params)
od_gdf.plot(column='count')
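
Grid-based OD aggregation amounts to assigning a cell to each origin and each destination, then counting trips per (origin cell, destination cell) pair. The sketch below illustrates this with the standard library only. The 2 km cell sizes are taken as twice the 1 km values printed earlier, the grid origin is assumed to be the centre of cell (0, 0), and the helper names and trips are hypothetical.

```python
import math
from collections import Counter

# 2 km grid: cell sizes assumed to be double the 1 km values shown earlier
params_2km = {
    "slon": 113.75,
    "slat": 22.4,
    "deltalon": 2 * 0.00974336289289822,
    "deltalat": 2 * 0.008993210412845813,
}

def cell_of(lon, lat, params):
    """Grid index of a point, assuming (slon, slat) is the centre of cell (0, 0)."""
    col = math.floor((lon - params["slon"]) / params["deltalon"] + 0.5)
    row = math.floor((lat - params["slat"]) / params["deltalat"] + 0.5)
    return col, row

def aggregate_od(trips, params):
    """Count trips per (origin cell, destination cell) pair."""
    counts = Counter()
    for slon, slat, elon, elat in trips:
        counts[(cell_of(slon, slat, params), cell_of(elon, elat, params))] += 1
    return counts

# Made-up trips as (slon, slat, elon, elat)
trips = [
    (113.77, 22.42, 113.90, 22.55),
    (113.771, 22.421, 113.901, 22.551),
    (113.90, 22.55, 113.77, 22.42),
]
od_counts = aggregate_od(trips, params_2km)
print(od_counts)
```

The first two trips share both cells and are counted together, while the reverse trip forms its own OD pair.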

OD can also be aggregated directly to administrative polygons, optionally using a grid for speed.

# OD aggregation to polygons
# Method 1: direct lat/lon match
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
# Method 2: grid‑first for large datasets
od_gdf = tbd.odagg_shape(oddata, sz, params=params)
od_gdf.plot(column='count')

5. Interactive Visualization

Using the built‑in tbd.visualization_data, tbd.visualization_od, and tbd.visualization_trip functions, users can create interactive maps in Jupyter notebooks powered by the keplergl library.

# Visualize point density
tbd.visualization_data(data, col=['lon','lat'], accuracy=1000, height=500)
# Visualize OD arcs
tbd.visualization_od(oddata, accuracy=2000, height=500)
# Visualize trajectories with timestamps
tbd.visualization_trip(data, col=['lon','lat','VehicleNum','time'], height=500)
