
Uncovering Ghost Bikes: How to Crawl and Analyze Mobike Data in Chengdu

This article details the process of capturing Mobike's public API data, building a high‑performance Python crawler with proxy rotation, storing the results in databases, and performing large‑scale analysis to reveal stationary bikes, travel distances, usage frequency, and urban development patterns in Chengdu.

21CTO

The sharing economy has transformed many industries, and bike sharing is one of its most visible examples. Users often see bikes on the app's map that are nowhere to be found on the street, which raises questions about "zombie" bikes and about whether their placement is intentional.

Where to Get Data

Accessing the data requires intercepting the HTTP requests made by the Mobike app or its WeChat mini‑program. Tools such as Wireshark, Shark for Root (Android), Fiddler, Charles, and Packet Capture can capture these requests. Because the app uses HTTPS, a plain packet sniffer sees only encrypted traffic; an intercepting proxy such as Fiddler or Charles, with its root certificate installed on the phone, is more practical because it can decrypt the requests.

Routing the phone's traffic through the proxy and watching the requests revealed the relevant API endpoint:

https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do

The request payload includes latitude and longitude parameters, and the response returns bike information within a square area.
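
As a rough illustration, a single query against this endpoint could be assembled as shown here. The form field names `latitude` and `longitude`, the form encoding, and the helper `build_nearby_request` are assumptions made for this sketch; the article only states that the payload carries a latitude and a longitude. The request is built but never sent, and the sample coordinates are an arbitrary point inside Chengdu.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do"


def build_nearby_request(lat: float, lon: float) -> Request:
    """Build (but do not send) a POST request for bikes near one point.

    The field names and the form encoding are assumptions, not details
    confirmed by the article.
    """
    body = urlencode({"latitude": lat, "longitude": lon}).encode("ascii")
    return Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_nearby_request(30.6586, 104.0647)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Keeping request construction separate from sending makes it easy to check the payload before any traffic reaches the real service.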

Other Attempts

Reverse‑engineering the Android APK proved difficult due to heavy obfuscation. Some other bike‑sharing services, such as Xiaolan, use HTTPS with encrypted requests, making data extraction far more challenging.

Directory Structure

analysis – Jupyter notebooks for data analysis

influx-importer – scripts for importing into InfluxDB

modules – proxy handling modules

web – real‑time visualization (React demo)

crawler.py – core crawling logic

importToDb.py – import into PostgreSQL

sql.sql – table creation script

start.sh – script to run the crawler continuously

Idea

The crawler divides the target area into a grid of latitude/longitude squares and queries the API for each cell. Data is first stored in a SQLite database, deduplicated, and then exported to CSV for further analysis.

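def start(self):
    # Bounding box of the crawl area. Despite the names, left/right hold
    # latitudes (north/south edges) and top/bottom hold longitudes (west/east edges).
    left = 30.7828453209     # northern latitude
    top = 103.9213455517     # western longitude
    right = 30.4781772402    # southern latitude
    bottom = 104.2178123382  # eastern longitude
    offset = 0.002           # grid step in degrees
    # ... create table and launch threads ...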
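
To get a feel for the size of the grid, the sketch below enumerates one query point per cell using only the standard library. The bounding box and step come from the `start()` excerpt; the names `NORTH`, `SOUTH`, `WEST`, `EAST`, and `grid_points` are introduced here for readability and are not from the original crawler. A step of 0.002° corresponds to roughly 220 m in latitude.

```python
import math

# Bounding box from the article's start() method, renamed for clarity.
NORTH, SOUTH = 30.7828453209, 30.4781772402
WEST, EAST = 103.9213455517, 104.2178123382
STEP = 0.002  # cell size in degrees


def grid_points(north, south, west, east, step):
    """Yield one (lat, lon) query point per cell, north to south, west to east."""
    rows = math.ceil((north - south) / step)
    cols = math.ceil((east - west) / step)
    for i in range(rows):
        for j in range(cols):
            # Multiply by the index instead of accumulating, to limit float drift.
            yield (north - i * step, west + j * step)


points = list(grid_points(NORTH, SOUTH, WEST, EAST, STEP))
print(len(points))  # 153 rows * 149 columns = 22797
```

Under these assumptions a full sweep of the city needs on the order of 22,800 requests, which helps explain why the crawler relies on heavy parallelism and a large proxy pool.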

A ThreadPoolExecutor with 250 workers parallelizes the requests. After crawling, the data is grouped to remove the duplicates that arise where adjacent squares overlap.

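# Requires: from concurrent.futures import ThreadPoolExecutor; import numpy as np
executor = ThreadPoolExecutor(max_workers=250)
# Walk latitudes with a negative step and longitudes with a positive step.
for lat in np.arange(left, right, -offset):
    for lon in np.arange(top, bottom, offset):
        # The (lat, lon) tuple is passed as a single argument.
        executor.submit(self.get_nearby_bikes, (lat, lon))
executor.shutdown()  # blocks until all submitted requests have finished
self.group_data()    # remove duplicates from overlapping squares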
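
The grouping step is not shown in the article. One plausible way to deduplicate in SQLite is to group on every column, as in the minimal sketch below. The table names `raw_bikes` and `bikes`, the column layout, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_bikes (bike_id TEXT, bike_type INTEGER, lat REAL, lon REAL)"
)

# Two adjacent grid cells both report bike "A", so it appears twice.
rows = [
    ("A", 1, 30.6601, 104.0651),
    ("B", 2, 30.6612, 104.0660),
    ("A", 1, 30.6601, 104.0651),
    ("C", 1, 30.6630, 104.0702),
]
conn.executemany("INSERT INTO raw_bikes VALUES (?, ?, ?, ?)", rows)

# Keep one row per distinct observation by grouping on every column.
conn.execute(
    """
    CREATE TABLE bikes AS
    SELECT bike_id, bike_type, lat, lon
    FROM raw_bikes
    GROUP BY bike_id, bike_type, lat, lon
    """
)

deduped = conn.execute("SELECT bike_id FROM bikes ORDER BY bike_id").fetchall()
print(deduped)  # [('A',), ('B',), ('C',)]
```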

The crawler maintains a pool of more than 8,000 proxies. Each request picks a proxy at random from the highest‑scoring ones, and proxies that fail are penalized.

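# Requires: import random
class ProxyProvider:
    def pick(self):
        # Rank proxies by score, best first.
        self._proxies.sort(key=lambda p: p.score, reverse=True)
        # Choose at random among the top 50. Starting the range at 0 keeps the
        # best proxy eligible and avoids a ValueError when only one proxy is left.
        proxy = self._proxies[random.randrange(0, min(50, len(self._proxies)))]
        proxy.used()
        return proxy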
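
The article does not show how scores change, so the following is only a sketch of one possible scoring scheme around the same pick logic. The `Proxy` class, its `succeeded` and `failed` methods, the score deltas, and the placeholder addresses are assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Proxy:
    url: str
    score: int = 10

    def used(self) -> None:
        """Charge a small cost per use so load spreads across proxies."""
        self.score -= 1

    def succeeded(self) -> None:
        self.score += 2

    def failed(self) -> None:
        self.score -= 5


class ProxyPool:
    def __init__(self, proxies, top_n=50):
        self._proxies = list(proxies)
        self._top_n = top_n

    def pick(self) -> Proxy:
        """Pick a random proxy among the top_n highest-scoring ones."""
        self._proxies.sort(key=lambda p: p.score, reverse=True)
        limit = min(self._top_n, len(self._proxies))
        proxy = self._proxies[random.randrange(limit)]
        proxy.used()
        return proxy


pool = ProxyPool([Proxy(f"http://10.0.0.{i}:8080") for i in range(1, 6)], top_n=2)
bad = pool.pick()
bad.failed()  # this proxy drops to the bottom of the ranking
next_pick = pool.pick()
print(next_pick is not bad)  # True
```

Picking at random among the best candidates, rather than always taking the single best one, spreads requests over many proxies while still avoiding those that keep failing.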

Data Analysis Results

Standard vs. Lite Bike Count

In Chengdu, Mobike operates over 60,000 bikes, with the Lite model accounting for about 44% of the fleet, indicating a growing preference for the easier‑to‑ride version.

Bike type distribution

Approximately 30% of Bikes Never Moved

The analysis shows that roughly 30% of the bikes stayed in the same place throughout the observation period, suggesting they may have been left in inaccessible or remote locations.
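
A share like this could be computed from repeated position snapshots along the following lines. This is a sketch, not the author's notebook code: the `(bike_id, lat, lon)` input layout, the function names, and the reuse of a 100 m tolerance for GPS jitter are assumptions, and the sample observations are made up.

```python
import math

NOISE_M = 100.0  # movements below this are treated as GPS jitter (assumed)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stationary_share(observations):
    """Fraction of bikes that never left their first observed position.

    `observations` is an iterable of (bike_id, lat, lon) in time order.
    """
    first = {}
    moved = {}
    for bike_id, lat, lon in observations:
        if bike_id not in first:
            first[bike_id] = (lat, lon)
            moved[bike_id] = False
        elif haversine_m(*first[bike_id], lat, lon) >= NOISE_M:
            moved[bike_id] = True
    if not moved:
        return 0.0
    return sum(1 for m in moved.values() if not m) / len(moved)


observations = [
    ("A", 30.6600, 104.0650), ("B", 30.6700, 104.0800), ("C", 30.5500, 104.0600),
    ("A", 30.6600, 104.0651),  # about 10 m of jitter
    ("B", 30.6800, 104.0800),  # about 1.1 km to the north
    ("C", 30.5500, 104.0600),  # unchanged
]
share = stationary_share(observations)
print(round(share, 3))  # 0.667
```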

Travel Distance Mostly Under 3 km

Trips shorter than 3 km represent 87.2% of all rides, aligning with the intended short‑distance use case of shared bikes. Distances under 100 m were treated as GPS noise and excluded.
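
The filtering and the share calculation described above can be sketched as follows. The function name and the sample distances are made up; only the 100 m noise threshold and the 3 km cut-off come from the article.

```python
def short_trip_share(distances_m, noise_m=100.0, short_m=3000.0):
    """Share of valid trips shorter than `short_m`, after dropping GPS noise."""
    valid = [d for d in distances_m if d >= noise_m]
    if not valid:
        return 0.0
    return sum(1 for d in valid if d < short_m) / len(valid)


sample = [40.0, 250.0, 900.0, 1800.0, 2999.0, 4200.0]  # metres, made-up values
print(short_trip_share(sample))  # 4 of the 5 valid trips are under 3 km -> 0.8
```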

Travel distance distribution

Ride Frequency Skewed Low

Among the bikes that moved, 60% completed five or fewer rides during the day and 30% were used only once or twice, which points to low utilization.
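
A frequency breakdown of this kind can be derived from per-bike ride counts, for instance as in this sketch. The `rides_per_bike` mapping, the function name, and the sample numbers are hypothetical.

```python
from collections import Counter


def ride_frequency_shares(rides_per_bike):
    """Map ride count -> share of moving bikes with that many rides."""
    counts = Counter(n for n in rides_per_bike.values() if n > 0)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: c / total for n, c in sorted(counts.items())}


rides = {"A": 1, "B": 2, "C": 2, "D": 7, "E": 0}  # made-up; E never moved
shares = ride_frequency_shares(rides)
at_most_five = sum(s for n, s in shares.items() if n <= 5)
print(shares)        # {1: 0.25, 2: 0.5, 7: 0.25}
print(at_most_five)  # 0.75
```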

Ride frequency distribution

Urban Development Insights

Heat‑map analysis reveals a “dual‑core” development pattern: the traditional city center remains dense with bikes, while the emerging Tianfu New Area in the south shows rapid growth, especially around software parks and residential zones.
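
The data behind such a heat map is essentially a count of bikes per map cell. A minimal binning sketch is given here; the cell size of 0.01°, the grid origin, the function name, and the sample points are assumptions, and the article does not say which tool produced its heat map.

```python
import math
from collections import Counter


def density_grid(points, origin, cell_deg=0.01):
    """Count points per (row, col) cell of a regular lat/lon grid."""
    lat0, lon0 = origin
    grid = Counter()
    for lat, lon in points:
        row = math.floor((lat - lat0) / cell_deg)
        col = math.floor((lon - lon0) / cell_deg)
        grid[(row, col)] += 1
    return grid


# Made-up bike positions: two near the city center, one farther south.
bikes = [(30.655, 104.065), (30.657, 104.068), (30.545, 104.063)]
grid = density_grid(bikes, origin=(30.4, 103.9))
print(grid.most_common(1))  # [((25, 16), 2)]
```

The resulting counts can be handed to any plotting library as cell intensities; dense clusters then show up as the hot spots described above.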

Dual‑core city development

Disclaimer: This crawler is intended for learning and research purposes only. Users are responsible for any legal consequences arising from misuse.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
