Uncovering Ghost Bikes: How to Crawl and Analyze Mobike Data in Chengdu
This article details the process of capturing Mobike's public API data, building a high‑performance Python crawler with proxy rotation, storing the results in databases, and performing large‑scale analysis to reveal stationary bikes, travel distances, usage frequency, and urban development patterns in Chengdu.
The rise of the sharing economy has transformed many industries, with bike‑sharing becoming a prominent example. Users often encounter bikes shown in an app that are nowhere to be found, prompting questions about "zombie" bikes and intentional placement.
Where to Get Data
Accessing the data requires intercepting the HTTP requests made by the Mobike app or its WeChat mini‑program. Tools such as Wireshark, Shark for Root (Android), Fiddler, Charles, and Packet Capture can capture these requests. Because the app uses HTTPS, a debugging proxy such as Fiddler or Charles, which can decrypt HTTPS traffic once its root certificate is trusted on the phone, is the more practical choice.
By setting up a proxy on the phone and monitoring traffic, the relevant API endpoint was identified:
https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do
The request payload includes latitude and longitude parameters, and the response lists the bikes within a square area.
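To make the endpoint concrete, here is a minimal sketch that only builds such a request with Python's standard library, without sending it. It assumes the coordinates travel as a form-encoded POST body with fields named latitude and longitude; the exact field names, any extra parameters, and the headers the real app sends are not given here and should be copied from the captured traffic. build_nearby_request is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://mwx.mobike.com/mobike-api/rent/nearbyBikesInfo.do"


def build_nearby_request(lat: float, lon: float) -> Request:
    """Build (but do not send) a request for bikes near (lat, lon).

    Assumption: a form-encoded POST body with "latitude"/"longitude" fields.
    Verify the field names against the captured traffic before use.
    """
    body = urlencode({"latitude": f"{lat:.6f}", "longitude": f"{lon:.6f}"}).encode()
    return Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_nearby_request(30.6586, 104.0648)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Keeping request construction separate from sending makes it easy to attach a different proxy to each call later.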
Other Attempts
Reverse‑engineering the Android APK proved difficult due to heavy obfuscation. Some other bike‑sharing services, such as Xiaolan, use HTTPS with encrypted requests, making data extraction far more challenging.
Directory Structure
analysis – Jupyter notebooks for data analysis
influx-importer – scripts for importing into InfluxDB
modules – proxy handling modules
web – real‑time visualization (React demo)
crawler.py – core crawling logic
importToDb.py – import into PostgreSQL
sql.sql – table creation script
start.sh – script to run the crawler continuously
Idea
The crawler divides the target area into a grid of latitude/longitude squares and queries the API for each cell. Data is first stored in a SQLite database, deduplicated, and then exported to CSV for further analysis.
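The pipeline in the paragraph above can be sketched end to end with the standard library. This is a hypothetical illustration, not the project's crawler.py: grid_points, fake_fetch, crawl_to_csv and the raw table layout are invented for the example, fake_fetch stands in for the real API call, and an in-memory SQLite database replaces the on-disk one.

```python
import csv
import io
import sqlite3


def grid_points(north, south, west, east, step):
    """Yield (lat, lon) cell origins, walking south from north and east from west."""
    i = 0
    while north - i * step > south:
        lat = north - i * step
        j = 0
        while west + j * step < east:
            yield round(lat, 6), round(west + j * step, 6)
            j += 1
        i += 1


def fake_fetch(lat, lon):
    """Hypothetical stand-in for the API call, returning (bike_id, lat, lon) rows.

    Every cell reports the same two bikes, mimicking overlapping responses.
    """
    return [("bike-A", 30.60, 104.00), ("bike-B", 30.61, 104.01)]


def crawl_to_csv(points, fetch):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw (bike_id TEXT, lat REAL, lon REAL)")
    for lat, lon in points:
        db.executemany("INSERT INTO raw VALUES (?, ?, ?)", fetch(lat, lon))
    # Deduplicate: a bike appears once for every cell whose response covers it
    rows = db.execute(
        "SELECT DISTINCT bike_id, lat, lon FROM raw ORDER BY bike_id"
    ).fetchall()
    db.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["bike_id", "lat", "lon"])
    writer.writerows(rows)
    return out.getvalue()


points = list(grid_points(30.605, 30.600, 104.000, 104.005, 0.002))
print(len(points))
print(crawl_to_csv(points, fake_fetch))
```

Because every cell in fake_fetch reports the same two bikes, the nine grid cells produce eighteen raw rows but only two rows in the CSV; that overlap is what the deduplication step removes.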
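The start method defines the bounding box of the crawl area and the grid step:
```python
def start(self):
    # Bounding box of the crawl area in Chengdu. Despite the names,
    # left/right are latitudes (north and south edges) and
    # top/bottom are longitudes (west and east edges).
    left = 30.7828453209
    top = 103.9213455517
    right = 30.4781772402
    bottom = 104.2178123382
    offset = 0.002  # grid step in degrees
    # ... create table and launch threads ...
```
A ThreadPoolExecutor with 250 workers parallelizes the requests. After crawling, the data is grouped to remove the overlap between adjacent squares.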
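```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

executor = ThreadPoolExecutor(max_workers=250)
# Walk latitude southwards (negative step) and longitude eastwards
for lat in np.arange(left, right, -offset):
    for lon in np.arange(top, bottom, offset):
        # get_nearby_bikes receives the (lat, lon) pair as one tuple argument
        executor.submit(self.get_nearby_bikes, (lat, lon))
executor.shutdown()  # waits for all submitted requests to finish
self.group_data()  # merge duplicate sightings from overlapping squares
```
A proxy pool of over 8,000 proxies is maintained; each request picks a high‑scoring proxy, and proxies that fail are penalized.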
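```python
import random

class ProxyProvider:
    def pick(self):
        # Rank proxies so that the highest-scoring ones come first
        self._proxies.sort(key=lambda p: p.score, reverse=True)
        # Pick at random among the top 50 to spread load; starting at 0 keeps
        # the best proxy eligible and avoids a ValueError for a one-proxy pool
        proxy = self._proxies[random.randrange(min(50, len(self._proxies)))]
        proxy.used()
        return proxy
```
Data Analysis Results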
Standard vs. Lite Bike Count
In Chengdu, Mobike operates over 60,000 bikes, with the Lite model accounting for about 44% of the fleet, indicating a growing preference for the easier‑to‑ride version.
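A share like this one can be computed from the crawled sightings by counting each bike once, however often it was seen. The sketch assumes each record carries a bike ID and a model label; the labels "classic" and "lite" and the function fleet_share are hypothetical.

```python
from collections import Counter


def fleet_share(sightings):
    """Return each model's share of distinct bikes, in percent.

    sightings: iterable of (bike_id, model) pairs; repeated sightings of the
    same bike are collapsed so each bike counts once. Labels are hypothetical.
    """
    models = {}
    for bike_id, model in sightings:
        models[bike_id] = model  # one model per bike
    counts = Counter(models.values())
    total = sum(counts.values())
    return {model: 100 * n / total for model, n in counts.items()}


sightings = [
    ("b1", "classic"), ("b2", "lite"), ("b1", "classic"),
    ("b3", "classic"), ("b4", "lite"), ("b5", "lite"),
]
print(fleet_share(sightings))
```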
Approximately 30% of Bikes Never Moved
Analysis shows that about 30% of the bikes remained stationary during the observation period, suggesting they may be parked in inaccessible or remote locations.
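One way to flag bikes that never moved is to compare all positions recorded for each bike across crawl rounds. The sketch below assumes (bike_id, lat, lon) sightings and an illustrative jitter tolerance of 0.0005 degrees (roughly 55 m of latitude); neither the tolerance nor stationary_share comes from the original project.

```python
from collections import defaultdict


def stationary_share(sightings, tol_deg=0.0005):
    """Percent of bikes whose observed positions never spread beyond tol_deg.

    sightings: iterable of (bike_id, lat, lon) from successive crawl rounds.
    tol_deg is an assumed allowance for GPS jitter. Note that a bike seen
    only once is counted as stationary.
    """
    positions = defaultdict(list)
    for bike_id, lat, lon in sightings:
        positions[bike_id].append((lat, lon))
    still = 0
    for pts in positions.values():
        lats = [p[0] for p in pts]
        lons = [p[1] for p in pts]
        if max(lats) - min(lats) <= tol_deg and max(lons) - min(lons) <= tol_deg:
            still += 1
    return 100 * still / len(positions)


sightings = [
    ("b1", 30.60, 104.00), ("b1", 30.60, 104.00),        # never moved
    ("b2", 30.60, 104.00), ("b2", 30.61, 104.02),        # moved
    ("b3", 30.6000, 104.0000), ("b3", 30.6001, 104.0001),  # GPS jitter only
    ("b4", 30.65, 104.05), ("b4", 30.66, 104.05),        # moved
]
print(stationary_share(sightings))
```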
Travel Distance Mostly Under 3 km
Trips shorter than 3 km represent 87.2% of all rides, aligning with the intended short‑distance use case of shared bikes. Distances under 100 m were treated as GPS noise and excluded.
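Trip length can be estimated with the haversine great-circle formula between a bike's positions in two consecutive crawl rounds, which treats the Earth as a sphere. The sketch assumes a trip is inferred whenever a bike's position changes between rounds; haversine_m and short_trip_share are hypothetical names, while the 100 m noise cut-off and the 3 km threshold follow the text.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical-Earth approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def short_trip_share(moves, noise_m=100, short_m=3000):
    """Percent of trips shorter than short_m, ignoring moves under noise_m."""
    trips = [haversine_m(*start, *end) for start, end in moves]
    trips = [d for d in trips if d >= noise_m]  # drop GPS noise
    if not trips:
        return 0.0
    return 100 * sum(d < short_m for d in trips) / len(trips)


moves = [
    ((30.6, 104.0), (30.6001, 104.0)),  # ~11 m: treated as noise
    ((30.6, 104.0), (30.609, 104.0)),   # ~1.0 km
    ((30.6, 104.0), (30.618, 104.0)),   # ~2.0 km
    ((30.6, 104.0), (30.62, 104.0)),    # ~2.2 km
    ((30.6, 104.0), (30.645, 104.0)),   # ~5.0 km
]
print(short_trip_share(moves))
```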
Ride Frequency Skewed Low
Among bikes that moved at least once, 60% completed five or fewer rides during the day and 30% were used only once or twice, indicating suboptimal utilization.
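Given one entry per inferred trip, the frequency breakdown reduces to counting trips per bike. In this hypothetical sketch only bikes with at least one trip appear, which matches the moving-bike base used above; ride_frequency_summary is an illustrative name.

```python
from collections import Counter


def ride_frequency_summary(trip_bike_ids):
    """Return (percent with <= 5 rides, percent with 1-2 rides) among moving bikes.

    trip_bike_ids: one bike_id per inferred trip, so every bike listed
    moved at least once.
    """
    rides = Counter(trip_bike_ids)  # rides per moving bike
    n = len(rides)
    at_most_5 = sum(c <= 5 for c in rides.values())
    one_or_two = sum(c <= 2 for c in rides.values())
    return 100 * at_most_5 / n, 100 * one_or_two / n


trip_bike_ids = ["b1"] + ["b2"] * 2 + ["b3"] * 6 + ["b4"] * 4
print(ride_frequency_summary(trip_bike_ids))
```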
Urban Development Insights
Heat‑map analysis reveals a “dual‑core” development pattern: the traditional city center remains dense with bikes, while the emerging Tianfu New Area in the south shows rapid growth, especially around software parks and residential zones.
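A heat map of this kind can start from a simple count of sightings per grid cell. The sketch bins points into squares of 0.01 degrees (about 1.1 km of latitude); the cell size and density_grid are assumptions for illustration, and drawing the counts on a map is left to a plotting library.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=0.01):
    """Count sightings per cell_deg x cell_deg square.

    Each cell is keyed by the integer index of its south-west corner.
    """
    counts = Counter()
    for lat, lon in points:
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return counts


points = [(30.6555, 104.0655), (30.6575, 104.0675), (30.5050, 104.0550)]
counts = density_grid(points)
print(counts.most_common(1))
```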
Disclaimer: This crawler is intended for learning and research purposes only. Users are responsible for any legal consequences arising from misuse.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
