How to Scrape and Analyze Hotel Reviews Near Top Chinese Universities with Python
This article explains how to use Python to crawl hotel information and reviews near major Chinese universities, process the data, handle common pitfalls, and draw insights about hotel distribution and rating patterns within a 2‑kilometer radius.
1. Introduction
This article introduces a Python web‑scraping project that collects hotel listings and review data around well‑known Chinese universities, then analyzes the results to see how many hotels are available and what their ratings are.
2. Implementation
Step 1: Retrieve hotel information near universities
Because the Meituan hotel client does not expose review data, the mobile web page https://i.meituan.com/awp/h5/hotel/search/search.html was used. By searching for hotels near a university (e.g., Peking University) and intercepting the network traffic, a JSON API URL was discovered.
The URL parameters include limit (maximum number of hotels, up to 50), offset (starting index), cityId (city identifier), sort=distance (order by distance), and q / keyword (university name).
The returned JSON contains hotel name, location, rating, realPoiId (hotel identifier used later for comments), and distance to the university.
Using this API, the script crawls the top 10 universities (selected arbitrarily for learning) and collects hotel data within a 2 km radius.
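The paging logic implied by limit, offset, and sort=distance can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's actual script: the article gives neither the API URL nor the exact JSON layout, so `fetch_page` is a hypothetical callable that performs the request and returns the list of hotel records, and the `distance` key (assumed to be in meters) is an illustrative field name. Only the query parameter names come from the article.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 50    # the article states that limit accepts at most 50
RADIUS_M = 2000   # 2 km radius; assumes the API reports distance in meters


def collect_nearby_hotels(
    fetch_page: Callable[[Dict[str, object]], List[dict]],
    university: str,
    city_id: int,
    radius_m: int = RADIUS_M,
) -> List[dict]:
    """Page through the distance-sorted hotel list until leaving the radius."""
    hotels: List[dict] = []
    offset = 0
    while True:
        params = {
            "limit": PAGE_SIZE,
            "offset": offset,
            "cityId": city_id,
            "sort": "distance",
            "q": university,
            "keyword": university,
        }
        page = fetch_page(params)  # hypothetical: returns a list of hotel dicts
        if not page:
            break
        for hotel in page:
            # Results are sorted by distance, so the first hotel beyond
            # the radius means every later hotel is beyond it too.
            if hotel["distance"] > radius_m:
                return hotels
            hotels.append(hotel)
        if len(page) < PAGE_SIZE:
            break  # a short page means there are no more results
        offset += PAGE_SIZE
    return hotels
```

Passing the request function in as an argument keeps the paging logic separate from the HTTP details, so it can be exercised with canned data before any real request is sent.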
Step 2: Retrieve reviews for each hotel
The endpoint that returns a hotel's comment count takes a poiId parameter, the hotel identifier (the realPoiId value collected in Step 1).
Another endpoint returns all comments for a hotel in JSON format; the limit parameter can be set to the total number of comments, allowing a single request to fetch the full review set.
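The two-request flow (ask for the count first, then request that many comments at once) might look like the sketch below. Both `get_count` and `get_comments` are hypothetical callables standing in for the two endpoints, whose URLs and response formats the article does not show.

```python
from typing import Callable, List


def fetch_all_comments(
    get_count: Callable[[int], int],
    get_comments: Callable[[int, int], List[dict]],
    poi_id: int,
) -> List[dict]:
    """Fetch every comment for one hotel in a single request.

    get_count(poi_id) stands in for the comment-count endpoint;
    get_comments(poi_id, limit) stands in for the comment-list endpoint.
    """
    total = get_count(poi_id)
    if total <= 0:
        return []  # nothing to fetch, so skip the second request
    # Setting limit to the exact total avoids the small default page size.
    return get_comments(poi_id, total)
```

A caller can compare the length of the returned list with the reported total to detect a truncated response.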
Step 3: Pitfalls encountered
Initially the comment API returned only 15 items per request. Raising limit to the hotel's total comment count solves this. However, the comment count field in the hotel list is unreliable, so the exact total must be obtained from the comment-count endpoint described in Step 2.
Comments contain many emojis and stray symbols that require thorough cleaning.
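One way to do this cleaning is a whitelist regular expression: keep Chinese characters, ASCII letters and digits, whitespace, and a small set of punctuation marks, and drop everything else. The sketch below is an assumption about what "thorough cleaning" could mean, not the article's own routine, and the set of allowed punctuation is an arbitrary choice that should be tuned to the analysis.

```python
import re

# Anything NOT in this class is removed: CJK ideographs, ASCII letters and
# digits, whitespace, and a few common ASCII and full-width punctuation marks.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s,.!?;:，。！？；：、（）()%-]")
_WHITESPACE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Strip emojis and stray symbols, then collapse runs of whitespace."""
    cleaned = _DISALLOWED.sub("", text)
    return _WHITESPACE.sub(" ", cleaned).strip()
```

A whitelist is used instead of a list of emoji ranges because new emoji code points are added regularly, while the set of characters worth keeping is small and stable.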
Using proxy IPs is advisable; otherwise the scraper may be blocked due to high request volume.
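A simple way to spread requests across proxies is to rotate through a fixed list. The sketch below uses only the standard library; the proxy addresses a caller passes in are placeholders that would come from whatever proxy provider is used, and the function names are illustrative, not taken from the article.

```python
import itertools
import urllib.request
from typing import Dict, Iterable, Iterator


def rotating_proxy_settings(proxies: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield proxy settings forever, cycling through the given proxy URLs."""
    for proxy in itertools.cycle(list(proxies)):
        yield {"http": proxy, "https": proxy}


def open_via_proxy(url: str, proxy_settings: Dict[str, str], timeout: float = 10.0) -> bytes:
    """Fetch a URL through the given proxy and return the raw response body."""
    handler = urllib.request.ProxyHandler(proxy_settings)
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Each request would take the next value from the generator, for example `open_via_proxy(url, next(settings))`. Adding a short pause between requests further reduces the chance of being blocked.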
3. Conclusion
The Python crawler successfully gathered hotel counts and review quantities near selected universities, demonstrating a reusable approach for extracting location‑based data from Meituan. The same technique can be adapted to other domains and regions for broader data‑mining projects.
