
How to Scrape and Analyze Hotel Reviews Near Top Chinese Universities with Python

This article explains how to use Python to crawl hotel information and reviews near major Chinese universities, process the data, handle common pitfalls, and draw insights about hotel distribution and rating patterns within a 2‑kilometer radius.

Python Crawling & Data Mining

1. Introduction

This article introduces a Python web‑scraping project that collects hotel listings and review data around well‑known Chinese universities, then analyzes the results to see how many hotels are available and what their ratings are.

2. Implementation

Step 1: Retrieve hotel information near universities

Because the Meituan hotel client does not expose review data, the project uses the mobile web page https://i.meituan.com/awp/h5/hotel/search/search.html instead. Searching that page for hotels near a university (e.g., Peking University) while intercepting the network traffic reveals the JSON API URL the page calls.

The URL parameters include limit (maximum number of hotels returned per request, up to 50), offset (index of the first hotel to return), cityId (city identifier), sort=distance (order results by distance), and q / keyword (university name).
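A minimal sketch of how those parameters might be assembled into a request URL. The article does not publish the full API path, so `SEARCH_API` is a hypothetical placeholder, and `build_search_url` is an illustrative helper rather than part of any library. Setting both `q` and `keyword` to the university name is also an assumption.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real Meituan API path is not given in the article.
SEARCH_API = "https://example.invalid/hotel/search"

MAX_LIMIT = 50  # the API returns at most 50 hotels per request


def build_search_url(university: str, city_id: int, offset: int = 0,
                     limit: int = MAX_LIMIT) -> str:
    """Build a distance-sorted hotel search URL for one university (hypothetical helper)."""
    params = {
        "limit": min(limit, MAX_LIMIT),  # never ask for more than the API allows
        "offset": offset,                # index of the first hotel in this page
        "cityId": city_id,
        "sort": "distance",              # nearest hotels first
        "q": university,                 # assumption: both keys carry the name
        "keyword": university,
    }
    return f"{SEARCH_API}?{urlencode(params)}"
```

For example, `build_search_url("Peking University", city_id=1, offset=50)` requests the second page of 50 hotels; the `cityId` value here is purely illustrative.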

The returned JSON contains hotel name, location, rating, realPoiId (hotel identifier used later for comments), and distance to the university.
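To keep later steps tidy, each hotel object can be mapped onto a small record. This is a sketch under assumptions: only `realPoiId` is named in the article, while `name`, `addr`, `avgScore` and `distance` are hypothetical key names to adjust after inspecting a real response, and the distance unit is assumed to be metres.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class Hotel:
    name: str
    address: str
    rating: Optional[float]  # None when the hotel has no score yet
    real_poi_id: int         # realPoiId: the identifier used later for reviews
    distance_m: float        # distance to the university (unit assumed: metres)


def parse_hotel(raw: Mapping[str, Any]) -> Hotel:
    """Map one hotel object from the search JSON onto a Hotel record.

    Apart from "realPoiId", the key names below are hypothetical.
    """
    score = raw.get("avgScore")
    return Hotel(
        name=str(raw["name"]),
        address=str(raw.get("addr", "")),
        rating=float(score) if score not in (None, "") else None,
        real_poi_id=int(raw["realPoiId"]),
        distance_m=float(raw["distance"]),
    )
```

Converting numbers explicitly guards against APIs that return scores or identifiers as strings.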

Using this API, the script crawls hotels near ten well-known universities (selected arbitrarily, for learning purposes) and keeps those within a 2 km radius.
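Because results come back sorted by distance, the 2 km cutoff can be enforced while paging: advance `offset` in steps of the 50-hotel limit and stop at the first hotel farther than 2 km, since every later hotel is farther still. In this sketch `fetch_page(offset, limit)` is a hypothetical callable you supply (for example, one that requests a page and returns the list of hotel dicts), and each dict is assumed to carry a numeric `distance` in metres.

```python
from typing import Any, Callable, Iterator, List, Mapping

RADIUS_M = 2000  # 2 km radius used in the article
PAGE_SIZE = 50   # the API's maximum limit

FetchPage = Callable[[int, int], List[Mapping[str, Any]]]


def hotels_within_radius(fetch_page: FetchPage, radius_m: float = RADIUS_M,
                         page_size: int = PAGE_SIZE) -> Iterator[Mapping[str, Any]]:
    """Yield distance-sorted hotels until one lies beyond radius_m."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # no more results
        for hotel in page:
            if float(hotel["distance"]) > radius_m:
                return  # sorted nearest-first, so the rest are farther
            yield hotel
        if len(page) < page_size:
            return  # short page: this was the last one
        offset += len(page)
```

Stopping early this way avoids requesting pages of hotels that would be discarded anyway, which also keeps the request volume down.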

Step 2: Retrieve reviews for each hotel

One endpoint returns the number of comments for a hotel; it takes the hotel identifier as its poiId parameter (the realPoiId value collected in Step 1).

A second endpoint returns a hotel's comments in JSON format. Its limit parameter can be set to the hotel's total comment count, so a single request fetches the full review set.
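The two requests can be chained as sketched below: first ask the count endpoint for the exact total, then request the comment list with `limit` set to that total. The endpoint URLs, the `totalCount` and `comments` keys, and passing `poiId` to the comment endpoint are all hypothetical; `get_json(url, params)` is an injected HTTP helper so the logic can be tested without network access.

```python
import warnings
from typing import Any, Callable, List, Mapping

# Hypothetical placeholders: the article does not publish these URLs.
COUNT_API = "https://example.invalid/hotel/comment/count"
COMMENT_API = "https://example.invalid/hotel/comment/list"

# get_json(url, params) -> parsed JSON; plug in any HTTP client you like.
GetJson = Callable[[str, Mapping[str, Any]], Any]


def fetch_all_comments(poi_id: int, get_json: GetJson) -> List[Any]:
    """Fetch every review of one hotel using two requests."""
    # Step 1: exact total from the comment-count endpoint ("totalCount" is assumed).
    total = int(get_json(COUNT_API, {"poiId": poi_id})["totalCount"])
    if total == 0:
        return []
    # Step 2: one request with limit = total ("comments" key is assumed).
    data = get_json(COMMENT_API, {"poiId": poi_id, "limit": total})
    comments = list(data["comments"])
    if len(comments) < total:
        # The server may cap limit silently; flag it rather than fail.
        warnings.warn(f"poiId={poi_id}: got {len(comments)} of {total} comments")
    return comments
```

The length check is a cheap way to notice when the single-request approach stops returning the full set.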

Step 3: Pitfalls encountered

At first the comment API returned only 15 items per request. Raising limit to the hotel's total comment count solves this, but the comment-count field in the hotel-list response is unreliable, so the exact number must be taken from the dedicated comment-count endpoint.

Comments contain many emojis and stray symbols that require thorough cleaning.
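One simple way to do that cleaning is a whitelist: keep only the characters a review is expected to contain and drop everything else. This sketch keeps CJK ideographs, ASCII letters and digits, whitespace and common Chinese and ASCII punctuation; the character set is an assumption to tune for your data. Note that it also removes accented Latin letters, Japanese kana and rarer CJK extension characters.

```python
import re

# Characters to KEEP; everything else (emojis, decorative symbols,
# stray marks) is deleted. Adjust the set to suit your data.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、；：,.!?;:]")
_SPACES = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Strip emojis and stray symbols from one review, then tidy whitespace."""
    text = _DISALLOWED.sub("", text)
    return _SPACES.sub(" ", text).strip()
```

For example, `clean_comment("房间很干净😀👍!!  服务好~~")` returns `"房间很干净!! 服务好"`. A whitelist is easier to maintain than enumerating every emoji code-point range, at the cost of occasionally removing legitimate characters.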

Using proxy IPs is advisable; otherwise the scraper may be blocked due to high request volume.
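A minimal standard-library sketch of that advice: rotate through a list of proxies with `urllib.request.ProxyHandler` and add a random pause before each request. The proxy addresses are placeholders you must supply, and the 1 to 3 second delay range is an arbitrary assumption, not a value from the article.

```python
import itertools
import random
import time
import urllib.request
from typing import Iterable, Iterator


def rotating_openers(proxies: Iterable[str]) -> Iterator[urllib.request.OpenerDirector]:
    """Yield urllib openers bound to each proxy in round-robin order."""
    for proxy in itertools.cycle(list(proxies)):
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield urllib.request.build_opener(handler)


def polite_get(opener: urllib.request.OpenerDirector, url: str,
               min_delay: float = 1.0, max_delay: float = 3.0,
               timeout: float = 10.0) -> bytes:
    """Sleep a random interval, then fetch url through the given opener."""
    time.sleep(random.uniform(min_delay, max_delay))  # spread requests out
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Typical use is `opener = next(openers)` before each request, so consecutive requests leave through different IPs.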

3. Conclusion

The Python crawler gathered the number of hotels and reviews near the selected universities, demonstrating a reusable approach for extracting location-based data from Meituan. The same technique can be adapted to other domains and regions for broader data-mining projects.

Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
