How to Build a Python Web Crawler to Map 2019 Chinese National Day Travel Hotspots

This article walks through the complete process of designing, implementing, and visualizing a Python web crawler that extracts tourism hotspot data from ticketing sites for China's 2019 National Day holiday, covering requirement analysis, URL and element inspection, data collection, cleaning, and geographic heat‑map presentation using Pyecharts.

Python Crawling & Data Mining

Idea and Planning

The goal is to create a tourism hotspot map for the 2019 National Day holiday by crawling ticketing websites, extracting city, province, and popularity data, and visualizing it on a China map.

Website and URL Analysis

We target ticket platforms such as Qunar, Ctrip, and others. The request URL consists of a constant Keyword ("热门景点"), a Subject representing the attraction category, and a Page number for pagination.
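As a minimal sketch of how such a request URL could be assembled, the snippet below combines the three parts with the standard library. The base URL is a hypothetical placeholder, and the query parameter names (`keyword`, `subject`, `page`) are assumed from the description above rather than taken from any real site.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the real list endpoint of the target site.
BASE_URL = "https://example.com/ticket/list.htm"
# Constant search keyword described in the article ("popular attractions").
KEYWORD = "热门景点"


def build_url(subject, page):
    """Build a list-page URL for one attraction category and page number.

    The parameter names are assumptions made for illustration.
    """
    query = urlencode({"keyword": KEYWORD, "subject": subject, "page": page})
    return f"{BASE_URL}?{query}"
```

`urlencode` percent-encodes the Chinese keyword as UTF-8, so the resulting string can be passed directly to an HTTP client.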

URL analysis diagram

Element Analysis

Using Chrome DevTools we locate the list container with id="search-list". Each attraction item resides in a div with class sight_item, containing name, level, city, province, popularity, and address.
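The structure described above can be illustrated with a small BeautifulSoup sketch. Only the `search-list` id and the `sight_item` class come from the article; the sample HTML and the per-field class names inside each item are hypothetical, since the real page may expose these values differently (for example as attributes).

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the described layout; inner class names are assumptions.
SAMPLE_HTML = """
<div id="search-list">
  <div class="sight_item">
    <span class="name">Sample Palace</span>
    <span class="level">5A</span>
    <span class="city">Sample City</span>
    <span class="province">Sample Province</span>
    <span class="popularity">0.9</span>
    <span class="address">1 Example Road</span>
  </div>
  <div class="sight_item">
    <span class="name">Sample Lake</span>
    <span class="city">Other City</span>
    <span class="province">Other Province</span>
    <span class="popularity">0.7</span>
    <span class="address">2 Example Road</span>
  </div>
</div>
"""

FIELDS = ("name", "level", "city", "province", "popularity", "address")


def text_of(item, css_class):
    """Return the stripped text of the first child with this class, or ''."""
    node = item.find(class_=css_class)
    return node.get_text(strip=True) if node else ""


def parse_sights(html):
    """Extract one dict per sight_item inside the search-list container."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id="search-list")
    if container is None:
        return []  # layout changed or the request was blocked
    items = container.find_all("div", class_="sight_item")
    return [{field: text_of(item, field) for field in FIELDS} for item in items]
```

Returning an empty string for a missing field keeps every record the same shape, which makes the later cleaning step simpler.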

List element diagram

Crawler Setup

We use Python with requests for HTTP, fake_useragent to randomize headers, and BeautifulSoup for HTML parsing. A CSV file (UTF‑8) stores the extracted fields: region, name, city, province, popularity, address.
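A possible setup for the output file and request headers is sketched below. The helper names `write_records` and `random_headers` are illustrative, not the author's; the column order follows the field list above.

```python
import csv

# Column order as listed in the article.
FIELDS = ["region", "name", "city", "province", "popularity", "address"]


def write_records(stream, records):
    """Write a header row plus one row per record; return the number of rows."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)  # missing keys are written as empty strings
        count += 1
    return count


def random_headers():
    """Return request headers with a randomly chosen User-Agent string."""
    # Imported lazily so the CSV helper also works without fake_useragent installed.
    from fake_useragent import UserAgent

    return {"User-Agent": UserAgent().random}
```

When writing to disk, open the file with `open(path, "w", newline="", encoding="utf-8")` so the Chinese text is stored as UTF‑8 and no blank lines appear between rows on Windows.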

Data Extraction Workflow

Prepare files and import libraries.

Construct URLs and send requests with retry logic.

Parse HTML, locate the search-list div, iterate over sight_item entries, and extract required fields.

Detect the "next" button (class next) to paginate until all pages are processed.

Write each record to the CSV.
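The request, retry, and pagination steps above can be sketched roughly as follows. The function names, the retry count, the delay, the timeout, and the `max_pages` safety cap are all assumptions for illustration, as is the choice to treat any non-200 response as a failed attempt.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_with_retry(url, get=requests.get, retries=3, delay=1.0):
    """Return the page HTML, or None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=10)
            if response.status_code == 200:
                return response.text
        except requests.RequestException:
            pass  # network error: fall through to the retry delay
        if attempt < retries:
            time.sleep(delay)
    return None


def has_next_page(html):
    """True if the page contains an element with class "next"."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(class_="next") is not None


def crawl_subject(build_url, fetch=fetch_with_retry, max_pages=100):
    """Yield the HTML of each result page until no "next" button is found."""
    for page in range(1, max_pages + 1):
        html = fetch(build_url(page))
        if html is None:
            break  # give up on this category after repeated failures
        yield html
        if not has_next_page(html):
            break
```

Passing the URL builder and the fetch function as arguments keeps the loop easy to test offline; each yielded page can then be parsed and its records appended to the CSV.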

Data Cleaning and Aggregation

After crawling, we load the CSV, filter out invalid or missing data, group by city/province, and sum popularity scores to obtain hotspot intensity per region.
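One way to express this cleaning and aggregation step with only the standard library is shown below. The sample rows are made up for illustration, and the rule for what counts as invalid (an empty region or a non-numeric popularity value) is an assumption.

```python
import csv
import io
from collections import defaultdict

# Fabricated sample data in the CSV layout described earlier.
SAMPLE_CSV = """region,name,city,province,popularity,address
R1,Sight A,City X,Province P,0.5,addr
R1,Sight B,City X,Province P,0.25,addr
R2,Sight C,City Y,Province Q,0.5,addr
R2,Sight D,City Y,,0.9,addr
R3,Sight E,City Z,Province Q,n/a,addr
"""


def load_rows(stream):
    """Read CSV rows into a list of dicts keyed by the header names."""
    return list(csv.DictReader(stream))


def aggregate_popularity(rows, key="province"):
    """Sum popularity per region, skipping rows with missing or invalid data.

    Returns (region, total) tuples sorted from highest to lowest total.
    """
    totals = defaultdict(float)
    for row in rows:
        region = (row.get(key) or "").strip()
        try:
            popularity = float(row.get("popularity") or "")
        except ValueError:
            continue  # popularity missing or not a number
        if not region:
            continue  # no region to attribute the score to
        totals[region] += popularity
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

Calling `aggregate_popularity(rows, key="city")` produces the per-city ranking from the same rows.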

Visualization with Pyecharts

Using the pyecharts library, we generate a geographic heat‑map of China. The map highlights Beijing, the coastal provinces of Fujian and Guangdong, the Jiang‑Zhe region (Jiangsu and Zhejiang), Gansu, and Wuhan as the most visited areas during the holiday.
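A rough sketch of the rendering step is given below, assuming pyecharts 1.x or later. It uses a `Map` chart shaded by a visual-map scale, which may not be the exact chart type the author used; the function names, series name, title, and output path are illustrative.

```python
def to_map_pairs(totals):
    """Convert (region, score) tuples into [name, value] pairs for a map series."""
    return [[name, round(score, 2)] for name, score in totals]


def render_heat_map(pairs, path="travel_hotspots.html"):
    """Render a China map shaded by popularity and return the output path."""
    # Imported lazily so the data-shaping helper works without pyecharts installed.
    from pyecharts import options as opts
    from pyecharts.charts import Map

    # Scale the color range to the largest value present (1 if the list is empty).
    max_value = max((value for _, value in pairs), default=1)
    chart = (
        Map()
        .add("Popularity", pairs, "china")
        .set_global_opts(
            title_opts=opts.TitleOpts(title="2019 National Day travel hotspots"),
            visualmap_opts=opts.VisualMapOpts(max_=max_value),
        )
    )
    return chart.render(path)
```

The region names in `pairs` must match the names used by the map data bundled with the installed pyecharts version, otherwise those regions are left unshaded.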

Tourism heat‑map

Result Analysis

The heat‑map shows that the major tourist destinations are Beijing, the coastal cities, the Jiang‑Zhe region, and Gansu. The top 20 cities and the 5A‑rated attractions are also listed, which suggests the crawler collected usable data.

Conclusion

The project demonstrates a complete pipeline, from requirement definition through web crawling and data processing to visual analytics, and provides a reusable framework for similar tourism or event‑driven data collection tasks.

Written by

Python Crawling & Data Mining