Unlock Changsha’s Hidden Gems: Python Web Scraping & Data Analysis Tutorial

This article demonstrates how to scrape attraction and food data for Changsha using Python, process it with pandas, visualize insights with Plotly and Pyecharts, and derive travel recommendations such as top scenic spots, popular eateries, and price trends.

Python Crawling & Data Mining

Aug 18, 2021

Import Libraries

import pandas as pd
import re
import csv
import json
import requests
import random
# display all columns/rows (optional)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)
# pd.set_option('max_colwidth', 100)
import jieba
import matplotlib.pyplot as plt
from pyecharts.globals import CurrentConfig, OnlineHostType
from pyecharts import options as opts
from pyecharts.charts import Bar, Pie, Line, HeatMap, Funnel, WordCloud, Grid, Page
from pyecharts.commons.utils import JsCode
from pyecharts.globals import ThemeType, SymbolType
import plotly.express as px
import plotly.graph_objects as go

Changsha Attractions Data

The data was collected the same way as in a previous article on Xiamen. Fields include Chinese name, English name, number of travel guides (called "strategies" below), comment count, location, ranking, traveler proportion, and a brief description.

The final dataset contains 1,152 records, mostly Changsha attractions plus a few from nearby cities (Ningxiang, Liuyang, etc.).

Overall Overview

A scatter plot of strategy count vs. comment count shows that Orange Isle, Yuelu Mountain, Yuelu Academy, and Taiping Old Street rank highest.

Top Ranked Attractions

After dropping records whose ranking is 0 and sorting by ranking, the top attractions are Orange Isle, Yuelu Mountain, Huangxing Road Pedestrian Street, Mawangdui Han Tomb, and Hunan Provincial Museum.
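The filter-then-sort step can be sketched in pandas roughly as follows. The column names `name` and `ranking` and the sample rows are hypothetical, since the article does not show the attractions DataFrame itself.

```python
import pandas as pd

# Hypothetical sample; the real dataset has 1,152 rows and its own column names.
spots = pd.DataFrame({
    "name": ["Spot A", "Spot B", "Spot C", "Spot D"],
    "ranking": [2, 0, 1, 3],
})

def top_ranked(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Drop rows whose ranking is 0, then return the n best-ranked rows."""
    ranked = df[df["ranking"] != 0]
    return ranked.sort_values("ranking").head(n)

top = top_ranked(spots, n=3)
print(top["name"].tolist())
```

Rows with ranking 0 are removed before sorting because otherwise they would sort ahead of the attraction ranked first.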

Most Commented Attractions

A scatter plot of comment counts highlights the most discussed spots.

Strategy Count (Number of Guides)

The top 10 attractions by number of strategies are displayed.

Traveler Proportion

The traveler proportion field (a string with a "%" suffix) was cleaned by removing the "%" and converting the result to an integer.
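A minimal sketch of that cleaning step, assuming the field lives in a column named `traveler_pct` (a hypothetical name) and that every value is a whole number followed by "%":

```python
import pandas as pd

# Hypothetical sample values; "traveler_pct" is an assumed column name.
spots = pd.DataFrame({"traveler_pct": ["35%", "8%", "0%"]})

# Strip the trailing "%" and cast the remaining digits to int.
spots["traveler_pct"] = spots["traveler_pct"].str.rstrip("%").astype(int)
print(spots["traveler_pct"].tolist())
```

If some rows were empty or non-numeric, `astype(int)` would raise, so `pd.to_numeric(..., errors="coerce")` may be the safer choice for messier data.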

Changsha Food Data

Food data was scraped from Qunar (200 pages, 10 items per page). Fields include name (cn_title), score, average price (person_avg), address, recommended dishes, and comments.

Request Sending

url = "https://travel.qunar.com/p-cs300022-changsha-meishi?page=1"
headers = {"user-agent": "personal request header"}  # placeholder: use your own browser's User-Agent string
response = requests.get(url=url, headers=headers)
result = response.content.decode()

Field Extraction

Regular expressions were used to extract each field. Missing values were replaced with "0" or "无" as appropriate.

# Example for extracting Chinese name
cn_title = re.findall(r'cn_tit">(.*?)</span>.*?countbox', result, re.S)
# Example for extracting score ("--" means missing)
score = re.findall(r'cur_score">(.*?)</span>.*?total_score', result, re.S)
# Example for extracting average price.
# sublistbox is assumed to hold one HTML fragment per restaurant, extracted earlier (not shown here).
person_avg = []
for box in sublistbox:
    # Only fragments containing "均" (as in 人均, per-person average) list a price
    matches = re.findall(r'&yen;  (.*?)</dd></dl>', box, re.S) if "均" in box else []
    # "0" marks a missing price, matching the convention described above
    person_avg.append(matches[0] if matches else "0")

All pages (1‑200) were iterated to build complete lists for each column, then saved to a CSV file.

df = pd.DataFrame({
    "中文名": cn_title_list,
    "得分": score_list,
    "均价": person_avg_list,
    "地址": address_list,
    "推荐菜": recommand_list,
    "评价": comment_list
})
# df.to_csv("长沙美食.csv", index=False, encoding='utf_8_sig')
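The page loop described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the `fetch` parameter, the random pause, and the HTML produced by `fake_fetch` are hypothetical. Only the URL pattern and the `cn_tit` regular expression come from the article.

```python
import random
import re
import time
from typing import Callable, List

BASE_URL = "https://travel.qunar.com/p-cs300022-changsha-meishi?page={}"

def page_urls(first: int = 1, last: int = 200) -> List[str]:
    """Build the listing URL for every page in [first, last]."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]

def scrape_titles(urls: List[str], fetch: Callable[[str], str],
                  max_delay: float = 0.0) -> List[str]:
    """Fetch each page with `fetch` and collect all restaurant names."""
    titles: List[str] = []
    for url in urls:
        html = fetch(url)
        titles.extend(re.findall(r'cn_tit">(.*?)</span>.*?countbox', html, re.S))
        if max_delay > 0:
            time.sleep(random.uniform(0, max_delay))  # random pause between requests
    return titles

# Offline demonstration: a fake fetcher stands in for the real HTTP call.
def fake_fetch(url: str) -> str:
    page = url.rsplit("=", 1)[1]
    return f'<span class="cn_tit">Shop {page}</span><div class="countbox"></div>'

demo_titles = scrape_titles(page_urls(1, 3), fake_fetch)
print(demo_titles)
```

For a live run, `fetch` could wrap the request shown earlier, for example `lambda u: requests.get(url=u, headers=headers).content.decode()`. The other columns would be collected the same way, with one list per field.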

Food Data Analysis

Pre‑processing

Score column: replace "--" with "0" and convert to numeric.

# Replace the "--" placeholder with "0", then convert the whole column to numbers
df["得分"] = pd.to_numeric(df["得分"].str.replace("--", "0", regex=False))

Wenheyou Restaurants

20 Wenheyou stores were found; the highest‑scored one is on Fuzhong Road.

"The shrimp is fresh, tender, and well‑seasoned. Not overly spicy, but enough for most people."

Stinky Tofu

19 stinky tofu shops were identified; a typical price is 31 CNY at Taiping Street No. 21.

Tea Yan Yue Se

10 stores were captured; average price is around 17 CNY.
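A per-brand average like this one could be computed along these lines. The sample rows are hypothetical, and skipping prices of 0 (the missing-value marker) is an assumption. The article does not say whether the author excluded them.

```python
import pandas as pd

# Hypothetical rows; only the column names 中文名 / 均价 come from the article.
food = pd.DataFrame({
    "中文名": ["茶颜悦色(示例店A)", "示例粉店", "茶颜悦色(示例店B)", "茶颜悦色(示例店C)"],
    "均价": ["16", "14", "18", "0"],  # scraped as strings; "0" marks a missing price
})

def average_price(df: pd.DataFrame, keyword: str) -> float:
    """Mean of 均价 over shops whose name contains `keyword`, skipping missing prices."""
    shops = df[df["中文名"].str.contains(keyword, regex=False, na=False)]
    prices = pd.to_numeric(shops["均价"], errors="coerce")
    return float(prices[prices > 0].mean())

avg = average_price(food, "茶颜悦色")
print(avg)
```

The same helper works for any of the shop types discussed here by changing the keyword.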

Rice Noodles

103 noodle shops were found. Among the top 10 by score, prices mostly fall between 14 and 15 CNY.

Other Shop Types

Counts: 19 Wenheyou, 19 stinky tofu, 10 Tea Yan Yue Se, 103 noodle shops, 15 bars, 28 hot‑pot restaurants.

doors = pd.DataFrame({
    "数量": [19, 19, 10, 103, 15, 28],
    "名称": ["文和友", "臭豆腐", "茶颜悦色", "粉店", "酒吧", "火锅店"]
})
doors = doors.sort_values("数量", ascending=False)
(
    Bar()
    .add_xaxis(doors["名称"].tolist())
    .add_yaxis("长沙店铺", doors["数量"].tolist())
    .reversal_axis()
    .set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right"))
)
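The counts above are typed in by hand. One way they might be derived from the scraped names is sketched below. The keyword chosen for each shop type (for example "粉" for noodle shops) is an assumption, and the sample names are made up for the demonstration.

```python
import pandas as pd

# Assumed name keywords per shop type; the article does not list the exact ones.
KEYWORDS = {"文和友": "文和友", "臭豆腐": "臭豆腐", "茶颜悦色": "茶颜悦色",
            "粉店": "粉", "酒吧": "酒吧", "火锅店": "火锅"}

def count_by_keyword(names: pd.Series, keywords: dict) -> pd.DataFrame:
    """Count how many shop names contain each keyword."""
    rows = [
        {"名称": label,
         "数量": int(names.str.contains(kw, regex=False, na=False).sum())}
        for label, kw in keywords.items()
    ]
    return pd.DataFrame(rows).sort_values("数量", ascending=False, ignore_index=True)

# Hypothetical names for demonstration only.
sample = pd.Series(["示例米粉", "示例粉面馆", "示例火锅", "示例臭豆腐"])
counts = count_by_keyword(sample, KEYWORDS)
print(counts)
```

Substring matching is crude: a single-character keyword such as "粉" can match unrelated names, so the results are worth spot-checking before charting them.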

Recommended Dishes Word Cloud

A word cloud of the "recommended dishes" field was generated using Jieba segmentation.
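The counting step behind such a word cloud can be sketched as below. The tokenizer is passed in as a parameter so the example runs without Jieba installed. With the real data one would pass `jieba.lcut`. The function name, the `min_len` filter, and the sample strings are all illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

def word_frequencies(texts: Iterable[str],
                     tokenize: Callable[[str], List[str]],
                     min_len: int = 2) -> List[Tuple[str, int]]:
    """Tokenize every text and count tokens of at least `min_len` characters."""
    counter: Counter = Counter()
    for text in texts:
        counter.update(tok for tok in tokenize(text) if len(tok.strip()) >= min_len)
    return counter.most_common()

# Stand-in tokenizer so the sketch runs offline;
# for Chinese text one would pass jieba.lcut instead.
def split_on_spaces(text: str) -> List[str]:
    return text.split()

# Hypothetical "recommended dishes" values.
freqs = word_frequencies(["口味虾 臭豆腐", "臭豆腐 糖油粑粑"], split_on_spaces)
print(freqs[0])
```

The resulting list of (word, count) pairs is the shape that pyecharts' `WordCloud().add(...)` takes as its `data_pair` argument.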

Conclusion

Key travel suggestions derived from the analysis:

Visit Orange Isle for its bridge and fireworks.

Explore Wuyi Square area (Taiping Old Street, Huogongdian, Huangxing Road Pedestrian Street).

Check out Hunan Provincial Museum and Mawangdui Han Tomb for history lovers.

Try Wenheyou’s shrimp (e.g., Haixin Plaza store) if you enjoy spicy seafood.

Don’t miss Hunan rice noodles; many shops are available, e.g., Yuanwei Noodle.

"Everything that seems to have passed has never left; the love and warmth you give keep me steadfastly guarding this place."
