
What Changsha’s Top Attractions Reveal: A Python-Powered Data Dive

This article walks through a Python-driven web‑scraping and data‑analysis workflow that collects, cleans, and visualizes tourism and food data for Changsha, revealing the most popular sights, highly‑rated eateries, and visitor preferences through interactive charts and word clouds.

Python Crawling & Data Mining

Changsha Attractions

The author re‑runs the same web‑scraping code used for a previous Xiamen article to collect data on Changsha attractions. The scraped fields include Chinese name, English name, number of travel guides, number of reviews, location, ranking, proportion of travelers, and a brief description.

After cleaning, the dataset contains 1,152 records, mostly from Changsha with a few from nearby cities such as Ningxiang and Liuyang.

Overall Situation

A scatter plot of the first ten attractions in the dataset, plotting the number of travel guides ("strategy") against the number of comments, shows that Orange Isle, Yuelu Mountain, Yuelu Academy, and Taiping Old Street rank highest.

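import plotly.express as px

# First ten rows of the cleaned dataset: travel guides vs. comments,
# with marker size and color both driven by the comment count
fig = px.scatter(changsha[:10],
                 x="strategy",
                 y="comment",
                 color="comment",
                 size="comment",
                 hover_name="cn_title",
                 text="cn_title")
# Place each attraction name above its marker
fig.update_traces(textposition='top center')
fig.show()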

Top‑Ranked Attractions

Sorting by ranking (excluding attractions whose ranking is zero) and selecting the top 20 shows that Orange Isle, Yuelu Mountain, Huangxing Road Pedestrian Street, the Mawangdui Han Tomb site, and Hunan Provincial Museum rank highest.
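
The ranking step is described but its code is not shown. A minimal sketch of how it might look, assuming the rank lives in a numeric column named "ranking" (a hypothetical name, the article does not give it) where 0 marks an unranked attraction:

```python
import pandas as pd

def top_ranked(df: pd.DataFrame, n: int = 20, rank_col: str = "ranking") -> pd.DataFrame:
    """Return the n best-ranked rows, skipping rows whose rank is 0 (unranked)."""
    ranked = df[df[rank_col] != 0]
    # Rank 1 is the best, so sort ascending before taking the first n rows
    return ranked.sort_values(rank_col).head(n).reset_index(drop=True)

# Tiny made-up frame, for illustration only (not the scraped data)
demo = pd.DataFrame({
    "cn_title": ["A", "B", "C", "D"],
    "ranking": [2, 0, 1, 3],
})
top2 = top_ranked(demo, n=2)
```

On the real data this would be called as top_ranked(changsha), provided the column name matches.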

Most Commented Attractions

A second scatter plot highlights the attractions with the highest number of comments.

# Drop attractions without comments, then keep the ten with the most comments
changsha2 = changsha[changsha["comment"] != 0].sort_values(by="comment", ascending=False)[:10]
fig = px.scatter(changsha2,
                 x="cn_title",
                 y="comment",
                 size="comment",
                 color="comment",
                 text="cn_title")
# Place each attraction name above its marker
fig.update_traces(textposition='top center')
fig.show()

Travel Guides Count

The top 10 attractions by number of travel guides are displayed in a bar chart.
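
The selection behind this chart is not shown in the article. One way it could be done, sketched here with the "strategy" and "cn_title" columns used in the scatter plots (the function name and the demo values are made up for illustration):

```python
import pandas as pd

def top_by_guides(df: pd.DataFrame, n: int = 10, guide_col: str = "strategy") -> pd.DataFrame:
    """Return the n rows with the most travel guides, largest first."""
    return df.nlargest(n, guide_col).reset_index(drop=True)

# Made-up values, for illustration only
demo = pd.DataFrame({"cn_title": ["A", "B", "C"], "strategy": [5, 40, 12]})
top2_guides = top_by_guides(demo, n=2)
```

The resulting frame can then be handed to Plotly Express, for example px.bar(top10, x="cn_title", y="strategy").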

Traveler Proportion

The "traveler proportion" field originally contains a percentage string; the percent sign is stripped and the value converted to an integer for plotting.

# Strip the trailing "%" and convert the remainder to int
changsha["lvyou_number"] = changsha["lvyou"].str.rstrip("%").astype(int)
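
The conversion above fails if any row is missing or malformed. A more tolerant variant is sketched below, assuming it is acceptable for unparseable values to become NaN (which also means the result is float rather than int); the function name is hypothetical:

```python
import pandas as pd

def percent_to_number(series: pd.Series) -> pd.Series:
    """Convert strings such as '12%' to numbers; anything unparseable becomes NaN."""
    cleaned = series.astype(str).str.strip().str.rstrip("%")
    return pd.to_numeric(cleaned, errors="coerce")

# Made-up values covering the normal, padded, missing, and malformed cases
result = percent_to_number(pd.Series(["12%", " 7% ", None, "n/a"]))
```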

Attraction Descriptions Word Cloud

The "abstract" field is tokenized with jieba, stop‑words are removed, word frequencies are counted, and a word‑cloud is generated using pyecharts.

import jieba

# Tokenize every description with jieba (precise mode)
abstract_list = changsha["abstract"].tolist()
jieba_name = []
for abstract in abstract_list:
    jieba_name.extend(jieba.cut(str(abstract).strip(), cut_all=False))
# Remove stop-words; stopwordslist() is a helper not shown here that
# returns the words listed in the given file
stopwords = stopwordslist('nlp_stopwords.txt')
stopword_list = []
for word in jieba_name:
    if word not in stopwords and word not in ["\t", " ", "nan"]:
        stopword_list.append(word)
# Count frequencies
dic = {}
for each in stopword_list:
    dic[each] = dic.get(each, 0) + 1
# Keep the 20 most frequent words as (word, count) tuples for the word cloud
tuple_list = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)[:20]
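
The filter-and-count steps can also be written with collections.Counter from the standard library. The sketch below takes an already tokenized list so that it runs without jieba; load_stopwords is a hypothetical stand-in for the article's stop-word helper, whose definition is not shown, and the token list is made up:

```python
from collections import Counter

def load_stopwords(path):
    """Read one stop-word per line from a UTF-8 text file (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}

def top_words(tokens, stopwords, n=20):
    """Drop blanks, 'nan', and stop-words, then return the n most frequent tokens."""
    kept = (t for t in tokens if t.strip() and t != "nan" and t not in stopwords)
    return Counter(kept).most_common(n)

# Made-up token list, for illustration only
tokens = ["长沙", "的", "岳麓山", "长沙", " ", "nan", "岳麓山", "长沙"]
pairs = top_words(tokens, stopwords={"的"}, n=2)
```

The list of (word, count) pairs has the same shape as tuple_list and can be passed to a word cloud in the same way.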

Changsha Food

The second part focuses on food establishments in Changsha and nearby areas. Data is collected from Qunar travel pages.

Sending Requests

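import requests

url = "https://travel.qunar.com/p-cs300022-changsha-meishi?page=1"
# Placeholder: replace with your own browser's User-Agent string
headers = {"user-agent": "personal user-agent"}
response = requests.get(url=url, headers=headers)
result = response.content.decode()  # decode the raw bytes as UTF-8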

Field Extraction

The following fields are extracted: Chinese name (cn_title), score, average price (person_avg), address, recommended dishes (recommand), and comments.

Parsing Multiple Pages

For each of the 200 pages, the script extracts the fields, handling missing values by inserting "0" or "无" (none).
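
The page loop itself is not shown. A minimal sketch of how the 200 pages might be fetched, assuming only the "page" query parameter changes between pages (as in the URL above); the one-second delay and ten-second timeout are arbitrary choices, not values from the article:

```python
import time
import requests

BASE_URL = "https://travel.qunar.com/p-cs300022-changsha-meishi"

def page_url(page):
    """Build the listing URL for a given page number."""
    return f"{BASE_URL}?page={page}"

def fetch_pages(first=1, last=200, delay=1.0, headers=None):
    """Yield (page_number, html) for each listing page, pausing between requests."""
    with requests.Session() as session:
        for page in range(first, last + 1):
            response = session.get(page_url(page), headers=headers, timeout=10)
            # Stop on HTTP errors instead of parsing an error page
            response.raise_for_status()
            yield page, response.content.decode("utf-8")
            time.sleep(delay)
```

Each yielded HTML string would then go through the same field extraction as the first page.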

import re

# Extract the average price per person; sublistbox holds one HTML fragment per listing
person_avg = []
for item in sublistbox:
    try:
        if "均" in item:
            person_avg.append(re.findall('&yen; (.*?)</dd></dl>', item, re.S)[0])
        else:
            person_avg.append(0)  # no average price listed
    except IndexError:
        person_avg.append(0)  # label present but the pattern did not match

Data Assembly

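import pandas as pd

# Column names: 中文名 = Chinese name, 得分 = score, 均价 = average price,
# 地址 = address, 推荐菜 = recommended dishes, 评价 = comments
df = pd.DataFrame({
    "中文名": cn_title_list,
    "得分": score_list,
    "均价": person_avg_list,
    "地址": address_list,
    "推荐菜": recommand_list,
    "评价": comment_list
})
# Optionally save the result ("长沙美食" = "Changsha food")
# df.to_csv("长沙美食.csv", index=False, encoding='utf_8_sig')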

Food Data Analysis

Pre‑processing replaces "--" in scores with "0" and converts numeric columns to appropriate types.

# Replace the "--" placeholder with "0", then convert the scores to floats
df["得分"] = df["得分"].str.replace("--", "0", regex=False).astype(float)
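
The type conversion for the other numeric columns is not shown. One possible approach, sketched under the assumption that 得分 (score) and 均价 (average price) are the numeric columns and that unparseable values should fall back to 0, in line with the "0" placeholder used during scraping:

```python
import pandas as pd

def to_numeric_columns(df: pd.DataFrame, columns=("得分", "均价")) -> pd.DataFrame:
    """Return a copy of df with the given columns converted to numbers."""
    out = df.copy()
    for col in columns:
        # Values may be a mix of strings and the integer 0, so normalize to str first
        cleaned = out[col].astype(str).str.replace("--", "0", regex=False)
        out[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0)
    return out

# Made-up rows, for illustration only
demo = pd.DataFrame({"得分": ["4.5", "--"], "均价": ["68", 0]})
clean = to_numeric_columns(demo)
```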

Wenheyou Restaurants

There are 20 Wenheyou locations; the highest‑scoring one is on Fuzhong Road.
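
The article does not show how shops of one brand or type are picked out. A plausible sketch is a substring match on the shop name, sorted by score; the function name and demo rows are made up, and 得分 is assumed to be numeric already:

```python
import pandas as pd

def shops_matching(df: pd.DataFrame, keyword: str,
                   name_col: str = "中文名", score_col: str = "得分") -> pd.DataFrame:
    """Return rows whose name contains keyword, highest score first."""
    mask = df[name_col].str.contains(keyword, regex=False, na=False)
    return df[mask].sort_values(score_col, ascending=False).reset_index(drop=True)

# Made-up shop names and scores, for illustration only
demo = pd.DataFrame({
    "中文名": ["文和友 A店", "示例粉店", "文和友 B店"],
    "得分": [4.2, 4.8, 4.6],
})
wenheyou = shops_matching(demo, "文和友")
```

The same call with a different keyword would cover the other shop types discussed in the following sections.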

Stinky Tofu

19 stinky tofu shops are found; one on Taiping Street has an average price of 31 CNY per person.

Chayan Yuese (茶颜悦色)

Ten Chayan Yuese stores are identified, with average prices around 17 CNY.

Noodles (Mi Fen)

103 noodle shops are listed; the top‑scoring ten cost roughly 14‑15 CNY.

Shop Summary

A bar chart summarizes the number of shops of each type: Wenheyou (19), stinky tofu (18), Chayan Yuese (茶颜悦色, 9), noodle shops (103), bars (15), and hot-pot restaurants (28).

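from pyecharts import options as opts
from pyecharts.charts import Bar

# Shop counts by type (名称 = name, 数量 = count)
doors = pd.DataFrame({
    "数量": [19, 18, 9, 103, 15, 28],
    "名称": ["文和友", "臭豆腐", "茶颜悦色", "粉店", "酒吧", "火锅店"]
})
doors = doors.sort_values("数量", ascending=False)
# Horizontal bar chart with each count shown to the right of its bar
Bar().add_xaxis(doors["名称"].tolist())\
    .add_yaxis("长沙店铺", doors["数量"].tolist())\
    .reversal_axis()\
    .set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right"))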
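
The counts above are typed in by hand. They could instead be derived from the scraped names, as in the sketch below; the keyword chosen for each shop type is an assumption, and the article does not say how its counts were obtained:

```python
import pandas as pd

def count_by_keyword(df: pd.DataFrame, keywords: dict, name_col: str = "中文名") -> pd.DataFrame:
    """Count the shops whose name contains each keyword; keywords maps label -> substring."""
    rows = [
        {"名称": label,
         "数量": int(df[name_col].str.contains(kw, regex=False, na=False).sum())}
        for label, kw in keywords.items()
    ]
    return pd.DataFrame(rows).sort_values("数量", ascending=False).reset_index(drop=True)

# Made-up shop names, for illustration only
demo = pd.DataFrame({"中文名": ["文和友 A店", "示例粉店", "另一家粉店", "示例火锅"]})
summary = count_by_keyword(demo, {"文和友": "文和友", "粉店": "粉", "火锅店": "火锅"})
```

A substring match can over-count (a name may mention a keyword without being that type of shop), so the result is worth a manual check.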

Recommended Dishes Word Cloud

Recommended dishes are extracted, tokenized, counted, and visualized as a word cloud.

from pyecharts import options as opts
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

# Keep shops that list recommended dishes ("无" means none), best score first
rec = df[df["推荐菜"] != "无"].sort_values("得分", ascending=False).reset_index(drop=True)
# Tokenize every recommended-dish string with jieba
rec_jieba_list = []
for dishes in rec["推荐菜"].tolist():
    rec_jieba_list.extend(jieba.cut(str(dishes).strip(), cut_all=False))
# Count tokens, skipping the single most frequent one (iloc[1:])
counts = pd.Series(rec_jieba_list).value_counts().iloc[1:]
rec_words = list(zip(counts.index.tolist(), counts.tolist()))
# Title: "Word cloud of recommended Changsha dishes"
WordCloud().add("", rec_words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)\
    .set_global_opts(title_opts=opts.TitleOpts(title="长沙美食推荐菜词云"))

Conclusion

The analysis suggests that visitors to Changsha should prioritize Orange Isle for its scenic views and fireworks, explore the bustling Wuyi Square area for food and entertainment, and visit cultural sites such as the Hunan Provincial Museum and the Mawangdui Han Tomb. On the food side, the highly rated Wenheyou restaurants are worth trying for crayfish, and the local noodle shops should not be missed.
