What Changsha’s Top Attractions Reveal: A Python-Powered Data Dive
This article walks through a Python-driven web‑scraping and data‑analysis workflow that collects, cleans, and visualizes tourism and food data for Changsha, revealing the most popular sights, highly‑rated eateries, and visitor preferences through interactive charts and word clouds.
Changsha Attractions
The author re‑runs the same web‑scraping code used for a previous Xiamen article to collect data on Changsha attractions. The scraped fields include Chinese name, English name, number of travel guides, number of reviews, location, ranking, proportion of travelers, and a brief description.
After cleaning, the dataset contains 1,152 records, mostly from Changsha with a few from nearby cities such as Ningxiang and Liuyang.
Overall Situation
A scatter plot of the top 10 attractions by number of travel guides ("strategy") and comments shows that Orange Isle, Yuelu Mountain, Yuelu Academy, and Taiping Old Street rank highest.
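import plotly.express as px

# First 10 rows: travel guides on x, comments on y; bubble size and color follow the comment count
fig = px.scatter(changsha[:10],
                 x="strategy",
                 y="comment",
                 color="comment",
                 size="comment",
                 hover_name="cn_title",
                 text="cn_title")
fig.update_traces(textposition='top center')
fig.show()
Top‑Ranked Attractions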
Sorting by ranking (excluding zero) and selecting the top 20 reveals that Orange Isle, Yuelu Mountain, Huangxing Road Pedestrian Street, the Mawangdui Han Tomb site, and Hunan Provincial Museum are the most visited.
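The article does not show the code for this step. A minimal sketch of the filter-and-sort, assuming the DataFrame has a numeric column named `ranking` in which 0 means "no ranking" (the column name and the sample rows are assumptions made for illustration, not the scraped data):

```python
import pandas as pd

# Made-up sample rows; the real DataFrame comes from the scraper.
changsha = pd.DataFrame({
    "cn_title": ["A", "B", "C", "D"],
    "ranking": [2, 0, 1, 3],  # assumed column name; 0 = unranked
})

def top_ranked(df, n=20):
    """Drop unranked rows (ranking == 0) and return the n best-ranked ones."""
    ranked = df[df["ranking"] != 0]
    return ranked.sort_values("ranking", ascending=True).head(n)

top = top_ranked(changsha, n=3)
print(top["cn_title"].tolist())
```

Filtering before sorting matters here: without it, the unranked rows (value 0) would sort to the top of an ascending list.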
Most Commented Attractions
A second scatter plot highlights the attractions with the highest number of comments.
changsha2 = changsha[changsha["comment"] != 0].sort_values(by="comment", ascending=False)[:10]
fig = px.scatter(changsha2,
                 x="cn_title",
                 y="comment",
                 size="comment",
                 color="comment",
                 text="cn_title")
fig.update_traces(textposition='top center')
fig.show()
Travel Guides Count
The top 10 attractions by number of travel guides are displayed in a bar chart.
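The bar-chart code is not included in the article. One way it could be done is sketched here, using the `strategy` and `cn_title` fields from the scatter plots; the helper names and sample rows are hypothetical, and plotly is imported lazily so the selection step runs without it:

```python
def top_by_guides(rows, n=10):
    """Return the n rows with the most travel guides, largest first."""
    return sorted(rows, key=lambda r: r["strategy"], reverse=True)[:n]

def plot_top_by_guides(rows, n=10):
    """Draw a bar chart of the top n rows (requires plotly)."""
    import plotly.express as px  # lazy import: only needed for plotting
    top = top_by_guides(rows, n)
    fig = px.bar(x=[r["cn_title"] for r in top],
                 y=[r["strategy"] for r in top],
                 labels={"x": "attraction", "y": "travel guides"})
    fig.show()

# Made-up sample rows for illustration only.
sample = [
    {"cn_title": "A", "strategy": 5},
    {"cn_title": "B", "strategy": 12},
    {"cn_title": "C", "strategy": 8},
]
print([r["cn_title"] for r in top_by_guides(sample, n=2)])
```

With the article's DataFrame, the rows could be obtained with `changsha.to_dict("records")` before calling `plot_top_by_guides`.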
Traveler Proportion
The "traveler proportion" field originally contains a percentage string; the percent sign is stripped and the value converted to an integer for plotting.
# Remove % and convert to int
changsha["lvyou_number"] = changsha["lvyou"].apply(lambda x: x.split("%")[0])
changsha["lvyou_number"] = changsha["lvyou_number"].astype(int)
Attraction Descriptions Word Cloud
The "abstract" field is tokenized with jieba, stop‑words are removed, word frequencies are counted, and a word‑cloud is generated using pyecharts.
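The `stopwordslist` helper used in this step is not defined anywhere in the article. A hypothetical version is sketched here, assuming the stop-word file is UTF-8 with one word per line, together with a compact counting step based on `collections.Counter`. The token list is hand-written to stand in for jieba output:

```python
from collections import Counter

def stopwordslist(path):
    """Hypothetical helper: read one stop word per line (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def top_words(tokens, stopwords, n=20):
    """Count tokens that are not stop words and return the n most frequent."""
    kept = [t for t in tokens
            if t.strip() and t not in stopwords and t != "nan"]
    return Counter(kept).most_common(n)

# Hand-written tokens in the shape jieba.cut would produce.
tokens = ["长沙", "的", "公园", "长沙", " ", "nan", "公园", "长沙"]
print(top_words(tokens, {"的"}, n=2))
```

`most_common` returns `(word, count)` tuples sorted by frequency, which is the shape a pyecharts word cloud expects.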
import jieba

abstract_list = changsha["abstract"].tolist()
jieba_name = []
for i in range(len(abstract_list)):
    seg_list = jieba.cut(str(abstract_list[i]).strip(), cut_all=False)
    for each in list(seg_list):
        jieba_name.append(each)
# Remove stop-words (stopwordslist is the author's helper; its definition is not shown in the article)
stopwords = stopwordslist('nlp_stopwords.txt')
stopword_list = []  # despite the name, this holds the words that survive filtering
for word in jieba_name:
    if word not in stopwords and word not in ["\t", " ", "nan"]:
        stopword_list.append(word)
# Count frequencies
dic = {}
for each in stopword_list:
    dic[each] = dic.get(each, 0) + 1
# Keep the 20 most frequent words as (word, count) tuples for the word cloud
tuple_list = sorted(dic.items(), key=lambda kv: kv[1], reverse=True)[:20]
Changsha Food
The second part focuses on food establishments in Changsha and nearby areas. Data is collected from Qunar travel pages.
Sending Requests
import requests

url = "https://travel.qunar.com/p-cs300022-changsha-meishi?page=1"
# Placeholder: substitute your own browser's user-agent string
headers = {"user-agent": "your own user-agent string"}
response = requests.get(url=url, headers=headers)
result = response.content.decode()
Field Extraction
The following fields are extracted: Chinese name (cn_title), score, average price (person_avg), address, recommended dishes (recommand), and comments.
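As a rough illustration of how one field can be pulled out with a fallback for missing values, here is a small sketch. The `first_match` helper and the HTML fragment are hypothetical (the real Qunar markup may differ); only the price pattern is taken from the article:

```python
import re

def first_match(pattern, html, default):
    """Return the first captured group of pattern in html, or default."""
    found = re.findall(pattern, html, re.S)
    return found[0].strip() if found else default

# Hypothetical fragment, written only to exercise the pattern.
item = '<dl><dt>人均</dt><dd>¥ 85</dd></dl>'
price_pattern = r'¥ (.*?)</dd></dl>'

print(first_match(price_pattern, item, "0"))
print(first_match(price_pattern, '<dl></dl>', "0"))
```

The same helper could return "无" as the default for text fields such as recommended dishes, so every list stays the same length as the number of shops.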
Parsing Multiple Pages
For each of the 200 pages, the script extracts the fields, handling missing values by inserting "0" or "无" (none).
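The page loop itself is not shown in the article. A minimal sketch, assuming the `page` query parameter simply runs from 1 to 200; the delay, timeout, and function names are additions for illustration, and `requests` is imported lazily so the URL builder works on its own:

```python
import time

BASE_URL = "https://travel.qunar.com/p-cs300022-changsha-meishi?page={}"

def page_urls(last_page=200):
    """Build the listing URLs for pages 1..last_page."""
    return [BASE_URL.format(n) for n in range(1, last_page + 1)]

def fetch_pages(urls, headers, delay=1.0):
    """Yield the decoded HTML of each page, pausing between requests."""
    import requests  # third-party; only needed when actually fetching
    for url in urls:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        yield response.content.decode()
        time.sleep(delay)  # assumed politeness delay, not from the article

urls = page_urls()
print(len(urls), urls[0], urls[-1])
```

Each yielded page would then be split into per-shop fragments and passed to the field extraction.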
import re

# Example for extracting average price
person_avg = []
for i in range(len(sublistbox)):
    try:
        if "均" in sublistbox[i]:
            person_avg.append(re.findall('¥ (.*?)</dd></dl>', sublistbox[i], re.S)[0])
        else:
            person_avg.append(0)
    except IndexError:
        # "均" was present but the price pattern did not match
        person_avg.append(0)
Data Assembly
df = pd.DataFrame({
"中文名": cn_title_list,
"得分": score_list,
"均价": person_avg_list,
"地址": address_list,
"推荐菜": recommand_list,
"评价": comment_list
})
# df.to_csv("长沙美食.csv", index=False, encoding='utf_8_sig')
Food Data Analysis
Pre‑processing replaces "--" in scores with "0" and converts numeric columns to appropriate types.
df["得分"] = df["得分"].apply(lambda x: x.replace("--", "0"))
Wenheyou Restaurants
There are 20 Wenheyou locations; the highest‑scoring one is on Fuzhong Road.
Stinky Tofu
19 stinky tofu shops are found; one on Taiping Street costs 31 CNY per serving.
Chayan Yuese (茶颜悦色)
Ten stores are identified, with average prices around 17 CNY.
Noodles (Mi Fen)
103 noodle shops are listed; the top‑scoring ten cost roughly 14‑15 CNY.
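The article does not show how each shop category was selected. One plausible approach, assuming shops are matched by a substring of their name, is sketched here in plain Python; the keywords, function name, and shop names are illustrative assumptions:

```python
def count_by_keyword(names, keywords):
    """Count how many shop names contain each keyword."""
    return {label: sum(kw in name for name in names)
            for label, kw in keywords.items()}

# Made-up shop names for illustration only.
names = ["文和友A店", "文和友臭豆腐B店", "C臭豆腐", "D米粉店"]
keywords = {"文和友": "文和友", "臭豆腐": "臭豆腐", "粉店": "粉"}

counts = count_by_keyword(names, keywords)
print(counts)
```

With the article's DataFrame, an equivalent count for one category would be `df["中文名"].str.contains("文和友").sum()`. Note that a shop can fall into more than one category under substring matching.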
Shop Summary
A bar chart summarizes counts of different shop types: Wenheyou (19), Stinky Tofu (18), Chayan Yuese (9), Noodle shops (103), Bars (15), Hot‑pot (28).
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Bar

doors = pd.DataFrame({
    "数量": [19, 18, 9, 103, 15, 28],
    "名称": ["文和友", "臭豆腐", "茶颜悦色", "粉店", "酒吧", "火锅店"]
})
doors = doors.sort_values("数量", ascending=False)
# Horizontal bar chart with the count shown to the right of each bar
Bar().add_xaxis(doors["名称"].tolist())\
    .add_yaxis("长沙店铺", doors["数量"].tolist())\
    .reversal_axis()\
    .set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right"))
Recommended Dishes Word Cloud
Recommended dishes are extracted, tokenized, counted, and visualized as a word cloud.
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

rec = df[df["推荐菜"] != "无"].sort_values("得分", ascending=False).reset_index(drop=True)
rec_list = rec["推荐菜"].tolist()
rec_jieba_list = []
for i in range(len(rec_list)):
    seg_list = jieba.cut(str(rec_list[i]).strip(), cut_all=False)
    for each in list(seg_list):
        rec_jieba_list.append(each)
# Count tokens and skip the most frequent one, as the original code does
# (presumably a separator or punctuation token)
rec_result = pd.Series(rec_jieba_list).value_counts().iloc[1:].rename_axis("词语").reset_index(name="次数")
rec_words = [tuple(z) for z in zip(rec_result["词语"].tolist(), rec_result["次数"].tolist())]
WordCloud().add("", rec_words, word_size_range=[20, 100], shape=SymbolType.DIAMOND)\
    .set_global_opts(title_opts=opts.TitleOpts(title="长沙美食推荐菜词云"))
Conclusion
The analysis suggests that visitors to Changsha should prioritize Orange Isle for its scenic views and fireworks, explore the bustling Wuyi Square (五一广场) area for food and entertainment, visit cultural sites such as the Hunan Provincial Museum and the Mawangdui Han Tomb site, try the highly‑rated Wenheyou restaurants for lobster, and sample the local noodle shops.
Everything that seems to have passed has never truly left; the love and warmth you give keep this place alive.
Python Crawling & Data Mining
