Unlock Changsha’s Hidden Gems: Python Web Scraping & Data Analysis Tutorial
This article demonstrates how to scrape attraction and food data for Changsha using Python, process it with pandas, visualize insights with Plotly and Pyecharts, and derive travel recommendations such as top scenic spots, popular eateries, and price trends.
Import Libraries
import pandas as pd
import re
import csv
import json
import requests
import random
# display all columns/rows (optional)
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)
# pd.set_option('max_colwidth', 100)
import jieba
import matplotlib.pyplot as plt
from pyecharts.globals import CurrentConfig, OnlineHostType
from pyecharts import options as opts
from pyecharts.charts import Bar, Pie, Line, HeatMap, Funnel, WordCloud, Grid, Page
from pyecharts.commons.utils import JsCode
from pyecharts.globals import ThemeType, SymbolType
import plotly.express as px
import plotly.graph_objects as go
Changsha Attractions Data
The data was collected the same way as in a previous article on Xiamen. Fields include the Chinese name, English name, number of travel guides ("strategies"), comment count, location, ranking, traveler proportion, and a brief description.
The final dataset contains 1,152 records, mostly Changsha attractions with a few nearby cities (Ningxiang, Liuyang, etc.).
Overall Overview
A scatter plot of strategy count vs. comment count shows that Orange Isle, Yuelu Mountain, Yuelu Academy, and Taiping Old Street rank highest.
Top Ranked Attractions
After sorting by ranking (excluding ranking = 0) the top attractions are Orange Isle, Yuelu Mountain, Huangxing Road Pedestrian Street, Mawangdui Han Tomb, and Hunan Provincial Museum.
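The filter-and-sort step described above can be sketched as follows. This is a minimal illustration, not the article's own code: the column names `name` and `ranking` are hypothetical (the article does not show its column headers), a ranking of 0 is assumed to mean "unranked", and the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
spots = pd.DataFrame({
    "name": ["Orange Isle", "Yuelu Mountain", "Unranked Park", "Hunan Provincial Museum"],
    "ranking": [1, 2, 0, 5],
})

# Drop unranked rows (ranking == 0), then sort so that rank 1 comes first.
ranked = spots[spots["ranking"] > 0].sort_values("ranking").reset_index(drop=True)
top_names = ranked["name"].tolist()
print(top_names)
```

Filtering before sorting matters here: if the zero rows were kept, an ascending sort would place the unranked attractions at the top.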
Most Commented Attractions
A scatter plot of comment counts highlights the most discussed spots.
Strategy Count (Number of Guides)
The top 10 attractions by number of strategies are displayed.
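One way to pick such a top-N list is `DataFrame.nlargest`. The sketch below assumes hypothetical column names `name` and `guides` and uses invented numbers; the helper `top_by_guides` is introduced only for illustration.

```python
import pandas as pd

def top_by_guides(frame: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n rows with the largest guide counts, largest first."""
    return frame.nlargest(n, "guides")[["name", "guides"]].reset_index(drop=True)

# Hypothetical sample data, not taken from the scraped dataset.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "guides": [12, 40, 7, 25],
})
top2 = top_by_guides(sample, n=2)
print(top2)
```

The resulting name and count columns can then be passed to whichever chart type is preferred.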
Traveler Proportion
The traveler proportion field (a string with a "%" suffix) was cleaned by removing the "%" and converting to integer.
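A minimal sketch of that cleaning step, assuming every value is a whole number followed by a single "%". The series name `traveler_pct` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw values as they might appear after scraping.
raw = pd.Series(["35%", "8%", "0%"], name="traveler_pct")

# Strip the trailing "%" and convert the remaining digits to integers.
clean = raw.str.rstrip("%").astype(int)
print(clean.tolist())
```

If some rows were empty or contained decimals, `astype(int)` would raise an error, so those cases would need handling first.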
Changsha Food Data
Food data was scraped from Qunar (200 pages, 10 items per page). Fields include name (cn_title), score, average price (person_avg), address, recommended dishes, and comments.
Request Sending
url = "https://travel.qunar.com/p-cs300022-changsha-meishi?page=1"
headers = {"user-agent": "personal request header"}
response = requests.get(url=url, headers=headers)
result = response.content.decode()
Field Extraction
Regular expressions were used to extract each field. Missing values were replaced with "0" or "无" as appropriate.
# Example for extracting Chinese name
cn_title = re.findall('cn_tit\">(.*?)</span>.*?countbox', result, re.S)
# Example for extracting score ("--" means missing)
score = re.findall('cur_score\">(.*?)</span>.*?total_score', result, re.S)
# Example for extracting average price
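# sublistbox is assumed to hold one HTML fragment per restaurant (its extraction is not shown here)
person_avg = []
for i in range(len(sublistbox)):
    try:
        if "均" in sublistbox[i]:
            person_avg.append(re.findall('¥ (.*?)</dd></dl>', sublistbox[i], re.S)[0])
        else:
            # no average price listed for this restaurant
            person_avg.append(0)
    except IndexError:
        # the price pattern did not match
        person_avg.append(0)
All pages (1 to 200) were iterated to build complete lists for each column, then saved to a CSV file.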
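The page loop and the missing-value handling can be sketched roughly as below. This is an illustration under stated assumptions: the helper names `page_urls` and `parse_titles_and_scores` are hypothetical, the HTML fragment is invented and only loosely shaped like the real page, and no network request is made (in a real run each URL would be fetched with `requests.get` as shown earlier).

```python
import re

BASE_URL = "https://travel.qunar.com/p-cs300022-changsha-meishi?page={}"

def page_urls(first: int = 1, last: int = 200) -> list[str]:
    """Build one listing URL per page, inclusive of both ends."""
    return [BASE_URL.format(page) for page in range(first, last + 1)]

def parse_titles_and_scores(html: str) -> tuple[list[str], list[str]]:
    """Apply the title and score patterns to one page of HTML."""
    titles = re.findall(r'cn_tit">(.*?)</span>.*?countbox', html, re.S)
    scores = re.findall(r'cur_score">(.*?)</span>.*?total_score', html, re.S)
    # "--" marks a missing score on the page; store it as "0" instead.
    scores = ["0" if s == "--" else s for s in scores]
    return titles, scores

# Hypothetical fragment for illustration; the real markup may differ.
sample_html = (
    '<span class="cn_tit">Shop A</span><div class="countbox">'
    '<span class="cur_score">4.5</span><span class="total_score">/5</span></div>\n'
    '<span class="cn_tit">Shop B</span><div class="countbox">'
    '<span class="cur_score">--</span><span class="total_score">/5</span></div>'
)

cn_title_list, score_list = [], []
for html in [sample_html]:  # in a real run: one downloaded page per URL
    titles, scores = parse_titles_and_scores(html)
    cn_title_list.extend(titles)
    score_list.extend(scores)

urls = page_urls()
print(len(urls), cn_title_list, score_list)
```

Keeping the parsing in a function that takes a plain string makes it possible to check the patterns on a saved page before sending 200 requests.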
df = pd.DataFrame({
"中文名": cn_title_list,
"得分": score_list,
"均价": person_avg_list,
"地址": address_list,
"推荐菜": recommand_list,
"评价": comment_list
})
# df.to_csv("长沙美食.csv", index=False, encoding='utf_8_sig')
Food Data Analysis
Pre‑processing
Score column: replace "--" with "0" and convert to numeric.
df["得分"] = df["得分"].apply(lambda x: x.replace("--", "0"))
df["得分"] = pd.to_numeric(df["得分"])
Wenheyou Restaurants
20 Wenheyou stores were found; the highest‑scored one is on Fuzhong Road.
"The shrimp is fresh, tender, and well‑seasoned. Not overly spicy, but enough for most people."
Stinky Tofu
19 stinky tofu shops were identified; one example, at No. 21 Taiping Street, has an average price of 31 CNY.
Tea Yan Yue Se
10 stores were captured; average price is around 17 CNY.
Rice Noodles
103 noodle shops were found. The top 10 by score have prices mainly between 14‑15 CNY.
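Category counts and top-10 lists like the ones in these sections can be produced by filtering on a keyword in the shop name. The sketch below assumes that approach (the article does not show its filtering code), assumes the score column is already numeric, and uses invented rows. The helper `shops_matching` is hypothetical; the column names follow the DataFrame built earlier.

```python
import pandas as pd

def shops_matching(frame: pd.DataFrame, keyword: str, n: int = 10) -> pd.DataFrame:
    """Return up to n shops whose name contains keyword, best score first."""
    # regex=False treats the keyword as plain text; na=False skips missing names.
    matches = frame[frame["中文名"].str.contains(keyword, regex=False, na=False)]
    return matches.sort_values("得分", ascending=False).head(n)

# Hypothetical sample rows, not taken from the scraped dataset.
sample = pd.DataFrame({
    "中文名": ["原味粉店", "老长沙粉店", "某火锅店", "街角粉店"],
    "得分": [4.6, 4.9, 4.2, 4.1],
    "均价": [14, 15, 80, 14],
})
noodle_top = shops_matching(sample, "粉店", n=2)
print(noodle_top)
```

Calling `len(shops_matching(df, keyword, n=len(df)))` with different keywords would give the per-category counts.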
Other Shop Types
Counts: 19 Wenheyou, 19 stinky tofu, 10 Tea Yan Yue Se, 103 noodle shops, 15 bars, 28 hot‑pot restaurants.
doors = pd.DataFrame({
"数量": [19, 19, 10, 103, 15, 28],
"名称": ["文和友", "臭豆腐", "茶颜悦色", "粉店", "酒吧", "火锅店"]
})
doors = doors.sort_values("数量", ascending=False)
Bar().add_xaxis(doors["名称"].tolist()).add_yaxis("长沙店铺", doors["数量"].tolist()).reversal_axis().set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right"))
Recommended Dishes Word Cloud
A word cloud of the "recommended dishes" field was generated using Jieba segmentation.
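A word cloud needs (word, count) pairs, so the core of this step is tokenizing and counting. The sketch below is an assumption about how that could be done, not the article's code. To stay runnable without extra packages it swaps Jieba for a simple split on separators by default; the helpers `split_dishes` and `term_frequencies` are hypothetical, and the sample strings are invented.

```python
import re
from collections import Counter

def split_dishes(text):
    """Fallback tokenizer: split on common separators and whitespace."""
    return [t for t in re.split(r"[,，、;；\s]+", text) if t]

def term_frequencies(texts, tokenize=split_dishes, min_len=2):
    """Count tokens across all texts, ignoring tokens shorter than min_len."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if len(tok) >= min_len)
    return counts.most_common()

# Hypothetical values of the recommended-dishes column; "无" marks a missing entry.
sample = ["口味虾、臭豆腐", "臭豆腐，糖油粑粑", "无"]
freqs = term_frequencies(sample)
print(freqs)
```

With Jieba installed, passing `tokenize=jieba.lcut` would use its segmentation instead of the separator split. The resulting list of (word, count) tuples is the kind of data a word cloud chart typically takes as input.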
Conclusion
Key travel suggestions derived from the analysis:
Visit Orange Isle for its bridge and fireworks.
Explore Wuyi Square area (Taiping Old Street, Huogongdian, Huangxing Road Pedestrian Street).
Check out Hunan Provincial Museum and Mawangdui Han Tomb for history lovers.
Try Wenheyou’s shrimp (e.g., Haixin Plaza store) if you enjoy spicy seafood.
Don’t miss Hunan rice noodles; many shops are available, e.g., Yuanwei Noodle.
"Everything that seems to have passed has never left; the love and warmth you give keep me steadfastly guarding this place."