Generating Word Cloud and Pie Chart from News Articles Using Python
This tutorial demonstrates how to scrape a news article with Python, extract its text, perform Chinese word segmentation, count word frequencies, and visualize the top ten words using a word cloud and a pie chart, with complete code and sample results.
Solution steps: 1) Crawl all text from the news article; 2) Segment the text into words; 3) Count occurrences and select the ten most frequent words; 4) Generate visualizations.
Below is the complete Python script used for the process:
import jieba
import requests
from bs4 import BeautifulSoup
from pyecharts.charts import WordCloud, Pie

if __name__ == "__main__":
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
    }
    url = "https://new.qq.com/rain/a/20230315A08LAK00"

    # Fetch the page and extract the article body text
    res_html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(res_html, "lxml")
    txt = soup.select(".content-article")[0].text

    # Segment the Chinese text into words with jieba
    words = jieba.lcut(txt)

    # Count word frequencies, skipping single-character tokens
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Sort by frequency and keep the ten most common words
    sort_data = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:10]

    # Render a word cloud of the top ten words
    wc = WordCloud()
    wc.add("", sort_data, word_size_range=[20, 100])
    wc.render("1.html")

    # Render a pie chart of the same data ("次数" means "count")
    pie = Pie()
    pie.add(
        series_name="次数",
        data_pair=sort_data,
    )
    pie.render("2.html")

The generated word cloud and pie chart illustrate the most frequent words in the news content.
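The frequency-counting and top-ten selection (steps 2 and 3) can also be written more compactly with collections.Counter from the standard library. The sketch below uses a small hand-written word list standing in for jieba's segmentation output, so it runs without any network access or third-party packages:

```python
from collections import Counter

# A few pre-segmented words standing in for jieba.lcut(txt) output;
# single-character tokens are filtered out, as in the full script.
words = ["经济", "发展", "的", "经济", "政策", "发展", "经济", "了"]
filtered = [w for w in words if len(w) > 1]

# Counter.most_common(n) returns the n most frequent (word, count)
# pairs, already sorted in descending order of count.
sort_data = Counter(filtered).most_common(10)
print(sort_data)  # [('经济', 3), ('发展', 2), ('政策', 1)]
```

The resulting list of (word, count) pairs has exactly the shape that WordCloud.add and Pie.add expect for their data arguments.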