
Generating Word Cloud and Pie Chart from News Articles Using Python

This tutorial explains how to scrape a news article with Python, segment Chinese text, count word frequencies, and visualize the top ten words using a word cloud and a pie chart, providing complete code and sample results.

Python Programming Learning Circle

This article demonstrates how to scrape a news webpage, extract its text, perform Chinese word segmentation, count word frequencies, and visualize the top ten words using a word cloud and a pie chart.

Solution steps: 1) Fetch the news page and extract the article text; 2) Segment the text into words with jieba; 3) Count occurrences, skipping single-character tokens, and select the ten most frequent words; 4) Generate the word cloud and pie chart with pyecharts.

Below is the complete Python script used for the process:

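import jieba
import requests
from bs4 import BeautifulSoup
from pyecharts.charts import WordCloud, Pie

if __name__ == "__main__":
    # Send a browser-like User-Agent with the request.
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
    }
    url = "https://new.qq.com/rain/a/20230315A08LAK00"
    res_html = requests.get(url, headers=headers, timeout=10).text
    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(res_html, "lxml")
    # Take the text of the first element with class "content-article".
    txt = soup.select(".content-article")[0].text
    # Segment the Chinese text into a list of words.
    words = jieba.lcut(txt)

    # Count each word, skipping single-character tokens.
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Keep the ten most frequent (word, count) pairs.
    sort_data = sorted(counts.items(), key=lambda a: a[1], reverse=True)[:10]

    # Render the word cloud to 1.html.
    wc = WordCloud()
    wc.add("", sort_data, word_size_range=[20, 100])
    wc.render("1.html")

    # Render the pie chart to 2.html; the series name "次数" means "count".
    pie = Pie()
    pie.add(
        series_name="次数",
        data_pair=sort_data
    )
    pie.render("2.html")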
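
The counting loop in the script can also be written with collections.Counter from the standard library. The sketch below is an alternative, not the author's code: top_words is a hypothetical helper name, and the hand-written token list stands in for the output of jieba.lcut.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    # Keep only tokens longer than one character, as the script does.
    counts = Counter(tok for tok in tokens if len(tok) > 1)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hand-written sample tokens standing in for jieba.lcut(txt).
    sample = ["经济", "的", "发展", "经济", "，", "市场", "发展", "经济"]
    print(top_words(sample, n=2))  # [('经济', 3), ('发展', 2)]
```

The returned list has the same shape as sort_data, so it could be passed to WordCloud.add or Pie.add in the same way.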

Running the script writes the word cloud to 1.html and the pie chart to 2.html. Both show the ten most frequent words in the news content.
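
A pie chart sizes each slice by its value divided by the sum of all values. Because only the top ten words are passed in, each percentage is a share of those ten counts, not of the whole article. A minimal sketch of that arithmetic follows; slice_shares is a hypothetical helper, and the counts are made-up sample values, not results from the article.

```python
def slice_shares(pairs):
    """Map each word to its percentage of the summed counts, rounded to one decimal."""
    total = sum(count for _, count in pairs)
    if total == 0:
        # No data: avoid dividing by zero.
        return {}
    return {word: round(100 * count / total, 1) for word, count in pairs}


if __name__ == "__main__":
    # Made-up (word, count) pairs in the same shape as sort_data.
    sample = [("经济", 5), ("发展", 3), ("市场", 2)]
    print(slice_shares(sample))  # {'经济': 50.0, '发展': 30.0, '市场': 20.0}
```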

