Turn Chinese Lyrics JSON into Word Clouds with Python
This guide walks you through extracting Jay Chou's lyrics from a Chinese lyrics JSON database, preprocessing the text with Python and Jieba, generating frequency tables, and visualizing the results as word clouds using both code and online tools.
Data Source and Goal
The lyrics data come from a Chinese lyrics database stored in JSON format. The objective is to preprocess the data, extract Jay Chou's songs, perform word segmentation, and create visualizations such as word clouds.
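The preprocessing code in this guide reads two fields from each record, singer and lyric. As a minimal sketch, assuming the database is a JSON array of objects and that lyric holds a list of lines, a record can be inspected as follows. The name field and all sample values are placeholders, not taken from the real data.

```python
import json

# Hypothetical two-record sample mimicking the assumed layout of lyrics.json
sample_json = """
[
  {"name": "Song A", "singer": "周杰伦", "lyric": ["line one", "line two"]},
  {"name": "Song B", "singer": "Other Singer", "lyric": ["line three"]}
]
"""

records = json.loads(sample_json)

# Filter on the 'singer' field, as the preprocessing step does
jay_songs = [r for r in records if r["singer"] == "周杰伦"]

for song in jay_songs:
    print(song["name"], len(song["lyric"]), "lines")
```

If the real file uses different field names, only the two dictionary keys need to change.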
Method 1: Convert JSON to Excel
Use an online JSON‑to‑CSV/Excel converter, open the file in Excel, filter the singer column for “周杰伦”, copy the lyrics, and save them as a plain text (.txt) file.
Method 2: Python Pre‑processing
Below is the Python code that reads the JSON, filters for Jay Chou, and writes all lyrics to a text file.
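
import json

# Load the full lyrics database (UTF-8 so the Chinese text decodes correctly)
with open('lyrics.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Keep only Jay Chou's songs
data_zjl = [item for item in data if item['singer'] == '周杰伦']
print(len(data_zjl))

# 'lyric' is treated as a list of lines, so extend() merges them into one list
zjl_lyrics = []
for song in data_zjl:
    zjl_lyrics.extend(song['lyric'])

# Write one lyric line per row
with open('zjl_lyrics.txt', 'w', encoding='utf-8') as outfile:
    outfile.write('\n'.join(zjl_lyrics))

The resulting zjl_lyrics.txt contains all of Jay Chou's lyrics (see Figure 1).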
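The loop above assumes every lyric value is a list of strings. The following sketch shows one way to tolerate records where lyric is a single string or is missing. The helper name collect_lyrics is hypothetical, and whether the real database contains such records is an assumption.

```python
def collect_lyrics(records, singer):
    """Return all lyric lines for one singer as a flat list of strings."""
    lines = []
    for record in records:
        if record.get("singer") != singer:
            continue
        lyric = record.get("lyric") or []
        if isinstance(lyric, str):
            # A single string is split into individual lines
            lyric = lyric.splitlines()
        # Trim each line and skip blank ones so the output has no empty rows
        lines.extend(line.strip() for line in lyric if line.strip())
    return lines


sample = [
    {"singer": "周杰伦", "lyric": ["first line", " second line "]},
    {"singer": "周杰伦", "lyric": "third line\nfourth line"},
    {"singer": "周杰伦"},
    {"singer": "Other Singer", "lyric": ["ignored"]},
]
print(collect_lyrics(sample, "周杰伦"))
```

The returned list can be joined with newlines and written to zjl_lyrics.txt exactly as before.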
Word Segmentation with Jieba
Install the required libraries (jieba and pandas, plus openpyxl so that pandas can write .xlsx files), then import them and load a Chinese stop-words list.
import jieba
import jieba.analyse
import pandas as pd
from collections import Counter

# Load the stop-word list, one word per line
with open('chinese_stop_words.txt', encoding='utf-8') as f:
    stopwords = set(line.strip() for line in f)

Read the lyrics file, segment the text, remove stop-words and symbols, and count word frequencies.

# Read the lyrics file written in the previous step
with open('zjl_lyrics.txt', encoding='utf-8') as f:
    text = f.read()

# Precise-mode segmentation; paddle mode only takes effect when paddlepaddle
# is installed and jieba.enable_paddle() has been called
words = jieba.lcut(text, cut_all=False, use_paddle=True)
words = [w.strip() for w in words]                       # trim whitespace first
words = [w for w in words if w and w not in stopwords]   # drop empties and stop-words
words_filter = [w for w in words if len(w) > 1]          # drop single characters and symbols

# Build the frequency table and export it to Excel
df = pd.DataFrame.from_dict(Counter(words_filter), orient='index').reset_index()
df = df.rename(columns={'index': 'words', 0: 'count'})
df.to_excel('周杰伦分词结果.xlsx')

The frequency table (Figure 2) can be used for further visualization.
Word Cloud Generation (Python)

from wordcloud import WordCloud
import matplotlib.pyplot as plt

# A Chinese font is required to avoid garbled characters
wc = WordCloud(font_path='Alibaba-PuHuiTi-Regular.ttf', background_color='white', max_words=2000)
wc.generate(' '.join(words_filter))

# Create the figure before drawing, so the image lands on it
plt.figure(figsize=(12, 10), dpi=300)
plt.imshow(wc)
plt.axis('off')
plt.show()

The resulting word cloud is shown in Figure 3.
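Because generate() tokenizes the joined string again, an alternative is to pass word counts to the cloud directly. The sketch below assumes the wordcloud package's generate_from_frequencies and to_file methods and reuses the font file named above. The helper names, the output filename, and the tiny token list are illustrative only.

```python
from collections import Counter


def build_frequencies(tokens, top_n=2000):
    """Count tokens and keep the top_n most frequent as a plain dict."""
    return dict(Counter(tokens).most_common(top_n))


def render_cloud(frequencies, font_path, out_path):
    """Render a word cloud from a {word: count} mapping and save it as an image."""
    # Imported here so the counting helper works without wordcloud installed
    from wordcloud import WordCloud

    wc = WordCloud(font_path=font_path, background_color="white")
    wc.generate_from_frequencies(frequencies)
    wc.to_file(out_path)


# Illustrative tokens; in this guide they would be the filtered word list
tokens = ["爱情", "微笑", "爱情", "回忆", "微笑", "爱情"]
freqs = build_frequencies(tokens, top_n=2)
print(freqs)
# render_cloud(freqs, "Alibaba-PuHuiTi-Regular.ttf", "cloud.png")  # needs the font file
```

Passing counts keeps the cloud consistent with the exported frequency table, since both come from the same Counter.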
Online Word‑Cloud Tools
Web tools such as 微词云, 易词云, and 图悦 also support Chinese word-cloud creation. Using 微词云 as an example, upload an Excel file whose columns are named “单词” (word) and “词频” (frequency), choose “分词筛词后导入” (import after segmentation and filtering), and generate the cloud (Figure 4).
Further customization, such as selecting word categories, adjusting the font-size proportion, setting the number of words, and choosing a mask shape, allows fine-tuned visual output (Figures 5 and 6).
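The upload step names the headers “单词” and “词频”, while the frequency table built earlier uses words and count. A minimal sketch of the conversion follows. The function names and the output filename are hypothetical, and the Excel export assumes pandas with openpyxl is available.

```python
from collections import Counter


def to_upload_rows(counts):
    """Turn a {word: count} mapping into rows keyed by the tool's headers."""
    # Sort by descending count, then by word, for a stable order
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"单词": word, "词频": count} for word, count in ordered]


def export_rows(rows, path):
    """Write the rows to an .xlsx file (requires pandas and openpyxl)."""
    import pandas as pd  # imported lazily; only needed for the export

    pd.DataFrame(rows, columns=["单词", "词频"]).to_excel(path, index=False)


counts = Counter(["爱情", "微笑", "爱情"])
rows = to_upload_rows(counts)
print(rows)
# export_rows(rows, "upload.xlsx")  # hypothetical output filename
```

Writing with index=False keeps the sheet to exactly the two named columns.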
Alternative Visualizations
Beyond word clouds, other charts such as proportional circles can display high‑frequency terms (Figure 7).
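As a rough sketch of the proportional-circle idea, each term can be given an area proportional to its count, which means the radius scales with the square root of the count. The scaling rule, the function name, and the sample counts are assumptions for illustration; Figure 7 may have been produced differently.

```python
import math
from collections import Counter


def circle_radii(counts, top_n=5, max_radius=100.0):
    """Return (word, count, radius) for the top_n words, with area proportional to count."""
    top = Counter(counts).most_common(top_n)
    if not top:
        return []
    largest = top[0][1]
    return [
        (word, count, max_radius * math.sqrt(count / largest))
        for word, count in top
    ]


# Illustrative counts, not real lyric statistics
sample_counts = {"爱情": 100, "微笑": 25, "回忆": 4}
for word, count, radius in circle_radii(sample_counts):
    print(word, count, round(radius, 1))
```

Scaling the radius rather than the area linearly would exaggerate the gap between frequent and rare terms, which is why the square root is used here.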
Both code‑based and online approaches have trade‑offs; users should compare results to choose the best method for their needs.
Python Crawling & Data Mining
