How to Compute Word Frequencies in Python with Jieba and Export to Excel
This article demonstrates how to calculate word frequencies in Chinese text using Python's jieba library, collections.Counter, and xlwt to export results to Excel, providing complete code examples and alternative approaches for effective text analysis.
Introduction
Hello, I am Pipi. A question about word‑frequency processing in Python was raised in a Python community group, and I share the solution here.
Below is the original code that simply counts word frequencies using collections.Counter:
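# Count word frequencies
from collections import Counter

wordcount = Counter(all_words)  # all_words: list of already-segmented tokens
word_count = wordcount.most_common(30)  # the 30 most frequent (word, count) pairs
frequence_list = []
for i in range(len(word_count)):
    frequence_list.append(word_count[i][0])  # keep only the word, drop the count
frequence_list
Implementation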
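The snippet above assumes that `all_words` already exists. As a starting point, here is a minimal self-contained sketch of the same idea; the sample token list is hypothetical and only stands in for real segmented text.

```python
from collections import Counter

# Hypothetical, already-segmented tokens standing in for `all_words`.
all_words = ["家庭", "父亲", "家庭", "工作", "家庭", "父亲"]

word_counts = Counter(all_words)

# most_common(n) returns (word, count) pairs, highest count first.
top_pairs = word_counts.most_common(2)

# Keep only the words, as the original loop does.
top_words = [word for word, _ in top_pairs]

print(top_pairs)  # [('家庭', 3), ('父亲', 2)]
print(top_words)  # ['家庭', '父亲']
```

Because `most_common` already returns the pairs sorted by count, no separate sorting step is needed when both the word and its frequency are wanted.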
A contributor provided a more complete solution. It reads an already-segmented text file, uses jieba.analyse.extract_tags to pull keywords from each line, counts how often each keyword appears, writes the frequencies to a text file, and then stores the results in an Excel workbook using xlwt.
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import jieba
import jieba.analyse
import xlwt  # write to Excel (.xls)

if __name__ == "__main__":
    wbk = xlwt.Workbook(encoding='utf-8')  # utf-8 is a safer choice than ascii for Chinese text
    sheet = wbk.add_sheet("wordCount")
    word_lst = []
    key_list = []

    # Collect the keywords jieba extracts from the first tab-separated field of each line
    with open('./《都挺好》阿耐_分词后_outputs.txt', 'r', encoding='utf-8') as rf:
        for line in rf:
            item = line.strip('\n\r').split('\t')
            tags = jieba.analyse.extract_tags(item[0])
            for t in tags:
                word_lst.append(t)

    # Count how many times each keyword occurs
    word_dict = {}
    with open("分词结果.txt", 'w', encoding='utf-8') as wf2:
        for item in word_lst:
            if item not in word_dict:
                word_dict[item] = 1
            else:
                word_dict[item] += 1

        # Write the words to the text file in descending order of frequency
        orderList = list(word_dict.values())
        orderList.sort(reverse=True)
        for i in range(len(orderList)):
            for key in word_dict:
                if word_dict[key] == orderList[i]:
                    wf2.write(key + ' ' + str(word_dict[key]) + '\n')
                    key_list.append(key)
                    word_dict[key] = 0  # mark as written so it is not matched again

    # Column 0 holds the word, column 1 its count
    for i in range(len(key_list)):
        sheet.write(i, 1, label=orderList[i])
        sheet.write(i, 0, label=key_list[i])
    wbk.save('wordCount_all_lyrics.xls')

Other participants also shared alternative methods, such as converting the segmentation results into a pandas DataFrame and using groupby to aggregate frequencies.
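The group's pandas code is not reproduced in this article, so the following is only a rough sketch of what the groupby approach might look like. The sample word list and the column names `word` and `count` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tokens; in practice this would be the segmented word list.
words = ["家庭", "父亲", "家庭", "工作", "家庭", "父亲"]

df = pd.DataFrame({"word": words})

# Group identical words, count the rows in each group,
# then sort so the most frequent words come first.
freq = (
    df.groupby("word")
    .size()
    .reset_index(name="count")
    .sort_values("count", ascending=False)
    .reset_index(drop=True)
)

print(freq)
```

From here, `freq.to_excel("word_freq.xlsx", index=False)` would write the table to a workbook, provided an Excel writer such as openpyxl is installed; the file name is just a placeholder.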
Conclusion
This article walked through a practical way to count word frequencies in Chinese text with Python: collections.Counter for the counting itself, jieba for keyword extraction, and xlwt for the Excel export. It also noted the alternative approaches suggested by the community, which may help readers facing similar problems.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!