How to Compute Word Frequencies in Python with Jieba and Export to Excel

This article shows how to compute word frequencies for Chinese text in Python with jieba and collections.Counter, and how to export the results to Excel with xlwt. It includes the code from the discussion and notes an alternative approach suggested by other participants.

Python Crawling & Data Mining

Mar 27, 2023

Introduction

Hello, I am Pipi. A member of a Python community group recently asked how to handle word-frequency counting in Python, and this post shares the solution.

The original code from the question counts words with collections.Counter and keeps the 30 most common:

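# Count word frequencies
from collections import Counter

# all_words: a list of segmented words, built earlier (not shown here)
wordcount = Counter(all_words)
word_count = wordcount.most_common(30)  # 30 most frequent (word, count) pairs

# Keep only the words, dropping the counts
frequence_list = [word for word, _ in word_count]
frequence_list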
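The snippet above depends on an all_words list that is not shown. As a minimal runnable sketch, assume the text has already been segmented into space-separated words; the sample string below is made up for illustration and is not from the original discussion.

```python
from collections import Counter

# Hypothetical input: already-segmented text, words separated by spaces.
segmented_text = "数据 分析 数据 挖掘 数据 分析"
all_words = segmented_text.split()

wordcount = Counter(all_words)

# most_common(n) returns the n most frequent (word, count) pairs, highest first.
top_words = wordcount.most_common(2)

# Keep only the words, dropping the counts.
frequence_list = [word for word, _ in top_words]

print(top_words)
print(frequence_list)
```

Calling most_common() with no argument returns every word, which is what a full export would need.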

Implementation

A contributor provided a more complete solution. It reads a pre-segmented text file, extracts keywords from each line with jieba.analyse.extract_tags, counts how often each keyword appears, writes the counts to a text file, and then stores the same results in an Excel workbook using xlwt.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import jieba
import jieba.analyse
import xlwt  # writes legacy .xls workbooks

if __name__ == "__main__":
    wbk = xlwt.Workbook(encoding='utf-8')
    sheet = wbk.add_sheet("wordCount")
    word_lst = []
    # Each input line is tab-separated; only the first field is analysed.
    with open('./《都挺好》阿耐_分词后_outputs.txt', 'r', encoding='utf-8') as rf:
        for line in rf:
            item = line.strip('\n\r').split('\t')
            # extract_tags returns the top TF-IDF keywords (20 by default),
            # not every token in the line.
            tags = jieba.analyse.extract_tags(item[0])
            word_lst.extend(tags)

    # Count how often each keyword was extracted.
    word_dict = {}
    for item in word_lst:
        word_dict[item] = word_dict.get(item, 0) + 1

    # Sort by count, highest first; ties keep first-seen order.
    ordered = sorted(word_dict.items(), key=lambda kv: kv[1], reverse=True)

    # Plain-text output: one "word count" pair per line.
    with open("分词结果.txt", 'w', encoding='utf-8') as wf2:
        for key, count in ordered:
            wf2.write(key + ' ' + str(count) + '\n')

    # Excel output: word in column 0, count in column 1.
    for i, (key, count) in enumerate(ordered):
        sheet.write(i, 0, label=key)
        sheet.write(i, 1, label=count)
    wbk.save('wordCount_all_lyrics.xls')
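The counting and sorting steps in the script can also be expressed with collections.Counter. The sketch below is one possible variant, not the contributor's method: it assumes the first tab-separated field already holds space-separated words and splits on whitespace instead of calling jieba.analyse.extract_tags, so it counts every token rather than only the extracted keywords. The names count_rows, write_xls, and sample_lines are hypothetical.

```python
from collections import Counter


def count_rows(lines):
    """Return (word, count) rows, most frequent first, from tab-separated lines."""
    counts = Counter()
    for line in lines:
        # Assumption: the first field holds space-separated, pre-segmented words.
        first_field = line.strip("\r\n").split("\t")[0]
        counts.update(first_field.split())
    return counts.most_common()


def write_xls(rows, path):
    """Write rows to an .xls file: word in column 0, count in column 1."""
    import xlwt  # imported here so count_rows can be used without xlwt installed

    wbk = xlwt.Workbook(encoding="utf-8")
    sheet = wbk.add_sheet("wordCount")
    for i, (word, count) in enumerate(rows):
        sheet.write(i, 0, word)
        sheet.write(i, 1, count)
    wbk.save(path)


# Made-up sample data for illustration.
sample_lines = ["父亲 家 工作\t1\n", "家 父亲\t2\n", "家\t3\n"]
rows = count_rows(sample_lines)
print(rows)
```

With xlwt installed, write_xls(rows, "wordCount.xls") would produce the workbook. The .xls format holds at most 65,536 rows per sheet, so a very large vocabulary may need to be truncated or written in another format.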

Other participants also shared alternative methods, such as converting the segmentation results into a pandas DataFrame and using groupby to aggregate frequencies.
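The pandas approach was only mentioned in the discussion, not shown. A minimal sketch of what it might look like follows; the function name frequency_table, the column names, and the sample words are assumptions made for illustration.

```python
import pandas as pd


def frequency_table(words):
    """Build a word/count table, most frequent first, using groupby."""
    df = pd.DataFrame({"word": words})
    freq = (
        df.groupby("word")          # one group per distinct word
          .size()                   # number of rows in each group
          .reset_index(name="count")
          .sort_values("count", ascending=False, kind="stable")
          .reset_index(drop=True)
    )
    return freq


# Made-up segmentation result for illustration.
sample_words = ["家", "父亲", "家", "工作", "父亲", "家"]
freq = frequency_table(sample_words)
print(freq)
```

If an Excel engine such as openpyxl is installed, freq.to_excel("wordCount.xlsx", index=False) can write the table straight to a workbook, which avoids the manual sheet.write loop.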

Conclusion

This article walked through a community question on word-frequency analysis of Chinese text in Python: a short collections.Counter snippet, a fuller script that extracts keywords with jieba and exports the counts to a text file and an Excel workbook with xlwt, and a pandas groupby alternative suggested by other participants.
