Part-of-Speech Tagging with Jieba in Python

This article explains how to perform Chinese part-of-speech tagging with the jieba.posseg module in Python: loading stop words, extracting article content with Newspaper3k, segmenting the text in precise mode, filtering out stop words, and presenting the results in a pandas DataFrame.

Python Programming Learning Circle

Part-of-speech tagging assigns a grammatical category to each word, which is useful for tasks such as keyword extraction, filtering, and analyzing word distribution in texts.
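As a minimal sketch of how tags support keyword extraction and filtering, assume the tagged output has already been collected as (word, flag) tuples. The sample pairs below are hypothetical and written by hand, not produced by jieba. The function name keyword_candidates is likewise hypothetical. Keeping only nouns and verbs is one common heuristic, not something jieba prescribes.

```python
from collections import Counter

# Hypothetical, hand-written (word, flag) pairs in the style of jieba.posseg output;
# they are NOT real jieba output for any particular article.
SAMPLE_PAIRS = [
    ("银行", "n"), ("发布", "v"), ("新", "a"), ("贷款", "n"),
    ("政策", "n"), ("银行", "n"), ("支持", "v"), ("小微企业", "n"),
    ("的", "uj"), ("，", "x"),
]


def keyword_candidates(pairs, keep_prefixes=("n", "v"), top_k=5):
    """Keep words whose POS flag starts with one of keep_prefixes, ranked by frequency."""
    kept = [word for word, flag in pairs if flag.startswith(keep_prefixes)]
    return Counter(kept).most_common(top_k)


print(keyword_candidates(SAMPLE_PAIRS))
# → [('银行', 2), ('发布', 1), ('贷款', 1), ('政策', 1), ('支持', 1)]
```

Matching on the flag prefix means subtypes such as ns (place name) or vn (verbal noun) are kept along with their broad class.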

Jieba's POS tagging uses a tag set compatible with ICTCLAS. Below is a simple example that loads stop words, fetches an article with Newspaper3k, performs precise-mode segmentation with jieba.posseg, filters out stop words, and stores the word-tag pairs in a pandas DataFrame.
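Flags such as ns or uj are terse. One way to make them readable in a report is a small lookup table. The sketch below covers only a hand-picked subset of ICTCLAS-style tags, and its English glosses are paraphrases rather than official definitions. Consult jieba's documentation for the authoritative tag list. TAG_DESCRIPTIONS and describe_tag are hypothetical names.

```python
# A partial, hand-picked subset of ICTCLAS-style tags; the English glosses are
# paraphrases and this is NOT jieba's complete tag list.
TAG_DESCRIPTIONS = {
    "n": "noun",
    "nr": "person name",
    "ns": "place name",
    "nt": "organization name",
    "nz": "other proper noun",
    "v": "verb",
    "vn": "verbal noun",
    "a": "adjective",
    "d": "adverb",
    "m": "numeral",
    "q": "measure word (classifier)",
    "r": "pronoun",
    "p": "preposition",
    "c": "conjunction",
    "u": "particle",
    "x": "non-word string (e.g. punctuation)",
    "eng": "English word",
}


def describe_tag(flag):
    """Describe a POS flag, falling back to the broad class given by its first letter."""
    if flag in TAG_DESCRIPTIONS:
        return TAG_DESCRIPTIONS[flag]
    broad = TAG_DESCRIPTIONS.get(flag[:1])
    return f"{broad} (subtype '{flag}')" if broad else "unknown tag"


print(describe_tag("ns"))  # place name
print(describe_tag("uj"))  # particle (subtype 'uj')
```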

import newspaper
import pandas as pd
import jieba.posseg as pseg

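# Load stop words, one per line (the encoding must match how stopWord2.txt was saved)
with open('stopWord2.txt', encoding='gbk') as f:
    stopWords = {line.strip() for line in f}  # a set makes membership checks fast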

# Download and parse the article (example URL); article.text holds the extracted body text
article = newspaper.Article('https://finance.sina.com.cn/money/bank/bank_hydt/2019-02-25/doc-ihsxncvf7656807.shtml', language='zh')
article.download()
article.parse()
article_words = article.text  # tag the full body text rather than a concatenation of keywords

seg_list_exact = pseg.cut(article_words)  # precise mode
words_list = []  # store (word, tag)

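for word in seg_list_exact:
    # pseg.cut yields pair objects, so compare the word string (word.word), not the pair itself
    if word.word.strip() and word.word not in stopWords:
        words_list.append((word.word, word.flag))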

words_pd = pd.DataFrame(words_list, columns=['word', 'type'])
print(words_pd.head())  # display
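The script above does the filtering inline in the loop. The sketch below uses only the standard library and splits that work into separate steps. It also adds a count of the tag distribution, which relates to the word-distribution use case mentioned earlier. load_stop_words, filter_pairs, and tag_distribution are hypothetical helper names. The in-memory stop-word file and the hand-written pairs stand in for stopWord2.txt and real jieba output.

```python
import io
from collections import Counter


def load_stop_words(fileobj):
    """Read one stop word per line from an open text file, skipping blank lines."""
    return {line.strip() for line in fileobj if line.strip()}


def filter_pairs(pairs, stop_words):
    """Drop whitespace-only tokens and stop words from (word, flag) pairs."""
    return [(w, f) for w, f in pairs if w.strip() and w not in stop_words]


def tag_distribution(pairs):
    """Count how many tokens carry each POS flag."""
    return Counter(flag for _, flag in pairs)


# Hypothetical inputs: an in-memory stop-word "file" and hand-written (word, flag) pairs
stop_file = io.StringIO("的\n了\n\n，\n")
stop_words = load_stop_words(stop_file)
pairs = [("银行", "n"), ("的", "uj"), ("贷款", "n"), (" ", "x"), ("发布", "v"), ("，", "x")]

kept = filter_pairs(pairs, stop_words)
print(kept)                    # [('银行', 'n'), ('贷款', 'n'), ('发布', 'v')]
print(tag_distribution(kept))  # Counter({'n': 2, 'v': 1})
```

To feed jieba's results into these helpers, convert each pair with (p.word, p.flag). If you use the DataFrame from the script instead, words_pd['type'].value_counts() gives the same per-tag counts.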