Mapping Character Relationships in 'Heavenly Sword and Dragon Slaying' with Jieba, Word2Vec & NetworkX
This article demonstrates how to combine Jieba segmentation, Word2Vec embeddings, and NetworkX graph visualization to extract and analyze character relationships from the Chinese novel "Heavenly Sword and Dragon Slaying," detailing data preparation, model training, entity matrix construction, and network graph generation.
Introduction
Natural language processing (NLP) presents challenges such as word segmentation, entity recognition, and relationship visualization. This case study applies Jieba, Word2Vec, and NetworkX to the novel Heavenly Sword and Dragon Slaying to explore character connections.
Data Preparation
The raw text of the novel is stored in a UTF‑8 file. A custom Jieba dictionary containing about 180 character names is created, along with a stop‑word list. The workflow includes:
Loading the novel text.
Loading the custom name dictionary.
Loading stop words.
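The three loading steps above might look like the following minimal sketch. The helper names and all file paths are hypothetical and do not come from the original article. `jieba.load_userdict` is Jieba's call for registering a custom dictionary; it is imported lazily so the word-list helper can be used on its own.

```python
from pathlib import Path


def load_word_list(path):
    """Read a UTF-8 file with one entry per line; return the non-empty entries as a set."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def load_corpus(novel_path, name_dict_path, stopword_path):
    """Load the novel, register the name dictionary with Jieba, and load stop words.

    All three paths are placeholders for illustration.
    """
    import jieba  # imported here so load_word_list works without Jieba installed

    novel = Path(novel_path).read_text(encoding="utf-8")
    jieba.load_userdict(str(name_dict_path))  # one entry per line: word [freq] [tag]
    stopwords = load_word_list(stopword_path)
    return novel, stopwords
```

Keeping the stop words in a set makes the later membership checks during segmentation cheap.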
Tools and Libraries
Python libraries: pandas, numpy, scipy, jieba, gensim (Word2Vec), networkx, matplotlib, pygraphviz.
Jieba for Chinese word segmentation.
Word2Vec for learning word vectors.
NetworkX for constructing and visualizing relationship graphs.
Implementation Steps
1. Text Pre‑processing
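# Core libraries: numerical arrays, tabular data, and Chinese segmentation
import numpy as np
import pandas as pd
import jieba
import jieba.posseg as posseg  # part-of-speech tagging, used for name extraction

# Jupyter notebook magic: render Matplotlib figures inline
%matplotlib inline

The custom dictionary and stop‑word files are loaded, the novel is read, and a function cut_join performs segmentation, removes stop words, and returns a comma‑separated token string.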
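The article does not show the body of cut_join, so the following is only one way such a function could be written. The signature, including the optional `tokenizer` argument, is an assumption made so the sketch can be tried without Jieba; by default it falls back to `jieba.lcut`.

```python
def cut_join(text, stopwords, tokenizer=None):
    """Segment text, drop stop words and blank tokens, and join the rest with commas.

    tokenizer is any callable mapping a string to a list of tokens. It defaults
    to jieba.lcut. This signature is an assumption, not the article's original.
    """
    if tokenizer is None:
        import jieba  # lazy import so a stub tokenizer can be used instead
        tokenizer = jieba.lcut
    tokens = [t.strip() for t in tokenizer(text)]
    kept = [t for t in tokens if t and t not in stopwords]
    return ",".join(kept)
```

Applied paragraph by paragraph, this yields one comma‑separated string per paragraph, matching the output format described above.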
2. Name Extraction
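# Accumulator for every kept token and its tag (assumed here to be a dict of two lists)
full_wf = {"word": [], "flag": []}

def extract_name(s):
    # Tag each token in the string with its part of speech
    new_s = posseg.cut(s)
    words = []
    flags = []
    for k, v in new_s:
        # Skip single-character tokens
        if len(k) > 1:
            words.append(k)
            flags.append(v)
    full_wf["word"].extend(words)
    full_wf["flag"].extend(flags)
    return len(words)

The extracted names are saved, filtered by frequency (more than 20 occurrences), and combined with the external name list.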
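The filtering step is described but not shown. A minimal sketch follows, assuming the candidates are the tokens Jieba tags as `nr` (its person-name tag) and that "filtered by frequency" means keeping names seen more than 20 times. The function name and parameters are hypothetical.

```python
from collections import Counter


def select_names(words, flags, external_names, min_count=20, name_flag="nr"):
    """Return tagged person names seen more than min_count times, plus the external list."""
    # Count only tokens whose part-of-speech tag marks them as person names
    counts = Counter(w for w, f in zip(words, flags) if f == name_flag)
    frequent = {w for w, c in counts.items() if c > min_count}
    # Union with the hand-made list so rare but known characters are kept
    return sorted(frequent | set(external_names))
```

With the accumulator from the previous step, this could be called as `select_names(full_wf["word"], full_wf["flag"], external_names)`, where `external_names` stands for the custom name list.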
3. Word2Vec Training
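The training call in this step takes a variable named sentences, which gensim expects to be an iterable of token lists. Since the segmentation step produces comma‑separated strings, a conversion along these lines is presumably needed; the helper name is hypothetical.

```python
def to_sentences(joined_lines, sep=","):
    """Turn comma-separated token strings into the list-of-token-lists form Word2Vec expects."""
    sentences = []
    for line in joined_lines:
        tokens = [t for t in line.split(sep) if t]
        if tokens:  # skip lines that were empty after stop-word removal
            sentences.append(tokens)
    return sentences
```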
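from gensim.models import word2vec

num_features = 300     # dimensionality of the word vectors
min_word_count = 20    # ignore words that occur fewer than 20 times
num_workers = 4        # number of training threads
context = 20           # context window size
downsampling = 1e-3    # down-sampling threshold for frequent words

# sentences: an iterable of token lists produced by the segmentation step.
# gensim >= 4.0 calls the dimensionality parameter vector_size; older releases call it size.
model = word2vec.Word2Vec(sentences, workers=num_workers, vector_size=num_features,
                          min_count=min_word_count, window=context,
                          sample=downsampling)
model.save('yttlj_model.txt')

4. Entity Relationship Matrix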
Names are loaded into a DataFrame, and an empty square matrix ER is created with rows and columns representing characters. The matrix is populated with co‑occurrence counts and later with similarity scores from the Word2Vec model.
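The co‑occurrence counting is mentioned but not shown. One possible reading, sketched below, counts a pair of names once for every token list (sentence or paragraph) in which both appear. The choice of window and the function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, names):
    """Count, for each unordered pair of names, the token lists in which both appear."""
    name_set = set(names)
    counts = Counter()
    for tokens in sentences:
        # Names present in this token list, sorted so each pair has one canonical order
        present = sorted(name_set.intersection(tokens))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts
```

Each count can then be written into both `ER.loc[a, b]` and `ER.loc[b, a]` to keep the matrix symmetric.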
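names = entity["Name"].tolist()
for i in names:
    for j in names:
        try:
            # Cosine similarity between the two names' word vectors
            relation = model.wv.similarity(i, j)
            ER.loc[i, j] = relation
            if i != j:
                ER.loc[j, i] = relation
        except KeyError:
            # A name missing from the Word2Vec vocabulary gets similarity 0
            ER.loc[i, j] = 0
ER.to_hdf('ER.h5', key='ER')

5. Graph Visualization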
NetworkX builds a graph where nodes are characters and edge weights are similarity scores. The graph is visualized using Matplotlib and Graphviz layouts.
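The similarity scores used as edge weights are cosine similarities between word vectors, which is what gensim's `similarity` method computes. A plain-Python sketch of that measure, written only to show what the number means:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Scores range from -1 to 1, so some pairs can come out negative; the graph-building loop keeps only edges with a positive weight.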
import networkx as nx
import matplotlib.pyplot as plt

G = nx.Graph()
for i, row in ER.iterrows():
    for j, weight in row.items():
        # Keep only positive similarities and skip self-loops
        if weight > 0 and i != j:
            G.add_edge(i, j, weight=weight)

pos = nx.spring_layout(G)  # force-directed layout
nx.draw(G, pos, with_labels=True, node_size=500, font_size=8)
plt.show()

Results
The analysis produces several network graphs illustrating character similarity and relationship structures, such as a full similarity graph of all characters, a multi‑center graph centered on the protagonist Zhang Wuji, and sub‑graphs highlighting specific identity clusters.
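The article does not say how the graph centered on a single character was derived. One plausible approach, sketched here under that caveat, keeps only the strongest similarity edges around a chosen center. The function name, the `k` cutoff, and the nested-mapping input are all assumptions.

```python
def top_neighbors(similarity, center, k=10, min_score=0.0):
    """Return up to k (center, other, score) edges with the highest scores above min_score.

    similarity is a nested mapping: similarity[a][b] -> score.
    """
    candidates = [
        (other, score)
        for other, score in similarity[center].items()
        if other != center and score > min_score
    ]
    # Strongest relationships first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [(center, other, score) for other, score in candidates[:k]]
```

The nested mapping can be obtained from the matrix with `ER.to_dict(orient="index")`, and the returned triples can be passed directly to NetworkX's `G.add_weighted_edges_from(...)`.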
Key observations include:
Word2Vec similarity scores serve as edge weights for the social network.
NetworkX visualizations reveal hidden connections and the prominence of different aliases for the same character.
These techniques can reduce manual reading time by extracting entity information automatically, and the same pipeline can be applied to other text‑analysis tasks.
MaGe Linux Operations