Mapping Character Relationships in 'Heavenly Sword and Dragon Slaying' with Jieba, Word2Vec & NetworkX

This article demonstrates how to combine Jieba segmentation, Word2Vec embeddings, and NetworkX graph visualization to extract and analyze character relationships from the Chinese novel "Heavenly Sword and Dragon Slaying," detailing data preparation, model training, entity matrix construction, and network graph generation.

MaGe Linux Operations

Mar 22, 2018

Introduction

Chinese-language natural language processing (NLP) raises several practical problems: word segmentation (Chinese text has no spaces between words), entity recognition, and relationship visualization. This case study applies Jieba, Word2Vec, and NetworkX to the novel "Heavenly Sword and Dragon Slaying" to explore the connections between its characters.

Data Preparation

The raw text of the novel is stored in a UTF‑8 file. A custom Jieba dictionary containing about 180 character names is created, along with a stop‑word list. The workflow includes:

Loading the novel text.

Loading the custom name dictionary.

Loading stop words.
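
The three loading steps might look roughly like the sketch below. The helper names and the file names in the usage comments are hypothetical and do not come from the original project. The only library call is jieba.load_userdict, which reads one entry per line: a word, optionally followed by a frequency and a part-of-speech tag.

```python
from pathlib import Path


def load_text(path):
    """Read the whole novel as one UTF-8 string."""
    return Path(path).read_text(encoding="utf-8")


def load_stopwords(path):
    """Return the stop words as a set, one word per line, blank lines ignored."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def register_names(path):
    """Add the custom character names to Jieba's vocabulary."""
    import jieba  # imported here so the other helpers work without Jieba installed
    jieba.load_userdict(str(path))


# Hypothetical usage; the file names are assumptions:
# text = load_text("novel.txt")
# register_names("names.txt")
# stopwords = load_stopwords("stopwords.txt")
```

Keeping the stop words in a set makes the later membership test cheap when every token in the novel is checked against it.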

Tools and Libraries

Python libraries: pandas, numpy, scipy, jieba, gensim (Word2Vec), networkx, matplotlib, pygraphviz.

Jieba for Chinese word segmentation.

Word2Vec for learning word vectors.

NetworkX for constructing and visualizing relationship graphs.

Implementation Steps

1. Text Pre‑processing

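import numpy as np
import pandas as pd
import jieba
import jieba.posseg as posseg  # part-of-speech tagging, used for name extraction

# IPython/Jupyter magic that renders plots inline; remove it when running as a plain script
%matplotlib inline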

Custom dictionary and stop‑word files are loaded, the novel is read, and a function cut_join performs segmentation, removes stop words, and returns a comma‑separated token string.
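
The body of cut_join is not shown, so the following is only a minimal sketch of what such a function could look like. The tokenize parameter is an assumption added here so that the tokenizer can be swapped out; by default it falls back to jieba.lcut.

```python
def cut_join(text, stopwords, tokenize=None):
    """Segment text, drop stop words and empty tokens, join the rest with commas."""
    if tokenize is None:
        import jieba  # default tokenizer; assumes Jieba is installed
        tokenize = jieba.lcut
    tokens = [tok.strip() for tok in tokenize(text)]
    kept = [tok for tok in tokens if tok and tok not in stopwords]
    return ",".join(kept)
```

Because the result is comma-separated, punctuation such as "," should itself be listed among the stop words; otherwise a stray comma token would be indistinguishable from a separator.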

2. Name Extraction

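# Accumulates every kept token and its part-of-speech flag across calls
# (not defined in the original listing; a dict of two lists is assumed here).
full_wf = {"word": [], "flag": []}

def extract_name(s):
    """Tag the text s, record tokens longer than one character, and return how many were kept."""
    words = []
    flags = []
    # posseg.cut yields (word, flag) pairs; Jieba tags person names with the flag "nr"
    for k, v in posseg.cut(s):
        if len(k) > 1:
            words.append(k)
            flags.append(v)
    full_wf["word"].extend(words)
    full_wf["flag"].extend(flags)
    return len(words)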

The extracted names are saved, filtered by frequency (>20 occurrences), and combined with the external name list.
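
One way to express the frequency filter and the merge with the external list is sketched below. The function name is hypothetical, and restricting the count to tokens flagged "nr" (Jieba's tag for person names) is an assumption about how the names were selected.

```python
from collections import Counter


def frequent_names(full_wf, external_names, min_count=20):
    """Return person-name tokens seen more than min_count times, plus the external names."""
    counts = Counter(
        word
        for word, flag in zip(full_wf["word"], full_wf["flag"])
        if flag == "nr"  # Jieba's part-of-speech tag for person names
    )
    frequent = {word for word, n in counts.items() if n > min_count}
    return sorted(frequent | set(external_names))
```

Merging with the hand-built list keeps characters that the tagger missed or that appear too rarely to pass the threshold.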

3. Word2Vec Training

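from gensim.models import word2vec

# sentences: an iterable of token lists, one list per segmented sentence or paragraph
num_features = 300     # dimensionality of the word vectors
min_word_count = 20    # ignore words that occur fewer than 20 times
num_workers = 4        # number of training threads
context = 20           # context window size
downsampling = 1e-3    # subsampling threshold for very frequent words

# gensim 4.0 and later name this parameter vector_size; older releases call it size
model = word2vec.Word2Vec(sentences, workers=num_workers, vector_size=num_features,
                          min_count=min_word_count, window=context,
                          sample=downsampling)
# save() writes gensim's own pickle-based format, not plain text, despite the .txt extension
model.save('yttlj_model.txt')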
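
The listing does not show how sentences is built. Word2Vec expects an iterable of token lists, so if each paragraph has already been turned into a comma-separated string, a conversion along these lines would work. The function name and the per-paragraph granularity are assumptions.

```python
def to_sentences(segmented_lines):
    """Turn comma-separated token strings into the list-of-token-lists form Word2Vec expects."""
    sentences = []
    for line in segmented_lines:
        tokens = [tok for tok in line.split(",") if tok]
        if tokens:  # skip paragraphs that were empty after stop-word removal
            sentences.append(tokens)
    return sentences
```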

4. Entity Relationship Matrix

Names are loaded into a DataFrame, and an empty square matrix ER is created with rows and columns representing characters. The matrix is populated with co‑occurrence counts and later with similarity scores from the Word2Vec model.
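
A zero-filled matrix of this shape could be created as in the sketch below. The function name is hypothetical; the entity table with a "Name" column is taken from the loop that fills the matrix.

```python
import numpy as np
import pandas as pd


def empty_relation_matrix(names):
    """Create a zero-filled square DataFrame indexed by character name on both axes."""
    unique = list(dict.fromkeys(names))  # drop duplicates, keep first-seen order
    size = len(unique)
    return pd.DataFrame(np.zeros((size, size)), index=unique, columns=unique)


# Hypothetical usage:
# ER = empty_relation_matrix(entity["Name"])
```

Removing duplicate names first matters because ER.loc[i, j] selects a block rather than a single cell when a label is repeated.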

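names = entity["Name"].tolist()
for i in names:
    for j in names:
        try:
            # cosine similarity between the two names' word vectors
            relation = model.wv.similarity(i, j)
            ER.loc[i, j] = relation
            if i != j:
                ER.loc[j, i] = relation
        except KeyError:
            # the name is missing from the model vocabulary (for example, below min_count)
            ER.loc[i, j] = 0
ER.to_hdf('ER.h5', key='ER')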
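
The score returned by model.wv.similarity is the cosine similarity of the two word vectors. The NumPy sketch below reproduces that calculation for illustration only; it is not gensim's own code.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means they point the same way."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Cosine similarity ranges from -1 to 1, so some character pairs can receive a negative score.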

5. Graph Visualization

NetworkX builds a graph where nodes are characters and edge weights are similarity scores. The graph is visualized using Matplotlib and Graphviz layouts.

import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
for i, row in ER.iterrows():
    for j, weight in row.items():
        if weight > 0 and i != j:
            G.add_edge(i, j, weight=weight)
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_size=500, font_size=8)
plt.show()

Results

The analysis produces several network graphs illustrating character similarity and relationship structures, such as a full similarity graph of all characters, a multi‑center graph centered on the protagonist Zhang Wuji, and sub‑graphs highlighting specific identity clusters.
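
A graph centered on a single character could be extracted roughly as follows. This is a sketch under assumptions: the function name and the 0.5 weight cutoff are illustrative and do not come from the original analysis, and center is whatever node label the graph uses for that character.

```python
import networkx as nx


def character_subgraph(G, center, min_weight=0.5):
    """Return the neighborhood of center, keeping only edges with weight >= min_weight."""
    strong = nx.Graph()
    strong.add_node(center)  # keep the center even if none of its edges survive
    for u, v, w in G.edges(data="weight", default=0.0):
        if w >= min_weight:
            strong.add_edge(u, v, weight=w)
    # ego_graph keeps the center node and every node within radius hops of it
    return nx.ego_graph(strong, center, radius=1)
```

Dropping weak edges before taking the neighborhood keeps the drawing readable, since a dense similarity matrix otherwise connects almost every pair of characters.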

Key observations include:

Word2Vec similarity scores serve as edge weights for the social network.

NetworkX visualizations reveal hidden connections and the prominence of different aliases for the same character.

These techniques can cut down on manual reading by extracting entity information automatically, and the same pipeline can be applied to other text‑analysis tasks.
