How Knowledge Graphs and GNNs Boost HS Code Classification Accuracy
This article explores how integrating unstructured business data into structured knowledge graphs and applying graph neural networks can overcome deep‑learning bottlenecks in NLP, dramatically improving HS‑code product classification accuracy from around 60% to over 75% through richer reasoning and multimodal knowledge.
1. Background
Natural language is a symbolic description of the world, and NLP operates on highly abstract, discrete symbols. As a result, deep learning alone runs into bottlenecks in reasoning and cognition. To get past this limitation, the article proposes integrating unstructured business data into structured knowledge, using knowledge graphs as the infrastructure and graph neural networks (GNNs) for reasoning.
2. Knowledge Graph
Knowledge is distilled from meaningful data; a knowledge graph stores, represents, extracts, fuses, and reasons over knowledge. Building a KG requires six components: schema modeling, acquisition, fusion, storage, model mining, and application.
3. Graph Neural Network (GNN)
Graph data has no regular grid structure, so the convolutions used for images or sequences do not apply directly. GNNs, especially Graph Convolutional Networks (GCNs), instead use a message-passing framework composed of AGGREGATE, COMBINE, and READOUT steps.
3.1 GCN Basic Principle
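A GCN stacks several graph-convolution layers, each of which aggregates information from a node's first-order (one-hop) neighbors. In the usual message-passing notation, with $h_v^{(k)}$ the representation of node $v$ after layer $k$, $N(v)$ its set of neighbors, and $K$ the number of layers, the three core equations are:
$$a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\{h_u^{(k-1)} : u \in N(v)\}\right)$$
$$h_v^{(k)} = \mathrm{COMBINE}^{(k)}\left(h_v^{(k-1)}, a_v^{(k)}\right)$$
$$h_G = \mathrm{READOUT}\left(\{h_v^{(K)} : v \in G\}\right)$$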
3.2 AGGREGATE
AGGREGATE computes the layer-wise feature aggregation by multiplying the adjacency matrix A with the node feature matrix X, then applying a weight matrix W and an activation σ. In matrix form, $H^{(k)} = \sigma(A H^{(k-1)} W^{(k)})$ with $H^{(0)} = X$.
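The aggregation step can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's implementation: the toy graph, the feature values, the weights, the choice of ReLU for σ, and the self-loops on the diagonal of A are all assumptions made for the example, and no degree normalization is applied.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU, used here as the activation sigma (an assumption)."""
    return [[max(0.0, v) for v in row] for row in m]


def aggregate(adj, features, weight, activation=relu):
    """One aggregation step: activation(A @ X @ W)."""
    return activation(matmul(matmul(adj, features), weight))


# Toy undirected graph: node 0 is linked to nodes 1 and 2.
# The 1s on the diagonal are self-loops (assumed), so each node keeps its own features.
A = [[1, 1, 1],
     [1, 1, 0],
     [1, 0, 1]]
X = [[1.0, 0.0],   # one feature row per node
     [0.0, 1.0],
     [1.0, 1.0]]
W = [[1.0, -1.0],  # illustrative weights, not learned
     [0.5, 0.0]]

H = aggregate(A, X, W)
print(H)  # [[3.0, 0.0], [1.5, 0.0], [2.5, 0.0]]
```

Each row of `A @ X` is the sum of the feature rows of a node's neighbors, which is why multiplying by the adjacency matrix acts as neighbor aggregation.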
3.3 COMBINE
COMBINE concatenates the aggregated vector with the previous layer’s representation and passes the result through a dense (fully‑connected) layer.
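A minimal sketch of this step, assuming ReLU after the dense layer and using hand-picked weights; the names `dense` and `combine` and all values are illustrative, not taken from the article.

```python
def dense(vec, weight, bias):
    """Fully connected layer followed by ReLU: relu(vec @ weight + bias)."""
    out = []
    for j in range(len(bias)):
        z = sum(v * weight[i][j] for i, v in enumerate(vec)) + bias[j]
        out.append(max(0.0, z))
    return out


def combine(prev_repr, aggregated, weight, bias):
    """Concatenate the node's previous representation with the aggregated
    neighbor vector, then pass the result through a dense layer."""
    concat = prev_repr + aggregated  # list concatenation, length 2d
    return dense(concat, weight, bias)


h_prev = [1.0, 2.0]    # representation of the node from the previous layer
h_neigh = [0.5, -1.0]  # output of AGGREGATE for the same node
W = [[1.0, 0.0],       # (2d x d) weight matrix, illustrative values
     [0.0, 1.0],
     [1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 0.5]

h_new = combine(h_prev, h_neigh, W, b)
print(h_new)  # [1.5, 1.5]
```

Concatenation keeps the node's own features separate from its neighbors' until the dense layer decides how to mix them.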
3.4 READOUT
READOUT generates a graph‑level representation. Simple statistical methods (sum, max, average) are easy but lose information; learned pooling methods such as DIFFPOOL can preserve hierarchical structure.
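The simple statistical readouts can be sketched as follows. The function name `readout` and the toy node representations are assumptions for illustration; the second graph is included only to show one way that information gets lost.

```python
def readout(node_reprs, mode="sum"):
    """Collapse per-node vectors into one graph-level vector."""
    columns = list(zip(*node_reprs))  # one tuple per feature dimension
    if mode == "sum":
        return [sum(c) for c in columns]
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown readout mode: {mode}")


graph_a = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]  # three nodes
graph_b = [[2.0, 2.0]]                          # a single node

print(readout(graph_a, "sum"))   # [6.0, 6.0]
print(readout(graph_a, "mean"))  # [2.0, 2.0]
print(readout(graph_a, "max"))   # [3.0, 4.0]
print(readout(graph_b, "mean"))  # [2.0, 2.0], same as graph_a
```

Here the mean readout maps two different graphs to the same vector, while the sum readout still tells them apart. Learned pooling such as DIFFPOOL is not shown in this sketch.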
4. Application to HS‑Code Product Classification
HS-code classification is a strict NLP task that requires precise reasoning. Traditional deep-learning NLP models reach only 59.3% accuracy because of noisy inputs, missing domain knowledge, and an inability to perform logical calculations. By constructing a domain-specific knowledge graph and applying a GCN, accuracy rises to 76%, a gain of 16.7 percentage points (roughly 28% in relative terms).
The KG schema models product names, declaration attributes, and their relationships; the GCN ingests word2vec‑based node embeddings and heterogeneous edge types, then performs message‑passing and READOUT to predict the correct HS‑code.
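One way to picture heterogeneous edge types is as a separate adjacency matrix per relation. The sketch below assumes a triple representation `(head, relation, tail)`; the product, attribute names, and relation names are hypothetical and are not taken from the article's schema.

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail); all names are illustrative only.
triples = [
    ("cotton t-shirt", "has_attribute", "material"),
    ("cotton t-shirt", "has_attribute", "knit_or_woven"),
    ("material", "has_value", "cotton"),
    ("knit_or_woven", "has_value", "knit"),
]


def build_typed_adjacency(triples):
    """Return the sorted node list and one symmetric adjacency matrix per relation."""
    nodes = sorted({n for h, _, t in triples for n in (h, t)})
    index = {name: i for i, name in enumerate(nodes)}
    adj = defaultdict(lambda: [[0] * len(nodes) for _ in nodes])
    for h, r, t in triples:
        i, j = index[h], index[t]
        adj[r][i][j] = 1  # treat edges as undirected for message passing
        adj[r][j][i] = 1
    return nodes, dict(adj)


nodes, adj = build_typed_adjacency(triples)
print(nodes)
print(sorted(adj))  # ['has_attribute', 'has_value']
```

A GCN over such a graph can then aggregate along each relation's matrix separately or sum the matrices into one. Which of these the article's model does is not stated in the source.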
5. Experiments
Two experiments were conducted:
Comparing sum vs. average READOUT strategies for graph‑level pooling.
Comparing a simple star‑shaped KG (only product‑attribute edges) with a complex heterogeneous KG (additional attribute‑attribute and value‑attribute edges).
Results show that richer graph structures and more expressive READOUT improve classification accuracy.
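The structural difference between the two graphs in the second experiment can be illustrated with a small sketch. The edge lists below are hypothetical; they only show how the extra edges enlarge what a node sees in a single message-passing hop.

```python
def neighbors(edges, node):
    """Return the set of nodes one hop away from `node` in an undirected edge list."""
    out = set()
    for a, b in edges:
        if a == node:
            out.add(b)
        elif b == node:
            out.add(a)
    return out


# Star-shaped KG: only product-attribute edges (hypothetical attribute names).
star_edges = [("product", "material"), ("product", "use"), ("product", "brand")]

# Heterogeneous KG: the same edges plus attribute-attribute and value-attribute edges.
complex_edges = star_edges + [("material", "use"), ("cotton", "material")]

print(neighbors(star_edges, "material"))     # {'product'}
print(neighbors(complex_edges, "material"))  # 'product', 'use' and 'cotton' (set order may vary)
```

In the star graph an attribute node receives messages only from the product node, whereas in the richer graph it also hears directly from related attributes and values.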
6. Future Directions
Extracting and fusing large rule bases into knowledge graphs.
Combining rule‑based teacher networks with GNN student networks to guide learning.
Building multimodal knowledge graphs that incorporate images, audio, and other data types.
References
Inductive Representation Learning on Large Graphs, https://arxiv.org/abs/1706.02216
Hierarchical Graph Representation Learning with Differentiable Pooling, https://arxiv.org/abs/1806.08804
https://www.cnblogs.com/SivilTaram/p/graph_neural_network_3.html
https://zhuanlan.zhihu.com/p/68064309
https://zhuanlan.zhihu.com/p/37057052
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
