Deep Learning Approaches for Text Classification in Alipay Complaint Fraud Detection
This article reviews deep-learning-based text classification techniques, including TextCNN, BiGRU, Capsule Networks, attention mechanisms, and the cw2vec embedding, as applied to Alipay complaint fraud data. It presents experimental comparisons and discusses the advantages, challenges, and future directions of these methods.
With the rapid development of deep learning in image and speech domains, natural language processing (NLP) techniques based on deep learning have attracted increasing attention. This article introduces the task of text classification, a classic NLP scenario, and surveys several deep‑learning models used for it.
Background : In risk control, user complaints are crucial for understanding illicit activities. Large volumes of complaint texts are generated daily, and existing models (e.g., TextCNN, bidirectional GRU) only partially exploit these texts.
Related Work : Traditional machine‑learning methods rely on handcrafted features such as TF‑IDF, which suffer from sparsity and dimensionality issues. Word2vec introduced dense word embeddings, enabling semantic similarity calculations. CNNs (e.g., TextCNN) capture local n‑gram patterns, while recurrent networks (RNN, LSTM, GRU) model sequential dependencies. Capsule Networks and hierarchical attention mechanisms have also been explored for text classification.
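The sparsity problem of TF-IDF features can be seen even on a toy corpus. The sketch below is a minimal pure-Python illustration, not the pipeline used in the study: the three tokenized complaints are invented, and the helper name tfidf_vectors and the plain log(N/df) weighting are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[tok] / len(doc)) * math.log(n_docs / df[tok])
            for tok in vocab
        ])
    return vocab, vectors

# Toy, already-tokenized complaints (invented for illustration).
docs = [
    ["seller", "took", "payment", "never", "shipped"],
    ["fake", "refund", "link", "stole", "payment"],
    ["account", "sells", "prohibited", "goods"],
]
vocab, vectors = tfidf_vectors(docs)
zeros = sum(v == 0.0 for vec in vectors for v in vec)
sparsity = zeros / (len(vectors) * len(vocab))
print(f"vocabulary size: {len(vocab)}, share of zero entries: {sparsity:.2f}")
```

Each vector has one dimension per vocabulary entry, so both the dimensionality and the share of zero entries grow with the corpus. Dense embeddings such as word2vec avoid this by mapping every word to a short fixed-length vector.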
CW2VEC : Cao et al. (AAAI 2018) proposed cw2vec, which extracts stroke‑n‑gram and pinyin‑n‑gram features from Chinese characters to generate richer embeddings. Experiments show cw2vec outperforms word2vec, GloVe, and CWE on public Chinese benchmarks.
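To make the stroke n-gram idea concrete, here is a minimal sketch that covers only the stroke part of the feature extraction. It assumes strokes are mapped to five classes (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling, 5 turning) and that n-grams of length 3 to 12 are taken over the stroke sequence of the whole word. The two-entry STROKES table is hand-written for illustration; a real system would need a full character-to-stroke dictionary.

```python
# Hypothetical, hand-written stroke table covering two characters only.
# Stroke classes (assumed): 1 horizontal, 2 vertical, 3 left-falling,
# 4 right-falling, 5 turning.
STROKES = {"大": "134", "人": "34"}

def stroke_ngrams(word, n_min=3, n_max=12):
    """Return all stroke n-grams of a word, shortest first."""
    # Concatenate the stroke codes of every character in the word.
    seq = "".join(STROKES[ch] for ch in word)
    grams = []
    for n in range(n_min, min(n_max, len(seq)) + 1):
        for i in range(len(seq) - n + 1):
            grams.append(seq[i:i + n])
    return grams

grams = stroke_ngrams("大人")
print(grams)  # ['134', '343', '434', '1343', '3434', '13434']
```

In an embedding model of this kind, each n-gram gets its own vector, so words that share stroke patterns also share parameters.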
Model Architectures : The study combines several architectures: TextCNN, bidirectional GRU (BiGRU), Capsule Network (with 10 capsules and 3 routing iterations), and attention‑based models (word‑level and hierarchical). All models use 128‑dimensional hidden states and various embeddings (word2vec, cw2vec, and their concatenation).
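The routing iterations mentioned for the Capsule Network can be sketched as routing-by-agreement in the style of Sabour et al. (2017). This is a pure-Python illustration under stated assumptions, not the study's implementation: the 10 capsules are treated as output capsules, the 6 input capsules and the 4-dimensional vectors are arbitrary, and the prediction vectors u_hat are random rather than learned.

```python
import math
import random

def squash(s):
    """Shrink vector s so its norm lies in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """u_hat[i][j] is input capsule i's prediction vector for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit wherever a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * o for a, o in zip(u_hat[i][j], v[j]))
    return v

random.seed(0)
# 6 input capsules, 10 output capsules, 4-dimensional vectors (all assumed).
u_hat = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
         for _ in range(6)]
v = route(u_hat, iterations=3)
norms = [math.sqrt(sum(x * x for x in vj)) for vj in v]
print(len(v), all(n < 1.0 for n in norms))
```

The norm of each output vector stays below 1, which is why it can be read as the probability that the corresponding feature or class is present.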
Experiments : A three-class dataset (prohibited, non-case, fraud) drawn from Alipay complaint logs was split into training and test sets. Models were evaluated on ROC-AUC, Accuracy, Precision, and Recall, with a focus on the fraud class. Capsule Networks consistently achieve higher Precision, Recall, and AUC than TextCNN, while Attention models obtain higher Accuracy and AUC when using concatenated embeddings. cw2vec embeddings generally outperform word2vec, and concatenating the two into 600-dimensional vectors further improves performance for some models.
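For readers who want to see how these metrics are computed for one class of a three-class problem, the following sketch treats fraud as the positive class and everything else as negative. The labels, predictions, and scores are invented, and the helper names precision_recall and roc_auc are assumptions for this example; the AUC is computed with the pairwise-ranking definition rather than by tracing the ROC curve.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treating all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores, positive):
    """One-vs-rest AUC: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented labels, predictions, and fraud-class scores for six complaints.
y_true = ["fraud", "non-case", "fraud", "prohibited", "non-case", "fraud"]
y_pred = ["fraud", "fraud", "fraud", "prohibited", "non-case", "non-case"]
fraud_score = [0.9, 0.6, 0.8, 0.1, 0.3, 0.4]

precision, recall = precision_recall(y_true, y_pred, "fraud")
auc = roc_auc(y_true, fraud_score, "fraud")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} auc={auc:.3f}")
```

Accuracy is computed over all three classes, while Precision, Recall, and AUC here describe only how well the fraud class is separated from the rest.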
Discussion & Outlook : Capsule Networks better capture positional, semantic, and syntactic information, whereas Attention mechanisms help RNNs focus on the most informative parts of a text. The authors encourage further exploration of these techniques in broader scenarios and invite collaboration.
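How attention lets a recurrent model focus on part of a text can be shown with a simplified attention-pooling step. The sketch below scores each hidden state by its dot product with a context vector, normalizes the scores with a softmax, and returns the weighted sum. It is an assumption-laden simplification: the tanh projection used in hierarchical attention networks is omitted, the context vector is fixed rather than learned, and the hidden states are invented numbers rather than real BiGRU outputs.

```python
import math

def attention_pool(hidden_states, context):
    """Weight each time step by softmax(h_t . context) and sum the states."""
    scores = [sum(h * c for h, c in zip(h_t, context)) for h_t in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h_t[d] for w, h_t in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled

# Toy 2-dimensional "RNN outputs" for three tokens (values invented).
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
context = [1.0, 1.0]  # learned in a real model; fixed here for illustration
weights, pooled = attention_pool(hidden_states, context)
print([round(w, 3) for w in weights], [round(x, 3) for x in pooled])
```

The second token receives most of the weight, so the pooled vector is dominated by its hidden state. The same weights can be inspected to see which words drove a prediction.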
References : The article cites seminal works such as Mikolov et al. (2013) on word2vec, Kim (2014) on TextCNN, Liu et al. (2016) on RNNs, Cho et al. (2014) on GRU, Sabour et al. (2017) on Capsule Networks, Bahdanau et al. (2014) on Attention, and Yang et al. (2016) on Hierarchical Attention Networks, among others.
