Tagged articles
13 articles
Code DAO
Jan 15, 2022 · Artificial Intelligence

Compressing Unsupervised fastText Models 300× Smaller with Near‑Identical NLP Performance

This article shows how the compress‑fasttext Python library can shrink a 7 GB fastText word‑embedding model to about 21 MB (roughly a 300‑fold reduction) while preserving nearly the same accuracy on downstream NLP tasks, and walks through the underlying compression techniques, usage examples, and evaluation results.

NLP · compress-fasttext · fastText
9 min read
Code DAO
Dec 12, 2021 · Artificial Intelligence

How to Boost Text Analysis Accuracy on a 2‑Billion‑Word Corpus

This article explains practical techniques for improving NLP model accuracy on massive corpora, covering the challenges of multi‑field text, word‑embedding choices, a fastText‑based regression demo with book‑review data, feature‑engineering tricks, and a comparison with tf‑idf + LASSO.

NLP · Python · Word2Vec
13 min read
FunTester
Nov 11, 2020 · Artificial Intelligence

Unlocking NLP: From the Turing Test to Word Embeddings and Beyond

This article gives a broad overview of natural language processing, tracing its development from Turing's seminal test through regular expressions, the importance of word order, word embeddings such as Word2vec and GloVe, and knowledge‑ and retrieval‑based chatbot methods.

GloVe · Knowledge Graphs · NLP
15 min read
Tencent Cloud Developer
Jul 8, 2020 · Artificial Intelligence

Graph-Based Chinese Word Embedding (AlphaEmbedding) for Improved Text Matching

AlphaEmbedding builds a weighted graph linking Chinese words, sub‑words, characters and pinyin, then uses random‑walk‑based node2vec training to produce embeddings that capture orthographic and phonetic similarity, markedly improving recall and ranking for homophones, typos and OOV terms in enterprise search.

Chinese NLP · graph computing · semantic similarity
17 min read
JD Retail Technology
Aug 8, 2019 · Artificial Intelligence

From Word Representations to Sentiment Analysis – Talk by Dr. Feng Ao

On August 6, Dr. Feng Ao gave a comprehensive overview of how word representations and sentiment analysis have evolved, tracing the shift from traditional linguistic features to pretrained models such as BERT and XLNet, and shared practical experiments with convolutional models relevant to industry applications.

NLP · Sentiment Analysis · artificial intelligence
4 min read
Alibaba Cloud Developer
Jun 18, 2019 · Artificial Intelligence

From Word2Vec to Quick-Thought: A Complete Guide to Modern Embeddings

This article reviews the evolution of word and sentence embeddings, covering foundational ideas (vector semantics and the distributional hypothesis), models including Word2Vec, GloVe, fastText, Skip‑Thought and Quick‑Thought, and evaluation techniques, along with implementation tips and real‑world use cases.

GloVe · NLP · Word2Vec
21 min read
DataFunTalk
Mar 13, 2019 · Artificial Intelligence

A Comprehensive Overview of NLP Development and Deep Learning Models

This article reviews the history of natural language processing, explains key deep‑learning models such as NNLM, Word2vec, CNN, RNN, attention mechanisms, and Transformers, and discusses their applications, future trends, and practical considerations in NLP tasks.

NLP · Transformer · attention
38 min read
Alibaba Cloud Developer
Apr 25, 2018 · Artificial Intelligence

How cw2vec Beats Word2Vec: Leveraging Chinese Stroke N‑grams for Superior Word Embeddings

This article introduces cw2vec, a novel Chinese word‑embedding algorithm that exploits stroke‑level subword information, outlines its theoretical foundations, compares it with word2vec, GloVe, CWE and other models on multiple benchmarks, and demonstrates its superior performance across word similarity, analogy, text classification and named‑entity recognition tasks.

Chinese NLP · Deep Learning · Unsupervised Learning
14 min read
AntTech
Jan 18, 2018 · Artificial Intelligence

cw2vec: Learning Chinese Word Embeddings with Stroke n-grams

The cw2vec paper, presented at AAAI 2018, introduces a Chinese word embedding method that leverages stroke n‑grams to capture character semantics, proposes a novel loss function, demonstrates consistent improvements over existing models across similarity, analogy, classification and NER tasks, and discusses real‑world AI applications.

AAAI 2018 · AI research · Chinese NLP
7 min read
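To make "stroke n‑grams" concrete, here is a minimal sketch of how n‑grams can be extracted from a word's stroke sequence. It assumes each stroke has already been reduced to one of five category IDs (1 horizontal, 2 vertical, 3 left‑falling, 4 right‑falling, 5 turning); the two‑character `STROKES` table, the `stroke_ngrams` helper, and the default range of 3 to 12 are illustrative assumptions, not code from the paper. Training the embeddings themselves is not shown.

```python
# Hypothetical toy stroke table: each character maps to its strokes, already
# reduced to category IDs (1 horizontal, 2 vertical, 3 left-falling,
# 4 right-falling, 5 turning). Only two characters are included here.
STROKES = {"大": "134", "人": "34"}


def stroke_ngrams(word, min_n=3, max_n=12):
    """Concatenate the word's stroke IDs and return every n-gram, min_n <= n <= max_n."""
    sequence = "".join(STROKES[ch] for ch in word)
    grams = []
    # Never ask for n-grams longer than the sequence itself.
    for n in range(min_n, min(max_n, len(sequence)) + 1):
        for start in range(len(sequence) - n + 1):
            grams.append(sequence[start:start + n])
    return grams


print(stroke_ngrams("大人"))
# ['134', '343', '434', '1343', '3434', '13434']
```

Because n‑grams are taken across character boundaries, a word such as 大人 yields stroke patterns (for example `343`) that belong to neither character alone, which is the kind of sub‑character information stroke n‑grams are meant to capture.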
ITPUB
Dec 23, 2015 · Artificial Intelligence

How Computers Turn Words into Numbers: A Beginner’s Guide to Tokenization and Vector Similarity

This article explains how natural language processing stores word meanings as numeric vectors, builds token dictionaries, represents sentences as binary vectors, and uses dot‑product calculations to measure similarity, illustrating concepts with simple examples and highlighting current limitations and future directions.

NLP · artificial intelligence · tokenization
7 min read
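The token‑dictionary, binary‑vector, and dot‑product approach summarized in the last entry can be sketched in a few lines. This is a minimal illustration under stated assumptions: lowercase whitespace tokenization and two made‑up example sentences. The helper names (`build_vocab`, `to_binary_vector`, `dot`) are hypothetical and not taken from the article.

```python
def build_vocab(sentences):
    """Assign each distinct lowercase token an index (the token dictionary)."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_binary_vector(sentence, vocab):
    """Entry i is 1 if vocabulary word i occurs in the sentence, else 0."""
    tokens = set(sentence.lower().split())
    # Dicts keep insertion order, so iteration order matches the assigned indices.
    return [1 if word in tokens else 0 for word in vocab]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


sentences = ["the cat sat on the mat", "the dog sat on the log"]
vocab = build_vocab(sentences)
vectors = [to_binary_vector(s, vocab) for s in sentences]
print(vectors)                     # [[1, 1, 1, 1, 1, 0, 0], [1, 0, 1, 1, 0, 1, 1]]
print(dot(vectors[0], vectors[1]))  # 3
```

Because every entry is 0 or 1, the dot product simply counts the distinct tokens two sentences share (here "the", "sat", and "on"). The score ignores word order and treats "cat" and "dog" as no more related than "cat" and "log", which illustrates the kind of limitation that dense word embeddings were later developed to address.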