Tagged articles
3 articles
AI Engineer Programming
Apr 26, 2026 · Artificial Intelligence

From Bag‑of‑Words to Semantics: How Embeddings Turn Meaning into Numbers (Part 2)

The article explains how embedding techniques encode semantic information into numeric vectors, covering Word2Vec and GloVe fundamentals, BERT anisotropy, SimCSE contrastive learning, alignment and uniformity metrics, ANN index structures such as HNSW, IVF and PQ, Matryoshka representation learning, practical deployment challenges, and evaluation best practices.

ANN · BERT · Embedding
23 min read
Baidu Geek Talk
Nov 16, 2022 · Artificial Intelligence

How Baidu’s Ernie‑SimCSE Uses Contrastive Learning to Crush Spam Promotion

This article explains how Baidu's anti‑spam team tackled large‑scale promotional spam on Baidu Zhidao by combining the Ernie pretrained model with SimCSE contrastive learning, detailing the problem background, traditional methods, text‑representation stages, the SimCSE approach, training pipeline, optimizations, and experimental results.

Ernie · NLP · SimCSE
15 min read
58 Tech
Aug 5, 2021 · Artificial Intelligence

Exploration and Practice of Text Representation Algorithms in the 58 Security Scenario

This article surveys text representation techniques, from weighted word‑vector methods to the supervised SimBERT model and unsupervised contrastive learning models, applied to large‑scale unstructured data in 58's information‑security workflows, and evaluates their effectiveness on classification and content‑recall tasks.

BERT · Information Security · SimCSE
11 min read