Tag

WordPiece

1 view collected around this technical thread.

Code Mala Tang
Mar 27, 2025 · Artificial Intelligence

How Do BPE, WordPiece, and SentencePiece Shape Modern NLP Tokenization?

This article explains the fundamentals, workflows, examples, and trade‑offs of three major subword tokenization algorithms (Byte Pair Encoding, WordPiece, and SentencePiece) to help practitioners choose the right method for their large language model pipelines.

BPE · NLP · SentencePiece
0 likes · 12 min read
Rare Earth Juejin Tech Community
Dec 20, 2023 · Artificial Intelligence

BERT Model Overview: Inputs, Encoder, Fine‑tuning, and Variants

This article explains BERT's WordPiece tokenization, input embeddings (token, segment, and position embeddings), encoder architecture for Base and Large models, fine‑tuning strategies for various NLP tasks, and introduces popular variants such as RoBERTa and ALBERT.

BERT · Fine‑tuning · NLP
0 likes · 12 min read