Understanding Word2Vec: Theory, Architecture, and Python Implementation

This article explains the Word2Vec algorithm, its CBOW and Skip‑Gram architectures, cosine similarity mathematics, training process with negative sampling, and provides a concise Python example using the gensim library.


Word2Vec Introduction

Word2Vec is a popular word‑embedding algorithm proposed by Tomas Mikolov and his team in 2013. Its main goal is to map each word to a fixed‑size vector that captures semantic relationships.

We illustrate the concept with a simple example.

Suppose you have the following sentences:

Dog likes to play ball.

Cat likes to climb trees.

Dog and cat are pets.

Football is a popular sport.

Training Word2Vec on these sentences yields vector representations that reflect semantic similarity: vectors for "dog" and "cat" are close because both are pets; "play ball" and "football" are related through the concept of a ball; "play ball" and "climb trees" are less related.

Similarity is quantified using cosine similarity, which measures the cosine of the angle between two vectors and ranges from -1 (opposite directions) to 1 (identical directions).

High cosine similarity indicates that two vectors are semantically close.
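The cosine similarity described above can be sketched in a few lines of NumPy (the vector values are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||); the result lies in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))    # identical direction, approx. 1.0
print(cosine_similarity(a, -a))   # opposite direction, approx. -1.0
print(cosine_similarity(np.array([1.0, 0.0]),
                        np.array([0.0, 1.0])))  # orthogonal, approx. 0.0
```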

Word2Vec learns these vectors by training on contextual information.

How Word2Vec Works

Word2Vec training relies on two main architectures: CBOW (Continuous Bag of Words) and Skip‑Gram.

CBOW (Continuous Bag of Words)

Predicts the target (center) word from its surrounding context words.

Input layer: one‑hot encoding of context words.

Output layer: probability distribution over the target word.

Skip‑Gram

Predicts surrounding context words from a given target word.

Input layer: one‑hot encoding of the target word.

Output layer: probability distribution over context words.
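The two architectures differ only in which side of a (target, context) pair serves as the input. A minimal sketch of how a sliding window produces those pairs (sentence and window size are illustrative):

```python
def training_pairs(tokens, window=1):
    """Extract (target, context) pairs with a sliding window.

    Skip-Gram trains on each pair as (input=target, output=context);
    CBOW instead groups all contexts of one target into a single example.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(training_pairs("dog likes to play ball".split()))
# -> [('dog', 'likes'), ('likes', 'dog'), ('likes', 'to'), ('to', 'likes'),
#     ('to', 'play'), ('play', 'to'), ('play', 'ball'), ('ball', 'play')]
```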

Neural Network Architecture

Word2Vec uses a shallow neural network, typically with a single hidden layer. After training, the weights from the input layer to the hidden layer become the word vectors.

Training process:

Initialize random weights for each word.

Slide a window over the text to extract target and context words, training with either CBOW or Skip‑Gram.

Optimize using softmax, back‑propagation, and gradient descent.

Extract the final word vectors from the input‑to‑hidden weights.
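The four steps above can be sketched with NumPy as a Skip‑Gram forward pass (vocabulary and dimensions are toy values, not a full training loop):

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["dog", "likes", "to", "play", "ball"]
V, d = len(vocab), 3

# Step 1: random initialization of both weight matrices
W_in = rng.normal(scale=0.01, size=(V, d))   # input -> hidden
W_out = rng.normal(scale=0.01, size=(d, V))  # hidden -> output

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Steps 2-3: a one-hot input just selects one row of W_in, so the
# hidden layer is that row; softmax over the output scores gives a
# probability distribution over the vocabulary (the context words).
target = vocab.index("dog")
hidden = W_in[target]
probs = softmax(hidden @ W_out)

# Step 4: after training, W_in[target] is the word vector for "dog"
dog_vector = W_in[target]
```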

Negative Sampling

To avoid the high computational cost of full softmax over the entire vocabulary, Word2Vec employs negative sampling, updating only the positive sample and a small set of randomly chosen negative samples.
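A minimal sketch of one negative-sampling update (toy sizes and an untuned learning rate; gensim handles all of this internally):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4
W_in = rng.normal(scale=0.1, size=(V, d))   # target-word vectors
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(target, context, negatives, lr=0.1):
    """One update: push the positive pair's score toward 1 and each
    sampled negative pair's score toward 0 (log-loss gradients)."""
    v = W_in[target].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ W_out[word]) - label
        grad_v += g * W_out[word]
        W_out[word] -= lr * g * v
    W_in[target] -= lr * grad_v

# Repeated updates raise the score of the positive (target=2, context=3)
# pair while only two negatives (words 5 and 7) are touched per step.
before = sigmoid(W_in[2] @ W_out[3])
for _ in range(20):
    sgns_step(target=2, context=3, negatives=[5, 7])
after = sigmoid(W_in[2] @ W_out[3])
```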

After training, semantically similar words are close in the vector space; for example, the relationship "king - man + woman ≈ queen" can be captured by vector arithmetic.
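The "king - man + woman ≈ queen" relation can be illustrated with hand-picked toy vectors (a hypothetical gender axis and royalty axis; real embeddings are learned and high-dimensional):

```python
import numpy as np

# Hypothetical 2-d embeddings: dim 0 ~ "gender", dim 1 ~ "royalty".
# Illustrative values only, chosen so the analogy holds exactly.
vectors = {
    "man":   np.array([ 1.0,  0.0]),
    "woman": np.array([-1.0,  0.0]),
    "king":  np.array([ 1.0,  1.0]),
    "queen": np.array([-1.0,  1.0]),
    "apple": np.array([ 0.1, -1.0]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Vector arithmetic, then nearest neighbor by cosine similarity
query = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(query, vectors[w]))
print(best)  # -> queen
```

With a trained gensim model, the same query is `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`.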

Word2Vec provides dense vector representations that capture both semantic and syntactic relationships.

Python Implementation

The gensim library makes it easy to train Word2Vec models. Below are the steps to train a Skip‑Gram model.

Install gensim:

<code>pip install gensim</code>

Train Word2Vec with gensim:

<code>from gensim.models import Word2Vec
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

sentences = [
    "I love machine learning",
    "Machine learning is fascinating",
    "Deep learning and machine learning are both subsets of AI"
]

# Tokenize
sentences = [sentence.split() for sentence in sentences]

# Train Word2Vec (sg=1 for Skip‑Gram)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4, sg=1)
model.save("word2vec_example.model")
</code>

Using the model:

<code># Load model
model = Word2Vec.load("word2vec_example.model")

# Find words most similar to "machine"
similar_words = model.wv.most_similar("machine", topn=5)
print(similar_words)
</code>

You can also retrieve a specific word's vector:

<code>vector = model.wv['machine']
print(vector)
</code>

This simplified example demonstrates how to train Word2Vec with gensim; for meaningful embeddings, a large corpus is required.

Tags: machine learning, python, AI, Natural Language Processing, word embeddings, Word2Vec, gensim
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
