Understanding Word2Vec: Theory, Architecture, and Python Implementation

This article explains the Word2Vec algorithm, its CBOW and Skip‑Gram architectures, cosine similarity mathematics, training process with negative sampling, and provides a concise Python example using the gensim library.


Word2Vec Introduction

Word2Vec is a popular word‑embedding algorithm proposed by Tomas Mikolov and his team in 2013. Its main goal is to map each word to a fixed‑size vector that captures semantic relationships.

We illustrate the concept with a simple example.

Suppose you have the following sentences:

Dog likes to play ball.

Cat likes to climb trees.

Dog and cat are pets.

Football is a popular sport.

Training Word2Vec on these sentences yields vector representations that reflect semantic similarity: vectors for "dog" and "cat" are close because both are pets; "play ball" and "football" are related through the concept of a ball; "play ball" and "climb trees" are less related.

Similarity is quantified using cosine similarity, which measures the cosine of the angle between two vectors and ranges from -1 (opposite directions) to 1 (identical directions).

High cosine similarity indicates that two vectors are semantically close.
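The cosine similarity described above can be sketched in a few lines of NumPy (the vector values are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||); the result lies in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, a))    # identical direction, approx. 1.0
print(cosine_similarity(a, -a))   # opposite direction, approx. -1.0
print(cosine_similarity(np.array([1.0, 0.0]),
                        np.array([0.0, 1.0])))  # orthogonal, approx. 0.0
```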

Word2Vec learns these vectors by training on contextual information.

How Word2Vec Works

Word2Vec training relies on two main architectures: CBOW (Continuous Bag of Words) and Skip‑Gram.

CBOW (Continuous Bag of Words)

Predicts the target (center) word from its surrounding context words.

Input layer: one‑hot encoding of context words.

Output layer: probability distribution over the target word.

Skip‑Gram

Predicts surrounding context words from a given target word.

Input layer: one‑hot encoding of the target word.

Output layer: probability distribution over context words.
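The two architectures differ only in which side of a (target, context) pair serves as the input. A minimal sketch of how a sliding window produces those pairs (sentence and window size are illustrative):

```python
def training_pairs(tokens, window=1):
    """Extract (target, context) pairs with a sliding window.

    Skip-Gram trains on each pair as (input=target, output=context);
    CBOW instead groups all contexts of one target into a single example.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(training_pairs("dog likes to play ball".split()))
# -> [('dog', 'likes'), ('likes', 'dog'), ('likes', 'to'), ('to', 'likes'),
#     ('to', 'play'), ('play', 'to'), ('play', 'ball'), ('ball', 'play')]
```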

Neural Network Architecture

Word2Vec uses a shallow neural network, typically with a single hidden layer. After training, the weights from the input layer to the hidden layer become the word vectors.

Training process:

Initialize random weights for each word.

Slide a window over the text to extract target and context words, training with either CBOW or Skip‑Gram.

Optimize using softmax, back‑propagation, and gradient descent.

Extract the final word vectors from the input‑to‑hidden weights.
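The four steps above can be sketched with NumPy as a Skip‑Gram forward pass (vocabulary and dimensions are toy values, not a full training loop):

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["dog", "likes", "to", "play", "ball"]
V, d = len(vocab), 3

# Step 1: random initialization of both weight matrices
W_in = rng.normal(scale=0.01, size=(V, d))   # input -> hidden
W_out = rng.normal(scale=0.01, size=(d, V))  # hidden -> output

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Steps 2-3: a one-hot input just selects one row of W_in, so the
# hidden layer is that row; softmax over the output scores gives a
# probability distribution over the vocabulary (the context words).
target = vocab.index("dog")
hidden = W_in[target]
probs = softmax(hidden @ W_out)

# Step 4: after training, W_in[target] is the word vector for "dog"
dog_vector = W_in[target]
```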

Negative Sampling

To avoid the high computational cost of full softmax over the entire vocabulary, Word2Vec employs negative sampling, updating only the positive sample and a small set of randomly chosen negative samples.
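A minimal sketch of one negative-sampling update (toy sizes and an untuned learning rate; gensim handles all of this internally):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4
W_in = rng.normal(scale=0.1, size=(V, d))   # target-word vectors
W_out = rng.normal(scale=0.1, size=(V, d))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(target, context, negatives, lr=0.1):
    """One update: push the positive pair's score toward 1 and each
    sampled negative pair's score toward 0 (log-loss gradients)."""
    v = W_in[target].copy()
    grad_v = np.zeros_like(v)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(v @ W_out[word]) - label
        grad_v += g * W_out[word]
        W_out[word] -= lr * g * v
    W_in[target] -= lr * grad_v

# Repeated updates raise the score of the positive (target=2, context=3)
# pair while only two negatives (words 5 and 7) are touched per step.
before = sigmoid(W_in[2] @ W_out[3])
for _ in range(20):
    sgns_step(target=2, context=3, negatives=[5, 7])
after = sigmoid(W_in[2] @ W_out[3])
```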

After training, semantically similar words are close in the vector space; for example, the relationship "king - man + woman ≈ queen" can be captured by vector arithmetic.
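The "king - man + woman ≈ queen" relation can be illustrated with hand-picked toy vectors (a hypothetical gender axis and royalty axis; real embeddings are learned and high-dimensional):

```python
import numpy as np

# Hypothetical 2-d embeddings: dim 0 ~ "gender", dim 1 ~ "royalty".
# Illustrative values only, chosen so the analogy holds exactly.
vectors = {
    "man":   np.array([ 1.0,  0.0]),
    "woman": np.array([-1.0,  0.0]),
    "king":  np.array([ 1.0,  1.0]),
    "queen": np.array([-1.0,  1.0]),
    "apple": np.array([ 0.1, -1.0]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Vector arithmetic, then nearest neighbor by cosine similarity
query = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(query, vectors[w]))
print(best)  # -> queen
```

With a trained gensim model, the same query is `model.wv.most_similar(positive=["king", "woman"], negative=["man"])`.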

Word2Vec provides dense vector representations that capture both semantic and syntactic relationships.

Python Implementation

The gensim library makes it easy to train Word2Vec models. Below are the steps to train a Skip‑Gram model.

Install gensim:

<code>pip install gensim</code>

Train Word2Vec with gensim:

<code>from gensim.models import Word2Vec
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

sentences = [
    "I love machine learning",
    "Machine learning is fascinating",
    "Deep learning and machine learning are both subsets of AI"
]

# Tokenize
sentences = [sentence.split() for sentence in sentences]

# Train Word2Vec (sg=1 for Skip‑Gram)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4, sg=1)
model.save("word2vec_example.model")
</code>

Using the model:

<code># Load model
model = Word2Vec.load("word2vec_example.model")

# Find words most similar to "machine"
similar_words = model.wv.most_similar("machine", topn=5)
print(similar_words)
</code>

You can also retrieve a specific word's vector:

<code>vector = model.wv['machine']
print(vector)
</code>

This simplified example demonstrates how to train Word2Vec with gensim; for meaningful embeddings, a large corpus is required.

Tags: machine learning, python, AI, Natural Language Processing, word embeddings, Word2Vec, gensim
Written by

Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
