NLP Basics: Word Embeddings, Word2Vec, and Hand‑crafted RNN Implementation in PyTorch
This article introduces word‑level representations—from one‑hot encoding to dense word embeddings via Word2Vec—explains cosine similarity, then walks through the structure, limitations, and PyTorch implementation of a vanilla RNN, including a custom forward function and verification against the library API.
Preface
Hello everyone, I’m Xiao Su. In this series I will explore Natural Language Processing (NLP) from a beginner’s perspective, covering essential concepts such as word vectors, RNNs, LSTM/ELMo, and finally GPT/BERT.
Word Vectors
In NLP we must convert words into numeric forms that computers can process. One‑hot encoding represents each word as a high‑dimensional sparse vector, which wastes space and cannot capture relationships between words: the cosine similarity between any two distinct one‑hot vectors is always zero.
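To make that concrete, here is a toy one‑hot setup (the four‑word vocabulary is invented for illustration):

```python
import torch
import torch.nn.functional as F

# Toy vocabulary of 4 words; each word becomes a 4-dimensional one-hot vector.
vocab = ["king", "queen", "man", "woman"]
one_hot = torch.eye(len(vocab))

king, queen = one_hot[0], one_hot[1]
# Distinct one-hot vectors never overlap, so their cosine similarity is 0
# no matter how related the words actually are.
print(F.cosine_similarity(king, queen, dim=0))  # tensor(0.)
```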
Effective word representations should be dense, low‑dimensional, and allow similarity computation. Word2Vec learns such embeddings by training a shallow neural network (CBOW or Skip‑gram) whose by‑product is an embedding matrix Q: each row of Q is the learned vector for one vocabulary word.
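As a minimal sketch of the Skip‑gram scoring step (the sizes and word ids below are placeholders, and a real model would add a loss such as negative sampling):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 50   # hypothetical vocabulary and embedding sizes

# The embedding matrix Q: looking up a word id returns its dense row vector.
Q = nn.Embedding(vocab_size, embed_dim)    # "input" (center-word) embeddings
ctx = nn.Embedding(vocab_size, embed_dim)  # "output" (context-word) embeddings

center_id = torch.tensor([42])   # a center word
context_id = torch.tensor([7])   # one word from its context window

# Skip-gram scores a (center, context) pair by the dot product of their vectors;
# training raises this score for observed pairs and lowers it for sampled negatives.
score = (Q(center_id) * ctx(context_id)).sum(dim=-1)
```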
As an example, the word "king" can be represented by a 50‑dimensional dense vector, and the vectors for "man" and "woman" resemble it in ways that reflect semantic proximity. Cosine similarity is the standard measure of how close two such vectors are.
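Computing it in PyTorch is a one‑liner (the vectors below are random stand‑ins for real embeddings):

```python
import torch

a = torch.randn(50)  # stand-ins for two 50-dimensional word vectors
b = torch.randn(50)

# cos(a, b) = (a . b) / (||a|| * ||b||); values near 1 mean similar directions.
cos = torch.dot(a, b) / (a.norm() * b.norm())
# Equivalently: torch.nn.functional.cosine_similarity(a, b, dim=0)
```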
Word embeddings can be visualized by mapping each vector component to a color, which shows that related words occupy nearby regions of the embedding space.
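One way to produce such a heatmap (the vectors here are random; real ones would come from a trained model):

```python
import torch
import matplotlib.pyplot as plt

words = ["king", "man", "woman"]
vectors = torch.randn(len(words), 50)  # placeholder 50-dim embeddings

# Each row is one word; each cell's color encodes one vector component.
plt.imshow(vectors.numpy(), aspect="auto", cmap="RdBu")
plt.yticks(range(len(words)), words)
plt.colorbar()
plt.show()
```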
RNN Model
Recurrent Neural Networks (RNNs) handle sequential data such as text. The basic RNN cell consists of a tanh layer that processes the current input and the previous hidden state.
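Concretely, at each time step t the cell computes

h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)

where x_t is the current input and h_{t-1} the previous hidden state. This is also the update rule nn.RNN implements, and the one the hand‑crafted version below reproduces.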
RNNs suffer from the long‑distance dependency problem: because gradients must flow backward through one tanh‑and‑matrix‑multiply step per token, they tend to shrink (or blow up) over long spans, so the model struggles to capture relationships between tokens that are far apart in the sequence.
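A quick empirical illustration of the effect (the sizes and seed are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=4, batch_first=True)
x = torch.randn(1, 100, 4, requires_grad=True)  # one sequence of 100 steps

out, _ = rnn(x)
out[0, -1].sum().backward()  # gradient of the final hidden state w.r.t. every input

# Influence of the first vs. the last time step on the final state;
# the early one is typically orders of magnitude smaller.
print(x.grad[0, 0].norm(), x.grad[0, -1].norm())
```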
Hand‑crafted RNN in PyTorch
First, the built‑in nn.RNN API is demonstrated:
```python
import torch
import torch.nn as nn

bs, T = 2, 3                    # batch size, sequence length
input_size, hidden_size = 2, 3  # per-step feature size, hidden-state size

input = torch.randn(bs, T, input_size)  # (bs, T, input_size)
h_prev = torch.zeros(bs, hidden_size)   # initial hidden state, (bs, hidden_size)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
# nn.RNN expects the initial state as (num_layers, bs, hidden_size), hence unsqueeze(0).
rnn_output, state_final = rnn(input, h_prev.unsqueeze(0))
# rnn_output: (bs, T, hidden_size); state_final: (1, bs, hidden_size)
```

Next, a custom forward function rnn_forward is provided, which manually performs the matrix multiplications and tanh activation for each time step:
```python
def rnn_forward(input, weight_ih, weight_hh, bias_ih, bias_hh, h_prev):
    bs, T, input_size = input.shape
    h_dim = weight_ih.shape[0]
    h_out = torch.zeros(bs, T, h_dim)  # collects the hidden state at every step

    for t in range(T):
        x = input[:, t, :].unsqueeze(2)  # current input, (bs, input_size, 1)
        # Broadcast the shared weights across the batch so torch.bmm can be used.
        w_ih_batch = weight_ih.unsqueeze(0).tile(bs, 1, 1)  # (bs, h_dim, input_size)
        w_hh_batch = weight_hh.unsqueeze(0).tile(bs, 1, 1)  # (bs, h_dim, h_dim)

        # W_ih @ x_t for every sample in the batch -> (bs, h_dim)
        w_times_x = torch.bmm(x.transpose(1, 2), w_ih_batch.transpose(1, 2)).transpose(1, 2).squeeze(-1)
        # W_hh @ h_{t-1} for every sample in the batch -> (bs, h_dim)
        w_times_h = torch.bmm(h_prev.unsqueeze(2).transpose(1, 2), w_hh_batch.transpose(1, 2)).transpose(1, 2).squeeze(-1)

        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h_prev = torch.tanh(w_times_x + bias_ih + w_times_h + bias_hh)
        h_out[:, t, :] = h_prev

    # Mirror nn.RNN's return values: all hidden states, plus the final state
    # with a leading num_layers dimension.
    return h_out, h_prev.unsqueeze(0)
```

The custom implementation is verified by feeding it the same parameters extracted from the built‑in RNN (weights and biases) and confirming that custom_rnn_output and custom_state_final match the library results.
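A sketch of that check, reusing the tensors from the demo above (the _l0 suffix is nn.RNN's naming convention for its single layer):

```python
# Run the hand-crafted forward pass with the built-in layer's own parameters.
custom_rnn_output, custom_state_final = rnn_forward(
    input,
    rnn.weight_ih_l0, rnn.weight_hh_l0,
    rnn.bias_ih_l0, rnn.bias_hh_l0,
    h_prev,
)

print(torch.allclose(rnn_output, custom_rnn_output))    # True
print(torch.allclose(state_final, custom_state_final))  # True
```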