How Recurrent Neural Networks Generate Text Representations and Tackle Gradient Problems

This article explains what Recurrent Neural Networks are, how they build text representations, why they suffer from vanishing or exploding gradients, and how gradient clipping and architectural improvements such as LSTM, GRU, and residual connections overcome these issues.


Scene Description

The Recurrent Neural Network (RNN) is a mainstream deep learning model, introduced in the 1980s, for modeling sequential data. Unlike traditional feed-forward networks, which accept fixed-length vectors, an RNN processes variable-length ordered inputs such as word sequences, audio streams, or video frames by maintaining an internal state that carries information forward from previous time steps.

Thanks to advances in computation and architectural improvements like LSTM, GRU, and attention mechanisms, RNNs have achieved breakthroughs in machine translation, sequence labeling, image captioning, video recommendation, chatbots, and even automatic music composition.

Problem Description

What is a Recurrent Neural Network and how can it be used to generate text representations?

Why do RNNs suffer from gradient vanishing or explosion, and what improvements address these issues?

Answer and Analysis

1. What is a Recurrent Neural Network and how can it generate text representations?

Traditional feed-forward networks, including CNNs, take a fixed-length vector (e.g., a TF-IDF vector) as input, which discards word order. An RNN instead reads the sequence step by step, encoding everything read so far into a hidden state; this preserves order and lets the network produce a compact, abstract representation of the whole text that can be used for classification or generation.

The figure below shows a typical RNN architecture for text classification.

In this diagram, f and g are activation functions, U is the input-to-hidden weight matrix, and W is the hidden-to-hidden transition matrix. At each step the network computes net_t = U x_t + W h_{t-1} and h_t = f(net_t); after reading the whole sequence, the final hidden state h_T is mapped to the output by y = g(V h_T), where V is the hidden-to-output matrix. For text classification, f is typically Tanh or ReLU, and g is usually Softmax.
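To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass just described. The dimensions, the random initialization, and names such as embed, U, W, and V are illustrative assumptions added here, not taken from the original article.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the article)
vocab_size, embed_dim, hidden_dim, num_classes = 1000, 64, 128, 2

rng = np.random.default_rng(0)
embed = rng.normal(0, 0.1, (vocab_size, embed_dim))   # word embedding table
U = rng.normal(0, 0.1, (hidden_dim, embed_dim))       # input-to-hidden weights
W = rng.normal(0, 0.1, (hidden_dim, hidden_dim))      # hidden-to-hidden weights
V = rng.normal(0, 0.1, (num_classes, hidden_dim))     # hidden-to-output weights

def rnn_classify(token_ids):
    """Encode a variable-length token sequence and return class probabilities."""
    h = np.zeros(hidden_dim)
    for t in token_ids:                      # read the sequence one word at a time
        x_t = embed[t]
        h = np.tanh(U @ x_t + W @ h)         # h_t = f(U x_t + W h_{t-1}), f = tanh
    logits = V @ h                           # final hidden state = text representation
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()               # g = softmax

print(rnn_classify([12, 7, 256, 3]))         # e.g. a four-word sentence
```

Because the same U and W are reused at every step, the model handles sequences of any length with a fixed number of parameters.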

2. Why do RNNs suffer from gradient vanishing or explosion, and what are the remedies?

Training RNNs with Back‑Propagation Through Time (BPTT) unfolds the network over time and applies standard back‑propagation. Although the architecture can, in principle, capture long‑range dependencies, in practice gradients either shrink exponentially (vanishing) or grow exponentially (exploding) depending on the spectral radius of the Jacobian matrix.
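The exponential behavior follows directly from the chain rule. Using the recurrence h_t = f(net_t) with net_t = U x_t + W h_{t-1} as above, the gradient that BPTT propagates from step t back to an earlier step k is a product of per-step Jacobians (standard notation, written out here for illustration):

\[
\frac{\partial h_t}{\partial h_k}
= \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
= \prod_{i=k+1}^{t} \operatorname{diag}\!\big(f'(\mathrm{net}_i)\big)\, W .
\]

Over a gap of t − k steps this product contains t − k copies of W, so its norm tends to shrink or grow geometrically with the gap, which is exactly the vanishing/exploding behavior described above.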

Roughly speaking, when the largest singular value of this Jacobian exceeds 1, repeated multiplication can make gradients explode; when it stays below 1, they vanish. Gradient explosion can be mitigated by gradient clipping, which rescales any gradient whose norm exceeds a preset threshold.
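The clipping step itself is only a few lines. Below is a minimal norm-based clipping sketch in NumPy; the function name and the threshold value are illustrative assumptions (deep learning frameworks ship equivalents, e.g. PyTorch's clip_grad_norm_).

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)   # small epsilon avoids division by zero
        grads = [g * scale for g in grads]
    return grads

# Example: an artificially exploding gradient is rescaled to norm <= 5
grads = [np.full((3, 3), 100.0), np.full((3,), 100.0)]
clipped = clip_gradients(grads)
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))   # ~5.0
```

Clipping caps the size of each update without changing its direction, which keeps training stable when the loss surface has steep cliffs.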

Gradient vanishing is harder to fix by tuning the optimization alone; architectural changes are required. Residual connections (as in ResNet) alleviate vanishing gradients in deep feed-forward networks, while gated units such as LSTM and GRU introduce additive memory paths that preserve gradient flow over long time spans, largely resolving the vanishing-gradient problem in RNNs (see the sketch below).
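To illustrate how gating helps, here is a minimal sketch of a single LSTM step in NumPy. The weight names, shapes, and initialization are assumptions for illustration; the key point is the additive update of the cell state c, which gives gradients a path that is not repeatedly squashed through the recurrent matrix.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, what to write, and what to expose."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([x_t, h_prev])        # stack input and previous hidden state
    f = sigmoid(Wf @ z + bf)                 # forget gate
    i = sigmoid(Wi @ z + bi)                 # input gate
    o = sigmoid(Wo @ z + bo)                 # output gate
    c_tilde = np.tanh(Wc @ z + bc)           # candidate cell state
    c = f * c_prev + i * c_tilde             # additive cell update: the "gradient highway"
    h = o * np.tanh(c)
    return h, c

# Illustrative sizes (assumptions): 8-dim input, 16-dim hidden/cell state
in_dim, hid = 8, 16
rng = np.random.default_rng(1)
params = [rng.normal(0, 0.1, (hid, in_dim + hid)) for _ in range(4)] + [np.zeros(hid)] * 4
h, c = np.zeros(hid), np.zeros(hid)
h, c = lstm_step(rng.normal(size=in_dim), h, c, params)
print(h.shape, c.shape)   # (16,) (16,)
```

Because c is updated by element-wise addition gated by f, the Jacobian of c_t with respect to c_{t-1} is simply the gate value f (close to 1 when the gate is open), so information and gradients can persist across many time steps instead of decaying exponentially.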

References

Liu, Pengfei, Xipeng Qiu, and Xuanjing Huang. “Recurrent neural network for text classification with multi‑task learning.” arXiv preprint arXiv:1605.05101 (2016).

Activation function. Wikipedia. https://en.wikipedia.org/wiki/Activation_function

He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short‑term memory.” Neural computation 9.8 (1997): 1735‑1780.

Chung, Junyoung, et al. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).


Tags: deep learning, LSTM, RNN, Recurrent Neural Network, gradient vanishing, gradient explosion
Written by

Hulu Beijing

Follow Hulu's official WeChat account for the latest company updates and recruitment information.
