How LSTM Networks Achieve Long‑Term Memory: Mechanisms Explained

This article explains how Long Short‑Term Memory (LSTM) networks overcome the long‑term dependency problem of traditional recurrent neural networks by using gated cells that store and retrieve information over extended periods, and highlights their widespread applications in speech recognition, machine translation, image captioning, and beyond.


Introduction

It is well known that recurrent neural networks (RNNs) suffer from the long‑term dependency problem: as the network depth or input sequence length grows, information from early time steps cannot be effectively utilized, because gradients vanish (or explode) when backpropagated through many time steps. To solve this, Sepp Hochreiter and Jürgen Schmidhuber introduced the Long Short‑Term Memory network (LSTM) in 1997.
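To make the problem concrete, here is the standard gradient argument (the notation is generic, not from the original article): in a vanilla RNN with hidden update $h_t = \tanh(W h_{t-1} + U x_t)$, the sensitivity of a late state to an early one is a long product of Jacobians,

\[
\frac{\partial h_t}{\partial h_k} \;=\; \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \;=\; \prod_{j=k+1}^{t} \operatorname{diag}\!\big(\tanh'(\cdot)\big)\, W,
\]

whose norm tends to shrink or grow exponentially in $t - k$, so the influence of early inputs on learning either fades to nothing or destabilizes training.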

LSTM can respond to short‑term inputs while also storing valuable information for long periods, thereby enhancing learning capability. Thanks to this memory ability, LSTM has become one of the most widely used recurrent architectures, with successful applications in speech recognition, machine translation, image captioning, and many other fields.

Question

How does LSTM achieve both long‑term and short‑term memory?

Analysis and Answer

Unlike a standard RNN, whose hidden state is overwritten at every time step, an LSTM maintains a separate cell state that is modified only through three multiplicative gates. The forget gate decides how much of the previous cell state to keep, the input gate decides how much of the current candidate information to write in, and the output gate decides how much of the cell state to expose as the hidden state. Because the cell state is updated additively rather than being repeatedly squashed through a nonlinearity, information (and gradients) can persist across many time steps, giving the network its long‑term memory, while the gated hidden state still reacts to recent inputs, providing short‑term memory. The following figures illustrate the difference between a standard RNN and an LSTM, and show the internal gated structure of an LSTM cell.

Figure 1: (a) A standard recurrent neural network; (b) an LSTM.

Figure 2: Schematic of the LSTM gated unit.
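In the standard formulation (notation follows common usage in the literature rather than the original figures: $\sigma$ is the logistic sigmoid, $\odot$ element‑wise multiplication, and $W_\ast$, $b_\ast$ learned parameters), one time step computes

\[
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell update)}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
\]

When $f_t \approx 1$ and $i_t \approx 0$, the cell state is carried forward essentially unchanged, which is exactly the long‑term memory the question asks about; short‑term behavior comes from $h_t$, which is recomputed from the current input at every step.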
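For concreteness, here is a minimal NumPy sketch of one LSTM time step implementing the equations above. The function and variable names are illustrative choices, not from the original article, and real implementations (e.g., in deep learning frameworks) add batching, careful initialization, and optimized kernels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    x_t:    input vector at time t, shape (input_dim,)
    h_prev: previous hidden state, shape (hidden_dim,)
    c_prev: previous cell state, shape (hidden_dim,)
    W:      stacked gate weights, shape (4 * hidden_dim, hidden_dim + input_dim)
    b:      stacked gate biases, shape (4 * hidden_dim,)
    """
    hidden_dim = h_prev.shape[0]
    # Compute all four gate pre-activations in one matrix multiply.
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:hidden_dim])                        # forget gate
    i = sigmoid(z[hidden_dim:2 * hidden_dim])           # input gate
    c_hat = np.tanh(z[2 * hidden_dim:3 * hidden_dim])   # candidate cell state
    o = sigmoid(z[3 * hidden_dim:4 * hidden_dim])       # output gate
    c_t = f * c_prev + i * c_hat                        # additive cell update
    h_t = o * np.tanh(c_t)                              # gated short-term output
    return h_t, c_t

# Illustrative usage with random weights.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
b = np.zeros(4 * hidden_dim)
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):               # a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
```

Note how the cell state c_t is only ever scaled by the forget gate and incremented by the gated candidate; nothing forces it through a saturating nonlinearity between steps, which is the mechanism behind the long-term memory.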

References

[1] Hochreiter S., Schmidhuber J. Long short‑term memory. Neural Computation, 1997, 9(8): 1735‑1780.

[2] Greff K., Srivastava R. K., Koutník J., et al. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(10): 2222‑2232.

Next Issue Preview

The upcoming article will cover the basic structure and evolution of graph neural networks (GNNs).

Introduction to Graph Neural Networks

At the beginning of 2019, a top‑ten technology outlook highlighted a trend relevant to this chapter: “Massive graph neural network systems will endow machines with common sense.” Graph neural networks (GNNs) have been studied since 2005, and spatial‑domain and spectral‑domain formulations began to converge around 2017. GNNs aim to define convolution operations on graph data, analogous to the convolutions CNNs apply to grid‑structured data such as images.
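As a brief taste of the spectral view previewed above (standard definitions, not taken from the original article): for an undirected graph with adjacency matrix $A$ and degree matrix $D$, the graph Laplacian and its eigendecomposition are

\[
L = D - A = U \Lambda U^{\top},
\]

where the eigenvalues in $\Lambda$ form the graph spectrum, $\hat{x} = U^{\top} x$ is the graph Fourier transform of a signal $x$, and spectral graph convolutions filter signals as $U\, g_\theta(\Lambda)\, U^{\top} x$. The next issue develops these definitions in detail.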

Question

What is the graph spectrum? What is graph Fourier transform? What are spectral‑domain graph convolutional networks?


Written by

Hulu Beijing

Follow Hulu's official WeChat account for the latest company updates and recruitment information.
