How LSTM Networks Achieve Long‑Term Memory: Mechanisms Explained
This article explains how Long Short‑Term Memory (LSTM) networks overcome the long‑term dependency problem of traditional recurrent neural networks by using gated cells that store and retrieve information over extended periods, and highlights their widespread applications in speech recognition, machine translation, image captioning, and beyond.
Introduction
It is well known that recurrent neural networks (RNNs) suffer from the long‑term dependency problem: as the input sequence (and hence the unrolled network) grows longer, gradients vanish or explode during training, so information from early time steps cannot be used effectively. To address this, Sepp Hochreiter and Jürgen Schmidhuber introduced the Long Short‑Term Memory (LSTM) network in 1997.
An LSTM can respond to short‑term inputs while also storing valuable information over long spans, which strengthens its learning capability. Thanks to this memory ability, the LSTM has become one of the most popular recurrent architectures and has been applied successfully in speech recognition, machine translation, image captioning, and many other fields.
Question
How does LSTM achieve both long‑term and short‑term memory?
Analysis and Answer
The following figures illustrate the difference between a standard RNN and an LSTM, and show the internal gated structure of an LSTM cell.
Figure 1: (a) Ordinary recurrent neural network (b) LSTM.
Figure 2: Schematic of LSTM gated unit.
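To make the gated structure sketched in Figure 2 concrete, here is a minimal single‑step LSTM in NumPy. This is an illustrative sketch only: the parameter layout, function names, and dimensions are our own choices, not from the original article. The forget gate decides how much of the old cell state to keep (long‑term memory), the input gate decides how much new content to write, and the output gate controls what the hidden state exposes at the current step (short‑term memory).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the input (i),
    forget (f), and output (o) gates and the candidate cell content (g)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # pre-activations, shape (4*H,)
    i = sigmoid(z[0*H:1*H])               # input gate: how much new info to write
    f = sigmoid(z[1*H:2*H])               # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])               # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])               # candidate cell content
    c_t = f * c_prev + i * g              # cell state: additive long-term memory
    h_t = o * np.tanh(c_t)                # hidden state: short-term output
    return h_t, c_t

# Tiny usage example with random parameters (shapes are the only point here).
rng = np.random.default_rng(0)
D, H = 3, 4                               # input and hidden sizes (arbitrary)
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

Note the additive cell‑state update `c_t = f * c_prev + i * g`: because the old state is carried forward by elementwise multiplication with a gate rather than being repeatedly squashed through a nonlinearity, gradients can flow across many time steps, which is the core of the LSTM's long‑term memory.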
References
[1] Hochreiter S., Schmidhuber J. Long short‑term memory. Neural Computation, 1997, 9(8): 1735‑1780.
[2] Greff K., Srivastava R. K., Koutník J., et al. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(10): 2222‑2232.
Next Issue Preview
The upcoming article will cover the basic structure and evolution of graph neural networks.
Graph Neural Networks Introduction
At the beginning of 2019, a top‑ten technology outlook highlighted a trend relevant to this topic: "Massive graph neural network systems will endow machines with common sense." Graph neural networks (GNNs) have been studied since around 2005, and in 2017 the spatial‑domain and spectral‑domain lines of work were unified under a common framework. GNNs aim to define convolution operations on graph data, analogous to CNN convolutions on grid‑structured data.
Question
What is the graph spectrum? What is graph Fourier transform? What are spectral‑domain graph convolutional networks?
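As a small taste of the spectral view ahead of the full answer, here is a hedged sketch (the toy graph and variable names are our own illustration, not from the article): the eigenvectors of the normalized graph Laplacian play the role of the Fourier basis on a graph, the eigenvalues act as "frequencies", and projecting a node signal onto that basis is the graph Fourier transform.

```python
import numpy as np

# Toy 4-node path graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix

d = A.sum(axis=1)                            # node degrees
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian: I - D^{-1/2} A D^{-1/2}

# Eigendecomposition: columns of U form the graph Fourier basis,
# eigvals (sorted ascending) are the corresponding graph frequencies.
eigvals, U = np.linalg.eigh(L)

x = np.array([1.0, 2.0, 3.0, 4.0])           # a signal: one scalar per node
x_hat = U.T @ x                              # graph Fourier transform
x_rec = U @ x_hat                            # inverse transform recovers the signal
```

Spectral graph convolutional networks build on exactly this machinery: filtering a signal means rescaling its spectral coefficients `x_hat` before transforming back, just as classical convolution is multiplication in the Fourier domain.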
Hulu Beijing