
LSTM‑Jump: Learning to Skim Text for Faster Sequence Modeling

The paper introduces LSTM‑Jump, a reinforcement‑learning‑trained LSTM variant that can dynamically skip irrelevant tokens, achieving up to six‑fold speed‑ups over standard sequential LSTMs while maintaining or improving accuracy on various NLP tasks such as sentiment analysis, document classification, and question answering.

Qunar Tech Salon

1 Introduction

In many NLP sub‑fields—document classification, machine translation, and QA—recurrent neural networks (RNNs) have shown great promise, but they typically read every token sequentially, which is slow for long texts. This work proposes a method that allows the model to jump over unimportant parts of the input, reducing computation while preserving performance.

The underlying model is an LSTM that, after reading a small number of tokens, decides how many tokens to skip. A policy‑gradient reinforcement learning approach is used to train the discrete jump decisions. Experiments on four tasks (numeric prediction, sentiment analysis, news classification, and QA) show that the jumping LSTM can be up to six times faster than a standard sequential LSTM with comparable or better accuracy.
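Because the jump decisions are discrete samples, they cannot be trained by ordinary backpropagation; the paper uses a policy-gradient (REINFORCE-style) estimator instead. The following is a minimal sketch of that estimator, not the paper's implementation: the function name, the scalar surrogate form, and the constant baseline are our illustrative choices.

```python
import numpy as np

def reinforce_surrogate(log_probs, reward, baseline=0.0):
    """Sketch of a REINFORCE surrogate objective for discrete jumps.

    log_probs : log-probabilities of the jump sizes actually sampled
                during one pass over a document
    reward    : terminal task reward (e.g. +1 if the final prediction
                is correct, -1 otherwise)
    baseline  : subtracted from the reward to reduce gradient variance

    Differentiating this scalar w.r.t. the policy parameters gives the
    usual estimator  (reward - baseline) * sum_j grad log pi(kappa_j).
    """
    advantage = reward - baseline
    return advantage * np.sum(log_probs)
```

In practice the log-probabilities would come from the jump softmax of the trained LSTM, and an autodiff framework would differentiate the surrogate; here they are plain numbers so the arithmetic is visible.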

2 Method

We describe the LSTM‑Jump architecture. Before training, we fix the maximum number of jumps N, the number of tokens read between jumps R, and the maximum jump size K. K is a fixed hyper‑parameter of the model, while N and R can vary between training and testing. The notation d₁:p denotes the sequence d₁, d₂, …, dₚ.

2.1 Model Overview

The model (illustrated in Figure 1) is built on a standard LSTM. At each step the LSTM reads R tokens, produces a hidden state, and feeds it to a softmax that predicts a distribution over possible jump lengths 0…K. A jump length κ is sampled from this distribution, and the next token to read becomes x_{R+κ}. The process repeats until one of three termination conditions occurs:

a) the jump softmax samples a 0;

b) the number of jumps exceeds N;

c) the network reaches the final token x_T.

After termination, the final hidden state is used for the downstream task: a classification softmax for tasks in Sections 3.1–3.3, or a similarity computation for the QA task in Section 3.4.
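The read-and-jump loop with its three termination conditions can be sketched as follows. This is a toy illustration, not the paper's model: the running-mean "hidden state" and the uniform jump distribution are hypothetical stand-ins for the trained LSTM and its jump softmax.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_jump_read(tokens, R=2, K=3, N=5):
    """Toy sketch of the LSTM-Jump reading loop.

    tokens : sequence of token values
    R      : number of tokens read between jumps
    K      : maximum jump size
    N      : maximum number of jumps
    Returns the final state and the positions actually read.
    """
    T = len(tokens)
    h = 0.0                      # stand-in for the LSTM hidden state
    pos, jumps, read = 0, 0, []
    while pos < T:
        # Read up to R tokens sequentially.
        for _ in range(R):
            if pos >= T:
                break
            h = 0.9 * h + 0.1 * tokens[pos]  # toy state update
            read.append(pos)
            pos += 1
        if pos >= T:
            break                # (c) reached the final token
        # Jump softmax over sizes 0..K (toy: uniform distribution).
        kappa = rng.choice(K + 1)
        if kappa == 0:
            break                # (a) sampled a zero jump
        jumps += 1
        if jumps > N:
            break                # (b) exceeded the jump budget N
        pos += kappa             # skip ahead kappa positions
    return h, read
```

The speed-up comes directly from `read` being much shorter than the document whenever the sampled jumps are large.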

3 Experimental Results

Tables 1–7 report task and dataset statistics, as well as test accuracy and runtime for various jump settings. Across synthetic numeric-prediction problems, sentiment analysis (IMDB and Rotten Tomatoes), news classification, and question answering on the Children's Book Test, LSTM‑Jump consistently reduces processing time while achieving accuracy comparable to or better than the baseline sequential LSTM.

Tags: NLP, reinforcement learning, LSTM, sequence modeling, text skimming
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
