Tagged articles

sequence modeling

23 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 10, 2026 · Artificial Intelligence

Bypassing BPTT: MIT’s SMT Puts RNNs on the Parallel Training Path

The article reviews MIT’s Supervised Memory Training (SMT) and its DAgger extension (DMT), which replace traditional back‑propagation through time with one‑step memory supervision from a Transformer‑based teacher. The result is parallel‑friendly RNN training with superior long‑sequence performance on synthetic benchmarks, TinyStories and pixel‑wise image generation.

BPTT · DMT · RNN

0 likes · 10 min read

Machine Heart

Jun 9, 2026 · Artificial Intelligence

How Linear Attention Learns “Write‑Before‑Think”: Parallel Multi‑Step Memory Writes with PRISM

PRISM demonstrates that linear‑attention models can adopt a “write‑before‑think” paradigm by reconstructing the multi‑step step‑size × residual × direction iteration of Test‑Time Training, achieving Transformer‑level quality while delivering up to 174× higher throughput through parallel scan and fused kernels.

Linear Attention · PRISM · Parallel Scan

0 likes · 19 min read

Machine Heart

May 17, 2026 · Artificial Intelligence

ViT³: Vision Test‑Time Training Architecture Breaking Transformer Complexity (CVPR 2026 Oral)

The paper systematically studies Test‑Time Training (TTT) for vision, derives six design principles, and introduces ViT³—a pure TTT architecture that uses full‑batch internal training, a learning rate of 1.0, and lightweight SwiGLU‑Depthwise convolution modules, achieving state‑of‑the‑art linear‑complexity performance across classification, detection, segmentation and generation tasks.

Linear Complexity · Test-Time Training · Vision Transformers

0 likes · 14 min read

DataFunTalk

Feb 13, 2026 · Artificial Intelligence

HyFormer: Unified Sequence Modeling and Feature Interaction for Recommendations

HyFormer, a novel hybrid Transformer framework introduced by ByteDance’s TikTok search team, integrates sequence modeling and feature interaction into a unified alternating optimization process, enhancing representation power and scaling efficiency for ultra‑long user behavior sequences and high‑dimensional heterogeneous features, leading to significant offline and online performance gains.

AI · HyFormer · feature interaction

0 likes · 12 min read

AI Cyberspace

Feb 11, 2026 · Artificial Intelligence

From RNNs to LSTMs and GRUs: A Hands‑On Guide to Sequence Modeling in PyTorch

This tutorial explains the nature of sequential data, why traditional feed‑forward networks struggle with it, and how recurrent architectures such as RNN, LSTM, and GRU capture temporal dependencies, complete with mathematical foundations, training algorithms, and full PyTorch implementations for sentiment analysis, text generation, and encoder‑decoder models.

Encoder-Decoder · GRU · LSTM

0 likes · 57 min read

Bighead's Algorithm Notes

Jan 21, 2026 · Artificial Intelligence

Lead–LagNet: Modeling Cross‑Series Lead‑Lag Dependencies for Time‑Series Forecasting

Lead–LagNet addresses three key limitations of existing graph neural networks for multivariate time‑series forecasting—loss of fine‑grained temporal detail, shared weight assumptions, and reduced interpretability—by introducing a sequence preprocessor with a global influence separator and subsequence detector, a subsequence dependency encoder, and a decoupled message‑passing mechanism, achieving superior performance on synthetic benchmarks and S&P 500 market data.

Financial Market Prediction · Lead‑Lag Dependency · Lead–LagNet

0 likes · 13 min read

PaperAgent

Dec 6, 2025 · Artificial Intelligence

How Titans and MIRAS Enable AI Models to Remember 1 Million Tokens

Google's Titans architecture and the MIRAS theoretical framework introduce a deep neural memory that lets large language models learn in real time, retain surprising information, and handle context windows of up to two million tokens, outperforming existing Transformers and linear RNNs on a range of benchmarks.

AI memory · Large Language Models · MIRAS framework

0 likes · 10 min read

HyperAI Super Neural

Oct 20, 2025 · Artificial Intelligence

How Nvidia’s ERDM Model Beats EDM in Long‑Term Weather Forecasting (NeurIPS 2025)

The paper introduces ERDM, an enhanced rolling diffusion model that integrates progressive noise scheduling and time‑loss weighting from EDM, demonstrates superior CRPS scores on Navier‑Stokes and ERA5 mid‑term weather forecasts, and achieves comparable accuracy with far lower computational cost.

AI · Diffusion Models · ERDM

0 likes · 14 min read

Qborfy AI

Aug 7, 2025 · Artificial Intelligence

Understanding RNNs: From Memory Cells to Real‑World Applications

This article explains how recurrent neural networks (RNNs) add memory to neural models, details the gate mechanisms of LSTM and GRU, compares their structures and parameter counts, and illustrates their use in speech recognition, translation, stock prediction, and video generation, while highlighting practical insights and energy considerations.

AI · Deep Learning · GRU

0 likes · 5 min read

AI Frontier Lectures

Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture · Tokenization · Transformers

0 likes · 19 min read

DeWu Technology

Jul 16, 2025 · Artificial Intelligence

How We Built a Scalable Offline‑Online Sequence Modeling System for Community Search

This article details the design of a community‑search pipeline that leverages long‑term user interaction sequences for CTR/CVR prediction, describes the global, online and offline architectures, enumerates the major performance and consistency challenges encountered, and presents the practical optimizations and future directions adopted to achieve reliable, high‑throughput sequence modeling.

Data Consistency · ai-optimization · offline processing

0 likes · 12 min read

Amap Tech

Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Transformer · autonomous driving · computer vision

0 likes · 10 min read

Tencent Advertising Technology

Oct 17, 2024 · Artificial Intelligence

Long Sequence Modeling for Advertising Recommendation: TIN, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec

This article presents a comprehensive solution for heterogeneous long‑behavior sequence modeling in advertising recommendation, introducing the TIN backbone, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec, along with platform‑level optimizations that enable million‑scale sequences while delivering significant online performance gains.

Advertising · Deep Learning · Performance Optimization

0 likes · 15 min read

NewBeeNLP

Mar 4, 2024 · Artificial Intelligence

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations

This article presents a GitHub‑hosted collection of 25 recent research papers on Mamba and its variants, summarizing each work’s core contributions across sequence modeling, vision, medical imaging, graph analysis, and multimodal tasks, and highlighting their performance gains over prior methods.

Deep Learning · Mamba · computer vision

0 likes · 13 min read

Kuaishou Tech

Apr 26, 2023 · Artificial Intelligence

Dual-Interest Decomposition Head Attention for Sequence Recommendation with Positive and Negative Feedback

The paper proposes a dual‑interest decomposition head‑attention model that uses a feedback‑aware encoding layer, a factorized head attention mechanism, and separate positive/negative interest towers to improve sequence recommendation performance on short‑video and e‑commerce datasets.

AI · Transformer · feedback

0 likes · 8 min read

Model Perspective

Aug 15, 2022 · Artificial Intelligence

Understanding Recurrent Neural Networks: From Vanilla RNN to LSTM with Keras

This article introduces recurrent neural networks (RNNs) and their ability to handle sequential data, explains the limitations of vanilla RNNs, presents the LSTM architecture with its gates, and provides complete Keras code for data loading, model building, and training both vanilla RNN and LSTM models.

Deep Learning · Keras · LSTM

0 likes · 5 min read

DataFunTalk

Apr 28, 2022 · Artificial Intelligence

Sequence Feature Modeling in Large-Scale Recommendation Systems and Fast Deployment with EasyRec

This article reviews the evolution of behavior‑sequence modeling methods—from pooling and target‑attention to RNN, capsule, transformer, and graph neural networks—explains their industrial relevance, and demonstrates how to quickly apply these techniques in the EasyRec framework with practical configuration examples.

DIN · EasyRec · behavior features

0 likes · 21 min read

DataFunSummit

Apr 11, 2022 · Artificial Intelligence

Exploring QQ Music Recall Algorithms: Knowledge‑Graph Fusion, Sequence & Multi‑Interest Modeling, Audio Recall, and Federated Learning

This article presents a comprehensive overview of QQ Music's recall pipeline, detailing business characteristics, challenges such as noisy user behavior and cold‑start, and four major solutions—including knowledge‑graph‑enhanced recall, sequence‑based and multi‑interest modeling, audio‑based recall, and federated learning—along with practical insights and Q&A.

Audio Embedding · Knowledge Graph · Recommendation Systems

0 likes · 19 min read

DataFunTalk

Feb 10, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details the technical evolution of Kuaishou's short‑video recommendation pipeline, focusing on sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking, and explains how transformer‑based models, generator‑evaluator frameworks, and reinforcement‑learning strategies are employed to maximize overall sequence value, user engagement, and revenue.

Kuaishou · Re‑ranking · multi-content mixing

0 likes · 15 min read

58 Tech

Apr 12, 2021 · Artificial Intelligence

Deep Interest Modeling and Multi‑Channel Recommendation for 58.com Home Page

This article presents the challenges of large‑scale home‑page recommendation at 58.com, describes how behavior‑sequence models such as DIN, DIEN and Transformer are applied and evolved into double‑channel and multi‑channel deep interest architectures, and details offline and online performance optimizations that yielded significant gains in click‑through and conversion rates.

AI · large-scale systems · recommendation

0 likes · 19 min read

DataFunTalk

Apr 3, 2021 · Artificial Intelligence

A Survey of User Behavior Sequence Modeling for Search and Recommendation Advertising

User behavior sequence modeling is central to ranking in search and recommendation advertising. It has evolved from simple pooling to attention, RNN, capsule, and Transformer architectures, with industrial applications across e‑commerce, social, video, and music platforms; future directions include time‑aware, multi‑dimensional, and self‑supervised approaches.

Deep Learning · Recommendation Systems · Transformer

0 likes · 24 min read

Hulu Beijing

Dec 12, 2017 · Artificial Intelligence

How LSTM Achieves Long‑Term Memory: Gates, Activations & Variants Explained

This article explains how LSTM networks overcome RNN limitations by using input, forget, and output gates with sigmoid and tanh activations, describes the core update equations, discusses alternative activation functions and hard‑gate variants, and provides references for deeper study.

LSTM · RNN · activation functions

0 likes · 10 min read

Qunar Tech Salon

Apr 27, 2017 · Artificial Intelligence

LSTM‑Jump: Learning to Skim Text for Faster Sequence Modeling

The paper introduces LSTM‑Jump, a reinforcement‑learning‑trained LSTM variant that can dynamically skip irrelevant tokens, achieving up to six‑fold speed‑ups over standard sequential LSTMs while maintaining or improving accuracy on various NLP tasks such as sentiment analysis, document classification, and question answering.

LSTM · NLP · reinforcement learning

0 likes · 7 min read
