Tagged articles
21 articles
Machine Heart
May 17, 2026 · Artificial Intelligence

ViT³: Vision Test‑Time Training Architecture Breaking Transformer Complexity (CVPR 2026 Oral)

The paper systematically studies Test‑Time Training (TTT) for vision, derives six design principles, and introduces ViT³—a pure TTT architecture that uses full‑batch internal training, a learning rate of 1.0, and lightweight SwiGLU‑Depthwise convolution modules, achieving state‑of‑the‑art linear‑complexity performance across classification, detection, segmentation and generation tasks.

Computer Vision · Linear Complexity · Sequence Modeling
0 likes · 14 min read
DataFunTalk
Feb 13, 2026 · Artificial Intelligence

HyFormer: Unified Sequence Modeling and Feature Interaction for Recommendations

HyFormer, a novel hybrid Transformer framework introduced by ByteDance’s TikTok search team, integrates sequence modeling and feature interaction into a unified alternating optimization process, enhancing representation power and scaling efficiency for ultra‑long user behavior sequences and high‑dimensional heterogeneous features, leading to significant offline and online performance gains.

HyFormer · Sequence Modeling · AI
0 likes · 12 min read
AI Cyberspace
Feb 11, 2026 · Artificial Intelligence

From RNNs to LSTMs and GRUs: A Hands‑On Guide to Sequence Modeling in PyTorch

This tutorial explains the nature of sequential data, why traditional feed‑forward networks struggle with it, and how recurrent architectures such as RNN, LSTM, and GRU capture temporal dependencies, complete with mathematical foundations, training algorithms, and full PyTorch implementations for sentiment analysis, text generation, and encoder‑decoder models.

Encoder-Decoder · GRU · LSTM
0 likes · 57 min read
Bighead's Algorithm Notes
Jan 21, 2026 · Artificial Intelligence

Lead–LagNet: Modeling Cross‑Series Lead‑Lag Dependencies for Time‑Series Forecasting

Lead–LagNet addresses three key limitations of existing graph neural networks for multivariate time‑series forecasting—loss of fine‑grained temporal detail, shared weight assumptions, and reduced interpretability—by introducing a sequence preprocessor with a global influence separator and subsequence detector, a subsequence dependency encoder, and a decoupled message‑passing mechanism, achieving superior performance on synthetic benchmarks and S&P 500 market data.

Financial Market Prediction · Lead‑Lag Dependency · Lead–LagNet
0 likes · 13 min read
PaperAgent
Dec 6, 2025 · Artificial Intelligence

How Titans and MIRAS Enable AI Models to Remember 1 Million Tokens

Google's Titans architecture and the MIRAS theoretical framework introduce a deep neural memory that lets large language models learn in real time, retain surprising information, and handle context windows of up to two million tokens, outperforming existing Transformers and linear RNNs on a range of benchmarks.

AI memory · Large Language Models · MIRAS framework
0 likes · 10 min read
HyperAI Super Neural
Oct 20, 2025 · Artificial Intelligence

How Nvidia’s ERDM Model Beats EDM in Long‑Term Weather Forecasting (NeurIPS 2025)

The paper introduces ERDM, an enhanced rolling diffusion model that integrates progressive noise scheduling and time‑loss weighting from EDM, demonstrates superior CRPS scores on Navier‑Stokes and ERA5 mid‑term weather forecasts, and achieves comparable accuracy with far lower computational cost.

Diffusion Models · ERDM · NeurIPS 2025
0 likes · 14 min read
Qborfy AI
Aug 7, 2025 · Artificial Intelligence

Understanding RNNs: From Memory Cells to Real‑World Applications

This article explains how recurrent neural networks (RNNs) add memory to neural models, details the gate mechanisms of LSTM and GRU, compares their structures and parameter counts, and illustrates their use in speech recognition, translation, stock prediction, and video generation, while highlighting practical insights and energy considerations.

Deep Learning · GRU · LSTM
0 likes · 5 min read
AI Frontier Lectures
Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture · Sequence Modeling · Transformers
0 likes · 19 min read
DeWu Technology
Jul 16, 2025 · Artificial Intelligence

How We Built a Scalable Offline‑Online Sequence Modeling System for Community Search

This article details the design of a community‑search pipeline that leverages long‑term user interaction sequences for CTR/CVR prediction, describes the global, online and offline architectures, enumerates the major performance and consistency challenges encountered, and presents the practical optimizations and future directions adopted to achieve reliable, high‑throughput sequence modeling.

AI Optimization · Data Consistency · Sequence Modeling
0 likes · 12 min read
Amap Tech
Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Computer Vision · Sequence Modeling · Transformer
0 likes · 10 min read
Tencent Advertising Technology
Oct 17, 2024 · Artificial Intelligence

Long Sequence Modeling for Advertising Recommendation: TIN, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec

This article presents a comprehensive solution for heterogeneous long‑behavior sequence modeling in advertising recommendation, introducing the TIN backbone, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec, along with platform‑level optimizations that enable million‑scale sequences while delivering significant online performance gains.

Advertising · Deep Learning · Sequence Modeling
0 likes · 15 min read
NewBeeNLP
Mar 4, 2024 · Artificial Intelligence

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations

This article presents a GitHub‑hosted collection of 25 recent research papers on Mamba and its variants, summarizing each work’s core contributions across sequence modeling, vision, medical imaging, graph analysis, and multimodal tasks, and highlighting their performance gains over prior methods.

Computer Vision · Deep Learning · Mamba
0 likes · 13 min read
Kuaishou Tech
Apr 26, 2023 · Artificial Intelligence

Dual-Interest Decomposition Head Attention for Sequence Recommendation with Positive and Negative Feedback

The paper proposes a dual‑interest decomposition head‑attention model that uses a feedback‑aware encoding layer, a factorized head attention mechanism, and separate positive/negative interest towers to improve sequence recommendation performance on short‑video and e‑commerce datasets.

Feedback · Sequence Modeling · Transformer
0 likes · 8 min read
Model Perspective
Aug 15, 2022 · Artificial Intelligence

Understanding Recurrent Neural Networks: From Vanilla RNN to LSTM with Keras

This article introduces recurrent neural networks (RNNs) and their ability to handle sequential data, explains the limitations of vanilla RNNs, presents the LSTM architecture with its gates, and provides complete Keras code for data loading, model building, and training both vanilla RNN and LSTM models.

Deep Learning · Keras · LSTM
0 likes · 5 min read
DataFunTalk
Apr 28, 2022 · Artificial Intelligence

Sequence Feature Modeling in Large-Scale Recommendation Systems and Fast Deployment with EasyRec

This article reviews the evolution of behavior‑sequence modeling methods—from pooling and target‑attention to RNN, capsule, transformer, and graph neural networks—explains their industrial relevance, and demonstrates how to quickly apply these techniques in the EasyRec framework with practical configuration examples.

DIN · EasyRec · Sequence Modeling
0 likes · 21 min read
DataFunSummit
Apr 11, 2022 · Artificial Intelligence

Exploring QQ Music Recall Algorithms: Knowledge‑Graph Fusion, Sequence & Multi‑Interest Modeling, Audio Recall, and Federated Learning

This article presents a comprehensive overview of QQ Music's recall pipeline, detailing business characteristics, challenges such as noisy user behavior and cold‑start, and four major solutions—including knowledge‑graph‑enhanced recall, sequence‑based and multi‑interest modeling, audio‑based recall, and federated learning—along with practical insights and Q&A.

Audio Embedding · Recommendation Systems · Sequence Modeling
0 likes · 19 min read
DataFunTalk
Feb 10, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details the technical evolution of Kuaishou's short‑video recommendation pipeline, focusing on sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking, and explains how transformer‑based models, generator‑evaluator frameworks, and reinforcement‑learning strategies are employed to maximize overall sequence value, user engagement, and revenue.

Kuaishou · Reinforcement Learning · Sequence Modeling
0 likes · 15 min read
58 Tech
Apr 12, 2021 · Artificial Intelligence

Deep Interest Modeling and Multi‑Channel Recommendation for 58.com Home Page

This article presents the challenges of large‑scale home‑page recommendation at 58.com, describes how behavior‑sequence models such as DIN, DIEN and Transformer are applied and evolved into double‑channel and multi‑channel deep interest architectures, and details offline and online performance optimizations that yielded significant gains in click‑through and conversion rates.

Sequence Modeling · AI · large-scale systems
0 likes · 19 min read
DataFunTalk
Apr 3, 2021 · Artificial Intelligence

A Survey of User Behavior Sequence Modeling for Search and Recommendation Advertising

User behavior sequence modeling, crucial for search and recommendation advertising ranking, has evolved from simple pooling to attention, RNN, capsule, and Transformer architectures, with industrial applications across e‑commerce, social, video, and music platforms, and future directions include time‑aware, multi‑dimensional, and self‑supervised approaches.

Deep Learning · Recommendation Systems · Sequence Modeling
0 likes · 24 min read
Hulu Beijing
Dec 12, 2017 · Artificial Intelligence

How LSTM Achieves Long‑Term Memory: Gates, Activations & Variants Explained

This article explains how LSTM networks overcome RNN limitations by using input, forget, and output gates with sigmoid and tanh activations, describes the core update equations, discusses alternative activation functions and hard‑gate variants, and provides references for deeper study.

LSTM · RNN · Sequence Modeling
0 likes · 10 min read
Qunar Tech Salon
Apr 27, 2017 · Artificial Intelligence

LSTM‑Jump: Learning to Skim Text for Faster Sequence Modeling

The paper introduces LSTM‑Jump, a reinforcement‑learning‑trained LSTM variant that can dynamically skip irrelevant tokens, achieving up to six‑fold speed‑ups over standard sequential LSTMs while maintaining or improving accuracy on various NLP tasks such as sentiment analysis, document classification, and question answering.

LSTM · NLP · Reinforcement Learning
0 likes · 7 min read