
Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting

The paper introduces Pyraformer, a low‑complexity pyramidal‑attention Transformer that captures multi‑scale temporal dependencies with linear time‑space complexity, achieving superior single‑step and long‑range forecasting performance on real‑world datasets while supporting green‑computing capacity management.

Time‑series forecasting is crucial for risk management, resource allocation, and green computing, yet existing models struggle to capture multi‑scale dependencies (daily, weekly, monthly) without incurring high computational cost.

To address this, Ant Group and Shanghai Jiao Tong University propose the Pyramid Attention Module (PAM). Its inter‑scale edges summarize the series at progressively coarser resolutions, while its intra‑scale edges model dependencies among neighboring nodes at the same resolution. The resulting attention graph achieves linear time and space complexity in the sequence length while keeping the maximum signal‑propagation path between any two positions at O(1).
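The pyramidal graph can be illustrated with a small attention mask. The sketch below is illustrative only (the function name, defaults, and neighborhood size are assumptions, not the paper's exact hyper‑parameters): each node attends to nearby nodes at its own scale (intra‑scale edges) and to its parent and children at adjacent scales (inter‑scale edges).

```python
import numpy as np

def pyramid_attention_mask(seq_len, levels=3, factor=2, neighbors=1):
    """Boolean attention mask for a pyramidal graph (illustrative sketch).

    The finest scale has seq_len nodes; each coarser scale shrinks by
    `factor`. A node may attend to (a) intra-scale neighbors within
    `neighbors` hops and (b) its parent/children across adjacent scales.
    """
    sizes = [seq_len // factor**s for s in range(levels)]   # nodes per scale
    offsets = np.cumsum([0] + sizes[:-1])                   # start index of each scale
    total = sum(sizes)
    mask = np.zeros((total, total), dtype=bool)

    for s, (n, off) in enumerate(zip(sizes, offsets)):
        for i in range(n):
            u = off + i
            # intra-scale edges: nearby nodes at the same resolution
            for j in range(max(0, i - neighbors), min(n, i + neighbors + 1)):
                mask[u, off + j] = True
            # inter-scale edge to the parent node at the coarser scale
            if s + 1 < levels:
                parent = offsets[s + 1] + i // factor
                mask[u, parent] = mask[parent, u] = True
    return mask

m = pyramid_attention_mask(8, levels=3, factor=2)
print(m.shape)                       # (14, 14): 8 + 4 + 2 pyramid nodes
print(m.sum(), "of", m.size)         # 60 of 196 query-key pairs allowed
```

Because the allowed pairs grow linearly with the sequence length (a constant number of edges per node), masked attention over this graph costs O(L) rather than the O(L²) of full attention.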

The resulting model, Pyraformer, consists of three key components: (A) the PAM that builds a hierarchical graph of attention; (B) a Coarse‑Scale Construction Module (CSCM) that aggregates fine‑scale nodes into coarser representations using a bottleneck convolution; and (C) a prediction module, either a simple linear head or a two‑layer attention decoder, both of which benefit from the rich multi‑resolution features.
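To make the CSCM idea concrete, here is a minimal NumPy sketch of the aggregation step, under assumed shapes and with random weights standing in for learned parameters (the paper uses learned bottleneck convolutions; the function name and dimensions here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_scale_construct(x, factor=2, d_bottleneck=4):
    """CSCM-style aggregation sketch: project features down to a
    bottleneck dimension, merge every `factor` consecutive steps with
    a strided linear map over each window (a stand-in for the strided
    convolution), then project back to the model dimension."""
    T, d = x.shape
    W_down = rng.normal(size=(d, d_bottleneck)) / np.sqrt(d)
    W_conv = rng.normal(size=(factor * d_bottleneck, d_bottleneck)) / np.sqrt(factor * d_bottleneck)
    W_up = rng.normal(size=(d_bottleneck, d)) / np.sqrt(d_bottleneck)

    h = x @ W_down                                              # (T, d_bottleneck)
    h = h[: T - T % factor].reshape(-1, factor * d_bottleneck)  # non-overlapping windows
    h = h @ W_conv                                              # (T // factor, d_bottleneck)
    return h @ W_up                                             # coarse nodes, model dim

fine = rng.normal(size=(96, 16))      # e.g. 96 hourly steps, model dim 16
scales = [fine]
for _ in range(2):                    # build two coarser scales
    scales.append(coarse_scale_construct(scales[-1]))
print([s.shape for s in scales])      # [(96, 16), (48, 16), (24, 16)]
```

The bottleneck keeps the per-window projection cheap: aggregation happens in a low-dimensional space, so building all coarser scales adds little to the overall linear cost.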

Experiments on four real‑world datasets (Wind, Ant App Flow, Electricity, and ETT) show that Pyraformer consistently outperforms Transformer, Longformer, Reformer, ETC, and Informer on both single‑step and long‑range forecasting, as measured by NRMSE and ND, while requiring far fewer query‑key (Q‑K) dot products, less compute time, and lower memory consumption.
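For reference, the two reported metrics can be computed as follows. This is a minimal sketch using one common definition of each (the paper's exact normalization may differ):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized RMSE: root-mean-square error divided by the mean
    absolute target value (one common normalization)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(np.abs(y_true))

def nd(y_true, y_pred):
    """Normalized Deviation: sum of absolute errors divided by the
    sum of absolute target values."""
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

y = np.array([10.0, 12.0, 9.0, 11.0])
p = np.array([11.0, 11.0, 10.0, 10.0])
print(round(nd(y, p), 4))     # 0.0952 (= 4 / 42)
```

Both metrics are scale-free, which is why they allow fair comparison across datasets with very different magnitudes, such as electricity load versus app traffic.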

In the context of Ant’s “green computing” initiative, Pyraformer enables predictive auto‑scaling of containerized micro‑services, improving CPU utilization by 5‑10% and saving approximately 30,000 cores per day, thereby reducing energy consumption.

The paper was accepted as an oral presentation at ICLR 2022, and the code has been open‑sourced on GitHub (https://github.com/alipay/Pyraformer). References to related works on attention mechanisms and time‑series models are provided.

Tags: transformer, time series forecasting, low complexity, Pyraformer, pyramidal attention, long-range dependencies
Written by

AntTech

Technology is the core driver of Ant's future.
