
Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting

The paper introduces Pyraformer, a low‑complexity pyramidal‑attention Transformer that captures multi‑scale temporal dependencies with linear time‑space complexity, achieving superior single‑step and long‑range forecasting performance on real‑world datasets while supporting green‑computing capacity management.

Time‑series forecasting is crucial for risk management, resource allocation, and green computing, yet existing models struggle to capture multi‑scale dependencies (daily, weekly, monthly) without incurring high computational cost.

To address this, Ant Group and Shanghai Jiao Tong University propose the Pyramid Attention Module (PAM). Its inter‑scale edges summarize the series at progressively coarser resolutions, while its intra‑scale edges model dependencies among neighboring nodes at the same resolution. The resulting attention graph achieves linear time and space complexity in the sequence length while keeping the maximum signal‑propagation path between any two positions at O(1).
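The pyramidal graph can be illustrated with a small attention mask. The sketch below is illustrative only (the function name, defaults, and neighborhood size are assumptions, not the paper's exact hyper‑parameters): each node attends to nearby nodes at its own scale (intra‑scale edges) and to its parent and children at adjacent scales (inter‑scale edges).

```python
import numpy as np

def pyramid_attention_mask(seq_len, levels=3, factor=2, neighbors=1):
    """Boolean attention mask for a pyramidal graph (illustrative sketch).

    The finest scale has seq_len nodes; each coarser scale shrinks by
    `factor`. A node may attend to (a) intra-scale neighbors within
    `neighbors` hops and (b) its parent/children across adjacent scales.
    """
    sizes = [seq_len // factor**s for s in range(levels)]   # nodes per scale
    offsets = np.cumsum([0] + sizes[:-1])                   # start index of each scale
    total = sum(sizes)
    mask = np.zeros((total, total), dtype=bool)

    for s, (n, off) in enumerate(zip(sizes, offsets)):
        for i in range(n):
            u = off + i
            # intra-scale edges: nearby nodes at the same resolution
            for j in range(max(0, i - neighbors), min(n, i + neighbors + 1)):
                mask[u, off + j] = True
            # inter-scale edge to the parent node at the coarser scale
            if s + 1 < levels:
                parent = offsets[s + 1] + i // factor
                mask[u, parent] = mask[parent, u] = True
    return mask

m = pyramid_attention_mask(8, levels=3, factor=2)
print(m.shape)                       # (14, 14): 8 + 4 + 2 pyramid nodes
print(m.sum(), "of", m.size)         # 60 of 196 query-key pairs allowed
```

Because the allowed pairs grow linearly with the sequence length (a constant number of edges per node), masked attention over this graph costs O(L) rather than the O(L²) of full attention.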

The resulting model, Pyraformer, consists of three key components: (A) the PAM that builds a hierarchical graph of attention; (B) a Coarse‑Scale Construction Module (CSCM) that aggregates fine‑scale nodes into coarser representations using a bottleneck convolution; and (C) a prediction module, either a simple linear head or a two‑layer attention decoder, both of which benefit from the rich multi‑resolution features.
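To make the CSCM idea concrete, here is a minimal NumPy sketch of the aggregation step, under assumed shapes and with random weights standing in for learned parameters (the paper uses learned bottleneck convolutions; the function name and dimensions here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def coarse_scale_construct(x, factor=2, d_bottleneck=4):
    """CSCM-style aggregation sketch: project features down to a
    bottleneck dimension, merge every `factor` consecutive steps with
    a strided linear map over each window (a stand-in for the strided
    convolution), then project back to the model dimension."""
    T, d = x.shape
    W_down = rng.normal(size=(d, d_bottleneck)) / np.sqrt(d)
    W_conv = rng.normal(size=(factor * d_bottleneck, d_bottleneck)) / np.sqrt(factor * d_bottleneck)
    W_up = rng.normal(size=(d_bottleneck, d)) / np.sqrt(d_bottleneck)

    h = x @ W_down                                              # (T, d_bottleneck)
    h = h[: T - T % factor].reshape(-1, factor * d_bottleneck)  # non-overlapping windows
    h = h @ W_conv                                              # (T // factor, d_bottleneck)
    return h @ W_up                                             # coarse nodes, model dim

fine = rng.normal(size=(96, 16))      # e.g. 96 hourly steps, model dim 16
scales = [fine]
for _ in range(2):                    # build two coarser scales
    scales.append(coarse_scale_construct(scales[-1]))
print([s.shape for s in scales])      # [(96, 16), (48, 16), (24, 16)]
```

The bottleneck keeps the per-window projection cheap: aggregation happens in a low-dimensional space, so building all coarser scales adds little to the overall linear cost.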

Experiments on four real‑world datasets (Wind, Ant App Flow, Electricity, and ETT) show that Pyraformer consistently outperforms Transformer, Longformer, Reformer, ETC, and Informer on both single‑step and long‑range forecasting, as measured by NRMSE and ND, while requiring far fewer query‑key (Q‑K) dot products, less compute time, and lower memory consumption.
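For reference, the two reported metrics can be computed as follows. This is a minimal sketch using one common definition of each (the paper's exact normalization may differ):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Normalized RMSE: root-mean-square error divided by the mean
    absolute target value (one common normalization)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.mean(np.abs(y_true))

def nd(y_true, y_pred):
    """Normalized Deviation: sum of absolute errors divided by the
    sum of absolute target values."""
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

y = np.array([10.0, 12.0, 9.0, 11.0])
p = np.array([11.0, 11.0, 10.0, 10.0])
print(round(nd(y, p), 4))     # 0.0952 (= 4 / 42)
```

Both metrics are scale-free, which is why they allow fair comparison across datasets with very different magnitudes, such as electricity load versus app traffic.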

In the context of Ant’s “green computing” initiative, Pyraformer enables predictive auto‑scaling of containerized micro‑services, improving CPU utilization by 5‑10% and saving approximately 30,000 cores per day, thereby reducing energy consumption.

The paper was accepted as an oral presentation at ICLR 2022, and the code has been open‑sourced on GitHub (https://github.com/alipay/Pyraformer). References to related works on attention mechanisms and time‑series models are provided.

Tags: transformer, time series forecasting, low complexity, Pyraformer, pyramidal attention, long-range dependencies
Written by

AntTech

Technology is the core driver of Ant's future.
