Domestic Supercomputer Hits 2.16 EFLOP/s, Enabling 10,000× Remote‑Sensing Compression
The D2AR generative compression framework combines historical Earth‑observation priors with a dual‑decoupled asymmetric design. Trained on a domestic Armv9 CPU supercomputer, it scales to 20,480 nodes, reaches 2.16 EFLOP/s at peak, and delivers up to 10,000× data reduction while preserving scientific utility.
Researchers from Tsinghua University, Sun Yat‑sen University, NUS and the Shenzhen Supercomputing Center present D2AR, a generative compression framework for global Earth‑observation data, described in the paper "Transforming the Use of Earth Observation Data: Exascale Training of a Generative Compression Model with Historical Priors for up to 10,000x Data Reduction" (arXiv:2605.08633).
The core idea shifts compression from removing per‑image redundancy to modeling long‑term spatial, temporal and spectral regularities with historical priors. The name D2AR stands for Dual‑Decoupled Asymmetric Compression and Reconstruction: the front‑end encoder extracts a minimal set of control tokens from multi‑source remote‑sensing streams, while the back‑end generative model, conditioned on geographic location and observation time, reconstructs the data in a unified multispectral latent space using an EQ‑VAE backbone and Flow Matching.
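To make the reconstruction side more concrete, here is a minimal Flow Matching sketch. It assumes a linear interpolation path between noise and data (a common Flow Matching choice; the paper's exact path is not stated here) and replaces D2AR's learned velocity network, which is conditioned on control tokens, location and time, with the closed-form velocity for a hypothetical one-dimensional Gaussian prior (`mu`, `sigma`) standing in for a historical prior at one location and date. Real D2AR works on EQ‑VAE latents rather than raw values, and `velocity`, `sample` and `mean_and_std` are illustrative names, not the authors' code.

```python
import random
import statistics


def velocity(x, t, mu, sigma):
    """Closed-form Flow Matching velocity for a 1-D Gaussian target.

    Path: x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, 1) and x1 ~ N(mu, sigma^2).
    The optimal velocity is E[x1 - x0 | x_t = x]; a trained network would
    approximate this field instead of computing it analytically.
    """
    var_t = (1.0 - t) ** 2 + (t * sigma) ** 2  # Var[x_t]
    cov = t * sigma ** 2 - (1.0 - t)            # Cov[x1 - x0, x_t]
    return mu + cov / var_t * (x - t * mu)


def sample(mu, sigma, n_samples=5000, n_steps=200, seed=0):
    """Integrate dx/dt = velocity(x, t) from noise (t=0) to data (t=1) with Euler steps."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # x0 ~ N(0, 1)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        xs = [x + dt * velocity(x, t, mu, sigma) for x in xs]
    return xs


def mean_and_std(xs):
    """Summary statistics of a list of samples."""
    return statistics.fmean(xs), statistics.pstdev(xs)
```

Calling `mean_and_std(sample(mu=0.3, sigma=0.2))` yields a mean close to 0.3 and a spread close to 0.2: the ODE transports plain noise onto the prior. In a trained system the velocity would come from a network rather than a formula.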
To train the model, the team built a software stack on the LingSheng Armv9 CPU supercomputer (an exascale‑class domestic system with more than 2 EFLOP/s of FP64 performance). Optimizations targeted the hierarchical memory system, NUMA awareness, matrix‑extension kernels, and a layered parallel‑communication strategy. Communication‑computation overlap and runtime scheduling reduced synchronization stalls, enabling efficient large‑scale training.
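Why communication‑computation overlap matters can be seen with a simplified timing model. The sketch below assumes gradients are grouped into buckets that become ready one after another during the backward pass, and that each bucket's all-reduce can start once its gradients are ready and the network link is free (the bucketed scheme used by, for example, PyTorch DDP). It is not the authors' scheduler; `step_time` and the millisecond figures are hypothetical.

```python
def step_time(compute_ms, comm_ms, overlap):
    """Estimate backward-pass + gradient all-reduce time for one step (toy model).

    compute_ms[i]: backward compute time that produces gradient bucket i.
    comm_ms[i]:    all-reduce time for bucket i over a single shared link.
    Buckets are listed in the order their gradients become ready.
    """
    if not overlap:
        # Synchronous schedule: finish all compute, then communicate everything.
        return sum(compute_ms) + sum(comm_ms)
    compute_done = 0.0  # time at which the current bucket's gradients are ready
    link_free = 0.0     # time at which the link finishes the previous all-reduce
    for c, m in zip(compute_ms, comm_ms):
        compute_done += c
        start = max(compute_done, link_free)  # need the data and a free link
        link_free = start + m
    return max(compute_done, link_free)
```

With four buckets of 4 ms compute and 3 ms all-reduce each, the synchronous schedule takes 28 ms and the overlapped one 19 ms, because only the last bucket's all-reduce stays exposed. When communication dominates (1 ms compute and 5 ms all-reduce per bucket), overlap only trims 18 ms to 16 ms, so reducing communication cost itself also matters at scale.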
Performance results show that a single node equipped with the Armv9 LX2 and SME (Scalable Matrix Extension) matches the throughput of an NVIDIA A100 GPU and outperforms an Intel Xeon 8558P (AMX). In weak‑scaling experiments across 20,480 nodes, D2AR‑rec‑6B sustained 1.54 EFLOP/s (BFloat16) and reached a peak of more than 2.16 EFLOP/s [1], indicating good scalability.
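A quick back-of-the-envelope check spreads the reported whole-machine figures evenly over the 20,480 nodes. This assumes every node contributes equally. The per-node peak is a lower bound, since the peak is reported as more than 2.16 EFLOP/s, and a true weak-scaling efficiency would also need the single-node baseline, which is not given here.

```python
NODES = 20_480
SUSTAINED_EFLOPS = 1.54  # reported sustained BF16 throughput
PEAK_EFLOPS = 2.16       # reported peak (stated as "more than" this value)


def per_node_tflops(eflops, nodes):
    """Convert a whole-machine EFLOP/s figure into an even per-node TFLOP/s share."""
    return eflops * 1e18 / nodes / 1e12


sustained = per_node_tflops(SUSTAINED_EFLOPS, NODES)
peak = per_node_tflops(PEAK_EFLOPS, NODES)
print(f"sustained: {sustained:.1f} TFLOP/s per node")           # 75.2
print(f"peak:      {peak:.1f} TFLOP/s per node")                # 105.5
print(f"sustained/peak: {SUSTAINED_EFLOPS / PEAK_EFLOPS:.1%}")  # 71.3%
```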
From an application perspective, D2AR does not merely shrink storage; it creates a callable generative prior that can reconstruct scientifically valuable information on demand at extreme compression ratios (up to 10,000×). Experiments report improved perceptual quality, structural similarity, and NDVI metrics, and downstream land‑cover classification retains high task utility, confirming that the compressed representations remain useful for analysis.
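NDVI is the normalized difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red). One simple way to check whether a reconstruction preserves it is to compare per-pixel NDVI before and after compression. The sketch below uses mean absolute NDVI error for that check; the article does not say which NDVI metric the authors report, so the metric, the function names and the reflectance values are assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total  # treat all-zero pixels as 0


def ndvi_mae(orig_nir, orig_red, rec_nir, rec_red):
    """Mean absolute NDVI difference between original and reconstructed pixels."""
    errors = [
        abs(ndvi(o_nir, o_red) - ndvi(r_nir, r_red))
        for o_nir, o_red, r_nir, r_red in zip(orig_nir, orig_red, rec_nir, rec_red)
    ]
    return sum(errors) / len(errors)
```

For two hypothetical pixels, `ndvi_mae([0.5, 0.3], [0.1, 0.1], [0.45, 0.3], [0.1, 0.12])` returns about 0.051; a lower value means the reconstruction keeps the vegetation signal closer to the original.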
This work validates the capability of domestic supercomputers to support AI‑for‑Science workloads, showing that coordinated algorithm, model and system‑software design can enable both traditional scientific computing and massive generative AI training. The results suggest a new paradigm for Earth‑observation data handling where storage efficiency and analytical value are jointly optimized.
[1] Peak training performance is calculated from the full forward‑and‑backward model FLOP count and measured wall‑time, including runtime scheduling and kernel launch overhead; sustained performance further accounts for data loading, communication synchronization and optimizer updates.
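The accounting in note [1] can be sketched as follows: both figures divide the same model FLOP count by a measured wall time, and the sustained figure simply uses a longer window. The FLOP estimate assumes the common dense-transformer rule of thumb of roughly 6 FLOPs per parameter per token (about 2 forward, 4 backward); whether the authors counted FLOPs this way is not stated. The parameter count (read from the "6B" in D2AR‑rec‑6B), the tokens per step and the timings are hypothetical.

```python
def train_flops_per_step(n_params, tokens_per_step):
    """Rule-of-thumb training FLOPs for a dense transformer: ~6 * N per token
    (about 2N for the forward pass and 4N for the backward pass)."""
    return 6 * n_params * tokens_per_step


def achieved_flops(flops, wall_time_s):
    """Achieved FLOP/s = model FLOPs / measured wall time."""
    return flops / wall_time_s


# Hypothetical values for illustration only.
flops = train_flops_per_step(n_params=6e9, tokens_per_step=4_000_000)
kernel_window_s = 1.0    # compute kernels + runtime scheduling + launch overhead
extra_sustained_s = 0.4  # data loading, communication sync, optimizer update

peak_flops = achieved_flops(flops, kernel_window_s)
sustained_flops = achieved_flops(flops, kernel_window_s + extra_sustained_s)
print(f"peak: {peak_flops:.3e} FLOP/s, sustained: {sustained_flops:.3e} FLOP/s")
```

With these made-up timings the sustained figure comes out about 29% below the peak, purely because the extra 0.4 s enters the denominator.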