Understanding FLOPS, Benchmarks, and AI Compute Performance
This article explains the concept of FLOPS and its measurement units, surveys common benchmarks such as Linpack and MLPerf, discusses why traditional HPC benchmarks may not suit AI workloads, and gives an overview of hardware performance figures, from GFLOPS to PFLOPS, across modern processors and supercomputers.
FLOPS (floating‑point operations per second) is a metric used to estimate a computer's performance, especially for scientific calculations that involve many floating‑point operations; it essentially measures the speed of a processor's floating‑point unit (FPU) and is often benchmarked with programs like Linpack.
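The idea can be illustrated with a minimal, admittedly rough sketch: time a dense matrix multiply, count its floating-point operations, and divide. The function name and matrix size below are illustrative choices, not anything from the article.

```python
import time
import numpy as np

def measure_gflops(n=512):
    """Rough estimate of achieved FLOPS by timing one dense matmul.

    An n x n matrix multiply performs about 2*n**3 floating-point
    operations (n multiplies and ~n adds per output element).
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    _ = a @ b
    elapsed = time.perf_counter() - start
    return (2 * n**3) / elapsed / 1e9  # GFLOPS

print(f"~{measure_gflops():.1f} GFLOPS (single run, rough estimate)")
```

A single timed run like this fluctuates with caching and thread startup; real benchmarks such as Linpack repeat the work and use much larger problem sizes.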
The article cites the BM1684 AI chip as an example, noting its FP32 performance of 2.2 TFLOPS, INT8 performance up to 35.2 TOPS, and its integrated TPU module containing 1,024 execution units (EUs).
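As a quick sanity check on those figures, the INT8-to-FP32 throughput ratio works out to exactly 16×, a reminder that TOPS and TFLOPS quote different arithmetic and are not directly comparable:

```python
# Peak figures for the BM1684 as quoted in the article.
fp32_tflops = 2.2   # FP32, trillions of floating-point ops/sec
int8_tops = 35.2    # INT8, trillions of integer ops/sec

ratio = int8_tops / fp32_tflops
print(ratio)  # → 16.0: INT8 path delivers 16x the FP32 rate
```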
It defines the common FLOPS scales: MFLOPS (10⁶), GFLOPS (10⁹), TFLOPS (10¹²), and PFLOPS (10¹⁵), and explains how Linpack evaluates high‑performance computers by solving dense linear systems using Gaussian elimination, with results expressed in FLOPS.
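A toy Linpack-style run can tie the scales and the benchmark together. The sketch below solves a dense random system with NumPy (which calls an LU-based LAPACK solver, i.e., Gaussian elimination) and reports the achieved rate using the standard HPL operation count of 2/3·n³ + 2·n²; the helper names are my own.

```python
import time
import numpy as np

# The scales defined above, as multipliers on plain FLOPS.
UNITS = {"MFLOPS": 1e6, "GFLOPS": 1e9, "TFLOPS": 1e12, "PFLOPS": 1e15}

def mini_linpack(n=1000):
    """Solve a dense n x n system Ax = b and return achieved FLOPS,
    using the HPL count 2/3*n^3 + 2*n^2 (LU factorization + solves)."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random(n)
    start = time.perf_counter()
    np.linalg.solve(a, b)  # LAPACK LU-based solve under the hood
    elapsed = time.perf_counter() - start
    ops = (2 / 3) * n**3 + 2 * n**2
    return ops / elapsed

rate = mini_linpack()
print(f"{rate / UNITS['GFLOPS']:.2f} GFLOPS achieved")
```

Real HPL runs tune the problem size to fill memory and run for hours; this sketch only shows where the numbers come from.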
Because AI workloads typically use single‑precision, half‑precision, or INT8 arithmetic rather than double‑precision, the article argues that Linpack is not an ideal benchmark for AI; instead, AI‑focused suites such as MLPerf, MobileAI Bench, DeepBench, and HPL‑AI are discussed, highlighting scalability challenges at large accelerator counts.
MLPerf is described as a widely accepted AI benchmark covering training and inference tasks (e.g., ResNet‑50, BERT, SSD), with results depending on storage, CPU, memory‑to‑GPU bandwidth, and GPU compute; the article notes that the slowest of these components becomes the bottleneck for the whole system.
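That bottleneck behavior can be sketched as a min over concurrent pipeline stages; the stage names and rates below are hypothetical illustrations, not MLPerf measurements.

```python
def pipeline_throughput(stage_rates):
    """End-to-end samples/sec of a training pipeline.

    Storage reads, CPU preprocessing, host-to-GPU transfer, and GPU
    compute run concurrently, so the slowest stage caps the whole run.
    """
    return min(stage_rates.values())

# Hypothetical per-stage rates in samples/sec.
stages = {
    "storage_read": 12000,
    "cpu_preprocess": 9000,
    "pcie_transfer": 15000,
    "gpu_compute": 10000,
}
print(pipeline_throughput(stages))  # → 9000; CPU preprocessing limits it
```

This is why a faster GPU alone may not improve an MLPerf score: if preprocessing or data movement is the limiting stage, extra GPU compute sits idle.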
Extensive tables list representative hardware FLOPS figures, from early CPUs (Intel Xeon, under 1.8 GFLOPS) to modern GPUs (NVIDIA GTX 1080 Ti, 10.8 TFLOPS) and supercomputers (IBM Roadrunner, 1.026 PFLOPS; China's Sunway TaihuLight, 125 PFLOPS peak), illustrating the rapid growth of compute capability.
The piece also surveys current and upcoming high‑performance systems: Intel’s Aurora (>1 EFLOPS), the US exascale Frontier, Japan’s Fugaku (415.5 PFLOPS Linpack), and Europe’s EuroHPC project targeting multi‑petaflop installations, emphasizing the global race toward exascale and post‑exascale computing.
Architects' Tech Alliance
Sharing project experiences and insights into cutting-edge architectures, with a focus on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, and industry practices and solutions.