
Understanding FLOPS, Benchmarks, and AI Compute Performance

This article explains the concept of FLOPS and its measurement units, introduces common benchmarks such as Linpack and MLPerf, discusses why traditional HPC benchmarks may not suit AI workloads, and surveys hardware performance figures from GFLOPS to PFLOPS across modern processors and supercomputers.

Architects' Tech Alliance

FLOPS (floating‑point operations per second) is a metric used to estimate a computer's performance, especially for scientific workloads dominated by floating‑point arithmetic. It essentially measures the speed of the processor's floating‑point unit (FPU) and is commonly benchmarked with programs such as Linpack.
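To make the metric concrete, a common back-of-the-envelope technique is to time a dense matrix multiply and divide the known operation count by the elapsed time. The sketch below (using NumPy, which delegates to a BLAS library) estimates achieved, not peak, FLOPS; the matrix size `n` is an arbitrary choice:

```python
import time
import numpy as np

def measure_gflops(n=2048, dtype=np.float32):
    """Estimate achieved FLOPS by timing a dense matrix multiply.

    An n x n matmul performs roughly 2*n^3 floating-point operations
    (one multiply and one add per inner-loop step).
    """
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up, so BLAS thread setup is not timed
    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start
    return 2 * n**3 / elapsed / 1e9  # GFLOPS

print(f"~{measure_gflops():.1f} GFLOPS (single precision)")
```

Real benchmarks such as Linpack are far more careful (problem sizing, repeated runs, residual checks), but the core idea is the same: known operation count divided by measured time.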

The article cites the BM1684 AI chip as an example, noting its FP32 performance of 2.2 TFLOPS, INT8 performance up to 35.2 TOPS, and its integrated TPU module containing 1,024 execution units (EUs).
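Vendor TOPS figures of this kind are typically derived from the unit count, the MACs each unit performs per cycle, and the clock frequency. The sketch below shows the standard formula; the MACs-per-EU and clock values are hypothetical placeholders, not BM1684 specifications (the vendor datasheet is the authoritative source):

```python
def peak_tops(num_eus, macs_per_eu_per_cycle, clock_ghz):
    """Peak TOPS = EUs x MACs/EU/cycle x 2 ops per MAC x clock.

    A multiply-accumulate counts as 2 operations (one multiply, one add).
    """
    ops_per_cycle = num_eus * macs_per_eu_per_cycle * 2
    return ops_per_cycle * clock_ghz * 1e9 / 1e12

# hypothetical parameter values, for illustration only
print(f"{peak_tops(1024, 16, 0.55):.1f} TOPS")
```

This also explains why INT8 TOPS figures exceed FP32 TFLOPS on the same silicon: narrower operands let each unit complete more operations per cycle.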

It defines the common FLOPS scales: MFLOPS (10⁶), GFLOPS (10⁹), TFLOPS (10¹²), and PFLOPS (10¹⁵), and explains how Linpack evaluates high‑performance computers by solving dense linear systems using Gaussian elimination, with results expressed in FLOPS.
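The scales listed above can be captured in a small helper that renders a raw FLOPS value in its largest applicable unit (the `EFLOPS` entry is included for the exascale systems discussed later):

```python
UNITS = [("FLOPS", 1e0), ("MFLOPS", 1e6), ("GFLOPS", 1e9),
         ("TFLOPS", 1e12), ("PFLOPS", 1e15), ("EFLOPS", 1e18)]

def humanize_flops(flops):
    """Render a raw FLOPS value in the largest unit it reaches."""
    name, scale = UNITS[0]
    for n, s in UNITS:
        if flops >= s:
            name, scale = n, s
    return f"{flops / scale:.3g} {name}"

print(humanize_flops(2.2e12))    # -> 2.2 TFLOPS  (BM1684 FP32)
print(humanize_flops(1.026e15))  # -> 1.03 PFLOPS (IBM Roadrunner)
```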

Because AI workloads typically use single‑precision, half‑precision, or INT8 arithmetic rather than double‑precision, the article argues that Linpack is not an ideal benchmark for AI; instead, AI‑focused suites such as MLPerf, MobileAI Bench, DeepBench, and HPL‑AI are discussed, highlighting scalability challenges at large accelerator counts.
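The precision gap is easy to observe directly: the same matrix multiply timed at double and single precision usually shows a clear throughput advantage for FP32 on most hardware (the exact ratio depends on the CPU's vector units and the BLAS build, so treat the numbers as illustrative):

```python
import time
import numpy as np

def matmul_gflops(dtype, n=1024):
    """Time an n x n matmul at the given precision, return GFLOPS."""
    a = np.random.rand(n, n).astype(dtype)
    b = np.random.rand(n, n).astype(dtype)
    a @ b  # warm-up
    t0 = time.perf_counter()
    a @ b
    return 2 * n**3 / (time.perf_counter() - t0) / 1e9

for dtype in (np.float64, np.float32):
    print(f"{np.dtype(dtype).name}: ~{matmul_gflops(dtype):.1f} GFLOPS")
```

Linpack reports only the double-precision figure, which is why a chip's Linpack score can badly understate its usefulness for INT8 or FP16 inference workloads.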

MLPerf is described as a widely accepted AI benchmark covering training and inference tasks (e.g., ResNet‑50, BERT, SSD), with end‑to‑end performance depending on storage, CPU, memory‑to‑GPU bandwidth, and GPU compute; the article notes that any single slow component becomes the bottleneck for overall throughput.
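The bottleneck argument can be reduced to a one-liner: in a serial pipeline, end-to-end throughput is the minimum over the stage throughputs. The stage names and rates below are illustrative placeholders, not MLPerf measurements:

```python
def pipeline_throughput(stage_rates):
    """End-to-end rate of a serial pipeline is set by its slowest stage."""
    return min(stage_rates.values())

stages = {  # illustrative samples/s, not measured values
    "storage_read": 12000,
    "cpu_preprocess": 9000,
    "host_to_gpu_copy": 15000,
    "gpu_compute": 20000,
}
bottleneck = min(stages, key=stages.get)
print(f"end-to-end: {pipeline_throughput(stages)} samples/s, "
      f"bottleneck: {bottleneck}")
# -> end-to-end: 9000 samples/s, bottleneck: cpu_preprocess
```

In this sketch the GPU's 20,000 samples/s is irrelevant: the pipeline runs at 9,000 samples/s because preprocessing cannot feed it faster, which is exactly why MLPerf scores reflect the whole system rather than the accelerator alone.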

Extensive tables list representative hardware FLOPS figures: from early CPUs (Intel Xeon <1.8 GFLOPS) to modern GPUs (NVIDIA GTX 1080Ti 10.8 TFLOPS) and supercomputers (IBM Roadrunner 1.026 PFLOPS, China’s Sunway 125 PFLOPS), illustrating the rapid growth of compute capability.
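A quick ratio of the figures quoted above makes the growth tangible:

```python
early_xeon = 1.8e9    # <1.8 GFLOPS (early Intel Xeon, per the table)
gtx_1080ti = 10.8e12  # 10.8 TFLOPS
sunway = 125e15       # 125 PFLOPS (Sunway system)

print(f"GPU vs early CPU:            {gtx_1080ti / early_xeon:,.0f}x")
print(f"Supercomputer vs early CPU:  {sunway / early_xeon:.2e}x")
# -> GPU vs early CPU:            6,000x
# -> Supercomputer vs early CPU:  6.94e+07x
```

A single consumer GPU thus delivers roughly 6,000 times the FP32 throughput of that early server CPU, and the Sunway system nearly seventy million times.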

The piece also surveys current and upcoming high‑performance systems: Intel’s Aurora (>1 EFLOPS), the US exascale Frontier, Japan’s Fugaku (415.5 PFLOPS Linpack), and Europe’s EuroHPC project targeting multi‑petaflop installations, emphasizing the global race toward exascale and post‑exascale computing.

Performance · benchmark · HPC · AI compute · FLOPS · MLPerf
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
