
Why RISC‑V’s Integer ISA Omits a Multiply‑Accumulate Instruction – A Microarchitectural Guess

This article speculates on why the RISC‑V integer ISA does not define a multiply‑accumulate (MAC) instruction. It examines how such an instruction would affect integer‑unit microarchitecture, decoder width, rename complexity, and out‑of‑order window size, and compares these trade‑offs with ARM and Apple M1 designs.

Linux Code Review Hub

Aug 4, 2024


In this brief post I speculate about one small detail of the RISC‑V ISA: the scalar integer instruction set defines no multiply‑accumulate (MAC) instruction. Why might that be?

The floating‑point extensions (F and D) do include fused multiply‑add instructions (FMADD, FMSUB, FNMADD, FNMSUB), and the vector (V) extension defines integer and floating‑point multiply‑add operations as well, since vector code is expected to use them heavily. The scalar integer instructions, including the M extension's MUL/DIV family, have no such operation, probably because integer code rarely needs a MAC with three source operands.
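To make the difference concrete, here is a minimal C sketch of an integer dot product, the textbook MAC workload. The comments describe typical compiler output and are an assumption about code generation (register names are illustrative), not something measured for this article: on AArch64 the loop body can use a single `madd`, which reads three registers, while on RV64 with the M extension it becomes a `mul` followed by an `add`, each reading only two.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer dot product: the canonical multiply-accumulate loop.
 * Typical loop-body code generation (assumed; compiler- and flag-dependent):
 *   AArch64: madd x0, x3, x4, x0   -> one instruction, three source registers
 *   RV64IM:  mul  t0, a3, a4       -> two instructions, two sources each
 *            add  a0, a0, t0
 */
int64_t dot_i64(const int64_t *a, const int64_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];  /* multiply, then accumulate into acc */
    }
    return acc;
}
```

The source is identical on both targets; only the instruction count and the number of register reads per instruction differ.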

This is unlikely to be a mere oversight: adding an integer MAC would change the microarchitecture of the integer unit. To illustrate the impact, I compare against current high‑performance cores that can challenge Intel and AMD, namely Apple's M1 and ARM's Neoverse V and Cortex‑X series. These designs move toward the "kilo‑instruction processor" idea, with out‑of‑order (OoO) windows on the order of a thousand instructions, in pursuit of aggressive speculative execution.
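Why a window that large? A common back‑of‑the‑envelope argument uses Little's law: the number of instructions that must be in flight is roughly the sustained throughput multiplied by the latency the core is trying to hide, such as a cache miss to DRAM. The numbers below (4 instructions per cycle, a 250‑cycle miss) are illustrative assumptions, not figures for M1 or any ARM core.

```latex
N_{\text{window}} \approx \mathrm{IPC} \times L_{\text{hide}}
  = 4\ \tfrac{\text{instructions}}{\text{cycle}} \times 250\ \text{cycles}
  = 1000\ \text{instructions}
```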

Filling such a large OoO window requires a wide decoder and rename stage, yet the pipeline cannot grow much deeper, because the branch misprediction penalty increases with pipeline depth. This tension bears directly on whether a three‑source MAC instruction is worth its cost.
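A first‑order model makes the depth side of this trade‑off concrete: cycles lost per instruction are roughly branch frequency × misprediction rate × refill penalty, and the refill penalty grows with the number of stages that must be refilled. The function below is a hypothetical sketch of that model, and the parameter values in the note that follows are assumptions chosen for illustration.

```c
/* First-order cost of branch mispredictions, in cycles per instruction (CPI).
 * branch_frac:  fraction of instructions that are branches (e.g. 0.2)
 * mispred_rate: fraction of branches that are mispredicted (e.g. 0.02)
 * penalty:      cycles to refill the pipeline after a mispredict;
 *               grows roughly with pipeline depth
 */
double mispredict_cpi(double branch_frac, double mispred_rate, double penalty) {
    return branch_frac * mispred_rate * penalty;
}
```

For example, with 20% branches and a 2% misprediction rate, a 12‑cycle penalty adds 0.048 CPI and a 20‑cycle penalty adds 0.08 CPI. On a core that would otherwise retire 8 instructions per cycle (CPI 0.125), that difference is a large share of the total, which is why wide cores try to avoid adding front‑end stages.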

AMD's Zen can dispatch up to eight micro‑ops (uops) per cycle, while ARM's Cortex‑X4 can dispatch up to ten macro‑operations (up to twenty uops) per cycle. Because AArch64 does include three‑source integer instructions such as MADD (Xd = Xa + Xn × Xm), ARM must work out, in the stage before rename, which source fields each instruction actually uses and check dependencies within the decode group, so that only genuinely valid source operands consume rename resources.
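The pre‑rename filtering described above can be sketched as follows. This is a hypothetical model, not ARM's actual logic: each decoded uop carries up to three source fields with valid bits, and only valid sources count against a fixed budget of register‑alias‑table (RAT) read ports per cycle. The struct, the port budget, and the function names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRCS 3  /* e.g. an AArch64 MADD reads three registers */

/* Hypothetical decoded uop: up to three architectural source registers. */
typedef struct {
    unsigned src[MAX_SRCS];
    bool     src_valid[MAX_SRCS];  /* false when the field is unused */
} uop_t;

/* Count the sources that actually need a RAT lookup. */
static unsigned valid_sources(const uop_t *u) {
    unsigned n = 0;
    for (int i = 0; i < MAX_SRCS; i++)
        if (u->src_valid[i])
            n++;
    return n;
}

/* Return how many uops, in program order, can be renamed this cycle
 * without exceeding the RAT read-port budget. */
size_t uops_renamable(const uop_t *group, size_t width, unsigned read_ports) {
    unsigned used = 0;
    size_t k = 0;
    for (; k < width; k++) {
        unsigned need = valid_sources(&group[k]);
        if (used + need > read_ports)
            break;  /* remaining uops wait for the next cycle */
        used += need;
    }
    return k;
}
```

With 16 read ports and an 8‑wide group of two‑source uops, all eight rename in one cycle. If every uop were a three‑source MADD, only five would fit (15 ports), so the filter pays off precisely when three‑source instructions are rare.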

Speculatively, then, RISC‑V may have omitted an integer MAC to keep future ultra‑wide implementations simple. With an eight‑wide decoder, capping integer instructions at two source operands noticeably simplifies register renaming, especially since a third operand would be used only rarely.
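The arithmetic behind this argument is easy to sketch with a simplified textbook model that ignores the port‑sharing and bypass tricks real cores use. For a rename group of width W in which each instruction may read up to S registers, the worst case needs W × S RAT read ports, and the intra‑group dependency check compares every source against the destination of every older instruction in the same group, which takes S × W(W−1)/2 comparators. The function names are hypothetical.

```c
/* Simplified rename-cost model: a W-wide group, up to S sources per instruction. */
unsigned rat_read_ports(unsigned width, unsigned srcs) {
    return width * srcs;  /* one lookup per possible source operand */
}

unsigned dep_comparators(unsigned width, unsigned srcs) {
    if (width == 0)
        return 0;
    /* instruction i compares each of its sources with the i older destinations;
     * summing i over the group gives W(W-1)/2, which is always an integer */
    return srcs * width * (width - 1) / 2;
}
```

For W = 8, going from S = 2 to S = 3 raises the port count from 16 to 24 and the comparator count from 56 to 84, a 50% increase in both, and that hardware is paid for every cycle even if a three‑source MAC appears only occasionally.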

If this reasoning holds, it suggests that the RISC‑V ISA was designed with microarchitectural constraints in mind, balancing integer‑unit performance, vector‑unit capability, and implementation difficulty.


