Tag

ARMv86


DaTaobao Tech
Nov 18, 2022 · Artificial Intelligence

ARMv86 Instruction Set Optimization for MNN: Accelerating Int8 and BF16 Matrix Multiplication

The article explains how the SMMLA and BFMMLA matrix-multiply-accumulate instructions introduced in ARMv8.6 are integrated into MNN to accelerate INT8 and BF16 matrix multiplication. Through optimized kernels, tiling, and compatibility handling, the new kernels deliver up to a 90% speedup over the ARMv8.2 kernels built on SDOT and FP16 FMLA.

ARMv86 · MNN · Neural Network Inference
15 min read
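The summary contrasts SMMLA with the ARMv8.2 SDOT instruction. As a rough illustration, here is a minimal scalar sketch of what one instruction of each kind computes on 128-bit operands, written from the Arm architecture's documented semantics rather than from MNN's kernel code. SDOT produces four independent 4-element int8 dot products (16 multiply-accumulates), while SMMLA multiplies a 2x8 int8 matrix by an 8x2 int8 matrix into a 2x2 int32 tile (32 multiply-accumulates). The names `smmla_ref` and `sdot_ref` are hypothetical, and the model ignores 32-bit accumulator wraparound.

```python
from typing import List


def smmla_ref(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """Scalar model of SMMLA Vd.4S, Vn.16B, Vm.16B (hypothetical helper name).

    a   : 16 signed 8-bit values, a 2x8 matrix stored row-major (Vn).
    b   : 16 signed 8-bit values; row j holds COLUMN j of the 8x2
          right-hand matrix (Vm), i.e. the right operand is stored transposed.
    acc : 4 int32 accumulators, a 2x2 matrix stored row-major (Vd).
    Returns the updated accumulators: 2 * 2 * 8 = 32 multiply-accumulates.
    Note: Python ints do not wrap, so int32 overflow is not modelled.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    out = list(acc)
    for i in range(2):          # row of the left matrix
        for j in range(2):      # column of the right matrix
            out[i * 2 + j] += sum(a[i * 8 + k] * b[j * 8 + k] for k in range(8))
    return out


def sdot_ref(acc: List[int], a: List[int], b: List[int]) -> List[int]:
    """Scalar model of SDOT Vd.4S, Vn.16B, Vm.16B (hypothetical helper name).

    Each of the 4 int32 lanes accumulates a 4-element dot product of the
    corresponding 4 bytes of a and b: 4 * 4 = 16 multiply-accumulates.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    return [acc[lane] + sum(a[lane * 4 + k] * b[lane * 4 + k] for k in range(4))
            for lane in range(4)]


if __name__ == "__main__":
    ones, twos = [1] * 16, [2] * 16
    # Same register contents, twice the multiply-accumulates for SMMLA.
    print(smmla_ref([0] * 4, ones, twos))  # [16, 16, 16, 16]
    print(sdot_ref([0] * 4, ones, twos))   # [8, 8, 8, 8]
```

Because the right-hand operand of SMMLA is read as a transposed 8x2 block, a GEMM kernel built on it has to pack B so that each 8-byte half of the register holds one column; this is one reason the packing and tiling layout differs from an SDOT-based kernel. BFMMLA follows the same 2x2-output pattern with 2x4 BF16 inputs and FP32 accumulators. How much of the doubled per-instruction work reaches end-to-end GEMM speed depends on the kernel design and tiling the article covers.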