Tagged articles

DepthwiseConvolution


Nov 24, 2023 · Artificial Intelligence

Performance Optimization of Depthwise Conv Int8 on ARM CPUs

By converting the input to a C16 layout and exploiting the ARMv8.2 SDOT instruction, the INT8 depthwise-convolution operator on ARM CPUs can be accelerated from 4.46 ms to 1.75 ms, roughly a 2.5× speedup. The data-rearrangement overhead this approach requires, however, still keeps it from overtaking the FP16 implementation.
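The summary does not spell out the packing or the kernel, so the following is only a minimal sketch under stated assumptions. It shows what a C16 layout looks like for an int8 tensor: channels are grouped into blocks of 16, stored innermost, and the last block is zero-padded. It also gives a scalar reference for the vector form of SDOT, where each of the four 32-bit lanes accumulates a four-element int8 dot product. The source layout (NCHW) and the function names `pack_nchw_to_c16` and `sdot_ref` are hypothetical. This is not the article's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (assumes an NCHW int8 source): repack into C16 layout
 * (N, ceil(C/16), H, W, 16). Channels past C in the last block are zero. */
static void pack_nchw_to_c16(const int8_t *src, int8_t *dst,
                             int n, int c, int h, int w) {
    assert(n > 0 && c > 0 && h > 0 && w > 0);
    int cb = (c + 15) / 16;                        /* number of 16-channel blocks */
    memset(dst, 0, (size_t)n * cb * h * w * 16);   /* padding channels stay 0 */
    for (int in = 0; in < n; ++in)
        for (int ic = 0; ic < c; ++ic)
            for (int y = 0; y < h; ++y)
                for (int x = 0; x < w; ++x) {
                    /* flat NCHW index of the source element */
                    size_t s = (((size_t)in * c + ic) * h + y) * w + x;
                    /* block ic/16, then spatial position, then lane ic%16 */
                    size_t d = ((((size_t)in * cb + ic / 16) * h + y) * w + x) * 16
                               + (size_t)(ic % 16);
                    dst[d] = src[s];
                }
}

/* Scalar reference for the vector SDOT form (Vd.4S, Vn.16B, Vm.16B):
 * 32-bit lane i accumulates the dot product of bytes 4i..4i+3. */
static void sdot_ref(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 4; ++i) {
        int32_t sum = 0;
        for (int j = 0; j < 4; ++j)
            sum += (int32_t)a[4 * i + j] * (int32_t)b[4 * i + j];
        acc[i] += sum;
    }
}
```

On CPUs with the dot-product extension, the NEON intrinsic `vdotq_s32` performs the same per-lane accumulation in one instruction; the scalar version above is only useful for checking a vectorized kernel's results.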

Arm · DepthwiseConvolution · INT8

