Quick Guide to ARM Assembly Development: Tips, Bugs, and Performance Optimization
This quick-start guide introduces ARM assembly development, starting with simple template functions. It then exposes typical parameter-passing and register bugs, along with tricks for debugging them, and demonstrates a depthwise convolution written in assembly that runs inference roughly 4.7× faster than its C++ counterpart on a Huawei Mate40 Pro. It also covers ARM32/ARM64 register conventions, vector instructions, and floating-point handling.