Master ARM32/64 Architecture: From Instruction Sets to Performance Analysis

This intensive two‑day course covers ARM32/64 processor instruction sets, mode switching, exception vectors, system call mechanisms, memory management, atomic operations, cache synchronization, and top‑down performance analysis with perf, while also introducing M‑series MCU architectures and providing hands‑on labs for embedded Linux developers.

Deepin Linux

Course Introduction

This course focuses on ARM32/64 application processors: instruction sets, mode switching, OS exception vectors, system call implementation, memory management (MMU), atomic operations, memory barriers, multi‑core cache synchronization, and the programming techniques that go with them. It compares the differences and commonalities between ARM32 and ARM64, and also introduces M‑series MCU architectures and instruction sets. The final module teaches top‑down performance analysis, using perf to locate micro‑architectural bottlenecks.

Co‑organizer

China High‑Tech Industrialization Research Association – Intelligent Information Processing Branch.

Organizer

Beijing Zhongji Fugue Technology Co., Ltd.; Beijing Zhongji Saiwei Cultural Development Co., Ltd.

Training Dates

September 25‑26, 2025 (two days).

Location

Shanghai.

Instructor

Mr. Song is a renowned embedded systems expert who has provided Linux training for companies including Cisco, Alcatel, Lucent, Huawei, STMicroelectronics, and Fuji Xerox. He has contributed over 400 patches to the mainline Linux kernel, making him one of the most prolific Chinese contributors.

Target Audience

Software developers working on ARM32/64 or M‑series hardware, whether on bare metal, an RTOS, or Linux, who need a solid low‑level foundation. The top‑down analysis module is aimed at Linux developers. Hardware engineers can also attend to gain a better understanding of software requirements.

Course Format

Lectures combined with approximately 20 hands‑on lab exercises. Concepts, methods, and principles are taught through explanation and discussion, reinforced by individual or group classroom exercises.

Course Outline

Chapter 1 – ARM Processors

RISC vs CISC

Von Neumann and Harvard architectures

History of ARM processors

ARM processor families

ARM MMU

Chapter 2 – ARM Instruction Set (Cortex‑A series)

ARM32/ARM64 registers

Processor modes

Memory operation instructions

Coprocessor instructions

Arithmetic instructions

Synchronization instructions

Instruction set differences

ARM64 RET and ERET

ARM64 CSEL

Chapter 3 – ARM Processor Mode Switching

ARM32 exception handling and vector table

ARM64 EL0‑EL3 mode switching

ARM64 vector tables

Interrupt control in multicore

TrustZone

Virtualization extensions

ARMv8 vs ARMv9

Chapter 4 – ARM/ARM64 ABI and System Calls (usr to svc)

ARM ABI, C and assembly interoperation

System call process

Assembly‑written applications on Linux

Bare‑metal SWI example

ARM64 ABI and system calls

Chapter 5 – Memory, Cache and Pipelines

Read‑modify‑write and M‑series bit‑banding

Atomic operations: swp (swap) instruction

Atomic operations: ldrex/strex

Atomic operations: ldxr/stxr

Memory barriers: dmb, dsb, isb

Cache synchronization and false sharing in multicore
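
As a rough illustration of the atomic‑operation topics listed in this chapter, the sketch below uses portable C11 atomics. It is not taken from the course materials, and the helper names `atomic_store_max` and `atomic_increment` are hypothetical. On ARM targets, compilers typically lower operations like these to ldrex/strex (ARM32) or ldxr/stxr (ARM64) retry loops, or to single atomic instructions where the core supports them.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): atomically raise *target to at
 * least `candidate`. Returns true if this call changed the stored value. */
bool atomic_store_max(_Atomic uint32_t *target, uint32_t candidate)
{
    uint32_t observed = atomic_load_explicit(target, memory_order_relaxed);

    while (observed < candidate) {
        /* The weak compare-exchange may fail spuriously, which mirrors how a
         * store-exclusive can fail; on failure `observed` is refreshed with
         * the current value and the loop condition is checked again. */
        if (atomic_compare_exchange_weak_explicit(target, &observed, candidate,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            return true;
        }
    }
    return false; /* the stored value was already >= candidate */
}

/* Hypothetical helper (illustration only): atomic increment that returns
 * the value held before the increment. */
uint32_t atomic_increment(_Atomic uint32_t *counter)
{
    return atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}
```

Calling `atomic_store_max(&v, 9)` on a variable holding 5 updates it to 9 and returns true; a later call with 3 leaves it unchanged and returns false. The relaxed ordering on the increment is an assumption that only the counter value matters; code that publishes other data alongside the counter would need stronger ordering.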

Chapter 6 – Vector Computing: NEON, SVE, SVE2

NEON operation principles

ARM NEON intrinsics and programming examples

ARMv8 SVE

ARMv9 SVE2

Chapter 7 – Cortex‑M Series

Operating modes of Cortex‑M processors

Cortex‑M vector table

Reset process

Vector interrupts

Interrupt handling

Chapter 8 – Top‑Down Micro‑architecture Performance Analysis

Frontend vs backend bottlenecks

Top‑down performance profiling

Contact

For further inquiries, contact Mr. Sun at 13001966238 and mention "ARM processors".
