How Alibaba Cloud’s CMU610 Redefines BMC Architecture for AI Servers

The article examines the evolving demands of AI‑driven data‑center servers, the limitations of traditional BMC chips, and how Alibaba Cloud’s self‑designed CMU610 chip combined with a Zephyr‑based OpenBMC firmware delivers a highly integrated, cost‑effective solution that reshapes server management.

Alibaba Cloud Infrastructure

Background

Modern data‑center servers consist of a compute subsystem (CPU/GPU) and a Baseboard Management Controller (BMC) subsystem that monitors power, temperature, and health of critical components. The rapid growth of AI workloads has driven a shift from CPU‑centric to heterogeneous GPU‑centric designs, often including multiple sub‑boards such as PCIe switches, GPUs, PowerShelf units, and DPU accelerators. Each sub‑board now requires its own BMC management, dramatically increasing the number of BMC chips per server.

Challenges of Traditional BMC Chips

Limited sensor interfaces cannot cover the exploding number of monitoring points in AI servers.

Weak hardware root‑of‑trust (ROT) and overall security capabilities.

Reliance on external IO expanders, DDR, flash, and ROT chips prevents full System‑on‑Chip integration.

The market is dominated by a single vendor (~90% share), which limits supply‑chain diversity and innovation.

A typical BMC subsystem consists of five core chips (controller, flash, DDR, IO expander, ROT), which drives up cost, power, and board area.

AI Server BMC Demand Explosion

Analysis of the NVIDIA NVL72 shows the BMC count per system rising to 98 chips, a several‑fold increase over traditional servers. Each functional sub‑board now needs an independent BMC subsystem. In high‑density 8‑GPU configurations the BMC area can occupy up to 25% of the board, and response latency can exceed 10 seconds when managing thousands of sensors.

CMU610 – A Self‑Developed BMC Chip

The CMU610 chip, designed by Alibaba Cloud’s DAMO Academy, integrates the following on a single die:

Two XuanTie E906 RISC‑V cores @ 600 MHz

1.2 MB SRAM, 32 MB integrated Flash, 8 MB PSRAM (DDR‑less)

Integrated ROT, ADC, and double the number of I3C interfaces compared to high‑end BMCs

Security‑isolated AMP mode: one core runs a Zephyr‑based secure firmware, the other runs a Zephyr‑based OpenBMC firmware

Typical power consumption < 0.5 W; compared with a traditional BMC design, cost is reduced by ~80 %, power by ~70 %, and area by ~60 %
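To make the percentages concrete, the short Python sketch below normalizes a traditional BMC design to 100 % and computes what remains after each quoted reduction. The normalized baseline is an illustrative assumption, not a measured figure; only the approximate 80 %, 70 %, and 60 % reductions come from the list above.

```python
# Approximate reductions quoted for CMU610, in percent.
REDUCTION_PERCENT = {"cost": 80, "power": 70, "area": 60}


def remaining_percent(reduction_percent: int) -> int:
    """Share of the normalized baseline left after a given reduction."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction must be between 0 and 100 percent")
    return 100 - reduction_percent


# What is left of cost, power, and area once the quoted reductions are applied.
REMAINING = {name: remaining_percent(cut) for name, cut in REDUCTION_PERCENT.items()}

if __name__ == "__main__":
    for name, left in REMAINING.items():
        print(f"{name}: about {left}% of the traditional baseline")
```

Read this way, the quoted figures correspond to roughly one fifth of the cost, under one third of the power, and two fifths of the area of the baseline design.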

Security is enforced by a separate Security AHB bus and an I/O Physical Memory Protection (IOPMP) unit that isolates the business core from the secure bus.
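The isolation idea can be pictured with a small Python model of a default-deny permission table keyed by bus master. This is a simplified sketch, not the CMU610 register layout or the RISC-V IOPMP specification: the master names (`secure_core`, `business_core`), the address ranges, and the first-match rule order are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical address map, chosen only for illustration.
SECURE_BASE, SECURE_END = 0x1000_0000, 0x1000_1000   # region behind the secure bus
SHARED_BASE, SHARED_END = 0x2000_0000, 0x2001_0000   # ordinary shared peripherals


@dataclass(frozen=True)
class Rule:
    master: str   # bus master issuing the access
    start: int    # first address covered (inclusive)
    end: int      # end of the range (exclusive)
    read: bool
    write: bool


class Iopmp:
    """Toy permission checker: first matching rule wins, no match means deny."""

    def __init__(self, rules):
        self._rules = list(rules)

    def allows(self, master: str, addr: int, access: str) -> bool:
        if access not in ("r", "w"):
            raise ValueError("access must be 'r' or 'w'")
        for rule in self._rules:
            if rule.master == master and rule.start <= addr < rule.end:
                return rule.read if access == "r" else rule.write
        return False  # default deny


def build_example_iopmp() -> Iopmp:
    return Iopmp([
        Rule("secure_core", SECURE_BASE, SECURE_END, read=True, write=True),
        Rule("secure_core", SHARED_BASE, SHARED_END, read=True, write=True),
        # The business core has no rule for the secure region, so it is denied there.
        Rule("business_core", SHARED_BASE, SHARED_END, read=True, write=True),
    ])
```

With this table, `build_example_iopmp().allows("business_core", SECURE_BASE, "r")` evaluates to `False`, while the same access from `secure_core` is permitted.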

CoreLynxV6 – Zephyr‑Based OpenBMC Firmware

CoreLynxV6 migrates OpenBMC from a Linux kernel to the Zephyr kernel while preserving >90 % of the original code. Key architectural changes:

Linux processes → Zephyr threads

D‑Bus replaced by zbus (message channel)

systemd removed; POSIX APIs retained for compatibility

File system emulated with ZFS/LittleFS to provide Linux‑style device files

Because the POSIX layer is kept, existing OpenBMC applications and test suites compile and run unchanged.
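The process-to-thread and message-channel mapping listed above can be pictured with a small Python analogy. This is a conceptual model only, not Zephyr or CoreLynxV6 code: the `Channel` class, the two service functions, and the sample readings are hypothetical, and Python threads and queues stand in for Zephyr threads and a publish/subscribe channel.

```python
import queue
import threading


class Channel:
    """Toy publish/subscribe channel: each subscriber gets its own inbox."""

    def __init__(self, name: str):
        self.name = name
        self._inboxes = []
        self._lock = threading.Lock()

    def subscribe(self) -> queue.Queue:
        inbox = queue.Queue()
        with self._lock:
            self._inboxes.append(inbox)
        return inbox

    def publish(self, message) -> None:
        with self._lock:
            inboxes = list(self._inboxes)
        for inbox in inboxes:
            inbox.put(message)


def sensor_service(channel: Channel, readings) -> None:
    """Would be a separate process on Linux; here it is just a thread."""
    for reading in readings:
        channel.publish(reading)
    channel.publish(None)  # sentinel: no more data


def logger_service(inbox: queue.Queue, received: list) -> None:
    """Consumes messages until the sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        received.append(message)


def run_demo() -> list:
    channel = Channel("temperature")
    inbox = channel.subscribe()  # subscribe before anything is published
    received = []
    consumer = threading.Thread(target=logger_service, args=(inbox, received))
    producer = threading.Thread(target=sensor_service, args=(channel, [41.5, 42.0, 43.25]))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return received
```

Both services share one address space and exchange data through an in-memory channel rather than through inter-process messaging, which is the essence of the change described in the list.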

Middleware Framework for Linux‑Style Applications

The framework consists of five modules that allow Zephyr to host unmodified Linux applications:

zbus – public message channel with subscription and point‑to‑point calls.

sd‑bus + sdbusplus – D‑Bus compatibility layer for existing OpenBMC apps.

Boost.Asio – asynchronous I/O library ported to Zephyr.

Zephyr‑OSAL – POSIX API shim providing Unix‑style system calls.

Device filesystem – ZFS/LittleFS based file system exposing device nodes as Linux‑style files.
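As one illustration of the last module, the Python sketch below models a device filesystem as a table that maps Linux-style paths to read callbacks. It is a minimal sketch under stated assumptions, not the CoreLynxV6 implementation: the `DeviceFs` class, the hwmon-style path, and the fixed sensor value are hypothetical. The only convention borrowed from Linux is that hwmon `temp*_input` files report millidegrees Celsius as text.

```python
class DeviceFs:
    """Toy device filesystem: path -> callable that returns the current value."""

    def __init__(self):
        self._nodes = {}

    def register(self, path: str, read_fn) -> None:
        self._nodes[path] = read_fn

    def read_text(self, path: str) -> str:
        try:
            read_fn = self._nodes[path]
        except KeyError:
            raise FileNotFoundError(path) from None
        # Device files conventionally return the value followed by a newline.
        return f"{read_fn()}\n"


def read_board_temp_millidegrees() -> int:
    """Hypothetical driver hook; a real one would query the sensor."""
    return 42500  # 42.5 degrees Celsius


TEMP_PATH = "/sys/class/hwmon/hwmon0/temp1_input"  # illustrative path

devfs = DeviceFs()
devfs.register(TEMP_PATH, read_board_temp_millidegrees)


def read_temp_celsius(fs: DeviceFs, path: str) -> float:
    """What an unmodified Linux-style application might do with the file."""
    return int(fs.read_text(path).strip()) / 1000.0
```

An application that only knows how to open a path and parse its text keeps working, because the lookup behind the path is the only part that changes.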

Development Flow Integration

OpenBMC Yocto layers remain unchanged; CMU610 is added as a new machine/board.

External‑toolchain integration pulls the DAMO RISC‑V toolchain for cross‑compilation.

Source code is fetched and built with the standard devtool workflow.

No new branches are required – existing OpenBMC repositories can be used directly.

Key Breakthroughs Achieved

OpenBMC code reuse (more than 90 % of the original code preserved) without a firmware rewrite.

Seamless execution of Linux applications on Zephyr, preserving the extensive OpenBMC ecosystem.

Reuse of OpenBMC test suites and automation tools, eliminating separate validation effort.

Tight integration of BMC silicon and firmware yields superior cost‑performance (roughly 80 % lower cost, 70 % lower power, and 60 % less area).

Unified code base supports both “BigBMC” (full‑featured) and “LittleBMC” (high‑density) product lines, accelerating time‑to‑market.

Ecosystem and Standards Outlook

The convergence of the OpenBMC and Zephyr communities has led to joint working groups and emerging standards for firmware‑defined BMC chips. Industry participants—including major server vendors, silicon IP providers, and open‑source foundations—are collaborating on specifications that formalize the “Firmware Defined BMC” model demonstrated by CMU610 and CoreLynxV6.

Illustrative Diagrams

CMU610 architecture diagram

CoreLynxV6 software stack

Middleware module interaction