Alibaba’s OS, Storage, and Resource Management Highlights from OSDI'18

The 13th OSDI, held in Carlsbad, California, drew more than 650 attendees and accepted 47 papers. Three received Best Paper awards, two of them led by Chinese students. At its sponsored BoF session, Alibaba presented its in-house OS kernel (AliKernel), its next-generation distributed storage system Pangu 2.0, and its large-scale resource manager Sigma, prompting lively discussion among researchers from around the world.

Alibaba Cloud Developer
OSDI'18 conference photo

The 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI) took place October 8-10, 2018, in Carlsbad, California. Attendance set a record at more than 650 participants. Of the 257 papers submitted, 47 were accepted, an acceptance rate of about 18%. In addition, 83 posters and 6 demos were presented.

Best Papers

Understanding Failures (REPT: Reverse Debugging of Failures in Deployed Software) – Weidong Cui et al.

Operating System (LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation) – Yizhou Shan et al.

Debugging (Orca: Differential Bug Localization in Large‑Scale Services) – Ranjita Bhagwan et al.

Two of the three Best Papers have Chinese students as first authors, a sign of the growing impact of Chinese researchers at systems conferences. The Best Paper in the operating-systems category, LegoOS, cites the cluster trace that Alibaba previously released, which suggests that Alibaba's foundational technologies are gaining recognition at top academic venues.

Alibaba’s Sponsored BoF

Alibaba was a gold-level sponsor of OSDI'18 and organized a Birds-of-a-Feather (BoF) session covering three topics:

Recent advances in Alibaba’s OS development and innovation (AliKernel).

Alibaba Cloud’s next‑generation distributed storage system Pangu 2.0.

The large‑scale resource management system Sigma and its challenges during regular operation and the Double‑11 shopping festival.

AliKernel and Unikernel Exploration

AliKernel is an in-house operating-system kernel built to support Alibaba's massive and diverse workloads. As server counts, application heterogeneity, and the scale of mixed (co-located) workload deployment grow, the kernel team faces challenges such as iterating rapidly, reducing cost, and supporting new paradigms like serverless. Senior experts spoke about fast kernel development cycles, resource isolation, performance tuning, and Alibaba's exploration of the unikernel direction (AliUK). The architecture of AliUK is illustrated below:

AliUK architecture

Pangu 2.0 Distributed Storage

Pangu 2.0 is a new-generation distributed storage system that is already widely deployed within Alibaba. It offers low latency and high IOPS, and its multi-tier design adapts to diverse application scenarios. Key innovations include a pure user-space storage engine (USSOS), software-hardware co-design, support for emerging storage media and RDMA networking, and substantial improvements in CPU efficiency and NVM utilization.

Pangu 2.0 architecture

Sigma – Large‑Scale Resource Management

Sigma is Alibaba’s internal resource‑management platform and a cornerstone of its cloud‑transformation strategy. It serves numerous business units (e.g., Tmall, Taobao, advertising, logistics) and handles resource allocation for massive events such as Double‑11. Since 2011, Sigma has tackled capacity planning, stability, and cost‑control challenges.

During the BoF, experts described day-to-day resource management, preparation for Double-11, and stability improvements. The techniques they highlighted include resource isolation, priority control, multi-scheduler coordination (raising average CPU utilization of mixed workloads by over 45%), and reinforcement-learning-based scheduling. They also described a dynamic quota mechanism that adapts to workload and container capabilities and substantially increases overall resource utilization.

Sigma-Fuxi mixed-workload (co-location) architecture
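
To make the idea of priority control with a dynamic quota more concrete, here is a minimal Python sketch. It is a toy illustration only and not Sigma's actual algorithm, which the session summary does not describe. The names Node, batch_quota, utilization, and the safety_headroom parameter are all hypothetical. The assumption is that latency-sensitive online services always keep priority, and low-priority batch work may use only whatever CPU is left after a fixed headroom is reserved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical machine shared by online services and batch jobs."""
    cpu_capacity: float     # total cores on the machine
    online_usage: float     # cores currently used by high-priority online services
    safety_headroom: float  # cores kept free to absorb online traffic spikes


def batch_quota(node: Node) -> float:
    """Cores that low-priority batch work may use right now (never negative)."""
    spare = node.cpu_capacity - node.online_usage - node.safety_headroom
    return max(0.0, spare)


def utilization(node: Node, batch_demand: float) -> float:
    """Fraction of the machine in use when batch work is capped by its quota."""
    used = node.online_usage + min(batch_demand, batch_quota(node))
    return used / node.cpu_capacity


if __name__ == "__main__":
    # Illustrative numbers only: the same batch demand on a quiet and a busy node.
    quiet = Node(cpu_capacity=64, online_usage=8, safety_headroom=8)
    busy = Node(cpu_capacity=64, online_usage=52, safety_headroom=8)
    for name, node in (("quiet", quiet), ("busy", busy)):
        print(name, batch_quota(node), utilization(node, batch_demand=40))
```

In this sketch the quota shrinks automatically as online load rises, so batch jobs fill idle capacity during quiet periods and yield it during peaks. A production system would also need isolation at the kernel level and a policy for evicting batch work that is already running, both of which are outside the scope of this example.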

Closing Remarks

The BoF, which began at 20:30, quickly filled the room, reflecting strong interest in Alibaba’s system‑software work. Stanford’s Platform Lab Director John Ousterhout attended, and Alibaba continues collaborations with Stanford and many other leading universities. The session significantly raised awareness of Alibaba’s infrastructure among top researchers, and the team welcomes further contributions from the community.
