What Is a Leap Second and How Alibaba Ensures Its Smooth Implementation
The article explains the concept of a leap second, its scheduling by the IERS, the operational impact on large IT systems, and details Alibaba's technical strategy—including testing, time‑distribution methods, and team coordination—to guarantee a seamless transition during the 2015 positive leap second.
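A leap second is a one-second adjustment applied to Coordinated Universal Time (UTC), the civil time scale derived from the uniform atomic time scale (TAI), to keep it within 0.9 seconds of UT1, the time scale based on the Earth's rotation. The International Earth Rotation and Reference Systems Service (IERS) decides when to insert one based on the trend of the UT1-UTC difference. The previous adjustment took place at the end of June 30, 2012 (UTC), which was the morning of July 1 in Beijing time, and a positive leap second was scheduled for the end of June 30, 2015, creating the unusual timestamp 23:59:60.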
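To see why 23:59:60 is awkward for software, consider a minimal Python sketch. It is only an illustration of one common limitation, not a description of any system mentioned in this article: the standard `datetime` type accepts seconds from 0 to 59 only, so the leap second has no representation, and ordinary arithmetic steps straight from 23:59:59 to 00:00:00. The helper name `try_leap_second` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The last ordinary second of June 30, 2015 (UTC) is representable.
last_normal = datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc)


def try_leap_second():
    """Try to build 23:59:60; return the error text, or None if it worked."""
    try:
        datetime(2015, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
    except ValueError as exc:
        return str(exc)
    return None


# Adding one second jumps directly to midnight: the leap second has no slot.
next_tick = last_normal + timedelta(seconds=1)

print("constructing 23:59:60 failed with:", try_leap_second())
print("one second after 23:59:59 is:", next_tick.isoformat())
```

Software built on such a representation has to repeat a second, step the clock, or stretch nearby seconds, and each choice can surprise code that assumes time always moves forward at a constant rate.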
This one-second correction can cause significant service disruptions. During the 2012 leap second, sites such as Reddit, Gawker, LinkedIn, and Yelp experienced temporary outages, and Amadeus's Altea reservation system failed, affecting airline schedules. Beyond the internet, sectors such as power and telecommunications also depend heavily on precise time, so any anomaly quickly becomes noticeable.
Cao Shijun, a senior system engineer at Alibaba who took part in the 2012 leap-second planning, recounts that once the IERS announced the upcoming positive leap second, responsibility for the 2015 event fell to his team in the technical support department.
For 2015 the challenges grew. Widespread use of OceanBase demanded higher time precision, and the rapid expansion of cloud services meant that foundational services such as NTP would affect both internal and external customers. The team evaluated two approaches to absorbing the extra second: a high catch-up rate over a short period, or a low rate over a longer period. After extensive simulations on a 5K-node cluster, they chose to distribute the extra second evenly across a 24-hour window, beginning 12 hours before the leap-second moment and ending 12 hours after it.
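The trade-off between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Alibaba's implementation: it assumes a purely linear smear centered on the instant immediately after 23:59:60 UTC, and the names `smear_offset` and `slew_rate_ppm`, as well as the one-hour comparison window, are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the window is centered on the instant right after 23:59:60 UTC.
LEAP_INSTANT = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
HALF_WINDOW = timedelta(hours=12)  # 12 hours before and 12 hours after


def smear_offset(t, half_window=HALF_WINDOW):
    """Fraction of the extra second already absorbed at time t (0.0 to 1.0)."""
    start = LEAP_INSTANT - half_window
    end = LEAP_INSTANT + half_window
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    # Linear ramp: timedelta / timedelta yields a float.
    return (t - start) / (end - start)


def slew_rate_ppm(window):
    """Clock-rate change, in parts per million, needed to absorb 1 s in window."""
    return 1.0 / window.total_seconds() * 1e6


print("absorbed at window start:", smear_offset(LEAP_INSTANT - HALF_WINDOW))
print("absorbed at leap instant:", smear_offset(LEAP_INSTANT))
print("absorbed at window end:  ", smear_offset(LEAP_INSTANT + HALF_WINDOW))
print("24-hour window rate (ppm):", round(slew_rate_ppm(timedelta(hours=24)), 3))
print("1-hour window rate (ppm): ", round(slew_rate_ppm(timedelta(hours=1)), 3))
```

Under these assumptions the 24-hour window changes the clock rate by roughly 11.6 ppm, while a hypothetical one-hour window would need about 278 ppm. The longer window keeps every clock closer to its normal rate, at the cost of staying slightly offset from UTC for a full day.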
The effort required intense coordination, and the team described the pressure as "mountain-high." They thanked their colleagues from OceanBase and the Feitian 5K project, noting that the requirements of those groups were what drove such a meticulous plan, and remarked that this single second felt like a whole day of work.
In a concluding note, the team reported that the leap-second transition proceeded smoothly. The extra second was fully absorbed by 20:00, which matches a window ending 12 hours after the leap second (08:00 on July 1 in Beijing time, UTC+8), and Alibaba's services remained stable throughout the adjustment.
Alibaba Cloud Infrastructure
For uninterrupted computing services