How ByteDance Scaled with Multi‑Cloud: Lessons from Their Cloud‑Native Journey

Rapid business growth, cost control, and compliance needs drove ByteDance toward multi‑cloud. The result is a distributed cloud‑native platform built on open‑source orchestration, unified resource management, and a data‑lake stack, which still has to address operational complexity, interoperability, and emerging AI‑driven challenges.

Volcano Engine Developer Services

Background and Industry Trends

In 2022, IDC surveyed over 4,500 enterprises with cloud spend exceeding one million dollars and found that 88% adopted multi‑cloud architectures, a historic high. McKinsey predicts that by 2025, 42% of enterprises will retain private clouds, and edge cloud workloads are projected to exceed 30% of data processing needs.

Business‑Driven Multi‑Cloud Architecture

Complex business models, cost management, data security, and regulatory requirements push enterprises toward multi‑cloud strategies. As cloud workloads increase in complexity, distributed cloud becomes essential for balancing load, ensuring security, and achieving optimal cloud utilization.

ByteDance Multi‑Cloud Evolution

2016: Launch of the Toutiao Cloud Engine (TCE) to unify resource pools across business units.

2017‑2018: Rapid growth of Douyin (TikTok) required flexible resource supplementation from multiple cloud providers.

2019: Massive scale of video and live‑streaming services drove a cloud‑native transformation of physical and online services.

2020: Introduction of routine online/offline workload colocation (mixed placement) to control costs and improve resource utilization.

2021‑2022: Implementation of a federated cluster managing nearly 500,000 nodes, supporting over 100,000 micro‑services and 30,000 daily changes.

Distributed Cloud‑Native Platform

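The platform aggregates public‑cloud clusters, IDC (on‑premises data center) clusters, and edge clusters under the open‑source multi‑cluster orchestration engine KubeAdmiral. Pooling resources this way raises the average deployment rate (the share of cluster resources allocated to workloads) from 85‑90% to 95%, a different metric from the utilization figures listed under Operational Benefits.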
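One way to picture federated placement is a function that splits a workload's replicas across member clusters in proportion to their free capacity. The sketch below is only an illustration under that assumption: the function name, the cluster names, and the largest‑remainder rounding are hypothetical and are not taken from KubeAdmiral's actual scheduling logic.

```python
from typing import Dict


def split_replicas(total: int, free_capacity: Dict[str, int]) -> Dict[str, int]:
    """Split `total` replicas across clusters in proportion to free capacity.

    Uses the largest-remainder method so the parts always sum to `total`.
    Illustrative only: it does not cap a cluster at its real capacity.
    """
    capacity_sum = sum(free_capacity.values())
    if total < 0 or capacity_sum <= 0:
        raise ValueError("need a non-negative total and some free capacity")

    # Exact (fractional) share per cluster, then round each share down.
    exact = {name: total * cap / capacity_sum for name, cap in free_capacity.items()}
    plan = {name: int(share) for name, share in exact.items()}

    # Hand out the replicas lost to rounding, largest fractional part first.
    leftover = total - sum(plan.values())
    by_remainder = sorted(exact, key=lambda n: (exact[n] - plan[n], n), reverse=True)
    for name in by_remainder[:leftover]:
        plan[name] += 1
    return plan


if __name__ == "__main__":
    # Hypothetical member clusters: one public cloud, one IDC, one edge site.
    print(split_replicas(10, {"public-cloud": 50, "idc": 30, "edge": 20}))
```

A production federation layer would also weigh policy, affinity, and per‑cluster quotas; proportional splitting is just the simplest rule that keeps every cluster loaded relative to its headroom.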

A unified scheduler, Gödel, handles colocated (mixed‑placement) online and offline workloads, with performance optimizations for large‑scale scenarios. The resource management system Katalyst, rebuilt on Kubernetes‑native principles, provides fine‑grained resource allocation, multi‑dimensional isolation, and load‑eviction strategies.
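The colocation idea can be sketched in a few lines: latency‑sensitive pods keep their resources, offline pods borrow whatever headroom remains, and the lowest‑priority offline pods are evicted first when that headroom shrinks. Everything here (the `Pod` fields, the 10% safety margin, the function names) is an assumption made for illustration, not the behavior of Gödel or Katalyst.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    cpu: float               # cores currently in use
    latency_sensitive: bool  # True for online services, False for offline jobs
    priority: int = 0        # higher is more important; used for offline pods


def reclaimable_cpu(node_cpu: float, pods: List[Pod], safety_margin: float = 0.1) -> float:
    """CPU that offline pods may borrow: capacity minus a buffer minus online usage."""
    online = sum(p.cpu for p in pods if p.latency_sensitive)
    return max(0.0, node_cpu * (1.0 - safety_margin) - online)


def pods_to_evict(node_cpu: float, pods: List[Pod], safety_margin: float = 0.1) -> List[str]:
    """Pick offline pods to evict, lowest priority first, until they fit the budget."""
    budget = reclaimable_cpu(node_cpu, pods, safety_margin)
    offline = sorted((p for p in pods if not p.latency_sensitive), key=lambda p: p.priority)
    used = sum(p.cpu for p in offline)
    victims: List[str] = []
    for pod in offline:
        if used <= budget:
            break
        victims.append(pod.name)
        used -= pod.cpu
    return victims
```

For example, on a hypothetical 32‑core node where an online service uses 20 cores and two offline jobs use 6 cores each, the offline budget is 8.8 cores, so only the lower‑priority job is evicted.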

Compute Platform and Data Lake

To address massive offline workloads, ByteDance introduced CloudFS for multi‑cloud object storage with local caching acceleration, and Serverless YARN, a 100% Hadoop YARN‑compatible, cloud‑native solution that enables seamless migration of big‑data jobs.
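Local caching in front of remote object storage is commonly built as a read‑through cache. The following is a minimal sketch of that general pattern, assuming an LRU eviction policy and a caller‑supplied `fetch_remote` function; both are hypothetical and say nothing about how CloudFS itself is implemented.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Byte-bounded LRU cache that falls back to a remote fetch on a miss."""

    def __init__(self, fetch_remote: Callable[[str], bytes], max_bytes: int) -> None:
        self._fetch_remote = fetch_remote  # hypothetical remote object-store read
        self._max_bytes = max_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self._entries:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]

        self.misses += 1
        data = self._fetch_remote(key)
        # Objects larger than the whole cache are served but never stored.
        if len(data) <= self._max_bytes:
            self._entries[key] = data
            self._size += len(data)
            # Evict least recently used entries until the cache fits again.
            while self._size > self._max_bytes:
                _, evicted = self._entries.popitem(last=False)
                self._size -= len(evicted)
        return data
```

Repeated reads of hot objects are then served locally, which is where the acceleration for offline jobs that rescan the same data would come from.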

The ResLake unified offline resource lake integrates compute, storage, and networking, delivering over 1.4× job acceleration and more than 30% cross‑region traffic optimization.

Operational Benefits

Reduced deployment complexity: Seamless migration of existing Kubernetes workloads without service disruption.

Improved interoperability: Unified networking, identity, and data access across clouds.

Enhanced cost control: Fine‑grained resource classification and dynamic reclamation based on latency sensitivity.

Higher resource utilization: Average utilization reached 63% overall, with critical clusters improving from 23% to 60%.

Future Challenges and Next‑Stage Distributed Cloud

Emerging AI workloads demand sophisticated scheduling for GPUs, FPGAs, and ASICs, requiring intelligent matching of compute resources. Data privacy concerns drive the adoption of privacy‑enhancing computation, including federated learning, trusted execution environments, and multi‑party computation.
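As a rough illustration of what "matching" heterogeneous compute can mean, the sketch below greedily assigns jobs to accelerator pools, placing the least flexible jobs first. The job tuples, pool names, and the greedy rule are all hypothetical; real schedulers would also consider topology, memory, and preemption.

```python
from typing import Dict, List, Optional, Tuple

# (job name, acceptable accelerator kinds in preference order, devices needed)
Job = Tuple[str, List[str], int]


def match_jobs(jobs: List[Job], free: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Assign each job to the first acceptable accelerator pool with enough devices."""
    remaining = dict(free)
    placement: Dict[str, Optional[str]] = {}
    # Place jobs with the fewest acceptable kinds first so flexible jobs
    # do not take the only pool a constrained job can run on.
    for name, kinds, need in sorted(jobs, key=lambda job: len(job[1])):
        placement[name] = None  # stays None if no pool can host the job
        for kind in kinds:
            if remaining.get(kind, 0) >= need:
                remaining[kind] -= need
                placement[name] = kind
                break
    return placement
```

With 8 free GPUs, 2 ASICs, and 4 FPGAs, a training job that needs 8 GPUs is placed first, a transcoding job lands on the FPGAs, and an inference job that needs 4 devices on ASIC or GPU stays unplaced until capacity frees up.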

Addressing these challenges will require platforms that support cross‑cloud development frameworks, open integration, and robust security mechanisms.

Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.