Tagged articles
12 articles
Alibaba Cloud Infrastructure
Apr 30, 2025 · Artificial Intelligence

Exploring and Practicing a Unified Compute Network for AI at Zuoyebang: Building an Innovation Engine for the AI Era

This article summarizes a presentation by Dong Xiaocong, Zuoyebang's infrastructure leader, on the challenges of AI inference demand and supply. It describes the design and implementation of a unified compute network (trusted networking, multi‑region container scheduling, and traffic routing) for serving large‑scale AI models efficiently.

AI · Compute Network · Infrastructure
9 min read
dbaplus Community
Apr 1, 2024 · Cloud Native

Scaling Cloud‑Native Containers at DeWu: Multi‑Cluster Management and Cost Optimization

This article details DeWu's cloud‑native transformation since August 2021, covering multi‑cluster federation, application profiling, custom scheduling plugins, resource pre‑reservation, co‑location of online and offline workloads, cost‑saving hardware choices, multi‑cloud strategy, and the development of the KubeAI platform for AI scenarios.

AI Platform · Multi-Cluster · Resource Optimization
24 min read
Alibaba Cloud Native
Nov 24, 2023 · Cloud Native

How Koordinator Boosts CPU Utilization and Cuts Costs in Large‑Scale Mixed Workloads

Koordinator, an open‑source cloud‑native mixed‑workload scheduler born from Alibaba’s internal container orchestration experience, enables Xiaohongshu to reclaim idle resources, improve CPU utilization beyond 45%, reduce resource costs by millions of core‑hours, and seamlessly integrate Kubernetes with YARN for batch and AI workloads.

Cloud Native · Resource Optimization · YARN
18 min read
Alibaba Cloud Native
Mar 16, 2023 · Cloud Native

How Koordinator Supercharges ACK Container Scheduling and Resource Efficiency

Koordinator, an open‑source cloud‑native scheduler from Alibaba, enhances container performance and reduces cluster costs by introducing mixed‑workload placement, resource profiling, load‑aware scheduling, and differentiated SLO mixing, now fully integrated into Alibaba Cloud ACK with a new v1.1.1‑ack.1 release.

ACK · Cloud Native · Koordinator
10 min read
ByteDance Cloud Native
Nov 10, 2022 · Cloud Native

Explore ByteDance’s Cloud‑Native Journey: Key Articles and Insights

This collection highlights ByteDance’s evolution in cloud‑native technologies, covering their microservice runtime architecture, large‑scale computing practices, open‑source project creation, and container scheduling advancements, providing links to detailed articles for readers to gain deeper insight.

Microservices · cloud-native · container scheduling
1 min read
Alibaba Cloud Native
Dec 14, 2021 · Cloud Native

How CPU Burst Improves Container Performance Without Reducing Deployment Density

This article explains the CPU Burst feature added in Linux 5.14, how it mitigates fine‑grained CPU throttling in Kubernetes containers, presents a queue‑theoretic model and Monte‑Carlo simulations to evaluate its impact on scheduler stability, and offers practical guidance for safely enabling it in production environments.

CPU Burst · Cloud Native · Kubernetes
14 min read
Alibaba Cloud Native
Dec 5, 2018 · Artificial Intelligence

How Swarm Reinforcement Learning Boosts Alibaba’s Sigma Container Scheduling

This article examines how Alibaba’s Sigma container scheduler leverages a swarm reinforcement learning (SwarmRL) algorithm to improve online resource allocation, achieving higher placement rates and lower host usage compared to traditional First‑Fit, Best‑Fit, and manual tuning strategies.

Sigma · container scheduling · online bin packing
13 min read
Alibaba Cloud Developer
Mar 7, 2018 · Cloud Native

How Alibaba’s Sigma‑Cerebro Simulator Boosts Cluster Utilization for Double‑11

The article explains Alibaba’s Sigma container‑scheduling system and its Cerebro simulation platform, detailing how they improve resource utilization, reduce costs during large‑scale events like Double‑11, and address challenges such as fragmentation, rapid scaling, image distribution, and accurate workload forecasting.

Cloud Native · container scheduling · resource utilization
12 min read
21CTO
Dec 24, 2017 · Cloud Computing

Tencent’s Elastic Compute: Efficient Idle Resource Use Without Service Disruption

This article describes Tencent’s elastic computing platform built to harness idle on‑premise resources for massive image, video, AI, and log processing workloads, detailing the architectural layers, strategies for protecting online service capacity, latency, scheduling and fault rates, and the practical lessons learned from its deployment.

Performance Optimization · cloud infrastructure · container scheduling
15 min read
Architecture Digest
Sep 12, 2017 · Cloud Computing

Elastic Computing Platform for Massive Image Compression and Multi‑Workload Services

The article describes how an elastic container‑based computing platform replaces tens of thousands of physical servers to deliver billions of daily image‑compression operations, while also supporting video transcoding, Spark jobs, and AI workloads through resource isolation, named services, dynamic scheduling, and load‑balancing techniques.

Dynamic Scaling · Resource Isolation · cloud platform
8 min read
Meituan Technology Team
May 12, 2017 · Cloud Native

Design and Implementation of the HULK Container Platform Scheduling System

The HULK Container Platform scheduling system, built for Meituan‑Dianping, combines a hybrid, actor‑based scheduler with filter‑and‑rank logic, configurable trade‑offs, and dynamic over‑commit to balance resource utilization, high availability, and massive concurrent placement decisions for thousands of containerized services.

Cloud-native · Distributed Systems · Docker
17 min read