Tagged articles
24 articles
Cloud Native Technology Community
Mar 13, 2026 · Cloud Native

How Kubernetes Evolved into a Unified AI Platform for Massive Data and Autonomous Agents

Since its 2015 debut as a stateless microservice orchestrator, Kubernetes has grown to power large‑scale data pipelines, distributed training, high‑throughput inference, and autonomous agents. It unifies these workloads on a single platform while addressing resource coordination, multi‑cluster scheduling, and GPU economics.

AI · Cloud Native · GPU scheduling
0 likes · 10 min read
Alimama Tech
Jan 7, 2026 · Artificial Intelligence

Can Text‑Driven Vibe Coding Tame Complex AI Infra? A Deep Dive into GPU Time‑Sharing for Agentic RL

This article examines the limitations of Vibe Coding for large AI infrastructure, proposes a text‑driven, document‑centric workflow, and presents a time‑multiplexed GPU scheduling solution that dramatically improves rollout throughput and reduces timeouts in large‑scale Agentic RL training.

Design Documents · GPU scheduling · Time Multiplexing
0 likes · 21 min read
360 Zhihui Cloud Developer
Dec 30, 2025 · Cloud Native

How HBox Boosts GPU Utilization with Multi‑Pool and NUMA‑Aware Scheduling

The HBox scheduling platform tackles large‑scale AI cluster challenges by introducing a three‑pool resource model, priority‑based preemptive scheduling, network‑topology and NUMA‑aware dispatch, and GPU virtualization techniques like MIG and vGPU, dramatically improving GPU utilization, SLA guarantees, and overall cluster efficiency.

AI clusters · GPU scheduling · GPU virtualization
0 likes · 24 min read
Alibaba Cloud Infrastructure
Oct 29, 2025 · Cloud Native

How Alibaba Cloud’s Container Stack Evolves for the AI Era

Alibaba Cloud’s container experts unveiled a comprehensive, AI‑focused upgrade across its cloud‑native stack—introducing AMD compute, dynamic scaling, AI‑native scheduling, secure execution environments, and advanced GPU profiling—to make containers the native foundation for AI workloads and accelerate enterprise AI adoption.

AI Infrastructure · GPU scheduling · container computing
0 likes · 9 min read
360 Zhihui Cloud Developer
May 15, 2025 · Cloud Native

How 360’s AI Platform Boosted GPU Utilization with Volcano Scheduler

360’s AI platform migrated its GPU clusters to a cloud‑native architecture and adopted the Volcano scheduler, achieving over 45% GPU utilization, less than 7% fragmentation, and more than 1,000,000 scheduled Pods, while leveraging flexible plugins, hierarchical queues, and resource pooling to optimize AI and big‑data workloads.

AI Platform · GPU scheduling · Kubernetes
0 likes · 13 min read
Alibaba Cloud Native
Apr 16, 2025 · Cloud Native

How to Achieve Multi‑Region Serverless GPU Scheduling with ACK One Registered Clusters

This guide explains how Alibaba Cloud's ACK One registered clusters can provide multi‑region, serverless GPU compute for AI workloads by using Kubernetes‑compatible labels, the ack‑co‑scheduler, and ResourcePolicy objects to dynamically allocate resources across regions, with step‑by‑step configuration examples.

ACK One · GPU scheduling · Serverless
0 likes · 11 min read
Alibaba Cloud Infrastructure
Mar 4, 2025 · Cloud Native

Koordinator v1.6 Release: Advanced Heterogeneous Device Scheduling and GPU Management Features

The Koordinator v1.6 release introduces a suite of innovations—including GPU topology‑aware scheduling, end‑to‑end GPU & RDMA joint allocation, strong GPU isolation, differentiated GPU scoring, fine‑grained resource reservation, mixed‑workload QoS, and extensive scheduler and rescheduler optimizations—to efficiently manage heterogeneous resources in Kubernetes clusters for AI and high‑performance computing workloads.

Cloud Native · GPU scheduling · Heterogeneous Resources
0 likes · 24 min read
Java Tech Enthusiast
Jan 9, 2025 · Cloud Native

Configuring NVIDIA Docker Plugin and GPU Access in Kubernetes

This guide walks through installing the NVIDIA container toolkit, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in Kubernetes, labeling GPU nodes, and running a GPU‑accelerated FFmpeg pod to confirm successful GPU integration.

Container Toolkit · Docker · GPU
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Sep 17, 2024 · Artificial Intelligence

Boosting LLM Inference: How NanoFlow Doubles Throughput

The article introduces NanoFlow, a novel service framework that leverages intra‑device parallelism, operation‑based pipelining, and async scheduling to significantly improve large language model serving throughput, achieving up to 1.91× higher performance while integrating with Alibaba Cloud PAI.

Alibaba Cloud PAI · GPU scheduling · LLM serving
0 likes · 7 min read
ByteDance Cloud Native
Aug 9, 2023 · Cloud Native

How Volcano Engine’s New GPU Sharing Scheduler Boosts AI Workloads by 500%

This article explains Volcano Engine's next‑generation GPU sharing scheduling technology, detailing the two‑layer scheduler, card‑level bin‑pack/spread strategies, system architecture, API definitions, and optimization algorithms that together increase GPU deployment density by more than 500% and improve utilization by more than 50% for AI workloads.

GPU scheduling · Kubernetes · mGPU
0 likes · 13 min read
Network Intelligence Research Center (NIRC)
Jun 27, 2023 · Artificial Intelligence

Microsecond-Scale GPU Preemption Enables Concurrent Real-Time DNN Inference

REEF introduces a reset‑based preemption mechanism and dynamic kernel padding to achieve microsecond‑scale GPU kernel preemption, enabling concurrent real‑time and best‑effort DNN inference with only 2% added latency for real‑time tasks while boosting overall throughput by up to 7.7×, as demonstrated on the DISB benchmark.

DNN inference · GPU scheduling · REEF
0 likes · 9 min read
Baidu Tech Salon
Mar 29, 2023 · Artificial Intelligence

Punica System: Enhancing AI Inference Service Efficiency Through FaaS Architecture

The Punica system unifies AI inference development, testing, deployment, and maintenance on a FaaS‑based one‑stop platform that automates resource scheduling, self‑healing, and monitoring, supporting multiple frameworks and GPUs, thereby doubling onboarding speed, quintupling scaling efficiency, and reclaiming hundreds of GPU cards.

AI inference · FaaS architecture · GPU scheduling
0 likes · 13 min read
Huolala Tech
Mar 23, 2023 · Cloud Native

How Huolala Built a Cloud‑Native One‑Stop AI Platform on Kubernetes

Huolala’s Big Data Intelligent Platform team describes how they built a cloud‑native, one‑stop AI solution on Kubernetes, integrating Flink‑based feature engineering, a multi‑tenant Zeppelin notebook, GPU‑aware training, and a unified model‑serving platform, while addressing resource isolation, storage persistence, and cross‑cloud deployment.

AI Platform · Cloud Native · GPU scheduling
0 likes · 17 min read
DataFunSummit
Apr 26, 2022 · Artificial Intelligence

Elastic Distributed Training at Huya: Design, Implementation, and Results

This talk describes Huya’s elastic distributed training system, covering the motivation behind elasticity, its design using Kubernetes and ETCD for dynamic node registration and scaling, implementation details of the EFDL framework, performance evaluations on ResNet‑50, and the resulting benefits and future directions.

AI Platform · Distributed Training · GPU scheduling
0 likes · 11 min read
DataFunTalk
Apr 23, 2022 · Artificial Intelligence

Elastic Distributed Training at Huya: Design, Implementation, and Results

This article describes Huya's elastic distributed training system, explaining why elasticity is needed, the architectural design using Kubernetes and ETCD, the dynamic scaling process, performance evaluations on ResNet‑50, and future improvements for more efficient and reliable AI model training.

AI Platform · GPU scheduling · Kubernetes
0 likes · 10 min read
Code DAO
Dec 11, 2021 · Artificial Intelligence

Nimble: A Lightweight Parallel GPU Scheduler Boosting Deep Learning Performance

The article analyzes how Nimble reduces GPU scheduling overhead and enables parallel execution through ahead‑of‑time scheduling and automatic multi‑stream assignment, achieving up to 22.3× inference speedup over PyTorch and significantly improving GPU utilization for deep learning workloads.

Deep Learning · GPU scheduling · Parallel Execution
0 likes · 9 min read
Qingyun Technology Community
Nov 4, 2021 · Cloud Native

What’s New in KubeSphere 3.2.0? GPU Scheduling, Multi‑Cluster Management & More

KubeSphere 3.2.0, the latest cloud‑native distribution built on Kubernetes, introduces GPU resource scheduling and monitoring, enhanced observability with Grafana panels, multi‑cluster and multi‑tenant management, advanced storage features, a global gateway, OpenID Connect authentication, a dynamic application store, and a more independent DevOps suite, all aimed at improving user experience and operational efficiency.

Cloud Native · GPU scheduling · Kubernetes
0 likes · 12 min read
58 Tech
Nov 20, 2020 · Artificial Intelligence

Evolution and Practice of the 58.com AI Algorithm Platform (WPAI)

The article details the development, architecture, and optimization of 58.com’s AI algorithm platform (WPAI), covering its background, overall design, large‑scale distributed machine learning, deep‑learning platform features, inference performance enhancements, GPU resource scheduling improvements, and future directions.

AI Platform · GPU scheduling · Inference Optimization
0 likes · 15 min read
StarRing Big Data Open Lab
May 26, 2020 · Cloud Computing

How TCOS 2.0 Empowers Big Data, AI, and Cloud Workloads with Enhanced Compatibility

TCOS 2.0, the container operating system from Transwarp, expands compatibility to Windows, ARM, MIPS, and domestic platforms, adds GPU heterogeneous scheduling, HPA autoscaling, enhanced local storage management, and improved monitoring, providing a robust foundation for big data, AI, and cloud-native applications.

Big Data · Container · GPU scheduling
0 likes · 11 min read
21CTO
Sep 17, 2017 · Artificial Intelligence

Scaling JD’s AI Platform: 5K+ Containers, GPU Management, and Multi‑Tenant Kubernetes

Since September 2016, JD’s AI foundation platform has leveraged Docker and Kubernetes to build a scalable machine‑learning infrastructure that now runs over 5,000 container instances, supports more than 20 AI services, and provides unified GPU, storage, networking, and multi‑tenant capabilities for both inference and training workloads.

AI Platform · GPU scheduling · Kubernetes
0 likes · 14 min read