Tagged articles

24 articles

Mar 13, 2026 · Cloud Native

How Kubernetes Evolved into a Unified AI Platform for Massive Data and Autonomous Agents

From its 2015 debut as a stateless microservice orchestrator, Kubernetes now powers large‑scale data pipelines, distributed training, high‑throughput inference, and autonomous agents, unifying these workloads on a single platform while addressing resource coordination, multi‑cluster scheduling, and GPU economics.

AI · Cloud Native · GPU scheduling

0 likes · 10 min read

Alimama Tech

Jan 7, 2026 · Artificial Intelligence

Can Text‑Driven Vibe Coding Tame Complex AI Infra? A Deep Dive into GPU Time‑Sharing for Agentic RL

This article examines the limitations of Vibe Coding for large AI infrastructure, proposes a text‑driven, document‑centric workflow, and presents a time‑multiplexed GPU scheduling solution that dramatically improves rollout throughput and reduces timeouts in large‑scale Agentic RL training.

Design Documents · GPU scheduling · Time Multiplexing

0 likes · 21 min read

360 Zhihui Cloud Developer

Dec 30, 2025 · Cloud Native

How HBox Boosts GPU Utilization with Multi‑Pool and NUMA‑Aware Scheduling

The HBox scheduling platform tackles large‑scale AI cluster challenges by introducing a three‑pool resource model, priority‑based preemptive scheduling, network‑topology and NUMA‑aware dispatch, and GPU virtualization techniques like MIG and vGPU, dramatically improving GPU utilization, SLA guarantees, and overall cluster efficiency.

AI clusters · GPU scheduling · GPU virtualization

0 likes · 24 min read

Alibaba Cloud Infrastructure

Dec 8, 2025 · Cloud Native

Optimizing AI GPU Utilization with Multi‑Cluster Priority Scheduling on ACK One

In the era of large AI models, ACK One’s multi‑cluster fleet provides inventory‑aware elastic scheduling, cluster‑level priority dispatch, and hybrid‑cloud strategies to maximize GPU utilization, ensure business continuity, and reduce costs across regions and on‑premise data centers.

ACK One · AI Workload · Cloud Native

0 likes · 11 min read

Alibaba Cloud Infrastructure

Nov 3, 2025 · Cloud Computing

How ACK One Fleet Enables Scalable AI Workloads with Multi‑Cluster GPU Scheduling

ACK One Fleet, Alibaba Cloud's enterprise multi‑cluster solution, provides inventory‑aware elastic GPU scheduling, cross‑region resource sharing, multi‑cluster HPA and model distribution, allowing AI inference and training workloads to scale efficiently, reduce costs, and maximize GPU utilization.

AI · GPU scheduling · HPA

0 likes · 12 min read

Alibaba Cloud Infrastructure

Oct 29, 2025 · Cloud Native

How Alibaba Cloud’s Container Stack Evolves for the AI Era

Alibaba Cloud’s container experts unveiled a comprehensive, AI‑focused upgrade across its cloud‑native stack—introducing AMD compute, dynamic scaling, AI‑native scheduling, secure execution environments, and advanced GPU profiling—to make containers the native foundation for AI workloads and accelerate enterprise AI adoption.

AI Infrastructure · GPU scheduling · container computing

0 likes · 9 min read

360 Zhihui Cloud Developer

May 15, 2025 · Cloud Native

How 360’s AI Platform Boosted GPU Utilization with Volcano Scheduler

360’s AI platform migrated its GPU clusters to a cloud‑native architecture and adopted the Volcano scheduler, achieving over 45% GPU utilization, less than 7% fragmentation, and more than 1,000,000 scheduled Pods, while using flexible plugins, hierarchical queues, and resource pooling to optimize AI and big‑data workloads.

AI Platform · GPU scheduling · Kubernetes

0 likes · 13 min read

Alibaba Cloud Native

Apr 16, 2025 · Cloud Native

How to Achieve Multi‑Region Serverless GPU Scheduling with ACK One Registered Clusters

This guide explains how Alibaba Cloud's ACK One registered clusters can provide multi‑region, serverless GPU compute for AI workloads by using Kubernetes‑compatible labels, the ack‑co‑scheduler, and ResourcePolicy objects to dynamically allocate resources across regions, with step‑by‑step configuration examples.

ACK One · GPU scheduling · Serverless

0 likes · 11 min read

Alibaba Cloud Infrastructure

Mar 4, 2025 · Cloud Native

Koordinator v1.6 Release: Advanced Heterogeneous Device Scheduling and GPU Management Features

The Koordinator v1.6 release introduces a suite of innovations—including GPU topology‑aware scheduling, end‑to‑end GPU & RDMA joint allocation, strong GPU isolation, differentiated GPU scoring, fine‑grained resource reservation, mixed‑workload QoS, and extensive scheduler and rescheduler optimizations—to efficiently manage heterogeneous resources in Kubernetes clusters for AI and high‑performance computing workloads.

Cloud Native · GPU scheduling · Heterogeneous Resources

0 likes · 24 min read

Java Tech Enthusiast

Jan 9, 2025 · Cloud Native

Configuring NVIDIA Docker Plugin and GPU Access in Kubernetes

This guide walks through installing the NVIDIA container toolkit, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in Kubernetes, labeling GPU nodes, and running a GPU‑accelerated FFmpeg pod to confirm successful GPU integration.

Container Toolkit · Docker · GPU

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Sep 17, 2024 · Artificial Intelligence

Boosting LLM Inference: How NanoFlow Doubles Throughput

The article introduces NanoFlow, a novel service framework that leverages intra‑device parallelism, operation‑based pipelining, and async scheduling to significantly improve large language model serving throughput, achieving up to 1.91× higher performance while integrating with Alibaba Cloud PAI.

Alibaba Cloud PAI · GPU scheduling · LLM serving

0 likes · 7 min read

ByteDance Cloud Native

Aug 9, 2023 · Cloud Native

How Volcano Engine’s New GPU Sharing Scheduler Boosts AI Workloads by 500%

This article explains Volcano Engine's next‑generation GPU sharing scheduling technology, detailing the two‑layer scheduler, card‑level bin‑pack/spread strategies, system architecture, API definitions, and optimization algorithms that together increase GPU deployment density by over 500% and improve utilization by more than 50% for AI workloads.

GPU scheduling · Kubernetes · mGPU

0 likes · 13 min read

Network Intelligence Research Center (NIRC)

Jun 27, 2023 · Artificial Intelligence

Microsecond-Scale GPU Preemption Enables Concurrent Real-Time DNN Inference

REEF introduces a reset‑based preemption mechanism and dynamic kernel padding to achieve microsecond‑scale GPU kernel preemption, enabling concurrent real‑time and best‑effort DNN inference with only 2% added latency for real‑time tasks while boosting overall throughput by up to 7.7×, as demonstrated on the DISB benchmark.

DNN inference · GPU scheduling · REEF

0 likes · 9 min read

Baidu Tech Salon

Mar 29, 2023 · Artificial Intelligence

Punica System: Enhancing AI Inference Service Efficiency Through FaaS Architecture

The Punica system unifies AI inference development, testing, deployment, and maintenance on a FaaS‑based one‑stop platform that automates resource scheduling, self‑healing, and monitoring and supports multiple frameworks and GPUs, thereby doubling onboarding speed, quintupling scaling efficiency, and reclaiming hundreds of GPU cards.

AI inference · FaaS architecture · GPU scheduling

0 likes · 13 min read

Huolala Tech

Mar 23, 2023 · Cloud Native

How Huolala Built a Cloud‑Native One‑Stop AI Platform on Kubernetes

Huolala’s Big Data Intelligent Platform team describes how they built a cloud‑native, one‑stop AI solution on Kubernetes, integrating Flink‑based feature engineering, a multi‑tenant Zeppelin notebook, GPU‑aware training, and a unified model‑serving platform, while addressing resource isolation, storage persistence, and cross‑cloud deployment.

AI Platform · Cloud Native · GPU scheduling

0 likes · 17 min read

DataFunSummit

Apr 26, 2022 · Artificial Intelligence

Elastic Distributed Training at Huya: Design, Implementation, and Results

This talk describes Huya’s elastic distributed training system, covering the motivation behind elasticity, its design using Kubernetes and ETCD for dynamic node registration and scaling, implementation details of the EFDL framework, performance evaluations on ResNet‑50, and the resulting benefits and future directions.

AI Platform · Distributed Training · GPU scheduling

0 likes · 11 min read

DataFunTalk

Apr 23, 2022 · Artificial Intelligence

Elastic Distributed Training at Huya: Design, Implementation, and Results

This article describes Huya's elastic distributed training system, explaining why elasticity is needed, the architectural design using Kubernetes and ETCD, the dynamic scaling process, performance evaluations on ResNet‑50, and future improvements for more efficient and reliable AI model training.

AI Platform · GPU scheduling · Kubernetes

0 likes · 10 min read

Code DAO

Dec 11, 2021 · Artificial Intelligence

Nimble: A Lightweight Parallel GPU Scheduler Boosting Deep Learning Performance

The article analyzes how Nimble reduces GPU scheduling overhead and enables parallel execution through ahead‑of‑time scheduling and automatic multi‑stream assignment, achieving up to 22.3× inference speedup over PyTorch and significantly improving GPU utilization for deep learning workloads.

Deep Learning · GPU scheduling · Parallel Execution

0 likes · 9 min read

Qingyun Technology Community

Nov 4, 2021 · Cloud Native

What’s New in KubeSphere 3.2.0? GPU Scheduling, Multi‑Cluster Management & More

KubeSphere 3.2.0, the latest cloud‑native distribution built on Kubernetes, introduces GPU resource scheduling and monitoring, enhanced observability with Grafana panels, multi‑cluster and multi‑tenant management, advanced storage features, a global gateway, OpenID Connect authentication, a dynamic application store, and a more independent DevOps suite, all aimed at improving user experience and operational efficiency.

Cloud Native · GPU scheduling · Kubernetes

0 likes · 12 min read

58 Tech

Nov 20, 2020 · Artificial Intelligence

Evolution and Practice of the 58.com AI Algorithm Platform (WPAI)

The article details the development, architecture, and optimization of 58.com’s AI algorithm platform (WPAI), covering its background, overall design, large‑scale distributed machine learning, deep‑learning platform features, inference performance enhancements, GPU resource scheduling improvements, and future directions.

AI Platform · GPU scheduling · Inference Optimization

0 likes · 15 min read

StarRing Big Data Open Lab

May 26, 2020 · Cloud Computing

How TCOS 2.0 Empowers Big Data, AI, and Cloud Workloads with Enhanced Compatibility

TCOS 2.0, the container operating system from Transwarp, expands compatibility to Windows, ARM, MIPS, and domestic platforms, adds GPU heterogeneous scheduling, HPA autoscaling, enhanced local storage management, and improved monitoring, providing a robust foundation for big data, AI, and cloud-native applications.

Big Data · Container · GPU scheduling

0 likes · 11 min read

360 Tech Engineering

Nov 30, 2018 · Operations

Deploying nvidia-docker2 for GPU Workloads on Large‑Scale Kubernetes Clusters

This article details the practical steps to install nvidia-docker2, configure Docker’s runtime, enable GPU support via Kubernetes device plugins, and verify GPU scheduling on a large Kubernetes cluster, providing code snippets and best‑practice recommendations for production environments.

Device Plugin · Docker · GPU

0 likes · 8 min read

21CTO

Sep 17, 2017 · Artificial Intelligence

Scaling JD’s AI Platform: 5K+ Containers, GPU Management, and Multi‑Tenant Kubernetes

Since September 2016, JD’s AI foundation platform has leveraged Docker and Kubernetes to build a scalable machine‑learning infrastructure that now runs over 5,000 container instances, supports more than 20 AI services, and provides unified GPU, storage, networking, and multi‑tenant capabilities for both inference and training workloads.

AI Platform · GPU scheduling · Kubernetes

0 likes · 14 min read

360 Zhihui Cloud Developer

Sep 14, 2017 · Artificial Intelligence

Running TensorFlow on Kubernetes: A Practical Guide to Scalable AI Workloads

This article explains how to deploy TensorFlow on Kubernetes, addressing resource isolation, GPU scheduling, and distributed training challenges by introducing a custom TensorFlow‑on‑K8s system with client, task, and autospec modules, plus container design for reliable job execution.

AI deployment · GPU scheduling · Kubernetes

0 likes · 9 min read
