Optimizing AI Platform Resource Efficiency: Scheduling Strategies for Deep Learning Inference and Training
This article summarizes a technical exchange hosted by 58.com AI Lab and Tianjin University covering high-efficiency AI computing, resource-aware scheduling for both online inference and offline training, and methods for mitigating GPU under-utilization and "gray interference" in distributed deep-learning platforms.
Rapid advances in artificial intelligence have dramatically increased the demand for high-efficiency intelligent computing systems. Within 58.com, many services now rely on deep-learning models, and the platform exhibits several characteristic problems: pronounced peak-valley cycles in online inference load, low GPU utilization during off-peak periods, and resource contention among offline training clusters.
On November 3, 58.com AI Lab and Tianjin University's School of Intelligent Computing co-hosted a technical exchange focused on two themes: efficient cluster resource scheduling and fine-grained mixed deployment of online and offline workloads, with the goal of improving the performance of both deep-learning inference services and training jobs.
Agenda
Topic Analysis & Audience Benefits
Offline Training Job Resource Scheduling Optimization
New techniques: (1) priority-based scheduling for offline training tasks; (2) resource-usage prediction and adjustment for offline jobs. Attendees will learn how to apply these strategies to boost resource utilization in offline training clusters.
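The abstract doesn't give implementation details, but the interplay of the two techniques can be sketched roughly as below. The `TrainingJob` fields, the moving-average demand predictor, and the admission rule are illustrative assumptions, not 58.com's actual scheduler:

```python
# A minimal sketch: priority-ordered admission of offline training jobs,
# where each job's GPU demand is predicted from its recent usage history.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class TrainingJob:
    priority: int                      # lower value = scheduled first
    name: str = field(compare=False)
    gpu_history: list = field(compare=False, default_factory=list)

    def predicted_gpus(self, window: int = 5) -> float:
        """Predict demand as a moving average of recent GPU usage samples."""
        recent = self.gpu_history[-window:]
        return sum(recent) / len(recent) if recent else 1.0

def schedule(jobs, free_gpus: float):
    """Pop jobs in priority order; admit a job only if its predicted
    GPU demand fits into the currently free capacity."""
    heap = list(jobs)
    heapq.heapify(heap)
    admitted = []
    while heap and free_gpus > 0:
        job = heapq.heappop(heap)
        need = job.predicted_gpus()
        if need <= free_gpus:
            admitted.append(job.name)
            free_gpus -= need
    return admitted

jobs = [
    TrainingJob(0, "bert-finetune", [4, 4, 3, 4]),
    TrainingJob(1, "ctr-train", [2, 2, 2]),
    TrainingJob(2, "image-cls", [8, 8]),
]
print(schedule(jobs, free_gpus=8))   # ['bert-finetune', 'ctr-train']
```

Predicting demand from history rather than trusting each job's requested quota is what lets the scheduler pack more work onto the same GPUs.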
High-Throughput Distributed Training Cluster Scheduling Based on Task Predictability
New techniques: (1) dynamic resource scheduling for predictable tasks; (2) unified priority scheduling for mixed workloads. Attendees will understand how task predictability is defined, heterogeneous resource scheduling strategies, and unified priority-based scheduling policies.
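As a rough illustration of what "predictability" and a "unified priority" might mean, the sketch below treats a task as predictable when the coefficient of variation of its past runtimes is low, and folds base priority, wait time, and a bonus for predictable short tasks into one score. The threshold, weights, and scoring formula are assumptions for illustration, not the talk's actual algorithm:

```python
# A minimal sketch: predictability as low runtime variance, plus a single
# unified priority score applied across mixed workloads.
import statistics

def is_predictable(runtimes, cv_threshold=0.15):
    """Predictable = low relative variance across past runs."""
    if len(runtimes) < 3:
        return False
    mean = statistics.mean(runtimes)
    return (statistics.stdev(runtimes) / mean) < cv_threshold

def unified_priority(base_priority, runtimes, wait_seconds):
    """Unified score across mixed workloads: higher is scheduled first.
    Predictable short tasks get a boost; wait time prevents starvation."""
    score = base_priority + wait_seconds / 3600.0
    if is_predictable(runtimes):
        score += 10.0 / (statistics.mean(runtimes) / 60.0 + 1.0)
    return score

tasks = {
    "nightly-etl":   (1, [58, 60, 59], 7200),     # predictable, short
    "adhoc-train":   (5, [300, 1200], 600),       # unpredictable
    "online-canary": (8, [20, 21, 20, 19], 60),   # latency-sensitive
}
ranked = sorted(tasks, key=lambda t: unified_priority(*tasks[t]), reverse=True)
print(ranked)   # ['online-canary', 'nightly-etl', 'adhoc-train']
```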
Mixed Deployment of Online Inference and Offline Training on a Deep-Learning Platform
New techniques: (1) automatic elastic scaling for inference services; (2) dynamic resource scheduling for mixed online-offline workloads. Attendees will gain insight into elastic-scaling solutions for model inference and the implementation of mixed deployment of offline jobs and online services.
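The abstract doesn't specify the scaling policy. A common baseline consistent with the description is the Kubernetes-HPA-style rule of sizing replicas to current load divided by per-replica capacity; the function name and the QPS-based metric below are assumptions:

```python
# A minimal sketch of the replica-count decision behind automatic elastic
# scaling: replicas = ceil(load / per-replica capacity), clamped to bounds.
import math

def target_replicas(current_qps: float,
                    qps_per_replica: float,
                    min_replicas: int = 2,
                    max_replicas: int = 50) -> int:
    """Bounded so the service neither scales to zero nor exceeds quota."""
    desired = math.ceil(current_qps / qps_per_replica)
    return max(min_replicas, min(max_replicas, desired))

# Peak traffic: scale out; off-peak: shrink, releasing GPUs to offline training.
print(target_replicas(current_qps=900, qps_per_replica=40))   # 23
print(target_replicas(current_qps=30,  qps_per_replica=40))   # 2 (floor)
```

This is what ties the two techniques together: the GPUs released when inference scales in during load valleys become the capacity that offline training jobs are scheduled onto.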
Gray-Interference Research and Mixed Application Deployment in Distributed Microservice Scenarios
New techniques: (1) spatio-temporal encoding for service performance and interference prediction; (2) fine-grained application mixing at the microservice component level. Attendees will learn about the "gray interference" phenomenon in cloud services and how fine-grained resource management can improve system efficiency.
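The talk's model is not described in the abstract, but the idea of a spatio-temporal encoding can be sketched with synthetic data: each co-located microservice contributes a time window of resource metrics (the temporal axis), the windows of all services are concatenated (the spatial axis), and a simple least-squares fit stands in for the learned interference predictor. The shapes, metrics, and linear model here are all illustrative:

```python
# A minimal sketch of spatio-temporal encoding for interference prediction.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_services, n_steps, n_metrics = 200, 3, 8, 2  # cpu%, mem-bw

# Synthetic telemetry: (samples, services, time steps, metrics).
telemetry = rng.uniform(0, 1, (n_samples, n_services, n_steps, n_metrics))

# Spatio-temporal encoding: flatten each sample's full tensor into one vector.
X = telemetry.reshape(n_samples, -1)

# Synthetic ground truth: degradation grows with aggregate neighbor pressure.
y = X.sum(axis=1) * 0.5 + rng.normal(0, 0.1, n_samples)

# Least-squares fit as a stand-in for a learned interference predictor.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

new_window = rng.uniform(0, 1, (n_services, n_steps, n_metrics)).reshape(-1)
print(f"predicted latency degradation: {new_window @ w:.2f}")
```

A predictor like this is what enables fine-grained mixing: components are co-located only when the predicted interference stays below the latency budget.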
DataFunTalk
Dedicated to sharing and discussing applications of big data and AI technology, with the goal of empowering one million data scientists. DataFunTalk regularly hosts live tech talks and curates articles on big data, recommendation and search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.