MLOps Practices on the Beike Inference Platform: Architecture, Evolution, and Future Plans
This article presents a comprehensive overview of Beike's machine learning platform and its inference service, detailing the platform's architecture, GPU virtualization, cloud‑native migration, MLOps implementation, and future roadmap to achieve cost‑effective, automated AI model deployment at scale.
Introduction
The rapid adoption of AI across enterprises has created a need for efficient GPU utilization and end‑to‑end toolchains that support model development, training, and deployment. MLOps, an extension of DevOps for machine learning, addresses these challenges by covering the full lifecycle of AI applications.
Beike Machine Learning Platform Overview
Since the second half of 2019, Beike has built a systematic AI platform aimed at agile, one‑stop AI application construction. The platform follows a maturity model similar to that of autonomous driving, progressing from partial automation (L2) toward full‑process automation (L4). Its core capabilities are organized into three layers:
Data Platform (Accurate Computing)
Training Platform (Fast Computing)
Inference Platform (Cost‑Effective Computing)
The inference platform currently hosts nearly 2,000 instances serving OCR, face recognition, ASR, NLP, and recommendation models, backed by a resource pool of about 250 GPU cards (mainly T4) and 10,000 CPU cores, reflecting growth of more than 50% since 2021.
Platform Architecture
The architecture consists of four layers:
Physical Infrastructure Layer: heterogeneous compute (GPU, CPU), RDMA/RoCE networking, high‑performance storage.
Resource Scheduling Layer: full migration to Kubernetes for container‑based resource management.
Deep‑Learning Business Support Layer: parameter servers and elastic inference services with high availability.
Business Scenario Layer: supports training, offline batch prediction, and online inference workloads.
Inference Platform Technical Evolution
Three major focus areas have driven the evolution:
Cost Reduction (TCO): GPU virtualization and multi‑task scheduling (e.g., MIG, gpushare, qGPU) to improve utilization; model optimization using OpenVINO, TensorRT, or proprietary pipelines; adoption of alternative inference hardware (Intel Xeon CPUs, Baidu Kunlun, Huawei Ascend) to reduce dependence on general‑purpose GPUs.
Cloud‑Native Migration: transition from Hadoop to Kubernetes in 2020, using Docker for environment standardization, Kubernetes for high‑availability orchestration, and Istio for service mesh, traffic management, and tracing.
CI/CD Workflow: client tools that integrate with Beike Cloud and automate domain registration, log collection, and monitoring, enabling continuous integration and deployment of inference services.
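The GPU‑sharing schemes mentioned above (MIG, gpushare, qGPU) all reduce to the same underlying problem: packing fractional GPU requests onto whole cards so that several services share one T4 instead of each holding a dedicated card. As a rough illustration of that packing problem (not Beike's actual scheduler, and with hypothetical request sizes), a first‑fit assignment might look like this:

```python
# Illustrative only: first-fit packing of fractional GPU requests onto
# whole cards, sketching the problem gpushare/qGPU-style sharing solves.
# The card count and request fractions are hypothetical examples.

def first_fit(requests, num_cards, capacity=1.0, eps=1e-9):
    """Assign each fractional GPU request to the first card with room.

    Returns the per-card load after placement, or raises if a request
    cannot fit on any card.
    """
    cards = [0.0] * num_cards
    for req in requests:
        for i, load in enumerate(cards):
            if load + req <= capacity + eps:
                cards[i] += req
                break
        else:
            raise RuntimeError(f"request {req} does not fit on any card")
    return cards

# Six services each asking for a fraction of a card fit on two GPUs
# instead of six dedicated ones.
loads = first_fit([0.5, 0.3, 0.2, 0.25, 0.4, 0.3], num_cards=2)
print(loads)  # each card stays within its capacity of 1.0
```

Real schedulers also have to enforce memory and compute isolation between co‑located tasks, which is exactly what MIG (hardware partitions) and qGPU (driver‑level isolation) add on top of plain sharing.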
MLOps Understanding and Practice at Beike
MLOps extends DevOps by handling model decay, continuous training, and automated deployment. Beike’s MLOps implementation includes:
Model repository with version and permission management.
QA‑driven evaluation pipeline that blocks deployment of non‑compliant models.
Automated CI/CD pipelines for model packaging, testing, and rollout.
Monitoring that triggers retraining when performance degrades.
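The QA‑driven gate in this loop can be pictured as a threshold check: a candidate model's evaluation metrics are compared against a compliance spec, and rollout is blocked on any failure. The metric names and thresholds below are hypothetical examples, not Beike's actual criteria:

```python
# Illustrative sketch of a QA deployment gate: block rollout unless
# every required metric meets its minimum threshold. The spec and
# metric names are hypothetical, not Beike's real compliance rules.

def qa_gate(metrics, spec):
    """Return (passed, failures) for a candidate model.

    `spec` maps metric name -> minimum acceptable value; a metric
    missing from `metrics` counts as a failure.
    """
    failures = [
        f"{name}: {metrics.get(name, float('-inf'))} < required {minimum}"
        for name, minimum in spec.items()
        if metrics.get(name, float("-inf")) < minimum
    ]
    return (not failures), failures

# Hypothetical evaluation results for a candidate model.
candidate = {"accuracy": 0.95, "recall": 0.88}
passed, failures = qa_gate(candidate, {"accuracy": 0.92, "recall": 0.90})
if not passed:
    print("deployment blocked:", failures)  # recall misses its threshold
```

In a full pipeline this check would run automatically after evaluation, with the pass/fail result deciding whether the CI/CD stage proceeds to packaging and rollout.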
Future Planning for the Inference Service
The roadmap focuses on four pillars:
CI/CD: enhance pre‑ and post‑processing DAGs and extend feature coverage.
Micro‑service Governance: refine traffic control and routing mechanisms.
Model Optimization: continuous cost‑aware model improvements.
Model Monitoring: strengthen post‑deployment monitoring to detect model drift and trigger the MLOps loop.
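One common way to make the "detect drift, trigger retraining" loop concrete is the Population Stability Index (PSI) computed over a binned feature or score distribution; a PSI above roughly 0.2 is often treated as significant drift. A minimal sketch, with illustrative bins and threshold (not necessarily what Beike's monitoring uses):

```python
# Illustrative drift check using the Population Stability Index (PSI).
# Bins, distributions, and the 0.2 alarm threshold are common examples,
# not a description of Beike's actual monitoring configuration.
import math

def psi(expected_pct, actual_pct, eps=1e-6):
    """PSI between two binned distributions.

    Inputs are per-bin fractions that each sum to 1; `eps` guards
    against empty bins. PSI = sum((a - e) * ln(a / e)) over bins.
    """
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Training-time vs. serving-time score distribution over four bins.
baseline = [0.25, 0.25, 0.25, 0.25]
serving = [0.10, 0.20, 0.30, 0.40]
drift = psi(baseline, serving)
if drift > 0.2:  # illustrative threshold for kicking off retraining
    print(f"PSI={drift:.3f}: drift detected, trigger retraining")
```

A monitoring service would compute this periodically against a training‑time baseline and, on alarm, start the continuous‑training pipeline described in the MLOps section.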
Q&A Session
Key questions covered GPU virtualization, resource isolation, the choice of model‑serving engines (TensorFlow Serving, Triton), the lack of a complete open‑source MLOps framework, the reasons for moving from Hadoop to Kubernetes, and the decision not to adopt Kubeflow due to its resource overhead.
Conclusion
The Beike inference platform demonstrates a cost‑effective, cloud‑native MLOps implementation that balances rapid AI model deployment with operational efficiency, while acknowledging ongoing challenges such as workflow standardization and automation depth.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.