
MLOps Practices on the Beike Inference Platform: Architecture, Evolution, and Future Plans

This article presents a comprehensive overview of Beike's machine learning platform and its inference service, detailing the platform's architecture, GPU virtualization, cloud‑native migration, MLOps implementation, and future roadmap to achieve cost‑effective, automated AI model deployment at scale.

DataFunSummit

Introduction

The rapid adoption of AI across enterprises has created a need for efficient GPU utilization and end‑to‑end toolchains that support model development, training, and deployment. MLOps, an extension of DevOps for machine learning, addresses these challenges by covering the full lifecycle of AI applications.

Beike Machine Learning Platform Overview

Since the second half of 2019, Beike has been building a systematic AI platform aimed at agile, one‑stop construction of AI applications. The platform follows a maturity model analogous to autonomous‑driving levels, progressing from partial automation (L2) toward full‑process automation (L4). Its core capabilities are organized into three layers:

Data Platform (Accurate Computing)

Training Platform (Fast Computing)

Inference Platform (Cost‑Effective Computing)

The inference platform currently hosts nearly 2,000 instances serving OCR, face recognition, ASR, NLP, and recommendation models, backed by a resource pool of about 250 GPU cards (mainly T4) and 10,000 CPU cores, reflecting growth of more than 50% since 2021.

Platform Architecture

The architecture consists of four layers:

Physical Infrastructure Layer: heterogeneous compute (GPU, CPU), RDMA/RoCE networking, high‑performance storage.

Resource Scheduling Layer: full migration to Kubernetes for container‑based resource management.

Deep‑Learning Business Support Layer: parameter servers and elastic inference services with high availability.

Business Scenario Layer: supports training, offline batch prediction, and online inference workloads.

Inference Platform Technical Evolution

Three major focus areas have driven the evolution:

Cost Reduction (TCO): GPU virtualization and multi‑task scheduling (e.g., MIG, gpushare, qGPU) to improve utilization; model optimization using OpenVINO, TensorRT, or proprietary pipelines; and adoption of alternative inference hardware (Intel Xeon CPUs, Baidu Kunlun, Huawei Ascend) to reduce dependence on general‑purpose GPUs.

Cloud‑Native Migration: transition from Hadoop to Kubernetes in 2020, introducing Docker for environment standardization, Kubernetes for highly available orchestration, and Istio for service mesh, traffic management, and tracing.

CI/CD Workflow: client tools that integrate with Beike Cloud and automate domain registration, log collection, and monitoring, enabling continuous integration and deployment of inference services.
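To make the multi‑task GPU scheduling idea concrete, here is a minimal sketch of fractional‑GPU placement in the spirit of gpushare/qGPU. This is an illustrative first‑fit‑decreasing packer, not Beike's actual scheduler; the request fractions and card counts are hypothetical:

```python
from typing import Dict, List

def schedule(tasks: List[float], num_gpus: int, capacity: float = 1.0) -> Dict[int, List[float]]:
    """First-fit-decreasing placement of fractional GPU requests.

    Each task requests a fraction of one card (e.g. 0.25 = a quarter GPU).
    Packing several small inference services onto the same card is the core
    mechanism by which sharing schedulers raise overall GPU utilization.
    """
    gpus: Dict[int, List[float]] = {i: [] for i in range(num_gpus)}
    for req in sorted(tasks, reverse=True):      # place largest requests first
        for gpu_id, placed in gpus.items():
            if sum(placed) + req <= capacity + 1e-9:
                placed.append(req)
                break
        else:
            raise RuntimeError(f"no GPU can fit request {req}")
    return gpus

# Example: six small inference services share two cards
plan = schedule([0.5, 0.25, 0.25, 0.5, 0.25, 0.25], num_gpus=2)
```

In production this packing decision is made by the cluster scheduler (e.g., a Kubernetes scheduler extension), with actual memory and compute isolation enforced at the driver or container level.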

MLOps Understanding and Practice at Beike

MLOps extends DevOps by handling model decay, continuous training, and automated deployment. Beike’s MLOps implementation includes:

Model repository with version and permission management.

QA‑driven evaluation pipeline that blocks deployment of non‑compliant models.

Automated CI/CD pipelines for model packaging, testing, and rollout.

Monitoring that triggers retraining when performance degrades.
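The monitoring‑triggered retraining practice above can be sketched as a simple evaluation loop. The tolerance value and the retrain hook here are illustrative assumptions, not Beike's actual thresholds or pipeline API:

```python
def should_retrain(live_accuracy: float, baseline_accuracy: float,
                   tolerance: float = 0.05) -> bool:
    """Flag model decay: retrain when live accuracy falls more than
    `tolerance` below the accuracy recorded at deployment time."""
    return (baseline_accuracy - live_accuracy) > tolerance

def monitoring_tick(live_accuracy: float, baseline_accuracy: float,
                    trigger_retrain) -> bool:
    """One cycle of the MLOps loop: compare live metrics to the deployment
    baseline and kick off the retraining pipeline on degradation."""
    if should_retrain(live_accuracy, baseline_accuracy):
        trigger_retrain()   # e.g. enqueue a training job (hypothetical hook)
        return True
    return False
```

The retrained model would then pass back through the QA‑driven evaluation pipeline before automated rollout, closing the loop described above.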

Future Planning for the Inference Service

The roadmap focuses on four pillars:

CI/CD: enhance pre‑ and post‑processing DAGs and add new features.

Micro‑service Governance: refine traffic control and routing mechanisms.

Model Optimization: continuous cost‑aware model improvements.

Model Monitoring: strengthen post‑deployment monitoring to detect model drift and trigger the MLOps loop.
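One common way to implement the drift detection mentioned in the roadmap is the Population Stability Index (PSI) over binned feature or score distributions. This is a generic textbook implementation, not a method attributed to the talk:

```python
import math

def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are per-bin counts under the same binning.
    Common rules of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant drift worth triggering the retraining loop.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)   # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score
```

A monitoring job could compute PSI between the training-time score distribution and a sliding window of live traffic, raising an alert (or a retraining trigger) when the index crosses the chosen threshold.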

Q&A Session

Key questions covered GPU virtualization, resource isolation, model engine choices (TensorFlow Serving, Triton), lack of a complete open‑source MLOps framework, reasons for moving from Hadoop to K8s, and the decision not to adopt Kubeflow due to resource overhead.

Conclusion

The Beike inference platform demonstrates a cost‑effective, cloud‑native MLOps implementation that balances rapid AI model deployment with operational efficiency, while acknowledging ongoing challenges such as workflow standardization and automation depth.

Tags: cloud-native, AI, kubernetes, mlops, GPU virtualization, Inference Platform
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
