How Ctrip Streamlined ML Model Development and Deployment with MLOps
This article explains how Ctrip tackled the long, costly ML model development‑to‑deployment pipeline by adopting and extending MLflow for full lifecycle management, covering model persistence, tracking, serving, custom pyfunc models, Dockerized deployment, scaling, and performance monitoring.
Background and Challenges
Machine‑learning model development and online prediction have become widespread, but the end‑to‑end pipeline is often long, with high iteration cost. Traditional hand‑off using PMML forces both algorithm analysts and system developers to duplicate data‑pre‑processing and post‑processing code, increasing translation effort and slowing model rollout.
Full Lifecycle Management with MLflow
MLflow, an open‑source project from Databricks, offers a unified platform for tracking experiments, registering models, and serving them. Ctrip adapted MLflow to manage the entire lifecycle of its ticket‑price prediction models.
Key Components
Tracking Server – the central hub that stores model artifacts and metadata.
Artifacts are saved to an artifacts server (supporting HDFS, S3, FTP, etc.).
Metadata is persisted in a backend store such as MySQL, SQL Server, or PostgreSQL.
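As a rough illustration of how these two stores are wired together, the sketch below assembles the command line for launching a tracking server. The flags shown (--backend-store-uri, --default-artifact-root, --host, --port) are standard options of the mlflow server command. The helper name, connection string, and bucket are hypothetical placeholders, not Ctrip's actual configuration.

```python
from typing import List


def build_tracking_server_command(
    backend_store_uri: str,
    artifact_root: str,
    host: str = "0.0.0.0",
    port: int = 5000,
) -> List[str]:
    """Return the argv list for launching an MLflow tracking server.

    backend_store_uri: where run metadata is persisted (a SQL database URI).
    artifact_root: where model files and other artifacts are written.
    """
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


# Hypothetical endpoints; replace with real hosts and credentials.
cmd = build_tracking_server_command(
    backend_store_uri="postgresql://mlflow:secret@db.example.internal:5432/mlflow",
    artifact_root="s3://example-mlflow-artifacts/",
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, or joined into a shell command as printed here.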
Basic Model Persistence Example
Using scikit‑learn, a simple train‑and‑save workflow looks like:
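import pickle
from sklearn import svm, datasets

# Train a support vector classifier on the iris dataset
clf = svm.SVC()
X, y = datasets.load_iris(return_X_y=True)
clf.fit(X, y)

# Persist the fitted model to disk; the with-block closes the file handle
model_filename = 'finalized_model.sav'
with open(model_filename, 'wb') as f:
    pickle.dump(clf, f)

Loading the model for inference: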
import pickle

model_filename = 'finalized_model.sav'
with open(model_filename, 'rb') as f:
    clf2 = pickle.load(f)
# X is the feature matrix loaded in the training snippet
clf2.predict(X[0:1])

MLflow Tracking Example
The following script demonstrates a typical MLflow tracking run for an ElasticNet regression on the wine‑quality dataset. It logs parameters, metrics, and the model to the tracking server.
import os, warnings, sys
import pandas as pd, numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet
from urllib.parse import urlparse
import mlflow, mlflow.sklearn
import logging

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Load the wine-quality dataset and split it into train and test sets
    csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    data = pd.read_csv(csv_url, sep=";")
    train, test = train_test_split(data)

    # "quality" is the prediction target; all other columns are features
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Hyperparameters come from the command line, defaulting to 0.5
    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5

    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)
        predicted_qualities = lr.predict(test_x)
        rmse, mae, r2 = eval_metrics(test_y, predicted_qualities)

        print(f"Elasticnet model (alpha={alpha}, l1_ratio={l1_ratio}):")
        print(f" RMSE: {rmse}")
        print(f" MAE: {mae}")
        print(f" R2: {r2}")

        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        # The model registry is not available with a local file store
        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme
        if tracking_url_type_store != "file":
            mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
        else:
            mlflow.sklearn.log_model(lr, "model")

Model Serving
MLflow’s built‑in mlflow models serve command launches a REST API for a saved model. It takes the model artifact path and, optionally, a port, e.g.:

mlflow models serve -m /path/to/artifacts/model -p 1234

The serving container must have the same Python dependencies as the training environment. These are declared in the conda.yaml file that MLflow saves alongside the model.
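For illustration, a client request to such a server might be built as in the sketch below. It targets the /invocations endpoint exposed by mlflow models serve and uses the dataframe_split JSON layout accepted by MLflow 2.x (MLflow 1.x used a pandas-split content type instead, so the payload should be checked against the installed version). The helper name and the feature columns are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, List


def build_invocation_request(
    base_url: str, columns: List[str], rows: List[List[Any]]
) -> urllib.request.Request:
    """Build a POST request for the /invocations endpoint of a served model."""
    # Assumption: MLflow 2.x scoring protocol with the "dataframe_split" key.
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names and values; a real model expects its own schema.
req = build_invocation_request(
    "http://127.0.0.1:1234",
    columns=["feature_a", "feature_b"],
    rows=[[0.7, 1.9]],
)
print(req.get_method(), req.full_url)
# To send it against a running server: urllib.request.urlopen(req).read()
```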
Custom PyFunc Model
When a model combines multiple libraries (e.g., scikit‑learn and Keras), a custom pyfunc class can encapsulate preprocessing, prediction, and post‑processing logic.
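The wrapping idea itself can be sketched in plain Python, without MLflow, so that it runs with no extra dependencies. Everything in this block (the class name, the toy model, and the cleaning and clipping rules) is a hypothetical stand-in chosen for illustration.

```python
from typing import Any, Callable, List, Sequence


class WrappedModel:
    """Plain-Python stand-in for the pattern: preprocess, predict, post-process."""

    def __init__(self, core_predict: Callable[[List[List[float]]], List[float]]):
        # core_predict plays the role of the underlying trained model
        self._core_predict = core_predict

    def _preprocess(self, rows: Sequence[Sequence[Any]]) -> List[List[float]]:
        # Illustrative cleaning step: cast every raw value to float
        return [[float(value) for value in row] for row in rows]

    def _postprocess(self, scores: List[float]) -> List[float]:
        # Illustrative business rule: clip negatives to zero, round to 2 places
        return [round(max(score, 0.0), 2) for score in scores]

    def predict(self, rows: Sequence[Sequence[Any]]) -> List[float]:
        features = self._preprocess(rows)
        scores = self._core_predict(features)
        return self._postprocess(scores)


def toy_core_model(features: List[List[float]]) -> List[float]:
    # Hypothetical model: sum of the features minus one
    return [sum(row) - 1.0 for row in features]


model = WrappedModel(toy_core_model)
result = model.predict([["1.5", 2], [0.1, 0.2]])
print(result)  # [2.5, 0.0]
```

Callers see a single predict method, so preprocessing and post-processing no longer need to be reimplemented on the serving side. With MLflow, the same structure is expressed by subclassing mlflow.pyfunc.PythonModel: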
import mlflow.pyfunc


class MyModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Load artifacts such as trained models (paths are in context.artifacts)
        pass

    def predict(self, context, model_input):
        # custom preprocessing
        # call the underlying model
        # custom post-processing
        # my_predict is a placeholder for the user's own prediction function
        return my_predict(model_input.values)

Saving and loading the custom model:
mlflow.pyfunc.save_model(path=mlflow_pyfunc_model_path,
                         python_model=MyModel(),
                         artifacts=artifacts)
loaded_model = mlflow.pyfunc.load_model(mlflow_pyfunc_model_path)

Unified Deployment
Ctrip built an “Easy Model One‑Stop Service” (EMOSS) on top of the open‑source MLflow community edition. The deployment stack includes:
Dockerized Model RestAPI Server built with FastAPI and served by uvicorn.
SOA Server that routes client requests to the appropriate Model RestAPI based on model name.
Horizontal scaling via a Layer‑7 load balancer (SLB), so capacity can be expanded in minutes.
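The routing role of the SOA Server can be pictured with a minimal sketch: a registry that maps a model name to the base URL of its Model RestAPI container. The class name, the /predict path, and the internal hostnames are assumptions made for illustration, since the article does not describe EMOSS's actual routing table.

```python
from typing import Dict


class ModelRouter:
    """Maps model names to the base URL of the service that hosts them."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def register(self, model_name: str, base_url: str) -> None:
        # Store the URL without a trailing slash so paths join cleanly
        self._routes[model_name] = base_url.rstrip("/")

    def resolve(self, model_name: str) -> str:
        """Return the prediction URL for a model, or raise KeyError if unknown."""
        if model_name not in self._routes:
            raise KeyError(f"no RestAPI registered for model '{model_name}'")
        # Assumption: each model service exposes a /predict endpoint
        return self._routes[model_name] + "/predict"


router = ModelRouter()
# Hypothetical model names and internal addresses
router.register("price_forecast", "http://price-forecast.models.internal:8000/")
router.register("demand_rank", "http://demand-rank.models.internal:8000")

print(router.resolve("price_forecast"))
```

Because clients only know the model name, a model container can be moved or replicated behind the load balancer without any client change.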
Performance monitoring is achieved by instrumenting both the SOA Server and the RestAPI Server. Metrics flow into Kafka, are synchronized to PostgreSQL via Flink, and stored in TimescaleDB for time‑series analysis.
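One way to picture the instrumentation step is a decorator that times each call and emits a JSON event to a sink. In the sketch below the sink is a plain list standing in for a Kafka producer. The decorator name, the event fields, and the metric name are hypothetical, not the schema Ctrip uses.

```python
import functools
import json
import time
from typing import Any, Callable, List


def timed(metric_name: str, sink: Callable[[str], Any]) -> Callable:
    """Decorator that reports latency and status of each call to `sink`."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and on failure, so every call is recorded
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                sink(json.dumps({
                    "metric": metric_name,
                    "status": status,
                    "latency_ms": elapsed_ms,
                    "timestamp": time.time(),
                }))

        return wrapper

    return decorator


# A list stands in for the message queue; a real sink would publish to Kafka.
events: List[str] = []


@timed("model_predict", events.append)
def predict(x: int) -> int:
    return x * 2


predict(21)
print(json.loads(events[0])["metric"])
```

Each event carries a timestamp, which is what makes the records suitable for a time-series store downstream.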
Key architecture diagrams (included as images) illustrate the tracking server, artifact storage, and the end‑to‑end service flow.
Conclusion
Ctrip’s experience shows that adopting MLflow dramatically shortens the model development‑to‑deployment cycle, provides reproducible experiment tracking, and enables automated serving. While the current implementation covers core MLflow features, future work includes handling highly complex models that require distributed inference and exploring service‑mesh based scaling.
dbaplus Community
