How Ctrip Streamlined ML Model Development and Deployment with MLOps

This article explains how Ctrip tackled the long, costly ML model development‑to‑deployment pipeline by adopting and extending MLflow for full lifecycle management, covering model persistence, tracking, serving, custom pyfunc models, Dockerized deployment, scaling, and performance monitoring.

dbaplus Community

Background and Challenges

Machine‑learning model development and online prediction are now widespread, but the end‑to‑end pipeline is often long and each iteration is costly. With the traditional PMML hand‑off, algorithm analysts and system developers each have to implement the same data preprocessing and post‑processing code, which adds translation effort and slows model rollout.

Full Lifecycle Management with MLflow

MLflow, an open‑source project from Databricks, offers a unified platform for tracking experiments, registering models, and serving them. Ctrip adapted MLflow to manage the entire lifecycle of its ticket‑price prediction models.

Key Components

Tracking Server – the central hub that stores model artifacts and metadata.

Artifacts are saved to an artifacts server (supporting HDFS, S3, FTP, etc.).

Metadata is persisted in a backend store such as MySQL, SQL Server, or PostgreSQL.
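
As a rough illustration of how the two stores are wired together, the sketch below assembles an mlflow server command line in Python. The function name, host names, credentials, and paths are placeholders invented for this example and do not describe Ctrip's configuration.

```python
import shlex


def tracking_server_command(backend_store_uri, artifact_root, host="0.0.0.0", port=5000):
    """Build the argv list for launching an MLflow tracking server."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,   # metadata (runs, params, metrics)
        "--default-artifact-root", artifact_root,   # model files and other artifacts
        "--host", host,
        "--port", str(port),
    ]


# Placeholder URIs (assumptions for illustration only).
cmd = tracking_server_command(
    backend_store_uri="mysql+pymysql://user:password@db-host:3306/mlflow",
    artifact_root="hdfs://namenode:8020/mlflow/artifacts",
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run, or the printed string pasted into a shell.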

Basic Model Persistence Example

Using scikit‑learn, a simple train‑and‑save workflow looks like:

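import pickle

from sklearn import svm, datasets

# Train a support vector classifier on the iris dataset.
clf = svm.SVC()
X, y = datasets.load_iris(return_X_y=True)
clf.fit(X, y)

# Persist the fitted model; the context manager closes the file handle.
model_filename = 'finalized_model.sav'
with open(model_filename, 'wb') as f:
    pickle.dump(clf, f)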

Loading the model for inference:

model_filename = 'finalized_model.sav'
with open(model_filename, 'rb') as f:
    clf2 = pickle.load(f)
clf2.predict(X[0:1])

MLflow Tracking Example

The following script demonstrates a typical MLflow tracking run for an ElasticNet regression on the wine‑quality dataset. It logs parameters, metrics, and the model to the tracking server.

import logging
import sys
import warnings
from urllib.parse import urlparse

import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

logging.basicConfig(level=logging.WARN)
logger = logging.getLogger(__name__)

def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2

if __name__ == "__main__":
    warnings.filterwarnings("ignore")
    np.random.seed(40)
    csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    data = pd.read_csv(csv_url, sep=";")
    train, test = train_test_split(data)
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]
    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.5
    with mlflow.start_run():
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)
        predicted_qualities = lr.predict(test_x)
        rmse, mae, r2 = eval_metrics(test_y, predicted_qualities)
        print(f"Elasticnet model (alpha={alpha}, l1_ratio={l1_ratio}):")
        print(f"  RMSE: {rmse}")
        print(f"  MAE: {mae}")
        print(f"  R2: {r2}")
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        tracking_url_type_store = urlparse(mlflow.get_tracking_uri()).scheme
        if tracking_url_type_store != "file":
            mlflow.sklearn.log_model(lr, "model", registered_model_name="ElasticnetWineModel")
        else:
            mlflow.sklearn.log_model(lr, "model")
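
To make the three logged metrics concrete, here is a small worked example that recomputes them with the standard library only. The function name and the toy values are chosen for easy arithmetic and are not taken from the wine dataset.

```python
import math


def eval_metrics_plain(actual, pred):
    """Compute RMSE, MAE and R^2 without NumPy or scikit-learn."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Errors are 1, 0 and -2, so ss_res = 5, ss_tot = 8.
rmse, mae, r2 = eval_metrics_plain([3, 5, 7], [2, 5, 9])
print(rmse, mae, r2)
```

Here RMSE is sqrt(5/3), MAE is 1.0, and R2 is 1 - 5/8 = 0.375.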

Model Serving

MLflow’s built‑in mlflow models serve command launches a local REST API for a saved model. It takes the model path or URI (-m) and an optional port (-p, which defaults to 5000), e.g.:

mlflow models serve -m /path/to/artifacts/model -p 1234

The serving environment must have the same Python dependencies as the training environment; MLflow records them in a conda.yaml file saved alongside the model.
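
A client can then POST feature rows to the server's /invocations endpoint. The sketch below only builds the request and does not send it. It assumes the JSON layout used by MLflow 2.x (a dataframe_split object); older 1.x servers expect a different layout. The helper name and the two‑column payload are illustrative, and a real request must include every feature column the model was trained on.

```python
import json
import urllib.request


def build_invocation_request(base_url, columns, rows):
    """Build (but do not send) a POST request for the /invocations endpoint."""
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_invocation_request(
    "http://127.0.0.1:1234",
    columns=["alcohol", "pH"],  # illustrative subset of the wine features
    rows=[[9.4, 3.51]],
)
# With a server running: urllib.request.urlopen(req).read()
print(req.full_url)
```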

Custom PyFunc Model

When a model combines multiple libraries (e.g., scikit‑learn and Keras), a custom pyfunc class can encapsulate preprocessing, prediction, and post‑processing logic.

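import mlflow.pyfunc


class MyModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Called once when the model is loaded; read trained models
        # from the paths in context.artifacts here.
        pass

    def predict(self, context, model_input):
        # 1. custom preprocessing
        # 2. call the underlying model(s)
        # 3. custom post‑processing
        # my_predict is a placeholder for the user's own prediction logic.
        return my_predict(model_input.values)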

Saving and loading the custom model:

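# artifacts maps logical names to files (e.g. serialized sub-models) that
# MLflow copies into the model directory and exposes via context.artifacts.
mlflow.pyfunc.save_model(path=mlflow_pyfunc_model_path,
                         python_model=MyModel(),
                         artifacts=artifacts)
loaded_model = mlflow.pyfunc.load_model(mlflow_pyfunc_model_path)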
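
The three‑stage structure inside predict can be sketched without MLflow. The class below mirrors the load_context and predict method names but is a plain Python stand‑in: it does not subclass mlflow.pyfunc.PythonModel, its context is an ordinary dict, and the linear scorer and its parameters are invented for the example.

```python
class ThreeStageModel:
    """Plain-Python stand-in mirroring the pyfunc method layout."""

    def load_context(self, context):
        # A real pyfunc model would load files listed in context.artifacts;
        # here `context` is just a dict holding made-up parameters.
        self.weights = context["weights"]
        self.bias = context["bias"]

    def _preprocess(self, rows):
        # Example preprocessing: replace missing values with 0.0.
        return [[0.0 if v is None else float(v) for v in row] for row in rows]

    def _postprocess(self, scores):
        # Example post-processing: clip at zero and round to two decimals.
        return [round(max(s, 0.0), 2) for s in scores]

    def predict(self, context, model_input):
        rows = self._preprocess(model_input)
        scores = [
            sum(w * v for w, v in zip(self.weights, row)) + self.bias
            for row in rows
        ]
        return self._postprocess(scores)


model = ThreeStageModel()
model.load_context({"weights": [2.0, -1.0], "bias": 0.5})
print(model.predict(None, [[1, 2], [None, 3]]))  # [0.5, 0.0]
```

Keeping preprocessing and post‑processing inside the model object is what lets the serving side call a single predict method, instead of reimplementing those steps as in the PMML hand‑off.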

Unified Deployment

Ctrip built an “Easy Model One‑Stop Service” (EMOSS) on top of the open‑source MLflow community edition. The deployment stack includes:

Dockerized Model RestAPI Server built with FastAPI and served by uvicorn.

SOA Server that routes client requests to the appropriate Model RestAPI based on model name.

Horizontal scaling behind a Layer‑7 proxy (SLB), so that serving instances can be added in minutes.
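
The routing step performed by the SOA Server can be pictured as a lookup from model name to the address that fronts that model's REST API. The following is a minimal sketch under that assumption; the class, the model name, and the URL are hypothetical and do not reflect EMOSS internals.

```python
class ModelRouter:
    """Maps a model name to the base URL fronting its REST API instances."""

    def __init__(self):
        self._routes = {}

    def register(self, model_name, base_url):
        # Normalise the URL so callers can append paths safely.
        self._routes[model_name] = base_url.rstrip("/")

    def resolve(self, model_name):
        try:
            return self._routes[model_name]
        except KeyError:
            raise LookupError(
                f"no REST API registered for model {model_name!r}"
            ) from None


router = ModelRouter()
# Hypothetical model name and proxy address.
router.register("price-forecast", "http://slb.internal.example/price-forecast/")
print(router.resolve("price-forecast"))
```

Because the router only knows the proxy address, adding or removing instances behind the proxy requires no change to the routing table.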

Performance monitoring is achieved by instrumenting both the SOA Server and the RestAPI Server. Metrics are published to Kafka, written by Flink into PostgreSQL, and stored in TimescaleDB (a PostgreSQL extension) for time‑series analysis.
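
One common way to instrument a handler is a timing decorator that emits one record per call. The sketch below is an assumption about how such instrumentation might look, not Ctrip's code: the metric fields are invented, and a plain list stands in for a Kafka producer.

```python
import functools
import json
import time


def timed(metric_name, sink):
    """Decorator that appends one JSON latency record per call to `sink`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, so errors are measured too.
                record = {
                    "metric": metric_name,
                    "status": status,
                    "latency_ms": (time.perf_counter() - start) * 1000.0,
                    "timestamp": time.time(),
                }
                sink.append(json.dumps(record))
        return wrapper
    return decorator


events = []  # stand-in for a Kafka producer


@timed("model_predict", events)
def predict(x):
    return x * 2


predict(21)
print(events[0])
```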

Key architecture diagrams (included as images) illustrate the tracking server, artifact storage, and the end‑to‑end service flow.

Conclusion

Ctrip’s experience shows that adopting MLflow shortens the model development‑to‑deployment cycle, provides reproducible experiment tracking, and enables automated serving. The current implementation covers core MLflow features; future work includes handling highly complex models that require distributed inference and exploring service‑mesh based scaling.
