File Release Application Prediction Model Using GBDT
This article describes how a GBDT‑based prediction model was built to forecast file release application parameters such as volume ratio, target audience, and gray level, covering data collection, feature engineering, model training, service deployment, and practical considerations for handling bad cases.
The 360 Process Management System's file release module manages new and updated file deployments, but manual entry of volume ratios, targets, and gray levels often leads to inefficiencies. To address this, historical data and a Gradient Boosting Decision Tree (GBDT) model were employed to predict these parameters, reducing workload.
Dataset Construction: SQL queries extracted historical file release and adjustment requests. Noisy, unlabeled records were removed, resulting in a clean dataset for modeling.
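The cleaning step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the record layout and field names (`target`, `rate`, `gray`) are hypothetical stand-ins for the real schema, and records missing any of the three prediction labels are treated as unlabeled noise and dropped.

```python
# Hypothetical label fields: target audience, volume ratio, gray level.
REQUIRED_LABELS = ("target", "rate", "gray")

def clean_dataset(records):
    """Drop records that are missing any of the prediction labels."""
    return [r for r in records if all(r.get(k) is not None for k in REQUIRED_LABELS)]

raw = [
    {"file": "update.dat", "target": "beta", "rate": 0.1, "gray": 2},
    {"file": "broken.dat", "target": None, "rate": 0.1, "gray": 2},    # unlabeled
    {"file": "full.dat", "target": "all", "rate": 1.0, "gray": None},  # unlabeled
]
cleaned = clean_dataset(raw)
print(len(cleaned))  # → 1
```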
Feature Selection and Processing: Relevant fields such as file name, release mode, current volume, ratio, V5 condition, and priority were retained, while irrelevant ones like unique release paths were discarded. Categorical features were encoded (e.g., unique IDs for file names) and numerical features were standardized, with units unified (e.g., gray level converted to a single unit).
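The two transformations above, categorical encoding and numerical standardization, can be sketched in plain Python (in practice scikit-learn's `LabelEncoder` and `StandardScaler` provide the same behavior; the example values here are hypothetical):

```python
def encode_categorical(values):
    """Assign a unique integer ID to each distinct categorical value."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

def standardize(values):
    """Z-score numeric features so they share a common scale."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values] if std else [0.0] * len(values)

ids, mapping = encode_categorical(["a.dat", "b.dat", "a.dat"])
print(ids)  # → [0, 1, 0]
print(standardize([1.0, 2.0, 3.0]))  # symmetric around 0
```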
GBDT Algorithm: Separate models were trained for new releases and adjustments due to differing feature spaces. For each release type, three sub‑models predict target audience, volume ratio, and gray level respectively, with the ratio model also using the predicted audience as input. In total, six models were built.
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_and_test_online_datas_by_GBDT(datas, labels):
    # __split_train_test is a module-private helper (defined elsewhere) that
    # splits features and the three label columns into train/test sets.
    train_datas, test_datas, train_targets, test_targets, \
        train_rates, test_rates, train_grays, test_grays = __split_train_test(datas, labels)
    # Inspect one held-out sample and its three labels.
    print(test_datas[1], test_targets[1], test_rates[1], test_grays[1])

    print("**************target***********************")
    target_clf = GradientBoostingClassifier()
    target_clf.fit(train_datas, train_targets)
    print(target_clf.score(train_datas, train_targets))
    print(target_clf.score(test_datas, test_targets))

    print("**************rate***********************")
    # The ratio model also takes the target audience as an input feature.
    train_datas = np.insert(train_datas, 1, values=train_targets, axis=1)
    test_datas = np.insert(test_datas, 1, values=test_targets, axis=1)
    rate_clf = GradientBoostingClassifier()
    rate_clf.fit(train_datas, train_rates)
    print(rate_clf.score(train_datas, train_rates))
    print(rate_clf.score(test_datas, test_rates))

    print("**************gray***********************")
    # The gray-level model additionally takes the volume ratio as a feature.
    train_datas = np.insert(train_datas, 2, values=train_rates, axis=1)
    test_datas = np.insert(test_datas, 2, values=test_rates, axis=1)
    gray_clf = GradientBoostingClassifier()
    gray_clf.fit(train_datas, train_grays)
    print(gray_clf.score(train_datas, train_grays))
    print(gray_clf.score(test_datas, test_grays))
```

Prediction Service Deployment: The trained models were wrapped into a Tornado‑based web service, exposing APIs for the existing workflow system. Requests are cached to reduce latency, and the model is retrained weekly with new data to adapt to evolving release patterns.
Summary & Reflections: In production, occasional "bad cases" arise where predictions conflict with business logic (e.g., predicting a "formal" release instead of "full‑network"). Post‑processing rules and incremental data labeling are used to correct such errors, allowing the model to improve over time.
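A post-processing layer of this kind might look like the sketch below. The specific rule (a 100% volume ratio implies a full-network release, overriding a conflicting "formal" prediction) is a hypothetical example of the business-logic guards described above, not the production rule set.

```python
def apply_business_rules(prediction):
    """Correct model outputs that conflict with release business logic."""
    fixed = dict(prediction)
    # Bad-case guard (hypothetical rule): a 100% volume ratio means a
    # full-network release, so override a "formal" release-type prediction.
    if fixed.get("rate") == 1.0 and fixed.get("release_type") == "formal":
        fixed["release_type"] = "full-network"
    return fixed

result = apply_business_rules({"release_type": "formal", "rate": 1.0})
print(result["release_type"])  # → full-network
```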
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.