Mastering Industrial Machine Learning: From Problem Modeling to Model Optimization

This article outlines a complete industrial machine‑learning workflow—starting with problem modeling, through data preparation, feature extraction, model training, and ending with model optimization—illustrated with a real‑world DEAL revenue‑prediction case and practical tips for handling data, features, and model selection.

21CTO

Machine learning is a hot field in both academia and industry, with academia focusing on theory and industry on solving real problems. This series shares Meituan's practical experience, covering the end‑to‑end workflow.

Machine Learning Overview

Machine learning is defined as a scientific discipline that constructs and studies algorithms that can learn from data. In industry, supervised learning is most common. The offline training pipeline includes data cleaning, feature extraction, model training, and model optimization, while the online inference pipeline applies the trained model to new data.

What is a model?

A model maps the feature space to the output space and is typically represented by a hypothesis function and parameters w. Common industrial models include Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM) and Deep Neural Network (DNN).

Why use machine learning?

Massive data volumes make simple rule‑based processing insufficient.

Cheap high‑performance computing reduces learning cost.

Cheap large‑scale storage enables efficient data handling.

High‑value problems yield substantial returns when solved with machine learning.

Problem Modeling

Example: estimating the transaction amount of a DEAL (a group‑buying deal). The steps are: collect information about the problem, build domain expertise, and decompose the problem into sub‑problems that a model can predict.

Two modeling options: a single model predicting total amount, or multiple models (e.g., a user‑count model and a visit‑to‑purchase‑rate model) whose predictions are combined.
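As a rough illustration of the multi‑model option, the sketch below combines two sub‑model predictions into an amount estimate. The combination rule (visitors × visit‑to‑purchase rate × average order price) and every name in it, such as `estimate_deal_amount` and `avg_order_price`, are assumptions made for this example, not the formula used in the original system.

```python
def estimate_deal_amount(predicted_visitors: float,
                         predicted_purchase_rate: float,
                         avg_order_price: float) -> float:
    """Combine sub-model outputs into a transaction-amount estimate.

    Assumed decomposition (hypothetical, for illustration only):
        amount = visitors * visit-to-purchase rate * average order price
    """
    if not 0.0 <= predicted_purchase_rate <= 1.0:
        raise ValueError("purchase rate must lie in [0, 1]")
    if predicted_visitors < 0 or avg_order_price < 0:
        raise ValueError("visitors and price must be non-negative")
    return predicted_visitors * predicted_purchase_rate * avg_order_price


# Made-up inputs: 2,000 predicted visitors, 5% conversion, 80.0 per order.
amount = estimate_deal_amount(2000, 0.05, 80.0)
```

Because the sub‑model outputs are multiplied, their relative errors compound: a 10% overestimate in each of the two predictions gives roughly a 21% overestimate of the amount.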

Table comparing single‑model and multi‑model approaches:

| Mode | Drawbacks | Advantages |
| --- | --- | --- |
| Single model | 1. High estimation difficulty; 2. Higher risk | 1. Potentially optimal estimation; 2. Solves the problem in one step |
| Multiple models | 1. Accumulated error; 2. Higher training and serving cost | 1. Easier to achieve accurate sub‑model predictions; 2. Flexible fusion for best effect |

Model selection considerations include data size, feature level (high‑level vs low‑level), industry adoption, tool maturity, and personal familiarity.

Figure: Model selection diagram

Preparing Training Data

Key points: keep data distribution consistent with the online environment, minimize label noise, avoid unnecessary sampling, handle distribution shifts, and ensure coverage of different DEAL types.

Common issues to watch for are inconsistent data distribution, distribution changes over time, noisy labels, and biased sampling.

Specific data collection rules for the visit‑to‑purchase‑rate problem:

Collect N months of DEAL data and the corresponding visit‑to‑purchase rates.

Exclude holidays and irregular periods.

Only keep DEALs with online duration > T and visitor count > U.

Consider DEAL lifecycle and regional differences.
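The filtering rules above can be sketched as a small selection function. The record layout (`DealRecord`), the parameter names `min_online_days` and `min_visitors` (standing in for the thresholds T and U), and the sample values are all hypothetical; the text does not give concrete thresholds, so they have to be chosen for the business at hand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DealRecord:
    deal_id: str
    day: date
    online_days: int      # how long the DEAL has been online
    visitors: int         # visitor count in the observation window
    purchase_rate: float  # observed visit-to-purchase rate (the label)


def select_training_records(records, holidays, min_online_days, min_visitors):
    """Keep only records that pass the collection rules.

    min_online_days and min_visitors play the role of T and U.
    """
    return [
        r for r in records
        if r.day not in holidays
        and r.online_days > min_online_days
        and r.visitors > min_visitors
    ]


# Made-up sample records, one passing and three failing a rule each.
records = [
    DealRecord("a", date(2024, 3, 5), 30, 500, 0.04),
    DealRecord("b", date(2024, 10, 1), 30, 500, 0.09),  # falls on a holiday
    DealRecord("c", date(2024, 3, 6), 2, 500, 0.10),    # online too briefly
    DealRecord("d", date(2024, 3, 7), 30, 3, 0.33),     # too few visitors
]
kept = select_training_records(
    records,
    holidays={date(2024, 10, 1)},
    min_online_days=7,
    min_visitors=100,
)
```

Dropping low‑traffic DEALs matters because a rate computed from a handful of visitors (record "d") is mostly noise and would act as a noisy label.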

Feature Extraction

After data cleaning, transform raw data into features, converting the input space to a feature space. Linear models require extensive feature engineering, while non‑linear models are more tolerant.

Features are divided into high‑level (generic) and low‑level (specific). For example, POI is a low‑level feature, while per‑capita consumption is a high‑level feature.

Figure: Feature types diagram

Feature Normalization

Rescaling to [0,1] or [-1,1].

Standardization using mean and standard deviation.

Scaling to unit length.
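A minimal sketch of the three normalization schemes, using only the Python standard library. The function names and the sample values are illustrative assumptions. Note that the first two are usually applied per feature (one column across all samples), while unit‑length scaling is usually applied per sample (one feature vector).

```python
import math
import statistics


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly to [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map it to the lower bound.
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def scale_to_unit_length(vector):
    """Divide a feature vector by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        return list(vector)
    return [v / norm for v in vector]


prices = [20.0, 40.0, 60.0, 100.0]        # made-up feature column
scaled = min_max_scale(prices)            # values now lie in [0, 1]
centered = standardize(prices)            # mean 0, standard deviation 1
unit = scale_to_unit_length([3.0, 4.0])   # vector with L2 norm 1
```

Passing `low=-1.0, high=1.0` to `min_max_scale` gives the [-1,1] variant.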

Feature Selection

Filter methods (e.g., chi‑square, information gain).

Wrapper methods (evaluate subsets with a model).

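Embedded methods (e.g., L1 regularization, which can drive the weights of uninformative features to exactly zero; L2 regularization shrinks weights but does not by itself remove features).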
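As one concrete instance of a filter method, the sketch below scores binary features against a binary label with the chi‑square statistic of a 2×2 contingency table and ranks them. The feature names and the toy data are invented for illustration; real pipelines would typically rely on a tested library implementation.

```python
def chi_square_2x2(feature, label):
    """Chi-square statistic for a binary feature against a binary label.

    Both inputs are equal-length sequences of 0/1 values.
    """
    n = len(feature)
    # Observed counts of the 2x2 contingency table.
    observed = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, label):
        observed[(f, y)] += 1
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = observed[(f, 0)] + observed[(f, 1)]
            col_total = observed[(0, y)] + observed[(1, y)]
            # Count expected if feature and label were independent.
            expected = row_total * col_total / n
            if expected > 0:
                stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def rank_features(columns, label):
    """Return feature names sorted from highest to lowest chi-square score."""
    scores = {name: chi_square_2x2(vals, label) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


label = [1, 1, 1, 0, 0, 0, 1, 0]
columns = {
    "has_discount": [1, 1, 1, 0, 0, 0, 1, 0],  # identical to the label
    "is_weekend":   [1, 0, 1, 0, 1, 0, 0, 1],  # evenly spread over both labels
}
ranking = rank_features(columns, label)
```

A filter like this scores each feature on its own, without training a model, which is what distinguishes it from the wrapper and embedded approaches.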

Training Model

Example using Logistic Regression (LR). Given m training samples (x, y), the goal is to learn parameters w that minimize a loss function, typically the negative log‑likelihood.

Model Function

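1) Hypothesis function: assume a functional relationship between x and y. For LR this is the sigmoid of a linear combination of the features, read as the probability that y = 1.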

Loss Function

2) Loss function: choose w to maximize the likelihood of the observed (x, y) pairs under the LR model, which is equivalent to minimizing the negative log‑likelihood.
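For reference, the standard textbook form of these two pieces for binary LR with labels y in {0, 1} is written out below, together with the gradient that the optimization algorithms use. The notation (h_w, J, and averaging over the m samples) is a choice made for this sketch and may differ from the original figures.

```latex
% Hypothesis function: probability that y = 1 given x
h_w(x) = \frac{1}{1 + e^{-w^\top x}}

% Loss function: average negative log-likelihood over m samples
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_w\big(x^{(i)}\big)
       + \big(1 - y^{(i)}\big) \log\big(1 - h_w\big(x^{(i)}\big)\big) \Big]

% Gradient of the loss with respect to w
\nabla_w J(w) = \frac{1}{m} \sum_{i=1}^{m} \Big( h_w\big(x^{(i)}\big) - y^{(i)} \Big)\, x^{(i)}
```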

Optimization Algorithms

Gradient Descent (stochastic and batch variants).

Newton's Method and quasi‑Newton methods (BFGS, L‑BFGS, OWL‑QN).

Coordinate Descent.
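To make the first item concrete, here is a minimal batch gradient descent sketch for LR in plain Python. The learning rate, epoch count, toy data, and function names are assumptions chosen for illustration; production training would normally use one of the quasi‑Newton methods or a library optimizer.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """LR hypothesis: probability that y = 1 given features x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def log_loss(w, xs, ys):
    """Average negative log-likelihood over the training set."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(w, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def batch_gradient_descent(xs, ys, learning_rate=0.1, epochs=1000):
    """Each epoch, step against the gradient averaged over all m samples."""
    m, n = len(xs), len(xs[0])
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(xs, ys):
            error = predict(w, x) - y
            for j in range(n):
                grad[j] += error * x[j] / m
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: the first component is a constant 1.0 acting as the bias term.
xs = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
ys = [0, 0, 0, 1, 1, 1]
w = batch_gradient_descent(xs, ys)
initial_loss = log_loss([0.0, 0.0], xs, ys)
final_loss = log_loss(w, xs, ys)
```

The stochastic variant differs only in the update: it adjusts w after each sample (or small mini‑batch) instead of after a full pass, trading noisier steps for cheaper iterations on large datasets.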

Figure: Gradient descent illustration
Figure: Newton method illustration

Model Optimization

After training, evaluate whether the model meets expectations. First verify target feasibility and data/feature quality, then diagnose underfitting or overfitting using train and test performance.

Underfitting vs Overfitting

Underfitting occurs when the model cannot capture underlying patterns (model hypothesis space too small). Overfitting occurs when the model captures noise (hypothesis space too large).
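The diagnosis from train and test performance can be expressed as a simple rule of thumb. The function below is a hypothetical heuristic, not a standard API: `target_error` (the error level the business goal requires) and `max_gap` (the largest acceptable gap between test and train error) are assumed inputs that have to be set per project.

```python
def diagnose_fit(train_error, test_error, target_error, max_gap):
    """Classify a model's fit from its train and test error.

    target_error: error level the goal requires (assumed to be known).
    max_gap:      largest acceptable test-minus-train gap (a judgment call).
    """
    if train_error > target_error:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if test_error - train_error > max_gap:
        # Fits the training set but does not generalize to unseen data.
        return "overfitting"
    return "acceptable"


# Made-up error rates for three hypothetical models.
high_bias = diagnose_fit(0.30, 0.32, target_error=0.10, max_gap=0.05)
high_variance = diagnose_fit(0.02, 0.25, target_error=0.10, max_gap=0.05)
balanced = diagnose_fit(0.08, 0.10, target_error=0.10, max_gap=0.05)
```

High error on both sets points to underfitting; low train error with a much higher test error points to overfitting.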

Figure: Underfitting and overfitting diagram

Diagnostic table and remedy strategies:

| Problem | Data | Feature | Model |
| --- | --- | --- | --- |
| Underfitting | Clean data | Increase or denoise features | Reduce regularization, use more complex model, ensemble |
| Overfitting | Increase data | Select or reduce features, dimensionality reduction | Increase regularization, fewer training epochs, use simpler model |

Summary

Understand business goals and decompose them into predictable modeling steps.

Ensure high‑quality, representative training data with minimal label noise.

Leverage domain knowledge for comprehensive feature engineering and appropriate feature selection.

Select models that match the data and business objectives, and iteratively refine them based on diagnostics.
