Mastering Industrial Machine Learning: From Problem Modeling to Model Optimization
This article outlines a complete industrial machine‑learning workflow—starting with problem modeling, through data preparation, feature extraction, model training, and ending with model optimization—illustrated with a real‑world DEAL revenue‑prediction case and practical tips for handling data, features, and model selection.
Machine learning is a hot field in both academia and industry, with academia focusing on theory and industry on solving real problems. This series shares Meituan's practical experience, covering the end‑to‑end workflow.
Machine Learning Overview
Machine learning is defined as a scientific discipline that constructs and studies algorithms that can learn from data. In industry, supervised learning is most common. The offline training pipeline includes data cleaning, feature extraction, model training, and model optimization, while the online inference pipeline applies the trained model to new data.
What is a model?
A model maps the feature space to the output space and is typically represented by a hypothesis function and parameters w. Common industrial models include Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM) and Deep Neural Network (DNN).
Why use machine learning?
Massive data volumes make simple rule‑based processing insufficient.
Cheap high‑performance computing reduces learning cost.
Cheap large‑scale storage enables efficient data handling.
High‑value problems yield substantial returns when solved with machine learning.
Problem Modeling
Example: estimating the transaction amount of a DEAL (group‑buying order). Steps: gather information about the problem, build domain expertise, and decompose the problem into sub‑problems that machine learning can predict.
Two modeling options: a single model predicting total amount, or multiple models (e.g., a user‑count model and a visit‑to‑purchase‑rate model) whose predictions are combined.
Table comparing single‑model and multi‑model approaches:

| Mode | Drawbacks | Advantages |
| --- | --- | --- |
| Single model | 1. High estimation difficulty; 2. Higher risk | 1. Potentially optimal estimate; 2. Solves the problem in one step |
| Multiple models | 1. Errors accumulate across sub‑models; 2. Higher training and serving cost | 1. Each sub‑model is easier to predict accurately; 2. Flexible fusion for the best overall effect |
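The multi‑model decomposition above can be sketched in a few lines. This is an illustrative example, not the source's implementation: the sub‑models (`predict_visitors`, `predict_conversion_rate`) and the field names are hypothetical stubs standing in for real trained models.

```python
def predict_visitors(deal):
    """Hypothetical user-count sub-model (stubbed with a fixed value)."""
    return 1200.0  # predicted number of visitors

def predict_conversion_rate(deal):
    """Hypothetical visit-to-purchase-rate sub-model (stubbed)."""
    return 0.05    # predicted purchases per visit

def predict_revenue(deal, avg_price):
    # Fuse the sub-model outputs: revenue = visitors * conversion rate * price
    return predict_visitors(deal) * predict_conversion_rate(deal) * avg_price

print(predict_revenue({"id": 1}, avg_price=80.0))  # 1200 * 0.05 * 80 = 4800.0
```

The fusion step here is a simple product, which is what makes the multi‑model route flexible: each factor can be re‑trained or replaced independently, at the cost of letting sub‑model errors multiply.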
Model selection considerations include data size, feature level (high‑level vs low‑level), industry adoption, tool maturity, and personal familiarity.
Preparing Training Data
Key points: keep data distribution consistent with the online environment, minimize label noise, avoid unnecessary sampling, handle distribution shifts, and ensure coverage of different DEAL types.
Common issues and solutions include inconsistent data distribution, temporal distribution changes, noisy labels, and biased sampling.
Specific data collection rules for the visit‑rate problem:
Collect N months of DEAL data and corresponding visit‑rate.
Exclude holidays and irregular periods.
Only keep DEALs with online duration > T and visitor count > U.
Consider DEAL lifecycle and regional differences.
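The filtering rules above can be expressed as a small predicate. A minimal sketch, assuming hypothetical record fields (`online_days`, `visitors`, `start_date`) and leaving the thresholds T and U as parameters:

```python
def keep_deal(deal, min_days=30, min_visitors=100, holiday_dates=frozenset()):
    """Return True if a DEAL record passes the data-collection rules."""
    if deal["online_days"] <= min_days:      # online duration must exceed T
        return False
    if deal["visitors"] <= min_visitors:     # visitor count must exceed U
        return False
    if deal["start_date"] in holiday_dates:  # exclude holidays / irregular periods
        return False
    return True

deals = [
    {"online_days": 60, "visitors": 500, "start_date": "2015-03-02"},
    {"online_days": 10, "visitors": 500, "start_date": "2015-03-02"},  # too new
]
kept = [d for d in deals if keep_deal(d)]
print(len(kept))  # 1
```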
Feature Extraction
After data cleaning, transform raw data into features, converting the input space to a feature space. Linear models require extensive feature engineering, while non‑linear models are more tolerant.
Features are divided into high‑level (generic) and low‑level (specific). Example: POI is a low‑level feature, per‑capita consumption is a high‑level feature.
Feature Normalization
Rescaling to [0,1] or [-1,1].
Standardization using mean and standard deviation.
Scaling to unit length.
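The three normalization schemes above can each be written in a few lines of plain Python (shown here as a sketch; min‑max rescaling assumes the feature is not constant):

```python
import math

def rescale(xs, lo=0.0, hi=1.0):
    """Min-max rescaling to the interval [lo, hi]."""
    mn, mx = min(xs), max(xs)
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def standardize(xs):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

def unit_length(xs):
    """Scale the feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]

print(rescale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
print(unit_length([3.0, 4.0]))   # [0.6, 0.8]
```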
Feature Selection
Filter methods (e.g., chi‑square, information gain).
Wrapper methods (evaluate subsets with a model).
Embedded methods (e.g., L1/L2 regularization).
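A filter method scores each feature independently of any model and keeps the top‑scoring ones. As a sketch (the scoring function is an assumption here; the source names chi‑square and information gain, but absolute Pearson correlation makes the idea compact):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

def filter_select(X, y, k):
    """X: list of feature rows. Return indices of the k features whose
    absolute correlation with the label y is highest."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(filter_select(X, y, 1))  # [0] -- the first column tracks y perfectly
```

Wrapper and embedded methods differ only in where the score comes from: a wrapper retrains a model on each candidate subset, while an embedded method (e.g. L1 regularization) lets training itself zero out weak features.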
Training Model
Example using Logistic Regression (LR). Given m training samples (x, y), the goal is to learn parameters w that minimize a loss function, typically the negative log‑likelihood.
Model Function
1) Hypothesis function: assume a functional form relating x and y; for LR this is the sigmoid h_w(x) = 1 / (1 + e^(−w·x)).
2) Loss function: the negative log-likelihood of the samples (x, y) under the LR model; minimizing it is equivalent to maximizing the likelihood.
Optimization Algorithms
Gradient Descent (stochastic and batch variants).
Newton's Method and quasi‑Newton methods (BFGS, L‑BFGS, OWLQN).
Coordinate Descent.
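Putting the pieces together, here is a minimal sketch of LR trained with batch gradient descent on the negative log-likelihood (pure Python on a toy dataset; a production system would use one of the solvers listed above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the negative log-likelihood.
    X: list of feature rows; y: labels in {0, 1}.
    Returns weights w; the last entry is the bias term."""
    rows = [row + [1.0] for row in X]  # append a constant bias feature
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(rows, y):
            # gradient of the NLL: (h_w(x) - y) * x, averaged over samples
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x + [1.0])))

# Toy data: the label is 1 when the single feature is large.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w = train_lr(X, y)
print(predict(w, [0.0]) < 0.5, predict(w, [3.0]) > 0.5)  # True True
```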
Model Optimization
After training, evaluate whether the model meets expectations. First verify target feasibility and data/feature quality, then diagnose underfitting or overfitting using train and test performance.
Underfitting vs Overfitting
Underfitting occurs when the model cannot capture underlying patterns (model hypothesis space too small). Overfitting occurs when the model captures noise (hypothesis space too large).
Diagnostic table and remedy strategies:

| Problem | Data | Feature | Model |
| --- | --- | --- | --- |
| Underfitting | Clean the data | Add features or denoise existing ones | Reduce regularization; use a more complex model; ensemble multiple models |
| Overfitting | Add more data | Select/reduce features; apply dimensionality reduction | Increase regularization; train for fewer epochs; use a simpler model |
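The diagnosis itself boils down to comparing train and test error. A rule-of-thumb sketch (the threshold values here are assumptions, not from the source):

```python
def diagnose(train_error, test_error, target_error=0.10, gap=0.05):
    """Classify a trained model's fit from its train/test error rates."""
    if train_error > target_error:
        return "underfitting"  # misses patterns even on the training data
    if test_error - train_error > gap:
        return "overfitting"   # fits training noise; generalizes poorly
    return "ok"

print(diagnose(0.25, 0.27))  # underfitting
print(diagnose(0.02, 0.15))  # overfitting
print(diagnose(0.05, 0.07))  # ok
```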
Summary
Understand business goals and decompose them into predictable modeling steps.
Ensure high‑quality, representative training data with minimal label noise.
Leverage domain knowledge for comprehensive feature engineering and appropriate feature selection.
Select models that match the data and business objectives, and iteratively refine them based on diagnostics.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.