Practical Guide to Machine Learning at Meituan: From Problem Modeling to Model Optimization

This guide walks through Meituan’s end‑to‑end offline ML workflow—from problem modeling and data preparation, through feature engineering and normalization, to model selection, training optimization, evaluation, and iterative improvement—emphasizing business alignment, data quality, and practical diagnostics for real‑world deployment.

Meituan Technology Team

Feb 10, 2015

This article presents an end‑to‑end practical guide on applying machine learning to solve real‑world problems at Meituan, focusing on the offline training pipeline.

It begins with a brief overview of machine learning, defining it as the discipline that builds algorithms capable of learning from data, and distinguishes supervised from unsupervised learning, emphasizing that supervised methods dominate industrial applications.

The first technical step is problem modeling. Using the example of estimating the revenue of a DEAL (a group‑buying deal), the article explains how to collect domain knowledge, decompose the business goal into machine‑learnable sub‑tasks, and decide between a single‑model and a multi‑model approach.
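The single-model versus multi-model choice can be sketched in Python. This is an illustration only: the decomposition of revenue into visitors, purchase rate, and average order value is an assumption made for this example, and every function and field name is hypothetical.

```python
def revenue_single_model(predict_revenue, deal):
    """Single-model approach: one model maps DEAL features straight to revenue."""
    return predict_revenue(deal)


def revenue_multi_model(predict_visitors, predict_purchase_rate,
                        predict_order_value, deal):
    """Multi-model approach: one model per sub-task, combined afterwards.

    Assumed decomposition (hypothetical, not taken from the article):
    revenue = visitors * purchase rate * average order value.
    """
    return (predict_visitors(deal)
            * predict_purchase_rate(deal)
            * predict_order_value(deal))


# Stub "models" standing in for trained predictors (made-up numbers).
deal = {"category": "restaurant"}
estimate = revenue_multi_model(
    lambda d: 1000.0,   # expected visitors
    lambda d: 0.5,      # expected purchase rate
    lambda d: 80.0,     # expected average order value
    deal,
)
```

The multi-model form makes each sub-task easier to inspect and tune separately, at the cost of errors compounding when the sub-predictions are multiplied together.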

Data preparation follows. The guide stresses that the data distribution must be consistent across the training, testing, and online environments, that label noise should be low, and that sampling should be applied with care and only when necessary.
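One common way to check whether a feature is distributed consistently between training data and online traffic is the Population Stability Index (PSI). The article does not name a specific check, so the sketch below swaps in PSI as an assumption; the function name, the 10-bin default, and the sample data are all illustrative.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bin edges come from the `expected` (training) sample; values outside
    that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [float(v) for v in range(100)]
online_shifted = [v + 30.0 for v in train]

same_score = psi(train, train)            # identical samples
drift_score = psi(train, online_shifted)  # shifted sample scores higher
```

A score near zero suggests the two samples are distributed alike; what counts as too large is a judgment call that depends on the feature and the business context.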

Feature extraction is then discussed. High‑level (generic) and low‑level (specific) features are introduced, with examples of POI and per‑capita consumption. The article notes that linear models require richer feature engineering, while non‑linear models can rely more on high‑level features.
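The difference between a specific and a generic feature can be sketched as follows. The field names `poi_id` and `per_capita_consumption` are hypothetical, and treating the POI identifier as the specific feature and per-capita consumption as the generic one is an assumption of this example.

```python
def extract_features(deal):
    """Build a sparse feature dictionary from one DEAL record."""
    features = {}
    # Specific feature: one indicator per POI id. Each indicator covers few
    # samples, so many are needed; this style suits linear models.
    features[f"poi_id={deal['poi_id']}"] = 1.0
    # Generic feature: a single numeric value shared across all POIs,
    # which a non-linear model can split on directly.
    features["per_capita_consumption"] = float(deal["per_capita_consumption"])
    return features


example = extract_features({"poi_id": 42, "per_capita_consumption": 85})
```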

Feature normalization techniques such as rescaling, standardization, and scaling to unit length are described, along with feature selection methods (filter, wrapper, embedded) to avoid over‑parameterization.
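The three normalization techniques named above can be sketched in plain Python. The function names are illustrative. Rescaling and standardization are usually applied per feature (across samples), while scaling to unit length is usually applied per sample vector.

```python
import math


def rescale(xs):
    """Min-max rescaling: map the values of one feature into [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # constant feature carries no information
    return [(x - lo) / (hi - lo) for x in xs]


def standardize(xs):
    """Standardization: zero mean and unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0 for _ in xs]
    return [(x - mean) / std for x in xs]


def unit_length(vector):
    """Scale one sample vector so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


prices = [10.0, 20.0, 30.0]
rescaled = rescale(prices)
standardized = standardize(prices)
direction = unit_length([3.0, 4.0])
```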

Model selection considerations include alignment with business objectives, compatibility with data and features, and the prevalence of the model in industry. Typical choices mentioned are Logistic Regression, GBDT, Random Forest, Linear SVM, and Deep Neural Networks.

The training section details the formulation of a hypothesis function and a loss function (e.g., the negative log‑likelihood obtained from maximum likelihood estimation for logistic regression) and introduces several optimization algorithms: gradient descent (stochastic and batch), Newton’s method, quasi‑Newton methods (BFGS, L‑BFGS, OWL‑QN), and coordinate descent.
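As a minimal sketch of these pieces for logistic regression, the code below defines the hypothesis function (a sigmoid over a linear score), the average negative log-likelihood as the loss, and batch gradient descent as the optimizer. The learning rate, epoch count, and toy data are assumptions chosen for illustration, not values from the article.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x):
    """Hypothesis function: probability that the label of x is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def log_loss(w, X, y, eps=1e-12):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def batch_gradient_descent(X, y, lr=0.5, epochs=200):
    """Each step uses the gradient averaged over the whole training set."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Toy data: the first column is a constant bias term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

w = batch_gradient_descent(X, y)
initial_loss = log_loss([0.0, 0.0], X, y)
final_loss = log_loss(w, X, y)
```

Stochastic gradient descent would update `w` after each sample instead of after a full pass, which is cheaper per step on large datasets but follows a noisier path.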

After training, the guide addresses model evaluation and common pitfalls such as under‑fitting and over‑fitting, providing a diagnostic table and practical remedies.
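A common way to tell the two failure modes apart is to compare training error with validation error. The sketch below encodes that rule of thumb. The thresholds are hypothetical and would need tuning per task, and the remedies listed are the usual textbook ones rather than a reproduction of the article's table.

```python
def diagnose(train_error, validation_error,
             acceptable_error=0.10, max_gap=0.05):
    """Classify a trained model from its training and validation error.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on.
        return "under-fitting"
    if validation_error - train_error > max_gap:
        # The model fits the training data but does not generalize.
        return "over-fitting"
    return "ok"


REMEDIES = {
    "under-fitting": ["add or refine features",
                      "use a more expressive model",
                      "reduce regularization"],
    "over-fitting": ["collect more training data",
                     "remove features",
                     "increase regularization"],
    "ok": [],
}

status = diagnose(train_error=0.02, validation_error=0.20)
suggestions = REMEDIES[status]
```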

Finally, the article summarizes key take‑aways: understand the business, ensure high‑quality data, perform thoughtful feature engineering, choose models that match the task, and iteratively diagnose and improve the model pipeline.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
