Four Essential Elements for Advancing Machine Learning Projects: Model, Data, Features, and Business

Advancing a machine‑learning project means focusing first on the core business problem, then designing comprehensive features, securing high‑quality data, and finally selecting an appropriate model. Business impact determines success, features and data set the performance ceiling, and model choice balances accuracy against interpretability.

Meituan Technology Team

This article is compiled from a talk given by Meituan-Dianping algorithm engineer Hu Hao on April 22 at the "Ctrip Technology Salon". The text combines the version published by the "Ctrip Technology Center" WeChat public account with Hu Hao's own Weibo posts, with minor editing.

Figure 1. Knowledge graph of a machine learning engineer

The diagram lists the points a successful ML engineer should focus on and accumulate: classification, regression, unsupervised models, Kaggle feature‑engineering tricks, handling class imbalance, missing‑value imputation, etc. These can be grouped into two major categories – models and features. Mastering both gives you a "green card" that lets you cross the entry threshold for data‑driven industries.

Once past that threshold, the critical skill is turning technical ability into value efficiently, that is, solving core business problems. This article describes the four elements of model optimization (model, data, features, and business) and discusses their relative priority in a project.

Four Elements of Model Project Advancement

During project execution, the priority order is roughly: Business > Features > Data > Model.

Figure 2. Problem‑solving hierarchy and priority of the four elements

Business

A good technical selection, complete feature system, and high‑quality data are valuable, but the ultimate determinant of a project's success is whether its technical goal addresses the core business problem.

A business problem has two aspects: the KPI and the deadline. Suppose, for example, that the goal is to reduce the risk of Alipay fraud following phone loss, and that it must be done within two weeks. A solution that reworks a core function (e.g., the password‑change logic) is unlikely to meet the deadline, whereas a quick model or rule‑based patch that targets the most common fraud patterns can be effective even if its accuracy is modest.

Often, business partners do not know how to communicate their goals effectively. Two illustrative questions were provided:

How to define and prioritize risk scores for offline stores in an anti‑fraud model?

How to predict delivery time for a region in the next 10 minutes under adverse weather, and whether training only on bad‑weather data suffices?

To keep a project on track, clarify three items at the start:

The core business problem and key scenarios.

Success metrics and evaluation criteria.

The key information the project will deliver to the business and how it will be used.

Throughout the project, keep checking back with the business side to confirm that the project is still healthy.

Data and Features

Data and features set the performance ceiling of a model: "garbage in, garbage out".

A common confusion is whether data and features are the same. Data are the collected raw information; features are engineered representations optimized for models (e.g., converting unstructured text into embeddings via word2vec).

Feature engineering is a meticulous, deliberate process built on data, ranging from traditional transformations and interactions to embeddings, word2vec, and high‑dimensional categorical encoding.
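To make the data/feature distinction concrete, here is a minimal sketch that turns one raw record into model‑ready features using a transformation, an interaction, and a hashed encoding of a high‑cardinality category. The record fields (`distance_m`, `is_raining`, `is_peak_hour`, `merchant_id`), the bucket count, and the function names are assumptions made up for illustration; they do not come from the talk.

```python
import math
import zlib


def hash_bucket(value: str, n_buckets: int) -> int:
    """Map a high-cardinality category to a stable bucket index (hashing trick)."""
    # crc32 is used instead of hash() so the bucket is the same across runs.
    return zlib.crc32(value.encode("utf-8")) % n_buckets


def build_features(order: dict, n_buckets: int = 8) -> dict:
    """Turn one raw order record (hypothetical schema) into numeric features."""
    features = {
        # Transformation: compress the long tail of delivery distances.
        "log_distance": math.log1p(order["distance_m"]),
        # Interaction: flag orders placed in the rain during peak hours.
        "rain_x_peak": float(order["is_raining"] and order["is_peak_hour"]),
    }
    # High-dimensional categorical encoding: one-hot over hashed buckets.
    bucket = hash_bucket(order["merchant_id"], n_buckets)
    for i in range(n_buckets):
        features[f"merchant_bucket_{i}"] = 1.0 if i == bucket else 0.0
    return features


# Raw data: what was collected. Features: what the model actually sees.
raw_order = {
    "distance_m": 1800,
    "is_raining": True,
    "is_peak_hour": True,
    "merchant_id": "m_10086",
}
features = build_features(raw_order)
```

Hashing keeps the feature width fixed no matter how many merchants exist, at the cost of occasional collisions between merchants that land in the same bucket.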

Two practical tips for comprehensive feature design:

Leverage existing base data.

Construct a business "2‑dimensional map" that abstracts the workflow into core dimensions.

Examples of dimension maps:

Food‑delivery ETA: (a) delivery stages, (b) granularity (order, merchant, region), (c) delivery type (crowdsourced, self‑operated).

Anti‑fraud variables: (a) fraud stages (login, registration, transfer, etc.), (b) fraud mediums (account, device, IP, WiFi, bank card).

Figure 3. Variable system and development workflow

By mapping these dimensions, you can spot missing features, as shown in Figure 4.

Figure 4. Sparse features in account‑transfer and red‑packet scenarios; missing WiFi medium and event data.
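The dimension map can be treated as a coverage grid: tag each existing feature with its position along every dimension, then list the cells that have no feature yet. The sketch below assumes a small hypothetical feature catalogue; the stage and medium values follow the anti‑fraud example, but the feature names are invented for illustration.

```python
from itertools import product

# Dimensions from the anti-fraud example (a subset of stages for brevity).
STAGES = ["login", "registration", "transfer"]
MEDIUMS = ["account", "device", "IP", "WiFi", "bank card"]

# Hypothetical catalogue: feature name -> (stage, medium) cell it belongs to.
existing_features = {
    "login_count_by_device_24h": ("login", "device"),
    "login_distinct_ip_7d": ("login", "IP"),
    "registration_accounts_per_device": ("registration", "device"),
    "transfer_amount_by_account_1h": ("transfer", "account"),
    "transfer_distinct_cards_24h": ("transfer", "bank card"),
}


def coverage(features, stages, mediums):
    """Group feature names by their (stage, medium) cell."""
    grid = {cell: [] for cell in product(stages, mediums)}
    for name, cell in features.items():
        grid[cell].append(name)
    return grid


def missing_cells(grid):
    """Return the cells of the map that no feature covers yet."""
    return [cell for cell, names in grid.items() if not names]


grid = coverage(existing_features, STAGES, MEDIUMS)
gaps = missing_cells(grid)
for stage, medium in gaps:
    print(f"no features yet for stage={stage!r}, medium={medium!r}")
```

In this made‑up catalogue every WiFi cell comes back empty, which is the kind of gap the map is meant to expose.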

Deep learning excels in image, speech, translation, and autonomous driving because data collection is mature. In bio‑informatics and genomics, data acquisition is costly and noisy, limiting AI impact.

Model

Figure 5. Technical selection and feature‑engineering roadmap for a full‑room booking system

Model selection often balances accuracy with business interpretability. For tasks requiring strong explainability (e.g., pricing, anti‑fraud), statistical learning models are preferable. As a rule of thumb for predictive performance within this family, the ordering is roughly: Glmnet (elastic net) > LASSO >= Ridge > unregularized Linear/Logistic Regression, although the actual ranking depends on the dataset.

Glmnet, developed at Stanford, fits lasso and elastic‑net regularized generalized linear models efficiently using coordinate descent, and is available in R, Python, and Spark.
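To show how LASSO, Ridge, and the elastic net relate, here is a minimal pure‑Python sketch of cyclical coordinate descent on the elastic‑net objective (1/2n)·||y − Xβ||² + λ·[α·||β||₁ + (1 − α)/2·||β||²]. It is not Glmnet's implementation: it assumes centered data with no intercept, and omits standardization, warm starts, and the regularization path. The toy data and function names are illustrative assumptions.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=200):
    """Coordinate descent; alpha=1 gives LASSO, alpha=0 gives Ridge."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + lam * (1.0 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Toy centered data: y = 2*x1 + 0.1*x2, with orthogonal columns.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]

lasso = elastic_net(X, y, lam=0.5, alpha=1.0)  # L1 only
ridge = elastic_net(X, y, lam=0.5, alpha=0.0)  # L2 only
enet = elastic_net(X, y, lam=0.5, alpha=0.5)   # mix of both
```

On this toy data the L1 penalty sets the weak second coefficient exactly to zero, while the pure L2 penalty only shrinks it. That sparsity is one reason L1‑regularized models are easier to explain to business stakeholders.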

For more complex models, the usual performance hierarchy is: Random Forest ≤ GBDT ≤ XGBoost. Of 29 Kaggle winning solutions, 17 used Boosting frameworks, with DNNs the next most common; Random Forests were rare.

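RF and GBDT both build on CART (Breiman, Friedman, Olshen & Stone, 1984). Ensemble methods split into two schools: Bagging/Stacking, where the trees are trained independently of one another, and Boosting, where each tree is trained in sequence to correct the errors of the trees before it. Compared with Random Forest, GBDT models are generally simpler and use less memory, and XGBoost further improves speed and model size.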
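The difference between the two schools can be sketched in a few lines of pure Python, using one‑split regression trees (stumps) on one‑dimensional data. This is an illustration of the two training loops only, not the Random Forest, GBDT, or XGBoost algorithms themselves: the toy data, learning rate, and function names are assumptions, and the boosting loop covers only the squared‑error case, where fitting the residuals is the gradient step.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data; returns a predict function."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_trees, rng):
    """Bagging: each stump sees its own bootstrap sample; predictions are averaged."""
    n, stumps = len(xs), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)


def boosting(xs, ys, n_trees, learning_rate=0.5):
    """Boosting: each stump is fit to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]  # toy target

bagged = bagging(xs, ys, n_trees=20, rng=random.Random(0))
boosted_1 = boosting(xs, ys, n_trees=1)
boosted_20 = boosting(xs, ys, n_trees=20)
print("bagging  (20 stumps):", mse(bagged, xs, ys))
print("boosting ( 1 stump ):", mse(boosted_1, xs, ys))
print("boosting (20 stumps):", mse(boosted_20, xs, ys))
```

The bagged stumps could be trained in parallel because none depends on another, whereas each boosting round needs the residuals from the round before it. In exchange, every boosting round reduces the remaining training error.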

Author Bio

Hu Hao, algorithm engineer at Meituan‑Dianping, Columbia University graduate, previously worked on algorithms at Ctrip and Alipay. Expertise includes risk control, genomics, travel, and real‑time logistics.
