Machine Learning Q&A: Data Imputation, Feature Selection, Recommendation Systems and More

This article answers ten machine-learning questions. It covers imputing missing behavior data, extracting and selecting features, Meituan-Dianping's recommendation pipeline, a learning path for beginners, why L1 regularization yields sparsity, TextCNN for text classification, sample bias in search ranking, label generation for wide & deep models, the shift to deep learning in video object detection, and factorization machines for CTR prediction with open-source examples.

Meituan Technology Team

Oct 12, 2017

Q1: How can missing user behavior data be inferred from relatively complete data?

If the missing behavior data are numeric, train a predictive model on the records where the value is present and fill the gaps with its predictions. For non-numeric data such as event logs, analyze the distribution of the observed data and fill the missing parts by random sampling from that distribution, constrained by rule-based logic.
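The two strategies above can be sketched in plain Python. This is only an illustration under stated assumptions: the field names (`visits`, `orders`), the single-predictor least-squares model, and the `allowed` rule set are all hypothetical and do not come from the original answer. A real pipeline would likely use a richer model and more features.

```python
import random
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def impute_numeric(rows, feature, target):
    """Fill missing `target` values with predictions made from `feature`."""
    known = [r for r in rows if r[target] is not None]
    intercept, slope = fit_line([r[feature] for r in known],
                                [r[target] for r in known])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[target] is None:
            r[target] = intercept + slope * r[feature]
        filled.append(r)
    return filled

def impute_categorical(values, rng, allowed=None):
    """Fill missing entries by sampling the empirical distribution of observed ones.

    `allowed` is an optional (hypothetical) business rule that restricts
    which observed values may be drawn.
    """
    observed = [v for v in values
                if v is not None and (allowed is None or v in allowed)]
    counts = Counter(observed)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    return [v if v is not None else rng.choices(labels, weights=weights)[0]
            for v in values]

# Hypothetical toy data: one numeric gap and two missing event types.
rows = [
    {"visits": 1, "orders": 2.0},
    {"visits": 2, "orders": 4.0},
    {"visits": 3, "orders": None},
    {"visits": 4, "orders": 8.0},
]
filled = impute_numeric(rows, "visits", "orders")

events = ["click", "click", None, "order", None]
filled_events = impute_categorical(events, random.Random(0),
                                   allowed={"click", "order"})
```

Sampling in proportion to observed frequencies keeps the filled data close to the original distribution, which a constant fill value would distort.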

Q2: How to perform feature extraction and remove useless features? Which recommendation algorithms does Meituan‑Dianping use?

1. Feature extraction requires understanding the business data and logic; transform raw features where needed. Feature selection can rely on tree models (for example with XGBoost), L1 regularization, and similar techniques.

2. Meituan-Dianping's recommendation system has two stages: a recall stage that combines collaborative filtering, location, search queries, real-time user behavior, and other signals, and a ranking stage based on Learning-to-Rank techniques.

Q3: What is the optimal learning path for beginners in machine learning? Any recommended book list?

Start with Li Hang's "Statistical Learning Methods" for the fundamentals, alongside the "Machine Learning" course on Coursera. If you prefer video lectures, consider the two video series by Hsuan-Tien Lin (Lin Xuantian) of National Taiwan University. Then practice on simple competition problems, such as basic click-through rate prediction, using tools such as pandas and scikit-learn.

Q4: How is feature selection usually performed for machine learning models?

There are two main categories. (1) Use a fixed evaluation metric, such as the information gain used by ID3 decision trees, to rank features, and feed the selected features into the model. (2) Use model feedback: iteratively add the feature that most improves prediction performance until the gains plateau, or start from all features and iteratively remove them. The first approach is fast but gets no feedback from the model; the second usually yields better results but is slower.
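As a rough illustration of the two categories, the sketch below ranks discrete features by information gain and then runs a greedy forward selection driven by a scoring function. The feature names, the toy data, and the `lookup_accuracy` scorer are hypothetical; in practice the scorer would be cross-validated performance of the actual model.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_by_information_gain(columns, labels):
    """Category 1: score each feature once, with no feedback from a model."""
    scores = {name: information_gain(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

def lookup_accuracy(names, columns, labels):
    """Toy scorer: training accuracy of a majority-vote lookup table."""
    votes = {}
    for i, label in enumerate(labels):
        key = tuple(columns[n][i] for n in names)
        votes.setdefault(key, Counter())[label] += 1
    correct = sum(c.most_common(1)[0][1] for c in votes.values())
    return correct / len(labels)

def forward_select(columns, labels, evaluate, min_gain=1e-9):
    """Category 2: greedily add the feature that most improves `evaluate`."""
    selected = []
    best = evaluate(selected, columns, labels)
    remaining = set(columns)
    while remaining:
        score, name = max((evaluate(selected + [n], columns, labels), n)
                          for n in sorted(remaining))
        if score - best <= min_gain:
            break  # gains have plateaued
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best

# Hypothetical toy data: the label happens to equal `is_weekend`.
columns = {
    "is_weekend": [0, 0, 1, 1, 0, 1],
    "device": ["ios", "android", "ios", "android", "ios", "android"],
    "noise": [0, 0, 0, 0, 0, 0],
}
labels = [0, 0, 1, 1, 0, 1]

ranking, scores = rank_by_information_gain(columns, labels)
selected, best_score = forward_select(columns, labels, lookup_accuracy)
```

The ranking needs one pass per feature, while forward selection re-evaluates the scorer for every candidate at every step, which is where the speed difference comes from.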

Q5: What is the mathematical principle behind L1 regularization yielding sparse solutions?

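From the perspective of gradient descent, the (sub)gradient of the L1 penalty has constant magnitude wherever a parameter is non-zero, so the pull toward zero does not weaken as the parameter shrinks. The gradient of an L2 penalty, by contrast, is proportional to the parameter and vanishes near zero. This makes it much easier for the optimizer to drive some coefficients exactly to zero under L1.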
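A small worked example may make this concrete. Assume a one-dimensional quadratic loss 0.5*(w - a)^2, where a is the unpenalized optimum; this setup and the values below are illustrative assumptions, not part of the original answer. With an L1 penalty lam*|w| the minimizer is the soft-thresholding function, which returns exactly zero whenever |a| <= lam. With an L2 penalty 0.5*lam*w^2 the minimizer only rescales a and never reaches zero unless a is already zero.

```python
def soft_threshold(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # the constant pull `lam` outweighs the loss gradient

def ridge_shrink(a, lam):
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2."""
    return a / (1.0 + lam)

# Hypothetical unpenalized coefficients and penalty strength.
unpenalized = [3.0, 0.4, -0.2, -1.5]
lam = 0.5

l1_solution = [soft_threshold(a, lam) for a in unpenalized]
l2_solution = [ridge_shrink(a, lam) for a in unpenalized]

print(l1_solution)  # small coefficients become exactly 0.0
print(l2_solution)  # every coefficient shrinks but stays non-zero
```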

Q6: Are there text classification algorithms that significantly outperform the traditional tf‑idf/word2vec + linear‑SVM/Bayes pipeline?

TextCNN is strongly recommended. If there is too little training data to learn the embeddings well, word2vec vectors can be used in their place and still deliver good performance.

Q7: How should the training sample set for search ranking be selected? Does using only exposed items bias the model?

Training typically uses only exposed items. In practice the resulting bias is limited, because the extracted features are general enough for the model to learn to score unexposed results effectively as well. The recall layer exists to reduce the computational load of the ranking stage while preserving the results relevant to the user.

Q8: In wide & deep recommendation models, are training labels manually annotated or generated automatically?

Labels for wide & deep models are automatically generated from user behavior data.
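A minimal sketch of what automatic labeling could look like follows. The log schema, the identifiers, and the rule "clicked means positive, exposed but not clicked means negative" are assumptions for illustration; the original answer only states that labels come from user behavior data.

```python
def build_labels(impressions, clicks):
    """Label each exposed (user, item) pair 1 if it was clicked, else 0."""
    clicked = set(clicks)
    return [(user, item, 1 if (user, item) in clicked else 0)
            for user, item in impressions]

# Hypothetical behavior logs.
impressions = [("u1", "shop_a"), ("u1", "shop_b"), ("u2", "shop_a")]
clicks = [("u1", "shop_b")]

samples = build_labels(impressions, clicks)
```

Stronger signals such as orders could be labeled the same way by swapping in a different event log.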

Q9: Previously, video object detection used the ViBe algorithm. Are deep-learning methods now dominant?

Yes. Deep-learning methods now dominate video object detection; the top five entries in the ILSVRC 2016 VID competition all used deep learning. ViBe offers good real-time performance but lags behind deep-learning approaches in accuracy.

Q10: Has anyone used Factorization Machines (FM) for click‑through rate prediction? Any open‑source engineering code available?

Yes. Several CTR competitions on Kaggle, such as the Criteo Display Advertising Challenge and Avazu CTR Prediction, have publicly available code that can serve as a reference for FM implementations.
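For readers who want to see the model itself before reading competition code, here is a minimal sketch of the second-order FM scoring function. It uses the standard linear-time rewrite of the pairwise term and checks it against the naive double loop. All parameter values are hypothetical, training is omitted, and this is not taken from any particular competition solution.

```python
import math

def fm_score(x, w0, w, v):
    """Second-order FM score using the O(k*n) form of the pairwise term."""
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_score_naive(x, w0, w, v):
    """Reference version: explicit sum over all feature pairs i < j."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score

def predicted_ctr(score):
    """Map the raw FM score to a click probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical parameters: 3 one-hot style features, 2 latent factors each.
x = [1.0, 0.0, 1.0]
w0 = -1.0
w = [0.1, 0.2, 0.3]
v = [[0.5, -0.5], [0.1, 0.2], [1.0, 0.5]]

fast = fm_score(x, w0, w, v)
naive = fm_score_naive(x, w0, w, v)
ctr = predicted_ctr(fast)
```

Each feature gets a latent vector, and the interaction weight of a feature pair is the dot product of their vectors, which lets the model estimate interactions for pairs that rarely co-occur in sparse CTR data.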
