Design and Implementation of a Distributed Causal Forest Framework on Meituan's Fulfillment Platform

Meituan's Fulfillment Platform team built Causal On Spark, a high-performance distributed causal-forest framework that trains 100 trees on a hundred-million-sample dataset within half an hour. It relies on MapReduce-based histogram splitting, extensive memory optimizations, Parquet model serving, and new distributed evaluation metrics, enabling scalable causal inference for pricing, subsidies, and marketing.

Meituan Technology Team

Jan 25, 2024

Meituan's Fulfillment Platform technology team has built a series of distributed tools for causal inference. This article introduces the implementation of a distributed causal tree algorithm, discusses the design of the framework, and addresses shortcomings of existing qini_curve/qini_score metrics.

Business background: Causal inference is increasingly used in pricing, subsidies, and marketing to provide counterfactual predictions. Tree-based causal models, especially causal forests, offer strong interpretability and are easier to tune than meta-learners or deep representation methods.

Need for a distributed solution: Existing open-source projects (EconML, DoWhy, CausalML, and grf) run on a single machine only and cannot handle the billions of samples required in industrial scenarios. Meituan therefore created a high-performance distributed causal forest framework that can train 100 trees on a hundred-million-sample dataset within half an hour.

Framework architecture: The system consists of four modules: (1) training entry and parameter abstraction, (2) sample conversion (histogram construction and feature discretization), (3) forest growth implemented with MapReduce, and (4) model persistence and serving. The design abstracts loss functions, so a new causal forest algorithm can be added by implementing only its tree-growth logic.
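To make the sample-conversion module concrete, here is a minimal sketch of feature discretization, assuming approximately equal-frequency (quantile) cut points. The article does not say how COS chooses its bin boundaries, so the function names `quantile_bin_edges` and `to_bin` and the binning rule are illustrative assumptions, not the framework's API.

```python
from bisect import bisect_left


def quantile_bin_edges(values, max_bins):
    """Return sorted, de-duplicated cut points at roughly equal-frequency positions.

    Hypothetical helper: the real framework may pick boundaries differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(1, max_bins):
        cut = ordered[min(n - 1, (i * n) // max_bins)]
        # Skip repeated cut points so heavily tied features get fewer bins.
        if not edges or cut > edges[-1]:
            edges.append(cut)
    return edges


def to_bin(value, edges):
    """Map a raw feature value to a bin index in [0, len(edges)].

    Bin k holds values v with edges[k-1] < v <= edges[k].
    """
    return bisect_left(edges, value)


if __name__ == "__main__":
    raw = [5, 1, 8, 3, 2, 7, 4, 6]
    edges = quantile_bin_edges(raw, max_bins=4)
    print(edges, [to_bin(v, edges) for v in raw])
```

Once every feature value is replaced by a small bin index, split finding only needs per-bin statistics rather than the raw sorted values.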

Technical choices: After comparing pre-sorted and histogram-based split finding, the team selected the histogram approach for its lower time complexity and memory usage. For distributed computation, MapReduce was chosen over AllReduce and Parameter Server because of the massive sample size and development cost considerations.
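The combination of histogram splitting and MapReduce can be sketched as follows: each partition builds a per-bin histogram of treated and control statistics (map), histograms are summed element-wise (reduce), and the merged histogram is scanned once for the best bin boundary. This is a single-process illustration under assumptions: the function names are hypothetical, and the split score (sum over children of n times the squared estimated effect) is one common heterogeneity criterion for causal trees, not necessarily the loss COS uses.

```python
from functools import reduce


def build_histogram(rows, num_bins):
    """Map step: rows are (bin_index, treatment, outcome) for one feature."""
    hist = [[0, 0.0, 0, 0.0] for _ in range(num_bins)]  # n_t, sum_t, n_c, sum_c
    for b, t, y in rows:
        if t == 1:
            hist[b][0] += 1
            hist[b][1] += y
        else:
            hist[b][2] += 1
            hist[b][3] += y
    return hist


def merge_histograms(a, b):
    """Reduce step: element-wise sum of two partial histograms."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def effect(stats):
    """Difference in mean outcome between treated and control samples."""
    n_t, s_t, n_c, s_c = stats
    return s_t / n_t - s_c / n_c


def best_split(hist, min_arm=1):
    """Scan bin boundaries; return (last_left_bin, score) or (None, -inf)."""
    total = [sum(col) for col in zip(*hist)]
    left = [0, 0.0, 0, 0.0]
    best = (None, float("-inf"))
    for b in range(len(hist) - 1):
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(total, left)]
        # Both children need treated and control samples to estimate an effect.
        if min(left[0], left[2], right[0], right[2]) < min_arm:
            continue
        n_l, n_r = left[0] + left[2], right[0] + right[2]
        score = n_l * effect(left) ** 2 + n_r * effect(right) ** 2
        if score > best[1]:
            best = (b, score)
    return best


if __name__ == "__main__":
    partitions = [
        [(0, 1, 1.0), (0, 0, 1.0), (1, 1, 3.0), (1, 0, 1.0)],
        [(0, 1, 1.0), (0, 0, 1.0), (1, 1, 3.0), (1, 0, 1.0)],
    ]
    merged = reduce(merge_histograms, (build_histogram(p, 2) for p in partitions))
    print(best_split(merged))
```

Because the histograms are small and simply additive, only per-bin sums cross the network, which is what makes a plain map-then-reduce pattern workable at large sample counts.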

Performance optimizations: The team stored histogram bin indices as signed bytes (a 4× memory reduction), used a BitSet for sample-tree membership flags (as little as 1/32 of the memory), eliminated redundant histograms on nodes that have already been grown, and shrank the cache by keeping only the node histograms that are still needed. Together these changes lowered memory consumption to roughly one-sixth of the original implementation and enabled training of hundreds of trees on billions of samples.
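The first two savings are easy to reproduce in miniature. The sketch below uses Python's standard `array` and `bytearray` types as stand-ins; the real framework runs on Spark and presumably uses JVM types, so `bin_storage_bytes` and this `BitSet` class are illustrative assumptions rather than COS code.

```python
from array import array


def bin_storage_bytes(num_samples, typecode):
    """Bytes needed to hold one bin index per sample for an array typecode."""
    return array(typecode).itemsize * num_samples


class BitSet:
    """Fixed-size bit set: one bit per sample instead of one integer flag."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def set(self, i):
        self.bits[i >> 3] |= 1 << (i & 7)

    def get(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def nbytes(self):
        return len(self.bits)


if __name__ == "__main__":
    n = 1_000_000
    # "b" is a signed byte; "i" is a C int, 4 bytes on typical platforms.
    print(bin_storage_bytes(n, "b"), bin_storage_bytes(n, "i"))
    in_tree = BitSet(n)  # membership of each sample in one tree's training set
    in_tree.set(42)
    print(in_tree.get(42), in_tree.nbytes())
```

The trade-off of a one-byte bin index is a cap on bins per feature: a signed byte can represent at most 256 distinct values.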

Serving implementation: Models are stored in Parquet format for field extensibility, compact storage, and efficient columnar reads. The serving layer uses a lightweight JAR (cos-serving) that can load models without requiring a full platform upgrade, ensuring backward and forward compatibility.
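One way to see why a columnar, field-extensible format helps compatibility is to picture a tree as a flat table of node rows, where a reader picks out only the columns it understands. The sketch below uses plain dictionaries in place of Parquet rows; the column names (`node_id`, `feature`, `threshold_bin`, `left`, `right`, `effect`) are a hypothetical schema, since the article does not describe the actual model layout.

```python
KNOWN_COLUMNS = ("node_id", "feature", "threshold_bin", "left", "right", "effect")


def load_nodes(rows):
    """Index node rows by id, keeping only the columns this reader knows.

    Unknown columns written by a newer trainer are ignored; columns missing
    from an older model come back as None.
    """
    return {r["node_id"]: {k: r.get(k) for k in KNOWN_COLUMNS} for r in rows}


def predict_effect(nodes, binned_features, root_id=0):
    """Walk from the root to a leaf and return the leaf's stored effect."""
    node = nodes[root_id]
    while node["feature"] is not None:  # internal nodes carry a split feature
        go_left = binned_features[node["feature"]] <= node["threshold_bin"]
        node = nodes[node["left"] if go_left else node["right"]]
    return node["effect"]


if __name__ == "__main__":
    rows = [
        {"node_id": 0, "feature": "distance", "threshold_bin": 3, "left": 1, "right": 2},
        {"node_id": 1, "effect": 0.5, "n_samples": 120},  # extra column from a newer writer
        {"node_id": 2, "effect": -0.1, "n_samples": 80},
    ]
    nodes = load_nodes(rows)
    print(predict_effect(nodes, {"distance": 2}), predict_effect(nodes, {"distance": 7}))
```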

Distributed causal effect evaluation: The article describes unbiasedness checks (X⊥T for the data, ITE⊥T for the model), extensions to the qini curve (qini_pred_curve_counterfactual and qini_pred_curve) that assess both the ranking and the magnitude of treatment effects, and distributed implementations of these metrics on Spark. Additional metrics such as avgITE vs. CATE, MAE/MSE/RMSE, and multi-treatment evaluation are also provided.
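For orientation, here is a minimal sketch of the standard qini curve that these extensions build on, computed from per-bucket sums so that the aggregation step could in principle be distributed. It is not the article's qini_pred_curve or qini_pred_curve_counterfactual, whose definitions are not given here. The exact global sort, the bucket count, and the unnormalized score are simplifying assumptions; a Spark version would likely replace the sort with approximate quantile bucketing.

```python
def qini_curve(rows, num_buckets):
    """rows: (predicted_uplift, treatment, outcome). Returns one value per bucket.

    Value after bucket k: treated outcomes so far minus control outcomes so far,
    with the control sum rescaled to the treated sample count.
    """
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    n = len(ranked)
    buckets = [[0, 0.0, 0, 0.0] for _ in range(num_buckets)]  # n_t, sum_t, n_c, sum_c
    for rank, (_, t, y) in enumerate(ranked):
        b = rank * num_buckets // n
        if t == 1:
            buckets[b][0] += 1
            buckets[b][1] += y
        else:
            buckets[b][2] += 1
            buckets[b][3] += y
    curve = []
    n_t, s_t, n_c, s_c = 0, 0.0, 0, 0.0
    for bt, st, bc, sc in buckets:
        n_t, s_t, n_c, s_c = n_t + bt, s_t + st, n_c + bc, s_c + sc
        # Assumption: with no control samples yet, nothing is subtracted.
        curve.append(s_t - (s_c * n_t / n_c if n_c else 0.0))
    return curve


def qini_score(curve):
    """Unnormalized mean gap between the curve and a straight random baseline."""
    k = len(curve)
    return sum(q - curve[-1] * (i + 1) / k for i, q in enumerate(curve)) / k


if __name__ == "__main__":
    data = [(0.9, 1, 1.0), (0.8, 0, 0.0), (0.2, 1, 0.0), (0.1, 0, 0.0)]
    curve = qini_curve(data, num_buckets=2)
    print(curve, qini_score(curve))
```

Because the standard curve depends only on the ordering of predictions, it says nothing about whether predicted effect sizes are correct, which is the gap the article's magnitude-aware extensions are meant to close.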

Conclusion: After two years of iteration, the "Causal On Spark" (COS) toolkit supports end-to-end causal inference workflows, including training, evaluation, bias correction, and serving, and has been integrated into Meituan's Turing Machine Learning Platform.
