Feature Production Scheduling: Architecture Evolution and Core Technologies

Using Meituan‑Dianping’s hospitality online feature system as a case study, this article describes how feature production scheduling evolved from offline batch ETL to automated, metadata‑driven pipelines and then to sub‑second streaming. It covers the underlying architecture, incremental updates, the logical storage abstraction, write peak shaving, atomic updates, and recovery mechanisms.

Meituan Technology Team

In the previous article "Data Access Techniques in Online Feature Systems", the authors introduced storage and retrieval aspects of online feature systems. This follow‑up focuses on the equally important topic of feature production scheduling, using Meituan-Dianping's hospitality online feature system as a case study.

Feature Production Scheduling Evolution

From Offline to Online

The goal of an online feature system is to expose offline‑computed features via an API for downstream strategy services. Requirements include daily updates, hundred‑billion‑scale data, and sub‑20 ms latency at peak QPS of millions. The initial architecture writes daily offline features into a distributed KV store (Tair) via ETL and serves them through a Thrift‑based RPC service. Features are abstracted as Domain objects (e.g., Domain=ABC for user profile features) that encapsulate the feature set and its query dimension.

From Manual to Automated

As the number of Domains grew, manual ETL development became a bottleneck. The team introduced a metadata‑driven, platform‑based import workflow: users fill a small form, the system stores metadata (source DB, table, storage engine, key/value fields, update schedule, partitioning, etc.) in a MySQL Settings module, and a scheduler automatically generates and runs the import jobs. This reduced onboarding time from hours to minutes and added support for multiple storage engines (Tair, Squirrel, etc.).
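
As a minimal sketch of the metadata-driven idea, the snippet below turns one metadata record into a job description that a scheduler could run. The class `DomainImportConfig`, the function `build_import_job`, and every field name are hypothetical and chosen only to mirror the form fields listed above; the article does not show the real Settings schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainImportConfig:
    """Hypothetical metadata record for one Domain (not the real schema)."""
    domain: str
    source_db: str
    source_table: str
    storage_engine: str   # e.g. "tair" or "squirrel"
    key_fields: tuple
    value_fields: tuple
    schedule: str         # cron-style update schedule (assumed format)
    partition_field: str


def build_import_job(cfg: DomainImportConfig, partition_value: str) -> dict:
    """Derive a job description from metadata instead of hand-written ETL."""
    columns = ", ".join(cfg.key_fields + cfg.value_fields)
    query = (
        f"SELECT {columns} FROM {cfg.source_db}.{cfg.source_table} "
        f"WHERE {cfg.partition_field} = '{partition_value}'"
    )
    return {
        "job_name": f"import_{cfg.domain}_{partition_value}",
        "query": query,
        "sink": cfg.storage_engine,
        "schedule": cfg.schedule,
    }


cfg = DomainImportConfig(
    domain="ABC",
    source_db="dw",
    source_table="user_profile",
    storage_engine="tair",
    key_fields=("user_id",),
    value_fields=("age", "city"),
    schedule="0 3 * * *",
    partition_field="dt",
)
job = build_import_job(cfg, "20240101")
```

Because the job is derived entirely from the stored record, onboarding a new Domain only requires a new metadata row rather than new ETL code.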

From Day‑Level to Second‑Level

Real‑time features require sub‑second freshness. The team built a streaming platform based on Storm that consumes Kafka topics, applies configurable aggregation logic (sum, count, max, min, avg, distinct count, last, list) over fixed, sliding, or infinite windows, and writes results back to the KV store. A delay‑queue mechanism enables sliding‑window updates without retaining all raw events.
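
The delay-queue idea can be illustrated with a sliding-window sum. In this sketch, each event is added to the running total immediately, and a small compensation record is queued to subtract the same amount once the window length has elapsed. The class `SlidingWindowSum`, the in-memory heap, and the dictionary standing in for the KV store are all assumptions for illustration; a production system would use a durable delayed queue.

```python
import heapq


class SlidingWindowSum:
    """Sliding-window sum that uses a delay queue of compensation records."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.totals = {}        # stands in for the KV store
        self.delay_queue = []   # min-heap of (expire_at, key, amount)

    def on_event(self, key, amount, ts):
        self._expire(ts)
        self.totals[key] = self.totals.get(key, 0) + amount
        # Schedule the offsetting subtraction for when the event leaves the window.
        heapq.heappush(self.delay_queue, (ts + self.window, key, amount))

    def _expire(self, now):
        # Apply every compensation record whose delay has elapsed.
        while self.delay_queue and self.delay_queue[0][0] <= now:
            _, key, amount = heapq.heappop(self.delay_queue)
            self.totals[key] -= amount

    def get(self, key, now):
        self._expire(now)
        return self.totals.get(key, 0)


w = SlidingWindowSum(window_seconds=60)
w.on_event("u1", 5, ts=0)
w.on_event("u1", 3, ts=30)
```

The stored feature value is always a single number per key; only the pending compensation records are kept, not a scan-able history of raw events.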

Real‑time Feature Computation Platform

The platform supports 24 common feature types (combinations of three window kinds and eight aggregation functions). The processing flow consists of three abstract steps: read prior state, compute new value, and write back. Implementations for fixed windows embed the timestamp in the key; sliding windows use a delay queue to offset expired contributions; infinite windows rely on offline baselines plus incremental real‑time updates.
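
For the fixed-window case, the three steps can be sketched as follows. The key layout `feature:entity:window_start` and the helper names are assumptions for illustration, and a plain dictionary stands in for the KV store.

```python
def fixed_window_key(feature, entity_id, ts, window_seconds):
    """Embed the window start time in the key (assumed key layout)."""
    window_start = ts - ts % window_seconds
    return f"{feature}:{entity_id}:{window_start}"


def update_count(store, feature, entity_id, ts, window_seconds):
    """Read prior state, compute the new value, and write it back."""
    key = fixed_window_key(feature, entity_id, ts, window_seconds)
    prior = store.get(key, 0)   # 1. read prior state
    new_value = prior + 1       # 2. compute new value
    store[key] = new_value      # 3. write back
    return new_value


store = {}
update_count(store, "orders", "u1", ts=3605, window_seconds=3600)
update_count(store, "orders", "u1", ts=7199, window_seconds=3600)
update_count(store, "orders", "u1", ts=7200, window_seconds=3600)
```

Events in the same hour land on the same key, while the first event of the next hour starts a fresh key, so no explicit reset is needed.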

Real‑time Feature Optimization

To handle high QPS, the system adopts incremental computation (e.g., maintaining sum and count for averages) and approximate algorithms such as HyperLogLog for distinct counts.
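
A minimal sketch of the incremental average mentioned above: only the running sum and count are stored, so each new event costs constant time and no history has to be re-read. The class name `RunningAverage` is hypothetical.

```python
class RunningAverage:
    """Average maintained incrementally from a running sum and count."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x
        self.count += 1

    @property
    def value(self):
        # No events yet means there is no average to report.
        if self.count == 0:
            return None
        return self.total / self.count


avg = RunningAverage()
for amount in (10, 20, 60):
    avg.add(amount)
```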

Feature Production Scheduling Techniques

Logical Storage Layer

Domain metadata is decoupled from storage details via a Storage entity. This enables versioned data, read/write separation, and atomic switches between storage versions.
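
One way to picture the logical storage layer is a thin indirection between readers and versioned physical data. The sketch below is an assumption-laden illustration: `LogicalStorage` and its methods are hypothetical names, and in-memory dictionaries stand in for the physical stores.

```python
class LogicalStorage:
    """Readers see only the active version; writers load a new one off to the side."""

    def __init__(self):
        self._versions = {}   # version name -> data
        self._active = None

    def load_version(self, version, data):
        # Writes go to an inactive version, so readers are not affected.
        self._versions[version] = dict(data)

    def activate(self, version):
        if version not in self._versions:
            raise KeyError(f"unknown version: {version}")
        # A single reference swap makes the whole version visible at once.
        self._active = version

    def get(self, key):
        if self._active is None:
            return None
        return self._versions[self._active].get(key)


storage = LogicalStorage()
storage.load_version("20240101", {"u1": 0.3})
storage.activate("20240101")
storage.load_version("20240102", {"u1": 0.7})
before_switch = storage.get("u1")
storage.activate("20240102")
after_switch = storage.get("u1")
```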

Incremental Update and Data Consistency

Instead of full daily reloads, the team computes diffs between successive snapshots (SNAPSHOT) and writes only the changed keys, dramatically reducing load. Each record carries a lease; a record whose lease has expired is included in the next diff even if its value has not changed, which guarantees eventual consistency.
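
The diff-plus-lease rule can be sketched as below. The function `compute_diff`, the `written_at` bookkeeping, and the choice to report deletions separately are assumptions for illustration; the article does not specify how leases are stored.

```python
def compute_diff(previous, current, written_at, now, lease_seconds):
    """Return (upserts, deletes) between two snapshots.

    A key is rewritten if its value changed, if it is new, or if its lease
    has expired (its last write is at least lease_seconds old).
    """
    upserts = {}
    for key, value in current.items():
        changed = key not in previous or previous[key] != value
        last_write = written_at.get(key, float("-inf"))
        lease_expired = now - last_write >= lease_seconds
        if changed or lease_expired:
            upserts[key] = value
    deletes = [key for key in previous if key not in current]
    return upserts, deletes


previous = {"a": 1, "b": 2, "c": 3}
current = {"a": 1, "b": 20, "d": 4}
written_at = {"a": 0, "b": 0, "c": 0}

fresh_upserts, fresh_deletes = compute_diff(previous, current, written_at, now=100, lease_seconds=1000)
stale_upserts, stale_deletes = compute_diff(previous, current, written_at, now=100, lease_seconds=50)
```

With a long lease only the changed and new keys are written; with an expired lease the unchanged key `a` is rewritten as well, which is what repairs any write that was silently lost.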

Write Peak Shaving

Offline jobs are throttled by the scheduler (max concurrent sync jobs × per‑job concurrency ≤ storage write capacity). Real‑time writes are mediated by an Updater service that enforces per‑client rate limits and can reject or delay excess traffic.
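
Both throttles can be sketched briefly. The helper `max_concurrent_jobs` applies the inequality above, assuming capacity and per-job concurrency are expressed in the same unit. The `TokenBucket` class is one common way to enforce a per-client rate limit; the article does not say which algorithm the Updater service uses, so treat it as an illustrative substitute.

```python
def max_concurrent_jobs(storage_write_capacity, per_job_concurrency):
    """Largest job count with jobs * per_job_concurrency <= capacity."""
    return storage_write_capacity // per_job_concurrency


class TokenBucket:
    """Per-client rate limiter; the caller passes the clock for determinism."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now, cost=1):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # the caller may reject or delay the write


bucket = TokenBucket(rate_per_sec=2, capacity=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.5)]
```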

Atomic Update

Offline updates are atomic at day granularity: the logical storage layer switches to a new day's data version in one step. Real‑time updates achieve atomicity either by routing all updates for a group of keys to a single thread, or by exposing a CAS (compare‑and‑swap) API.
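
A CAS-based update typically follows a read, compute, and conditional-write loop that retries when another writer got there first. The sketch below assumes a version-number flavor of CAS; `VersionedStore`, `compare_and_swap`, and `atomic_add` are hypothetical names, not the API of Tair or Squirrel.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a KV store that exposes a versioned CAS."""

    def __init__(self):
        self._data = {}   # key -> (value, version)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key, (None, 0))

    def compare_and_swap(self, key, expected_version, new_value):
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return False   # someone else wrote first
            self._data[key] = (new_value, version + 1)
            return True


def atomic_add(store, key, delta, max_retries=100):
    """Retry the read-compute-write cycle until the CAS succeeds."""
    for _ in range(max_retries):
        value, version = store.get(key)
        if store.compare_and_swap(key, version, (value or 0) + delta):
            return True
    return False


cas_store = VersionedStore()
atomic_add(cas_store, "k", 5)
atomic_add(cas_store, "k", 3)
stale_write_ok = cas_store.compare_and_swap("k", 1, 100)
```

The final call uses an outdated version number, so it is rejected and the stored value is left intact.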

Data Fusion and Recovery

Offline calculations provide baselines for long‑term windows, while real‑time streams handle recent data. Periodic offline snapshots allow fast recovery: if a real‑time failure occurs, the system can roll back to the latest snapshot and replay streams from that point.
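
The fusion and recovery steps can be sketched for an additive feature such as a sum or count. The function names, the `(ts, key, amount)` event shape, and the rule that events at or before the snapshot time are already reflected in the snapshot are assumptions for illustration.

```python
def fused_value(offline_baseline, realtime_delta):
    """Long-window value = offline baseline + recent real-time increment."""
    return offline_baseline + realtime_delta


def recover(snapshot_state, snapshot_ts, event_log):
    """Roll back to a snapshot, then replay only the events after it."""
    state = dict(snapshot_state)
    for ts, key, amount in event_log:
        if ts > snapshot_ts:
            state[key] = state.get(key, 0) + amount
    return state


events = [(90, "u1", 1), (100, "u1", 2), (110, "u1", 3), (120, "u2", 4)]
recovered = recover({"u1": 10}, snapshot_ts=100, event_log=events)
```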

Conclusion

The online feature system now covers loading, computation, import, storage, and retrieval, but further work remains: supporting more offline frameworks, richer real‑time types, high‑availability real‑time computation, faster recovery, and integrated monitoring. The authors invite interested engineers to join Meituan’s data mining team.

Authors

Yang Hao – Head of Data Mining Systems, Meituan Platform & Hospitality Group (Peking University, 2011).

Wei Bin – Data Mining Systems Engineer, Meituan Platform & Hospitality Group (Dalian University of Technology, 2015).

Recruitment notice: Meituan’s data mining team is hiring for algorithm, big‑data system development, and Java backend positions. Interested candidates may send resumes to yanghao13#meituan.com.
