Design and Implementation of the Eagle Algorithm Strategy Platform at Ctrip
This article details the architecture, component design, DAG execution engine, optimization techniques, and real‑world performance gains of Ctrip's Eagle algorithm strategy platform, illustrating how modular, visualized, and automated workflow management improves development efficiency, stability, and resource utilization for large‑scale recommendation services.
In Ctrip's search, advertising, and recommendation business, the Eagle technology ecosystem serves as a core middle‑platform framework. It addresses the challenges of business expansion by providing a unified, configurable, and visualized algorithm strategy platform.
Background : Eagle unifies model training/inference, feature production/service, online strategy service, layered experiments, and monitoring, significantly boosting R&D efficiency and business outcomes. However, growing complexity introduced code redundancy, parameter explosion, black‑box logic, and iteration risks.
What is the Algorithm Strategy Platform? It is a highly configurable, transparent platform for search, advertising, and recommendation systems, offering visual tools for each pipeline stage (recall, ranking, model prediction) and integrating automated testing, real‑time monitoring, layered experiments, and debugging.
Overall Design :
Process componentization: three decoupled stages – operator development, strategy orchestration, task execution.
Visual orchestration: DAG‑based visual layout of strategy nodes and their relationships.
Logical operatorization: unified OP interfaces (OP‑Lib) with independent parameter contracts, reducing code duplication.
The platform is divided into three parts:
1) OP‑Organization : visual management of OP‑Lib code, strategy components, and DAG orchestration.
2) OP‑Lib : unified OP interface library, providing metadata, monitoring, and exception handling.
3) OP‑Engine : real‑time listener for DAG changes, supporting automatic optimization, dynamic trimming, rate limiting, circuit breaking, and monitoring.
Strategy Orchestration provides a visual, "what‑you‑see‑is‑what‑you‑get" interface for recall, coarse‑ranking, fine‑ranking, and re‑ranking stages, allowing users to edit strategy cards, view component groupings, and trace version history.
Standardized Release Process ensures safe, repeatable deployments via automated testing (Docker + k8s), gray‑release, and configuration rollback.
Design‑First Development encourages users to design reusable OPs before coding, reducing patchwork and improving maintainability.
OP metadata (class, I/O types, validation) is defined in configuration files and reported at runtime. OP implementations inherit from generic base classes such as NodeOp0<I,O,P> (accepts multiple same‑type upstream outputs) or NodeOpN<I1,I2,...,IN,O,P> (accepts N distinct upstream outputs), which provide the process(), fallback(), and paramFn() methods.
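To make the OP contract concrete, here is a minimal sketch of a NodeOp0-style base class. The article's signatures look like Java generics; Python is used here only for brevity. The method semantics are assumptions, not the platform's documented behavior: param_fn() is taken to parse raw configuration into typed parameters, and fallback() is taken to supply a degraded result when process() raises. The run() helper, TopKParams, and MergeTopK are hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, TypeVar

I = TypeVar("I")  # type of each upstream output
O = TypeVar("O")  # type of this OP's output
P = TypeVar("P")  # type of this OP's parameter contract


class NodeOp0(ABC, Generic[I, O, P]):
    """Hypothetical analogue of NodeOp0<I,O,P>: many same-type inputs, one output."""

    @abstractmethod
    def param_fn(self, raw: dict) -> P:
        """Parse raw configuration into typed parameters (assumed meaning of paramFn)."""

    @abstractmethod
    def process(self, inputs: List[I], params: P) -> O:
        """Main logic over all upstream outputs."""

    @abstractmethod
    def fallback(self, inputs: List[I], params: P) -> O:
        """Degraded result, used here when process() raises (assumed semantics)."""

    def run(self, inputs: List[I], raw_params: dict) -> O:
        # Hypothetical driver: the real engine's invocation path is not described.
        params = self.param_fn(raw_params)
        try:
            return self.process(inputs, params)
        except Exception:
            return self.fallback(inputs, params)


@dataclass
class TopKParams:
    k: int


class MergeTopK(NodeOp0[List[int], List[int], TopKParams]):
    """Example OP: merge several recall lists, de-duplicate, keep the first k items."""

    def param_fn(self, raw: dict) -> TopKParams:
        return TopKParams(k=int(raw["k"]))

    def process(self, inputs, params):
        merged: List[int] = []
        for candidates in inputs:
            for item in candidates:
                if item not in merged:
                    merged.append(item)
        return merged[: params.k]

    def fallback(self, inputs, params):
        return []  # an empty candidate list as the degraded result
```

With this sketch, `MergeTopK().run([[1, 2], [2, 3]], {"k": 3})` merges the two lists in order, while a malformed upstream output (for example `None`) triggers the fallback instead of propagating the exception.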
Debugging and Monitoring includes online debug mode, automatic testing, and extensive monitoring hooks for traffic, latency, and exceptions.
DAG Executor is a high‑performance engine supporting standard DAG compliance, OP reuse, nested DAGs, dynamic trimming, and extensible optimizers.
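The article does not define dynamic trimming. One plausible reading, sketched below as an assumption, is that the executor drops nodes whose output is not needed by the nodes actually requested. The DAG representation (a dict mapping each node to its upstream dependencies) and the function name trim_dag are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def trim_dag(deps: Dict[str, List[str]], targets: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the target nodes and everything they transitively depend on."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        # Walk backwards along dependency edges.
        stack.extend(deps.get(node, []))
    return {n: list(d) for n, d in deps.items() if n in needed}


# Hypothetical strategy graph: "debug_dump" and its input are not needed for "rerank".
example = {
    "recall_a": [],
    "recall_b": [],
    "rank": ["recall_a"],
    "debug_dump": ["recall_b"],
    "rerank": ["rank"],
}
trimmed = trim_dag(example, ["rerank"])
```

Here `trimmed` retains only `recall_a`, `rank`, and `rerank`, so the scheduler would never run the unused branch.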
Three execution strategies are provided:
BFS Serial Scheduling : single‑threaded layer‑by‑layer execution.
BFS Parallel Scheduling : thread‑pool parallelism for same‑level nodes.
Fully Asynchronous Reactive Scheduling : event‑driven task queue with main and worker threads, enabling nodes to run as soon as dependencies are satisfied.
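The three strategies above can be contrasted in a small sketch. It is a simplified illustration built on Python's standard thread pool, not the platform's executor: the DAG is assumed to be a dict from node to upstream dependencies, each OP is assumed to be a function of its upstream results, and all function names are hypothetical. In the reactive variant, the main thread only tracks dependency counts and submits a node the moment its last dependency finishes, so there is no per-layer barrier.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List

Dag = Dict[str, List[str]]             # node -> upstream dependencies
Ops = Dict[str, Callable[[list], object]]  # node -> function of upstream results


def _children(deps: Dag) -> Dag:
    children: Dag = {n: [] for n in deps}
    for node, ups in deps.items():
        for up in ups:
            children[up].append(node)
    return children


def bfs_layers(deps: Dag) -> List[List[str]]:
    """Kahn-style layering: a node lands one layer after its deepest dependency."""
    remaining = {n: len(ups) for n, ups in deps.items()}
    children = _children(deps)
    layer = [n for n, k in remaining.items() if k == 0]
    layers = []
    while layer:
        layers.append(layer)
        nxt = []
        for node in layer:
            for child in children[node]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    nxt.append(child)
        layer = nxt
    if sum(len(l) for l in layers) != len(deps):
        raise ValueError("graph contains a cycle")
    return layers


def run_bfs_serial(deps: Dag, ops: Ops) -> dict:
    results = {}
    for layer in bfs_layers(deps):
        for node in layer:  # single thread, one node at a time
            results[node] = ops[node]([results[u] for u in deps[node]])
    return results


def run_bfs_parallel(deps: Dag, ops: Ops, workers: int = 4) -> dict:
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for layer in bfs_layers(deps):
            futures = {n: pool.submit(ops[n], [results[u] for u in deps[n]]) for n in layer}
            for node, fut in futures.items():
                results[node] = fut.result()  # barrier: wait for the whole layer
    return results


def run_reactive(deps: Dag, ops: Ops, workers: int = 4) -> dict:
    remaining = {n: len(ups) for n, ups in deps.items()}
    children = _children(deps)
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        def submit(node):
            return pool.submit(ops[node], [results[u] for u in deps[node]])

        pending = {submit(n): n for n, k in remaining.items() if k == 0}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                node = pending.pop(fut)
                results[node] = fut.result()
                for child in children[node]:
                    remaining[child] -= 1
                    if remaining[child] == 0:  # last dependency just finished
                        pending[submit(child)] = child
    return results
```

All three functions return the same results for a valid DAG; they differ only in when a ready node is allowed to start, which is where the latency difference between layer-by-layer and reactive scheduling comes from.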
The platform also includes a chain‑of‑responsibility‑based DAG optimizer that merges adjacent nodes to reduce scheduling overhead.
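As a rough sketch of how such an optimizer might be structured, the example below chains two rewrite passes, each handing its result to the next handler. The merge rule is an assumption: a node is fused with its dependency only when it is that dependency's sole consumer and has no other dependencies. The article does not state the platform's actual merge conditions. The names Optimizer, dedupe_edges, and merge_linear_pairs are hypothetical, and the DAG is again a dict from node to upstream dependencies.

```python
from typing import Callable, Dict, List, Optional

Dag = Dict[str, List[str]]  # node -> upstream dependencies


def dedupe_edges(deps: Dag) -> Dag:
    """Drop repeated dependency entries while preserving order."""
    return {n: list(dict.fromkeys(ups)) for n, ups in deps.items()}


def merge_linear_pairs(deps: Dag) -> Dag:
    """Fuse u -> v when u is v's only dependency and v is u's only consumer."""
    deps = {n: list(ups) for n, ups in deps.items()}
    changed = True
    while changed:
        changed = False
        consumers: Dag = {n: [] for n in deps}
        for node, ups in deps.items():
            for up in ups:
                consumers[up].append(node)
        for v, ups in deps.items():
            if len(ups) == 1 and len(consumers[ups[0]]) == 1:
                u = ups[0]
                fused = f"{u}+{v}"  # the fused node would run u, then v, as one task
                rewritten: Dag = {}
                for node, node_ups in deps.items():
                    if node == u:
                        continue
                    if node == v:
                        rewritten[fused] = list(deps[u])
                    else:
                        rewritten[node] = [fused if x == v else x for x in node_ups]
                deps = rewritten
                changed = True
                break  # the graph changed, so recompute consumers
    return deps


class Optimizer:
    """One link in the chain: apply a rewrite, then pass the DAG to the next link."""

    def __init__(self, rewrite: Callable[[Dag], Dag], nxt: Optional["Optimizer"] = None):
        self.rewrite = rewrite
        self.next = nxt

    def handle(self, deps: Dag) -> Dag:
        deps = self.rewrite(deps)
        return self.next.handle(deps) if self.next else deps


chain = Optimizer(dedupe_edges, Optimizer(merge_linear_pairs))
optimized = chain.handle({
    "recall": [],
    "filter": ["recall"],
    "score": ["filter"],
    "side": [],
    "blend": ["score", "side"],
})
```

In this example the linear run recall, filter, score collapses into a single node, leaving three scheduled tasks instead of five. New passes can be added by wrapping another rewrite function in an Optimizer without touching the existing ones.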
Practical Impact in Recommendation Feed :
Strategy rollout time fell from days to hours for simple strategies, or to about half a day for complex ones.
Transparent DAG visualization improves communication and debugging.
Performance gains: overall latency fell by about 30%, and CPU cores dropped from 2,400 to 600 (a 75% reduction) after two optimization rounds.
Future plans include further executor optimization, OP performance improvements, full‑chain debugging, and broader platform adoption across Ctrip's business lines.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.