Design, Evolution, and Future of Ctrip's Operations Workflow Platform
This article details the challenges, architectural evolution, key components, implementation experiences, and future directions of Ctrip's operations workflow platform, illustrating how a multi‑stage, layered design and standardized services have transformed manual IT operations into an automated, observable, and scalable system.
Author Introduction
Xu Haojie, a senior software engineer in Ctrip's Process Tools team, joined in 2013 and has extensive experience in designing and building Ctrip's workflow platform.
Preface
With rapid internet development, operations have become increasingly complex, prompting the need for a platform that coordinates people, tools, and processes to free operators from inefficient, error‑prone manual tasks.
Challenges and Platform Evolution
Ctrip's operations workflow platform evolved over three years. It started on the commercial BMC Remedy ARS engine, was then abstracted into a platform that supports multiple open‑source engines, and was finally split into layers with standardized interfaces that support service standardization, process‑driven operations, and automation.
Three Evolution Stages
Stage 1 – Exploration (late 2013 to early 2014): Adopted BMC Remedy ARS and began building custom workflows on its engine.
Stage 2 – Maturity (late 2014 to early 2016): Developed numerous processes (server/app lifecycle, ENP, API gateway) and formed a nascent platform.
Stage 3 – Innovation (since late 2016): Addressed visualization, data mining, and single‑engine limitations by redesigning the platform with abstracted models, standardized visualization, and monitoring.
Platform Architecture
The platform consists of three layers:
Underlying Workflow Engine: Initially BMC Remedy, later extended to support multiple engines.
Middle Interface Gateway Layer: Includes OSG (Open Service Gateway), ENP (Event Notification Platform), and auxiliary tools, providing unified service registration, access control, quality monitoring, logging, and protocol adaptation between RESTful and SOAP interfaces.
Top External Tool Services: All external tools consume the platform’s services.
Standard Service Gateway – OSG
OSG registers external services, routes requests, and enforces flow control (rate limiting, circuit breaking). All interaction logs are collected in Elasticsearch for real‑time monitoring and troubleshooting.
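The combination of registration, routing, and flow control described for OSG can be sketched as follows. This is a minimal illustration, not OSG's actual implementation: the class names (`TokenBucket`, `CircuitBreaker`, `Gateway`), the status codes returned, and all default thresholds are assumptions chosen for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Rate limiter: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float]):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits calls again after `cooldown` seconds."""

    def __init__(self, threshold: int, cooldown: float, clock: Callable[[], float]):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


class Gateway:
    """Hypothetical gateway: a registry of handlers, each guarded by flow control."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.services = {}

    def register(self, name, handler, rate=5.0, burst=5, threshold=3, cooldown=30.0):
        self.services[name] = (
            handler,
            TokenBucket(rate, burst, self.clock),
            CircuitBreaker(threshold, cooldown, self.clock),
        )

    def call(self, name, payload):
        if name not in self.services:
            return {"status": 404, "error": "unknown service"}
        handler, limiter, breaker = self.services[name]
        if not breaker.allow():
            return {"status": 503, "error": "circuit open"}
        if not limiter.allow():
            return {"status": 429, "error": "rate limited"}
        try:
            result = handler(payload)
        except Exception as exc:  # any handler failure counts against the breaker
            breaker.record(False)
            return {"status": 502, "error": str(exc)}
        breaker.record(True)
        return {"status": 200, "result": result}
```

Passing the clock in as a parameter keeps the flow-control logic testable without sleeping. A real gateway would also write each call to a log store, which the article says is Elasticsearch in OSG's case.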
Event Notification Platform – ENP
ENP formats and forwards messages between the workflow engine and external tools, reducing coupling and handling heterogeneous protocols (SOAP, RESTful). It supports subscription, notification, and feedback loops.
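The subscribe, notify, and feedback cycle with per-subscriber protocol formatting might look like the sketch below. It is illustrative only: `EventNotifier`, the topic name, and the message layouts are assumptions, and the SOAP formatter produces just a bare envelope rather than a full WSDL-conformant message.

```python
import json
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def to_rest(event: dict) -> str:
    """Serialize the event as a JSON body for RESTful subscribers."""
    return json.dumps(event, sort_keys=True)


def to_soap(event: dict) -> str:
    """Wrap the event in a minimal SOAP 1.1 envelope (keys must be valid XML names)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    node = ET.SubElement(body, "Event")
    for key, value in sorted(event.items()):
        ET.SubElement(node, key).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


FORMATTERS = {"rest": to_rest, "soap": to_soap}


class EventNotifier:
    """Hypothetical notifier: publishers and subscribers only share topic names."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of (name, protocol, deliver)
        self.feedback = []     # (topic, subscriber name, acknowledgement)

    def subscribe(self, topic, name, protocol, deliver):
        if protocol not in FORMATTERS:
            raise ValueError(f"unsupported protocol: {protocol}")
        self.subscribers.setdefault(topic, []).append((name, protocol, deliver))

    def publish(self, topic, event) -> int:
        """Format the event per subscriber, deliver it, and record the feedback."""
        targets = self.subscribers.get(topic, [])
        for name, protocol, deliver in targets:
            message = FORMATTERS[protocol](event)
            try:
                ack = deliver(message)
            except Exception as exc:  # a failing subscriber must not block the others
                ack = f"error: {exc}"
            self.feedback.append((topic, name, ack))
        return len(targets)
```

Because the publisher hands over a plain dictionary and never sees the wire format, adding a new protocol only means adding one entry to `FORMATTERS`.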
Server Onboarding Process Example
The process consists of tasks and sub‑processes (e.g., VM pool entry), with defined business rules such as approvals, SLA, and CMDB updates. Visual diagrams illustrate task sequencing, parallel branches, and conditional gateways.
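Task sequencing with parallel branches can be modeled as a dependency graph. The sketch below assumes a hypothetical set of onboarding tasks (only the approval, VM pool entry, and CMDB update steps are named in the article; the rest are invented for illustration) and uses Python's standard `graphlib` module (Python 3.9+) to group tasks into waves that could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
ONBOARDING = {
    "approve_request": set(),
    "rack_and_cable": {"approve_request"},
    "install_os": {"rack_and_cable"},
    "configure_network": {"rack_and_cable"},
    "vm_pool_entry": {"install_os", "configure_network"},  # modeled as one step here
    "update_cmdb": {"vm_pool_entry"},
}


def execution_waves(graph):
    """Return lists of tasks; tasks within one list have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(ONBOARDING), start=1):
        print(number, wave)
```

In this graph `install_os` and `configure_network` land in the same wave, which corresponds to a parallel branch that joins again before `vm_pool_entry`.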
Visualization of Running Instances
A dashboard shows all active workflow instances with their status, duration, and SLA compliance, and allows drill‑down to task details and responsible teams.
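The per-instance figures such a dashboard displays (status, duration, SLA compliance) can be derived from start and finish timestamps. The following is a sketch under assumed field names; `Instance` and `summarize` are hypothetical and do not reflect the platform's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    process: str
    started: datetime
    finished: Optional[datetime]  # None while the instance is still running
    sla: timedelta                # maximum allowed end-to-end duration


def summarize(instance: Instance, now: datetime) -> dict:
    """Build one dashboard row; running instances are measured up to `now`."""
    end = instance.finished or now
    duration = end - instance.started
    return {
        "id": instance.instance_id,
        "status": "completed" if instance.finished else "running",
        "duration_minutes": round(duration.total_seconds() / 60),
        "within_sla": duration <= instance.sla,
    }
```

Measuring running instances against the current time lets the dashboard flag an SLA breach while the work is still in progress, rather than only after completion.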
Key Achievements
10× improvement in service delivery speed, reducing onboarding cycle from up to two weeks to hours.
Standardized services, process‑driven operations, and full automation, eliminating inefficient manual steps.
Next‑Generation Platform Design
The improved architecture introduces adapters for multiple workflow engines (Camunda, Activiti, Airflow, etc.), a BPMN module (repository, runtime, history, identity), and an enhanced interface gateway. The Mario front end provides unified visualization and reporting.
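The adapter idea can be sketched as a common interface that each engine integration implements, so the rest of the platform never calls an engine directly. Everything below is hypothetical: `EngineAdapter`, `InMemoryAdapter`, and `WorkflowPlatform` are illustrative names, and the in-memory adapter stands in for a real integration that would call the engine's own API.

```python
import itertools
from abc import ABC, abstractmethod


class EngineAdapter(ABC):
    """Engine-neutral contract the platform programs against."""

    @abstractmethod
    def start_process(self, definition_key: str, variables: dict) -> str:
        """Start an instance and return its engine-specific id."""

    @abstractmethod
    def get_status(self, instance_id: str) -> str:
        """Return a normalized status string."""


class InMemoryAdapter(EngineAdapter):
    """Stand-in for a real engine; a production adapter would call the engine's API here."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self._ids = itertools.count(1)
        self._instances = {}

    def start_process(self, definition_key, variables):
        instance_id = f"{self.prefix}-{next(self._ids)}"
        self._instances[instance_id] = "running"
        return instance_id

    def get_status(self, instance_id):
        return self._instances.get(instance_id, "unknown")


class WorkflowPlatform:
    """Routes each process definition to the engine it was deployed on."""

    def __init__(self):
        self._adapters = {}
        self._routes = {}  # definition key -> engine name

    def register_engine(self, name: str, adapter: EngineAdapter) -> None:
        self._adapters[name] = adapter

    def deploy(self, definition_key: str, engine: str) -> None:
        if engine not in self._adapters:
            raise KeyError(f"engine not registered: {engine}")
        self._routes[definition_key] = engine

    def start(self, definition_key: str, variables=None):
        engine = self._routes[definition_key]
        instance_id = self._adapters[engine].start_process(definition_key, variables or {})
        return engine, instance_id

    def status(self, engine: str, instance_id: str) -> str:
        return self._adapters[engine].get_status(instance_id)
```

With this shape, supporting another engine means writing one more `EngineAdapter` subclass; callers of `WorkflowPlatform` are unaffected.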
Advanced Features
Rich visualization of business‑related metrics and historical trends.
Monitoring and alerting via EITS, with email notifications and detailed alarm handling.
Support for complex workflows: parallel branches, exclusive/inclusive gateways, and merging, enabling sophisticated process modeling.
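The difference between the gateway types lies in how many outgoing flows are taken. The sketch below follows the usual BPMN splitting semantics (exclusive takes the first matching flow, inclusive takes every matching flow, parallel takes all flows regardless of conditions). The `select_flows` function and the flow names are hypothetical, and merging (joins) is not covered.

```python
from typing import Callable, List, Optional, Tuple

# An outgoing flow: (target task name, condition on process variables or None).
Flow = Tuple[str, Optional[Callable[[dict], bool]]]


def select_flows(kind: str, flows: List[Flow], variables: dict,
                 default: Optional[str] = None) -> List[str]:
    """Return the targets activated by a splitting gateway of the given kind."""
    if kind == "parallel":
        # Parallel gateways ignore conditions and activate every branch.
        return [target for target, _ in flows]
    if kind not in ("exclusive", "inclusive"):
        raise ValueError(f"unknown gateway kind: {kind}")
    matches = [target for target, cond in flows
               if cond is not None and cond(variables)]
    if kind == "exclusive":
        matches = matches[:1]  # only the first matching flow, in declaration order
    if matches:
        return matches
    if default is not None:
        return [default]
    raise RuntimeError("no outgoing flow matched and no default flow is set")


if __name__ == "__main__":
    flows = [
        ("vm_path", lambda v: v["type"] == "vm"),
        ("cost_audit", lambda v: v["cost"] > 1000),
    ]
    request = {"type": "vm", "cost": 5000}
    print(select_flows("exclusive", flows, request))
    print(select_flows("inclusive", flows, request))
```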
Future Outlook
Intelligence: Automatic analysis of alarm data, self‑healing, and assisted diagnostics to reduce manual effort.
Self‑service Orchestration: Letting users compose workflow fragments and publish them in a marketplace, allowing rapid, low‑code process creation.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.