
Design, Evolution, and Future of Ctrip's Operations Workflow Platform

This article details the challenges, architectural evolution, key components, implementation experiences, and future directions of Ctrip's operations workflow platform, illustrating how a multi‑stage, layered design and standardized services have transformed manual IT operations into an automated, observable, and scalable system.

Ctrip Technology

Author Introduction

Xu Haojie, a senior software engineer on Ctrip's Process Tools team, joined the company in 2013 and has extensive experience designing and building Ctrip's workflow platform.

Preface

As internet services have grown rapidly, operations work has become increasingly complex. This creates the need for a platform that coordinates people, tools, and processes and frees operators from inefficient, error‑prone manual tasks.

Challenges and Platform Evolution

Ctrip's operations workflow platform evolved over three years. It started on the commercial BMC Remedy ARS engine, was then abstracted into a platform that supports multiple open‑source engines, and was finally split into layers with standardized interfaces that support service standardization, process‑driven flow, and automation.

Three Evolution Stages

Stage 1 – Exploration (late 2013 to early 2014): Adopted BMC Remedy ARS and began building custom workflows on its engine.

Stage 2 – Maturity (late 2014 to early 2016): Developed numerous processes (server/app lifecycle, ENP, API gateway) and formed a nascent platform.

Stage 3 – Innovation (since late 2016): Addressed visualization, data mining, and single‑engine limitations by redesigning the platform with abstracted models, standardized visualization, and monitoring.

Platform Architecture

The platform consists of three layers:

Underlying Workflow Engine: Initially BMC Remedy, later extended to support multiple engines.

Middle Interface Gateway Layer: Includes OSG (Open Service Gateway), ENP (Event Notification Platform), and auxiliary tools. This layer provides unified service registration, access control, quality monitoring, logging, and protocol adaptation between RESTful and SOAP interfaces.

Top External Tool Services: All external tools consume the platform’s services through the gateway layer.

Standard Service Gateway – OSG

OSG registers external services, routes requests, and enforces flow control (rate limiting, circuit breaking). All interaction logs are collected in Elasticsearch for real‑time monitoring and troubleshooting.
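The registration, routing, and flow-control responsibilities described for OSG can be sketched in a few lines. This is a minimal illustration, not OSG's actual implementation: the `Gateway`, `TokenBucket`, and `CircuitBreaker` classes, the status codes, and all parameter names are assumptions made for this example, and log shipping to Elasticsearch is left out.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenBucket:
    """Rate limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float]):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits a retry after `cooldown` seconds."""

    def __init__(self, threshold: int, cooldown: float, clock: Callable[[], float]):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        return self.opened_at is None or self.clock() - self.opened_at >= self.cooldown

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()


class Gateway:
    """Hypothetical gateway: one rate limiter and one breaker per registered service."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.routes: Dict[str, Tuple[Callable[[dict], object], TokenBucket, CircuitBreaker]] = {}

    def register(self, name: str, handler: Callable[[dict], object], rate: float = 5.0,
                 burst: int = 5, threshold: int = 3, cooldown: float = 30.0) -> None:
        self.routes[name] = (
            handler,
            TokenBucket(rate, burst, self.clock),
            CircuitBreaker(threshold, cooldown, self.clock),
        )

    def call(self, name: str, payload: dict) -> dict:
        if name not in self.routes:
            return {"status": 404, "error": "unknown service"}
        handler, bucket, breaker = self.routes[name]
        if not breaker.available():
            return {"status": 503, "error": "circuit open"}
        if not bucket.allow():
            return {"status": 429, "error": "rate limited"}
        try:
            result = handler(payload)
        except Exception as exc:  # a failed backend call counts against the breaker
            breaker.record(False)
            return {"status": 502, "error": str(exc)}
        breaker.record(True)
        return {"status": 200, "data": result}
```

The clock is injected so the limiter and breaker can be exercised deterministically; a production gateway would also need thread safety and persistent service registration.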

Event Notification Platform – ENP

ENP formats and forwards messages between the workflow engine and external tools, reducing coupling and handling heterogeneous protocols (SOAP, RESTful). It supports subscription, notification, and feedback loops.
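One way to picture the subscribe, notify, and feedback loop is a small hub that formats each event for the protocol a subscriber asked for. The sketch below is an assumption-laden illustration rather than ENP's real design: `NotificationHub`, the topic name, and the message shapes are invented here, and delivery is a plain callback instead of a network call.

```python
import json
from typing import Callable, Dict, List, Tuple
from xml.sax.saxutils import escape


def to_rest(event: dict) -> str:
    """Format an event as a JSON body for RESTful subscribers."""
    return json.dumps(event, sort_keys=True)


def to_soap(event: dict) -> str:
    """Wrap an event in a SOAP 1.1 envelope (keys are assumed to be valid XML names)."""
    fields = "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in sorted(event.items()))
    return (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><Event>{fields}</Event></soap:Body></soap:Envelope>"
    )


FORMATTERS: Dict[str, Callable[[dict], str]] = {"rest": to_rest, "soap": to_soap}


class NotificationHub:
    """Hypothetical hub: subscribers pick a topic and a protocol; the publisher knows neither."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Tuple[str, Callable[[str], str]]]] = {}
        self.feedback: List[Tuple[str, str, str]] = []  # (topic, protocol, acknowledgement)

    def subscribe(self, topic: str, protocol: str, deliver: Callable[[str], str]) -> None:
        if protocol not in FORMATTERS:
            raise ValueError(f"unsupported protocol: {protocol}")
        self.subscribers.setdefault(topic, []).append((protocol, deliver))

    def publish(self, topic: str, event: dict) -> int:
        """Send the event to every subscriber of the topic and record each acknowledgement."""
        delivered = 0
        for protocol, deliver in self.subscribers.get(topic, []):
            ack = deliver(FORMATTERS[protocol](event))
            self.feedback.append((topic, protocol, ack))
            delivered += 1
        return delivered
```

Because formatting lives in the hub, the workflow engine publishes a single event shape and each tool receives it in its own protocol, which is the decoupling the paragraph above describes.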

Server Onboarding Process Example

The process consists of tasks and sub‑processes (e.g., VM pool entry), with defined business rules such as approvals, SLA, and CMDB updates. Visual diagrams illustrate task sequencing, parallel branches, and conditional gateways.
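Task sequencing, parallel branches, and a conditional step can be modeled as tasks with dependencies and an optional condition. The following is a simplified sketch under stated assumptions: the task names `install_os` and `configure_network`, the `is_vm` flag, and the `plan` function are hypothetical, and the real onboarding process is richer than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    name: str
    depends_on: List[str] = field(default_factory=list)
    # Conditional step: the task is skipped when the condition is false.
    condition: Optional[Callable[[dict], bool]] = None


def plan(tasks: List[Task], ctx: dict) -> List[List[str]]:
    """Return execution waves; tasks in the same wave can run in parallel."""
    pending: Dict[str, Task] = {t.name: t for t in tasks}
    resolved = set()  # tasks that were either executed or skipped
    waves: List[List[str]] = []
    while pending:
        ready = [t for t in pending.values() if all(d in resolved for d in t.depends_on)]
        if not ready:
            raise ValueError("cycle or missing dependency")
        wave = [t.name for t in ready if t.condition is None or t.condition(ctx)]
        if wave:
            waves.append(wave)
        for t in ready:
            resolved.add(t.name)
            del pending[t.name]
    return waves


# Hypothetical onboarding flow: approval, two parallel tasks,
# a VM-only sub-process, then the CMDB update.
ONBOARDING = [
    Task("approve"),
    Task("install_os", depends_on=["approve"]),
    Task("configure_network", depends_on=["approve"]),
    Task("vm_pool_entry", depends_on=["install_os", "configure_network"],
         condition=lambda ctx: ctx["is_vm"]),
    Task("update_cmdb", depends_on=["vm_pool_entry"]),
]
```

Calling `plan(ONBOARDING, {"is_vm": False})` yields the same waves as the VM case minus the `vm_pool_entry` wave, since a skipped task still unblocks its successors in this sketch.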

Visualization of Running Instances

A dashboard shows all active workflow instances with their status, duration, and SLA compliance, and lets users drill down to task details and the responsible teams.
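The per-instance figures on such a dashboard come down to simple arithmetic over start and finish times. As a rough sketch, assuming each instance carries a start time, an optional finish time, an SLA duration, and an owning team (the `Instance` fields and `summarize` function are invented for this example):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Instance:
    id: str
    started: datetime
    finished: Optional[datetime]  # None while the instance is still running
    sla: timedelta
    team: str


def summarize(instances: List[Instance], now: datetime) -> List[dict]:
    """Build one dashboard row per instance; running instances are measured up to `now`."""
    rows = []
    for inst in instances:
        duration = (inst.finished or now) - inst.started
        rows.append({
            "id": inst.id,
            "status": "completed" if inst.finished else "running",
            "duration_h": round(duration.total_seconds() / 3600, 1),
            "sla_met": duration <= inst.sla,
            "team": inst.team,
        })
    return rows
```

Measuring running instances against the current time means an SLA breach shows up while the work is still in flight, not only after it completes.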

Key Achievements

A 10× improvement in service delivery speed: the server onboarding cycle dropped from as long as two weeks to a matter of hours.

Standardized, process‑driven, and automated operations that eliminate inefficient manual steps.

Stronger service standardization and end‑to‑end business process flow across teams and tools.

Next‑Generation Platform Design

The improved architecture introduces adapters for multiple workflow engines (Camunda, Activiti, Airflow, etc.), a BPMN module (repository, runtime, history, identity), and an enhanced interface gateway. The Mario front end provides unified visualization and reporting.
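The adapter idea can be illustrated with a common interface that the platform calls, plus one adapter per engine behind it. This is a sketch under assumptions, not the platform's actual code: `EngineAdapter`, `WorkflowPlatform`, and their methods are hypothetical, and `InMemoryAdapter` is a stand-in that deliberately avoids calling any real engine API.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Dict


class EngineAdapter(ABC):
    """Hypothetical contract every engine adapter would implement."""

    @abstractmethod
    def start(self, process_key: str, variables: dict) -> str:
        """Start a process instance and return its identifier."""

    @abstractmethod
    def status(self, instance_id: str) -> str:
        """Return the current state of an instance."""


class InMemoryAdapter(EngineAdapter):
    """Stand-in for a real engine; a real adapter would call that engine's own API."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._ids = itertools.count(1)
        self._instances: Dict[str, dict] = {}

    def start(self, process_key: str, variables: dict) -> str:
        instance_id = f"{self.prefix}-{next(self._ids)}"
        self._instances[instance_id] = {
            "process": process_key, "variables": dict(variables), "state": "running",
        }
        return instance_id

    def status(self, instance_id: str) -> str:
        return self._instances[instance_id]["state"]


class WorkflowPlatform:
    """Routes each process definition to the engine it is bound to."""

    def __init__(self) -> None:
        self._adapters: Dict[str, EngineAdapter] = {}
        self._bindings: Dict[str, str] = {}  # process key -> engine name
        self._owners: Dict[str, str] = {}    # instance id -> engine name

    def register_engine(self, name: str, adapter: EngineAdapter) -> None:
        self._adapters[name] = adapter

    def bind(self, process_key: str, engine: str) -> None:
        if engine not in self._adapters:
            raise KeyError(f"unknown engine: {engine}")
        self._bindings[process_key] = engine

    def start(self, process_key: str, variables: dict) -> str:
        engine = self._bindings[process_key]
        instance_id = self._adapters[engine].start(process_key, variables)
        self._owners[instance_id] = engine
        return instance_id

    def status(self, instance_id: str) -> str:
        return self._adapters[self._owners[instance_id]].status(instance_id)
```

Callers only ever see `WorkflowPlatform`, so a process can be moved to a different engine by changing its binding rather than its callers.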

Advanced Features

Rich visualization of business‑related metrics and historical trends.

Monitoring and alerting via EITS, with email notifications and detailed alarm handling.

Support for complex workflows: parallel branches, exclusive/inclusive gateways, and merging, enabling sophisticated process modeling.
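The difference between the two gateway types in the last item is easy to show in code: an exclusive gateway follows only the first outgoing flow whose condition holds, while an inclusive gateway follows every flow whose condition holds. The sketch below follows standard BPMN semantics; the function names, flow targets, and context keys are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Flow = Tuple[str, Callable[[dict], bool]]  # (target task, condition)


def exclusive_gateway(flows: List[Flow], ctx: dict, default: Optional[str] = None) -> List[str]:
    """Take exactly one path: the first flow whose condition is true, else the default."""
    for target, condition in flows:
        if condition(ctx):
            return [target]
    if default is None:
        raise ValueError("no condition matched and no default flow defined")
    return [default]


def inclusive_gateway(flows: List[Flow], ctx: dict, default: Optional[str] = None) -> List[str]:
    """Take every path whose condition is true, else the default."""
    taken = [target for target, condition in flows if condition(ctx)]
    if taken:
        return taken
    if default is None:
        raise ValueError("no condition matched and no default flow defined")
    return [default]


# Hypothetical outgoing flows from a gateway in a server request process.
FLOWS: List[Flow] = [
    ("manager_approval", lambda ctx: ctx["cost"] > 1000),
    ("security_review", lambda ctx: ctx["public_facing"]),
]
```

With a context where both conditions hold, the exclusive gateway yields only `manager_approval`, while the inclusive gateway yields both targets, which must later be merged before the process continues.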

Future Outlook

Intelligence: Automatic analysis of alarm data, self‑healing, and assisted diagnostics to reduce manual effort.

Self‑service Orchestration: Enable users to compose and publish workflow fragments in a marketplace, allowing rapid, low‑code process creation.

Recommended reading links are provided at the end of the original article.
