
How Opsflow Revolutionized Youzan's DevOps Workflow Management

This article examines the evolution of Youzan's Opsflow workflow engine, detailing its architecture, components, and how it solved numerous operational challenges such as low customizability, lack of progress visibility, and fragmented approval processes, while outlining its current status and future roadmap.

Youzan Coder

Background

As Youzan’s scale grew, its DevOps operations became increasingly complex, creating a need for more efficient coordination among development, operations, tools, and processes, so that engineers could be freed from low‑efficiency, high‑intensity manual tasks.

Opsflow, the workflow engine of Youzan’s DevOps platform, evolved over two years from a simple fixed‑order script system to a highly customizable, GUI‑driven, visual, and progress‑aware engine that now supports hundreds of daily workflows such as permission requests, component provisioning, big‑data approvals, release approvals, and CI/CD pipelines.

Problems Before Opsflow

1. Low customizability of processes

2. Participants could not perceive workflow progress

3. Lack of visualisation leading to manual checks and errors

4. Limited front‑end customisation

5. Duplicated approval processes across applications

6. No support for dynamic branching

7. Legacy system could not handle approver leave

8. Insufficient participant type support

9. High cost of onboarding new processes

10. No centralized reporting for operational insight

Opsflow System Design

2.1 Architecture

Opsflow consists of five core modules:

Opsflow‑FSM: manages finite‑state machines (FSM) for each workflow.

Opsflow‑Web: wraps FSM with RESTful APIs, handles authentication, and interacts with other DevOps subsystems.

Opsflow‑Plugins: an extensible plugin system that reacts to FSM events.

Worker: distributed task executor (based on Celery) that processes script nodes.

Monitoring module: tracks task consumption.

2.1.1 Opsflow‑FSM

Each workflow is represented as an FSM. When an administrator creates a new workflow via the GUI, a new FSM instance is stored in RDS. The FSM drives the ticket lifecycle through states such as "ES Administrator Approval" and actions like "Approve", "Reject", or "Close". The FSM advances the ticket to the "End" state upon completion.
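The FSM idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Opsflow's code: the class name WorkflowFSM, the "Start" and "Provision ES" states, and the "Submit" and "Done" actions are hypothetical; only "ES Administrator Approval", "Approve", "Reject", "Close", and "End" come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowFSM:
    """A tiny finite-state machine: (state, action) -> next state."""

    transitions: dict[tuple[str, str], str]
    state: str = "Start"
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def available_actions(self) -> list[str]:
        """Actions that are legal from the current state, sorted by name."""
        return sorted(a for (s, a) in self.transitions if s == self.state)

    def fire(self, action: str) -> str:
        """Apply an action, or raise if it is not allowed in this state."""
        key = (self.state, action)
        if key not in self.transitions:
            raise ValueError(f"{action!r} is not allowed in state {self.state!r}")
        previous, self.state = self.state, self.transitions[key]
        self.history.append((previous, action, self.state))
        return self.state


# Hypothetical "New ES Request" workflow (state and action names partly assumed).
ES_REQUEST = {
    ("Start", "Submit"): "ES Administrator Approval",
    ("ES Administrator Approval", "Approve"): "Provision ES",
    ("ES Administrator Approval", "Reject"): "End",
    ("ES Administrator Approval", "Close"): "End",
    ("Provision ES", "Done"): "End",
}

ticket = WorkflowFSM(transitions=dict(ES_REQUEST))
ticket.fire("Submit")
print(ticket.available_actions())  # ['Approve', 'Close', 'Reject']
ticket.fire("Approve")
ticket.fire("Done")
print(ticket.state)  # End
```

Keeping the transition table as plain data is what makes it plausible to store one definition per workflow in a database and edit it from a GUI.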

2.1.2 Opsflow‑Web

Opsflow‑Web exposes the FSM through RESTful APIs, adds permission checks, and renders actionable buttons on the front‑end based on the FSM’s possible transitions. For example, during the "New ES Request" flow, the web layer presents three buttons corresponding to the three possible transitions.
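One way to picture how the web layer could derive buttons from the FSM is shown below. It is a sketch under assumptions: the TRANSITIONS table, the role names es_admin and applicant, and the shape of the button descriptors are all hypothetical and not taken from Opsflow's API.

```python
# Hypothetical table: state -> {action: role allowed to trigger it}.
TRANSITIONS = {
    "ES Administrator Approval": {
        "Approve": "es_admin",
        "Reject": "es_admin",
        "Close": "applicant",
    },
}


def buttons_for(state: str, user_roles: set[str]) -> list[dict]:
    """Return one button descriptor per transition the user may trigger."""
    actions = TRANSITIONS.get(state, {})
    return [
        {"action": action, "label": action}
        for action, required_role in actions.items()
        if required_role in user_roles
    ]


# A user holding both roles sees all three buttons for this state.
print(buttons_for("ES Administrator Approval", {"es_admin", "applicant"}))
```

Because the buttons are computed from the transition table, adding a transition in the workflow definition would surface a new button without any front-end change.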

2.1.3 Opsflow‑Plugins

Plugins receive events emitted by the FSM during state transitions. Simple plugins (e.g., enterprise‑WeChat notifications, task reminders) can be added by implementing a callback interface, enabling rapid feature extensions without bloating the core engine.
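A callback-style plugin system of this kind might look like the following minimal sketch. The PluginRegistry class, the event field names (ticket_id, from_state, to_state), and the notify_approver plugin are assumptions made for illustration; the real callback interface is not described in this article.

```python
from typing import Callable

Event = dict  # assumed event shape: a plain dict describing one transition
Callback = Callable[[Event], None]


class PluginRegistry:
    """Holds plugin callbacks and fans each FSM event out to all of them."""

    def __init__(self) -> None:
        self._callbacks: list[Callback] = []

    def register(self, callback: Callback) -> Callback:
        """Usable as a decorator; returns the callback unchanged."""
        self._callbacks.append(callback)
        return callback

    def emit(self, event: Event) -> list[Exception]:
        """Call every plugin; collect errors so one faulty plugin cannot block the rest."""
        errors: list[Exception] = []
        for callback in self._callbacks:
            try:
                callback(event)
            except Exception as exc:
                errors.append(exc)
        return errors


registry = PluginRegistry()
sent_messages: list[str] = []


@registry.register
def notify_approver(event: Event) -> None:
    # Stand-in for an enterprise-WeChat notification: just record the text.
    sent_messages.append(f"Ticket {event['ticket_id']} moved to {event['to_state']}")


registry.emit(
    {"ticket_id": 42, "from_state": "Start", "to_state": "ES Administrator Approval"}
)
print(sent_messages)  # ['Ticket 42 moved to ES Administrator Approval']
```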

2.1.4 Worker

When a workflow reaches a "script" node, Opsflow‑Web pushes a task to a message queue. Workers consume these tasks, execute the required actions (e.g., provisioning ES resources), and then trigger the FSM to continue. Workers use Celery, allowing horizontal scaling when the queue builds up.
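The queue-and-worker pattern described here can be sketched with the standard library alone. Note the substitution: this uses queue.Queue and threads as a stand-in for Celery and its message broker, so it shows the shape of the flow rather than Opsflow's actual worker. The provision_es script, the SCRIPTS table, and the task fields are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for the message queue
results: dict[int, str] = {}


def provision_es(ticket_id: int) -> str:
    """Hypothetical script node; a real one would call a provisioning API."""
    return f"es-cluster-for-ticket-{ticket_id}"


SCRIPTS = {"provision_es": provision_es}


def worker() -> None:
    """Consume tasks until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            output = SCRIPTS[task["script"]](task["ticket_id"])
            # At this point a real worker would trigger the FSM to continue.
            results[task["ticket_id"]] = output
        finally:
            task_queue.task_done()


def run(tasks: list[dict], worker_count: int = 2) -> None:
    """Start workers, enqueue tasks, then stop each worker with one sentinel."""
    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)
    for thread in threads:
        thread.join()


run([{"script": "provision_es", "ticket_id": 1}])
print(results[1])  # es-cluster-for-ticket-1
```

Raising worker_count is the in-process analogue of adding Celery workers when the queue builds up.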

2.1.5 Front‑end

The front‑end provides default components such as process diagrams, ticket progress, and detail views. Administrators can configure visibility and order of these components. Custom React components can be loaded dynamically (via react‑loadable) for specific workflows, receiving rich properties that contain all ticket data.

Process Diagram Rendering

Opsflow‑FSM stores the workflow as a Directed Acyclic Graph (DAG). The front‑end renders this graph with the dagre‑d3 library, whose rank‑based layout algorithm assigns nodes to layers and produces readable flow diagrams.
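To make "rank-based layout" concrete, the sketch below assigns each node of a DAG to a layer using simple longest-path layering. It is written in Python for illustration only: dagre-d3 is a JavaScript library and its own ranking step is more elaborate, so this is a simplified stand-in rather than a description of what dagre does internally. The node names in the example are hypothetical.

```python
from collections import defaultdict, deque


def assign_ranks(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Longest-path layering: sources get rank 0, every other node gets
    1 + the largest rank among its predecessors."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    rank = {node: 0 for node in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while ready:  # Kahn-style topological traversal
        node = ready.popleft()
        visited += 1
        for nxt in successors[node]:
            rank[nxt] = max(rank[nxt], rank[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return rank


workflow_edges = [
    ("Start", "ES Administrator Approval"),
    ("ES Administrator Approval", "Provision ES"),
    ("ES Administrator Approval", "End"),
    ("Provision ES", "End"),
]
print(assign_ranks(workflow_edges))
```

Nodes with the same rank would be drawn on the same row (or column), which is what gives the diagram its top-to-bottom flow.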

How Opsflow Addresses the Original Problems

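Problems 1‑4, 9: A GUI lets administrators configure FSM nodes and edges, spot problems in a flow at a glance, and adjust them immediately. Together with the new front‑end structure, this addresses low customizability, missing progress visibility, the lack of visualisation, limited front‑end customisation, and the high cost of onboarding new processes.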

Problem 5: Consolidating most processes onto Opsflow eliminates duplicated effort across applications.

Problem 6: Conditional expressions enable dynamic branching. Example expression:

{row_count} >= 1000000 and not {upload}

During execution, placeholders such as {row_count} and {upload} are replaced with ticket attributes, allowing the FSM to decide the next node based on runtime data.
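A conditional expression of this form could be evaluated roughly as follows. This is a sketch, assuming the expressions use Python-like syntax as the example suggests; the function name evaluate_condition and the whitelist of permitted syntax are assumptions, and the article does not say how Opsflow parses or sandboxes its expressions.

```python
import ast
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Only boolean logic, comparisons, names, and literals are accepted.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
    ast.Name, ast.Load, ast.Constant,
)


def evaluate_condition(expression: str, ticket: dict) -> bool:
    """Evaluate an expression such as '{row_count} >= 1000000 and not {upload}'."""
    # Turn each "{name}" placeholder into a bare variable name.
    source = PLACEHOLDER.sub(r"\1", expression)
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in ticket:
            raise KeyError(f"ticket has no attribute {node.id!r}")
    code = compile(tree, "<condition>", "eval")
    # No builtins are exposed; names resolve only against ticket attributes.
    return bool(eval(code, {"__builtins__": {}}, dict(ticket)))


condition = "{row_count} >= 1000000 and not {upload}"
print(evaluate_condition(condition, {"row_count": 2_000_000, "upload": False}))  # True
print(evaluate_condition(condition, {"row_count": 2_000_000, "upload": True}))   # False
```

Rejecting everything outside a small syntax whitelist keeps administrator-written conditions from calling functions or touching attributes, at the cost of a less expressive language.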

Problem 7: Integration with internal OA automatically substitutes approvers with their leave proxies.

Problem 8: Opsflow supports a wide range of participant types, including configurable users, logical AND/OR groups, team leaders, custom scripts (Python), leave proxies, internal app approvals, external system notifications, enterprise‑WeChat alerts, and escalation reminders.
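Two of the ideas above, leave proxies and logical AND/OR groups, can be combined in a short sketch. Everything here is hypothetical: the Group class, the resolve and is_satisfied helpers, and the user names are illustrative, and the on_leave mapping stands in for whatever the internal OA system returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    mode: str  # "AND": every member must approve; "OR": any one member suffices
    members: tuple[str, ...]


def resolve(group: Group, on_leave: dict[str, str]) -> Group:
    """Replace each member who is on leave with their proxy."""
    return Group(group.mode, tuple(on_leave.get(m, m) for m in group.members))


def is_satisfied(group: Group, approved_by: set[str]) -> bool:
    """Check whether the approvals collected so far satisfy the group."""
    hits = [member in approved_by for member in group.members]
    return all(hits) if group.mode == "AND" else any(hits)


# bob is on leave and carol acts as his proxy (assumed data).
approvers = resolve(Group("AND", ("alice", "bob")), on_leave={"bob": "carol"})
print(approvers.members)                            # ('alice', 'carol')
print(is_satisfied(approvers, {"alice"}))           # False
print(is_satisfied(approvers, {"alice", "carol"}))  # True
```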

Problem 10: Periodic statistical jobs generate dashboards that give administrators a clear view of each workflow’s operational health.

Current Status

After more than a year of iteration, Opsflow now supports over 90 distinct workflows covering all aspects of DevOps, big‑data platforms, and even the beauty‑industry department. The system has markedly improved usability, functionality, extensibility, and stability.

Roadmap

Future plans include cross‑environment workflow synchronization (e.g., QA ↔︎ Prod), an open management console for developers to create and edit workflows, workflow cloning, a more user‑friendly mobile approval experience, and additional features to further accelerate new process onboarding.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
