
Design and Application of Kuaishou's Dragonfly Strategy Engine Framework

This article explains how Kuaishou tackled the growing complexity of its recommendation system by developing the Dragonfly strategy engine framework, detailing the challenges, architectural abstractions, DSL-based workflow composition, data handling, ecosystem tools, and future development plans.

DataFunSummit

Problem and Challenges – Since 2018, Kuaishou’s daily active users have grown from 100 million to 376 million, and its recommendation scenarios have expanded from a few pages to hundreds. The teams grew rapidly as a result, and two demands emerged: building new recommendation scenarios quickly and reusing proven strategies efficiently. The existing practice of copying C++ architecture code for each new scenario became unsustainable. It led to high maintenance costs, tight coupling between algorithm and engineering code, and frequent large‑scale refactoring cycles.

Dragonfly Framework Overview – Dragonfly is a general‑purpose graph‑engine framework for Kuaishou’s search, advertising, and recommendation domain. It provides a unified base engine for strategy services and a flexible DSL for workflow orchestration. Most algorithm development shifts from C++ to Python, while execution still runs on a high‑performance C++ runtime.

Strategy Orchestration – Using a Python DSL, developers define a flow object composed of operators (e.g., recall, filter, ranking). The DSL is compiled into JSON, which the C++ runtime then executes. Both synchronous and asynchronous operators are supported without exposing low‑level details to the developer.
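The compile step described above can be illustrated with a minimal sketch. The source does not show Dragonfly's actual API, so everything here is an assumption made for illustration: the `Flow` class, its method names, the parameters, and the JSON layout are all hypothetical. The only point it carries over from the text is that Python describes the pipeline and a JSON document is what gets handed to the runtime.

```python
import json


class Flow:
    """Hypothetical flow builder: each method call appends one operator."""

    def __init__(self, name):
        self.name = name
        self.ops = []

    def _add(self, op_type, **params):
        self.ops.append({"type": op_type, "params": params})
        return self  # returning self allows calls to be chained

    def recall(self, source, limit):
        return self._add("recall", source=source, limit=limit)

    def filter(self, rule):
        return self._add("filter", rule=rule)

    def rank(self, model, top_k):
        return self._add("rank", model=model, top_k=top_k)

    def compile(self):
        # The Python side only describes the pipeline; a separate runtime
        # (C++ in the article) would consume this JSON and execute it.
        return json.dumps({"flow": self.name, "pipeline": self.ops}, indent=2)


# Illustrative names only; none of these come from the article.
flow = (
    Flow("demo_feed")
    .recall(source="hot_items", limit=500)
    .filter(rule="not_seen")
    .rank(model="ctr_v1", top_k=20)
)
config = flow.compile()
print(config)
```

Keeping the Python layer purely declarative is what would let the heavy lifting stay in compiled code: no Python runs on the request path in a design like this.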

Process Abstraction – The framework abstracts business logic into reusable operators organized as a DAG. Custom operators can be added by algorithm teams, while common operators are maintained by architecture engineers, allowing modular reuse and easy re‑composition.
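As a rough sketch of operators organized as a DAG, the example below registers a few operators by name and runs them in dependency order. It is written in Python for readability and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The registry, the operator names, and the shared `ctx` dictionary are assumptions for illustration, not Dragonfly's real interfaces.

```python
from graphlib import TopologicalSorter

OPERATORS = {}  # hypothetical registry: operator name -> callable


def operator(name):
    """Register a function as a reusable operator under the given name."""
    def register(fn):
        OPERATORS[name] = fn
        return fn
    return register


@operator("recall")
def recall(ctx):
    # Stand-in for a common operator maintained by architecture engineers.
    ctx["items"] = [{"id": i, "score": 0.0} for i in range(1, 7)]


@operator("drop_even_ids")
def drop_even_ids(ctx):
    # Stand-in for a custom operator added by an algorithm team.
    ctx["items"] = [it for it in ctx["items"] if it["id"] % 2 == 1]


@operator("rank")
def rank(ctx):
    for it in ctx["items"]:
        it["score"] = 1.0 / it["id"]
    ctx["items"].sort(key=lambda it: it["score"], reverse=True)


def run(dag, ctx):
    """Execute operators in an order that respects the DAG's edges."""
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        OPERATORS[name](ctx)
    return order


# Each key maps an operator to the set of operators it depends on.
dag = {"recall": set(), "drop_even_ids": {"recall"}, "rank": {"drop_even_ids"}}
ctx = {}
order = run(dag, ctx)
print(order, [it["id"] for it in ctx["items"]])
```

Re-composing a scenario then amounts to editing the `dag` mapping rather than touching operator code.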

Data Abstraction – Dragonfly introduces a high‑performance DataFrame structure similar to columnar tables, offering a schema‑free key‑value interface that supports zero‑copy data transfer and logical tables (read‑write views) for flexible data management.
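The idea of a columnar, schema‑free table with logical read‑write views can be sketched in a few lines. This is a toy model under stated assumptions: the class and method names are hypothetical, and the real structure is a C++ component whose layout the source does not describe. What the sketch does show is that a view holds only row indices plus a reference to the parent frame, so creating a view copies no column data, and writes through the view land in the parent's columns.

```python
class DataFrame:
    """Toy columnar store: column name -> list of values, no fixed schema."""

    def __init__(self):
        self.columns = {}
        self.num_rows = 0

    def set_column(self, name, values):
        values = list(values)
        if self.columns and len(values) != self.num_rows:
            raise ValueError("column length must match existing rows")
        self.columns[name] = values
        self.num_rows = len(values)

    def view(self, row_ids):
        # Only the row indices are stored; column data is shared, not copied.
        return TableView(self, list(row_ids))


class TableView:
    """Logical table: a read-write window onto a subset of the parent's rows."""

    def __init__(self, frame, row_ids):
        self.frame = frame
        self.row_ids = row_ids

    def get(self, name):
        col = self.frame.columns[name]
        return [col[i] for i in self.row_ids]

    def set(self, name, values):
        # New columns can appear at any time (schema-free); rows outside
        # the view are left as None.
        col = self.frame.columns.setdefault(name, [None] * self.frame.num_rows)
        for i, v in zip(self.row_ids, values):
            col[i] = v


frame = DataFrame()
frame.set_column("item_id", [101, 102, 103, 104])
frame.set_column("score", [0.2, 0.9, 0.5, 0.7])

top = frame.view([1, 3])        # logical table over rows 1 and 3
top.set("boost", [1.5, 1.2])    # written through to the parent frame
print(top.get("item_id"), frame.columns["boost"])
```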

DSL Layer and High‑Level Features – The DSL hides synchronization details, provides decorators such as @async and @parallel for asynchronous execution and data‑parallel computation, and supports branching and modular components to build complex business logic with simple Python syntax.
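A sketch of what such decorators might do is shown below, using Python's standard `concurrent.futures`. It is not Dragonfly's implementation: the real operators run in C++, and the source does not show how the decorators are spelled or implemented. Because `async` has been a reserved word in Python since 3.7, the sketch names its decorator `async_op`; that name, the `shards` parameter, and the example functions are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

_POOL = ThreadPoolExecutor(max_workers=4)


def async_op(fn):
    """Run the wrapped function in the background and return a Future."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return _POOL.submit(fn, *args, **kwargs)
    return wrapper


def parallel(shards):
    """Split the input list into up to `shards` chunks and process them concurrently."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(items):
            size = max(1, -(-len(items) // shards))  # ceiling division
            parts = [items[i:i + size] for i in range(0, len(items), size)]
            # Executor.map yields results in input order, so output order is stable.
            results = _POOL.map(fn, parts)
            return [x for part in results for x in part]
        return wrapper
    return decorate


@async_op
def fetch_user_profile(user_id):
    # Placeholder for a remote call; the returned fields are made up.
    return {"user_id": user_id, "interests": ["music"]}


@parallel(shards=3)
def score(items):
    return [item * 2 for item in items]


profile_future = fetch_user_profile(42)   # starts in the background
scores = score(list(range(10)))           # data-parallel over 3 chunks
profile = profile_future.result()         # synchronization point
print(profile, scores)
```

In the sketch the caller still has to call `.result()` explicitly. The article's claim is that the DSL hides this synchronization step from the algorithm developer.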

Layered Decoupling – By separating the DSL (algorithm space) from the underlying operators (engineering space), Dragonfly reduces coupling between teams, allowing each side to focus on its expertise without interfering with the other.

Application Status – Dragonfly now powers thousands of online services across the recommendation pipeline (strategy, recall, coarse‑ranking, fine‑ranking, re‑ranking), enabling unified monitoring, resource tracking, and code reuse, and has reduced C++ code volume by 50‑80%.

Ecosystem Construction – A suite of tools supports the full lifecycle: a web‑based Playground for DSL debugging, white‑box tracing for request‑level analysis, visualization of workflow graphs, and code‑governance utilities that detect and clean unused operators.
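The unused‑operator check lends itself to a small illustration. The sketch below assumes a hypothetical compiled‑flow layout, a JSON object with a `pipeline` list whose entries carry a `type` field. Under that assumption, the check reduces to a set difference between the registered operators and those referenced by any flow. The operator names and configs are invented for the example.

```python
import json


def find_unused_operators(registered, flow_configs):
    """Return registered operator names that no compiled flow references."""
    used = set()
    for raw in flow_configs:
        config = json.loads(raw)
        used.update(op["type"] for op in config["pipeline"])
    return sorted(set(registered) - used)


# Hypothetical inputs: an operator registry and two compiled flows.
registered = ["recall", "filter", "rank", "legacy_dedup", "old_boost"]
flow_configs = [
    '{"flow": "feed", "pipeline": [{"type": "recall"}, {"type": "rank"}]}',
    '{"flow": "search", "pipeline": [{"type": "recall"}, {"type": "filter"}]}',
]

unused = find_unused_operators(registered, flow_configs)
print(unused)
```

A production tool would likely also confirm against online traffic before deleting anything, since a flow that exists in a config may still never be invoked.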

Planning Outlook – Future work focuses on NUMA‑aware performance optimizations, automated data‑lineage management and cleanup, and AI‑driven tooling to further enhance productivity and enable B2B capabilities.

Q&A Highlights – Answers cover custom operator expressiveness, granularity decisions between algorithm and engineering teams, differences from TensorFlow’s control flow, and how DSL configurations map to micro‑service deployments.

Written by

DataFunSummit
