
How Volcano Engine Rebuilt Its Ad‑Testing Platform for Scalability and Reliability

This article explains how Volcano Engine untangled the authorization, data‑fetching, and performance problems of its advertising AB‑testing platform. The refactor split the monolith into dedicated services, redesigned the data model around MySQL and ClickHouse, and applied DAG scheduling, time‑wheel algorithms, Domain‑Driven Design, and rigorous unit testing to deliver a more stable, extensible backend.

ByteDance Data Platform

Overview

Volcano Engine’s AB‑testing platform for advertising needed a scientific way to compare different ad strategies, but early implementations relied on ad‑hoc testing and suffered from tangled authorization logic, excessive scheduled tasks, slow queries, and hard‑to‑maintain code.

Challenges before Refactor

Support for multiple ad platforms made authorization logic increasingly complex.

Authorization, data collection, and business logic were tightly coupled, making debugging difficult.

Each data‑capture type required a separate timed job, leading to an unmanageable number of tasks.

An inefficient data model caused report queries to slow down as data volume grew.

Over‑customized features resulted in fragile code.

Refactoring Solutions

Service decomposition: split into an Authorization Service, Data‑Fetch Service, Business Backend Service, and a minimal set of scheduled tasks.

Data model redesign: store metadata in MySQL for fast updates and report data in ClickHouse for high‑performance analytics.

Adopt Domain‑Driven Design (DDD) with interface‑driven programming, allowing each ad platform to implement its own adapter.

Enforce strict unit‑test coverage and CI/CD pipelines to ensure code quality and rapid bug detection.

Unified codebase for SaaS and on‑premise deployments using environment‑variable configuration.
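The interface‑driven approach above can be sketched as follows. This is a minimal illustration, not the platform's actual code; the adapter and method names (`AdPlatformAdapter`, `fetch_report`, `ExamplePlatformAdapter`) are assumptions:

```python
from abc import ABC, abstractmethod

class AdPlatformAdapter(ABC):
    """Hypothetical interface each ad platform implements."""

    @abstractmethod
    def fetch_report(self, account_id: str, date: str) -> dict:
        """Pull report metrics for one account and one day."""

class ExamplePlatformAdapter(AdPlatformAdapter):
    def fetch_report(self, account_id: str, date: str) -> dict:
        # A real adapter would call the platform's reporting API here.
        return {"account_id": account_id, "date": date, "clicks": 0}

def collect(adapters, account_id, date):
    # Business code depends only on the interface, never on a
    # concrete platform, so adding a platform means adding one adapter.
    return [a.fetch_report(account_id, date) for a in adapters]
```

Because the business backend sees only `AdPlatformAdapter`, supporting a new ad platform does not touch existing authorization or fetch logic.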

Core Modules

Authorization Service handles granting ad‑account tokens (OAuth2 or password‑based) and stores credentials for downstream tasks.

Data‑Fetch Service synchronizes ad‑platform data at hour‑ and day‑level, supports custom token refresh intervals, and provides real‑time fetch APIs.

Business Backend Service uses authorized accounts to create campaigns, manage assets, and aggregate query results.
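The custom token‑refresh intervals mentioned above could be driven by a check like the following sketch (the `TokenStore` record and the 300‑second safety margin are assumptions, not the platform's real design):

```python
import time

class TokenStore:
    """Hypothetical credential record kept by the Authorization Service."""
    def __init__(self, access_token, expires_at, refresh_interval=3600):
        self.access_token = access_token
        self.expires_at = expires_at          # unix timestamp
        self.refresh_interval = refresh_interval

def needs_refresh(token, now=None, margin=300):
    # Refresh shortly before expiry so downstream fetch jobs
    # never run with an expired credential.
    now = time.time() if now is None else now
    return now >= token.expires_at - margin
```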

Data Model and Storage

Metadata (IDs, names, timestamps) is stored in MySQL, while high‑volume report metrics (clicks, impressions, spend) reside in ClickHouse, leveraging its Map type for flexible schema expansion.
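To illustrate the flexibility of the Map type, here is a sketch of rendering one report row for a hypothetical ClickHouse table whose metrics column is `Map(String, Float64)`; the table layout and column names are assumptions:

```python
def to_clickhouse_row(ad_id: int, date: str, metrics: dict) -> str:
    """Render one report row as a ClickHouse VALUES tuple whose last
    field is a Map literal. New metric names need no schema change:
    they simply become new keys in the map."""
    pairs = ", ".join(f"'{k}': {float(v)}" for k, v in sorted(metrics.items()))
    return f"({ad_id}, '{date}', {{{pairs}}})"

row = to_clickhouse_row(42, "2024-01-01", {"clicks": 10, "spend": 3.5})
```

If a platform later reports a new metric such as conversions, it is written as one more map key instead of an ALTER TABLE.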

DAG Scheduling

Tasks are expressed as a Directed Acyclic Graph (DAG) to capture dependencies; the Scheduler parses the DAG and dispatches jobs to Workers. Example DAG definition:

<code>{
    "schedule_interval": "*/60 * * * *",
    "dag_id": "${account_id}_today_insights",
    "tasks": [
        {
            "task_id": "dummy_task",
            "downstream_task_ids": ["account_meta_task", "ad_meta_task"],
            "is_dummy": true,
            "operator_name": "DummyOperator"
        },
        {
            "task_id": "account_meta_task",
            "operator_type": "basic",
            "operator_name": "ad_meta_operator"
        },
        {
            "task_id": "ad_meta_task",
            "downstream_task_ids": ["ad_daily_insight_task"],
            "operator_name": "ad_meta_operator"
        },
        {
            "task_id": "ad_daily_insight_task",
            "operator_name": "insight_operator"
        }
    ]
}</code>
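The Scheduler's core job of turning such a definition into a dispatch order can be sketched with Kahn's topological‑sort algorithm. This is an illustration of the technique, not the platform's actual scheduler code:

```python
from collections import deque

def topo_order(tasks):
    """Dispatch order for a DAG given as {task_id: [downstream_ids]}:
    a task is dispatched only after all of its upstream tasks are done
    (Kahn's algorithm)."""
    indegree = {t: 0 for t in tasks}
    for downs in tasks.values():
        for d in downs:
            indegree[d] += 1
    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in tasks[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(tasks):
        raise ValueError("cycle detected: not a DAG")
    return order

# The same dependency structure as the JSON definition above.
dag = {
    "dummy_task": ["account_meta_task", "ad_meta_task"],
    "account_meta_task": [],
    "ad_meta_task": ["ad_daily_insight_task"],
    "ad_daily_insight_task": [],
}
```

Running `topo_order(dag)` yields an order in which `ad_daily_insight_task` always follows `ad_meta_task`, matching the declared dependency.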

Time‑Wheel Algorithm

To efficiently execute millions of scheduled tasks, a hierarchical time‑wheel is used: a day‑level wheel (7×24 slots) feeds tasks into a second‑level wheel (3600 slots) for second‑precision execution, reducing traversal from tens of thousands of tasks to a few dozen time slots.
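A single level of such a wheel can be sketched as below; the hierarchical version described above cascades tasks from the coarse day‑level wheel into a fine wheel like this one as their due time approaches. The class and method names are illustrative:

```python
class TimeWheel:
    """Minimal single-level time wheel with one slot per second."""

    def __init__(self, slots=3600):
        self.slots = [[] for _ in range(slots)]
        self.current = 0

    def add(self, delay_seconds, task):
        # Place the task in the slot where it becomes due, relative to now.
        slot = (self.current + delay_seconds) % len(self.slots)
        self.slots[slot].append(task)

    def tick(self):
        # Advance one second. Only one slot is inspected per tick, so the
        # cost is independent of the total number of scheduled tasks --
        # the traversal reduction the article describes.
        self.current = (self.current + 1) % len(self.slots)
        due, self.slots[self.current] = self.slots[self.current], []
        return due
```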

Domain‑Driven Design (DDD)

Four layers structure the system:

User Interface Layer: receives requests, performs simple validation, and returns results.

Application Layer: orchestrates use cases without embedding business rules.

Domain Layer: core business logic expressed as rich, encapsulated models, independent of external frameworks.

Infrastructure Layer: provides technical implementations such as databases, caches, and message queues.
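The separation of the middle two layers can be sketched as follows. The `Experiment` model and `ExperimentService` are hypothetical names chosen for illustration:

```python
# Domain layer: a rich model that owns its business rule.
class Experiment:
    def __init__(self, name, status="draft"):
        self.name = name
        self.status = status

    def launch(self):
        # The invariant lives here, inside the domain model.
        if self.status != "draft":
            raise ValueError("only draft experiments can be launched")
        self.status = "running"

# Application layer: orchestrates the use case, holds no business rules.
class ExperimentService:
    def __init__(self, repo):
        self.repo = repo  # the infrastructure layer supplies persistence

    def launch_experiment(self, name):
        exp = self.repo.get(name)
        exp.launch()
        self.repo.save(exp)
        return exp.status
```

Because the launch rule is inside `Experiment`, swapping the repository implementation (MySQL, in‑memory, etc.) never risks changing business behavior.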

Unit Testing Benefits

Accelerates development and refactoring by quickly locating bugs.

Enforces high cohesion and low coupling in code design.

Improves overall code quality and prevents regressions.

Encourages mock‑based isolation of external dependencies.

Integrates with CI/CD pipelines to enforce coverage thresholds.
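The mock‑based isolation mentioned above looks like this in practice; `fetch_and_count` and the `list_ads` client method are hypothetical stand‑ins for logic that depends on an external ad platform:

```python
from unittest import mock

def fetch_and_count(client, account_id):
    """Unit under test: logic that depends on an external platform client."""
    rows = client.list_ads(account_id)
    return len(rows)

def test_counts_rows_without_network():
    # Mock the external dependency so the test is fast and hermetic:
    # no credentials, no network, deterministic data.
    client = mock.Mock()
    client.list_ads.return_value = [{"id": 1}, {"id": 2}]
    assert fetch_and_count(client, "acct-1") == 2
    client.list_ads.assert_called_once_with("acct-1")
```

Tests in this style run in milliseconds, which is what makes the strict coverage thresholds in CI/CD practical to enforce.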

Tags: backend, AB testing, advertising, data pipeline, DAG, unit testing, DDD
Written by

ByteDance Data Platform

The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
