
How 360’s Titan Platform Evolved: From Script Templates to Real‑Time DAG‑Based Data Processing

This article outlines the evolution of 360’s Titan big‑data processing platform, describing the challenges of traditional script‑based development, the three architectural stages (pre‑Titan, Titan 1.0, Titan 2.0), the functional modules, the DITTO component framework, and key takeaways for building flexible, self‑service data pipelines.

dbaplus Community

Background and Challenges

Modern big‑data ecosystems have diversified from early Hadoop to third‑ and fourth‑generation engines such as Spark and Flink, and storage options now include MPP relational stores, distributed stores, and time‑series databases. This diversity raises learning costs and makes script‑based development inefficient: business users cannot participate, data flows become black boxes, and lack of unified scheduling leads to resource waste.

Platform Evolution

Stage 1 – Pre‑Titan

In the early distributed‑computing era, the platform abstracted script templates on top of engines such as Hadoop. This was more efficient than hand‑written single‑machine jobs, but rapid product growth soon outpaced the template approach, which came to demand excessive manual effort.

Stage 2 – Titan 1.0

Titan 1.0 introduced a reusable template library that allowed business users to build data pipelines with less code. The platform added richer data sources and supported multiple products (reporting, online query, analytics), handing more data‑development responsibilities to business teams. However, the architecture lacked real‑time stream ingestion, the template library remained semi‑custom, and task operations still depended on platform engineers.

Stage 3 – Titan 2.0

Titan 2.0 adopts a third‑generation compute engine (Spark/Flink) that supports internal DAGs and real‑time processing. It introduces the DITTO component framework and a rule engine, providing a rich component library and a drag‑and‑drop, no‑code UI. Integrated scheduling, monitoring, and permission management enable self‑service operations and data‑security guarantees while supporting heterogeneous data sources.

Functional Modules

First‑Level Functions

The platform groups its core capabilities into five areas: data integration, synchronization, computation, analysis, and streaming.

Second‑Level Functions

Data source management – unified handling of batch and streaming sources with security and quality constraints.

Task management – graph‑based configuration, rule‑driven operations, visual monitoring of task instances.

Scheduling engine – instant, periodic, historical, and real‑time scheduling with monitoring.

Permission management – role, operation, menu, and data‑source permissions to ensure data safety.
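
The article does not describe how the scheduling engine represents its modes. As an illustrative sketch only, historical scheduling (backfilling past periods) can be pictured as expanding a date range into one task instance per period. The function name `backfill_instances` and the daily granularity are assumptions made for this example, not part of Titan.

```python
from datetime import date, timedelta


def backfill_instances(task_name: str, start: date, end: date) -> list[tuple[str, date]]:
    """Expand an inclusive date range into one (task, business_date) instance per day."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [(task_name, start + timedelta(days=i)) for i in range(days)]


# Hypothetical task name; the range crosses a month boundary on purpose.
instances = backfill_instances("daily_report", date(2024, 1, 30), date(2024, 2, 2))
for name, business_date in instances:
    print(name, business_date.isoformat())
```

Each generated instance could then be handed to the same execution path as a periodic run, which keeps backfills and regular runs consistent.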

DITTO Component Framework

DITTO provides a unified entry point that abstracts heterogeneous data sources and unifies offline and real‑time computation on top of Spark/Flink.

Three‑layer task structure: Application → Job → Task.

Application initializes the environment (components, engines, time, scheduler).

Job orchestrates a DAG of Tasks.

Task executes the actual computation.
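
The three layers above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DITTO's code: the class fields, the `upstream` list, and the `"engine": "local"` placeholder are all hypothetical, and the standard library's `graphlib` stands in for whatever DAG ordering the real platform uses.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Task:
    """Smallest unit: runs the actual computation."""
    name: str
    run: Callable[[dict], object]          # receives the results of upstream tasks
    upstream: list[str] = field(default_factory=list)


@dataclass
class Job:
    """Orchestrates a DAG of Tasks."""
    name: str
    tasks: list[Task]

    def execute(self) -> dict:
        by_name = {t.name: t for t in self.tasks}
        graph = {t.name: set(t.upstream) for t in self.tasks}
        results: dict = {}
        # static_order() yields each task only after all of its upstream tasks
        for task_name in TopologicalSorter(graph).static_order():
            task = by_name[task_name]
            inputs = {u: results[u] for u in task.upstream}
            results[task_name] = task.run(inputs)
        return results


@dataclass
class Application:
    """Initializes the environment, then runs its jobs."""
    name: str
    jobs: list[Job]
    env: dict = field(default_factory=dict)

    def start(self) -> dict:
        self.env.setdefault("engine", "local")  # placeholder for real engine setup
        return {job.name: job.execute() for job in self.jobs}


job = Job("wordcount", [
    Task("read", lambda _: ["a b", "b c"]),
    Task("split", lambda i: [w for line in i["read"] for w in line.split()], ["read"]),
    Task("count", lambda i: len(i["split"]), ["split"]),
])
app_results = Application("demo", [job]).start()
print(app_results["wordcount"]["count"])
```

Keeping ordering in the Job and computation in the Task means a Task never needs to know where it sits in the graph.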

Component abstraction separates computation logic, engine abstraction, and runtime environment, enabling engine‑agnostic execution.
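
One way to picture engine‑agnostic execution is to write the computation logic against a small interface and plug in different backends. The sketch below is an assumption‑laden illustration: the `Engine` protocol, both engine classes, and `active_user_ids` are invented for this example, and plain Python collections stand in for Spark or Flink.

```python
from typing import Callable, Iterable, Protocol


class Engine(Protocol):
    """Hypothetical minimal engine interface; a real one would wrap Spark or Flink."""
    def map(self, data: Iterable, fn: Callable) -> Iterable: ...
    def filter(self, data: Iterable, pred: Callable) -> Iterable: ...


class EagerEngine:
    """Materializes every step into a list."""
    def map(self, data, fn):
        return [fn(x) for x in data]

    def filter(self, data, pred):
        return [x for x in data if pred(x)]


class LazyEngine:
    """Chains generators and computes nothing until the result is consumed."""
    def map(self, data, fn):
        return (fn(x) for x in data)

    def filter(self, data, pred):
        return (x for x in data if pred(x))


def active_user_ids(engine: Engine, events: Iterable[dict]) -> Iterable[str]:
    """Computation logic written once against the interface, not a concrete engine."""
    active = engine.filter(events, lambda e: e["action"] == "login")
    return engine.map(active, lambda e: e["user"])


events = [
    {"user": "u1", "action": "login"},
    {"user": "u2", "action": "view"},
    {"user": "u3", "action": "login"},
]
eager_result = list(active_user_ids(EagerEngine(), events))
lazy_result = list(active_user_ids(LazyEngine(), events))
print(eager_result == lazy_result)
```

The same function runs unchanged on both backends, which is the property the component abstraction is described as providing.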

The component lifecycle includes prepare, execute, declare, and clearup. Dependencies between components are expressed via a dependencies list.
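
The source names the four lifecycle phases but does not define what each one does. The sketch below keeps the phase names and their stated order, while the behavior assigned to each phase (in the comments) is a guess; `Component`, `run_component`, and the `registry` dictionary are hypothetical names introduced for illustration.

```python
from typing import Optional


class Component:
    """Hypothetical base class with the four lifecycle phases named in the text."""

    def __init__(self, name: str, dependencies: Optional[list] = None):
        self.name = name
        self.dependencies = dependencies or []   # components that must run first
        self.log: list = []

    def prepare(self) -> None:                   # assumed: validate config, acquire resources
        self.log.append("prepare")

    def execute(self, inputs: dict) -> object:   # assumed: the computation itself
        self.log.append("execute")
        return None

    def declare(self, output: object, registry: dict) -> None:  # assumed: publish the output
        self.log.append("declare")
        registry[self.name] = output

    def clearup(self) -> None:                   # assumed: release resources
        self.log.append("clearup")


def run_component(component: Component, registry: dict) -> None:
    """Drive one component through its lifecycle, feeding it its dependencies' outputs."""
    inputs = {dep: registry[dep] for dep in component.dependencies}
    component.prepare()
    try:
        output = component.execute(inputs)
        component.declare(output, registry)
    finally:
        component.clearup()                      # always runs, even if execute raises


class Source(Component):
    def execute(self, inputs):
        super().execute(inputs)
        return [1, 2, 3]


class Doubler(Component):
    def execute(self, inputs):
        super().execute(inputs)
        return [x * 2 for x in inputs["source"]]


registry: dict = {}
source = Source("source")
doubler = Doubler("doubler", dependencies=["source"])
for component in (source, doubler):              # already in dependency order
    run_component(component, registry)
print(registry["doubler"])
```

Putting `clearup` in a `finally` block is a design choice of this sketch, so that resources are released even when a component fails.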

Context handles initialization of components, compute engines, time, environment, and scheduler, and transports data between components.

The rule engine separates business rules from application code. It supports logical, built‑in, arithmetic, and text operations and is packaged as an independent product with rule and function libraries.
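
To illustrate what separating rules from application code can look like, the sketch below stores a rule as plain data and evaluates it with a small interpreter backed by a function library. The expression format, the `FUNCTIONS` table, and the sample fields are assumptions for this example; the article does not document the real rule syntax.

```python
import operator

# Function library: operation name -> implementation (all names are illustrative)
FUNCTIONS = {
    "and": lambda *args: all(args),        # logical
    "or": lambda *args: any(args),
    "not": lambda x: not x,
    "add": operator.add,                   # arithmetic
    "mul": operator.mul,
    "gt": operator.gt,
    "eq": operator.eq,
    "contains": lambda text, sub: sub in text,   # text
    "lower": str.lower,
}


def evaluate(expr, record: dict):
    """Evaluate a nested rule expression against one record.

    An expression is a literal, {"field": name}, or {"op": name, "args": [...]}.
    """
    if isinstance(expr, dict):
        if "field" in expr:
            return record[expr["field"]]
        fn = FUNCTIONS[expr["op"]]
        return fn(*(evaluate(arg, record) for arg in expr["args"]))
    return expr


# Rule stored as data: order value above 100 AND channel mentions "app"
rule = {"op": "and", "args": [
    {"op": "gt", "args": [
        {"op": "mul", "args": [{"field": "price"}, {"field": "quantity"}]},
        100,
    ]},
    {"op": "contains", "args": [
        {"op": "lower", "args": [{"field": "channel"}]},
        "app",
    ]},
]}

matched = evaluate(rule, {"price": 30, "quantity": 4, "channel": "Mobile-APP"})
rejected = evaluate(rule, {"price": 30, "quantity": 2, "channel": "Mobile-APP"})
print(matched, rejected)
```

Because the rule is data, it can be edited or stored without redeploying the code that evaluates it.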

Performance Optimizations

DITTO includes optimizations at the component level to mitigate data skew, prevent memory overflow, and handle small‑file merging and caching, ensuring stable performance for both batch and streaming workloads.
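
The article does not say which techniques DITTO uses against data skew. Key salting is one widely used approach, shown here as a pure‑Python simulation of a two‑stage aggregation; the function `salted_count` and its parameters are assumptions for illustration, not DITTO's implementation.

```python
import random
from collections import Counter


def salted_count(keys: list, num_salts: int, seed: int = 0) -> Counter:
    """Count keys in two stages so that one hot key is spread over several buckets."""
    rng = random.Random(seed)
    # Stage 1: pair each key with a random salt. On a cluster, each (key, salt)
    # pair could land in a different partition, so a hot key no longer
    # concentrates on a single worker.
    partial = Counter((key, rng.randrange(num_salts)) for key in keys)
    # Stage 2: drop the salt and merge the much smaller partial results.
    total: Counter = Counter()
    for (key, _salt), count in partial.items():
        total[key] += count
    return total


keys = ["hot"] * 1000 + ["cold_a"] * 3 + ["cold_b"] * 2
result = salted_count(keys, num_salts=8)
print(result["hot"], result["cold_a"], result["cold_b"])
```

The final counts match an unsalted count; only the intermediate distribution of work changes.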

Key Takeaways

Multi‑source support: unified handling of diverse applications and heterogeneous storage systems.

No‑code usability: drag‑and‑drop UI enables simple calculations (e.g., DAU) without programming.

Self‑service flexibility: users can configure, monitor, and schedule tasks independently, reducing reliance on developers.
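
For readers unfamiliar with the DAU example mentioned above, daily active users is simply the number of distinct users seen per day. The sketch below shows that calculation in plain Python; the event layout and the function name are assumptions, and on the platform the same logic would be assembled from components rather than written by hand.

```python
from collections import defaultdict


def daily_active_users(events: list) -> dict:
    """Count distinct user IDs per day from (day, user_id) events."""
    users_by_day = defaultdict(set)
    for day, user_id in events:
        users_by_day[day].add(user_id)   # a set ignores repeat visits
    return {day: len(users) for day, users in sorted(users_by_day.items())}


events = [
    ("2024-03-01", "u1"), ("2024-03-01", "u2"), ("2024-03-01", "u1"),
    ("2024-03-02", "u2"),
]
dau = daily_active_users(events)
print(dau)
```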
