How Data Ingestion Evolved at Youzu: From HTTP to Real‑Time DTS & ETL
This article traces the evolution of Youzu's data platform ingestion, comparing early HTTP/script methods with modern DTS and real‑time ETL solutions, evaluating middleware choices, detailing core system architectures, and outlining future improvements for reliable, scalable data access.
Data Access Evolution
The Youzu data platform has changed significantly in recent years. Early external access relied on HTTP, Scribe, and ad hoc scripts, which suffered from performance and management problems. The current approach uses a DTS system and a real‑time ETL system, which greatly improve platform management, performance, and monitoring.
Comparison of Data Access Methods
Different stages employed various access methods, as shown below.
Middleware Comparison
When designing the DTS real‑time ingestion architecture and selecting the underlying ETL engine, the data center evaluated several open‑source data collection middleware options, resulting in the comparison below.
Core Data Access Systems Overview
DTS System
DTS (Data Transfer Service/System) aims to reduce development and maintenance costs of synchronization tasks and standardize data ingestion channels. Its main architecture is illustrated below.
The system includes:
Data sources: local server files and MySQL databases, supporting both real‑time and batch synchronization.
Real‑time collection uses the Fluentd middleware, deployed as a Fluentd cluster for load balancing.
All collection stages incorporate statistics, node monitoring, and fault tolerance to ensure data traceability and reliability.
Batch synchronization relies on multiple file service nodes deployed globally; data is routed to the nearest node. The system handles up to 800 GB per day for overseas projects and 1 TB per day for domestic games.
Heartbeat, statistics, and operations of all nodes are viewable and manageable via the DTS management console.
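The nearest-node routing and heartbeat monitoring described above can be sketched as follows. This is a minimal illustration, not the DTS implementation: the `FileNode` type, the 30-second heartbeat timeout, and the use of measured latency as the "nearest" criterion are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold: a node with no heartbeat for this long is treated as down.
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class FileNode:
    """Hypothetical record for one file service node."""
    name: str
    latency_ms: float      # assumed proxy for "distance" from the data source
    last_heartbeat: float  # Unix timestamp (seconds) of the latest heartbeat


def is_alive(node: FileNode, now: float,
             timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """A node counts as alive if its last heartbeat is within the timeout."""
    return now - node.last_heartbeat <= timeout


def pick_nearest_node(nodes: List[FileNode], now: float) -> Optional[FileNode]:
    """Return the lowest-latency node that is still alive, or None."""
    alive = [n for n in nodes if is_alive(n, now)]
    if not alive:
        return None
    return min(alive, key=lambda n: n.latency_ms)


nodes = [
    FileNode("sg", latency_ms=20.0, last_heartbeat=900.0),   # stale heartbeat
    FileNode("us", latency_ms=150.0, last_heartbeat=995.0),
    FileNode("eu", latency_ms=80.0, last_heartbeat=990.0),
]
chosen = pick_nearest_node(nodes, now=1000.0)
print(chosen.name if chosen else "no healthy node")
```

In this sketch the closest node is skipped because its heartbeat is stale, so the upload falls back to the next-nearest healthy node. A management console could surface the same `is_alive` check per node.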
ETL System
The real‑time ETL system was built to meet the demands of massive real‑time data ingestion and of the query engines it feeds. It lets users configure and manage real‑time sync tasks. Its architecture is shown below.
Main functionalities include:
Configuring tasks via a management console, dispatching them to one or more resource machines, with load‑aware instance selection.
Core service nodes operate in cluster mode to avoid single‑point failures.
Front‑end provides centralized task management (start, stop, update configuration, etc.) for ease of operation.
Real‑time monitoring of task status and history through the front‑end interface.
Support for various plugin types, illustrated below.
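The load-aware instance selection mentioned in the list above could look roughly like the following. This is a sketch under stated assumptions: the `ResourceMachine` type, the slot-based `capacity` field, and the "least loaded first" policy are hypothetical, since the source does not describe the actual scheduling rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceMachine:
    """Hypothetical resource machine that can host ETL task instances."""
    host: str
    capacity: int           # assumed maximum number of task instances
    running_tasks: int = 0

    @property
    def load(self) -> float:
        """Fraction of capacity currently in use."""
        return self.running_tasks / self.capacity


def select_machines(machines: List[ResourceMachine],
                    instances: int = 1) -> List[str]:
    """Place each instance on the least-loaded machine that has a free slot.

    Load is re-evaluated after every placement, so a multi-instance task
    spreads across machines instead of piling onto one. Ties are broken
    by host name to keep the result deterministic.
    """
    chosen = []
    for _ in range(instances):
        candidates = [m for m in machines if m.running_tasks < m.capacity]
        if not candidates:
            raise RuntimeError("no resource machine has free capacity")
        target = min(candidates, key=lambda m: (m.load, m.host))
        target.running_tasks += 1
        chosen.append(target.host)
    return chosen


machines = [
    ResourceMachine("etl-a", capacity=4, running_tasks=3),
    ResourceMachine("etl-b", capacity=4, running_tasks=1),
    ResourceMachine("etl-c", capacity=2, running_tasks=0),
]
placement = select_machines(machines, instances=3)
print(placement)  # ['etl-c', 'etl-b', 'etl-b']
```

Using a load fraction rather than a raw task count lets machines of different sizes share one pool; a real scheduler would likely also weigh CPU, memory, or throughput.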
Business Parties and Data Types for Data Access
The data center currently handles several data types, as shown:
Key business parties involved include:
Future Outlook
Data ingestion is the foundation of the data platform; a solid base is essential for stable, reliable services and full data value extraction. Future improvements will focus on:
Standardizing and unifying access methods to lower integration costs.
Strengthening underlying data transport infrastructure for stable, reliable transmission.
Enhancing the platform to offer self‑service ingestion, enabling users to manage data independently.
Appendix
Data center ingestion specifications and related log standards can be found at the wiki: http://wiki.youzu.com/pages/viewpage.action?pageId=22626330
The data center portal provides detailed product and service information: http://dc.youzu.com
YooTech Youzu Tech Team
Official tech account of Youzu Network, sharing insights and discussions on technology, research, and product.
