
How Data Ingestion Evolved at Youzu: From HTTP to Real‑Time DTS & ETL

This article traces the evolution of Youzu's data platform ingestion, comparing early HTTP/script methods with modern DTS and real‑time ETL solutions, evaluating middleware choices, detailing core system architectures, and outlining future improvements for reliable, scalable data access.

YooTech Youzu Tech Team

Data Access Evolution

The Youzu data platform has changed significantly in recent years. Early external access relied on HTTP, Scribe, and scripts, all of which suffered from performance and management problems. The current approach uses a DTS system and a real‑time ETL system, which together greatly improve platform management, performance, and monitoring.

Evolution diagram

Comparison of Data Access Methods

Different stages employed various access methods, as shown below.

Data access methods comparison

Middleware Comparison

When designing the DTS real‑time ingestion architecture and selecting the underlying ETL engine, the data center evaluated several open‑source data collection middleware options. The comparison below summarizes the results.

Middleware comparison

Core Data Access Systems Overview

DTS System

DTS (Data Transfer Service/System) aims to reduce development and maintenance costs of synchronization tasks and standardize data ingestion channels. Its main architecture is illustrated below.

DTS architecture

The system includes:

Data sources: local server files and MySQL databases, supporting both real‑time and batch synchronization.

Real‑time collection uses the Fluentd middleware, deployed as a Fluentd cluster for load balancing.

All collection stages incorporate statistics, node monitoring, and fault tolerance to ensure data traceability and reliability.

Batch synchronization deploys multiple file service nodes globally; data is routed to the nearest node, handling up to 800 GB per day for overseas projects and 1 TB for domestic games.

Heartbeat, statistics, and operations of all nodes are viewable and manageable via the DTS management console.
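The nearest-node routing and heartbeat points above can be sketched in Python. This is only an illustration under stated assumptions, not Youzu's implementation: `FileNode`, `NodeRegistry`, the 30-second heartbeat timeout, and the simplification that "nearest" means "same region" are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

HEARTBEAT_TIMEOUT_S = 30.0  # assumed threshold; the article gives no value


@dataclass
class FileNode:
    """A hypothetical file service node tracked by the management console."""
    name: str
    region: str
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat
    bytes_received: int = 0      # simple per-node transfer statistic


class NodeRegistry:
    """Tracks heartbeats and routes uploads to a live node."""

    def __init__(self, timeout_s: float = HEARTBEAT_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self.nodes: Dict[str, FileNode] = {}

    def heartbeat(self, name: str, region: str, now: float) -> None:
        """Register the node on first contact and refresh its heartbeat."""
        node = self.nodes.setdefault(name, FileNode(name, region))
        node.last_heartbeat = now

    def alive(self, now: float) -> List[FileNode]:
        """Return nodes whose last heartbeat is within the timeout."""
        return [n for n in self.nodes.values()
                if now - n.last_heartbeat <= self.timeout_s]

    def route(self, source_region: str, size_bytes: int, now: float) -> FileNode:
        """Pick a live node, preferring the sender's region, then lowest volume."""
        candidates = self.alive(now)
        if not candidates:
            raise RuntimeError("no live file service node")
        node = min(candidates,
                   key=lambda n: (n.region != source_region, n.bytes_received))
        node.bytes_received += size_bytes
        return node
```

Passing `now` explicitly, rather than reading the clock inside the class, keeps the sketch deterministic; a real service would more likely use a monotonic clock and a measured network distance instead of a region label.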

ETL System

The real‑time ETL system was built to meet massive real‑time data ingestion and query engine demands, allowing configuration and management of real‑time sync tasks. Its architecture is shown below.

ETL architecture

Main functionalities include:

Configuring tasks via a management console, dispatching them to one or more resource machines, with load‑aware instance selection.

Core service nodes operate in cluster mode to avoid single‑point failures.

Centralized task management (start, stop, update configuration, etc.) through the front end for ease of operation.

Real‑time monitoring of task status and history through the front‑end interface.

Support for various plugin types, illustrated below.

ETL supported plugin types
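The load-aware dispatch described in the functionality list can be sketched as follows. The article does not say how load is measured, so this example assumes the number of running tasks per machine; `Machine` and `dispatch` are hypothetical names, not part of the actual ETL system.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    """A hypothetical resource machine that can host ETL task instances."""
    name: str
    running_tasks: int = 0  # assumed load metric


def dispatch(task_id: str, machines: List[Machine], instances: int = 1) -> List[str]:
    """Place `instances` copies of a task on the least-loaded machines.

    Returns the names of the chosen machines, least loaded first.
    `task_id` is only carried along for readability in this sketch.
    """
    if instances < 1 or instances > len(machines):
        raise ValueError("instances must be between 1 and the machine count")
    # Break ties by name so the choice is deterministic.
    chosen = heapq.nsmallest(
        instances, machines, key=lambda m: (m.running_tasks, m.name)
    )
    for machine in chosen:
        machine.running_tasks += 1
    return [machine.name for machine in chosen]
```

Each instance of a task lands on a different machine, which matches the "one or more resource machines" wording; a production scheduler would probably weigh CPU, memory, or throughput rather than a plain task count.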

Business Parties and Data Types for Data Access

The data center currently handles several data types, as shown:

Data types

Key business parties involved include:

Related business parties

Future Outlook

Data ingestion is the foundation of the data platform; a solid base is essential for stable, reliable services and for extracting the full value of the data. Future improvements will focus on:

Standardizing and unifying access methods to lower integration costs.

Strengthening underlying data transport infrastructure for stable, reliable transmission.

Enhancing the platform to offer self‑service ingestion, enabling users to manage data independently.

Appendix

Data center ingestion specifications and related log standards can be found at the wiki: http://wiki.youzu.com/pages/viewpage.action?pageId=22626330

The data center portal provides detailed product and service information: http://dc.youzu.com
