Inside Xinghuan Tech’s Next‑Gen Big Data 3.0 Architecture: Unified, Cloud‑Native, Real‑Time
This article details Xinghuan Technology’s evolution from 2013 to the present, describing its self‑developed Big Data 3.0 stack—including a unified data platform, SQL‑centric development, cloud‑native resource scheduling, distributed storage managed by Raft, DAG‑based compute engines, and real‑time stream processing—while highlighting key milestones and design principles that differentiate it from traditional Hadoop‑based solutions.
Overview
Since its founding in 2013, Xinghuan Technology has focused on integrating big‑data foundational technologies with enterprise data services, creating a series of world‑class breakthroughs tailored to China’s complex data‑application scenarios.
Big Data 3.0 Technology Stack
To meet new data‑business demands and resolve legacy issues, Xinghuan redesigned its stack into a highly unified platform that addresses the four V’s of big data (volume, velocity, variety, and veracity), enabling a value chain that runs from data persistence through to the application ecosystem.
Design Considerations & Overall Architecture
Unified data platform: replaces mixed architectures (data lake, warehouse, marts, search) with a one‑stop solution that eliminates data redundancy and cross‑system latency.
SQL as a unified interface: leverages the mature, widely adopted language to support data warehouses, OLTP, search, and spatio‑temporal databases, reducing development difficulty.
Cloud‑native deployment: uses containers and Kubernetes to provide elastic, on‑demand resources across CPU, GPU, network, and storage.
Data‑business integration: creates a unified data warehouse, model marketplace, and application market to support both data‑centric and application‑centric workflows.
Layered Architecture
Resource Scheduling Layer
Built on Kubernetes, this layer manages configuration, physical resource pools, distributed storage, and cloud networking, enabling precise scheduling of big‑data, AI, and database workloads.
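As a rough illustration of what "precise scheduling" of mixed big‑data, AI, and database workloads means at this layer, here is a minimal, hypothetical sketch of resource‑aware placement in Python. The names (`Node`, `Workload`, `schedule`) and the greedy first‑fit strategy are illustrative assumptions, not Xinghuan's actual scheduler, which builds on Kubernetes.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu: int     # free CPU cores
    gpu: int     # free GPUs
    mem_gb: int  # free memory in GB

@dataclass
class Workload:
    name: str
    cpu: int
    gpu: int
    mem_gb: int

def schedule(workloads, nodes):
    """Greedy first-fit placement: the simplest form of the
    resource-aware bin-packing a Kubernetes-style layer performs.
    GPU-hungry workloads are placed first."""
    placement = {}
    for w in sorted(workloads, key=lambda w: (w.gpu, w.cpu), reverse=True):
        for n in nodes:
            if n.cpu >= w.cpu and n.gpu >= w.gpu and n.mem_gb >= w.mem_gb:
                n.cpu -= w.cpu
                n.gpu -= w.gpu
                n.mem_gb -= w.mem_gb
                placement[w.name] = n.name
                break
        else:
            placement[w.name] = None  # unschedulable right now
    return placement

nodes = [Node("node-a", cpu=8, gpu=0, mem_gb=32),
         Node("node-b", cpu=16, gpu=2, mem_gb=64)]
jobs = [Workload("ai-train", cpu=8, gpu=2, mem_gb=32),
        Workload("sql-batch", cpu=4, gpu=0, mem_gb=16)]
print(schedule(jobs, nodes))  # ai-train lands on the GPU node
```

A real scheduler also accounts for affinity, priorities, and preemption; the point of the sketch is only that heterogeneous resources (CPU, GPU, memory) are matched per workload.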
Unified Storage Management Layer
Abstracts common storage functions (consistency, MVCC, transaction, metadata, partitioning, fault‑tolerance) behind a Raft‑based control plane, allowing plug‑in storage engines to become highly available distributed systems.
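The key idea here is that a storage engine only needs to implement a local state machine, and the Raft‑based control plane handles replication and ordering. The following is a toy Python sketch of that separation; the class names (`StorageEngine`, `KVEngine`, `RaftGroup`) are invented for illustration, and real Raft additionally involves leader election, terms, and quorum acknowledgements.

```python
from abc import ABC, abstractmethod

class StorageEngine(ABC):
    """Plug-in state machine: the engine implements only local apply/read;
    replication, ordering, and fault tolerance come from the control plane."""
    @abstractmethod
    def apply(self, command): ...
    @abstractmethod
    def read(self, key): ...

class KVEngine(StorageEngine):
    """A trivial key-value engine standing in for graph/GIS/columnar engines."""
    def __init__(self):
        self._data = {}
    def apply(self, command):
        op, key, value = command
        if op == "put":
            self._data[key] = value
    def read(self, key):
        return self._data.get(key)

class RaftGroup:
    """Toy stand-in for a Raft replication group: commands enter a shared
    ordered log and are applied to every replica's engine in the same order."""
    def __init__(self, engines):
        self.log = []
        self.engines = engines
    def propose(self, command):
        self.log.append(command)   # leader appends; majority would ack here
        for e in self.engines:     # apply at the commit index
            e.apply(command)

replicas = [KVEngine() for _ in range(3)]
group = RaftGroup(replicas)
group.propose(("put", "table:orders", "partition-map-v1"))
assert all(e.read("table:orders") == "partition-map-v1" for e in replicas)
```

Because consistency lives in the replication layer, a new specialized engine becomes highly available by implementing the state-machine interface rather than its own consensus.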
Distributed Block Storage Layer
Provides unified block storage with strong consistency guarantees via Raft, supporting various specialized engines (graph, GIS, high‑dimensional features) without reinventing core mechanisms.
Compute Engine Layer
Adopts a DAG‑based execution model with vectorized, batch‑at‑a‑time processing, delivering superior scalability and performance for batch, interactive, and real‑time workloads.
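To make the DAG execution model concrete, here is a minimal sketch in which operators form a directed acyclic graph and pass data as column batches rather than row by row. The operator names (`Scan`, `Filter`, `SumAgg`) are generic illustrations, not the engine's actual API.

```python
class Op:
    """A DAG node: pulls column batches from its inputs and emits batches."""
    def __init__(self, *inputs):
        self.inputs = inputs
    def batches(self):
        raise NotImplementedError

class Scan(Op):
    """Leaf of the DAG: yields pre-materialized column batches."""
    def __init__(self, column_batches):
        super().__init__()
        self._batches = column_batches
    def batches(self):
        yield from self._batches

class Filter(Op):
    """Batch-at-a-time filter: processes a whole batch per call,
    amortizing per-tuple overhead (the point of vectorized execution)."""
    def __init__(self, child, predicate):
        super().__init__(child)
        self.predicate = predicate
    def batches(self):
        for batch in self.inputs[0].batches():
            yield [v for v in batch if self.predicate(v)]

class SumAgg(Op):
    """Terminal aggregation: folds all upstream batches into one value."""
    def __init__(self, child):
        super().__init__(child)
    def batches(self):
        yield [sum(sum(b) for b in self.inputs[0].batches())]

# Scan -> Filter -> SumAgg, driven as a pull-based pipeline.
plan = SumAgg(Filter(Scan([[1, 5, 9], [2, 8]]), lambda v: v > 4))
result = next(iter(plan.batches()))[0]
# 5 + 9 + 8 = 22
```

A production engine additionally pipelines independent DAG branches in parallel and operates on packed column buffers instead of Python lists, but the dataflow shape is the same.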
Development Interface Layer
Offers a SQL compiler, optimizer suite, and distributed transaction unit, enabling developers to work with a familiar SQL interface while the system handles warehouses, OLTP, search, and graph queries.
RBO (Rule‑Based Optimizer): Hundreds of expert rules for IO reduction (filter push‑down, partition pruning, etc.).
ISO (Inter‑SQL Optimizer): Merges similar SQL statements inside stored procedures into a single DAG for parallel execution.
MBO (Materialize‑Based Optimizer): Leverages materialized views or cubes to reduce computation.
CBO (Cost‑Based Optimizer): Chooses plans based on estimated IO, network, and compute costs; ML‑driven cost estimation is planned.
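Filter push‑down, the classic RBO rule mentioned above, can be sketched on a toy logical plan. Everything here is an illustrative assumption: plans are plain dicts, and the `side` field marks which join input a predicate references (a real optimizer derives this from column lineage).

```python
def push_down_filter(plan):
    """One classic rule-based rewrite:
        Filter(Join(L, R)) -> Join(Filter(L), R)
    when the predicate only touches the left input (symmetric for right).
    Fewer rows then cross the join, reducing IO and network cost."""
    if plan["op"] == "filter" and plan["child"]["op"] == "join":
        join = plan["child"]
        side = plan["side"]  # which join input the predicate references
        pushed = {"op": "filter", "pred": plan["pred"], "side": side,
                  "child": join[side]}
        new_join = dict(join)
        new_join[side] = pushed
        return new_join
    return plan  # rule does not apply; leave the plan unchanged

plan = {"op": "filter", "pred": "orders.amount > 100", "side": "left",
        "child": {"op": "join",
                  "left":  {"op": "scan", "table": "orders"},
                  "right": {"op": "scan", "table": "users"}}}
optimized = push_down_filter(plan)
# The filter now sits directly above the 'orders' scan, below the join.
```

A real RBO applies hundreds of such rewrites repeatedly until a fixpoint, but each rule has this same shape: match a plan pattern, emit a cheaper equivalent.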
Real‑Time Stream Processing
Designed a low‑latency (<5 ms) stream engine with a custom StreamSQL extension, a CEP engine for complex event patterns, a rule engine for business logic, and an in‑memory distributed cache for fast metric storage. The StreamSQL example below flags robot arms that reach location A but fail to reach location B within one minute, writing the misses into a result table:
USE APPLICATION cep_example;
CREATE STREAM robotarm_2(armid STRING, location STRING) tblproperties(
"topic"="arm_t2",
"kafka.ZooKeeper"="localhost:2181",
"kafka.broker.list"="localhost:9092"
);
CREATE TABLE coords_miss(armid STRING, location STRING);
INSERT INTO coords_miss
SELECT e1.armid, e1.location
FROM PATTERN(
e1=robotarm_2[e1.location='A'] NOTNEXT
e2=robotarm_2[e2.armid=e1.armid AND e2.location='B'] ) WITHIN ('1' minute);

Historical Milestones
2015: First Hadoop‑based distributed analytical DB supporting full SQL, stored procedures, and distributed transactions; launched low‑latency (<5 ms) stream engine with StreamSQL.
2017: Early adoption of Docker & Kubernetes for cloud‑native big‑data services, predating Cloudera’s similar effort.
2018: Released trillion‑scale distributed graph DB and flash‑based columnar analytical DB using Raft and custom storage, boosting interactive analysis performance.
Conclusion
Xinghuan Technology will continue to enrich this architecture with new storage and compute capabilities, machine‑learning‑driven data governance, and data‑service publishing, aiming to bridge the gap between data and business and unlock greater value from big data.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era | [email protected]
