How Fluid Transforms Large‑Scale Data Retrieval on Kubernetes
This article explains how Zuoyebang redesigned its massive data retrieval platform by separating compute and storage with the Fluid project on Kubernetes, achieving minute-level distribution of hundreds of terabytes of data, elastic caching, and improved stability for its real-time educational services.
Background Analysis
Zuoyebang’s intelligent search and analysis of learning materials rely on a large-scale retrieval system that underpins many platform services. The system runs on ultra-large clusters, stores hundreds of terabytes of data, and requires millisecond-level incremental updates and minute-level full updates, with strict requirements on latency (P90 of 1.6 ms), throughput (hundreds of gigabytes at peak), and stability (99.99%+).
Original Architecture and Issues
The previous architecture emphasized data locality, leading to two main problems:
Data fragmentation: every compute node had to hold the full dataset, so distributing the full data was difficult, relying on multi-level cascading delivery with long cycles and extensive verification.
Weak elasticity of business and resources: the tight coupling of compute and storage limited flexible scaling; capacity expansion took days, which could not keep up with sudden traffic spikes.
New Architecture with Fluid
To address these issues, the team adopted a compute‑storage separation architecture using the open‑source Fluid project as the core connector.
Data controllability: Fluid provides distributed caching with explicit data loading and eviction through DataLoad operations, enabling controlled cache updates.
Elastic cache scaling: cache workers can be dynamically added or removed to match local and nearby data access demand.
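As a rough sketch of what this elasticity can look like (assuming Fluid’s AlluxioRuntime is used as the cache runtime; the name and sizes below are illustrative), the size of the cache layer is simply the runtime’s worker replica count, so scaling the cache means re-applying the manifest with a different replicas value:

```yaml
# Illustrative AlluxioRuntime for a dataset named "retrieval-data" (hypothetical name).
# The cache layer scales by changing spec.replicas and re-applying the manifest.
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: retrieval-data          # must match the Dataset name
spec:
  replicas: 3                   # number of cache workers; raise or lower to scale the cache
  tieredstore:
    levels:
      - mediumtype: SSD         # cache medium on each worker
        path: /var/lib/fluid/cache
        quota: 500Gi            # per-worker cache capacity (illustrative)
        high: "0.95"            # eviction watermarks
        low: "0.7"
```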
Fluid runs on Kubernetes as a scalable distributed data orchestration and acceleration system, abstracting data from storage (e.g., HDFS, OSS, Ceph) so it flows like a fluid between storage sources and compute workloads. Users access data via native Kubernetes volumes, while Fluid handles movement, replication, eviction, and transformation transparently.
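To make the volume-based access concrete, here is a minimal, hedged sketch: a Dataset that mounts a remote store, and an application pod that consumes the PersistentVolumeClaim Fluid creates under the same name as the Dataset. The mount point, image, and names are placeholders, not the production configuration:

```yaml
# Illustrative Dataset pointing at a remote store; the source path is hypothetical.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: retrieval-data
spec:
  mounts:
    - mountPoint: hdfs://namenode-svc:8020/retrieval/index   # hypothetical source path
      name: index
---
# Application pod: Fluid exposes the cached dataset as a PVC named after the Dataset,
# so the workload mounts it like any other Kubernetes volume.
apiVersion: v1
kind: Pod
metadata:
  name: retrieval-service
spec:
  containers:
    - name: server
      image: registry.example.com/retrieval-server:latest    # placeholder image
      volumeMounts:
        - name: index-data
          mountPath: /data/index
  volumes:
    - name: index-data
      persistentVolumeClaim:
        claimName: retrieval-data     # PVC created by Fluid for the Dataset
```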
Implementation Details
Separation of cache and compute nodes: although co-locating FUSE clients and cache workers can improve data locality, the online serving scenario uses separate cache and compute nodes to isolate stability concerns and gain better elasticity.
Fluid supports dataset schedulability: by setting nodeAffinity on a Dataset, the cache is placed on the intended cache nodes efficiently.
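A sketch of how that placement might be expressed, assuming the dedicated cache nodes carry a label such as node-pool/cache=true (the label is hypothetical): the Dataset’s nodeAffinity constrains where Fluid schedules its cache workers.

```yaml
# Illustrative: pin cache workers to dedicated cache nodes via the Dataset's nodeAffinity.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: retrieval-data
spec:
  mounts:
    - mountPoint: hdfs://namenode-svc:8020/retrieval/index   # hypothetical source path
      name: index
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-pool/cache          # hypothetical label on dedicated cache nodes
              operator: In
              values:
                - "true"
```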
Strict online requirements: the system demands fast, complete, and consistent data access, so partial updates and unexpected back-source requests are not acceptable.
Cache strategy: the full-cache mode serves every request from the cache, eliminating back-source latency; data loading is governed by the update workflow, which keeps it safe and standardized.
Atomic updates: since a model consists of many files, the entire model must be cached before it is used. The DataLoad process therefore guarantees atomicity: a new version becomes visible only after loading completes.
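A hedged sketch of the preload step: a DataLoad that warms an entire versioned model directory (the path and version tag are hypothetical). The idea described above is that serving traffic switches to the new version only once loading has finished, so readers never observe a half-loaded model.

```yaml
# Illustrative DataLoad: warm a full model version into the cache before it is used.
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: preload-model-v20240601      # hypothetical version tag
spec:
  dataset:
    name: retrieval-data
    namespace: default
  loadMetadata: true                 # refresh metadata so newly added files are visible
  target:
    - path: /models/v20240601        # hypothetical versioned model directory
      replicas: 1                    # number of cache replicas to load
# The update workflow waits for this DataLoad to complete before pointing
# the serving layer at the new version directory.
```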
Advantages and Outlook
With Fluid-enabled compute-storage separation, the platform now distributes hundreds of terabytes in minutes, runs stateless compute services that scale horizontally, and has cut the full-data update cycle from weeks to hours, markedly improving stability and availability.
Future work includes optimizing scheduling and execution strategies for upstream jobs, expanding model training and distribution, and enhancing observability and high‑availability features to benefit more developers.
