
Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's comprehensive data‑cost‑reduction strategy, detailing its Hadoop‑based three‑pillar architecture, layered data warehouse, Hive upgrades, unified metadata across multi‑cloud clusters, middleware choices such as Alluxio and JuiceFS, version‑compatible hybrid clouds, and Kubernetes‑driven resource orchestration to achieve scalable, low‑cost data processing.

DataFunSummit

In the era of rapid digitalization, data has become a critical asset, prompting Qichacha to explore cost‑effective and efficient data management practices.

1. Core Data Architecture – Qichacha builds on Hadoop’s classic three‑pillar model (storage, compute, scheduling) and extends it with a four‑layer data warehouse (ODS, DWD, DW, Application). Because query costs on public clouds were high, Qichacha deployed an on‑premises big‑data platform on which SQL can be executed without restriction.
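
As a rough illustration of the four‑layer model, the sketch below checks that a table is only built from tables in the same or a lower layer. The `<layer>_<name>` table naming, the `app` prefix for the Application layer, and both helper functions are assumptions made for this example; the article does not describe Qichacha's naming rules.

```python
# Layer order as listed in the article: ODS -> DWD -> DW -> Application.
# The lowercase prefixes (and "app" for Application) are an assumed convention.
LAYERS = ["ods", "dwd", "dw", "app"]


def layer_of(table_name: str) -> str:
    """Return the layer of a table, assuming names look like '<layer>_<rest>'."""
    prefix = table_name.split("_", 1)[0].lower()
    if prefix not in LAYERS:
        raise ValueError(f"unknown layer prefix in {table_name!r}")
    return prefix


def reads_upward(source: str, target: str) -> bool:
    """True if `target` is built from a `source` in a higher layer (a violation)."""
    return LAYERS.index(layer_of(source)) > LAYERS.index(layer_of(target))


if __name__ == "__main__":
    # Hypothetical table names, used only to exercise the check.
    print(reads_upward("ods_company_raw", "dwd_company"))   # lower -> higher layer
    print(reads_upward("dw_company_stats", "ods_company"))  # higher -> lower layer
```

A check like this could run in a scheduler hook or code review step so that dependencies keep flowing from ODS toward the Application layer.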

2. Data Warehouse Upgrade – The upgraded warehouse adds Hive2, Kyuubi, and Trino to boost compute performance. The upgrade also has to resolve compatibility issues between Hive versions and keep metadata synchronized across them.

3. Unified Metadata Layer – Multiple compute engines (Hive1, Hive2, SparkSQL, Kyuubi, Trino) share a single metadata catalog, enabling consistent table definitions across clusters.
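
The idea of one catalog serving several engines can be sketched with a toy in‑memory model. This is not how any of the listed engines actually talk to a metastore; `TableDef`, `SharedCatalog`, `Engine`, the table name, and the storage path are all hypothetical and exist only to show that every engine resolves the same definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableDef:
    name: str
    columns: tuple   # (column_name, type) pairs
    location: str    # storage path; the value used below is made up


@dataclass
class SharedCatalog:
    """Toy stand-in for the shared metadata layer."""
    tables: dict = field(default_factory=dict)

    def register(self, table: TableDef) -> None:
        self.tables[table.name] = table

    def resolve(self, name: str) -> TableDef:
        return self.tables[name]


@dataclass
class Engine:
    """A compute engine that holds no table definitions of its own."""
    name: str
    catalog: SharedCatalog

    def describe(self, table_name: str) -> TableDef:
        return self.catalog.resolve(table_name)


catalog = SharedCatalog()
catalog.register(
    TableDef(
        name="dwd.company",
        columns=(("id", "bigint"), ("name", "string")),
        location="s3://example-bucket/dwd/company",
    )
)

# Engine names come from the article; each one points at the same catalog object.
engines = [Engine(n, catalog) for n in ("Hive1", "Hive2", "SparkSQL", "Kyuubi", "Trino")]
definitions = {e.name: e.describe("dwd.company") for e in engines}

if __name__ == "__main__":
    for engine_name, table in definitions.items():
        print(engine_name, table.columns, table.location)
```

Because the definition is registered once, a schema change made through one engine is visible to the others without a separate synchronization step, which is the property the unified layer is after.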

4. Hybrid Cloud Architecture – Offline and real‑time clusters are separated; real‑time stores (TiDB, MongoDB) complement stream processing (Flink, Spark Streaming). Object storage (OBS, OSS, COS, Ceph, MinIO) and middleware (Alluxio, JuiceFS) provide a unified storage interface.

5. Middleware Compatibility – Alluxio supports Hadoop 2.2‑3.3 and JuiceFS supports Hadoop 2.0‑3.0, allowing cross‑version data access without extensive code changes.
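
A small helper makes the version argument concrete: given the Hadoop versions of two clusters, which middleware covers both? The ranges below are copied from the article rather than checked against vendor documentation, and the function names are made up for this sketch.

```python
# Supported Hadoop ranges as stated in the article (inclusive, major.minor).
SUPPORTED = {
    "Alluxio": ((2, 2), (3, 3)),
    "JuiceFS": ((2, 0), (3, 0)),
}


def major_minor(version: str) -> tuple:
    """Turn '2.7.3' into (2, 7); the patch level is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports(middleware: str, hadoop_version: str) -> bool:
    """True if the Hadoop version falls inside the middleware's stated range."""
    low, high = SUPPORTED[middleware]
    return low <= major_minor(hadoop_version) <= high


def shared_middleware(*hadoop_versions: str) -> list:
    """Middleware whose stated range covers every given Hadoop version."""
    return [m for m in SUPPORTED if all(supports(m, v) for v in hadoop_versions)]


if __name__ == "__main__":
    # Hypothetical pair of clusters: one on Hadoop 2.7.3, one on Hadoop 3.3.1.
    print(shared_middleware("2.7.3", "3.3.1"))
```

Comparing `(major, minor)` tuples avoids the string‑comparison trap where `"2.10"` sorts before `"2.2"`.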

6. Multi‑Cloud Unified Architecture – Two independent Hadoop clusters (e.g., Hadoop 2 and Hadoop 3) are linked via a shared metadata layer and middleware. This gives cross‑cloud data access and version‑agnostic federation, and lets cold data use the cheaper erasure‑coded (EC) storage available in Hadoop 3.
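
The storage saving from erasure coding is simple arithmetic. The sketch below compares three‑way replication with a Reed‑Solomon layout of 6 data units and 3 parity units, which is one of the policies shipped with Hadoop 3; the article does not say which policy Qichacha uses, so the 6+3 choice and the 100 TB figure are assumptions.

```python
def raw_replicated(logical_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_erasure_coded(logical_tb: float, data_units: int = 6, parity_units: int = 3) -> float:
    """Raw capacity needed when each stripe has data_units + parity_units cells."""
    return logical_tb * (data_units + parity_units) / data_units


logical = 100.0                            # hypothetical 100 TB of logical data
replicated = raw_replicated(logical)       # 100 * 3       = 300.0 TB
erasure = raw_erasure_coded(logical)       # 100 * 9 / 6   = 150.0 TB
saving = 1 - erasure / replicated          # 1 - 150 / 300 = 0.5

if __name__ == "__main__":
    print(f"replication: {replicated} TB, EC: {erasure} TB, saving: {saving:.0%}")
```

Under these assumptions the raw footprint halves, at the price of more CPU and network work on reads and recovery, which is why EC is usually applied to cold data rather than hot tables.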

7. Engine Unification & Kubernetes Integration – An intelligent engine (e.g., Coral) can translate SQL across engines, while Koordinator and Kubernetes orchestrate resources, reducing over‑provisioning and improving real‑time workload efficiency.
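
To see why orchestration reduces over‑provisioning, consider how much CPU is requested but never used. The sketch below is a deliberately simplified estimate and not Koordinator's actual algorithm; the `Workload` fields, the safety margin, and the sample numbers are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requested_cores: float   # what the workload reserves
    peak_used_cores: float   # the most it was observed to use


def reclaimable_cores(workloads, safety_margin: float = 0.25) -> float:
    """Cores that are reserved but idle even after keeping headroom above peak usage."""
    total = 0.0
    for w in workloads:
        keep = w.peak_used_cores * (1 + safety_margin)
        total += max(0.0, w.requested_cores - keep)
    return total


if __name__ == "__main__":
    # Hypothetical workloads; the numbers are invented for the example.
    jobs = [
        Workload("realtime-ingest", requested_cores=8, peak_used_cores=4),
        Workload("dashboard-api", requested_cores=4, peak_used_cores=4),
    ]
    print(reclaimable_cores(jobs))
```

Idle capacity identified this way is what a co‑location scheduler can lend to lower‑priority batch jobs, instead of buying more nodes for them.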

Conclusion – By evolving the architecture iteratively and rolling out each change cautiously, Qichacha achieves lower storage costs, flexible compute, unified metadata, and a scalable multi‑cloud data platform.
