Data Cost Reduction and Efficiency: Qichacha's Data Architecture and Multi‑Cloud Unified Design

This article presents Qichacha's comprehensive data‑cost‑reduction strategy, detailing its Hadoop‑based three‑pillar architecture, layered data warehouse, Hive upgrades, unified metadata across multi‑cloud clusters, middleware choices such as Alluxio and JuiceFS, version‑compatible hybrid clouds, and Kubernetes‑driven resource orchestration to achieve scalable, low‑cost data processing.

DataFunSummit

Aug 13, 2024

In the era of rapid digitalization, data has become a critical asset, prompting Qichacha to explore cost‑effective and efficient data management practices.

1. Core Data Architecture – Qichacha builds on Hadoop’s classic three‑pillar model (storage, compute, scheduling) and extends it with a four‑layer data warehouse (ODS, DWD, DW, Application). Because queries on public clouds were expensive, the team deployed an on‑premises big‑data platform on which SQL can be run without restriction.

2. Data Warehouse Upgrade – The upgraded warehouse adds Hive2, Kyuubi, and Trino to boost compute performance, while addressing compatibility challenges between Hive versions and metadata synchronization.

3. Unified Metadata Layer – Multiple compute engines (Hive1, Hive2, SparkSQL, Kyuubi, Trino) share a single metadata catalog, enabling consistent table definitions across clusters.
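One common way to get this kind of sharing is to point every engine at the same Hive Metastore service. The article does not say how Qichacha wires its engines, so the following is only a sketch under that assumption: the endpoint is hypothetical, `engine_settings` is an illustrative helper, and the property names are the standard ones for Hive, Spark (via the `spark.hadoop.` prefix), and Trino's Hive connector.

```python
# Hypothetical metastore endpoint; the article does not give one.
METASTORE_URI = "thrift://metastore.example.internal:9083"


def engine_settings(uri: str) -> dict:
    """Return per-engine settings that all point at the same Hive Metastore."""
    return {
        # Hive reads this from hive-site.xml.
        "hive": {"hive.metastore.uris": uri},
        # Spark SQL (and Kyuubi's Spark engines) pass Hive settings
        # through the spark.hadoop. prefix.
        "spark": {"spark.hadoop.hive.metastore.uris": uri},
        # Trino's Hive connector, set in a catalog properties file.
        "trino": {"connector.name": "hive", "hive.metastore.uri": uri},
    }


settings = engine_settings(METASTORE_URI)
for engine, props in settings.items():
    for key, value in props.items():
        print(f"{engine}: {key}={value}")
```

Because every engine resolves tables through the same catalog, a table created from one engine is visible to the others without a separate synchronization step.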

4. Hybrid Cloud Architecture – Offline and real‑time clusters are separated. On the real‑time side, stream processing (Flink, Spark Streaming) is paired with real‑time stores (TiDB, MongoDB). Object storage (OBS, OSS, COS, Ceph, MinIO) sits behind middleware (Alluxio, JuiceFS), which gives the compute engines a unified storage interface.

5. Middleware Compatibility – Alluxio supports Hadoop 2.2‑3.3 and JuiceFS supports Hadoop 2.0‑3.0, so clusters running different Hadoop versions can read the same data without extensive code changes.
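The version ranges above can be turned into a simple lookup. This is a minimal sketch that takes the ranges exactly as stated in this article and assumes both bounds are inclusive; `SUPPORTED`, `parse`, and `middleware_for` are illustrative names, not part of either project.

```python
# Ranges as quoted in this article: (lowest, highest) Hadoop major.minor.
SUPPORTED = {
    "Alluxio": ((2, 2), (3, 3)),
    "JuiceFS": ((2, 0), (3, 0)),
}


def parse(version: str) -> tuple:
    """Reduce a version string such as '2.7.3' to (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def middleware_for(hadoop_version: str) -> list:
    """List the middleware whose quoted range covers this Hadoop version."""
    v = parse(hadoop_version)
    return [name for name, (lo, hi) in SUPPORTED.items() if lo <= v <= hi]


print(middleware_for("2.7.3"))  # ['Alluxio', 'JuiceFS']
print(middleware_for("3.3.1"))  # ['Alluxio']
```

A check like this is useful when pairing two clusters: the middleware chosen has to cover the Hadoop version on both sides.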

6. Multi‑Cloud Unified Architecture – Two independent Hadoop clusters (e.g., Hadoop 2 and Hadoop 3) are linked via a shared metadata layer and middleware. This gives cross‑cloud data access, federation that does not depend on the Hadoop version, and lower‑cost storage through erasure coding (EC) in Hadoop 3.
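A short calculation shows where the EC saving comes from. The article does not name the EC policy in use, so this sketch assumes HDFS's default 3x replication for replicated data and Hadoop 3's built-in RS-6-3 policy (6 data units plus 3 parity units) for erasure-coded data; the 100 TB figure is made up for illustration.

```python
def raw_replicated(logical_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_erasure_coded(logical_tb: float, data_units: int = 6,
                      parity_units: int = 3) -> float:
    """Raw capacity needed under Reed-Solomon (data_units, parity_units)."""
    return logical_tb * (data_units + parity_units) / data_units


logical = 100.0  # TB of logical data (illustrative)
replicated = raw_replicated(logical)   # 300.0 TB
coded = raw_erasure_coded(logical)     # 150.0 TB
saving = 1 - coded / replicated        # 0.5, i.e. half the raw storage

print(f"3x replication: {replicated:.0f} TB raw")
print(f"RS-6-3 erasure coding: {coded:.0f} TB raw")
print(f"raw storage saved: {saving:.0%}")
```

Under these assumptions the storage overhead drops from 3x to 1.5x. Erasure coding costs extra CPU and network on reads and reconstruction, so it is usually applied to colder data.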

7. Engine Unification & Kubernetes Integration – An intelligent engine (e.g., Coral) can translate SQL across engines, while Koordinator and Kubernetes orchestrate resources, reducing over‑provisioning and improving real‑time workload efficiency.

Conclusion – By evolving the architecture iteratively and rolling out each change cautiously, Qichacha achieves lower storage costs, flexible compute, unified metadata, and a scalable multi‑cloud data platform.
