DataCake: A Multi‑Cloud Self‑Service Big Data Platform from SHAREit Group
The article introduces DataCake, a cloud‑native, multi‑cloud big data platform built by SHAREit Group. DataCake addresses massive data volumes, diverse application scenarios, and governance challenges through a Data Mesh‑inspired self‑service architecture, offering unified data management, intelligent governance, and a roadmap for future enhancements.
SHAREit Group (formerly Qiezi Technology) has rapidly grown its user base to over 2.4 billion installations worldwide, creating massive data demands that require a sophisticated big‑data platform.
The article first outlines the background and challenges: exponential data growth, expanding application scenarios, and untapped data potential, leading to three core problems—data not delivering business value, slow and costly data development, and difficult data governance caused by complex, fragmented pipelines.
Three stakeholder perspectives are highlighted: business owners struggling to turn data into value, analysts facing steep learning curves and long development cycles, and technical leads dealing with exploding ETL tasks, unclear data lineage, and opaque cloud‑native tools.
To solve these issues, DataCake adopts a Data Mesh philosophy, shifting from a centralized data team to domain‑driven ownership, and implements three key concepts: a self‑serve platform, treating data as a product, and federated governance that combines distributed development with centralized control.
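Federated governance, as described here, means domain teams develop data products independently while a central layer enforces shared rules. A minimal sketch of that split, assuming a policy-gate style of control (the policy names, the `approve` function, and the product fields are illustrative assumptions, not DataCake's actual API):

```python
# Federated governance sketch: domains build data products on their own, but
# every product must pass a central policy gate before publication.
CENTRAL_POLICIES = [
    ("has_owner",  lambda p: bool(p.get("owner"))),        # domain ownership is declared
    ("has_schema", lambda p: bool(p.get("schema"))),       # schema is registered
    ("pii_tagged", lambda p: "pii_columns" in p),          # sensitive columns are flagged
]

def approve(product: dict) -> list:
    """Return the violated central policies; an empty list means approved."""
    return [name for name, check in CENTRAL_POLICIES if not check(product)]

product = {"owner": "ads-team", "schema": ["user_id", "ts"], "pii_columns": ["user_id"]}
print(approve(product))                  # -> [] : approved
print(approve({"owner": "ads-team"}))    # -> ['has_schema', 'pii_tagged']
```

The design point is that the check list is owned centrally while the products are owned by domains, which is the "distributed development with centralized control" combination the article names.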
DataCake’s functional pillars include:
Self‑service big‑data application platform: low‑code pipelines, unified analytics, visualisation, and custom reporting.
Intelligent data governance and security: cost‑aware billing, AI‑assisted governance, and fine‑grained permission management.
Unified data management: metadata cataloguing, data‑asset discovery, quality monitoring, and breaking data silos across lakes, warehouses, and databases.
Lake‑warehouse integration: direct ingestion of raw data into the lake with optional warehousing for less‑time‑critical workloads.
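The self-service pillar above rests on templated, low-code pipelines: domain owners supply parameters and a schedule rather than engine code. A minimal sketch of what such a declarative pipeline spec could look like (the `PipelineTemplate`/`Step` classes, template names, and paths are hypothetical illustrations, not DataCake's interface):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One templated stage in a low-code pipeline (hypothetical model)."""
    name: str
    template: str                       # e.g. "s3_ingest", "sql_transform"
    params: dict = field(default_factory=dict)

@dataclass
class PipelineTemplate:
    """A declarative pipeline: owners fill in parameters, not Spark/Flink code."""
    owner_domain: str                   # Data Mesh: the domain that owns this data product
    schedule: str                       # cron expression
    steps: list = field(default_factory=list)

    def add_step(self, name, template, **params):
        self.steps.append(Step(name, template, params))
        return self                     # allow chaining

# A domain team assembles a daily report from prebuilt step templates.
pipeline = (
    PipelineTemplate(owner_domain="growth", schedule="0 2 * * *")
    .add_step("ingest", "s3_ingest", path="s3://raw/events/")
    .add_step("clean", "sql_transform", sql="SELECT * FROM events WHERE valid")
    .add_step("report", "dashboard_publish", dashboard="daily_active_users")
)
print([s.name for s in pipeline.steps])  # -> ['ingest', 'clean', 'report']
```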
The technical architecture is described across three layers:
IaaS: built on multiple public‑cloud providers to avoid vendor lock‑in.
PaaS: serverless compute supporting ad‑hoc, batch, streaming, and native cloud engines, with elastic scaling.
SaaS: integration with tools like HUE and Tableau, unified resource management, and cross‑cloud data access.
Additional capabilities include minimal‑code data analysis, low‑threshold data development via templated pipelines, unified data management with lineage visualisation, and AI‑driven automated governance.
The roadmap foresees a fully managed SaaS offering across multiple clouds and continued development of an open‑source, intelligent, one‑stop big‑data platform that maximises business value.
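Cross-cloud data access in the SaaS layer implies some abstraction that routes a dataset URI to the right provider's storage so callers stay cloud-agnostic. A minimal sketch of the idea using scheme-based routing (the scheme-to-provider mapping and `resolve_provider` function are assumptions for illustration, not DataCake internals):

```python
from urllib.parse import urlparse

# Map storage-URI schemes to the cloud provider backing them (illustrative).
PROVIDERS = {
    "s3": "aws",
    "gs": "gcp",
    "obs": "huawei",
}

def resolve_provider(uri: str) -> str:
    """Pick the cloud provider for a dataset URI; callers never hard-code one."""
    scheme = urlparse(uri).scheme
    try:
        return PROVIDERS[scheme]
    except KeyError:
        raise ValueError(f"no provider registered for scheme '{scheme}'")

print(resolve_provider("s3://warehouse/orders/"))  # -> aws
print(resolve_provider("gs://lake/events/"))       # -> gcp
```

Keeping provider choice behind one lookup is what lets a multi-cloud platform avoid the vendor lock-in the IaaS layer is designed against.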
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.