
Next‑Generation Cloud‑Native Data Lake Architecture: Value, Principles, Challenges, and Tencent Solutions

The talk outlines a next-generation cloud-native data lake that combines elastic Kubernetes compute, object storage, and Apache Iceberg to cut costs 3-10× while improving performance. It also presents Tencent's Data Lake Compute and Data Lake Fabric products, which address scalability, reliability, and operational challenges through a serverless, unified, multi-engine architecture.

Tencent Cloud Developer

This article summarizes a talk by Tencent big‑data expert Yu Huali at the Techo TVP Developer Summit, introducing the concept, value, principles, challenges, and Tencent’s solutions for a cloud‑native data lake.

The presentation is organized into four parts: (1) what a cloud-native data lake is, including its background, value, and challenges; (2) how to build one, including the technical difficulties; (3) an overview of two Tencent products, Data Lake Compute (DLC) and Data Lake Fabric (DLF); and (4) a holistic solution architecture on Tencent Cloud.

The speaker defines a cloud‑native data lake as a high‑performance, cost‑effective big‑data platform that fully leverages elastic computing and object‑storage advantages.

Traditional data-lake architectures suffer from four major pain points: high cost (HDFS ties storage to compute, so the two are mismatched), low flexibility (poor ad-hoc and back-fill support, difficult upgrades), poor performance (NameNode bottlenecks, shuffle limited by disks), and weak reliability (no multi-AZ high availability). Cloud-native approaches address these with elastic compute (spot instances, auto-scaling), which the speaker credits with a 3-5× cost advantage, and with object storage, credited with a 5-10× cost advantage, along with better cross-AZ bandwidth, lifecycle management, and cold-data handling.
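The two cost factors combine differently depending on how a bill is split between compute and storage. The sketch below works through that arithmetic. The function name `blended_cost_ratio` and the 50/50 split are assumptions chosen for illustration, not figures from the talk.

```python
def blended_cost_ratio(compute_share: float,
                       compute_factor: float,
                       storage_factor: float) -> float:
    """Return old_cost / new_cost for a bill split between compute and storage.

    compute_share  -- fraction of the original bill spent on compute (0..1)
    compute_factor -- how many times cheaper compute becomes (e.g. 3 to 5)
    storage_factor -- how many times cheaper storage becomes (e.g. 5 to 10)
    """
    if not 0.0 <= compute_share <= 1.0:
        raise ValueError("compute_share must be between 0 and 1")
    if compute_factor <= 0 or storage_factor <= 0:
        raise ValueError("cost factors must be positive")
    storage_share = 1.0 - compute_share
    # New cost, expressed as a fraction of the original bill.
    new_cost = compute_share / compute_factor + storage_share / storage_factor
    return 1.0 / new_cost


# Hypothetical 50/50 split between compute and storage.
low = blended_cost_ratio(0.5, compute_factor=3, storage_factor=5)
high = blended_cost_ratio(0.5, compute_factor=5, storage_factor=10)
print(f"blended saving: {low:.2f}x to {high:.2f}x")
```

Under that assumed split, the blended saving falls between the compute-only and storage-only factors, which is why the overall figure depends on the workload's cost mix.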

Building a cloud-native data lake raises its own challenges: scaling up and down quickly, handling spot-instance revocation, and working around three limitations of object storage, namely the lack of rename semantics, eventual consistency, and slow list operations. Solutions include using Kubernetes for container orchestration, adding caching layers, applying sparse indexing and predicate push-down, and redesigning commit algorithms.
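The summary does not describe the redesigned commit algorithm. One common way to commit without rename is to write each snapshot under a unique, final name and then publish it with a single atomic pointer swap. The sketch below illustrates that general pattern only. `InMemoryCatalog`, `commit`, and the dict standing in for object storage are hypothetical, and it assumes the metadata service can offer a compare-and-swap.

```python
import threading
import uuid
from typing import Dict, List, Optional


class InMemoryCatalog:
    """Hypothetical stand-in for a metadata service with compare-and-swap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current: Optional[str] = None  # name of the live snapshot

    def current(self) -> Optional[str]:
        with self._lock:
            return self._current

    def compare_and_swap(self, expected: Optional[str], new: str) -> bool:
        """Point at `new` only if the live snapshot is still `expected`."""
        with self._lock:
            if self._current != expected:
                return False
            self._current = new
            return True


def commit(store: Dict[str, List[str]], catalog: InMemoryCatalog,
           new_files: List[str], max_retries: int = 5) -> str:
    """Append files to the table without ever renaming an object."""
    for _ in range(max_retries):
        base = catalog.current()
        base_files = store[base] if base is not None else []
        # Write the new snapshot once, under a unique name; never overwrite.
        snapshot = f"snapshot-{uuid.uuid4().hex}"
        store[snapshot] = base_files + new_files
        # The commit itself is one atomic pointer update.
        if catalog.compare_and_swap(base, snapshot):
            return snapshot
        # Another writer won the race: re-read the base and try again.
    raise RuntimeError("commit failed after retries")


store: Dict[str, List[str]] = {}
catalog = InMemoryCatalog()
commit(store, catalog, ["part-0.parquet"])
commit(store, catalog, ["part-1.parquet"])
print(store[catalog.current()])
```

Readers only ever follow the catalog pointer, so a half-written snapshot is never visible, and a failed writer leaves behind only an unreferenced object.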

Tencent's two products address these challenges. DLC focuses on serverless analytics and federated computing, while DLF provides unified metadata management and efficient data ingestion. Both share a unified backend consisting of: (1) a container service (Tencent Cloud K8s) with a standard SQL entry point and support for multiple engines (Presto, Spark, Hive, TEG SuperSQL); (2) a scalable metadata service built on an extended Hive Metastore; and (3) table-format storage on top of object storage.

The core enabling technology highlighted is Apache Iceberg. According to the talk, Iceberg solves the rename-semantics problem, caches file lists so that costly list operations are no longer needed, and supports sparse indexing and predicate push-down. Together these improve performance and reduce the impact of eventual consistency.
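To illustrate how sparse indexing and predicate push-down reduce I/O, the sketch below skips data files by comparing a query range against per-file minimum and maximum values for one column, the kind of statistic a table format can keep alongside its file list. `DataFile`, `prune_files`, and the sample paths are hypothetical names for illustration and are not Iceberg's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical file entry carrying min/max statistics for one column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: List[DataFile], lower: int, upper: int) -> List[DataFile]:
    """Keep only files whose [min, max] range overlaps [lower, upper].

    Files outside the range cannot contain matching rows, so they are
    skipped without being opened or listed.
    """
    return [f for f in files
            if f.max_value >= lower and f.min_value <= upper]


files = [
    DataFile("part-0.parquet", 0, 99),
    DataFile("part-1.parquet", 100, 199),
    DataFile("part-2.parquet", 200, 299),
]

# Equivalent to pushing down: WHERE col BETWEEN 150 AND 210
selected = prune_files(files, 150, 210)
print([f.path for f in selected])
```

The index is sparse because it stores two values per file rather than one entry per row, so it stays small even for very large tables.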

The overall solution delivers three key benefits: low cost (3-5× cheaper compute, 5-10× cheaper storage), high performance (caching, indexing, optimized shuffle and commit), and a serverless, fully managed experience that removes operational overhead. The architecture is unified and open, so different engines (EMR, Flink, etc.) can operate on the same data lake.

Speaker bio: Yu Huali, Tencent big‑data expert engineer, head of cloud‑native data lake kernel R&D, Fudan University mathematics graduate, with extensive public‑cloud big‑data experience.
