How Tencent Cloud’s Native Data Lake Redefines Big Data Analytics

This article examines the evolution of data lakes, outlines the challenges enterprises face with massive, heterogeneous data, and details Tencent Cloud’s native data lake architecture and its serverless Data Lake Compute service, highlighting performance, cost‑efficiency, and future development directions.

Tencent Cloud Developer

Mar 29, 2021

Introduction: Enterprise Pain Points

Many organizations run into the same three problems: simple business‑statistics requests turn into bottlenecks, resources sized for peak periods go to waste, and large‑scale data‑lake operations are complex enough to demand extensive engineering effort.

1. The Past and Present of Data Lakes

James Dixon of Pentaho coined the term "data lake" in 2010, describing it as a repository that holds structured and unstructured data in its raw, unprocessed state, much like water in a natural lake. Early implementations focused on storage capacity. Once HDFS and object storage made low‑cost massive storage feasible, the focus shifted to agile data analysis and AI model training.

2. Tencent Cloud’s Native Data Lake Architecture

Tencent Cloud offers a cloud‑native data‑lake system that integrates massive heterogeneous storage, diversified analytical capabilities, and AI‑driven services. The platform aims to lower storage and computation costs while boosting data‑driven decision‑making agility.

3. Cloud‑Native Data Lake Compute (DLC)

The Data Lake Compute (DLC) service is a serverless offering that lets users run standard SQL over Cloud Object Storage (COS) and other cloud data sources without managing the underlying resources. It removes the need for traditional data‑layer modeling, which sharply reduces the preparation time for analyzing massive data sets.
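The idea of querying raw files with standard SQL, without a modeling step first, can be illustrated with a small local analogy. This is a sketch only, not DLC's API: the function `query_raw_csv`, the `RAW_OBJECT` sample data, and the use of an in‑memory SQLite database in place of a serverless engine are all assumptions made for illustration. The schema is taken from the file's own header at query time rather than being designed up front.

```python
import csv
import io
import sqlite3

# Stand-in for a raw CSV object sitting in object storage (hypothetical data).
RAW_OBJECT = """order_id,region,amount
1,south,120.5
2,north,80.0
3,south,42.0
"""


def query_raw_csv(raw_text, table, sql):
    """Expose a raw CSV text as a table and run one SQL query against it.

    Column names come from the CSV header, so no schema is modeled up front.
    """
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    try:
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


rows = query_raw_csv(
    RAW_OBJECT,
    "orders",
    "SELECT region, SUM(CAST(amount AS REAL)) "
    "FROM orders GROUP BY region ORDER BY region",
)
print(rows)
```

In a real serverless service the engine reads the objects in place instead of copying them into a local database; the sketch only shows the schema‑on‑read pattern from the user's point of view.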

Key technical features include:

High‑Performance Compute: A serverless Presto engine optimized for data‑lake storage delivers stable, fast query performance.

Cache‑Accelerated Scans: For read‑only workloads, the cache reaches hit rates of 75‑85% by exploiting data skew, where a small share of the data receives most of the reads, which speeds up scans.

Storage‑Side Optimizations: Sparse indexing, partitioning, and bucketing reduce the volume of data scanned; aligning analytical‑processing (AP) and transactional‑processing (TP) data formats further improves query speed.

Three‑Tier Storage Acceleration: Data moves from COS to a near‑compute cache layer, providing order‑of‑magnitude access speed improvements.

Unlimited, Low‑Cost Serverless Resources: Built on Tencent Cloud EKS, compute resources scale elastically in seconds, with automatic expansion when DLC predicts insufficient capacity.
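To see why data skew helps a scan cache, the following sketch simulates reads against a small cache. It is an illustration under stated assumptions, not a model of DLC's cache: the LRU eviction policy, the Zipf‑like popularity weights, the block counts, and the names `LRUCache` and `simulate` are all hypothetical. It compares a skewed read pattern with a uniform one on a cache that holds 10% of the blocks.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self._blocks:
            # Hit: mark the block as most recently used.
            self._blocks.move_to_end(block_id)
            self.hits += 1
        else:
            # Miss: load the block, evicting the oldest one if full.
            self.misses += 1
            self._blocks[block_id] = True
            if len(self._blocks) > self.capacity:
                self._blocks.popitem(last=False)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(num_blocks, cache_blocks, reads, skew, seed=0):
    """Return the hit rate for reads drawn with weight 1 / rank**skew.

    skew=0.0 gives uniform access; larger values concentrate reads
    on a few hot blocks.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_blocks + 1)]
    cache = LRUCache(cache_blocks)
    for block_id in rng.choices(range(num_blocks), weights=weights, k=reads):
        cache.read(block_id)
    return cache.hit_rate


skewed = simulate(num_blocks=10_000, cache_blocks=1_000, reads=50_000, skew=1.0)
uniform = simulate(num_blocks=10_000, cache_blocks=1_000, reads=50_000, skew=0.0)
print(f"skewed hit rate: {skewed:.2f}, uniform hit rate: {uniform:.2f}")
```

With uniform access the hit rate stays close to the fraction of data that fits in the cache, while skewed access lets the same cache serve a much larger share of reads.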
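The storage‑side optimizations listed above reduce scan volume by skipping files that cannot contain matching rows. The sketch below shows the general technique of partition pruning combined with a min/max sparse index. The file layout, the `DataFile` fields, and the function `files_to_scan` are assumptions for illustration and do not describe DLC's internal metadata format; bucketing is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_date: str  # partition key value, e.g. "2021-03-01"
    min_user_id: int     # sparse index: smallest user_id in the file
    max_user_id: int     # sparse index: largest user_id in the file
    size_mb: int


def files_to_scan(files, date, user_id):
    """Select files that may hold rows with the given date and user_id."""
    selected = []
    for f in files:
        # Partition pruning: skip whole partitions with a different date.
        if f.partition_date != date:
            continue
        # Sparse-index pruning: skip files whose id range excludes the value.
        if not (f.min_user_id <= user_id <= f.max_user_id):
            continue
        selected.append(f)
    return selected


# Hypothetical table layout: two daily partitions, two files each.
FILES = [
    DataFile("dt=2021-03-01/part-0", "2021-03-01", 1, 1000, 128),
    DataFile("dt=2021-03-01/part-1", "2021-03-01", 1001, 2000, 128),
    DataFile("dt=2021-03-02/part-0", "2021-03-02", 1, 1000, 128),
    DataFile("dt=2021-03-02/part-1", "2021-03-02", 1001, 2000, 128),
]

selected = files_to_scan(FILES, date="2021-03-02", user_id=1500)
scanned_mb = sum(f.size_mb for f in selected)
total_mb = sum(f.size_mb for f in FILES)
print(f"scanning {scanned_mb} MB of {total_mb} MB: {[f.path for f in selected]}")
```

Only file metadata is consulted to make the decision, so the pruned files are never read from storage at all.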

4. Future Outlook

Tencent Cloud plans to deepen its work in five key areas:

More flexible and efficient compute‑engine scheduling, leveraging multi‑engine selection and cost‑based optimization.

Enhanced data‑ingestion capabilities with ACID transaction support to speed up ETL pipelines.

Improved stream‑batch processing using a unified high‑performance storage model.

Better compatibility and extensibility for Hadoop ecosystems and object‑storage semantics, including intelligent row‑column hybrid storage.

Lower‑cost serverless compute through upcoming competitively priced container services.
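The first item, multi‑engine selection with cost‑based optimization, can be sketched as picking the engine with the lowest estimated cost for a given query. The article gives no details of the planned scheduler, so everything here is hypothetical: the engine names, the linear cost model (startup time plus time per gigabyte scanned), and the numbers in `ENGINES` are placeholders chosen only to show the selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineProfile:
    name: str
    startup_seconds: float  # fixed overhead before the query starts
    seconds_per_gb: float   # marginal cost of scanning one gigabyte


def estimate_seconds(engine, scan_gb):
    """Toy linear cost model: fixed startup plus per-gigabyte scan time."""
    return engine.startup_seconds + engine.seconds_per_gb * scan_gb


def choose_engine(engines, scan_gb):
    """Return the engine with the lowest estimated runtime for this scan."""
    return min(engines, key=lambda e: estimate_seconds(e, scan_gb))


# Hypothetical profiles: a low-latency engine and a high-throughput one.
ENGINES = [
    EngineProfile("interactive", startup_seconds=1.0, seconds_per_gb=2.0),
    EngineProfile("batch", startup_seconds=60.0, seconds_per_gb=0.5),
]

small = choose_engine(ENGINES, scan_gb=10)
large = choose_engine(ENGINES, scan_gb=1000)
print(f"10 GB scan -> {small.name}, 1000 GB scan -> {large.name}")
```

A production optimizer would draw on table statistics and far richer cost models; the point of the sketch is only that the choice of engine can be made per query from an estimate rather than fixed in advance.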
