Big Data · 14 min read

Interview on Data Lake and Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase’s data‑lake technology manager explores the distinction between data lakes and lakehouses, the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, their maturity across key capabilities, and the practical adoption challenges faced by enterprises.


This article is an interview compiled by DataFun with Ma Jin, data-lake technology manager at NetEase, and aims to help readers understand the current state and challenges of data lake and lakehouse applications.

Data Lake vs. Lakehouse – Data lakes store both structured and unstructured data and have matured on Hadoop, while lakehouses (or lake‑house integration) are in a promotional phase, offering incremental performance gains and cost benefits but lacking strong demand from large enterprises.

Technology Evolution – The lakehouse concept originated with Databricks as a way to showcase unified support for AI and BI workloads. Delta Lake is tightly integrated with Spark, while the open-source projects Hudi and Iceberg focus on incremental updates and an open table-format standard, respectively. Iceberg's clean table API, strong ecosystem, and cloud-friendly design have made it the most mature of the three.
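As a concrete illustration of the table-format idea, the sketch below creates an Iceberg table through Spark SQL and lists its snapshot metadata. This is a minimal sketch, not the interviewee's setup: the catalog name `demo`, the warehouse path, and the table `db.events` are illustrative assumptions, and it presumes the `iceberg-spark-runtime` jar is on the Spark classpath.

```python
from pyspark.sql import SparkSession

# Minimal sketch: a Spark session with an Iceberg catalog registered.
# The catalog name "demo" and the warehouse path are illustrative assumptions;
# the iceberg-spark-runtime jar must already be on the classpath.
spark = (
    SparkSession.builder
    .appName("iceberg-table-format-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# The table format owns its own metadata: schema, partition spec, and
# snapshots live with the table rather than in Hive metastore conventions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_id BIGINT,
        user_id  BIGINT,
        ts       TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(ts))
""")

# Every commit produces a snapshot, queryable through a metadata table.
spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots"
).show()
```

The point of the sketch is that partitioning (`days(ts)`) and snapshot history are properties of the table itself, which is what distinguishes a table format from a plain directory of files in a data lake.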

Feature Maturity and Importance – ACID transactions, rollback, and concurrency control are highly mature; change-data-capture is moderately mature; time-travel and schema evolution are also relatively mature. Streaming-batch integration, efficient concurrent updates, and file-size optimization remain less mature. In terms of importance, ACID, rollback, and concurrency control rank first, followed by CDC, time-travel, and schema evolution.
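A hedged sketch of how three of the more mature capabilities surface in practice, continuing the hypothetical `demo.db.events` Iceberg table from the previous example; the timestamp and snapshot ID below are placeholders, and the SQL time-travel syntax assumes Spark 3.3 or later.

```python
from pyspark.sql import SparkSession

# Reuses the Iceberg-enabled session configured in the previous sketch.
spark = SparkSession.builder.getOrCreate()

# Schema evolution: adding a column is a metadata-only change;
# existing data files are not rewritten.
spark.sql("ALTER TABLE demo.db.events ADD COLUMN country STRING")

# Time travel: query the table as of an earlier point in time.
# The timestamp is a placeholder; requires Spark 3.3+ SQL syntax.
spark.sql("""
    SELECT count(*) FROM demo.db.events
    TIMESTAMP AS OF '2024-01-01 00:00:00'
""").show()

# Rollback: Iceberg ships stored procedures for snapshot management.
# The snapshot id is a placeholder taken from the snapshots metadata table.
spark.sql(
    "CALL demo.system.rollback_to_snapshot('db.events', 1234567890123456789)"
)
```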

Stream‑Batch Integration – Table‑format lakehouses enable ACID, update/delete, time‑travel, and schema evolution, improving data freshness from hourly to minute‑level and supporting real‑time training scenarios such as recommendation‑system feature engineering.
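To illustrate the freshness point, the following sketch appends a Structured Streaming source into a hypothetical Iceberg table on a one-minute trigger. The Kafka broker, topic, checkpoint path, and target table `demo.db.events_rt` are assumptions for illustration; the table is presumed to already exist with a matching schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, LongType, TimestampType

# Reuses the Iceberg-enabled session from the earlier sketch.
spark = SparkSession.builder.getOrCreate()

# Illustrative event schema for the Kafka payload.
event_schema = StructType([
    StructField("event_id", LongType()),
    StructField("user_id", LongType()),
    StructField("ts", TimestampType()),
])

# Read an event stream; broker address and topic name are placeholders.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Commit a new Iceberg snapshot roughly every minute; downstream batch and
# ad-hoc queries see each commit immediately, giving minute-level freshness.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .option("checkpointLocation", "/tmp/checkpoints/events_rt")
    .toTable("demo.db.events_rt")
)
query.awaitTermination()
```

The design choice worth noting is that freshness here is bounded by the commit interval rather than by a nightly or hourly batch schedule, which is what makes feature pipelines for recommendation systems feasible on the same tables used for batch analytics.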

Adoption Challenges – Large enterprises often apply lakehouse technology sporadically rather than systematically, because the cost-benefit is limited and the performance gains are modest compared with dedicated OLAP engines such as ClickHouse or Doris. Smaller teams find lakehouse features more attractive, while teams that do adopt it systematically may avoid ACID and rollback because their workflows rely on scheduled batch jobs.

Economic Considerations – Lakehouses offer lower storage costs and a unified data model, but workloads that demand extreme query performance are better served by specialized OLAP engines. Enterprises weigh the cost savings against the effort required to migrate existing Hadoop/Hive stacks to lakehouse solutions.

Future Outlook – The industry is moving toward cloud-native architectures and storage-compute separation, with an increasing focus on cost efficiency. However, the pace of lakehouse adoption remains slow, and significant innovation at the upper layers is needed to unlock its full value.

The interview concludes that lakehouse technology is more appealing to small‑to‑mid‑size companies for cost and flexibility, while large enterprises prioritize performance and may adopt lakehouse features selectively.

Tags: Big Data, data lake, Iceberg, Lakehouse, Hudi, Delta Lake, Table Format
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
