Tag

Lakehouse

DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Big Data
13 min read

DataFunSummit
Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation covers Tencent's real‑time lakehouse in four parts: lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook. It touches on components such as Spark, Flink, Iceberg, the Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg implementations.

Auto Optimize · Big Data · Data Optimization
12 min read

DataFunSummit
Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big Data · Data Optimization · Iceberg
11 min read

DataFunSummit
Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including its data lake compute, management, and storage layers, and details the intelligent optimization services (such as compaction, indexing, clustering, and auto-engine) designed to improve query performance, lower storage cost, and raise operational efficiency for large-scale data processing.

AutoEngine · Compaction · Data Optimization
11 min read

DataFunSummit
Dec 2, 2024 · Big Data

Gravitino Powers TBDS Product Architecture Upgrade with a Unified Metadata Lake

This article explains how Tencent Cloud's TBDS platform evolves its architecture by adopting Apache Gravitino as a unified metadata lake, detailing the challenges of legacy versus new lakehouse designs, storage and compute separation, unified data access, permission management, and the resulting benefits for big‑data and AI workloads.

Big Data · Data Architecture · Gravitino
15 min read

DataFunSummit
Nov 5, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real-time lakehouse architecture, detailing its three-layer design, the Auto Optimize Service with compaction, indexing, clustering and engine acceleration, scenario capabilities such as multi‑stream joins and in‑place migration, and outlines future optimization directions.

Big Data · Data Lake · Iceberg
11 min read

DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big Data · Data Architecture · Data Lake
10 min read

DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big Data · Data Lake · Delta Lake
12 min read

Sohu Tech Products
Sep 11, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

Tencent’s real‑time lakehouse combines Spark, Flink, StarRocks, and Presto compute layers with Iceberg‑based management and HDFS/COS storage. Its Intelligent Optimize Service, comprising Compaction, Expiration, Cleaning, Clustering, Index, and Auto‑Engine modules, automatically reduces merge time, improves query performance, enables secondary indexing, and dynamically routes hot partitions. Future plans target cold/hot separation, materialized view acceleration, and AI‑driven optimizations.

Big Data · Clustering · Compaction
12 min read

DataFunSummit
Sep 7, 2024 · Big Data

Observations on the Third Evolution of Data Infrastructure and the Next‑Generation Data Platform Architecture

This article reviews the current state of data platforms, analyzes the third wave of data infrastructure evolution driven by databases, big data and generative AI, proposes next‑generation lakehouse and cloud‑native architectural directions, and outlines future trends and unresolved challenges for AI‑centric data platforms.

AI integration · Big Data · Data Architecture
21 min read

DataFunSummit
Aug 26, 2024 · Big Data

Building a Doris‑Based Lakehouse Integrated Analytics System at Kuaishou

This article presents Kuaishou's experience of designing and implementing a Doris‑driven lakehouse integrated analytics system, covering the current OLAP landscape, challenges of data duplication and governance, the new architecture with caching and auto‑materialization, implementation details, performance impact, and future work.

Auto Materialization · Big Data · Caching
24 min read

DataFunSummit
Aug 6, 2024 · Big Data

Implementing a Multi‑Tenant Lakehouse Data Platform for Real‑Time Analytics at a SaaS CRM Company

This article details how a SaaS CRM provider built a cloud‑native Lakehouse platform to support multi‑tenant real‑time analytics, describing data challenges, metadata‑driven architecture, virtual database design, query optimization, BI integration, AI readiness, migration steps, and the resulting performance and scalability gains.

Big Data · Data Platform · Lakehouse
19 min read

Wukong Talks Architecture
Aug 6, 2024 · Databases

Migrating Tencent Music's Data Infrastructure from ClickHouse and Druid to StarRocks: Strategy, Implementation, and Best Practices

This article details how Tencent Music’s data‑infrastructure team migrated thousands of ClickHouse and Druid nodes to a StarRocks compute‑storage‑separated lakehouse, achieving 40‑50% cost reduction while maintaining query performance, and shares the technical challenges, solutions, and best‑practice recommendations gathered during the process.

ClickHouse · Cost Reduction · Druid
19 min read

DataFunSummit
Jul 12, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, Lakehouse Core Capabilities, and Open Design

This article examines the current evolution of data lakes, detailing their overall architecture, batch and real‑time integration methods, Lakehouse core functionalities such as enhanced DML, schema evolution, ACID support, and open‑design principles that enable multi‑cloud deployment and seamless interaction with diverse compute engines.

Data Lake · Lakehouse · Open Data Formats
12 min read

DataFunTalk
Jul 1, 2024 · Big Data

DataFunCon2024 Beijing: Real‑Time Lakehouse and Big Data Sessions

The DataFunCon2024 Beijing conference on July 5‑6 showcases a series of technical talks about real‑time lakehouse architectures, big‑data analytics, and cloud‑native data warehouses, offering practitioners insights into Apache Paimon, SelectDB, and Doris implementations for faster, more agile data processing.

Apache Paimon · Big Data · Conference
8 min read

DataFunTalk
Jun 10, 2024 · Big Data

Data Lake Development Trends, Architecture, Integration, and Lakehouse Core Capabilities

This article reviews the latest developments in data lakes, including trend analysis, overall architecture, data integration methods, Lakehouse core capabilities, open design principles, stream‑batch unified processing, real‑time OLAP, and lake‑internal warehousing, highlighting how these advances reduce complexity and cost while improving data sharing and performance.

Data Lake · Lakehouse · Real-time OLAP
14 min read

DataFunTalk
Jun 9, 2024 · Big Data

Optimizing ClickHouse Performance in WeChat: Observation Tools, Lakehouse Reading, Bitmap Acceleration, and AI Integration

This article details how the WeChat team leverages ClickHouse at massive scale, introduces a suite of performance observation tools, describes lakehouse reading and bitmap optimizations, and explains the integration of AI workloads, demonstrating overall query speedups of up to tenfold across diverse scenarios.

Artificial Intelligence · Big Data · Bitmap
10 min read

DataFunSummit
Jun 5, 2024 · Big Data

Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse

Databricks announced the acquisition of Tabular, the company founded by the original creators of Apache Iceberg, aiming to integrate Delta Lake and Iceberg into a unified, open lakehouse architecture that enhances format compatibility, reduces data silos, and supports AI workloads.

Apache Iceberg · Big Data · Delta Lake
5 min read

DataFunSummit
May 12, 2024 · Big Data

Practice of Lakehouse‑Integrated Data Platform Architecture in the Financial Innovation Sector

This article presents the evolution of data platform architectures, the specific challenges of financial‑sector information‑technology innovation, and the design, core components, deployment path, and real‑world case studies of the cloud‑native lakehouse solution DataCyber developed by Shuxin Network.

Big Data · Data Platform · Financial Innovation
21 min read