Tag

Z-Order


DataFunTalk
Jul 23, 2024 · Big Data

Practical Experience with Apache Kyuubi and Apache Celeborn in Big Data Platforms

This article shares detailed practical experiences from DingXiangYuan's big‑data platform on using Apache Kyuubi and Apache Celeborn, covering architecture, flexible configuration, AuthZ fine‑grained permissions, small‑file and Z‑Order optimizations, Arrow‑based large result transmission, and operational tips such as connection‑level issues and Netty cache handling.

ARROW · Apache Celeborn · Apache Kyuubi
17 min read
DataFunSummit
Nov 25, 2023 · Big Data

Practical Experience with Apache Kyuubi and Celeborn on the DXY Big Data Platform

This article presents a comprehensive technical overview of how DXY's big data platform leverages Apache Kyuubi and Celeborn to unify Spark entry points, configure flexible task isolation, implement fine‑grained AuthZ, optimize small files and Z‑Order sorting, and accelerate large result set transmission with Arrow, while also discussing operational challenges and upcoming features.

ARROW · Apache Kyuubi · Celeborn
17 min read
DataFunSummit
Oct 11, 2022 · Big Data

Building Lakehouse Architecture with Delta Lake: Core Concepts, Technologies, Ecosystem, and Use Cases

This article explains how to build a lakehouse architecture with Delta Lake, covering its basic concepts, Delta Lake 2.0 features, internal kernel and key technologies, ecosystem integrations, and classic data‑warehouse use cases such as G‑SCD and change data capture, providing practical guidance for modern big‑data engineering.

ACID Transactions · Change Data Capture · Delta Lake
27 min read
DataFunSummit
Sep 27, 2022 · Big Data

Apache Spark Adaptive Query Execution and Kyuubi Optimization Practices for Data Warehousing

This article presents a detailed overview of Apache Spark's Adaptive Query Execution evolution, its optimization techniques, and performance gains, followed by an in‑depth discussion of Apache Kyuubi's architecture, security integrations, cloud‑native capabilities, and practical Rebalance + Z‑Order strategies that enhance data‑warehouse task efficiency and query performance.

Adaptive Query Execution · Apache Spark · Big Data Optimization
19 min read
DataFunTalk
Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Data Optimization · Z-Order
21 min read
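Bitmap indexes come up in this talk as a data-organization technique. As a rough illustration of the general idea (not Bilibili's implementation), the sketch below builds one uncompressed bitmap per distinct column value, using a Python int as the bitset, and answers AND/OR predicates with bitwise operations. The function names and sample columns are hypothetical; production systems typically use compressed formats such as Roaring bitmaps.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_bitmap_index(column: Sequence[str]) -> Dict[str, int]:
    """Map each distinct value to a bitmap (a Python int) with bit i set if row i holds it."""
    index: Dict[str, int] = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def row_ids(bitmap: int) -> List[int]:
    """Decode a bitmap back into ascending row ids."""
    ids = []
    row = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row)
        bitmap >>= 1
        row += 1
    return ids


# Hypothetical two-column table, one list per column
city = ["sh", "bj", "sh", "gz", "bj", "sh"]
device = ["ios", "android", "android", "ios", "ios", "ios"]
city_idx = build_bitmap_index(city)
device_idx = build_bitmap_index(device)

# WHERE city = 'sh' AND device = 'ios'
print(row_ids(city_idx["sh"] & device_idx["ios"]))  # → [0, 5]
# WHERE city = 'bj' OR device = 'android'
print(row_ids(city_idx["bj"] | device_idx["android"]))  # → [1, 2, 4]
```

Predicates are evaluated without touching row data: each condition becomes one bitmap lookup, and combining conditions is a single bitwise operation.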
DataFunTalk
Jul 15, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lakehouse architecture, describing how Iceberg, Magnus, Trino, and Alluxio are used to achieve flexible data storage, high‑performance query acceleration, and automated indexing through Z‑Order and Hilbert‑curve sorting, Bloom filters, and advanced bitmap techniques.

Index Optimization · Query Acceleration · Z-Order
18 min read
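The Hilbert curve mentioned above is often preferred to Z-order because consecutive positions on the curve are always adjacent cells, whereas Z-order makes long jumps between quadrants. A minimal sketch of the classic iterative mapping from a grid cell (x, y) to its distance along the curve follows. It assumes column values have already been mapped to integer coordinates in an n x n grid with n a power of two; `hilbert_index` is a hypothetical name, not an API from Iceberg or Trino.

```python
from typing import Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip and/or transpose a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Distance of cell (x, y) along a Hilbert curve covering an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("x and y must lie in [0, n)")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level holds s * s cells; (3 * rx) ^ ry gives its rank 0..3.
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


cells = [(x, y) for x in range(4) for y in range(4)]
order = sorted(cells, key=lambda c: hilbert_index(4, *c))
print(order[:4])  # → [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Sorting rows by this value before writing files keeps each file's rows in a compact region of the (x, y) space, which is what the file-level min/max statistics rely on.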
DataFunSummit
May 30, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lakehouse architecture, describing how Iceberg, Z‑Order sorting, and advanced indexing techniques such as Bloom filters and bitmaps are used to accelerate queries and improve data organization in large‑scale analytics workloads.

Indexing · Z-Order · big data
18 min read
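Bloom filters let a query engine rule out a data file for an equality predicate without reading it: if the file's filter reports a value as absent, the value is definitely not in the file, while a positive answer may be a false positive. The sketch below is a generic Bloom filter using double hashing over a SHA-256 digest, not the index format Bilibili describes; the class name, sizing parameters, and sample user ids are illustrative assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A fixed-size Bloom filter over UTF-8 strings."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


# One filter per data file (hypothetical file contents)
file_filter = BloomFilter(num_bits=1024, num_hashes=3)
for user_id in ("u1", "u2", "u3"):
    file_filter.add(user_id)

print(file_filter.might_contain("u2"))  # → True
# For a value never added this usually prints False, but a false positive is possible.
print(file_filter.might_contain("u999"))
```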
DataFunSummit
Apr 29, 2022 · Big Data

Optimizing Query Performance in Apache Iceberg with Z‑Order Data Organization

This article explains how Apache Iceberg's data skipping loses effectiveness when queries filter on many columns, and presents a data‑organization optimization that uses space‑filling curves and Z‑Order to reduce query I/O. It also details the OPTIMIZE implementation and shares performance benchmark results and future plans.

Apache Iceberg · Data Skipping · Performance Benchmark
12 min read
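Z-Order (Morton order) linearizes several columns by interleaving the bits of their values, so rows that are close in every dimension tend to land in the same files. A minimal two-column sketch follows. It assumes each column value has already been mapped to a non-negative integer of fixed width (real engines derive such integers, for example, from ranks or order-preserving byte encodings); `interleave_bits` is a hypothetical helper, not part of the Iceberg API.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) value of two non-negative integers.

    Bit i of x is placed at position 2*i + 1 and bit i of y at position 2*i,
    so sorting by the result walks a grid in 2 x 2 blocks, then blocks of blocks.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError(f"x and y must fit in {bits} unsigned bits")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


# Sort the cells of a 4 x 4 grid by their Z-order value.
rows = [(x, y) for x in range(4) for y in range(4)]
rows.sort(key=lambda r: interleave_bits(*r))
print(rows[:8])
# → [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
```

Because neither column dominates the sort key, cutting the sorted output into files keeps each file's min/max range narrow on both columns, which is what lets file-level statistics skip data for filters on either column.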
DataFunTalk
Apr 9, 2022 · Big Data

Optimizing Apache Iceberg Query Performance with Z‑Order Data Organization

This talk explains how Apache Iceberg's data skipping can lose effectiveness when queries filter on many columns, and presents a data‑organization redesign that uses space‑filling curves and Z‑Order to reduce query I/O, detailing the OPTIMIZE syntax, implementation steps, performance benchmarks, and future roadmap.

Apache Iceberg · Data Skipping · Query Optimization
12 min read
DataFunTalk
Feb 25, 2022 · Big Data

Tencent's Application of Apache Iceberg for Real‑Time Data Lake Ingestion, Governance, and Query Optimization

This article explains how Tencent leverages Apache Iceberg together with Flink to build a real‑time data lake pipeline, covering data ingestion, Iceberg's snapshot‑based read/write model, compaction and governance services, Z‑order based query optimization, performance results, and future roadmap.

Apache Iceberg · Compaction · Flink
24 min read
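Compaction services like the one described above merge many small files into fewer, larger ones. One common way to plan such a rewrite is bin packing. The sketch below groups files smaller than a threshold into bins that stay at or under a target size using first-fit decreasing. It is a generic illustration under assumed parameters (sizes in MB; `plan_compaction` is a hypothetical name), not Tencent's or Iceberg's actual planner.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int, small_threshold_mb: int) -> List[List[int]]:
    """Group small files into bins of at most target_mb each (first-fit decreasing)."""
    # Only files below the threshold are candidates; larger files are left alone.
    small = sorted((s for s in file_sizes_mb if s < small_threshold_mb), reverse=True)
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([size])
            totals.append(size)
    # A bin holding a single file would just rewrite that file, so skip it.
    return [b for b in bins if len(b) > 1]


sizes = [8, 120, 30, 60, 5, 90, 40, 2, 256]
print(plan_compaction(sizes, target_mb=128, small_threshold_mb=100))
# → [[90, 30, 8], [60, 40, 5, 2]]
```

Each inner list is one rewrite task: seven small files become two files of 128 MB and 107 MB, while the 120 MB and 256 MB files are left untouched.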
Big Data Technology Architecture
Mar 4, 2021 · Big Data

Improving Interactive Analysis on Massive Datasets with Data Clustering and Data Skipping Using Spark and Iceberg

This article explores how data clustering techniques such as linear order, Z‑order, and Hilbert‑curve ordering can be applied in Apache Spark and Apache Iceberg to achieve efficient data skipping on terabyte‑scale tables, dramatically reducing file scans and enabling sub‑second interactive analytics for multi‑dimensional queries.

Data Skipping · Spark · Z-Order
20 min read
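The trade-off between linear ordering and Z-order can be reproduced on a toy table. The sketch writes the same 256 two-column rows into 16 files, once sorted linearly by (x, y) and once by Z-order value. It then records per-file min/max bounds for each column, similar in spirit to the per-file column bounds that table formats such as Iceberg keep in their metadata, and counts how many files a single-column equality filter must scan. The table, file size, and helper names are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]
Bounds = Tuple[Tuple[int, int], Tuple[int, int]]  # ((min_x, max_x), (min_y, max_y))


def z_value(x: int, y: int, bits: int = 4) -> int:
    """Interleave the bits of x and y (x takes the higher bit of each pair)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


def write_files(rows: List[Row], sort_key: Callable[[Row], object], rows_per_file: int) -> List[List[Row]]:
    """Sort the rows, then cut them into consecutive fixed-size files."""
    ordered = sorted(rows, key=sort_key)
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


def file_stats(files: List[List[Row]]) -> List[Bounds]:
    """Per-file min/max for each column."""
    stats = []
    for f in files:
        xs = [r[0] for r in f]
        ys = [r[1] for r in f]
        stats.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return stats


def files_to_scan(stats: List[Bounds], column: int, value: int) -> int:
    """Count files whose [min, max] range for `column` may contain `value`."""
    return sum(1 for s in stats if s[column][0] <= value <= s[column][1])


rows = [(x, y) for x in range(16) for y in range(16)]
linear = file_stats(write_files(rows, lambda r: r, 16))
zorder = file_stats(write_files(rows, lambda r: z_value(*r), 16))

for name, stats in [("linear", linear), ("z-order", zorder)]:
    print(name, "x == 5:", files_to_scan(stats, 0, 5), "y == 5:", files_to_scan(stats, 1, 5))
# linear x == 5: 1 y == 5: 16
# z-order x == 5: 4 y == 5: 4
```

Linear order is ideal for filters on the leading column but prunes nothing for the second one; Z-order gives up some pruning on the leading column in exchange for balanced pruning on both.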