Tag

Compaction

DeWu Technology
Mar 3, 2025 · Databases

Implementing an LSM‑Tree in Zig: Core Components, Write/Read Logic, and Compaction

The article walks through a complete Zig implementation of an LSM‑Tree, detailing its in‑memory skip‑list MemTable, immutable SSTable blocks with compression and Bloom filters, write‑ahead logging, iterator hierarchy for reads, and multi‑level compaction logic that merges and rewrites SSTables.

Compaction · Database · Iterators
0 likes · 42 min read
DataFunSummit
Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngine · Compaction · Data Optimization
0 likes · 11 min read
Tencent Advertising Technology
Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big Data · CDC · Compaction
0 likes · 25 min read
Aikesheng Open Source Community
Oct 15, 2024 · Databases

Troubleshooting Compaction Stuck Issue in OceanBase: Diagnosis and Resolution

This article details a step‑by‑step investigation of a compaction‑stuck problem in OceanBase, covering background, environment setup, view and log analysis, root‑cause identification related to clock drift, and the corrective actions taken to restore normal merging.

Clock Drift · Compaction · Database
0 likes · 13 min read
Sohu Tech Products
Sep 11, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

Tencent’s real‑time lakehouse combines Spark, Flink, StarRocks and Presto compute layers with Iceberg‑based management and HDFS/COS storage, and its Intelligent Optimize Service—comprising Compaction, Expiration, Cleaning, Clustering, Index and Auto‑Engine modules—automatically reduces merge time, improves query performance, enables secondary indexing, and dynamically routes hot partitions, while future plans target cold/hot separation, materialized view acceleration, and AI‑driven optimizations.

Big Data · Clustering · Compaction
0 likes · 12 min read
Top Architecture Tech Stack
Jul 16, 2024 · Databases

Understanding LSM-Tree Architecture and Its Applications in Big Data Systems

The article explains the Log-Structured Merge-Tree (LSM) architecture, its core components, advantages and disadvantages, and demonstrates how it is employed in big‑data platforms such as HBase and Apache Druid to achieve high‑throughput writes and scalable query processing.

Big Data · Compaction · LSM Tree
0 likes · 7 min read
Cognitive Technology Team
Jan 21, 2024 · Databases

Understanding LSM-Tree (Log-Structured Merge Tree) and Its Storage Mechanisms

This article explains the Log-Structured Merge Tree (LSM-Tree) architecture, describing its immutable storage design, the roles of WAL, MemTable, ImmuTable, and SSTable, and detailing the write workflow, compaction process, and the associated read, space, and write amplification challenges.

Compaction · LSM Tree · Log-Structured Merge Tree
0 likes · 7 min read
Deepin Linux
Dec 9, 2023 · Fundamentals

Linux Page Reclaim Mechanism and Memory Compaction: Detailed Source Code Analysis

This article explains the Linux page‑reclaim mechanism, its goals, common techniques, allocation paths, and LRU data structures, and provides an in‑depth walkthrough of the kernel source code for slow‑path reclaim, direct reclaim, and memory compaction, including all relevant functions and code snippets.

Compaction · Linux · Memory Management
0 likes · 80 min read
Aikesheng Open Source Community
Nov 8, 2023 · Databases

Analyzing OceanBase Freeze Dump Process via Log Parsing

This article explains how to parse OceanBase logs to trace the tenant freeze dump workflow, detailing the roles and log sequences of the freeze check thread, LSFreeze, Flush, DagScheduler, and MiniMerge threads, and illustrating each step with actual log excerpts and code snippets.

Compaction · DAG · Database
0 likes · 16 min read
DataFunTalk
Aug 30, 2023 · Big Data

Design and Implementation of Baidu Cloud Block Storage EC System for Large‑Scale Data

This article presents Baidu Cloud's block storage architecture, comparing replication and erasure‑coding fault‑tolerance methods, detailing the challenges of applying EC to mutable block data, and describing a two‑layer append‑engine solution with selective 3‑replica caching, cost‑benefit compaction, and performance optimizations for low‑cost, high‑throughput storage.

Append Engine · Big Data · Compaction
0 likes · 14 min read
Sohu Tech Products
Aug 16, 2023 · Big Data

Understanding HBase Compaction: Principles, Process, Throttling Strategies and Real‑World Optimizations

This article explains the fundamentals of HBase’s LSM‑Tree compaction, including minor and major compaction triggers, file‑selection policies, and dynamic throughput throttling, and gives practical tuning examples showing how adjusting size limits, thread pools, and off‑peak settings can dramatically improve read latency and cluster stability.

Big Data · Compaction · HBase
0 likes · 35 min read
vivo Internet Technology
Jul 26, 2023 · Big Data

Understanding HBase Compaction: Principles, Process, Throttling Strategies, and Optimization Cases

This article covers HBase compaction: its minor and major merge types, trigger mechanisms, file‑selection policies such as RatioBased and Exploring, throttling controls based on file count, and practical tuning of key parameters to avoid latency spikes, illustrated by real‑world production cases.

Big Data · Compaction · HBase
0 likes · 36 min read
Aikesheng Open Source Community
Mar 9, 2023 · Databases

In‑Depth Exploration of OceanBase Hierarchical Dump and Compaction Mechanisms

This article explains the LSM‑Tree foundation of OceanBase, details its tiered and leveled compaction strategies, and presents two experiments that observe Mini and Minor compactions under different configuration parameters, revealing how minor freeze and trigger settings affect data movement between L0 and L1 layers.

Compaction · Database Storage · LSM Tree
0 likes · 13 min read
DataFunSummit
Feb 28, 2023 · Big Data

Iceberg Technology Overview and Its Application at Xiaomi: Practices, Stream‑Batch Integration, and Future Plans

This article introduces the Iceberg table format, explains its core architecture and advantages such as transactionality, implicit partitioning and row‑level updates, details Xiaomi's practical deployments—including CDC pipelines, partition strategies, compaction services, and stream‑batch integration—and outlines future development directions.

Big Data · Compaction · Flink
0 likes · 20 min read
Big Data Technology Architecture
Aug 13, 2022 · Big Data

Apache Doris at Xiaomi: Architecture Evolution, Performance Optimizations, and Production Practices

This article details Xiaomi's three‑year journey of adopting Apache Doris across dozens of internal services, describing the transition from a Spark‑SQL‑based Lambda architecture to a unified MPP database, performance benchmarks, data ingestion pipelines, compaction tuning, two‑phase commit, single‑replica writes, monitoring, and community contributions.

Apache Doris · Compaction · Data Warehouse
0 likes · 19 min read
NetEase Cloud Music Tech Team
Mar 16, 2022 · Databases

RDB: Cloud Music's Customized Algorithm Feature KV Storage System Based on RocksDB

To meet Cloud Music’s massive algorithm‑feature KV storage needs, the team built RDB—a RocksDB‑based engine within Tair—adding bulk‑load, dual‑version imports, KV‑separation, in‑place sequence appends and protobuf field updates, cutting storage cost, write amplification and latency while scaling to billions of records and millions of QPS.

Algorithm Features · Bulkload · Compaction
0 likes · 16 min read
DataFunTalk
Feb 25, 2022 · Big Data

Tencent's Application of Apache Iceberg for Real‑Time Data Lake Ingestion, Governance, and Query Optimization

This article explains how Tencent leverages Apache Iceberg together with Flink to build a real‑time data lake pipeline, covering data ingestion, Iceberg's snapshot‑based read/write model, compaction and governance services, Z‑order based query optimization, performance results, and future roadmap.

Apache Iceberg · Big Data · Compaction
0 likes · 24 min read
Tencent Cloud Developer
Dec 10, 2020 · Databases

Understanding LevelDB Architecture, Read/Write Flow, and Compaction Process

LevelDB stores data using an in‑memory Memtable that flushes to immutable tables and disk‑based SSTables. Writes are logged, then batched and applied through a writer queue; reads check the Memtable, then the immutable Memtable, then the SSTables; and background compactions merge tables to improve read performance and reclaim space.

Compaction · Database Internals · LSM Tree
0 likes · 16 min read
360 Zhihui Cloud Developer
Nov 4, 2020 · Databases

How to Safely Drop Massive Data in TiDB Without Causing Write Stall

This article explains why dropping large amounts of data in a TiDB cluster can trigger compaction flow‑control, leading to write stalls and QPS jitter, and provides step‑by‑step troubleshooting, configuration tweaks, and best‑practice recommendations to resolve the issue.

Compaction · Data Cleanup · Region Merge
0 likes · 20 min read