Tagged articles

Columnar Storage

76 articles · Page 1 of 1

Mar 22, 2026 · Big Data

How Dremel Encodes Nested Data: Definition & Repetition Levels Explained

This article breaks down Dremel's columnar encoding for nested data, detailing the definition‑level and repetition‑level concepts, showing step‑by‑step examples of encoding and reconstructing JSON‑like schemas, and explaining the limits of single‑column reconstruction.

Apache Arrow · Columnar Storage · Dremel

0 likes · 9 min read

TonyBai

Mar 13, 2026 · Backend Development

Why DuckDB Beats ClickHouse for Light‑Weight Analytics: 18 M Rows/sec in a Single Go Binary

The article analyzes why traditional row‑oriented databases struggle with high‑volume analytics, introduces DuckDB as an embedded columnar engine for Go, presents benchmark results of up to 18.6 M rows per second writes and 6 M rows per second scans, walks through the Appender API code, and outlines the trade‑offs and ideal hybrid architecture.

Columnar Storage · DuckDB · Embedded Database

0 likes · 11 min read

Alibaba Cloud Developer

Feb 27, 2026 · Databases

How DuckDB Compression Supercharges AliSQL Storage and Cuts MySQL Costs

AliSQL integrates DuckDB as its storage engine to achieve high‑density columnar compression and fast analytical scans. The article details DuckDB’s multi‑layer storage format, its adaptive compression algorithm selection, performance benchmarks against InnoDB, HBase, ClickHouse, and OceanBase, and the engineering optimizations AliSQL adds for throughput and cost reduction.

AliSQL · Columnar Storage · DuckDB

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Dec 24, 2025 · Big Data

How Paimon’s Column‑Separation Architecture Powers Real‑Time Multi‑Modal Lakehouse for AI

This article explains the challenges of frequent column changes in AI feature engineering, introduces Paimon’s column‑separation storage with a global continuous Row ID, details its Blob data type for efficient multi‑modal handling, and outlines production results and future roadmap for building an AI‑native data lakehouse.

Apache Paimon · BLOB · Big Data

0 likes · 11 min read

Data STUDIO

Dec 5, 2025 · Big Data

Why Parquet Is the Default Choice for Big Data Storage

The article explains how Apache Parquet’s columnar layout, multi‑level row‑group structure, projection and predicate push‑down, and advanced compression and encoding make it the high‑performance, space‑efficient storage format that powers modern big‑data ecosystems and tools like Spark, Python pandas, and ClickHouse.

Big Data · ClickHouse · Columnar Storage

0 likes · 11 min read

StarRocks

Nov 5, 2025 · Databases

How FlatJSON Transforms JSON Queries in StarRocks 4.0 for Near‑Columnar Performance

StarRocks 4.0 introduces FlatJSON, a columnar storage and execution engine that converts high‑frequency JSON fields into native columns, dramatically reducing I/O and CPU costs and enabling JSON queries to run with performance close to that of traditional columnar data.

Columnar Storage · Database Performance · FlatJSON

0 likes · 19 min read

Big Data Technology Tribe

Oct 18, 2025 · Databases

How Adaptive Structural Encoding Boosts Random Access in Columnar Storage

This article examines how adaptive structural encoding in columnar formats like Lance dramatically improves random‑access performance on NVMe storage, compares it with Apache Parquet and Arrow, and discusses the trade‑offs between scan speed, memory usage, and compression.

Columnar Storage · Lance · NVMe

0 likes · 17 min read

ITPUB

Oct 11, 2025 · Databases

How OceanBase Achieves Real‑Time HTAP: Inside Its Unified Storage and Vectorized Engine

This article details OceanBase's evolution from a distributed OLTP system to a unified HTAP database, covering its cost‑based optimizer, vectorized execution, integrated row‑column storage, bypass import, materialized views, external tables, full‑text search, and real‑world use cases for real‑time analytics.

Columnar Storage · HTAP · OceanBase

0 likes · 12 min read

JD Tech Talk

Sep 2, 2025 · Databases

Unlock ClickHouse’s Secret Weapons: The 9 Techniques Behind Lightning‑Fast Queries

This article explores ClickHouse’s high‑performance OLAP architecture, covering its MPP design, columnar storage, vectorized execution, pre‑sorting, table engines, data types, sharding and replication strategies, as well as index designs that together enable rapid analysis of massive datasets.

ClickHouse · Columnar Storage · Vectorized Execution

0 likes · 15 min read

JD Cloud Developers

Sep 2, 2025 · Databases

Unlocking ClickHouse’s Lightning‑Fast Queries: The ‘Nine Swords’ Architecture Explained

This article explores ClickHouse’s high‑performance OLAP design—including its MPP architecture, columnar storage, vectorized execution, pre‑sorting, sharding, replication, index strategies, and compute engine—showing how each innovation contributes to ultra‑fast, scalable data analysis in the big‑data era.

ClickHouse · Columnar Storage · OLAP

0 likes · 14 min read

Tech Freedom Circle

Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

ClickHouse · Columnar Storage · Distributed Query

0 likes · 29 min read

JD Tech

May 13, 2025 · Databases

Unlock ClickHouse’s Lightning‑Fast Queries: Architecture, Storage, and Index Secrets

This article examines ClickHouse’s high‑performance OLAP design, covering its MPP architecture, columnar storage, vectorized execution, pre‑sorting, table engines, extensive data‑type system, sharding and replication strategies, as well as its sparse and skip‑index mechanisms that together enable ultra‑fast analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage

0 likes · 16 min read

JD Retail Technology

Apr 8, 2025 · Databases

ClickHouse Architecture and Core Technologies Overview

ClickHouse is an open‑source, massively parallel, column‑oriented OLAP database that integrates its own columnar storage, vectorized batch processing, pre‑sorted data, diverse table engines, extensive data types, sharding with replication, sparse primary‑key and skip indexes, and a multithreaded query engine, delivering high‑throughput real‑time analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage

0 likes · 15 min read

JD Tech Talk

Dec 26, 2024 · Databases

Using ClickHouse for Efficient Tag Bitmap Storage and Group Computation in a CDP

This article explains how ClickHouse’s columnar storage, bitmap functions, and distributed architecture can be leveraged to store billions of tag bitmaps, combine them efficiently, and support fast group calculations for customer data platforms, while addressing data‑warehouse integration, storage format, and performance challenges.

Columnar Storage · OLAP · bitmap

0 likes · 10 min read

Tencent Cloud Developer

Nov 1, 2024 · Databases

How TDSQL Dominated Global OLAP & OLTP Benchmarks: Inside the Technical Secrets

Tencent Cloud's TDSQL shattered world records in both TPC‑DS (OLAP) and TPC‑C (OLTP) benchmarks, achieving a 7260 M QphDS score at a cost of 37.52 CNY/kQphDS, and the article explains the three self‑developed technologies—MPP execution, parallel execution framework, and columnar‑vectorized engine—that made this performance possible.

Columnar Storage · Database Performance · MPP

0 likes · 7 min read

Senior Tony

Sep 19, 2024 · Databases

Why ClickHouse Outperforms MySQL: Deep Dive into Architecture and Benchmarks

This article compares ClickHouse and MySQL by examining benchmark results, MPP architecture, columnar storage, compression techniques, vectorized execution, and index designs, showing why ClickHouse delivers dramatically higher query performance on massive data sets.

ClickHouse · Columnar Storage · Databases

0 likes · 8 min read

ITPUB

Aug 29, 2024 · Databases

How TeleDB Evolved from Centralized to Native Distributed Architecture

TeleDB’s journey from a centralized MySQL/PostgreSQL‑based system to a native distributed HTAP database showcases innovations such as share‑nothing architecture, columnar storage, vectorized execution, Remote Data Access, global caching, and advanced dead‑lock detection, dramatically improving query performance, storage efficiency, and scalability.

Columnar Storage · HTAP · TeleDB

0 likes · 13 min read

21CTO

Jul 30, 2024 · Databases

How Database Architectures Evolved Over 20 Years: From Columnar to Cloud & Beyond

This article surveys two decades of database system architecture innovations—including columnar stores, cloud databases, data lakes, NewSQL, hardware accelerators, and blockchain databases—highlighting their motivations, trade‑offs, and the shifting landscape that shapes modern DBMS design.

Columnar Storage · DBMS · Databases

0 likes · 23 min read

vivo Internet Technology

Jul 10, 2024 · Databases

HBase Optimization Practice in Vivo's Unified Content Platform

Vivo's unified content platform replaced its unwieldy 60 TB MongoDB store with HBase, then upgraded the cluster, introduced table‑specific connection pools, column‑only reads, tuned compaction, and leveraged multi‑version cells, cutting response times from seconds to under ten milliseconds and dramatically lowering operational costs while boosting read/write performance.

Columnar Storage · Compaction Optimization · HBase

0 likes · 16 min read

DataFunSummit

Jun 21, 2024 · Big Data

Building a Complete Data System with Apache Arrow: Architecture, Dynamic Schema Modeling, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes dynamic read‑time modeling, outlines the system’s execution flow, storage and indexing strategies, and shares practical tips and extensions for building scalable big‑data solutions.

Acero · Apache Arrow · Big Data

0 likes · 20 min read

DataFunSummit

Apr 23, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Implementation, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow’s columnar in‑memory format and its zero‑copy advantages, describes how to model data at read time, outlines the execution flow with Acero and SQL planning, and shares practical tips and extensions for building robust, dynamic‑schema data platforms.

Acero · Apache Arrow · Big Data

0 likes · 20 min read

Sohu Tech Products

Mar 6, 2024 · Big Data

Building Data Systems with Apache Arrow: Architecture, Memory Format, and Execution

The article explains how Apache Arrow’s columnar, cross‑language in‑memory format enables high‑performance, interoperable data systems—replacing traditional row‑oriented databases—by supporting dynamic schemas, zero‑copy data exchange, efficient indexing, Acero‑based query execution, and Flight/ADBC connectivity, while offering practical guidance and highlighting challenges.

Apache Arrow · Big Data · Columnar Storage

0 likes · 20 min read

DataFunTalk

Feb 28, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Modeling, and Execution

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes read‑time modeling and dynamic schema handling, and shows how Arrow can be used to build a complete data processing pipeline with indexing, SQL planning, and zero‑copy data exchange.

Apache Arrow · Big Data · Columnar Storage

0 likes · 20 min read

DataFunTalk

Jan 1, 2024 · Big Data

MaxCompute Semi-Structured Data: Concepts, Solutions, and Benefits

This article explains the nature of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution that balances flexibility, performance, and cost for large‑scale data warehouses.

Big Data · Columnar Storage · Data Warehouse

0 likes · 19 min read

DataFunTalk

Dec 11, 2023 · Databases

Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Modern Database Systems

The interview with Wu Li, a research engineer at Shanghai Yanhuang Data, explores how columnar storage, JIT compilation, and push-mode processing are reshaping modern database performance, highlighting hardware constraints, software optimizations, and product‑centric goals in the era of big data analytics.

Columnar Storage · Databases · JIT Compilation

0 likes · 11 min read

DataFunSummit

Dec 10, 2023 · Databases

Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

In this technical interview, Wu Li, a research engineer at Shanghai Yanhuang Data, explains how hardware constraints drive database evolution, why columnar storage and SIMD acceleration are crucial for OLAP, and how JIT compilation and push‑mode processing improve query performance and product experience.

Columnar Storage · Databases · JIT Compilation

0 likes · 10 min read

DataFunSummit

Dec 9, 2023 · Databases

Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Database Development

The article presents an interview with Wu Li, a senior R&D engineer at Shanghai Yanhuang Data, discussing how columnar storage, JIT compilation, and push‑mode execution are reshaping database performance in the era of big‑data analytics and evolving hardware constraints.

Apache Arrow · Columnar Storage · Databases

0 likes · 10 min read

DataFunSummit

Dec 8, 2023 · Databases

Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Modern Database Systems

The article presents an interview with Wu Li, a senior engineer at Shanghai Yanhuang Data, discussing how columnar storage, JIT compilation, and push‑mode processing are reshaping database performance and product strategy in the era of large‑scale data analytics.

Columnar Storage · Databases · JIT Compilation

0 likes · 9 min read

Huawei Cloud Developer Alliance

Nov 17, 2023 · Databases

How openGemini’s New Columnar Engine Solves High‑Cardinality Time‑Series Challenges

This article explains why time‑series databases are ideal for massive telemetry data, describes the high‑cardinality problem that degrades performance, and shows how openGemini’s newly introduced columnar engine—combined with sorting and clustering indexes—effectively mitigates those issues while delivering impressive write and query speeds.

Columnar Storage · Databases · high-cardinality

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Sep 14, 2023 · Big Data

How MaxCompute Turns Semi‑Structured Data into High‑Performance Columnar Storage

This article explains the nature of semi‑structured data, compares schema‑on‑read and schema‑on‑write approaches, and shows how Alibaba Cloud MaxCompute leverages columnar storage and dynamic parsing to achieve low‑cost, high‑performance analytics for large‑scale data workloads.

Columnar Storage · MaxCompute · Semi‑structured Data

0 likes · 20 min read

DataFunSummit

Jul 9, 2023 · Big Data

Data Governance and Application for Behavior Analysis: Modeling Methods, Architecture, and Practical Cases

This article explains how a data‑ecosystem team governs and applies behavior‑analysis data by describing common analysis scenarios, data‑warehouse modeling methods and their pros and cons, the concepts and overall architecture of behavior‑centric analytics, key system components, and several concrete analysis examples such as retention, funnel and path analysis.

Big Data · Columnar Storage · User Segmentation

0 likes · 12 min read

Alibaba Cloud Developer

Mar 16, 2023 · Big Data

How SLS’s Schema‑on‑Read Scanning Boosts Log Analytics Flexibility and Cuts Costs

This article explains the motivation, design, and implementation of Alibaba Cloud's SLS Schema‑on‑Read scanning mode, showing how it enables SQL analysis on raw log data without pre‑built indexes, improves flexibility for evolving schemas, and reduces storage and index costs in various log‑analysis scenarios.

Big Data · Columnar Storage · Log Analytics

0 likes · 27 min read

ITPUB

Dec 18, 2022 · Databases

Why ClickHouse Is So Fast: Deep Dive into Storage and Compute Engine Optimizations

This article explains how ClickHouse achieves high query performance by leveraging storage‑engine designs such as pre‑sorting, columnar layout, and block‑level compression, and by exploiting a vectorized compute engine while avoiding joins and using built‑in functions.

Big Data · ClickHouse · Columnar Storage

0 likes · 9 min read

Architects' Tech Alliance

Nov 20, 2022 · Databases

Columnar Storage vs Row Storage: Overview, Write/Read Comparison, Pros, Cons, and Use Cases

This article explains the differences between row-based and column-based storage, comparing their write and read performance, outlining advantages and disadvantages, and describing suitable scenarios such as OLAP queries, column families, compression, and indexing, to help choose the appropriate storage model.

Big Data · Columnar Storage · OLAP

0 likes · 10 min read

DataFunSummit

Sep 30, 2022 · Big Data

MercsDB: Architecture, Storage, Computation, and Optimization of Tencent's MPP Data Warehouse Engine

The article presents a comprehensive technical overview of MercsDB—formerly HermesDB—including its background, storage and indexing designs, native and Presto computation engines, vectorization optimizations, benchmark results, real‑world applications, and future development plans.

Big Data · Columnar Storage · MPP

0 likes · 20 min read

Java Architecture Diary

Sep 19, 2022 · Databases

ClickHouse vs Oracle vs esProc SPL: Real‑World TPC‑H Benchmark Reveals Surprising Performance Gaps

A comprehensive TPC‑H benchmark compares ClickHouse, Oracle, and the open‑source esProc SPL across simple and complex queries, showing ClickHouse excels at single‑table scans, while SPL consistently outperforms both in complex calculations and offers more concise code.

ClickHouse · Columnar Storage · Database Performance

0 likes · 12 min read

Architect's Tech Stack

Aug 15, 2022 · Databases

Performance Comparison of ClickHouse, Oracle, and esProc SPL Using TPC‑H Benchmarks

This article benchmarks ClickHouse, Oracle, and the open‑source esProc SPL on the TPC‑H suite, showing ClickHouse excels at simple scans, Oracle handles many complex queries, while SPL consistently outperforms both in speed and code simplicity across a range of workloads.

ClickHouse · Columnar Storage · Database Performance

0 likes · 12 min read

Architects' Tech Alliance

Jun 26, 2022 · Databases

June 2022 China Database Popularity Rankings and an Overview of Columnar Databases

The article reports the June 2022 China database popularity ranking, highlights TiDB's comeback, introduces OtterTune's new financing, announces PostgreSQL 15 Beta 1, explains Google AlloyDB columnar features, and provides a detailed overview of columnar database concepts, history, advantages, and evolution.

AI · Columnar Storage · Database Ranking

0 likes · 8 min read

IT Architects Alliance

Jun 19, 2022 · Databases

Understanding ClickHouse: From OLAP Basics to Advanced Table Engines and Deployment

This guide explains ClickHouse fundamentals, OLAP versus OLTP concepts, columnar storage benefits, core performance techniques, the MergeTree family and its indexing, specialized table engines, installation on Linux, Docker deployment, and integration with HDFS, MySQL, and Kafka for modern analytical workloads.

ClickHouse · Columnar Storage · Docker

0 likes · 30 min read

ByteDance Data Platform

May 30, 2022 · Databases

How UniqueMergeTree Boosts Real-Time Updates in ClickHouse Column Stores

UniqueMergeTree, a new ClickHouse table engine, addresses real‑time data update challenges by combining upsert semantics, unique key enforcement, and efficient delete‑bitmap handling, offering higher query performance at modest write cost. The article covers its design, sharding strategies, conflict resolution, and performance evaluation.

ClickHouse · Columnar Storage · Database Engine

0 likes · 14 min read

DataFunSummit

May 19, 2022 · Databases

Designing a One‑Stop IoT Storage Solution: Architecture, Cost Optimization, and Performance

The talk outlines IoT data classifications and requirements, then proposes a one‑stop storage product that uses multi‑model support, columnar formats, compute‑storage separation, tiered storage, and query optimization to achieve ten‑fold cost reduction and ten‑fold performance gains.

Cloud Native · Columnar Storage · Database Design

0 likes · 20 min read

dbaplus Community

Mar 28, 2022 · Databases

Why We Switched from MongoDB to ClickHouse: Lessons from a Frontend Monitoring System

After months of using MongoDB for frontend monitoring logs, the data grew to billions of records causing severe query slowdown, prompting a migration to ClickHouse where columnar storage, partitioning, and OLAP capabilities dramatically improved performance and storage efficiency.

ClickHouse · Columnar Storage · MongoDB

0 likes · 17 min read

DataFunSummit

Mar 21, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, and Future Roadmap

This article explains how Apache Doris adopts CPU‑level vectorization and columnar storage to boost query performance, details the design and current status of its vectorized engine, and outlines future work such as JOIN acceleration, storage‑layer vectorization, import optimization, and extensive SQL function support.

Apache Doris · Columnar Storage · Performance Optimization

0 likes · 21 min read

Efficient Ops

Mar 8, 2022 · Databases

From MongoDB to ClickHouse: Lessons Learned and Performance Gains

This article recounts the author's journey from using MongoDB for front‑end monitoring logs to migrating to ClickHouse, detailing the challenges with large‑scale data, optimization attempts, the fundamental differences between row‑ and column‑oriented databases, and the resulting performance and storage improvements.

Columnar Storage · MongoDB · Node.js

0 likes · 19 min read

DataFunTalk

Feb 27, 2022 · Databases

Vectorization in Apache Doris: Design, Implementation, Current Status, and Future Plans

This article explains how Apache Doris adopts CPU vectorization techniques—such as SIMD, columnar storage, and cache‑friendly designs—to boost query performance, detailing its current vectorized engine architecture, recent benchmarks, ongoing work on JOIN, storage, import, and future enhancements.

Apache Doris · Columnar Storage · Database Performance

0 likes · 22 min read

Tencent Database Technology

Jan 19, 2022 · Databases

Deep Dive into Tencent's Self‑Developed MySQL Kernel TXSQL and Its Architecture

This article provides a comprehensive overview of Tencent's self‑developed MySQL kernel TXSQL, covering its evolution, overall architecture, columnar storage engine, instant DDL capabilities, enterprise‑grade features, high‑availability mechanisms, performance optimizations, and the rigorous development and testing processes behind the product.

Columnar Storage · High Availability · Performance Optimization

0 likes · 11 min read

Big Data Technology Architecture

Aug 24, 2021 · Big Data

An Overview of Apache Parquet: Architecture, Storage Model, and Comparison with ORC

This article provides a comprehensive introduction to Apache Parquet, covering its origins, columnar storage advantages, nested schema support, internal architecture, storage model components, comparison with ORC, and practical tools for inspecting Parquet files.

Columnar Storage · Hadoop · ORC Comparison

0 likes · 10 min read

Big Data Technology & Architecture

Aug 10, 2021 · Databases

Kudu Overview: Architecture, Features, and Use Cases

Kudu is an open‑source columnar storage engine from Cloudera that combines high‑throughput batch processing with low‑latency random reads, offering features such as C++/Java APIs, Raft‑based replication, flexible consistency, partitioning, and integration with Hadoop, Spark, Impala, and other ecosystem components.

Columnar Storage · Hadoop · Kudu

0 likes · 64 min read

Baidu Geek Talk

Aug 9, 2021 · Databases

BaikalDB Implementation Practice at Tongcheng Yilong: High Availability, HTAP, Performance and Cost Optimization

Tongcheng Yilong’s BaikalDB deployment combines multi‑Raft high availability, HTAP support, and share‑nothing scalability to deliver over 72K TPS OLTP and ten‑fold faster OLAP queries while cutting operational costs up to a hundredfold through dual‑center, columnar storage and cloud‑native elasticity.

BaikalDB · Columnar Storage · HTAP

0 likes · 27 min read

Python Programming Learning Circle

Jul 9, 2021 · Databases

Key Features of ClickHouse: DBMS Capabilities, Columnar Storage, Vectorized Execution, and Distributed Architecture

ClickHouse is an MPP column‑oriented DBMS that combines full DBMS functionality, advanced columnar storage with high compression, SIMD‑based vectorized execution, a rich relational SQL interface, diverse table engines, multi‑master clustering, and flexible sharding and distributed query capabilities, making it exceptionally fast for analytical workloads.

ClickHouse · Columnar Storage · DBMS

0 likes · 21 min read

Big Data Technology Architecture

Jun 17, 2021 · Databases

Key Features of ClickHouse: DBMS Capabilities, Columnar Storage, Vectorized Execution, and Distributed Architecture

ClickHouse is a high‑performance MPP column‑store DBMS that combines complete DBMS functions, column‑oriented storage with aggressive compression, SIMD‑based vectorized execution, flexible table engines, multithreading, distributed processing, a multi‑master architecture, and SQL compatibility to deliver fast online analytical queries on massive data sets.

ClickHouse · Columnar Storage · DBMS

0 likes · 21 min read

ITPUB

Dec 29, 2020 · Databases

How BaikalDB’s Columnar Storage Boosted Real‑Time Analytics at DTCC2020

This article details how the DTCC2020 guest speaker from Tongcheng‑Elong introduced BaikalDB’s distributed columnar storage, covering internal and external motivations, technology comparison, architecture, implementation tricks, performance gains in production, and future hybrid row‑column research directions.

BaikalDB · Columnar Storage · HTAP

0 likes · 12 min read

Big Data Technology & Architecture

Nov 26, 2020 · Big Data

Understanding Apache Parquet: Architecture, Data Model, and Performance

This article provides a comprehensive overview of Apache Parquet, covering its modular architecture, nested data model, striping/assembly and definition level algorithms, file format details, push‑down optimizations, performance benchmarks, and the project's evolution within the big‑data ecosystem.

Columnar Storage · Parquet · Pushdown Optimization

0 likes · 18 min read

Big Data Technology & Architecture

Nov 25, 2020 · Big Data

Understanding ORC File Format and Its Use in Hive and Java

This article explains the ORC (Optimized Row Columnar) file format, its advantages, internal structure, data model, compression mechanisms, and how to create Hive tables and write ORC files using Java, providing practical code examples and reference resources.

Columnar Storage · Data Warehouse · Hive

0 likes · 15 min read

Programmer DD

Oct 25, 2020 · Databases

Why ClickHouse Beats MySQL for OLAP: Migration, Performance & Pitfalls

This article explains what ClickHouse is, compares column‑store and row‑store databases, shows how to migrate large MySQL tables to ClickHouse, presents performance test results, discusses data synchronization methods, highlights why ClickHouse is fast, and shares common migration pitfalls.

ClickHouse · Columnar Storage · OLAP

0 likes · 7 min read

Tencent Cloud Developer

Oct 20, 2020 · Databases

ClickHouse: Architecture, Core Features, and Limitations for Interactive Analytics

ClickHouse is a PB‑scale, open‑source columnar OLAP database that uses a ZooKeeper‑coordinated sharded cluster, columnar storage, vectorized execution, advanced compression, data‑skipping indexes, and materialized views to deliver high‑performance interactive analytics, yet it requires manual shard management, lacks a mature MPP optimizer, and handles real‑time single‑row writes poorly.

ClickHouse · Columnar Storage · Materialized Views

0 likes · 18 min read

Big Data Technology Architecture

Sep 30, 2020 · Databases

Core Technologies of OLAP Systems: Storage, Computation, Optimizer, and Emerging Trends

This article systematically examines the core technologies of OLAP systems, covering storage models, columnar formats, indexing, distributed storage architectures, query execution steps, optimizer designs, and emerging trends such as real‑time analytics, HTAP, cloud‑native deployment, and hardware acceleration.

Columnar Storage · OLAP · Query Optimizer

0 likes · 23 min read

JD Cloud Developers

Sep 29, 2020 · Databases

Why ClickHouse Powers JD Cloud’s Billion‑Row Queries: Architecture and Performance Secrets

This article explains how JD Cloud’s JCHDB, built on ClickHouse, achieves millisecond‑level queries on billions of rows through columnar storage, distributed multi‑master architecture, SIMD vector engine, sparse indexing, and specialized table engines, and outlines the ideal use cases and deployment details.

ClickHouse · Columnar Storage · JCHDB

0 likes · 10 min read

Architects Research Society

Sep 1, 2020 · Databases

Understanding SAP HANA’s Combined Technologies: Memory, Columnar Storage, Compression, and Insert‑Only

The article explains how SAP HANA derives its performance advantages from combining four key technologies: high‑speed memory, columnar storage, data compression, and an insert‑only model. It details the pros and cons of each, how they complement one another, and the trade‑offs involved in scaling and persistence.

Columnar Storage · In-Memory · Insert-Only

0 likes · 19 min read

Tencent Database Technology

Aug 24, 2020 · Databases

Overview and Architecture of the CSTORE Columnar Engine for MySQL 8.0

This document explains the differences between OLTP and OLAP workloads, then introduces the CSTORE columnar storage engine for MySQL 8.0, covering its architecture, core technologies, performance advantages, typical use cases, benchmark results, and future development plans.

CStore · Columnar Storage · Database Engine

0 likes · 14 min read

Big Data Technology Architecture

May 19, 2020 · Big Data

An Overview of Apache Parquet: Architecture, Features, and Comparison with ORC

Apache Parquet is a language‑agnostic columnar storage format for the Hadoop ecosystem that offers high compression, efficient I/O through column and predicate push‑down, nested‑structure support, and a three‑layer architecture. The article also compares Parquet with ORC and covers tooling for schema inspection.

Apache Hadoop · Columnar Storage · Data Formats

0 likes · 9 min read

Alibaba Cloud Developer

Aug 22, 2019 · Big Data

How AliORC Supercharges MaxCompute: Inside the Next‑Gen Columnar Format

This article explains how Alibaba's MaxCompute platform evolved its storage engine from row‑based CFile to the columnar AliORC format, details the technical innovations such as async prefetch, small I/O elimination, adaptive dictionary encoding, and range‑aligned reads, and compares its performance against Apache ORC and Parquet.

AliORC · Apache ORC · Columnar Storage

0 likes · 20 min read

Big Data Technology & Architecture

Aug 14, 2019 · Big Data

Overview of Apache Druid Architecture and Its Comparison with Other Analytics Systems

This article provides a comprehensive overview of Apache Druid's distributed column‑store architecture, detailing its node types, external dependencies, data flow, and operational mechanisms, and compares Druid's real‑time analytics capabilities with systems such as Impala, Elasticsearch, and Spark.

Apache Druid · Columnar Storage · distributed system

0 likes · 12 min read

360 Tech Engineering

Jul 18, 2019 · Databases

Principles and Practices of Apache Doris: Architecture, Key Technologies, and Real‑World Use Cases

This article presents a comprehensive overview of Apache Doris, covering its positioning as a distributed MPP analytical database, core architecture with FE and BE nodes, key technologies such as vectorized execution and materialized views, integration with Kafka and Elasticsearch, additional features, roadmap, and detailed case studies from Baidu Statistics and Meituan, illustrating its practical deployment and performance characteristics.

Apache Doris · Columnar Storage · Data Warehouse

0 likes · 25 min read

Big Data Technology Architecture

Jun 9, 2019 · Big Data

An Introduction to Apache Parquet: Architecture, Data Model, File Format, and Basic Operations

This article provides a comprehensive overview of Apache Parquet, covering its purpose, architectural components, nested data model, file structure, practical Hive commands for creating and inspecting Parquet tables, and a brief introduction to the TPC‑DS benchmark for performance testing.

Columnar Storage · Hive · Parquet

0 likes · 8 min read

Efficient Ops

Feb 24, 2019 · Databases

Why Row vs Column Storage Matters: Understanding HBase’s Column‑Family Model

This article explains the differences between row‑oriented and column‑oriented storage, compares their trade‑offs, and introduces HBase’s column‑family architecture, including row keys, column qualifiers, timestamps, cells, and how it maps to a multi‑dimensional map structure.

Big Data · Columnar Storage · Databases

0 likes · 7 min read

Sohu Tech Products

Dec 12, 2018 · Databases

Optimizing MySQL Performance with Read/Write Splitting, Columnar Storage, and Dynamic Scheduling

The article details a real‑world MySQL performance case where a sudden 100‑fold load increase was mitigated through read/write splitting, replica‑based statistics, limited index tuning, middleware‑driven sharding, and finally a columnar storage layer (Infobright) with scripted dynamic data synchronization, achieving dramatic latency reductions and scalable architecture.

Columnar Storage · Data Warehouse · Infobright

0 likes · 12 min read

dbaplus Community

Dec 9, 2018 · Databases

How Read‑Write Splitting and Columnar Storage Rescued a 100× MySQL Load Spike

A MySQL‑based receipt‑tracking service suffered a sudden 100‑fold load increase, prompting a step‑by‑step optimization that combined read‑write splitting, middleware‑less data routing, columnar storage with Infobright, and dynamic scheduling to dramatically lower CPU/IO pressure and restore performance.

Columnar Storage · Performance Optimization · Read‑Write Splitting

0 likes · 13 min read

Xianyu Technology

Nov 27, 2018 · Big Data

Millisecond-Scale Multi-Dimensional Data Filtering with HybridDB for MySQL

HybridDB for MySQL delivers millisecond‑scale, multi‑dimensional filtering on billions of rows with hundreds of metrics by combining a high‑performance columnar engine, automatic composite indexes, and a fused MPP‑DAG pipeline, turning half‑day push preparation into seconds while supporting full SQL, spatial, and JSON data.

Columnar Storage · HybridDB · OLAP

0 likes · 8 min read

Hulu Beijing

Feb 28, 2018 · Big Data

How Hulu’s Nesto Engine Delivers Near‑Real‑Time OLAP on TB‑Scale Data

This article introduces Hulu's in‑house OLAP engine Nesto, detailing its near‑real‑time data ingestion, nested data model, TB‑level storage using HBase and Parquet, MPP query execution, custom predicate library, and the overall architecture that enables sub‑second ad‑hoc queries for user analytics.

Big Data · Columnar Storage · HBase

0 likes · 22 min read

Huawei Cloud Developer Alliance

Feb 7, 2017 · Big Data

What’s New in Apache CarbonData 1.0.0? 80+ Features Boost Big Data Performance

Apache CarbonData 1.0.0, now an Apache incubating project, adds over 80 new features and bug fixes—including a new data loading solution, Spark 2.1 integration, update/delete SQL support, adaptive compression for numeric types, B‑Tree LRU cache, V2 format for faster first‑query performance, vectorized reader, bucket‑table joins, off‑heap memory, single‑pass loading, and pre‑generated dictionaries—aimed at delivering faster, more flexible, and efficient columnar storage for big‑data workloads.

Apache CarbonData · Big Data · Columnar Storage

0 likes · 8 min read

Hulu Beijing

Dec 20, 2016 · Big Data

How Hulu Supercharges OLAP Queries with CarbonData: Real‑World Optimizations

This article describes Hulu’s real‑world OLAP query optimization, covering the fundamentals of OLAP, comparisons of row‑ and column‑based storage formats, detailed indexing mechanisms of Parquet, ORC and CarbonData, and the specific schema, shuffle, block size, speculation and GC tuning techniques that enabled CarbonData to dramatically accelerate wide‑table queries on SparkSQL.

Big Data · CarbonData · Columnar Storage

0 likes · 17 min read

dbaplus Community

Jun 16, 2016 · Databases

How Dameng Implements Columnar Storage, Smart Indexes, and Adaptive Compression

This article explains Dameng's columnar storage architecture, the smart index mechanism that leverages zone statistics to reduce I/O, and the adaptive compression algorithms—including dictionary, constant, RLE, and sequence encoding—used to achieve high compression ratios on columnar data.

Columnar Storage · Dameng · adaptive compression

0 likes · 14 min read

dbaplus Community

Dec 16, 2015 · Databases

How DB2 BLU Accelerator Supercharges OLAP with Columnar Storage and SIMD

This article explains IBM DB2 BLU Accelerator’s columnar storage, multi‑level compression, TSN‑based logical rows, SIMD processing, intra‑parallel execution, probability‑based caching, and automatic admin features, showing how these technologies together deliver dramatic I/O and performance gains for analytical workloads.

BLU Accelerator · Columnar Storage · DB2

0 likes · 15 min read

Java High-Performance Architecture

Sep 14, 2015 · Databases

Exploring Key-Value, Document, Column, and Graph Database Models

This article explains four fundamental database data models—key‑value pair, document, column, and graph—detailing their structures, scalability characteristics, and typical implementations such as Redis, MongoDB, HBase, and Neo4j.

Columnar Storage · Key-Value Store · database models

0 likes · 2 min read

Art of Distributed System Architecture Design

Jul 12, 2015 · Big Data

Architectural Overview and Optimization Techniques for SQL‑on‑Hadoop Systems

This article provides a comprehensive analysis of SQL‑on‑Hadoop architectures, comparing runtime‑framework‑based engines like Hive with MPP‑style engines such as Impala, detailing core components, compilation pipelines, optimizer strategies, CPU/IO performance tricks, columnar storage formats, and resource management in modern big‑data query platforms.

Columnar Storage · Optimization · Query Engine

0 likes · 22 min read
