Tag

columnar storage

JD Retail Technology
Apr 8, 2025 · Databases

ClickHouse Architecture and Core Technologies Overview

ClickHouse is an open‑source, massively parallel, column‑oriented OLAP database that integrates its own columnar storage, vectorized batch processing, pre‑sorted data, diverse table engines, extensive data types, sharding with replication, sparse primary‑key and skip indexes, and a multithreaded query engine, delivering high‑throughput real‑time analytics on massive datasets.
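As a rough illustration of the sparse primary-key index mentioned above, the sketch below keeps one index mark per block ("granule") of pre-sorted keys and uses binary search on the marks to decide which granules a range query has to read. It is a toy model in plain Python, not ClickHouse's implementation: the names `GRANULE_SIZE`, `candidate_rows`, and `range_query` are hypothetical, and the granule size is kept tiny for readability (ClickHouse's default granularity is 8192 rows).

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # illustrative only; real granules are far larger

# Data is stored pre-sorted by primary key.
keys = [1, 3, 4, 7, 9, 12, 15, 18, 21, 25]

# Sparse index: only the first key of every granule is kept in memory.
marks = keys[::GRANULE_SIZE]


def candidate_rows(lo, hi):
    """Return the [start, end) row range that may hold keys in [lo, hi]."""
    # Start one granule before the first mark >= lo, because that granule
    # may still end with keys equal to or greater than lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose mark is greater than hi cannot contain matching keys.
    last = bisect_right(marks, hi)
    return first * GRANULE_SIZE, min(last * GRANULE_SIZE, len(keys))


def range_query(lo, hi):
    """Scan only the candidate granules and filter rows exactly."""
    start, end = candidate_rows(lo, hi)
    return [k for k in keys[start:end] if lo <= k <= hi]


print(candidate_rows(10, 16))
print(range_query(10, 16))
```

The index only narrows the scan to a few granules; the exact filter still runs over the rows that are read, which is why the index can stay small enough to fit in memory.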

Big Data · ClickHouse · OLAP
0 likes · 15 min read

JD Tech Talk
Dec 26, 2024 · Databases

Using ClickHouse for Efficient Tag Bitmap Storage and Group Computation in a CDP

This article explains how ClickHouse’s columnar storage, bitmap functions, and distributed architecture can be leveraged to store billions of tag bitmaps, combine them efficiently, and support fast group calculations for customer data platforms, while addressing data‑warehouse integration, storage format, and performance challenges.
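The idea of combining tag bitmaps for group computation can be sketched in a few lines. The example below is a simplified stand-in, assuming each tag is a bitmap in which bit `i` is set when user id `i` carries the tag; it uses plain Python integers rather than ClickHouse's bitmap functions, and the tag names and helper functions are hypothetical.

```python
def to_bitmap(user_ids):
    """Build a bitmap (as a Python int) with one bit set per user id."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def to_user_ids(bitmap):
    """Expand a bitmap back into a sorted list of user ids."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical tags, each stored as one bitmap of user ids.
tags = {
    "recent_buyer": to_bitmap([1, 2, 5, 8]),
    "app_user": to_bitmap([2, 3, 5, 9]),
    "churn_risk": to_bitmap([5, 9]),
}

# Group definition: recent_buyer AND app_user AND NOT churn_risk.
group = tags["recent_buyer"] & tags["app_user"] & ~tags["churn_risk"]

group_ids = to_user_ids(group)
group_size = bin(group).count("1")  # cardinality of the group
print(group_ids, group_size)
```

Because each set operation is a bitwise AND, OR, or NOT over whole bitmaps, the cost of evaluating a group depends on bitmap size rather than on the number of tag rows per user.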

Bitmap · CDP · ClickHouse
0 likes · 10 min read

vivo Internet Technology
Jul 10, 2024 · Databases

HBase Optimization Practice in Vivo's Unified Content Platform

Vivo's unified content platform replaced its unwieldy 60 TB MongoDB store with HBase, then upgraded the cluster, introduced table‑specific connection pools, column‑only reads, tuned compaction, and leveraged multi‑version cells, cutting response times from seconds to under ten milliseconds and dramatically lowering operational costs while boosting read/write performance.

Compaction Optimization · Database Optimization · Distributed Database
0 likes · 16 min read

DataFunSummit
Jun 21, 2024 · Big Data

Building a Complete Data System with Apache Arrow: Architecture, Dynamic Schema Modeling, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes dynamic read‑time modeling, outlines the system’s execution flow, storage and indexing strategies, and shares practical tips and extensions for building scalable big‑data solutions.

Acero · Apache Arrow · Big Data
0 likes · 20 min read

DataFunSummit
Apr 23, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Implementation, and Practical Tips

This article explains why new data systems are needed, introduces Apache Arrow’s columnar in‑memory format and its zero‑copy advantages, describes how to model data at read time, outlines the execution flow with Acero and SQL planning, and shares practical tips and extensions for building robust, dynamic‑schema data platforms.

Acero · Apache Arrow · Big Data
0 likes · 20 min read

Sohu Tech Products
Mar 6, 2024 · Big Data

Building Data Systems with Apache Arrow: Architecture, Memory Format, and Execution

The article explains how Apache Arrow’s columnar, cross‑language in‑memory format enables high‑performance, interoperable data systems—replacing traditional row‑oriented databases—by supporting dynamic schemas, zero‑copy data exchange, efficient indexing, Acero‑based query execution, and Flight/ADBC connectivity, while offering practical guidance and highlighting challenges.

Apache Arrow · Big Data · Data Systems
0 likes · 20 min read

DataFunTalk
Feb 28, 2024 · Big Data

Building a Data System with Apache Arrow: Design, Modeling, and Execution

This article explains why new data systems are needed, introduces Apache Arrow and its columnar in‑memory format, describes read‑time modeling and dynamic schema handling, and shows how Arrow can be used to build a complete data processing pipeline with indexing, SQL planning, and zero‑copy data exchange.

Apache Arrow · Big Data · Data Systems
0 likes · 20 min read

DataFunTalk
Jan 1, 2024 · Big Data

MaxCompute Semi-Structured Data: Concepts, Solutions, and Benefits

This article explains the nature of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution that balances flexibility, performance, and cost for large‑scale data warehouses.

Big Data · Data Warehouse · MaxCompute
0 likes · 19 min read

DataFunTalk
Dec 11, 2023 · Databases

Interview with Wu Li on Columnar Storage, JIT Compilation, and Push Mode in Modern Database Systems

The interview with Wu Li, a research engineer at Shanghai Yanhuang Data, explores how columnar storage, JIT compilation, and push-mode processing are reshaping modern database performance, highlighting hardware constraints, software optimizations, and product‑centric goals in the era of big data analytics.

JIT compilation · OLAP · columnar storage
0 likes · 11 min read

DataFunSummit
Dec 10, 2023 · Databases

Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

In this technical interview, Wu Li, a research engineer at Shanghai Yanhuang Data, explains how hardware constraints drive database evolution, why columnar storage and SIMD acceleration are crucial for OLAP, and how JIT compilation and push‑mode processing improve query performance and product experience.

Data Engineering · JIT compilation · OLAP
0 likes · 10 min read

DataFunTalk
Dec 8, 2023 · Databases

Interview with Wu Li on Database Evolution: Columnar Storage, JIT Compilation, and Push Mode

The article presents an interview with Wu Li, a research engineer at Shanghai Yanhuang Data, discussing how hardware limits have driven database evolution toward columnar storage, the adoption of Apache Arrow and Gandiva for SIMD‑enabled JIT compilation, and the shift from pull to push processing modes to improve OLAP performance.

Apache Arrow · Database Optimization · Gandiva
0 likes · 10 min read

DataFunSummit
Sep 7, 2023 · Big Data

MaxCompute Semi-Structured Data Solutions: Architecture, Comparison, and Performance Benefits

This article explains the concepts of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution—including AliORC, adaptive query processing, and handling of dirty or sparse data—to achieve high performance and low cost in big‑data warehousing.

MaxCompute · Semi-Structured Data · columnar storage
0 likes · 20 min read

DataFunSummit
Jul 9, 2023 · Big Data

Data Governance and Application for Behavior Analysis: Modeling Methods, Architecture, and Practical Cases

This article explains how a data‑ecosystem team governs and applies behavior‑analysis data by describing common analysis scenarios, data‑warehouse modeling methods and their pros and cons, the concepts and overall architecture of behavior‑centric analytics, key system components, and several concrete analysis examples such as retention, funnel and path analysis.

Behavior Analysis · Big Data · Data Warehouse
0 likes · 12 min read

Architects' Tech Alliance
Nov 20, 2022 · Databases

Columnar Storage vs Row Storage: Overview, Write/Read Comparison, Pros, Cons, and Use Cases

This article explains the differences between row-based and column-based storage, comparing their write and read performance, outlining advantages and disadvantages, and describing suitable scenarios such as OLAP queries, column families, compression, and indexing, to help choose the appropriate storage model.
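The contrast between the two layouts can be made concrete with a small sketch. The example below is a minimal illustration in plain Python, assuming a hypothetical three-column orders table: the same data is held once as a list of rows and once as a dictionary of columns, and an aggregate over one column touches every full row in the first layout but a single list in the second. The run-length encoding helper hints at why sorted, same-typed column values tend to compress well.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120},
    {"order_id": 2, "region": "south", "amount": 80},
    {"order_id": 3, "region": "north", "amount": 50},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


# Column layout: each field is stored contiguously.
columns = to_columns(rows)

# Row store: the aggregate walks every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Column store: the aggregate reads only the "amount" column.
total_from_columns = sum(columns["amount"])

# Sorted, low-cardinality columns shrink under run-length encoding.
encoded_region = run_length_encode(sorted(columns["region"]))
print(total_from_rows, total_from_columns, encoded_region)
```

Writing a new order is the mirror image: the row layout appends one record, while the column layout has to append to every column list, which matches the usual write-versus-read trade-off between the two models.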

Big Data · Database · OLAP
0 likes · 10 min read

Architect's Tech Stack
Aug 15, 2022 · Databases

Performance Comparison of ClickHouse, Oracle, and esProc SPL Using TPC‑H Benchmarks

This article benchmarks ClickHouse, Oracle, and the open‑source esProc SPL on the TPC‑H suite, showing ClickHouse excels at simple scans, Oracle handles many complex queries, while SPL consistently outperforms both in speed and code simplicity across a range of workloads.

ClickHouse · Database Performance · Oracle
0 likes · 12 min read

Architects' Tech Alliance
Jun 26, 2022 · Databases

June 2022 China Database Popularity Rankings and an Overview of Columnar Databases

The article reports the June 2022 China database popularity ranking, highlights TiDB's comeback, introduces OtterTune's new financing, announces PostgreSQL 15 Beta 1, explains Google AlloyDB columnar features, and provides a detailed overview of columnar database concepts, history, advantages, and evolution.

AI · OtterTune · PostgreSQL
0 likes · 8 min read