Articles tagged ClickHouse
468 articles
Page 2 of 5
DataFunTalk
Feb 25, 2024 · Big Data

Implementation Practice of Bilibili's Tag System: Evolution, Architecture, and Future Plans

This article details Bilibili's tag system from its 2021 inception through successive redesigns, describing the three‑layer architecture, data flow pipelines using Hive, Iceberg, Spark, and ClickHouse, a crowd‑selection DSL, online services backed by Redis, performance optimizations, and upcoming governance and quality initiatives.

Big Data · ClickHouse · Real-time Processing
12 min read
Architect
Feb 1, 2024 · Backend Development

Design and Optimization of Trace2.0: A High‑Performance Backend Tracing System

Trace2.0 is an OpenTelemetry‑based application monitoring system that processes petabyte‑scale trace data using multi‑channel client protocols, gRPC, load‑balancing optimizations, ZSTD compression, Kafka pipelines, ClickHouse storage, and a JDK 21 upgrade with virtual threads, achieving significant performance and cost improvements.

ClickHouse · JDK21 · OpenTelemetry
15 min read
Efficient Ops
Jan 31, 2024 · Databases

Why ClickHouse Beats Elasticsearch for High‑Performance Log Analytics

Facing data security and cost challenges in SaaS, the author evaluates ClickHouse against Elasticsearch, highlighting ClickHouse’s superior write throughput and query speed and its lower storage and CPU usage, and provides detailed deployment guides for ZooKeeper, Kafka, Filebeat, and ClickHouse to build a cost‑effective private analytics platform.

Big Data · ClickHouse · Database Deployment
8 min read
dbaplus Community
Jan 23, 2024 · Operations

How We Built a Scalable Real‑Time Log Center with ClickHouse and ELK

Facing massive data volumes, the team at Kuaidi100 redesigned their logging platform, moving from a file‑based system to an ELK stack and finally to a ClickHouse‑based architecture, achieving real‑time, scalable, cost‑effective log collection, analysis, and alerting while addressing storage, performance, and maintenance challenges.

ClickHouse · ELK · Log Management
12 min read
Bilibili Tech
Jan 23, 2024 · Databases

Unique Engine Design and Implementation in ClickHouse for Bilibili Live Guild Data

Bilibili migrated its live‑guild analytics from MySQL to ClickHouse, creating a custom ReplicatedUniqueMergeTree engine that uses delete‑on‑insert, min‑max and hash‑bucketed indexes with delete bitmaps to achieve 10‑20× faster queries and scalable near‑real‑time reporting despite higher write latency.

ClickHouse · Unique Engine · data engineering
18 min read
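The delete‑on‑insert idea summarized above can be illustrated with a toy in‑memory model. This is a minimal sketch under our own assumptions, not Bilibili's ReplicatedUniqueMergeTree: `Part`, `UniqueTable`, and the Python‑int bitmap are hypothetical stand‑ins, and replication, min‑max pruning, and on‑disk storage are omitted. Data parts stay immutable; a write that replaces an existing key only flips a bit in the older part's delete bitmap, which it locates through a hash‑bucketed key index, so readers simply skip flagged rows.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """An immutable batch of rows plus a mutable delete bitmap (toy model)."""
    rows: list        # (key, value) tuples; never rewritten after insert
    deleted: int = 0  # bitmap stored in a Python int: bit i set => row i deleted

    def mark_deleted(self, offset: int) -> None:
        self.deleted |= 1 << offset

    def live_rows(self):
        for i, row in enumerate(self.rows):
            if not (self.deleted >> i) & 1:
                yield row


class UniqueTable:
    """Hypothetical delete-on-insert table: the latest write for a key wins."""

    def __init__(self, num_buckets: int = 4):
        self.parts = []
        # Key index split into hash buckets: key -> (part_no, row_offset).
        self.buckets = [{} for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, batch):
        part_no = len(self.parts)
        part = Part(rows=list(batch))
        for offset, (key, _value) in enumerate(part.rows):
            bucket = self._bucket(key)
            previous = bucket.get(key)
            if previous is not None:
                old_part_no, old_offset = previous
                # Delete-on-insert: flag the older copy instead of rewriting data.
                target = part if old_part_no == part_no else self.parts[old_part_no]
                target.mark_deleted(old_offset)
            bucket[key] = (part_no, offset)
        self.parts.append(part)

    def scan(self):
        # Readers skip rows whose bit is set; no deduplication pass at read time.
        for part in self.parts:
            yield from part.live_rows()


table = UniqueTable()
table.insert([("a", 1), ("b", 2)])
table.insert([("a", 10), ("c", 3)])
print(sorted(table.scan()))
```

In this sketch the trade‑off mentioned in the summary is visible: every insert pays for an index lookup and a bitmap update, while scans stay a simple filter over immutable parts.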
JD Tech
Jan 18, 2024 · Databases

Understanding ClickHouse: Architecture, Principles, and Performance

This article introduces ClickHouse, an open‑source columnar OLAP database, explains its architecture—including columnar storage, block processing, LSM, indexing and vectorized execution—highlights its performance advantages over other engines, and discusses its limitations such as write‑amplification, concurrency constraints, and ZooKeeper dependency.

Big Data · ClickHouse · Columnar Database
12 min read
Efficient Ops
Jan 17, 2024 · Operations

How We Built a Scalable Cloud‑Native Log Center with ClickHouse

This article details a courier company's evolution from a simple file‑based logging system to a cloud‑native log center, describing the limitations of the original architecture, the migration to an ELK stack, subsequent challenges, and the final redesign using ClickHouse for high compression, low cost, and improved query performance.

ClickHouse · ELK · Log Management
12 min read
MaGe Linux Operations
Jan 3, 2024 · Big Data

ClickHouse vs Elasticsearch: Faster, Cheaper Log Analytics Explained

This article compares ClickHouse and Elasticsearch for log analytics, highlighting ClickHouse's superior write throughput, query speed, and lower server costs, then provides a detailed, cost‑effective deployment guide covering ZooKeeper, Kafka, Filebeat, ClickHouse installation, and visualization with ClickVisual, plus optimization tips.

Big Data · ClickHouse · Deployment
15 min read
DataFunTalk
Jan 3, 2024 · Databases

ClickHouse 2024 Core New Features and Product Development Directions

This article introduces ClickHouse, an open‑source columnar OLAP database, outlines its architecture, advantages, and self‑hosted and cloud deployment models, highlights recent product features such as async inserts, JSON support, Parquet acceleration, and query caching, and summarizes a Q&A covering semi‑structured data, MPP, virtual columns, and the future roadmap.

ClickHouse · Columnar Database · Data Warehouse
12 min read
Efficient Ops
Dec 27, 2023 · Big Data

Why ClickHouse Beats Elasticsearch for Log Analytics – Performance, Cost & Deployment

This article compares ClickHouse and Elasticsearch for log analytics, highlighting ClickHouse’s superior write throughput, query speed, and lower server costs, then details a cost‑effective deployment architecture (ZooKeeper, Kafka, Filebeat, and ClickHouse setup) and shares optimization tips and visualization using ClickVisual.

Big Data · ClickHouse · Elasticsearch
13 min read
dbaplus Community
Nov 20, 2023 · Operations

Can VictoriaLogs Really Beat Elasticsearch, Loki, and ClickHouse? A Deep Dive

VictoriaLogs, a log‑storage system marketed as a cost‑effective, high‑performance alternative, is compared against Elasticsearch/OpenSearch, Grafana Loki, and ClickHouse, highlighting its lower RAM and disk usage, faster queries, simplified setup, and specialized features such as LogSQL, Bloom filters, and custom compression.

ClickHouse · Grafana Loki · Log Management
9 min read
dbaplus Community
Nov 14, 2023 · Databases

How Didi Cut ClickHouse CPU Usage by 90%: Optimizing BgMoveProcPool Threads

This article details how Didi identified excessive CPU consumption by ClickHouse's BgMoveProcPool threads, traced the root cause to unnecessary part‑move checks, introduced a simple early‑exit guard in selectPartsForMove, and achieved a dramatic reduction in CPU load while contributing the fix upstream.

Background Threads · CPU · ClickHouse
10 min read
DataFunTalk
Nov 7, 2023 · Big Data

Comprehensive Guide to User Crowd Analysis: Distribution, Metrics, Drill‑down, Cross, and Comparative Methods with Implementation Details

This article explains the concepts, analytical methods, visualizations, and SQL implementation of user crowd analysis—including distribution, metric, drill‑down, cross, and comparative analyses—while also covering trend monitoring, TGI calculation, and handling of array‑type tags in ClickHouse and Hive.

ClickHouse · Data visualization · SQL
17 min read
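TGI (target group index) is commonly defined as the share of the target crowd that carries a tag divided by the share of the whole population that carries it, multiplied by 100; values above 100 mean the tag is over‑represented in the crowd. The Python sketch below assumes an in‑memory dict mapping user IDs to array‑type tags (`users`, `tag_counts`, and `tgi_by_tag` are hypothetical names). The explode step plays the role that ARRAY JOIN in ClickHouse or LATERAL VIEW explode in Hive would play in SQL, though the article's actual queries may differ.

```python
from collections import Counter


def tgi(target_with_tag, target_total, overall_with_tag, overall_total):
    """Target Group Index: crowd share of a tag relative to the population share, x100."""
    if target_total <= 0 or overall_total <= 0 or overall_with_tag <= 0:
        raise ValueError("population sizes and overall tag count must be positive")
    target_share = target_with_tag / target_total
    overall_share = overall_with_tag / overall_total
    return target_share / overall_share * 100


def tag_counts(users):
    """Users per tag value. Array-type tags are exploded first, and set()
    ensures a user who repeats a value is counted only once."""
    counts = Counter()
    for tags in users.values():
        counts.update(set(tags))
    return counts


def tgi_by_tag(crowd_ids, users):
    """TGI for every tag value, comparing a selected crowd with all users."""
    crowd = {uid: users[uid] for uid in crowd_ids}
    crowd_counts = tag_counts(crowd)  # Counter returns 0 for tags absent in the crowd
    all_counts = tag_counts(users)
    return {
        tag: tgi(crowd_counts[tag], len(crowd), all_counts[tag], len(users))
        for tag in all_counts
    }


users = {
    1: ["game", "anime"],
    2: ["game"],
    3: ["music"],
    4: ["anime", "music"],
}
result = tgi_by_tag([1, 2], users)
print(result)
```

With this sample, `game` scores 200 (over‑represented in the crowd of users 1 and 2), `anime` scores 100 (same share as the population), and `music` scores 0.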
Inke Technology
Oct 31, 2023 · Operations

How We Re‑engineered Our Log Platform to Cut Costs by 60% with ClickHouse

This article details the redesign of a company’s logging infrastructure—from an ELK‑based solution to a ClickHouse‑powered architecture—highlighting the motivations, key requirements, component choices, configuration examples, performance optimizations, and the resulting cost and storage benefits.

Big Data · ClickHouse · Observability
13 min read
NetEase Cloud Music Tech Team
Oct 27, 2023 · Databases

Corona Technical Series: Time-Series Databases in Corona

The article explains how Corona uses three time‑series databases: InfluxDB for storing pre‑aggregated user metrics and platform health data, ClickHouse for real‑time multidimensional log analysis with aggregations, and Elasticsearch for full‑text searchable log monitoring. It details their schema designs and query examples.

ClickHouse · Corona · Database Architecture
19 min read
Architect
Oct 26, 2023 · Big Data

Design and Optimization of Bilibili Log Service 2.0 Using ClickHouse and OpenTelemetry

This article details Bilibili's evolution of its log system from an Elastic Stack‑based solution to a ClickHouse‑backed architecture with OpenTelemetry, describing the challenges of cost, stability, and scalability, the new components such as Log‑Agent, Log‑Ingester, and a custom visualization platform, and the performance gains and future directions.

ClickHouse · Observability · OpenTelemetry
26 min read
dbaplus Community
Oct 25, 2023 · Databases

ByConity vs ClickHouse: Deep Dive into Architecture, Features, and Performance

This article compares ByConity and ClickHouse from a usage perspective, detailing their architectural differences, core components, basic operations such as table creation, data import and query, distributed transaction support, special table engines, scaling strategies, and deployment requirements.

ByConity · ClickHouse · Distributed Transactions
26 min read
dbaplus Community
Oct 18, 2023 · Databases

Doris vs ClickHouse: Which Database Delivers Faster Writes and Queries?

This article presents a systematic performance comparison between Doris and ClickHouse, covering data ingestion speed, SQL syntax differences, hardware impact, and detailed query benchmarks across multiple scenarios, ultimately revealing that each system excels in different use cases.

Big Data · ClickHouse · SQL
15 min read
Big Data Technology & Architecture
Oct 7, 2023 · Big Data

Comprehensive Guide to OLAP Optimization and ClickHouse Performance Tuning

This article explains how to optimize OLAP workloads by balancing normalization and denormalization, applying data sharding, replication, indexing, partitioning, materialized views, columnar storage, compression, and lifecycle management, and provides practical ClickHouse SQL examples for index creation, partitioning, and query plan analysis.

ClickHouse · OLAP · Partitioning
15 min read
dbaplus Community
Sep 19, 2023 · Cloud Native

How REDck Transformed ClickHouse into a Scalable Cloud‑Native Real‑Time Data Warehouse

REDck, a cloud‑native real‑time data warehouse built on open‑source ClickHouse, overcomes the original MPP architecture’s scaling and maintenance limits by separating compute and storage, introducing unified metadata, multi‑level caching, bucket‑based sharding, and distributed transaction support, delivering petabyte‑scale, 99.9% availability and ten‑fold cost and performance gains for Xiaohongshu’s diverse workloads.

ClickHouse · Cloud Native · Compute-Storage Separation
22 min read
DataFunTalk
Sep 17, 2023 · Cloud Native

REDck: A Cloud‑Native Real‑Time Data Warehouse Built on ClickHouse

REDck is a cloud‑native, storage‑compute separated real‑time OLAP data warehouse derived from ClickHouse that addresses scalability, operational cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, and two‑phase commit transactions.

ClickHouse · Distributed Transactions · Real-time OLAP
18 min read
ITPUB
Sep 15, 2023 · Databases

Importing Billions of Kafka Rows into Doris and Benchmarking Against ClickHouse

This article explains Doris's various data import methods, focusing on the Routine Load approach for Kafka streams, describes how to handle mixed‑schema topics using the max_error_number parameter, and benchmarks query performance against ClickHouse on a 130‑million‑row dataset, highlighting each system's strengths and limitations.

ClickHouse · Kafka · Routine Load
10 min read
Architect
Sep 11, 2023 · Databases

How eBay Scaled ClickHouse with Read/Write Separation and Keeper

This article details eBay's event monitoring platform architecture, explains the challenges of high‑load OLAP workloads on ClickHouse clusters, describes the design and implementation of read/write separation and multi‑shard Keeper coordination, and shares concrete configuration snippets, performance observations, and production lessons learned.

ClickHouse · Distributed Systems · Keeper
20 min read
ITPUB
Sep 11, 2023 · Cloud Native

How REDck Transforms ClickHouse into a Scalable Cloud‑Native Real‑Time Data Warehouse

Xiaohongshu built REDck, a cloud‑native, storage‑compute separated real‑time OLAP warehouse on ClickHouse, addressing scaling, cost, and reliability challenges through a unified metadata service, object‑storage optimizations, multi‑level caching, distributed task scheduling, bucketing, and exactly‑once transaction support.

ClickHouse · Distributed Transactions · Real-time OLAP
21 min read
Xiaohongshu Tech REDtech
Sep 6, 2023 · Databases

REDck: A Cloud‑Native Real‑Time OLAP Data Warehouse Built on ClickHouse

REDck is a cloud‑native, real‑time OLAP data warehouse built on ClickHouse that adds elastic compute and storage scaling, object‑storage optimizations, multi‑level caching, and exactly‑once ingestion, delivering petabyte‑scale interactive analytics with ten‑fold CPU efficiency, ten‑fold cost reduction, and 99.9% availability.

Big Data · ClickHouse · Real-time OLAP
21 min read
JD Retail Technology
Sep 4, 2023 · Big Data

JD Mini Program Data Center: Architecture, Milestones, and Real‑time Analytics Solutions

The article details the JD Mini Program platform, its data‑center development milestones, comprehensive business panorama, technical architecture, data collection, storage, and analysis pipelines—including Flink‑based real‑time monitoring, ClickHouse custom analytics, and Elasticsearch user‑behavior insights—while outlining current challenges and future AI‑driven enhancements.

Big Data · ClickHouse · Data Warehouse
16 min read
DataFunTalk
Aug 28, 2023 · Big Data

Practical Experience of an E‑commerce Platform’s Offline and Real‑time Data Warehouse

This article shares the practical architecture, technology selection, implementation details, and evolution of an e‑commerce platform’s offline and real‑time data warehouses, covering data modeling, processing pipelines, system components such as Hive, Spark, Flink, ClickHouse, Doris, and Hudi, and the lessons learned from multiple production deployments.

Big Data · ClickHouse · Data Warehouse
18 min read
JD Retail Technology
Aug 21, 2023 · Artificial Intelligence

ChatGPT-4 Enhances Data Analysis Efficiency and Insight Across Big Data Scenarios

This article examines how ChatGPT-4, as an advanced natural‑language‑processing model, can streamline data analysis tasks—from generating Hive table definitions and sample data to crafting complex HiveSQL queries, visualizing results, and implementing ClickHouse and Flink solutions—thereby improving efficiency, insight, and problem‑solving in big‑data environments.

Big Data · ChatGPT-4 · ClickHouse
7 min read
dbaplus Community
Aug 15, 2023 · Databases

Why ClickHouse Outperforms MySQL, Elasticsearch, and HBase for Massive Event Data

This article examines the massive data storage and real‑time analysis needs of an activity platform, evaluates MySQL, sharded MySQL, Elasticsearch and HBase, and explains why ClickHouse—with its columnar storage, MergeTree engine, vectorized execution, and distributed architecture—offers the best balance of write performance, query speed, and scalability for billions of records.

Big Data · ClickHouse · Columnar Database
31 min read
DataFunSummit
Aug 10, 2023 · Databases

ClickHouse Deployment in Lenovo Manufacturing: Architecture, Data Integration, and Performance Optimization

This article details Lenovo's implementation of ClickHouse in a manufacturing environment, covering the current data landscape, cluster architecture, integration challenges, performance enhancements, and solutions such as SeaTunnel and query pre‑aggregation, illustrating how OLAP engines can address real‑time analytics and concurrency issues in production data pipelines.

ClickHouse · Data Integration · Manufacturing
11 min read
MaGe Linux Operations
Aug 5, 2023 · Databases

Elasticsearch vs ClickHouse: Architecture, Queries, and Performance

This article compares Elasticsearch and ClickHouse by examining their underlying architectures, node roles, query languages, and performance through a series of benchmark tests using Docker‑compose, Vector data pipelines, and Python SDKs, revealing ClickHouse’s superior speed in most query scenarios despite lacking advanced search features.

ClickHouse · Elasticsearch
12 min read
dbaplus Community
Aug 3, 2023 · Databases

Scaling eBay’s Sherlock.io ClickHouse Platform with Read/Write Separation and Keeper

The article details how eBay’s Sherlock.io event monitoring platform, built on ClickHouse, faced scaling and performance challenges due to ZooKeeper bottlenecks, and explains the design and implementation of read/write separation, shard‑level Keeper coordination, and related operational fixes to improve reliability and latency.

ClickHouse · Keeper · Read-Write Separation
19 min read
Efficient Ops
Aug 2, 2023 · Databases

Why ClickHouse Outperforms Elasticsearch in Real‑World Queries

This article compares Elasticsearch and ClickHouse across architecture, query capabilities, and performance using Docker‑compose stacks and Python SDK tests, demonstrating that ClickHouse often delivers superior speed, especially in aggregation and regex queries, while highlighting each system’s design trade‑offs.

ClickHouse · Docker Compose · Elasticsearch
13 min read
Top Architect
Jul 27, 2023 · Big Data

Performance Comparison of Elasticsearch and ClickHouse for Log Search

This article compares Elasticsearch and ClickHouse as log‑search solutions, detailing their architectures, Docker‑compose deployments, data‑ingestion pipelines with Vector, query syntax differences, and benchmark results that show ClickHouse generally outperforms Elasticsearch in speed and aggregation efficiency.

Big Data · ClickHouse · Docker
13 min read
DataFunSummit
Jul 27, 2023 · Backend Development

Building a High‑Availability ClickHouse Cluster with RaftKeeper

This article explains how RaftKeeper leverages the Raft consensus algorithm to create a high‑availability, high‑performance ClickHouse cluster across multiple data centers, covering project background, architecture, core features, performance optimizations, and real‑world deployment results.

Backend Development · ClickHouse · Cross-DataCenter
17 min read
dbaplus Community
Jul 26, 2023 · Databases

Mastering ClickHouse with Flink: Table Engine Choices, Performance Tuning, and Real‑World Lessons

This article details how JDQ+Flink+Elasticsearch was extended with ClickHouse for real‑time reporting, covering table‑engine selection, Flink sink implementation, query optimization techniques, high‑CPU shard analysis, and practical strategies to ensure high concurrency and stable performance in production.

ClickHouse · DistributedTables · Flink
46 min read
JD Cloud Developers
Jul 19, 2023 · Databases

Why ClickHouse Is the Ideal Choice for Massive Data Storage and Real‑Time Analytics

This article examines the massive‑scale data requirements of an activity‑tracking platform, compares MySQL, Elasticsearch and HBase, and explains why ClickHouse—with its columnar storage, MergeTree engine, vectorized execution, and distributed architecture—offers the best combination of storage capacity, write performance, real‑time analysis, and query speed for billions of records.

ClickHouse · Columnar Database · Data Warehouse
31 min read
dbaplus Community
Jul 17, 2023 · Big Data

How Bilibili Built Billions 3.0: A Low‑Cost, Scalable Log Platform with ClickHouse, Iceberg, and Trino

This article details Bilibili's evolution from the ClickHouse‑based Billions 2.0 log system to the Billions 3.0 architecture, explaining how they reduced storage costs, improved troubleshooting, adopted a lake‑house design with Iceberg on HDFS, leveraged ClickHouse for acceleration, and integrated Trino as the unified query engine.

ClickHouse · Iceberg · Observability
37 min read
Architect
Jul 17, 2023 · Databases

Performance Comparison of Elasticsearch and ClickHouse for Log Search and Analytics

This article compares Elasticsearch and ClickHouse by describing their architectures, presenting Docker‑based test stacks, showing code snippets for deployment, data ingestion, and queries, and reporting performance results that demonstrate ClickHouse generally outperforms Elasticsearch in log‑analytics scenarios.

ClickHouse · Docker · Elasticsearch
12 min read
Didi Tech
Jul 5, 2023 · Databases

Performance Optimization of ClickHouse: Identifying and Fixing High CPU Usage in BgMoveProcPool Threads

By adding a guard that skips the costly part‑scan in MergeTreePartsMover::selectPartsForMove when disk usage is below the threshold and no MoveTTL is set, Didi reduced BgMoveProcPool thread CPU consumption from about 30% to under 4%, halving overall node CPU load and improving ClickHouse’s PB‑scale performance.

BgMoveProcPool · CPU · ClickHouse
10 min read
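The guard described above amounts to checking cheap preconditions before an expensive per‑part loop. The following Python sketch is a toy model under our own assumptions, not ClickHouse's C++ `MergeTreePartsMover::selectPartsForMove`: `Disk`, `Part`, `usage_threshold`, and the `move_ttl_expired` flag are hypothetical simplifications of the real storage‑policy and TTL logic.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def usage_ratio(self) -> float:
        return self.used_bytes / self.capacity_bytes


@dataclass
class Part:
    name: str
    disk: str
    move_ttl_expired: bool = False  # hypothetical stand-in for TTL evaluation


def select_parts_for_move(parts, disks, has_move_ttl, usage_threshold):
    """Toy model: return (parts to move, number of parts examined)."""
    over_full = {d.name for d in disks if d.usage_ratio > usage_threshold}
    # Early-exit guard: if no disk exceeds the threshold and the table has no
    # move TTL, no part can qualify, so skip the per-part scan entirely.
    if not over_full and not has_move_ttl:
        return [], 0
    selected, scanned = [], 0
    for part in parts:
        scanned += 1
        if part.disk in over_full or (has_move_ttl and part.move_ttl_expired):
            selected.append(part)
    return selected, scanned


parts = [Part(f"p{i}", "hot") for i in range(1000)]
calm = [Disk("hot", used_bytes=40, capacity_bytes=100)]
full = [Disk("hot", used_bytes=95, capacity_bytes=100)]
print(select_parts_for_move(parts, calm, has_move_ttl=False, usage_threshold=0.9)[1])
print(len(select_parts_for_move(parts, full, has_move_ttl=False, usage_threshold=0.9)[0]))
```

Counting examined parts makes the saving visible in this model: with the disk at 40% usage and no move TTL, the function returns after inspecting zero parts, while at 95% usage it walks all 1,000.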
NetEase Cloud Music Tech Team
Jun 29, 2023 · Cloud Native

KubeCost: Kubernetes-Based Resource Cost Analysis and Allocation System

KubeCost, developed by NetEase Cloud Music, is a low‑intrusion, scalable Kubernetes cost analysis system that allocates resource expenses using peak‑or‑usage billing models, supports hybrid‑multi‑cloud pricing, aggregates per‑pod CPU/memory/GPU costs, and stores data efficiently in ClickHouse for reliable, business‑oriented financial insight.

ClickHouse · Cloud Cost Management · Cloud Native
10 min read
Bilibili Tech
Jun 20, 2023 · Big Data

Design and Evolution of Bilibili's Billions 3.0 Log Platform: A Lakehouse Architecture with ClickHouse, Iceberg, and Trino

Bilibili evolved its log platform from the ClickHouse‑based Billions 2.0 to Billions 3.0, a lakehouse built on Iceberg, HDFS, and Trino that retains ClickHouse for acceleration. The redesign cuts storage cost by over 20%, improves observability, resolves the compute‑storage mismatch, adds flexible indexing, and supports complex ETL while remaining open source.

ClickHouse · Iceberg · Lakehouse
36 min read
Didi Tech
Jun 14, 2023 · Big Data

Real-Time Data Development Practices and Component Selection at Didi

This article outlines Didi’s unified real‑time data stack and its best‑practice component choices for four key scenarios (metric monitoring, BI analysis, online services, and feature/tag systems), detailing pipelines from source to sink, resource‑usage guidelines, and a one‑stop development platform for building stable, high‑performance streaming solutions.

ClickHouse · Druid · Flink
17 min read
JD Cloud Developers
May 30, 2023 · Big Data

ClickHouse & Flink: Choosing Engines, Tuning Queries, and Scaling Concurrency

This article details how JDQ, Flink, and ClickHouse were integrated to replace Elasticsearch for real‑time reporting, covering table‑engine selection, Flink sink implementation, performance bottlenecks, CPU hot‑spots, query optimization techniques, and strategies for handling high concurrency while ensuring data consistency and system stability.

ClickHouse · Flink · SQL Optimization
46 min read
DataFunSummit
May 30, 2023 · Big Data

DataFunCon Conference – OLAP, StarRocks, ClickHouse, and ByteHouse Technical Sessions

The DataFunCon conference showcases leading experts from Ctrip, Didi, Bilibili, and ByteDance presenting next‑generation OLAP technologies such as StarRocks, ClickHouse, and ByteHouse, covering architecture, materialized views, ELT practices, and performance optimization to guide practitioners in big‑data platform selection and implementation.

ByteHouse · ClickHouse · OLAP
7 min read
Java Backend Technology
May 30, 2023 · Databases

Why count(*) Slows Down MySQL and How to Optimize It

This article explains why MySQL count(*) can become a performance bottleneck on InnoDB tables, compares different count() variants, and presents practical optimization techniques such as Redis caching, second‑level caches, parallel execution, reducing joins, and offloading analytics to ClickHouse.

ClickHouse · InnoDB · Redis Cache
10 min read
ByteDance Data Platform
May 29, 2023 · Databases

Which Open‑Source OLAP Engine Wins the TPC‑DS Benchmark? A Deep Performance Comparison

Using the TPC‑DS benchmark’s 99 queries on a 1 TB dataset, this study evaluates the performance of four open‑source OLAP engines—ClickHouse, Doris, Presto, and ByConity—across basic, join, aggregation, subquery, and window‑function scenarios, revealing ByConity’s superior speed and the limitations of ClickHouse.

ByConity · ClickHouse · OLAP
12 min read
WeChat Backend Team
May 24, 2023 · Databases

Boost ClickHouse Bitmap Queries 10x with BitBooster: Techniques & Results

This article explains how the BitBooster suite accelerates ClickHouse bitmap (BitMap) queries by up to tenfold, covering background, performance bottlenecks, single‑node and read optimizations, layout and instruction‑set enhancements, encoding dictionaries, multi‑node scaling, and real‑world benchmark results.

Bitmap · ClickHouse · optimization
23 min read
ITPUB
May 15, 2023 · Big Data

Why ClickHouse Outperforms Elasticsearch in Log Analytics: A Practical Comparison

This article compares Elasticsearch and ClickHouse for log analytics by detailing their architectures, setting up Docker‑Compose stacks, ingesting synthetic syslog data with Vector, running equivalent queries, and measuring performance, revealing ClickHouse’s superior speed in most scenarios.

ClickHouse · Docker · Elasticsearch
13 min read
Efficient Ops
May 7, 2023 · Databases

Elasticsearch vs ClickHouse: Performance Comparison for Log Analytics

This article compares Elasticsearch and ClickHouse as log‑analytics solutions, detailing their architectures, node roles, data ingestion pipelines, query capabilities, and benchmark results, ultimately showing ClickHouse’s superior performance in most tested scenarios.

ClickHouse · Docker · Elasticsearch
13 min read
DataFunTalk
May 5, 2023 · Big Data

NetEase Cloud Music Real-Time Data Warehouse Architecture and Low-Code Platform Practices

This article presents NetEase Cloud Music's real-time data warehouse architecture, covering its streaming and batch scenarios, layered design (ODS, CDM, ADS), technology stack choices, consistency mechanisms, the FastX low-code platform, and future development plans, offering a comprehensive technical overview for data engineers and architects.

Big Data · ClickHouse · Flink
18 min read
Top Architect
Apr 26, 2023 · Databases

Comparative Performance and Feature Analysis of Elasticsearch vs ClickHouse

This article presents a practical comparison between Elasticsearch and ClickHouse, detailing their architectures, Docker‑Compose deployment, data ingestion pipelines, a series of representative queries, and benchmark results that show ClickHouse generally outperforms Elasticsearch in basic search and aggregation scenarios.

ClickHouse · Docker Compose · Elasticsearch
14 min read
Architect
Apr 23, 2023 · Big Data

Performance Comparison of Elasticsearch and ClickHouse for Log Analytics

This article compares Elasticsearch and ClickHouse by describing their architectures, demonstrating a Docker‑compose test environment, executing equivalent queries via both systems, and presenting performance results that show ClickHouse generally outperforms Elasticsearch in basic search and aggregation scenarios for log data.

ClickHouse · Elasticsearch · SQL
12 min read
JD Cloud Developers
Apr 20, 2023 · Operations

How to Build a Cost‑Effective, High‑Throughput Log Collection System with ClickHouse

This article examines the challenges of scaling log storage and retrieval for high‑traffic services, analyzes the cost and performance limits of traditional ELK‑based pipelines, and presents a streamlined, UDP‑driven architecture using ClickHouse that dramatically reduces hardware expenses while handling hundreds of gigabytes per second.

ClickHouse · Cost Optimization · High Throughput
16 min read
Code Ape Tech Column
Apr 19, 2023 · Databases

Comparative Analysis of Elasticsearch and ClickHouse: Architecture, Query Performance, and Practical Benchmarks

This article compares Elasticsearch and ClickHouse by outlining their architectures, detailing deployment configurations, presenting benchmark queries and performance results, and concluding that ClickHouse generally outperforms Elasticsearch in many basic search and aggregation scenarios, while also noting each system's strengths and limitations.

Big Data · ClickHouse · Elasticsearch
13 min read
dbaplus Community
Apr 18, 2023 · Big Data

How Bilibili Scaled Its OLAP Platform with ClickHouse and Lakehouse Integration

At Bilibili, the OLAP platform evolved through three phases—consolidating data services onto ClickHouse, migrating text search to ClickHouse, and integrating a lake‑house architecture—delivering massive cost reductions, sub‑second query latency, and scalable analytics for billions of daily events.

Big Data · ClickHouse · Data Analytics
15 min read
Selected Java Interview Questions
Mar 12, 2023 · Big Data

Design and Optimization of Querying 100K Records from Tens of Millions of Data Using ClickHouse, Elasticsearch, HBase, and RediSearch

This article presents a comprehensive design and performance‑optimization study for extracting up to 100 000 records from a pool of tens of millions, comparing multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, ES + HBase, and RediSearch + RedisJSON solutions, and provides practical recommendations based on measured latency and throughput.

ClickHouse · HBase · RediSearch
11 min read
dbaplus Community
Mar 7, 2023 · Operations

How We Rescued a ClickHouse Logging Cluster After Zookeeper‑Induced Read‑Only Failure

A production logging system became unavailable, a failure first signaled by Kafka backlog alerts. The investigation uncovered read‑only ClickHouse tables caused by mismatched Zookeeper metadata after a TTL policy change, and the cluster was recovered step by step through Zookeeper restarts, metadata fixes, and table reconstruction.

ClickHouse · Cluster Recovery · Flink
9 min read
Tencent Cloud Developer
Mar 1, 2023 · Big Data

We Analysis User Profiling System: Architecture and Technical Implementation

We Analysis, the official data‑analysis platform for WeChat mini‑program providers, delivers a zero‑learning‑curve user‑profiling system that combines basic tag analysis with flexible, rule‑based segmentation. An ETL pipeline stores pre‑computed results in TDSQL, while online queries are served from ClickHouse and optimized with RoaringBitmap, keeping the analytics low‑latency, stable, and comprehensive.

ClickHouse · DataPipeline · Spark
20 min read
Alibaba Cloud Big Data AI Platform
Mar 1, 2023 · Big Data

How We Built a Scalable Real‑Time Data Architecture for a Complex Supply Chain

This article describes the challenges of a highly complex supply‑chain system, the evolution from early MySQL‑based reporting to a modern real‑time data platform using Flink, Kafka, ClickHouse, Hologres and other cloud services, and the tools and lessons learned to achieve low‑latency, high‑throughput analytics.

ClickHouse · Flink · Kafka
11 min read
DeWu Technology
Feb 24, 2023 · Big Data

Real-Time Data Architecture Evolution for a Complex Supply Chain

The article traces DeWu’s supply‑chain data platform from slow MySQL reporting, through early CDC‑based wide tables and a Flink‑Kafka‑ClickHouse 1.0 design, to a more scalable Flink‑Kafka‑Hologres 2.0 architecture that adds upsert support and separates compute from storage. It also details key operational tricks, code‑generation tools, and future plans for lake‑house integration.

Big Data · ClickHouse · Flink
10 min read
StarRocks
Feb 21, 2023 · Databases

How Yidian Tianxia Built a Unified Real‑Time & Offline Data Warehouse with StarRocks

Yidian Tianxia tackled massive daily data volumes and complex analytics by defining a five‑layer data‑warehouse standard, comparing ClickHouse and StarRocks performance, and implementing a unified real‑time/offline architecture with StarRocks, DataPlus, and EasyJob, achieving multi‑fold query speedups and lower operational costs.

ClickHouse · Data Governance · Data Warehouse
14 min read
dbaplus Community
Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's PolarStar user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation. It explains the technical solutions for event, retention, funnel, and path analysis, as well as the data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

ClickHouse · Flink · Iceberg
32 min read
Big Data Technology Architecture
Feb 15, 2023 · Databases

ClickHouse Usage Guide: Table Engines, Best Practices, and Cluster Architecture

This comprehensive guide introduces ClickHouse as a high‑performance columnar DBMS, outlines its main application scenarios, details the various table engines and their creation syntax, and provides practical development, deployment, and operational recommendations for building reliable ClickHouse clusters.

ClickHouse · ClusterArchitecture · SQLGuidelines
22 min read
ByteDance Data Platform
Feb 15, 2023 · Databases

How ByteHouse Powers Real‑Time Data Warehousing at Scale

ByteHouse, a cloud‑native data warehouse built on ClickHouse, delivers ultra‑fast real‑time and massive offline analytics with elastic scaling. It serves business needs within ByteDance and in the financial sector through an optimized architecture, ROI‑driven monitoring, and comprehensive operational tools.

Big Data · ByteHouse · ClickHouse
16 min read
Java Architect Essentials
Jan 31, 2023 · Big Data

Optimizing Large-Scale Data Retrieval: ClickHouse Pagination, Elasticsearch Scroll Scan, ES+HBase, and RediSearch + RedisJSON Solutions

This article examines a business requirement to filter and rank up to 100,000 records from a pool of tens of millions, presenting and evaluating four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan deep paging, an ES‑HBase combined query, and a RediSearch + RedisJSON approach—along with performance data and code examples.

ClickHouse · Elasticsearch · HBase
12 min read
DataFunSummit
Jan 23, 2023 · Big Data

Design and Practice of the 58 Agile BI System (Starfire)

This article presents a comprehensive overview of the 58 Agile BI platform called Starfire, covering its background, technical architecture, core permission and query engine challenges, MPP cache acceleration, visualization resource library, developer services, and future development directions.

BI · Big Data · ClickHouse
13 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Evolution of Ctrip's Log System: From Elasticsearch to ClickHouse and Log 3.0

This article details the evolution of Ctrip's log infrastructure, describing the shift from fragmented departmental logging to a unified Elasticsearch-based platform, the migration to ClickHouse for cost‑effective, high‑performance storage, and the subsequent Log 3.0 redesign that leverages Kubernetes, sharding, and a unified query governance layer to handle petabyte‑scale data.

Big Data · ClickHouse · Cloud Native
16 min read
dbaplus Community
Jan 10, 2023 · Big Data

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP solutions such as Druid, Kylin, Doris, and ClickHouse, outlines their features and suitable scenarios, and shares practical optimization techniques—including materialized views, caching, node tiering, and query tuning—to improve performance for high‑concurrency analytical workloads.

Big Data · ClickHouse · Data Warehouse
16 min read
Bilibili Tech
Jan 10, 2023 · Big Data

Technical Evolution of Bilibili's PolarStar User Behavior Analysis Platform

Bilibili’s PolarStar platform evolved from Spark‑based batch jobs to a Flink‑driven real‑time pipeline and finally to a unified Iceberg‑on‑ClickHouse model, cutting query latency to seconds, saving thousands of CPU cores and hundreds of gigabytes of Redis memory while enabling complex, near‑real‑time user‑behavior analyses and scalable data‑import, rebalancing, and compression optimizations.

ClickHouse · Flink · Iceberg
30 min read
DataFunSummit
Jan 9, 2023 · Big Data

JD Data‑Driven Business Development: Building a Business Metric Data System and Marketplace Governance

The article outlines JD's data‑driven business development strategy, describing the current challenges of its business data marketplace, the governance framework—including layered architecture, standardization, ClickHouse dictionary refresh, and optimization measures—and the resulting performance improvements and future outlook.

Big Data · ClickHouse · Data Governance
13 min read
21CTO
Jan 7, 2023 · Big Data

How WeChat’s WeAnalysis Powers Scalable User Segmentation with Big Data Architecture

This article explains the design and implementation of WeChat's WeAnalysis user‑profiling system, covering its basic tag and user‑group modules, multi‑source data ingestion, ETL processing, storage choices such as TDSQL and ClickHouse, bitmap handling, query performance, and service APIs for flexible, high‑performance user segmentation.

ClickHouse · Data Analytics · Spark
20 min read
DataFunTalk
Jan 5, 2023 · Big Data

Five Optimization Strategies for Improving DataTester Query Performance

This article describes how DataTester, Volcano Engine's A/B testing platform, achieved a more than four‑fold query speedup by applying five technical optimizations (pre‑aggregation, join reduction, GroupBy redesign, AU‑metric caching, and asynchronous query handling) that target both the data construction and execution layers.

A/B testing · ClickHouse · DataTester
12 min read
ByteDance Data Platform
Jan 4, 2023 · Databases

How ByteHouse Enhances ClickHouse with Resource Isolation and High Availability

This article explains how ByteHouse, an enhanced version of ClickHouse used at ByteDance, adds full upsert support, multi‑table joins, high‑availability features, and, most importantly, a Resource Group mechanism that provides fine‑grained CPU, memory, and concurrency isolation to improve query performance and stability.

ByteHouse · ClickHouse · Concurrency Control
8 min read
Code Ape Tech Column
Jan 3, 2023 · Big Data

Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide

This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a step‑by‑step deployment guide for a private data pipeline using Zookeeper, Kafka, FileBeat, and ClickHouse, along with common issues and their solutions.

Big Data · ClickHouse · Deployment
15 min read
ITPUB
Jan 2, 2023 · Databases

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP engines such as Druid, Kylin, Doris, and ClickHouse, outlines real‑world application scenarios, and provides detailed optimization techniques—including materialized views, caching, tiered storage, and skip‑index configurations—to improve query performance.

Analytics · ClickHouse · Data Warehouse
16 min read
Aikesheng Open Source Community
Dec 31, 2022 · Databases

Understanding ClickHouse Performance: Storage Engine and Compute Engine Perspectives

This article explains why ClickHouse delivers high query speed by detailing storage‑engine optimizations such as pre‑sorting, columnar layout and compression, and compute‑engine techniques like vectorized execution, built‑in functions and minimal join usage, while also promoting the related book and giveaway.

Big Data · ClickHouse · OLAP
9 min read
Efficient Ops
Dec 29, 2022 · Operations

How eBay Scales Its Event Platform with ClickHouse and Kubernetes

This article details eBay's event platform architecture: why a dedicated event system is needed, how ClickHouse provides high‑performance storage, and how Kubernetes CRDs are used for cross‑region high availability, along with data routing, read/write separation, and LogQL‑based query optimizations.

ClickHouse · Event Platform · Kubernetes
18 min read
Selected Java Interview Questions
Dec 29, 2022 · Backend Development

Optimizing Large‑Scale Data Retrieval with ClickHouse, Elasticsearch Scroll Scan, ES+HBase, and RediSearch+RedisJSON

This article examines a business requirement to filter up to 100 000 records from a pool of tens of millions, presenting and evaluating four backend solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑HBase hybrid, and RediSearch + RedisJSON—along with performance data and implementation details.

Backend · ClickHouse · Data Retrieval
11 min read