Tagged articles
Selected Java Interview Questions
Dec 29, 2022 · Backend Development

Optimizing Large‑Scale Data Retrieval with ClickHouse, Elasticsearch Scroll Scan, ES+HBase, and RediSearch+RedisJSON

This article examines a business requirement to filter up to 100 000 records from a pool of tens of millions, presenting and evaluating four backend solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑HBase hybrid, and RediSearch + RedisJSON—along with performance data and implementation details.

Backend · Data Retrieval · HBase
11 min read
Sohu Tech Products
Dec 28, 2022 · Databases

Using ClickHouse for High‑Performance Keyword Hit Statistics

This article presents the background and challenges of large‑scale keyword hit statistics, explains why traditional MySQL solutions struggle, and details how ClickHouse’s columnar storage, vectorized execution, and distributed architecture provide fast, scalable analytics, including cluster setup, table schema, queries, and migration lessons.

Columnar Database · Keyword Statistics · clickhouse
20 min read
NetEase Cloud Music Tech Team
Dec 22, 2022 · Backend Development

NetEase Cloud Music Tracking Management Platform: Architecture and Implementation Practices

NetEase Cloud Music’s tracking management platform, built on the Dawn Tracking solution, provides a unified, version‑controlled system for client‑side data events that automates code generation, real‑time validation, change visualization, and automatic rebasing, while supporting bridge‑mode integration and aiming to extend into server‑side and hybrid tracking.

Data Analytics · NetEase Cloud Music · R&D platform
9 min read
Zhuanzhuan Tech
Dec 21, 2022 · Big Data

OLAP Technology Overview, Selection, and Optimization Practices

This article introduces OLAP concepts, compares ROLAP, MOLAP, and HOLAP, evaluates mainstream OLAP engines such as Druid, Kylin, Doris, and ClickHouse, and presents practical optimization techniques including materialized views, caching, tiered storage, and query tuning for large‑scale analytical workloads.

Druid · OLAP · clickhouse
17 min read
DataFunSummit
Dec 20, 2022 · Big Data

JD Retail Big Data OLAP Application and Practice

This talk presents JD Retail’s big‑data OLAP solution, covering the massive, variable and complex traffic data challenges, the custom data‑ingestion and versioned update tools, ClickHouse query‑architecture upgrades, optimization techniques, and future plans for multi‑cluster querying and pre‑computation.

Big Data · JD Retail · OLAP
21 min read
DataFunTalk
Dec 19, 2022 · Big Data

Evolution of OLAP: Key Technologies, Engine Comparison, and Future Trends

This article provides a comprehensive overview of OLAP technology evolution, covering its origins, modern requirements for massive and real‑time data, detailed comparisons of major open‑source OLAP engines such as Druid, Elasticsearch, Kylin, Doris/StarRocks, and ClickHouse, core architectural and storage techniques, and emerging trends like federated queries, hybrid storage, and lakehouse integration.

Druid · OLAP · clickhouse
22 min read
ITPUB
Dec 18, 2022 · Databases

Why ClickHouse Is So Fast: Deep Dive into Storage and Compute Engine Optimizations

This article explains how ClickHouse achieves high query performance by leveraging storage‑engine designs such as pre‑sorting, columnar layout, and block‑level compression, and by exploiting a vectorized compute engine while avoiding joins and using built‑in functions.

Big Data · Columnar Storage · Database Performance
9 min read
dbaplus Community
Dec 13, 2022 · Big Data

How ClickHouse Powers Real-Time Self-Service Analytics at Scale

Facing massive daily data volumes and complex, ad‑hoc analytical needs, Zhuanzhuan’s engineering team evaluated multiple OLAP engines and chose ClickHouse, then built a four‑layer self‑service analytics platform. The article details the architecture, use cases, performance tuning, large‑scale joins, and the challenges on the future roadmap.

Big Data · Data Architecture · OLAP
14 min read
Efficient Ops
Dec 13, 2022 · Databases

How ClickHouse Replicates MySQL in Real-Time: A Step-by-Step Guide

This article explains how to configure ClickHouse as a MySQL replica using the MaterializeMySQL engine, covering code acquisition, MySQL master setup, ClickHouse slave configuration, handling of delete and update operations, and the underlying replication mechanism with practical code examples.

Database Sync · MaterializeMySQL · Real-Time
10 min read
ITPUB
Dec 10, 2022 · Big Data

How ClickHouse Powers Real-Time Self-Service Analytics at Scale

This article examines why ClickHouse was chosen as the OLAP engine for a massive self‑service analytics platform, describes the system architecture, shares concrete memory and performance tuning parameters, and outlines current challenges and future roadmap for large‑scale real‑time data analysis.

Big Data · Data Architecture · OLAP
14 min read
Zhuanzhuan Tech
Dec 7, 2022 · Databases

ClickHouse in Self‑Service Analytics: OLAP Selection, Platform Architecture, Optimization Practices, and Future Outlook

This article examines the selection of ClickHouse as the OLAP engine for a self‑service analytics platform, describes the platform’s architecture, details memory and performance tuning techniques, discusses large‑scale join handling, and outlines current challenges and future development directions for ClickHouse.

Data Architecture · OLAP · Self-Service Analytics
12 min read
Su San Talks Tech
Nov 16, 2022 · Databases

Why count(*) Slows Down MySQL and How to Optimize It

This article explains why MySQL's count(*) can become a performance bottleneck, especially with InnoDB, and presents practical optimization techniques such as Redis caching, second‑level in‑memory caches, parallel execution, reducing unnecessary joins, and using column‑store databases like ClickHouse.

_count · clickhouse · mysql
10 min read
Su San Talks Tech
Nov 16, 2022 · Databases

Why count(*) Is Slow in MySQL InnoDB and How to Speed It Up

This article explains why MySQL's count(*) can be slow on InnoDB, compares different count variations, and presents practical optimization techniques such as Redis caching, second‑level caches, multithreading, reducing joins, and using ClickHouse for massive datasets.

InnoDB · _count · clickhouse
12 min read
DataFunTalk
Nov 10, 2022 · Big Data

Enhancing ClickHouse Resource Isolation with ByteHouse Resource Group

This article explains how ByteHouse extends ClickHouse with a Resource Group mechanism that provides fine‑grained concurrency, memory, and CPU isolation, improving query latency, reducing variance, and increasing cluster stability for large‑scale ad‑tech workloads.

ByteHouse · Concurrency Control · Resource Isolation
8 min read
DataFunTalk
Nov 3, 2022 · Databases

Enhancing ClickHouse High Availability: Reducing ZooKeeper Load, Faster Recovery, and Additional Reliability Improvements

ByteDance’s article details the high‑availability challenges of ClickHouse in large‑scale deployments, such as frequent failures, long recovery times, and operational complexity. It explains three key enhancements: a new HaMergeTree engine to lessen ZooKeeper load, RocksDB‑based metadata persistence for faster restarts, and additional reliability features like HaKafka and monitoring tools.

Database Engineering · HaMergeTree · Metadata Persistence
10 min read
Selected Java Interview Questions
Oct 23, 2022 · Big Data

Building a Cost‑Effective Data Analysis Platform: ClickHouse vs Elasticsearch and Deployment Guide for ZooKeeper, Kafka, Filebeat, and ClickHouse

This article compares Elasticsearch and ClickHouse for log analytics, presents cost‑benefit calculations, and provides a step‑by‑step deployment guide for ZooKeeper, Kafka, Filebeat, and ClickHouse to build a scalable, low‑cost data analysis platform for SaaS services.

Big Data · Deployment · Elasticsearch
12 min read
ITPUB
Oct 21, 2022 · Databases

How We Replaced Elasticsearch with ClickHouse for High‑Performance Log Storage

Facing rapid growth, our team evaluated ClickHouse’s hot‑cold storage and tiered‑disk policies to replace Elasticsearch, designing partitioning, TTL, and multi‑level storage strategies—including hot, cold, and archive disks, custom storage policies, and OSS integration—to achieve higher write throughput, better compression, and over 50% cost reduction.

Cold Hot Separation · Cost Optimization · TTL
22 min read
dbaplus Community
Oct 19, 2022 · Backend Development

How Trace2.0 Cuts Tracing Costs by 66% with Tail Sampling and ClickHouse

This article details the design of Trace2.0, a next‑generation distributed tracing platform built on OpenTelemetry, covering its end‑to‑end architecture, tail sampling with hot‑cold storage, Bloom‑filter implementation, and a self‑built ClickHouse storage layer that reduces storage costs by two‑thirds while improving query performance.

Backend Architecture · OpenTelemetry · clickhouse
14 min read
DaTaobao Tech
Oct 19, 2022 · Databases

Overview of LSM‑Tree Architecture and Its Use in Modern Databases

LSM‑Tree stores writes in an in‑memory MemTable then flushes ordered SSTables to disk, using Bloom filters and indexes to speed reads, while periodic compactions merge files; modern systems such as LevelDB, HBase, and ClickHouse adopt this design to achieve high write throughput despite slower point and range queries and occasional compaction overhead.
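
The write and read path summarized above can be illustrated with a toy model. This is a minimal sketch, not code from the article: the class name `TinyLSM` and the `memtable_limit` parameter are hypothetical, SSTables are kept as in-memory sorted lists rather than disk files, and Bloom filters and indexes are left out for brevity.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: an in-memory MemTable plus sorted, immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # most recent writes
        self.sstables = []        # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        """Writes only touch the MemTable; a full MemTable is flushed."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the MemTable into one sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the MemTable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:   # oldest first, so newer values overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []
```

A read may have to probe several SSTables before it finds a key, which is why point lookups cost more than writes in this design and why compaction is needed to keep the number of tables small.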

HBase · LSM‑Tree · LevelDB
11 min read
dbaplus Community
Oct 18, 2022 · Databases

Mastering ClickHouse: Practical Table Engine Choices and Cluster Best Practices

This guide explains ClickHouse’s core concepts, application scenarios, table engine families, detailed engine configurations, SQL development standards, cluster architecture, ZooKeeper’s role, chproxy usage, client tool options, availability considerations, and performance‑tuning parameters for high‑throughput OLAP workloads.

Cluster · Table Engine · clickhouse
26 min read
dbaplus Community
Oct 11, 2022 · Big Data

How We Replaced Elasticsearch with ClickHouse for Faster, Cheaper Log Storage

Facing growing log volumes and compliance needs, we evaluated ClickHouse’s hot‑cold‑archive storage to replace Elasticsearch, detailing configuration of storage policies, partitioning strategies, table creation, TTL handling, and cost‑effective OSS integration, ultimately achieving higher write performance and over 50% storage cost reduction.

Big Data · Cold Hot Architecture · OSS
22 min read
DataFunTalk
Oct 11, 2022 · Databases

Enhancing ClickHouse Multi‑Table Join Capability with ByteHouse

This article explains the limitations of ClickHouse for multi‑table joins, describes ByteHouse’s staged execution model, various join strategies (Shuffle, Broadcast, Colocate) and runtime filters, and presents performance benchmarks that show significant speed‑ups over the original ClickHouse engine.

ByteHouse · Database Optimization · Multi-Table Join
10 min read
DataFunSummit
Oct 7, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Performance Enhancements

This article explains how ByteHouse (a heavily optimized ClickHouse variant) tackles complex query challenges by introducing a multi‑stage execution model, exchange mechanisms, runtime filters, and network optimizations, and it presents performance results and future directions for large‑scale OLAP workloads.

ByteHouse · Database Optimization · Distributed Query
21 min read
Bilibili Tech
Sep 30, 2022 · Big Data

From BitMap to RoaringBitmap: Principles, Performance, and Big Data Applications

RoaringBitmap improves on the traditional BitMap by lazily allocating four container types, compressing sparse data, and dynamically switching between array, bitmap, and run containers. This enables fast, exact set operations that power big‑data systems such as Kylin, ClickHouse, and Bilibili’s user‑visit and crowd‑package pipelines, dramatically reducing memory use and processing latency.
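
The container switching mentioned above can be sketched as a size comparison per 16-bit chunk. This is an illustration under stated assumptions, not the article's or any library's code: the function names are hypothetical, and the size formulas (2 bytes per array entry, a fixed 8192-byte bitmap, 2 + 4 bytes per run, arrays capped at 4096 entries) follow the commonly documented Roaring layout as recalled here.

```python
ARRAY_MAX = 4096  # assumed cap on array-container cardinality


def split(value):
    """Split a 32-bit integer into (high 16 bits, low 16 bits)."""
    return value >> 16, value & 0xFFFF


def container_sizes(low_values):
    """Estimate the size in bytes of each container type for one chunk."""
    values = sorted(set(low_values))
    # A new run starts wherever a value does not directly follow its predecessor.
    runs = sum(1 for i, v in enumerate(values) if i == 0 or v != values[i - 1] + 1)
    return {
        "array": 2 * len(values),  # one 16-bit entry per value
        "bitmap": 8192,            # fixed 65536 bits
        "run": 2 + 4 * runs,       # run count plus (start, length) pairs
    }


def choose_container(low_values):
    """Pick the smallest container type that is allowed for this chunk."""
    sizes = container_sizes(low_values)
    if len(set(low_values)) > ARRAY_MAX:
        del sizes["array"]         # too dense for an array container
    return min(sizes, key=sizes.get)
```

Under these assumptions a sparse chunk stays an array, a long consecutive range becomes a run container, and a dense but scattered chunk falls back to the fixed-size bitmap.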

Big Data · Bitmap Compression · Data Structures
16 min read
dbaplus Community
Sep 26, 2022 · Backend Development

How Ctrip Replaced HBase with VictoriaMetrics & ClickHouse for Scalable Metrics Monitoring

Ctrip’s internal Dashboard monitoring platform, originally built on HBase, was redesigned by migrating its core writer and storage components to a hybrid VictoriaMetrics‑ClickHouse solution, delivering faster queries, higher write stability, and full Prometheus compatibility while keeping the user experience unchanged.

Dashboard · HBase · Metrics
13 min read
DataFunSummit
Sep 24, 2022 · Big Data

Evolution of 37 Mobile Games' Multi-Dimensional Analysis Platform: From MySQL to StarRocks

The article details how 37 Mobile Games built and continuously evolved a multi-dimensional analytics platform—covering business background, data challenges, the migration from MySQL through Druid, Impala, ClickHouse to StarRocks, self‑service data tools, monitoring, and future roadmap—highlighting technical decisions and lessons learned.

Impala · OLAP · StarRocks
20 min read
Big Data Technology Architecture
Sep 17, 2022 · Databases

Design and Optimization of Bilibili Log Service 2.0 Using ClickHouse and OpenTelemetry

This article describes how Bilibili redesigned its log service by replacing Elasticsearch with ClickHouse, introducing OpenTelemetry‑based logging, optimizing storage, query, and alerting components, and enhancing ClickHouse features such as configuration tuning, Map types, and implicit columns to achieve higher performance, lower cost, and better observability.

Database Optimization · OpenTelemetry · clickhouse
28 min read
ITPUB
Sep 16, 2022 · Big Data

How Bilibili Re‑engineered Its Log Service with ClickHouse and OpenTelemetry for 10× Performance

Bilibili redesigned its five‑year‑old ELK‑based log platform by replacing Elasticsearch with ClickHouse, adopting OpenTelemetry for unified log ingestion, and building a custom visualization and alerting system, achieving tenfold write throughput, one‑third storage cost, and dramatically faster query response times.

OpenTelemetry · clickhouse · log infrastructure
28 min read
Bilibili Tech
Sep 16, 2022 · Big Data

Design and Optimization of Bilibili Log Service 2.0 Using ClickHouse and OpenTelemetry

Bilibili’s Log Service 2.0 replaces its Elastic‑Stack pipeline with an OpenTelemetry‑driven architecture that writes logs via high‑performance Go/Java SDKs to ClickHouse, delivering ten‑fold write throughput, two‑fold query speed, one‑third storage cost, a custom query gateway, visualization UI, and advanced alerting.

OpenTelemetry · clickhouse · log infrastructure
27 min read
Aikesheng Open Source Community
Sep 13, 2022 · Databases

Using ClickHouse Materialized Views: Creation, Testing, and Time‑Zone Issue Resolution

This article explains how to create a ClickHouse materialized view that aggregates per‑minute data from a per‑second table, demonstrates insertion and query tests, investigates an unexpected 1970‑01‑01 timestamp caused by time‑zone handling, and provides the corrected view definition aligning field names.

Time Zone · clickhouse · database
8 min read
ITPUB
Sep 12, 2022 · Databases

How ByteHouse Transforms ClickHouse for Complex Queries: Multi‑Stage Execution and Real‑World Optimizations

This article explains how ByteHouse, a heavily optimized fork of ClickHouse, introduces a multi‑stage execution model, advanced exchange mechanisms, and runtime filters to overcome the limitations of the original two‑stage query flow, delivering significant performance gains for complex joins, aggregations, and large‑scale analytics workloads.

ByteHouse · Database Engineering · Distributed Query
22 min read
Selected Java Interview Questions
Sep 9, 2022 · Databases

Performance Testing and Optimization of ClickHouse and Elasticsearch for High-Concurrency Scenarios

This technical report details the requirement analysis, environment setup, monitoring tools, load‑test scripts, data design, execution results, and optimization recommendations for stress‑testing ClickHouse and Elasticsearch to ensure they can handle high‑concurrency business peaks.

Big Data · Database Optimization · Elasticsearch
11 min read
DataFunTalk
Sep 5, 2022 · Databases

Optimizing Complex Queries in ClickHouse: Multi‑Stage Execution, Exchange Management, and Runtime Filters

This article explains how ByteHouse, a heavily optimized ClickHouse variant, addresses complex query challenges by introducing a multi‑stage execution model, sophisticated exchange management, various join strategies, runtime filters, and diagnostic metrics to improve performance, scalability, and resource utilization in large‑scale data environments.

ByteHouse · Exchange Manager · Runtime Filter
21 min read
DeWu Technology
Sep 2, 2022 · Operations

Design and Implementation of Trace2.0 Distributed Tracing Platform

Trace2.0 is an OpenTelemetry‑based distributed tracing platform that collects billions of spans daily, routes data through a control plane, OTel Server, and Kafka to ClickHouse hot‑cold storage with tail sampling, achieving 66% cost reduction, 12× compression, sub‑second query latency, and plans to offload raw spans to object storage.

Backend Architecture · Distributed Tracing · OpenTelemetry
12 min read
DataFunTalk
Sep 1, 2022 · Big Data

Evolution and Construction of Huolala's OLAP System Based on Doris

This presentation details Huolala's journey from its initial OLAP architecture to a multi‑engine platform, describing background, data‑flow layers, technical research, engine selection (Druid, ClickHouse, Doris), POC validation, performance tuning, stability measures, production rollout, problem analysis, and future roadmap.

Druid · Huolala · OLAP
17 min read
Selected Java Interview Questions
Aug 27, 2022 · Backend Development

Deploying a Cost‑Effective ClickHouse‑Based Backend Data Platform: Comparison with Elasticsearch and Step‑by‑Step Setup Guide

This article compares Elasticsearch and ClickHouse for log analytics, presents cost analysis, and provides detailed deployment instructions for ZooKeeper, Kafka, Filebeat, and ClickHouse to build a private, high‑performance backend data platform for SaaS services.

Elasticsearch · Filebeat · Kafka
12 min read
ByteDance Data Platform
Aug 22, 2022 · Databases

How ByteHouse Supercharges ClickHouse with Upsert, Joins, and High Availability

ByteHouse, built on ClickHouse, addresses key limitations such as missing upsert/delete, weak multi‑table joins, scalability issues, and lack of resource isolation by introducing a modular, stage‑based execution engine, advanced join strategies, runtime filters, and a custom optimizer, delivering dramatically faster query performance.

ByteHouse · Database Optimization · Multi-Table Join
11 min read
IT Architects Alliance
Aug 13, 2022 · Operations

Why ClickHouse Beats Elasticsearch: Performance, Cost, and Deployment Guide

This article compares ClickHouse and Elasticsearch, analyzes cost savings, and provides step‑by‑step deployment instructions for ZooKeeper, Kafka, Filebeat, and ClickHouse clusters, including configuration details, troubleshooting tips, and practical code snippets for building a scalable analytics pipeline.

Deployment · Elasticsearch · Filebeat
13 min read
37 Interactive Technology Team
Aug 8, 2022 · Backend Development

Time Management in Programming: Concepts, Practices, and Common Pitfalls

Time management in programming spans human concepts of time, language-specific handling of zones and timestamps, 32‑bit overflow risks, sync versus async processing, log timestamping, business‑level period calculations, and common pitfalls, emphasizing that mastering these nuances prevents bugs, improves performance, and enables reliable analytics.

ETL · PHP · asynchronous processing
20 min read
DataFunTalk
Jul 28, 2022 · Databases

ClickHouse Overview and the Top 5 Features Released in 2021

This article provides a comprehensive overview of ClickHouse, covering its origins, core characteristics, and the five most important features introduced in 2021—including JIT acceleration, Lambda‑based UDFs, native window functions, zero‑copy replication for S3/HDFS, and the Projection mechanism—highlighting why it remains a leading high‑performance OLAP database for big‑data analytics.

OLAP · Projection · clickhouse
15 min read
DataFunTalk
Jul 26, 2022 · Big Data

Feature Platform Architecture and Stream‑Batch Integrated Solutions

This talk presents Shuhe Technology’s feature platform, detailing its four‑layer architecture, feature storage services, stream‑batch integrated processing, event‑center design, consistency models, and four model‑strategy invocation schemes, illustrating data flows from MySQL through Sqoop, Kafka, Flink, HBase and ClickHouse.

Big Data · Flink · HBase
17 min read
Bilibili Tech
Jul 23, 2022 · Backend Development

API Gateway Evolution and Engineering Practices; Applying ClickHouse for Massive Data Processing

The talk traces the evolution of API Gateway architectures and the engineering practices—design patterns, deployment strategies, and operational considerations—required for scalable, reliable services, then demonstrates how ClickHouse can be leveraged for massive data workloads, highlighting practical scenarios, performance optimizations, and key lessons learned.

Big Data · Engineering · api-gateway
1 min read
Bilibili Tech
Jul 22, 2022 · Backend Development

GIAC Global Internet Architecture Conference: API Gateway and ClickHouse Practices

Senior Bilibili infrastructure engineers Chen Zhihui and Hu Fuwang will present at the GIAC Global Internet Architecture Conference on the 22nd‑23rd, discussing the evolution and engineering of API Gateways and their practical use of ClickHouse for large‑scale data analytics, inviting active participation and exchange.

Bilibili · api-gateway · clickhouse
1 min read
dbaplus Community
Jul 19, 2022 · Cloud Native

How to Build a Scalable Kubernetes Log System with ClickHouse and Fluent‑Bit

This article explains why Stone Docs switched from SLS/ES to ClickHouse for Kubernetes log storage, outlines the four‑stage architecture (collection, transmission, storage, management), compares DaemonSet, network, and SideCar collection methods, and provides concrete ClickHouse table definitions and Fluent‑Bit configurations for a production‑grade logging pipeline.

Cloud Native · Fluent Bit · Kafka
14 min read
Youzan Coder
Jul 7, 2022 · Big Data

Optimizing Apache Doris Performance: A Case Study in Query Processing

Youzan replaced ClickHouse and Druid with Apache Doris, refined its vectorized engine by eliminating deserialization overhead in the merge‑aggregation phase, achieving roughly a 30 % query‑time boost, and validated compatibility through SQL rewriting and traffic replay, while planning further SIMD‑based optimizations and broader adoption.

Apache Doris · Druid · OLAP
8 min read
DataFunTalk
Jul 6, 2022 · Databases

From ClickHouse to ByteHouse: Technical Optimizations and Production Practices

The whitepaper “From ClickHouse to ByteHouse” details ByteDance’s large‑scale deployment of ClickHouse, the challenges of moving it to production, and the key optimizations ByteHouse introduces—including custom table engines, a revamped query optimizer, and elastic compute‑storage separation—to achieve petabyte‑level OLAP performance.

Analytical Databases · ByteHouse · OLAP
6 min read
Big Data Technology & Architecture
Jul 1, 2022 · Big Data

Curated List of Big Data Resources: ClickHouse, Apache Doris, and Apache Hudi

This article compiles a comprehensive set of Chinese-language resources covering major big-data technologies such as ClickHouse, Apache Doris, and Apache Hudi, including series on distributed tables, MergeTree, replication, optimization techniques, and practical tutorials, with direct links to each detailed guide.

Apache Doris · Apache Hudi · Big Data
6 min read
dbaplus Community
Jun 14, 2022 · Big Data

How Qunar Built a Scalable BI Platform for Real‑Time Analytics and Self‑Service Reporting

This article details Qunar's multi‑year journey of designing and evolving a full‑stack BI platform—covering data ingestion, storage, query engines, self‑service analytics, and real‑time OLAP—by iterating through three development phases, selecting technologies such as Impala, Kudu, ClickHouse and Apache Druid, and addressing performance, usability and governance challenges to empower business users with fast, reliable data insights.

Apache Druid · BI · Big Data
24 min read
Big Data Technology Architecture
Jun 14, 2022 · Big Data

Applying Apache DolphinScheduler in a Big Data Platform: Architecture, Migration, and Future Plans

This presentation details the background, redesign, and migration of a large‑scale data platform at Dangbei Network Technology, focusing on the adoption of Apache DolphinScheduler, ClickHouse migration, storage and compute separation, monitoring solutions, and the roadmap for future upgrades and open‑source involvement.

Apache DolphinScheduler · HA · Platform Migration
12 min read
Top Architect
Jun 6, 2022 · Big Data

Optimizing Large‑Scale Data Pagination with ClickHouse, Elasticsearch, HBase, and Redis

This article analyzes several optimization strategies, including multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑HBase hybrid approach, and RediSearch + RedisJSON, for efficiently filtering and sorting up to 100,000 records from a pool of tens of millions while reducing query latency and system complexity.

HBase · clickhouse · pagination
11 min read
Snowball Engineer Team
Jun 6, 2022 · Databases

Deep Dive into ClickHouse Join Implementation and Optimization Techniques

This article examines ClickHouse's join mechanisms, detailing the limitations of standard joins, the advantages of Global joins, and optimization strategies such as hash and merge joins, subquery filtering, and memory considerations, illustrated with SQL examples and source‑code analysis.

Distributed Systems · Hash Join · JOIN optimization
16 min read
IT Architects Alliance
Jun 5, 2022 · Big Data

Optimizing 100K‑Record Queries from Tens of Millions: CK, ES, HBase & Redis Strategies

This article examines a real‑world requirement to extract no more than 100 000 rows from a pool of tens of millions, comparing multithreaded ClickHouse pagination, Elasticsearch scroll‑scan deep paging, an ES‑HBase hybrid query, and a RediSearch‑RedisJSON approach, and presents performance measurements and practical conclusions.

Elasticsearch · HBase · LargeScaleQuery
0 likes · 12 min read
DataFunSummit
Jun 3, 2022 · Big Data

Building and Optimizing JD Retail OLAP Platform: Architecture, Management, and Performance Techniques

This article details JD Retail's OLAP platform construction, covering control plane design, architecture, business and operation management, real‑time data updates, materialized view usage, join optimizations, high‑concurrency and high‑throughput scenarios, and promotional preparation strategies, illustrated with diagrams and performance metrics.

Big Data · Distributed Systems · OLAP
0 likes · 20 min read
ByteDance Data Platform
May 30, 2022 · Databases

How UniqueMergeTree Boosts Real-Time Updates in ClickHouse Column Stores

UniqueMergeTree, a new ClickHouse table engine, addresses real‑time data update challenges by combining upsert semantics, unique key enforcement, and efficient delete‑bitmap handling, offering higher query performance at modest write cost, with detailed design, sharding strategies, conflict resolution, and performance evaluation.

Columnar Storage · Database Engine · Real-time Updates
0 likes · 14 min read
dbaplus Community
May 24, 2022 · Big Data

How Vipshop Replaced ELK with ClickHouse for a Scalable, Low‑Cost Log System

Vipshop’s Dragonfly log platform evolved from a costly 260‑node Elasticsearch cluster to a ClickHouse‑based architecture that uses a unified JSON format, vfilebeat ingestion, Flink parsing, and MergeTree storage to achieve high‑throughput writes, fast vectorized queries, flexible TTL management, and dramatically lower operational expenses.

EFK · Flink · Kafka
0 likes · 20 min read
Efficient Ops
May 24, 2022 · Cloud Native

How AutoTagging and MultistageCodec Transform Cloud‑Native Observability

This article explores the challenges of building a unified observability data platform for hybrid‑cloud microservices, examines six common data‑island scenarios, and presents DeepFlow's AutoTagging and MultistageCodec techniques that dramatically reduce tagging overhead and storage costs while enabling seamless cross‑data correlation.

Microservices · auto-tagging · clickhouse
0 likes · 11 min read
Big Data Technology & Architecture
May 18, 2022 · Databases

Understanding ClickHouse Distributed JOIN Implementation and Best Practices

This article explains ClickHouse's single‑node and distributed JOIN mechanisms, compares ordinary, GLOBAL, Broadcast, Shuffle and Colocate JOINs, illustrates execution flows with code examples, and provides practical recommendations to reduce join size, avoid query amplification, and leverage data pre‑distribution for optimal performance.

Big Data · clickhouse · performance
0 likes · 10 min read
DataFunTalk
May 18, 2022 · Big Data

Building and Optimizing JD Retail OLAP Platform: Architecture, Real‑time Updates, Materialized Views, and Join Optimization

This article presents JD Retail's OLAP platform construction and practical scenarios, covering control‑plane design, architecture, business management, operational safeguards, real‑time data updates, materialized view acceleration, join optimization techniques, high‑concurrency queries, and large‑scale write throughput for e‑commerce peak periods.

Big Data · OLAP · clickhouse
0 likes · 21 min read
DataFunSummit
May 14, 2022 · Databases

Design of Cloud‑Native ClickHouse: Architecture, Storage‑Compute Separation, and MPP Query Layer

This article presents the cloud‑native redesign of ClickHouse, covering its current technical limitations in storage and computation, the proposed storage‑compute separation with DDL task management, multi‑replica and CommitLog mechanisms, and a new MPP query layer to meet future data‑warehouse demands such as real‑time analytics, flexibility, high throughput, low cost, and support for semi‑structured data.

Big Data · Cloud Native · Distributed Query
0 likes · 15 min read
vivo Internet Technology
Apr 27, 2022 · Big Data

ClickHouse Funnel Analysis Model Practice - User Behavior Analysis Series (Part 2)

The second article in the user‑behavior series explains ClickHouse‑based funnel analysis, covering unordered and ordered models, configuration, computation, and storage phases, key ClickHouse functions such as windowFunnel and array utilities, detailed SQL examples, and optimization strategies for real‑time, low‑cost querying.

ClickHouse functions · Data Conversion · Funnel Analysis
0 likes · 17 min read
Big Data Technology & Architecture
Apr 16, 2022 · Databases

ClickHouse Practical Guide: Engine Selection, Cluster Architecture, and Operational Best Practices

This article provides a comprehensive overview of ClickHouse, covering its core use cases, detailed explanations of the various table engines, recommended schema and deployment patterns, performance‑tuning parameters, tooling choices, and operational guidelines for building and maintaining high‑availability OLAP clusters.

Cluster Architecture · OLAP · Table Engines
0 likes · 24 min read
ByteDance Data Platform
Apr 15, 2022 · Cloud Native

How ByteHouse Evolved From ClickHouse Into a Next‑Gen Cloud‑Native Data Warehouse

ByteHouse, born from ByteDance’s extensive use of ClickHouse, transformed a high‑performance OLAP engine into a cloud‑native, scalable data warehouse by addressing scalability, elasticity, high availability, and multi‑tenant challenges through architectural redesign, custom storage layers, and advanced metadata management.

Big Data · ByteHouse · Scalability
0 likes · 19 min read
Volcano Engine Developer Services
Apr 14, 2022 · Databases

How ByteHouse Transformed ClickHouse into a Cloud‑Native Data Warehouse

This article explores ByteHouse’s evolution from ClickHouse within ByteDance, detailing the challenges of scaling to over 18,000 nodes, the architectural redesign for cloud‑native elasticity, high‑availability innovations, and the product’s roadmap toward a Snowflake‑like, multi‑tenant data warehouse solution.

ByteHouse · Database Engineering · clickhouse
0 likes · 18 min read
Cloud Native Technology Community
Apr 13, 2022 · Big Data

Introduction to ClickHouse: Features, Architecture, Installation, Data Types, and Cluster Deployment

This article provides a comprehensive overview of ClickHouse, an open‑source column‑oriented MPP analytical database, covering its advantages and drawbacks, key features, typical use cases, data access flow, installation steps, core directories, indexes, data types, database and table engines, as well as detailed cluster architecture and deployment patterns.

Big Data · Cluster · Data Types
0 likes · 29 min read
DataFunTalk
Apr 13, 2022 · Databases

Adopting StarRocks for Real‑Time Analytics in ZhongAn’s JiZhi Platform: A Performance Comparison with ClickHouse

This article describes how ZhongAn Insurance’s JiZhi data‑analysis platform migrated from ClickHouse to the MPP OLAP engine StarRocks, detailing the business requirements, architectural challenges, benchmark results across single‑table and multi‑table queries, and the resulting improvements in latency, concurrency, and operational simplicity for real‑time analytics.

Big Data · OLAP · Performance Testing
0 likes · 14 min read
Big Data Technology & Architecture
Apr 11, 2022 · Big Data

Real-Time Data Warehouse Construction: Background, Objectives, Architecture, and Case Studies

This article explains the growing demand for real‑time data warehouses, outlines their objectives and layered architecture, and presents detailed case studies from Didi, Kuaishou, Tencent, Youzan and others, illustrating design choices, implementation challenges, and best practices for building scalable streaming data platforms.

Flink · Kafka · big-data
0 likes · 48 min read
ITPUB
Apr 8, 2022 · Big Data

How to Build a Billion-Scale Real-Time Data Warehouse with ClickHouse

This article explains how a large‑scale advertising platform replaced its slow offline data‑warehouse with a ClickHouse‑based real‑time warehouse, covering data source integration, performance comparison, materialized views, projections, schema management, and cost‑effective hot‑cold storage strategies.

Kafka Integration · Materialized Views · Projections
0 likes · 19 min read
JD Tech
Apr 8, 2022 · Big Data

Designing a High‑Throughput Log Collection System with ClickHouse, UDP and Compression

The article analyses the massive cost and performance challenges of traditional log‑storage pipelines at JD.com, then proposes a streamlined architecture that eliminates disk and MQ stages, uses in‑memory buffering, UDP transport, Snappy/ZSTD compression, and ClickHouse storage to achieve multi‑gigabyte per‑second throughput with over 70% hardware cost reduction.

Distributed Systems · High Throughput · clickhouse
0 likes · 15 min read
StarRocks
Apr 7, 2022 · Databases

How StarRocks Outperformed ClickHouse in Real‑Time Insurance Data Analytics

This article presents a technical case study of ZhongAn's Jizhi analytics platform, detailing how switching from ClickHouse to the MPP OLAP engine StarRocks resolved multi‑concurrency and join performance bottlenecks, improved real‑time query speed, and enabled near‑billion‑row data handling for insurance business operations.

Insurance Technology · OLAP · StarRocks
0 likes · 17 min read
Big Data Technology & Architecture
Mar 30, 2022 · Databases

Understanding ClickHouse AggregatingMergeTree, AggregateFunction, and Materialized Views

This article explains how ClickHouse's AggregatingMergeTree engine uses the special AggregateFunction data type to pre‑aggregate data, demonstrates table creation, data insertion, and querying with state and merge functions, and shows how to combine it with materialized views for efficient analytics.

AggregateFunction · AggregatingMergeTree · MaterializedView
0 likes · 8 min read
Big Data Technology & Architecture
Mar 23, 2022 · Databases

ClickHouse SQL Fundamentals: CREATE, TABLE, Views, ALTER, Partitioning, Import/Export, and Mutation Operations

This article provides a comprehensive guide to ClickHouse SQL, covering database creation, table definitions, column defaults, temporary and partitioned tables, view types, DDL commands, data import/export formats, and mutation‑based update and delete operations with practical code examples.

Mutation · Partitioning · Views
0 likes · 17 min read
DeWu Technology
Mar 21, 2022 · Big Data

Real-time Customer Service Dashboard: Architecture and Implementation with Flink and ClickHouse

The article describes a real‑time customer‑service dashboard built on Flink for streaming MySQL changes captured via Kafka, which cleans and aggregates ~60 operational metrics before writing them to ClickHouse’s MergeTree/ReplacingMergeTree tables, enabling sub‑second queries and exactly‑once guarantees while separating offline and live pipelines.

Dashboard · Flink · clickhouse
0 likes · 18 min read
DataFunTalk
Mar 15, 2022 · Big Data

Bilibili's Billion‑Scale Data Synchronization Using Apache SeaTunnel

This article details Bilibili's implementation of a hundred‑terabyte‑per‑day data synchronization pipeline, covering tool selection between DataX‑based Rider and SeaTunnel‑based AlterEgo, architecture design, performance tuning, logging optimization, rate‑limiting strategies, and comprehensive monitoring for large‑scale offline data ingestion and export.

Apache SeaTunnel · Big Data · TiDB
0 likes · 13 min read
Aikesheng Open Source Community
Mar 14, 2022 · Databases

Understanding ClickHouse-Keeper: Features, Configuration, Commands, and Migration from ZooKeeper

ClickHouse‑Keeper, a C++ ZooKeeper replacement built on the Raft algorithm, offers linearizable reads, compression, and easier deployment; this article explains its advantages, configuration template, startup command, parameter details, health checks, and step‑by‑step migration from ZooKeeper using the clickhouse-keeper-converter tool.

Configuration · Keeper · Raft
0 likes · 6 min read
StarRocks
Mar 10, 2022 · Databases

StarRocks 2.0 vs ClickHouse: Benchmark Shows Up to 7× Speed Boost

Community testing of StarRocks 2.0 revealed that, across multiple benchmarks—including low‑cardinality queries, SSB workloads, and high‑concurrency scenarios—StarRocks consistently outperformed ClickHouse and Druid, delivering performance gains ranging from 2‑3× to over 7×.

SSB · StarRocks · benchmark
0 likes · 6 min read
Big Data Technology & Architecture
Mar 10, 2022 · Databases

Understanding ClickHouse Replication Mechanism

This article explains the ClickHouse replication mechanism, covering the Replicated*MergeTree engine family, table‑level operation, the ZooKeeper dependency, data synchronization, insert quorum, and data consistency guarantees, providing practical guidance for configuring and using replicated MergeTree tables.

MergeTree · Replication · clickhouse
0 likes · 7 min read
Yiche Technology
Mar 9, 2022 · Cloud Native

Design and Implementation of the Yunji Logging System Using Flink and ClickHouse

The article presents the Yunji logging system, a Flink+ClickHouse-based cloud-native platform for real-time ingestion, storage, querying, analysis, and monitoring of massive heterogeneous logs, covering its architecture, configuration center, storage design, processing flow, monitoring features, and future enhancements.

Cloud Native · Flink · Janino
0 likes · 21 min read