Tagged articles
468 articles
DataFunTalk
May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink
0 likes · 22 min read
DataFunTalk
May 6, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's four‑stage data‑platform evolution—from a simple ClickHouse ad‑hoc setup to a Lambda‑based 2.0 design and finally a lakehouse‑driven 3.0 architecture—highlighting the adoption of general incremental compute, cost‑reduction to one‑third, performance gains of up to ten‑fold, and the SPOT standards that guide the new system.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunTalk
Apr 29, 2026 · Big Data

How Xiaohongshu Revamped Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based analytics stack to a unified lakehouse with generic incremental compute, cutting architecture complexity, resource cost, and development effort to roughly one‑third while supporting petabyte‑scale, sub‑second queries across its 350‑million‑user app.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
Baidu Geek Talk
Mar 23, 2026 · Databases

How Baidu’s MEG Platform Revamped ClickHouse with a Lakehouse Architecture

This article analyzes the challenges of scaling ClickHouse within Baidu’s MEG data platform and details a lake‑house solution that decouples storage and compute, integrates a meta‑service for transparent data access, optimizes query performance through caching, data roll‑up and layout tuning, and introduces a unified query gateway that gracefully falls back to Spark for complex workloads.

ClickHouse · Data Platform · Lakehouse
0 likes · 25 min read
Tech Freedom Circle
Mar 17, 2026 · Databases

Why HyperLogLog Misses 100M Daily Active Users and How Bitmap Solves It

The article dissects an Alibaba interview question on counting 100 million daily active users, showing why HyperLogLog’s error and lack of per‑user state make it unsuitable, and presents a detailed Bitmap‑based architecture—including sharding, pre‑computation, and ClickHouse integration—to achieve precise, high‑performance analytics.

Bitmap · ClickHouse · DailyActiveUsers
0 likes · 16 min read
DeWu Technology
Feb 9, 2026 · Big Data

How to Build a Production‑Ready Flink ClickHouse Sink with Dynamic Sharding, Batch‑by‑Size, and Robust Retry

This article presents a production‑grade Flink ClickHouse sink that solves common pain points such as lack of size‑based batching, static table schemas, and distributed‑table latency by introducing data‑size batching, dynamic table routing, local‑table writes, load‑balanced node discovery, back‑pressure queues, dual‑trigger flush, and recursive retry with node exclusion, all integrated with Flink checkpoint semantics for at‑least‑once guarantees.

Batching · Checkpoint · ClickHouse
0 likes · 25 min read
ITPUB
Feb 9, 2026 · Databases

ClickHouse vs Doris vs Redis: Real‑World Query Performance Test with Flink

Using a 600k‑record IP range dataset, we built identical tables in ClickHouse and Doris, and a Redis skip‑list store, then ran three Flink‑Kafka streaming jobs to compare query latency across the three databases under varying traffic rates, revealing Redis as fastest, ClickHouse second, Doris slowest.

ClickHouse · Database Performance · Flink
0 likes · 8 min read
ITPUB
Jan 15, 2026 · Databases

How to Migrate ClickHouse Data to Doris: Three Practical Strategies Tested

Facing a ClickHouse cluster shutdown, the author explores three migration methods—using Doris’s ClickHouse catalog, exporting to files with Broker/Stream Load, and Spark—to transfer ~10 billion rows to Doris, evaluating each for simplicity, bugs, and performance, and sharing detailed steps, code snippets, and benchmark results.

ClickHouse · Data Migration · SQL
0 likes · 9 min read
Xiao Liu Lab
Dec 30, 2025 · Databases

How to Diagnose and Fix ClickHouse CPU Spikes in Minutes

This guide walks you through a step‑by‑step process for quickly identifying the cause of high CPU usage in ClickHouse, from emergency triage and precise diagnosis using system tables to practical optimization techniques and a ready‑to‑run monitoring script.

CPU · ClickHouse · SQL
0 likes · 21 min read
ITPUB
Dec 26, 2025 · Databases

How to Migrate 100 Billion ClickHouse Rows to Doris: Three Practical Strategies

When a ClickHouse cluster needed to be decommissioned, the author evaluated three migration approaches—using Doris' ClickHouse catalog, exporting to files with Broker/Stream Load, and leveraging Spark—to move roughly 100 billion rows to Doris, comparing their complexity, reliability, and performance.

Catalog · ClickHouse · SQL
0 likes · 9 min read
dbaplus Community
Dec 8, 2025 · Databases

Which Database Wins IP Range Lookups? ClickHouse vs Doris vs Redis Benchmarks

This article presents a systematic benchmark comparing ClickHouse, Doris, and Redis for IP‑range dimension lookups using Flink‑Kafka pipelines, detailing test design, result table schema, query interfaces, and performance results across varying data rates, concluding that Redis offers the fastest and most stable query latency.

ClickHouse · Database Benchmark · Flink
0 likes · 7 min read
Data STUDIO
Dec 5, 2025 · Big Data

Why Parquet Is the Default Choice for Big Data Storage

The article explains how Apache Parquet’s columnar layout, multi‑level row‑group structure, projection and predicate push‑down, and advanced compression and encoding make it the high‑performance, space‑efficient storage format that powers modern big‑data ecosystems and tools like Spark, Python pandas, and ClickHouse.

Big Data · ClickHouse · Columnar Storage
0 likes · 11 min read
Code Ape Tech Column
Dec 5, 2025 · Big Data

Optimizing 100K Record Retrieval from 10M‑Row Pools: ClickHouse, ES Scroll, ES+HBase, RediSearch

This article examines several engineering solutions for extracting up to 100,000 records from a ten‑million‑row pool, comparing multi‑threaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑plus‑HBase hybrid, and RediSearch + RedisJSON, and presents performance measurements and practical trade‑offs.

Big Data · ClickHouse · Elasticsearch
0 likes · 12 min read
Ray's Galactic Tech
Nov 28, 2025 · Operations

How to Optimize Log Storage: From Centralized to Hot‑Cold Separation

This article explains why modern micro‑service systems need log storage optimization and presents a hot‑cold separation strategy, detailing ELK, Loki, and Kafka + ClickHouse architectures, implementation steps, best practices, and a comparative analysis to guide cost‑effective, high‑performance log management.

ClickHouse · ELK · Loki
0 likes · 7 min read
Ctrip Technology
Nov 27, 2025 · Big Data

How Ctrip Cut Query Latency by 85% with StarRocks’ Compute‑Storage Separation

Ctrip migrated its massive User Behavior Tracking system from ClickHouse to a compute‑storage separated StarRocks cluster on Kubernetes, achieving millisecond‑level query latency, halving storage usage, reducing node count, and sustaining millions‑of‑rows‑per‑second write throughput while simplifying scaling and operations.

Big Data · ClickHouse · Compute-Storage Separation
0 likes · 15 min read
ITPUB
Nov 20, 2025 · Operations

What Triggered Cloudflare’s Massive November 2025 Outage? Inside the Bot Management Failure

On November 18, 2025, Cloudflare suffered a multi‑hour network outage that crippled major services worldwide. A ClickHouse permission change generated oversized bot‑management feature files, which led to 5xx errors across CDN, security, and authentication layers and prompted a complex, step‑by‑step remediation effort.

Bot Management · ClickHouse · Cloudflare
0 likes · 19 min read
DevOps Coach
Nov 13, 2025 · Databases

Explore ClickHouse 25.10: 20 JOIN Boosts, Vector Search & New SQL

ClickHouse 25.10 introduces a suite of enhancements—including 20 JOIN performance upgrades, lazy column replication, Bloom filter runtime filters, disjunction push‑down, automatic column statistics, the QBit vector type, expanded SQL operators, negative LIMIT/OFFSET, Arrow Flight support, and delayed secondary index materialization—backed by detailed benchmarks and contributor acknowledgments.

ClickHouse · JOIN optimization · SQL Extensions
0 likes · 23 min read
Radish, Keep Going!
Oct 28, 2025 · Big Data

How Netflix Achieved Petabyte-Scale, Sub-Second Log Queries with ClickHouse

Netflix processes over 5 PB of logs daily, handling millions of events per second, and by layering hot and cold storage, using a custom lexer for fingerprinting, native protocol serialization, and sharded tag maps, they reduced query latency from seconds to sub‑second levels with ClickHouse.

Big Data · ClickHouse · Distributed Systems
0 likes · 8 min read
StarRocks
Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration
0 likes · 15 min read
Big Data Tech Team
Oct 12, 2025 · Databases

Why ClickHouse Dominates OLAP: Features, Configurations, Table Engines and Real‑World Use Cases

This article provides an in‑depth technical overview of ClickHouse, covering its OLAP‑focused architecture, key performance features, detailed configuration files, a comprehensive comparison of its many table engines, common troubleshooting tips, and real‑world deployment patterns for recommendation and advertising systems.

ClickHouse · Database Configuration · Kafka engine
0 likes · 68 min read
JD Tech Talk
Sep 2, 2025 · Databases

Unlock ClickHouse’s Secret Weapons: The 9 Techniques Behind Lightning‑Fast Queries

This article explores ClickHouse’s high‑performance OLAP architecture, covering its MPP design, columnar storage, vectorized execution, pre‑sorting, table engines, data types, sharding and replication strategies, as well as index designs that together enable rapid analysis of massive datasets.

ClickHouse · Columnar Storage · Vectorized Execution
0 likes · 15 min read
JD Cloud Developers
Sep 2, 2025 · Databases

Unlocking ClickHouse’s Lightning‑Fast Queries: The ‘Nine Swords’ Architecture Explained

This article explores ClickHouse’s high‑performance OLAP design—including its MPP architecture, columnar storage, vectorized execution, pre‑sorting, sharding, replication, index strategies, and compute engine—showing how each innovation contributes to ultra‑fast, scalable data analysis in the big‑data era.

ClickHouse · Columnar Storage · OLAP
0 likes · 14 min read
Tech Freedom Circle
Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

ClickHouse · Columnar Storage · Distributed Query
0 likes · 29 min read
Kuaishou Tech
Jul 31, 2025 · Big Data

How Kuaishou Overcame the ‘Impossible Triangle’ of Performance, Flexibility, and Cost in Real‑Time Big Data Analytics

This article details how Kuaishou’s content middle platform tackled the massive challenges of real‑time, flexible, and cost‑effective data analysis at trillion‑scale by redesigning its architecture, adopting ClickHouse, splitting wide tables, and implementing a scatter‑gather execution model with pre‑shuffle and bitmap optimizations.

Big Data · ClickHouse · Performance Optimization
0 likes · 17 min read
DataFunSummit
Jul 18, 2025 · Databases

Boosting ClickHouse on WeChat: Performance Tools, Lakehouse Hacks & AI

This article explores how ClickHouse is deployed across WeChat for real‑time analytics, introduces a suite of performance‑monitoring tools, details lakehouse read and bitmap optimizations, and describes the integration of AI‑driven vector search, showcasing substantial speedups and scalability improvements.

AI · Big Data · ClickHouse
0 likes · 12 min read
JD Tech
May 13, 2025 · Databases

Unlock ClickHouse’s Lightning‑Fast Queries: Architecture, Storage, and Index Secrets

This article examines ClickHouse’s high‑performance OLAP design, covering its MPP architecture, columnar storage, vectorized execution, pre‑sorting, table engines, extensive data‑type system, sharding and replication strategies, as well as its sparse and skip‑index mechanisms that together enable ultra‑fast analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage
0 likes · 16 min read
JD Cloud Developers
Apr 21, 2025 · Databases

How ClickHouse Local Join Cuts Query Time and Memory Usage in Supply‑Chain Planning

This article explains how moving aggregation logic from in‑memory processing to ClickHouse SQL, synchronizing configuration data, and leveraging ClickHouse ReplacingMergeTree tables with local joins dramatically reduces query latency and memory consumption for large‑scale supply‑chain planning workloads.

ClickHouse · Database Engineering · Local Join
0 likes · 13 min read
dbaplus Community
Apr 20, 2025 · Databases

Why Wide Tables Fail and How to Design Them Efficiently

This article explains what wide tables are, why they are controversial, outlines three common design pitfalls with practical avoidance tips, and introduces three key technologies—ClickHouse, Cassandra, and Hudi/Iceberg—to help engineers build performant, maintainable wide‑table solutions in data warehouses.

Big Data · ClickHouse · Database design
0 likes · 7 min read
JD Retail Technology
Apr 8, 2025 · Databases

ClickHouse Architecture and Core Technologies Overview

ClickHouse is an open‑source, massively parallel, column‑oriented OLAP database that integrates its own columnar storage, vectorized batch processing, pre‑sorted data, diverse table engines, extensive data types, sharding with replication, sparse primary‑key and skip indexes, and a multithreaded query engine, delivering high‑throughput real‑time analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage
0 likes · 15 min read
Ops Development Stories
Mar 19, 2025 · Cloud Native

Unified Multi‑Cluster Monitoring with KubeDoor 1.0: Alerts, Metrics & Best Practices

KubeDoor 1.0 introduces a new architecture for unified multi‑Kubernetes monitoring, offering components for master and agent, flexible deployment options, Helm‑based installation, configurable storage and alerting settings, and detailed guidance on integrating with existing Prometheus/VictoriaMetrics setups while providing automatic peak‑usage data collection.

Alerting · ClickHouse · Cloud Native
0 likes · 14 min read
DataFunSummit
Mar 1, 2025 · Databases

Innovations and Breakthroughs of ClickHouse in Real‑Time OLAP

This article introduces ClickHouse as an open‑source column‑store OLAP database, outlines its core features, explains its distributed and cloud‑native architectures—including SharedMergeTree for serverless operation—presents benchmark results, compares community and enterprise editions, and answers common questions about its future direction.

ClickHouse · Cloud Native · Real-time OLAP
0 likes · 15 min read
StarRocks
Feb 27, 2025 · Big Data

How iQIYI Boosted Ad Query Performance 400% with StarRocks – A Deep Dive into OLAP Evolution

This article details iQIYI's transition from Impala+Kudu and ClickHouse to StarRocks, describing the OLAP architecture, performance gains of up to 400% in advertising workloads, the technical challenges of data consistency, lake‑warehouse fusion, operational scaling, and the step‑by‑step migration process using a dual‑run platform.

ClickHouse · Flink · OLAP
0 likes · 15 min read
Bilibili Tech
Feb 21, 2025 · Databases

Applying ClickHouse Bitmap and BSI Techniques for Real-Time Audience Selection in a Data Management Platform

By integrating ClickHouse bitmap structures, a dictionary service for dense ID mapping, and Bit‑Slice Indexes, Bilibili’s Data Management Platform now supports flexible, multi‑dimensional audience selection and profiling over petabyte‑scale data with minute‑level latency, cutting storage by over twenty‑fold and query times from hours to seconds.

BSI · Big Data · Bitmap
0 likes · 23 min read
dbaplus Community
Feb 3, 2025 · Databases

How to Diagnose and Fix Extreme ClickHouse Load Spikes in Production

A production ClickHouse cluster suddenly showed blacked‑out dashboards due to CPU load soaring above 2,700%, and this guide walks through step‑by‑step diagnostics using system tables, a simple query to spot heavy SQL, and practical remediation actions to restore normal load levels.

ClickHouse · Database Performance · SQL Optimization
0 likes · 7 min read
BirdNest Tech Talk
Jan 31, 2025 · Information Security

Building a Go TCP Scanner to Discover Unauthenticated ClickHouse Services

This article walks through creating a Go‑based TCP SYN scanner to locate public IPs with port 9000 open, verifies whether they run ClickHouse without authentication, and shares the full code, command‑line steps, and scan results that reveal only a handful of vulnerable instances.

ClickHouse · Go · TCP scanning
0 likes · 16 min read
dbaplus Community
Jan 5, 2025 · Big Data

How DeWu Halved Observability Costs Using AutoMQ and ClickHouse Storage‑Compute Separation

DeWu’s observability platform faced scalability, cost, and operational challenges from petabyte‑scale trace data, prompting a shift to a storage‑compute separated architecture that leverages AutoMQ’s Kafka‑compatible service and ClickHouse Enterprise’s SharedMergeTree engine, ultimately achieving up to 50% cost reduction and five‑fold cold‑read performance gains.

AutoMQ · Big Data · ClickHouse
0 likes · 20 min read
ITPUB
Jan 3, 2025 · Databases

Why ClickHouse Sharded Table Queries Return Inconsistent Row Counts—and How to Fix It

A ClickHouse cluster showed wildly varying row counts when querying sharded tables, while local tables behaved correctly; the article analyses the root cause in the cluster and table configuration, explains why the inconsistency occurs, and provides a step‑by‑step fix by switching to replicated tables.

ClickHouse · Query Inconsistency · Replication
0 likes · 7 min read
dbaplus Community
Dec 24, 2024 · Big Data

How Bilibili Scaled Its Tag System for Massive Data and Real‑Time Accuracy

The article details Bilibili's comprehensive redesign of its tag system—including background challenges, architectural layers, technical upgrades like Iceberg integration and shard‑based ClickHouse writes, crowd selection methods, online service guarantees, performance metrics, and future plans—showcasing a data‑driven solution that boosts stability, speed, and business coverage.

ClickHouse · Iceberg · Online Service
0 likes · 24 min read
JD Tech Talk
Dec 13, 2024 · Databases

An Introduction to ClickHouse: Columnar Storage, Features, and Use Cases

This article introduces ClickHouse, an open‑source column‑oriented distributed database, explaining its columnar storage model, key performance and scalability features, rich analytical capabilities, and the scenarios where it excels or falls short in big‑data processing.

Big Data · ClickHouse · Columnar Database
0 likes · 6 min read
JD Cloud Developers
Dec 13, 2024 · Databases

Why ClickHouse Is Revolutionizing Big Data Analytics with Columnar Storage

ClickHouse, an open‑source column‑oriented distributed database from Yandex, offers high performance, efficient compression, vectorized execution, and scalable architecture, making it ideal for large‑scale analytics, log processing, monitoring, and data warehousing, while noting its limitations in transactions and strong consistency.

ClickHouse · Columnar Database · Data Analytics
0 likes · 5 min read
Architecture & Thinking
Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · ClickHouse · Cloud Native
0 likes · 22 min read
Bilibili Tech
Nov 12, 2024 · Big Data

Scalable Tag System Architecture and Optimization

The rebuilt tag system introduces a three‑layer architecture, standard pipelines, Iceberg‑backed storage and custom ClickHouse sharding, a DSL for crowd selection, and a stateless online service, achieving 99.9% success, sub‑5 ms latency, and supporting thousands of tags across dozens of business scenarios while planning real‑time processing and automated lifecycle management.

ClickHouse · Iceberg · Online Service
0 likes · 23 min read
macrozheng
Nov 7, 2024 · Backend Development

9 Proven Techniques to Supercharge Pagination Query Performance

This article presents nine practical strategies—including adding default filters, limiting page size, reducing joins, optimizing indexes, using straight_join, archiving data, leveraging count(*), querying ClickHouse, and implementing read‑write splitting—to dramatically improve the speed and scalability of pagination APIs in MySQL‑based back‑ends.

ClickHouse · Database Performance · SQL Optimization
0 likes · 11 min read
BirdNest Tech Talk
Nov 3, 2024 · Databases

Master ClickHouse Write Performance: Proven Optimization Strategies

This comprehensive guide walks through ClickHouse write‑performance optimization, covering hardware choices, system and application‑level tuning, async insert settings, Buffer engine configuration, storage compression, real‑world case studies, monitoring queries, and actionable best‑practice recommendations.

Async Insert · Buffer Engine · ClickHouse
0 likes · 12 min read
Baidu Tech Salon
Oct 22, 2024 · Big Data

TDE-ClickHouse: Baidu MEG's High-Performance Big Data Analytics Engine

TDE‑ClickHouse, the core engine of Baidu MEG’s Turing 3.0 ecosystem, delivers sub‑second, self‑service analytics on petabyte‑scale data by decoupling compute, adding multi‑level aggregation, high‑cardinality and rule‑based optimizations, a two‑phase bulk‑load pipeline, cloud‑native deployment, and a lightweight meta service, now powering over 350 000 cores, 10 PB storage and more than 150 000 daily BI queries with average response times under three seconds.

ClickHouse · Database Architecture · big data analytics
0 likes · 19 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native
0 likes · 19 min read
Bilibili Tech
Aug 23, 2024 · Big Data

Accelerating Multi‑Dimensional OLAP Queries in ClickHouse with Grouping Sets, RBM, and Dense Dictionary Encoding

To achieve sub‑second, multi‑dimensional analytics on Bilibili’s hundred‑million‑row datasets, the team built a ClickHouse‑based acceleration layer that combines grouping‑set pre‑aggregation, bitmap (RBM) distinct handling, and a dense dictionary encoding service, dramatically cutting CPU, memory and query latency versus traditional OLAP pipelines.

Big Data · Bitmap · ClickHouse
0 likes · 28 min read
Wukong Talks Architecture
Aug 6, 2024 · Databases

Migrating Tencent Music's Data Infrastructure from ClickHouse and Druid to StarRocks: Strategy, Implementation, and Best Practices

This article details how Tencent Music’s data‑infrastructure team migrated thousands of ClickHouse and Druid nodes to a StarRocks compute‑storage‑separated lakehouse, achieving 40‑50% cost reduction while maintaining query performance, and shares the technical challenges, solutions, and best‑practice recommendations gathered during the process.

ClickHouse · Cost reduction · Data Migration
0 likes · 19 min read
DataFunSummit
Jul 20, 2024 · Databases

Real-time Data Update Solutions in TCHouse‑C: Architecture, Schema‑less Design, and Performance Evaluation

This article presents TCHouse‑C, a cloud‑native ClickHouse service, detailing its real‑time data update architecture, schema‑less ingestion, various update strategies such as Delete‑Insert and lightweight‑update/delete, and comprehensive performance tests comparing UniqueMergeTree with standard ClickHouse engines across import, query, and update workloads.

ClickHouse · Data Warehouse · Delete-Insert
0 likes · 32 min read
JD Cloud Developers
Jul 17, 2024 · Databases

Choosing the Right Database: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus & Milvus Explained

Explore nine major database technologies—from traditional relational MySQL to NoSQL Redis, columnar HBase and ClickHouse, document-oriented MongoDB, search engine Elasticsearch, graph Neo4j, time‑series Prometheus, and vector Milvus—plus practical best‑practice guides, real‑world polyglot persistence scenarios, and recommended resources for mastering modern data storage.

ClickHouse · Elasticsearch · HBase
0 likes · 50 min read
JD Tech Talk
Jul 17, 2024 · Databases

A Comprehensive Guide to 9 Database Types and Polyglot Persistence

This article provides an in‑depth overview of nine major database categories—including relational, key‑value, columnar, document, graph, time‑series, and vector databases—detailing their strengths, weaknesses, best practices, and typical application scenarios, and explains how polyglot persistence combines multiple databases for optimal performance and scalability.

ClickHouse · Elasticsearch · HBase
0 likes · 41 min read
JD Tech
Jul 15, 2024 · Databases

A Comprehensive Overview of Nine Database Types and Polyglot Persistence Practices

This article provides an in‑depth survey of nine database categories—including relational, key‑value, columnar, document, graph, time‑series, and vector databases—detailing their architectures, advantages, disadvantages, best‑practice recommendations, typical use cases, and how they can be combined in polyglot persistence solutions.

ClickHouse · Database Types · HBase
0 likes · 41 min read
DataFunTalk
Jul 11, 2024 · Backend Development

Performance Optimizations and Benchmark Analysis of RaftKeeper v2.1.0

The article presents a detailed engineering analysis of RaftKeeper v2.1.0, describing the benchmark methodology, the performance gains across create, mixed, and list workloads, and the major optimizations behind them: parallel response serialization, list-request handling, system-call reduction, a thread-pool redesign, and asynchronous snapshot processing. Together these deliver substantial throughput and latency improvements in large-scale ClickHouse deployments.

C · ClickHouse · RaftKeeper
0 likes · 12 min read
dbaplus Community
Jul 10, 2024 · Databases

Why ClickHouse Dominates OLAP Performance: An In‑Depth Architecture Guide

This article explains ClickHouse's columnar, MPP-based design, block compression, LSM-style pre-sorting, sparse primary and data-skipping indexes, and vectorized execution. It also discusses the engine's difficulties with high-frequency writes, its concurrency limits, and production issues such as ZooKeeper load and resource management.

ClickHouse · Columnar Database · LSM
0 likes · 11 min read
Aikesheng Open Source Community
Jul 9, 2024 · Databases

Resolving ClickHouse “too many mutations” Errors by Cleaning Mutations and Switching to ReplacingMergeTree

The article describes a real‑world ClickHouse incident where excessive UPDATE‑style mutations caused a “too many mutations(1036)” error, explains the cluster’s configuration, and details a step‑by‑step recovery process that clears pending mutations and migrates tables to the ReplacingMergeTree engine to restore service.

ClickHouse · ReplacingMergeTree · Table Engine
0 likes · 7 min read
JD Cloud Developers
Jul 3, 2024 · Big Data

How to Build a High‑Availability Real‑Time Logistics Dashboard with Flink and ClickHouse

This article details the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard, covering Flink‑based data pipelines, ClickHouse OLAP storage, metric consistency, stability measures, extensible configuration, and comprehensive monitoring to ensure accurate, scalable performance during major promotions.

Big Data · ClickHouse · Dashboard
0 likes · 9 min read
JD Tech Talk
Jul 3, 2024 · Big Data

Real-time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Processing, and Stability Practices

This article describes the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard using Flink and ClickHouse, covering data processing pipelines, metric consistency, stability mechanisms, extensible configurations, and monitoring techniques to guide similar large‑screen projects.

ClickHouse · Flink · Real-time Dashboard
0 likes · 9 min read
JD Tech
Jul 2, 2024 · Big Data

Real‑Time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Modeling, and Stability Design

This article presents the design and implementation of a high‑availability, real‑time logistics supply‑chain monitoring dashboard, covering its data processing pipeline with Flink, storage choices between Elasticsearch and ClickHouse, multi‑layer architecture, metric consistency, stability mechanisms, extensibility configurations, and monitoring practices.

Big Data · ClickHouse · Dashboard
0 likes · 11 min read
DataFunTalk
Jun 28, 2024 · Big Data

Accelerating Spark with ClickHouse: Native Optimization Techniques and Performance Evaluation

This article presents a comprehensive technical overview of using ClickHouse as a native backend to accelerate Spark SQL execution, covering Spark performance bottlenecks, ClickHouse's CPU‑level optimizations, the design and implementation of the Spark‑Native integration, and detailed TPC‑DS benchmark results demonstrating up to 3.5× speedup.

Big Data · ClickHouse · Native Execution
0 likes · 33 min read
Baidu Geek Talk
Jun 24, 2024 · Big Data

Accelerating Spark with ClickHouse Native Techniques: Design, Implementation, and Performance Evaluation

The paper presents a Spark acceleration framework that replaces Java‑based task operators with a ClickHouse native library, converting plans via Protobuf and JNI, leveraging columnar storage, SIMD and JIT to achieve up to 3× speed‑up on TPC‑DS workloads while providing fallback mechanisms to ensure no performance loss.

Big Data · ClickHouse · Native Acceleration
0 likes · 31 min read
Baidu Intelligent Cloud Tech Hub
Jun 24, 2024 · Big Data

Boost Spark Performance with ClickHouse: Native Acceleration Techniques

This article presents a detailed technical overview of accelerating Spark's compute engine using ClickHouse as a native backend, covering Spark performance background, ClickHouse's advantages, the design and implementation of a Spark‑Native acceleration solution, and extensive performance evaluation results.

ClickHouse · Native Acceleration · Performance Optimization
0 likes · 34 min read
DataFunTalk
Jun 9, 2024 · Big Data

Optimizing ClickHouse Performance in WeChat: Observation Tools, Lakehouse Reading, Bitmap Acceleration, and AI Integration

This article details how the WeChat team leverages ClickHouse at massive scale, introduces a suite of performance observation tools, describes lakehouse reading and bitmap optimizations, and explains the integration of AI workloads, demonstrating overall query speedups of up to tenfold across diverse scenarios.

Big Data · Bitmap · ClickHouse
0 likes · 10 min read
ITPUB
Jun 9, 2024 · Databases

Doris vs ClickHouse: Which Database Fits Your Workload?

This article compares Doris and ClickHouse across architecture, table creation, ecosystem integration, management tools, query performance, and join capabilities, offering practical guidance on how to choose the right database based on your specific data processing and operational requirements.

ClickHouse · Data Warehouse · SQL
0 likes · 10 min read
ITPUB
May 26, 2024 · Cloud Native

Containerizing Elasticsearch & ClickHouse on Kubernetes: Bilibili’s Scalable, Low‑Cost Solution

This article details Bilibili’s journey of containerizing Elasticsearch and ClickHouse on Kubernetes, covering the challenges of stateful services, architectural decisions, custom operators, storage and network solutions, deployment steps, observability enhancements, and the resulting cost, quality, and efficiency gains.

ClickHouse · Cloud Native · Elasticsearch
0 likes · 38 min read
ITPUB
May 21, 2024 · Databases

Can ClickHouse Distributed Tables Outperform Single-Node Tables? A Real-World Benchmark

This article presents a systematic benchmark comparing ClickHouse local (single‑node) tables and distributed tables across three data volumes—≈60 billion, 5 billion and 50 million rows—using a variety of aggregation and filter queries, and reveals that distributed tables dominate at large scale while the gap narrows as the dataset shrinks.

Benchmark · ClickHouse · Distributed Tables
0 likes · 13 min read
vivo Internet Technology
Apr 17, 2024 · Big Data

Retention Analysis Model Practice Based on ClickHouse

The article explains retention analysis models and their importance for measuring user loyalty, outlines the offline Hive architecture, and then shows how ClickHouse's retention() function and columnar storage dramatically speed up multi-day retention calculations, with SQL examples and practical guidance for product analytics.

ClickHouse · Hive · Retention Analysis
0 likes · 17 min read
ITPUB
Apr 11, 2024 · Big Data

Query 100K Items from 10M+ Records: CK, ES Scroll, HBase, RediSearch

When faced with a business requirement to filter up to 100 000 records from a pool of tens of millions and then sort and de‑duplicate them, this article explores four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, a combined Elasticsearch‑HBase approach, and RediSearch with RedisJSON—detailing their design, implementation, performance testing, and trade‑offs.

Big Data · ClickHouse · Elasticsearch
0 likes · 12 min read
NetEase Cloud Music Tech Team
Apr 11, 2024 · Backend Development

Design and Implementation of an Online Configurable Data Consumption Service for NetEase Cloud Music Frontend Performance Monitoring (Corona)

The article details NetEase Cloud Music’s end‑to‑end, online‑configurable data‑consumption service and schema‑driven visualization platform that transform raw client logs into ClickHouse records, automatically generate tables and dashboards, and provide observability, dramatically reducing manual effort while supporting over twenty performance metrics for frontend monitoring.

ClickHouse · Performance Monitoring · data pipeline
0 likes · 17 min read
dbaplus Community
Apr 8, 2024 · Cloud Native

Containerizing Elasticsearch & ClickHouse on Kubernetes: Challenges & Solutions

Facing the complexities of running stateful services like Elasticsearch and ClickHouse in production, Bilibili’s infrastructure team detailed their migration to Kubernetes, describing the architectural design, custom operators, storage provisioning with LVM, network configuration, high‑availability strategies, observability, and the resulting cost, quality, and efficiency gains.

ClickHouse · CloudNative · Elasticsearch
0 likes · 37 min read
Practical DevOps Architecture
Apr 4, 2024 · Databases

ClickHouse Training Course Overview and Curriculum

This article introduces a comprehensive ClickHouse training program that covers fundamental concepts, architecture, installation, distributed cluster design, data import, performance tuning, and includes a detailed list of 33 video modules and additional recommended reading resources for large‑scale data analytics.

Big Data · ClickHouse · Columnar Database
0 likes · 4 min read
Tencent Cloud Developer
Apr 2, 2024 · Backend Development

tRPC Scaffolding Tooling and Observability Best Practices for Tencent Docs Backend

By introducing the unified tRPC scaffolding tool trpcx and embedding OpenTelemetry‑generated observability configurations, the Tencent Docs backend team streamlined service creation, standardized directory structures, migrated metrics and logs to ClickHouse for cost‑effective performance, and established best‑practice workflows that dramatically improve development speed and fault‑diagnosis efficiency.

Backend Development · ClickHouse · OpenTelemetry
0 likes · 18 min read
dbaplus Community
Mar 19, 2024 · Big Data

How JD’s Mini‑Program Data Center Powers Real‑Time Analytics and Monitoring

JD’s Mini‑Program Data Center integrates data collection, storage, and real‑time analysis using Flink, ClickHouse, and Elasticsearch to provide comprehensive monitoring, user behavior insights, and scalable analytics for mini‑programs across JD’s ecosystem, enabling precise operations and future AI‑driven enhancements.

ClickHouse · Data center · Elasticsearch
0 likes · 19 min read
dbaplus Community
Mar 12, 2024 · Databases

How Didi Scaled Log Search by Replacing Elasticsearch with ClickHouse

Facing PB‑scale daily logs and costly Elasticsearch bottlenecks, Didi redesigned its log‑search architecture by migrating to ClickHouse, detailing the challenges, storage redesign, cluster upgrades, performance optimizations, stability fixes, and the resulting cost reduction and query speed gains.

ClickHouse · Distributed Systems · elasticsearch migration
0 likes · 15 min read
Bilibili Tech
Mar 12, 2024 · Cloud Native

Containerizing Elasticsearch and ClickHouse on Kubernetes: Architecture, Implementation, and Benefits

Bilibili migrated its Elasticsearch and ClickHouse clusters to Kubernetes using custom operators, CRDs, LVM‑based local storage, MacVLAN networking, and pod anti‑affinity, achieving higher resource utilization, isolation, and automation that cut server count, reduced latency spikes, and lowered operational costs dramatically.

ClickHouse · Elasticsearch · Kubernetes
0 likes · 38 min read
Linux Code Review Hub
Mar 11, 2024 · Databases

How Didi Built a Next‑Gen Log Storage System with ClickHouse

Didi migrated its massive PB‑scale log data from Elasticsearch to ClickHouse, redesigning storage with separate Log and Trace clusters, optimizing partition and sorting keys, introducing native TCP connectors, and revamping HDFS cold‑hot separation, achieving up to four‑fold query speed gains and 30% lower hardware costs.

ClickHouse · Distributed Systems · Flink
0 likes · 15 min read
Didi Tech
Mar 5, 2024 · Databases

Migrating Didi's Log Retrieval from Elasticsearch to ClickHouse: Architecture, Challenges, and Performance Optimizations

Didi replaced its Elasticsearch-based log platform with ClickHouse, redesigning the architecture into isolated Log and Trace clusters and using hourly-partitioned MergeTree tables and aggregating views to handle petabyte-scale writes, diverse low-latency queries, and high QPS. The deployment has grown to over 400 nodes with 40 GB/s throughput, while cutting costs by 30% and reducing query latency four-fold.

Big Data · ClickHouse · Elasticsearch
0 likes · 15 min read
Volcano Engine Developer Services
Feb 29, 2024 · Big Data

How MetaApp Cut Data Warehouse Costs by 50% with ByConity

MetaApp replaced ClickHouse with the open‑source cloud‑native data warehouse ByConity, achieving over 50% cost reduction and faster, more stable OLAP queries by separating storage and compute, simplifying scaling, and improving resource utilization across a range of analytics workloads such as deduplication, retention, conversion and point‑lookup.

ByConity · ClickHouse · Cost reduction
0 likes · 13 min read