Tagged articles

ClickHouse

482 articles · Page 1 of 5

Jun 30, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu, with more than 350 million monthly users and trillions of log records per day, migrated 500 PB of data to Alibaba Cloud and took its data platform through four architecture generations (ClickHouse‑based ad hoc analysis, Lambda, Lakehouse, and a unified incremental compute model), cutting resource, development, and storage costs to one‑third while delivering sub‑10‑second query latency at petabyte scale.

Big Data · ClickHouse · Data Architecture

22 min read

DataFunTalk

Jun 24, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu, with over 350 million monthly users and daily logs in the billions, migrated its data platform from AWS to Alibaba Cloud and iterated four times—from a ClickHouse‑based ad‑hoc layer to a Lambda architecture and finally a Lakehouse with incremental compute—cutting architecture complexity, resource cost and development effort each to about one‑third while delivering second‑level analytics on trillion‑scale data.

Big Data · ClickHouse · Data Architecture

22 min read

DataFunTalk

Jun 20, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's step‑by‑step migration from a simple ClickHouse‑based analytics stack to a Lambda‑style 2.0 architecture and finally to a Lakehouse‑based 3.0 design, highlighting concrete performance numbers, cost reductions, and the definition of a generic incremental‑compute model (SPOT) that underpins the evolution.

Big Data · ClickHouse · Data Architecture

22 min read

DataFunTalk

May 28, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse with generic incremental compute, cutting architecture complexity, resource costs and development costs to about one‑third of their previous levels while delivering second‑level queries over trillions of rows.

Big Data · ClickHouse · Data Architecture

21 min read

DataFunTalk

May 22, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Complexity and Cost by One‑Third in the Big AI Data Era

The article details Xiaohongshu's evolution from a simple ClickHouse‑based analytics layer to a Lambda‑enabled 2.0 stack and finally a Lakehouse‑based 3.0 architecture, showing how each iteration reduced infrastructure complexity, resource consumption and development effort by roughly one‑third while supporting trillions of daily events and AI‑driven use cases.

Big Data · ClickHouse · Data Architecture

21 min read

dbaplus Community

May 20, 2026 · Databases

Stunning SQL Queries: From Tetris Game to Real‑Time Funnels

This article showcases a collection of impressive SQL queries—including a PostgreSQL Tetris implemented with a recursive CTE, window‑function session analysis, a ClickHouse real‑time funnel, dynamic WHERE clause generation, and a recursive employee hierarchy—while discussing performance tips and engine choices.

ClickHouse · Data Warehouse · Hive

25 min read

DataFunTalk

May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink

22 min read

DataFunTalk

May 6, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's four‑stage data‑platform evolution—from a simple ClickHouse ad‑hoc setup to a Lambda‑based 2.0 design and finally a lakehouse‑driven 3.0 architecture—highlighting the adoption of general incremental compute, cost‑reduction to one‑third, performance gains of up to ten‑fold, and the SPOT standards that guide the new system.

Big Data · ClickHouse · Data Architecture

21 min read

DataFunTalk

Apr 29, 2026 · Big Data

How Xiaohongshu Revamped Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based analytics stack to a unified lakehouse with generic incremental compute, cutting architecture complexity, resource cost, and development effort to roughly one‑third of their previous levels while supporting petabyte‑scale, sub‑second queries across its 350‑million‑user app.

Big Data · ClickHouse · Data Architecture

22 min read

Baidu Geek Talk

Mar 23, 2026 · Databases

How Baidu’s MEG Platform Revamped ClickHouse with a Lakehouse Architecture

This article analyzes the challenges of scaling ClickHouse within Baidu’s MEG data platform and details a lake‑house solution that decouples storage and compute, integrates a meta‑service for transparent data access, optimizes query performance through caching, data roll‑up and layout tuning, and introduces a unified query gateway that gracefully falls back to Spark for complex workloads.

ClickHouse · Data Platform · Lakehouse

25 min read

Tech Freedom Circle

Mar 17, 2026 · Databases

Why HyperLogLog Misses 100M Daily Active Users and How Bitmap Solves It

The article dissects an Alibaba interview question on counting 100 million daily active users, showing why HyperLogLog’s error and lack of per‑user state make it unsuitable, and presents a detailed Bitmap‑based architecture—including sharding, pre‑computation, and ClickHouse integration—to achieve precise, high‑performance analytics.

ClickHouse · DailyActiveUsers · HyperLogLog

16 min read
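The precision argument is easy to see in code. Below is a minimal Python sketch of exact DAU counting with a sharded bitmap, assuming dense non-negative integer user IDs. The `ShardedBitmap` class and the 2^16-ID shard size are hypothetical, and plain Python integers stand in for the compressed Roaring bitmaps behind ClickHouse's `groupBitmap` family. Unlike HyperLogLog, the bitmap keeps per-user state, so membership checks and cross-day intersections stay exact.

```python
from collections import defaultdict
from typing import Dict

SHARD_BITS = 16  # hypothetical: each shard covers 2**16 consecutive user IDs

class ShardedBitmap:
    """Exact set of non-negative integer user IDs, one bitset per shard."""

    def __init__(self) -> None:
        self.shards: Dict[int, int] = defaultdict(int)

    def add(self, user_id: int) -> None:
        shard, offset = divmod(user_id, 1 << SHARD_BITS)
        self.shards[shard] |= 1 << offset

    def contains(self, user_id: int) -> bool:
        shard, offset = divmod(user_id, 1 << SHARD_BITS)
        return bool(self.shards.get(shard, 0) >> offset & 1)

    def count(self) -> int:
        # Exact cardinality: total number of set bits across all shards.
        return sum(bin(bits).count("1") for bits in self.shards.values())

    def intersect_count(self, other: "ShardedBitmap") -> int:
        # Users present in both bitmaps, e.g. active on two different days.
        return sum(
            bin(bits & other.shards.get(shard, 0)).count("1")
            for shard, bits in self.shards.items()
        )

day1, day2 = ShardedBitmap(), ShardedBitmap()
for uid in [1, 5, 70_000, 70_000]:
    day1.add(uid)
for uid in [5, 70_000, 99]:
    day2.add(uid)
print(day1.count())                # 3 (the repeated 70_000 is counted once)
print(day1.intersect_count(day2))  # 2 (users 5 and 70_000)
```

Splitting by ID range keeps each shard independent, so shards can be built, stored and intersected in parallel.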

dbaplus Community

Mar 12, 2026 · Databases

How to Migrate 100 Billion ClickHouse Rows to Doris: Three Practical Approaches

This article walks through three concrete methods for moving massive ClickHouse datasets—up to 100 billion rows—to Doris, detailing catalog integration, file export with stream load, and Spark‑based pipelines, while sharing real‑world performance results and pitfalls.

Apache Doris · ClickHouse · Data Migration

8 min read

Big Data Technology & Architecture

Mar 6, 2026 · Big Data

What’s New in Big Data Frameworks? ClickHouse, Fluss, Delta Lake, StarRocks & More (Mar 2026)

This roundup compiles the latest releases across major data platforms—including ClickHouse, Apache Fluss, Delta Lake, StarRocks, Apache Pulsar and DolphinScheduler—highlighting version numbers, key feature additions, security fixes, and emerging trends shaping the big‑data ecosystem.

Apache Fluss · Big Data · ClickHouse

19 min read

DeWu Technology

Feb 9, 2026 · Big Data

How to Build a Production‑Ready Flink ClickHouse Sink with Dynamic Sharding, Batch‑by‑Size, and Robust Retry

This article presents a production‑grade Flink ClickHouse sink that solves common pain points such as lack of size‑based batching, static table schemas, and distributed‑table latency by introducing data‑size batching, dynamic table routing, local‑table writes, load‑balanced node discovery, back‑pressure queues, dual‑trigger flush, and recursive retry with node exclusion, all integrated with Flink checkpoint semantics for at‑least‑once guarantees.

Batching · Checkpoint · ClickHouse

25 min read
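The dual-trigger flush and node-excluding retry can be pictured independently of Flink. The Python sketch below is not Flink's Java sink API; it assumes a hypothetical `send(node, batch)` callable that raises `OSError` (for example `ConnectionError`) when a ClickHouse node is unreachable, and the byte budget and interval are illustrative rather than values from the article.

```python
import time
from typing import Callable, FrozenSet, List, Sequence

class SizeTimeBatcher:
    """Buffers rows and flushes when either the byte budget or the time budget is reached."""

    def __init__(self, flush: Callable[[List[bytes]], None],
                 max_bytes: int = 4 * 1024 * 1024,
                 max_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self.max_bytes = max_bytes
        self.max_interval_s = max_interval_s
        self._clock = clock
        self._rows: List[bytes] = []
        self._bytes = 0
        self._last_flush = clock()

    def add(self, row: bytes) -> None:
        self._rows.append(row)
        self._bytes += len(row)
        if self._bytes >= self.max_bytes:  # trigger 1: batch size in bytes
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. from a timer); trigger 2: elapsed time."""
        if self._rows and self._clock() - self._last_flush >= self.max_interval_s:
            self.flush()

    def flush(self) -> None:
        """Also call before a checkpoint completes so buffered rows are not lost."""
        if self._rows:
            self._flush(self._rows)  # if this raises, rows stay buffered for a retry
            self._rows, self._bytes = [], 0
        self._last_flush = self._clock()

def write_with_retry(batch: List[bytes], nodes: Sequence[str],
                     send: Callable[[str, List[bytes]], None],
                     excluded: FrozenSet[str] = frozenset()) -> str:
    """Recursively retry on the next node, excluding nodes that already failed."""
    candidates = [n for n in nodes if n not in excluded]
    if not candidates:
        raise RuntimeError("all ClickHouse nodes failed")
    node = candidates[0]
    try:
        send(node, batch)
        return node
    except OSError:
        return write_with_retry(batch, nodes, send, excluded | {node})

batches: List[List[bytes]] = []
now = [0.0]
b = SizeTimeBatcher(batches.append, max_bytes=10, max_interval_s=5.0, clock=lambda: now[0])
b.add(b"12345")
b.add(b"67890")   # 10 bytes reached: size-triggered flush
now[0] = 6.0
b.add(b"x")
b.tick()          # 6 s since the last flush: time-triggered flush
print(batches)    # [[b'12345', b'67890'], [b'x']]

def flaky_send(node: str, batch: List[bytes]) -> None:
    if node == "ch-1":
        raise ConnectionError("ch-1 unreachable")

print(write_with_retry([b"x"], ["ch-1", "ch-2"], flaky_send))  # ch-2
```

Injecting the clock keeps the time trigger testable; a production sink would also pick nodes with load balancing rather than always trying the first candidate.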

ITPUB

Feb 9, 2026 · Databases

ClickHouse vs Doris vs Redis: Real‑World Query Performance Test with Flink

Using a 600k‑record IP range dataset, we built identical tables in ClickHouse and Doris plus a Redis skip‑list store, then ran three Flink‑Kafka streaming jobs to compare query latency across the three databases under varying traffic rates: Redis was the fastest, ClickHouse second, and Doris the slowest.

ClickHouse · Database Performance · Doris

8 min read
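The lookup these stores compete on is a "floor" search: find the range whose start is the largest value not above the IP, then check its end. A minimal Python sketch, assuming non-overlapping IPv4 ranges; the `IpRangeTable` class and sample ranges are hypothetical, and this is one common approach rather than the article's exact schema. In Redis, the same query can be expressed against a sorted set (skip list) scored by range start with `ZREVRANGEBYSCORE key <ip> -inf LIMIT 0 1`.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

class IpRangeTable:
    """Non-overlapping IPv4 ranges sorted by start address, queried with a floor search."""

    def __init__(self, ranges: List[Tuple[str, str, str]]) -> None:
        self._rows = sorted(
            (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), label)
            for lo, hi, label in ranges
        )
        self._starts = [start for start, _, _ in self._rows]

    def lookup(self, ip: str) -> Optional[str]:
        x = int(ipaddress.IPv4Address(ip))
        # Last range whose start <= x: the same "floor" query a sorted set answers.
        i = bisect.bisect_right(self._starts, x) - 1
        if i >= 0 and x <= self._rows[i][1]:
            return self._rows[i][2]
        return None  # x falls before the first range or in a gap between ranges

table = IpRangeTable([
    ("10.0.0.0", "10.0.0.255", "office"),
    ("192.168.1.0", "192.168.1.127", "lab"),
])
print(table.lookup("10.0.0.42"))      # office
print(table.lookup("192.168.1.200"))  # None
```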

ITPUB

Jan 15, 2026 · Databases

How to Migrate ClickHouse Data to Doris: Three Practical Strategies Tested

Facing a ClickHouse cluster shutdown, the author explores three migration methods—using Doris’s ClickHouse catalog, exporting to files with Broker/Stream Load, and Spark—to transfer ~10 billion rows to Doris, evaluating each for simplicity, bugs, and performance, and sharing detailed steps, code snippets, and benchmark results.

ClickHouse · Data Migration · Doris

9 min read

Xiao Liu Lab

Dec 30, 2025 · Databases

How to Diagnose and Fix ClickHouse CPU Spikes in Minutes

This guide walks you through a step‑by‑step process for quickly identifying the cause of high CPU usage in ClickHouse, from emergency triage and precise diagnosis using system tables to practical optimization techniques and a ready‑to‑run monitoring script.

CPU · ClickHouse · SQL

21 min read

ITPUB

Dec 26, 2025 · Databases

How to Migrate 100 Billion ClickHouse Rows to Doris: Three Practical Strategies

When a ClickHouse cluster needed to be decommissioned, the author evaluated three migration approaches—using Doris' ClickHouse catalog, exporting to files with Broker/Stream Load, and leveraging Spark—to move roughly 100 billion rows to Doris, comparing their complexity, reliability, and performance.

Catalog · ClickHouse · Doris

9 min read

dbaplus Community

Dec 8, 2025 · Databases

Which Database Wins IP Range Lookups? ClickHouse vs Doris vs Redis Benchmarks

This article presents a systematic benchmark comparing ClickHouse, Doris, and Redis for IP‑range dimension lookups using Flink‑Kafka pipelines, detailing test design, result table schema, query interfaces, and performance results across varying data rates, concluding that Redis offers the fastest and most stable query latency.

ClickHouse · Database Benchmark · Doris

7 min read

Data STUDIO

Dec 5, 2025 · Big Data

Why Parquet Is the Default Choice for Big Data Storage

The article explains how Apache Parquet’s columnar layout, multi‑level row‑group structure, projection and predicate push‑down, and advanced compression and encoding make it the high‑performance, space‑efficient storage format that powers modern big‑data ecosystems and tools like Spark, Python pandas, and ClickHouse.

Big Data · ClickHouse · Columnar Storage

11 min read
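To illustrate how row-group statistics enable predicate push-down, here is a toy in-memory model in Python (not the pyarrow API). Each hypothetical `RowGroup` carries per-column min/max statistics like those stored in a Parquet footer, and the scan skips any group whose range cannot satisfy the filter. In a real file, projection also means unrequested column chunks are never read from disk; here it only trims the returned dicts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class RowGroup:
    stats: Dict[str, Tuple[int, int]]  # per-column (min, max), as in the file footer
    rows: List[Dict[str, int]]

def scan(groups: List[RowGroup], column: str, lo: int, hi: int,
         project: List[str]) -> Tuple[List[Dict[str, int]], int]:
    """Filter lo <= column <= hi, returning only the projected columns."""
    out: List[Dict[str, int]] = []
    skipped = 0
    for g in groups:
        gmin, gmax = g.stats[column]
        if gmax < lo or gmin > hi:
            skipped += 1  # predicate push-down: the stats prove no row can match
            continue
        for r in g.rows:
            if lo <= r[column] <= hi:
                out.append({c: r[c] for c in project})  # projection: requested columns only
    return out, skipped

groups = [
    RowGroup({"ts": (0, 99)}, [{"ts": 5, "uid": 1}, {"ts": 90, "uid": 2}]),
    RowGroup({"ts": (100, 199)}, [{"ts": 150, "uid": 3}]),
]
rows, skipped = scan(groups, "ts", 120, 180, ["uid"])
print(rows, skipped)  # [{'uid': 3}] 1
```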

Code Ape Tech Column

Dec 5, 2025 · Big Data

Optimizing 100K Record Retrieval from 10M‑Row Pools: ClickHouse, ES Scroll, ES+HBase, RediSearch

This article examines several engineering solutions for extracting up to 100,000 records from a ten‑million‑row pool, comparing multi‑threaded ClickHouse pagination, Elasticsearch scroll‑scan, an ES‑plus‑HBase hybrid, and RediSearch + RedisJSON, and presents performance measurements and practical trade‑offs.

Big Data · ClickHouse · Elasticsearch

12 min read

Ray's Galactic Tech

Nov 28, 2025 · Operations

How to Optimize Log Storage: From Centralized to Hot‑Cold Separation

This article explains why modern micro‑service systems need log storage optimization and presents a hot‑cold separation strategy, detailing ELK, Loki, and Kafka + ClickHouse architectures, implementation steps, best practices, and a comparative analysis to guide cost‑effective, high‑performance log management.

ClickHouse · ELK · hot-cold separation

7 min read

Ctrip Technology

Nov 27, 2025 · Big Data

How Ctrip Cut Query Latency by 85% with StarRocks’ Compute‑Storage Separation

Ctrip migrated its massive User Behavior Tracking system from ClickHouse to a compute‑storage separated StarRocks cluster on Kubernetes, achieving millisecond‑level query latency, halving storage usage, reducing node count, and sustaining millions‑of‑rows‑per‑second write throughput while simplifying scaling and operations.

Big Data · ClickHouse · Compute-Storage Separation

15 min read

ITPUB

Nov 20, 2025 · Operations

What Triggered Cloudflare’s Massive November 2025 Outage? Inside the Bot Management Failure

On November 18, 2025, Cloudflare suffered a multi‑hour network outage that crippled major services worldwide. It was caused by a ClickHouse permission change that generated oversized bot‑management feature files, leading to 5xx errors across the CDN, security, and authentication layers and prompting a complex, step‑by‑step remediation effort.

Bot Management · ClickHouse · Cloudflare

19 min read

Architect's Guide

Nov 20, 2025 · Operations

What Caused Cloudflare’s Half‑Internet Outage? A Deep Dive into the Technical Failure

Cloudflare suffered a massive multi‑hour outage that knocked offline popular sites and AI services, traced to a sudden traffic spike, a mis‑configured Rust‑based bot‑management module, and a database permission change that doubled a feature file size, overwhelming its routing software.

CDN · ClickHouse · Cloudflare

12 min read

dbaplus Community

Nov 19, 2025 · Operations

Why Did Cloudflare’s Global Outage Happen on Nov 18 2025? Inside the Bot Management Bug

On the night of November 18 2025, Cloudflare suffered a worldwide outage that crippled services like ChatGPT, X, Spotify, and major gaming platforms, and a detailed post‑mortem reveals that a ClickHouse permission change caused an oversized bot‑management configuration file to crash edge nodes.

Bot Management · CDN · ClickHouse

9 min read
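The failure mode here, an auto-generated config file growing past a hard limit its consumer was not built to handle, suggests a defensive pattern: validate each new file and keep serving with the last known-good version instead of crashing. A minimal Python sketch under stated assumptions: `MAX_FEATURES = 200` reflects the per-file feature cap reported in Cloudflare's public post-mortem but is treated as an assumption here, the class and function names are hypothetical, and this is not Cloudflare's (Rust) code.

```python
from typing import List

# Assumed hard cap on features per file (see lead-in); the consumer
# preallocates room for this many entries.
MAX_FEATURES = 200

class FeatureConfig:
    def __init__(self, features: List[str]) -> None:
        self.features = features

def load_features(rows: List[str], last_good: FeatureConfig) -> FeatureConfig:
    """Validate a freshly generated feature list; fall back instead of panicking."""
    # Collapse duplicate rows (e.g. the same column returned once per visible
    # database after a permission change), preserving first-seen order.
    unique = list(dict.fromkeys(rows))
    if len(unique) > MAX_FEATURES:
        # Reject the new file and keep serving with the previous configuration.
        return last_good
    return FeatureConfig(unique)

good = FeatureConfig(["f1", "f2"])
print(load_features(["a", "b", "a", "b"], good).features)  # ['a', 'b']
oversized = [f"f{i}" for i in range(300)]
print(load_features(oversized, good) is good)              # True
```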

DevOps Coach

Nov 13, 2025 · Databases

Explore ClickHouse 25.10: 20 JOIN Boosts, Vector Search & New SQL

ClickHouse 25.10 introduces a suite of enhancements—including 20 JOIN performance upgrades, lazy column replication, Bloom filter runtime filters, disjunction push‑down, automatic column statistics, the QBit vector type, expanded SQL operators, negative LIMIT/OFFSET, Arrow Flight support, and delayed secondary index materialization—backed by detailed benchmarks and contributor acknowledgments.

ClickHouse · SQL Extensions · database

23 min read

Radish, Keep Going!

Oct 28, 2025 · Big Data

How Netflix Achieved Petabyte-Scale, Sub-Second Log Queries with ClickHouse

Netflix ingests over 5 PB of logs a day at millions of events per second. By layering hot and cold storage, fingerprinting messages with a custom lexer, using native‑protocol serialization, and sharding tag maps, it cut ClickHouse query latency from seconds to sub‑second levels.

Big Data · ClickHouse · Log Analytics

8 min read
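Fingerprinting collapses log lines that differ only in variable values (IDs, IPs, numbers) into one template, so messages of the same shape can be grouped and indexed together. The summary does not describe Netflix's lexer, so the following is only a regex-based Python stand-in; the token classes and the 16-hex-character hash length are arbitrary choices for illustration.

```python
import hashlib
import re

# Variable token classes, tried left to right (illustrative, not Netflix's lexer).
_TOKEN = re.compile(
    r"(?P<uuid>\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b)"
    r"|(?P<ip>\b\d{1,3}(?:\.\d{1,3}){3}\b)"
    r"|(?P<hex>\b0x[0-9a-fA-F]+\b)"
    r"|(?P<num>\b\d+(?:\.\d+)?\b)"
)

def template(message: str) -> str:
    """Replace variable tokens with placeholders such as <NUM> or <IP>."""
    return _TOKEN.sub(lambda m: "<" + m.lastgroup.upper() + ">", message)

def fingerprint(message: str) -> str:
    """Stable short ID for the message template."""
    return hashlib.sha1(template(message).encode("utf-8")).hexdigest()[:16]

a = "user 42 connected from 10.1.2.3 in 35.5 ms"
b = "user 7 connected from 192.168.0.9 in 2 ms"
print(template(a))                       # user <NUM> connected from <IP> in <NUM> ms
print(fingerprint(a) == fingerprint(b))  # True
```

A hand-written lexer, as the article describes, can be faster and stricter than a regex like this one, but the output contract (one stable ID per message shape) is the same.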

Xiao Liu Lab

Oct 23, 2025 · Databases

How to Install and Configure ClickHouse on Rocky Linux/CentOS with Remote Access

This step‑by‑step guide shows how to add the Yandex repository, install ClickHouse server and client on Rocky Linux or CentOS, configure the service, test local connections, create databases, enable remote access, and verify the setup, all within 15‑20 minutes.

ClickHouse · Installation · Linux

8 min read

StarRocks

Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration

15 min read

Big Data Tech Team

Oct 12, 2025 · Databases

Why ClickHouse Dominates OLAP: Features, Configurations, Table Engines and Real‑World Use Cases

This article provides an in‑depth technical overview of ClickHouse, covering its OLAP‑focused architecture, key performance features, detailed configuration files, a comprehensive comparison of its many table engines, common troubleshooting tips, and real‑world deployment patterns for recommendation and advertising systems.

ClickHouse · Database Configuration · Kafka Engine

68 min read

JD Tech Talk

Sep 2, 2025 · Databases

Unlock ClickHouse’s Secret Weapons: The 9 Techniques Behind Lightning‑Fast Queries

This article explores ClickHouse’s high‑performance OLAP architecture, covering its MPP design, columnar storage, vectorized execution, pre‑sorting, table engines, data types, sharding and replication strategies, as well as index designs that together enable rapid analysis of massive datasets.

ClickHouse · Columnar Storage · Vectorized Execution

15 min read

JD Cloud Developers

Sep 2, 2025 · Databases

Unlocking ClickHouse’s Lightning‑Fast Queries: The ‘Nine Swords’ Architecture Explained

This article explores ClickHouse’s high‑performance OLAP design—including its MPP architecture, columnar storage, vectorized execution, pre‑sorting, sharding, replication, index strategies, and compute engine—showing how each innovation contributes to ultra‑fast, scalable data analysis in the big‑data era.

ClickHouse · Columnar Storage · OLAP

14 min read

Tech Freedom Circle

Sep 1, 2025 · Databases

How ClickHouse Executes GROUP BY and Handles Real‑Time Analytics on Billions of Rows

This article explains ClickHouse’s core architecture—including its storage‑compute integration, MPP parallelism, columnar storage, vectorized execution, data pre‑sorting, table engines, sparse and auxiliary indexes, and the two‑stage aggregation pipeline—then walks through the exact GROUP BY execution flow for both local and distributed tables, illustrating each step with diagrams, SQL demos, and code snippets.

ClickHouse · Columnar Storage · Distributed Query

29 min read
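The two-stage pattern fits in a few lines of Python: each shard aggregates its local rows into partial states, and the initiator merges those states instead of re-reading raw rows. This is a simplified model for a `sum` aggregate with hypothetical function names. For aggregates such as `avg` or `uniq`, the partial state is richer (a sum/count pair, or a set or sketch), which is why distributed engines exchange intermediate states rather than final values.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def local_aggregate(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Stage 1, on each shard: partial SUM(value) GROUP BY key over local rows."""
    partial: Counter = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def merge_partials(partials: List[Counter]) -> Dict[str, int]:
    """Stage 2, on the initiator: merge the partial states from every shard."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts key by key
    return dict(total)

shard_a = [("cn", 3), ("us", 1)]
shard_b = [("cn", 2), ("de", 4)]
result = merge_partials([local_aggregate(shard_a), local_aggregate(shard_b)])
print(result)  # {'cn': 5, 'us': 1, 'de': 4}
```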

Kuaishou Tech

Jul 31, 2025 · Big Data

How Kuaishou Overcame the ‘Impossible Triangle’ of Performance, Flexibility, and Cost in Real‑Time Big Data Analytics

This article details how Kuaishou’s content middle platform tackled the massive challenges of real‑time, flexible, and cost‑effective data analysis at trillion‑scale by redesigning its architecture, adopting ClickHouse, splitting wide tables, and implementing a scatter‑gather execution model with pre‑shuffle and bitmap optimizations.

Big Data · ClickHouse · Performance Optimization

17 min read
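One reason a pre-shuffle helps in a scatter-gather model is that distinct counts become additive once every row for a given key lives on exactly one shard. The Python sketch below assumes that is what pre-shuffle means here (routing rows by user ID at write time, an interpretation rather than Kuaishou's documented design). It contrasts that layout with round-robin placement, where summing per-shard distinct counts overcounts; all function names are hypothetical.

```python
from typing import List, Set, Tuple

Event = Tuple[int, str]  # (user_id, page)

def uv_by_shard(shards: List[Set[int]]) -> int:
    # Scatter: each shard counts its own distinct users; gather: sum the counts.
    return sum(len(s) for s in shards)

def place_preshuffled(events: List[Event], n: int) -> List[Set[int]]:
    shards: List[Set[int]] = [set() for _ in range(n)]
    for user_id, _page in events:
        shards[user_id % n].add(user_id)  # all rows of one user land on one shard
    return shards

def place_round_robin(events: List[Event], n: int) -> List[Set[int]]:
    shards: List[Set[int]] = [set() for _ in range(n)]
    for i, (user_id, _page) in enumerate(events):
        shards[i % n].add(user_id)  # a user's rows may spread across shards
    return shards

events = [(1, "a"), (1, "b"), (2, "a"), (2, "c")]
print(uv_by_shard(place_preshuffled(events, 2)))  # 2 (exact)
print(uv_by_shard(place_round_robin(events, 2)))  # 4 (each user counted twice)
```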

DataFunSummit

Jul 18, 2025 · Databases

Boosting ClickHouse on WeChat: Performance Tools, Lakehouse Hacks & AI

This article explores how ClickHouse is deployed across WeChat for real‑time analytics, introduces a suite of performance‑monitoring tools, details lakehouse read and bitmap optimizations, and describes the integration of AI‑driven vector search, showcasing substantial speedups and scalability improvements.

AI · Big Data · ClickHouse

12 min read

Architect's Guide

Jul 16, 2025 · Big Data

Efficient 100K‑Record Queries on 10M‑Scale Data: ClickHouse, ES Scroll, ES+HBase

To retrieve up to 100 000 items from a pool of tens of millions, the article compares multi‑threaded ClickHouse pagination, Elasticsearch scroll‑scan deep paging, a combined ES‑HBase approach, and RediSearch + RedisJSON, detailing design, implementation code, performance benchmarks, and trade‑offs.

ClickHouse · HBase · Query Optimization

11 min read

Code Wrench

Jul 4, 2025 · Frontend Development

Boost Web Performance: Leveraging navigator.sendBeacon and Kafka for Efficient Data Transmission

This article explains how to improve front‑end data transfer using the browser's navigator.sendBeacon API and how to streamline back‑end processing with Go, Kafka, and ClickHouse to achieve higher performance and stability in feature‑heavy web applications.

ClickHouse · backend · frontend

8 min read

JD Tech

May 13, 2025 · Databases

Unlock ClickHouse’s Lightning‑Fast Queries: Architecture, Storage, and Index Secrets

This article examines ClickHouse’s high‑performance OLAP design, covering its MPP architecture, columnar storage, vectorized execution, pre‑sorting, table engines, extensive data‑type system, sharding and replication strategies, as well as its sparse and skip‑index mechanisms that together enable ultra‑fast analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage

16 min read

Code Ape Tech Column

Apr 22, 2025 · Big Data

Elasticsearch vs ClickHouse: Performance, Cost, and Deployment Guide for Enterprise Data Analytics

This article compares Elasticsearch and ClickHouse in terms of write throughput, query speed, and server cost, then provides a detailed deployment guide for Zookeeper, Kafka, Filebeat, and ClickHouse clusters, including troubleshooting steps and cost analysis for enterprise data analytics.

ClickHouse · Elasticsearch · cost analysis

15 min read

JD Tech Talk

Apr 21, 2025 · Databases

Optimizing Supply Chain Planning Systems: ClickHouse ReplacingMergeTree and Local Join Solutions

This article discusses how to remove system bottlenecks in supply‑chain planning development, focusing on ClickHouse ReplacingMergeTree table design, local join optimization, and real‑time data synchronization between TiDB and ClickHouse to improve query performance and system stability.

ClickHouse · Data synchronization · Local Join

10 min read
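ReplacingMergeTree deduplicates rows that share the same sorting key, keeping the row with the highest version column, or the last inserted row when versions tie or no version column is set. Because this happens only during background merges, queries usually still need `FINAL` or an `argMax`-style aggregation to see deduplicated data. The Python function below is a hypothetical emulation of that merge rule, not ClickHouse code; the column names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]

def replacing_merge(rows: Sequence[Row], key_cols: Sequence[str], version_col: str) -> List[Row]:
    """Keep one row per sorting key: the highest version, later rows winning ties."""
    latest: Dict[Tuple[object, ...], Row] = {}
    for row in rows:  # rows are assumed to arrive in insertion order
        key = tuple(row[c] for c in key_cols)
        if key not in latest or row[version_col] >= latest[key][version_col]:
            latest[key] = row
    return list(latest.values())

rows = [
    {"sku": "A", "warehouse": 1, "qty": 10, "ver": 1},
    {"sku": "A", "warehouse": 1, "qty": 7, "ver": 3},
    {"sku": "B", "warehouse": 1, "qty": 5, "ver": 2},
    {"sku": "A", "warehouse": 1, "qty": 9, "ver": 2},  # arrives late with an older version: ignored
]
for r in replacing_merge(rows, ["sku", "warehouse"], "ver"):
    print(r)
# {'sku': 'A', 'warehouse': 1, 'qty': 7, 'ver': 3}
# {'sku': 'B', 'warehouse': 1, 'qty': 5, 'ver': 2}
```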

JD Cloud Developers

Apr 21, 2025 · Databases

How ClickHouse Local Join Cuts Query Time and Memory Usage in Supply‑Chain Planning

This article explains how moving aggregation logic from in‑memory processing to ClickHouse SQL, synchronizing configuration data, and leveraging ClickHouse ReplacingMergeTree tables with local joins dramatically reduces query latency and memory consumption for large‑scale supply‑chain planning workloads.

ClickHouse · Database Engineering · Local Join

13 min read

dbaplus Community

Apr 20, 2025 · Databases

Why Wide Tables Fail and How to Design Them Efficiently

This article explains what wide tables are, why they are controversial, outlines three common design pitfalls with practical avoidance tips, and introduces three key technologies—ClickHouse, Cassandra, and Hudi/Iceberg—to help engineers build performant, maintainable wide‑table solutions in data warehouses.

Big Data · Cassandra · ClickHouse

7 min read

JD Retail Technology

Apr 8, 2025 · Databases

ClickHouse Architecture and Core Technologies Overview

ClickHouse is an open‑source, massively parallel, column‑oriented OLAP database that integrates its own columnar storage, vectorized batch processing, pre‑sorted data, diverse table engines, extensive data types, sharding with replication, sparse primary‑key and skip indexes, and a multithreaded query engine, delivering high‑throughput real‑time analytics on massive datasets.

Big Data · ClickHouse · Columnar Storage

15 min read

Ops Development Stories

Mar 19, 2025 · Cloud Native

Unified Multi‑Cluster Monitoring with KubeDoor 1.0: Alerts, Metrics & Best Practices

KubeDoor 1.0 introduces a new architecture for unified multi‑Kubernetes monitoring, offering components for master and agent, flexible deployment options, Helm‑based installation, configurable storage and alerting settings, and detailed guidance on integrating with existing Prometheus/VictoriaMetrics setups while providing automatic peak‑usage data collection.

Alerting · ClickHouse · Cloud Native

14 min read

StarRocks

Mar 4, 2025 · Databases

How NAVER Boosted Query Performance and Scalability by Migrating from ClickHouse to StarRocks

NAVER migrated its massive analytics platform from ClickHouse to StarRocks, achieving dramatic improvements in multi‑table JOIN performance, real‑time aggregation speed, and horizontal scalability while simplifying data integration across heterogeneous sources on a Kubernetes‑based architecture.

ClickHouse · Materialized Views · StarRocks

13 min read

DataFunSummit

Mar 1, 2025 · Databases

Innovations and Breakthroughs of ClickHouse in Real‑Time OLAP

This article introduces ClickHouse as an open‑source column‑store OLAP database, outlines its core features, explains its distributed and cloud‑native architectures—including SharedMergeTree for serverless operation—presents benchmark results, compares community and enterprise editions, and answers common questions about its future direction.

ClickHouse · Cloud Native · Real-time OLAP

15 min read

StarRocks

Feb 27, 2025 · Big Data

How iQIYI Boosted Ad Query Performance 400% with StarRocks – A Deep Dive into OLAP Evolution

This article details iQIYI's transition from Impala+Kudu and ClickHouse to StarRocks, describing the OLAP architecture, performance gains of up to 400% in advertising workloads, the technical challenges of data consistency, lake‑warehouse fusion, operational scaling, and the step‑by‑step migration process using a dual‑run platform.

ClickHouse · Flink · OLAP

15 min read

Bilibili Tech

Feb 21, 2025 · Databases

Applying ClickHouse Bitmap and BSI Techniques for Real-Time Audience Selection in a Data Management Platform

By integrating ClickHouse bitmap structures, a dictionary service for dense ID mapping, and Bit‑Slice Indexes, Bilibili’s Data Management Platform now supports flexible, multi‑dimensional audience selection and profiling over petabyte‑scale data with minute‑level latency, cutting storage by over twenty‑fold and query times from hours to seconds.

BSI · Big Data · ClickHouse

23 min read
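The combination described here can be sketched with three pieces: a dictionary that maps sparse user IDs to dense integers, plain bitmaps for tag membership, and a bit-sliced index (one bitmap per bit of a numeric attribute) for range predicates. In this minimal Python sketch, integers stand in for compressed bitmaps, and the class names, the 8-bit value width and the sample data are hypothetical; it is not Bilibili's implementation. The `greater_equal` scan walks the slices from the most significant bit down, tracking which IDs are already strictly greater than the constant and which are still equal to it so far.

```python
from typing import Dict, List

class DenseIdDictionary:
    """Maps sparse or string user IDs to dense 0..n-1 integers so bitmaps stay compact."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._names: List[str] = []

    def encode(self, user: str) -> int:
        if user not in self._ids:
            self._ids[user] = len(self._names)
            self._names.append(user)
        return self._ids[user]

    def decode(self, bitmap: int) -> List[str]:
        return [name for i, name in enumerate(self._names) if bitmap >> i & 1]

class BitSlicedIndex:
    """One bitmap per bit position of a non-negative integer value (each ID set once)."""

    def __init__(self, bits: int) -> None:
        self.bits = bits
        self.slices = [0] * bits
        self.exists = 0

    def set(self, dense_id: int, value: int) -> None:
        if not 0 <= value < (1 << self.bits):
            raise ValueError("value out of range")
        self.exists |= 1 << dense_id
        for i in range(self.bits):
            if value >> i & 1:
                self.slices[i] |= 1 << dense_id

    def greater_equal(self, c: int) -> int:
        """Bitmap of IDs whose value >= c."""
        if c <= 0:
            return self.exists
        if c >= 1 << self.bits:
            return 0
        gt, eq = 0, self.exists
        for i in reversed(range(self.bits)):
            if c >> i & 1:
                eq &= self.slices[i]       # must also have a 1 here to stay equal
            else:
                gt |= eq & self.slices[i]  # a 1 where c has a 0: strictly greater
                eq &= ~self.slices[i]
        return gt | eq

ids = DenseIdDictionary()
watch_minutes = BitSlicedIndex(bits=8)
for user, minutes in {"u_a": 3, "u_b": 5, "u_c": 6, "u_d": 2}.items():
    watch_minutes.set(ids.encode(user), minutes)
anime_fans = (1 << ids.encode("u_b")) | (1 << ids.encode("u_d"))  # a tag bitmap
print(ids.decode(watch_minutes.greater_equal(5)))               # ['u_b', 'u_c']
print(ids.decode(watch_minutes.greater_equal(5) & anime_fans))  # ['u_b']
```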

dbaplus Community

Feb 3, 2025 · Databases

How to Diagnose and Fix Extreme ClickHouse Load Spikes in Production

A production ClickHouse cluster suddenly showed blacked‑out dashboards due to CPU load soaring above 2,700%, and this guide walks through step‑by‑step diagnostics using system tables, a simple query to spot heavy SQL, and practical remediation actions to restore normal load levels.

ClickHouse · Database Performance · System Tables

7 min read

BirdNest Tech Talk

Jan 31, 2025 · Information Security

Building a Go TCP Scanner to Discover Unauthenticated ClickHouse Services

This article walks through creating a Go‑based TCP SYN scanner to locate public IPs with port 9000 open, verifies whether they run ClickHouse without authentication, and shares the full code, command‑line steps, and scan results that reveal only a handful of vulnerable instances.

ClickHouse · Go · TCP scanning

16 min read

dbaplus Community

Jan 5, 2025 · Big Data

How DeWu Halved Observability Costs Using AutoMQ and ClickHouse Storage‑Compute Separation

DeWu’s observability platform faced scalability, cost, and operational challenges from petabyte‑scale trace data, prompting a shift to a storage‑compute separated architecture that leverages AutoMQ’s Kafka‑compatible service and ClickHouse Enterprise’s SharedMergeTree engine, ultimately achieving up to 50% cost reduction and five‑fold cold‑read performance gains.

AutoMQ · Big Data · ClickHouse

20 min read

ITPUB

Jan 3, 2025 · Databases

Why ClickHouse Sharded Table Queries Return Inconsistent Row Counts—and How to Fix It

A ClickHouse cluster showed wildly varying row counts when querying sharded tables, while local tables behaved correctly; the article analyses the root cause in the cluster and table configuration, explains why the inconsistency occurs, and provides a step‑by‑step fix by switching to replicated tables.

ClickHouse · Query Inconsistency · Sharding

7 min read
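A common way to get this symptom, and plausibly the one described here, is a cluster definition that lists several replicas per shard while the local tables use plain MergeTree. Each replica then holds only the rows inserted into it, and a Distributed query reads each shard from a single replica chosen per query (governed by the `load_balancing` setting). The Python simulation below uses hypothetical row sets to show why counts fluctuate, and how they stabilize once replicas converge, as they do with ReplicatedMergeTree.

```python
import random
from typing import Dict, List, Set

Cluster = Dict[str, List[Set[int]]]  # shard name -> row IDs held by each replica

def distributed_count(cluster: Cluster, rng: random.Random) -> int:
    # Each shard is read from ONE replica, picked at random for this query.
    return sum(len(rng.choice(replicas)) for replicas in cluster.values())

def converge(cluster: Cluster) -> Cluster:
    # Model ReplicatedMergeTree: every replica of a shard ends up with the same rows.
    return {
        shard: [set().union(*replicas) for _ in replicas]
        for shard, replicas in cluster.items()
    }

# Plain MergeTree: inserts went to different replicas, so shard1's copies diverge.
unreplicated: Cluster = {
    "shard1": [set(range(0, 100)), set(range(0, 60))],
    "shard2": [set(range(100, 180)), set(range(100, 180))],
}
counts = {distributed_count(unreplicated, random.Random(seed)) for seed in range(20)}
print(sorted(counts))  # values drawn from {140, 180}, depending on replica choice
fixed = converge(unreplicated)
print({distributed_count(fixed, random.Random(seed)) for seed in range(20)})  # {180}
```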

JD Cloud Developers

Dec 26, 2024 · Databases

How ClickHouse Powers Billion‑User Tagging with Efficient Bitmap Storage

This article explains how ClickHouse’s columnar storage, compression, and bitmap functions enable fast, scalable processing of billions of user tags and groups in a CDP, covering data storage design, bitmap generation, and distributed query optimization.

CDP · ClickHouse · Columnar Database

11 min read

StarRocks

Dec 25, 2024 · Databases

Cutting Costs 40% and Halving Query Latency: Our ClickHouse‑to‑StarRocks Migration

Facing high costs and scaling limits with ClickHouse, we migrated a 4000‑core, 500TB OLAP workload to StarRocks, achieving 40% cost reduction, 50% storage savings, and up to 30× query speedups through storage‑compute separation, materialized‑view rewrites, and extensive performance tuning.

ClickHouse · Materialized Views · OLAP

18 min read

dbaplus Community

Dec 24, 2024 · Big Data

How Bilibili Scaled Its Tag System for Massive Data and Real‑Time Accuracy

The article details Bilibili's comprehensive redesign of its tag system—including background challenges, architectural layers, technical upgrades like Iceberg integration and shard‑based ClickHouse writes, crowd selection methods, online service guarantees, performance metrics, and future plans—showcasing a data‑driven solution that boosts stability, speed, and business coverage.

ClickHouse · Data Engineering · Distributed Computing

0 likes · 24 min read

JD Tech Talk

Dec 13, 2024 · Databases

An Introduction to ClickHouse: Columnar Storage, Features, and Use Cases

This article introduces ClickHouse, an open‑source column‑oriented distributed database, explaining its columnar storage model, key performance and scalability features, rich analytical capabilities, and the scenarios where it excels or falls short in big‑data processing.

Big Data · ClickHouse · Columnar Database

0 likes · 6 min read

JD Cloud Developers

Dec 13, 2024 · Databases

Why ClickHouse Is Revolutionizing Big Data Analytics with Columnar Storage

ClickHouse, an open‑source column‑oriented distributed database from Yandex, offers high performance, efficient compression, vectorized execution, and scalable architecture, making it ideal for large‑scale analytics, log processing, monitoring, and data warehousing, while noting its limitations in transactions and strong consistency.

ClickHouse · Columnar Database · Data Analytics

0 likes · 5 min read

Architecture & Thinking

Nov 15, 2024 · Databases

How Baidu’s TDE‑ClickHouse Delivers Sub‑Second Analytics on Billion‑Row Datasets

This article explains how Baidu’s TDE‑ClickHouse, as a core engine of the Turing 3.0 ecosystem, overcomes platform fragmentation, quality issues, and usability challenges through the OneData+ development paradigm, multi‑level aggregation, projection, query‑caching, bulk‑load ingestion, and a cloud‑native architecture to achieve sub‑second query response for massive data volumes.

Big Data · ClickHouse · Cloud Native

0 likes · 22 min read

Bilibili Tech

Nov 12, 2024 · Big Data

Scalable Tag System Architecture and Optimization

The rebuilt tag system introduces a three‑layer architecture, standard pipelines, Iceberg‑backed storage and custom ClickHouse sharding, a DSL for crowd selection, and a stateless online service, achieving 99.9% success, sub‑5 ms latency, and supporting thousands of tags across dozens of business scenarios while planning real‑time processing and automated lifecycle management.

ClickHouse · Iceberg · Online Service

0 likes · 23 min read

macrozheng

Nov 7, 2024 · Backend Development

9 Proven Techniques to Supercharge Pagination Query Performance

This article presents nine practical strategies (adding default filters, limiting page size, reducing joins, optimizing indexes, using straight_join, archiving data, making count(*) efficient, offloading queries to ClickHouse, and implementing read‑write splitting) to improve the speed and scalability of pagination APIs in MySQL‑based back‑ends.

ClickHouse · Database Performance · Indexing

0 likes · 11 min read

BirdNest Tech Talk

Nov 3, 2024 · Databases

Master ClickHouse Write Performance: Proven Optimization Strategies

This comprehensive guide walks through ClickHouse write‑performance optimization, covering hardware choices, system and application‑level tuning, async insert settings, Buffer engine configuration, storage compression, real‑world case studies, monitoring queries, and actionable best‑practice recommendations.

Async Insert · Buffer Engine · ClickHouse

0 likes · 12 min read

Programmer XiaoFu

Oct 30, 2024 · Databases

How to Boost Pagination Queries for a Million Products by 10×

This article walks through nine practical techniques—default filters, smaller page sizes, fewer joins, index tuning, straight_join, data archiving, efficient count(*), ClickHouse offloading, and read/write splitting—to dramatically improve the performance of pagination APIs handling millions of product records.

ClickHouse · Index Optimization · SQL

0 likes · 11 min read

Code Ape Tech Column

Oct 24, 2024 · Databases

Elasticsearch vs ClickHouse: Performance Comparison, Cost Analysis, and Deployment Guide

This article compares Elasticsearch and ClickHouse on write throughput and query speed, analyzes their server costs, and offers step‑by‑step deployment instructions for ZooKeeper, Kafka, Filebeat, and ClickHouse, including troubleshooting tips and configuration examples.

ClickHouse · Elasticsearch · ZooKeeper

0 likes · 13 min read

Baidu Tech Salon

Oct 22, 2024 · Big Data

TDE-ClickHouse: Baidu MEG's High-Performance Big Data Analytics Engine

TDE‑ClickHouse, the core engine of Baidu MEG’s Turing 3.0 ecosystem, delivers sub‑second, self‑service analytics on petabyte‑scale data by decoupling compute, adding multi‑level aggregation, high‑cardinality and rule‑based optimizations, a two‑phase bulk‑load pipeline, cloud‑native deployment, and a lightweight meta service, now powering over 350 000 cores, 10 PB storage and more than 150 000 daily BI queries with average response times under three seconds.

ClickHouse · Database Architecture · Query Optimization

0 likes · 19 min read

Baidu Geek Talk

Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem boosts query speed up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350 k CPU cores, 10 PB storage, and sub‑3‑second responses for 150 k daily BI queries.

Baidu MEG · ClickHouse · Cloud Native

0 likes · 19 min read

Senior Tony

Sep 19, 2024 · Databases

Why ClickHouse Outperforms MySQL: Deep Dive into Architecture and Benchmarks

This article compares ClickHouse and MySQL by examining benchmark results, MPP architecture, columnar storage, compression techniques, vectorized execution, and index designs, showing why ClickHouse delivers dramatically higher query performance on massive data sets.

ClickHouse · Columnar Storage · Databases

0 likes · 8 min read

Bilibili Tech

Aug 23, 2024 · Big Data

Accelerating Multi‑Dimensional OLAP Queries in ClickHouse with Grouping Sets, RBM, and Dense Dictionary Encoding

To achieve sub‑second, multi‑dimensional analytics on Bilibili’s hundred‑million‑row datasets, the team built a ClickHouse‑based acceleration layer that combines grouping‑set pre‑aggregation, bitmap (RBM) distinct handling, and a dense dictionary encoding service, dramatically cutting CPU, memory and query latency versus traditional OLAP pipelines.

Big Data · ClickHouse · Data Warehouse

0 likes · 28 min read

Wukong Talks Architecture

Aug 6, 2024 · Databases

Migrating Tencent Music's Data Infrastructure from ClickHouse and Druid to StarRocks: Strategy, Implementation, and Best Practices

This article details how Tencent Music’s data‑infrastructure team migrated thousands of ClickHouse and Druid nodes to a StarRocks compute‑storage‑separated lakehouse, achieving 40‑50% cost reduction while maintaining query performance, and shares the technical challenges, solutions, and best‑practice recommendations gathered during the process.

ClickHouse · Data Migration · Druid

0 likes · 19 min read

Past Memory Big Data

Aug 2, 2024 · Big Data

How Haijing Tech Built a Real-Time Telecom Analytics Platform with ByConity

Haijing Technology faced Hadoop's real‑time limits and ClickHouse's operational pain points, so it adopted the open‑source ByConity platform, which provides a unified table engine, fast multi‑table joins, and seamless scaling to deliver a carrier‑grade real‑time analytics solution.

Big Data · ByConity · ClickHouse

0 likes · 11 min read

DataFunTalk

Jul 25, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Challenges, Solutions, and Future Outlook

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda architecture, its shortcomings, the integration of a data‑lake‑based solution, iterative redesigns, technical trade‑offs, and future directions for real‑time analytics.

ClickHouse · Flink · Hudi

0 likes · 25 min read

DataFunSummit

Jul 20, 2024 · Databases

Real-time Data Update Solutions in TCHouse‑C: Architecture, Schema‑less Design, and Performance Evaluation

This article presents TCHouse‑C, a cloud‑native ClickHouse service, detailing its real‑time data update architecture, schema‑less ingestion, various update strategies such as Delete‑Insert and lightweight‑update/delete, and comprehensive performance tests comparing UniqueMergeTree with standard ClickHouse engines across import, query, and update workloads.

ClickHouse · Data Warehouse · Delete-Insert

0 likes · 32 min read

JD Cloud Developers

Jul 17, 2024 · Databases

Choosing the Right Database: MySQL, Redis, HBase, ClickHouse, MongoDB, Elasticsearch, Neo4j, Prometheus & Milvus Explained

Explore nine major database technologies—from traditional relational MySQL to NoSQL Redis, columnar HBase and ClickHouse, document-oriented MongoDB, search engine Elasticsearch, graph Neo4j, time‑series Prometheus, and vector Milvus—plus practical best‑practice guides, real‑world polyglot persistence scenarios, and recommended resources for mastering modern data storage.

ClickHouse · Databases · Elasticsearch

0 likes · 50 min read

JD Tech Talk

Jul 17, 2024 · Databases

A Comprehensive Guide to 9 Database Types and Polyglot Persistence

This article provides an in‑depth overview of nine major database categories—including relational, key‑value, columnar, document, graph, time‑series, and vector databases—detailing their strengths, weaknesses, best practices, and typical application scenarios, and explains how polyglot persistence combines multiple databases for optimal performance and scalability.

ClickHouse · Databases · Elasticsearch

0 likes · 41 min read

JD Tech

Jul 15, 2024 · Databases

A Comprehensive Overview of Nine Database Types and Polyglot Persistence Practices

This article provides an in‑depth survey of nine database categories—including relational, key‑value, columnar, document, graph, time‑series, and vector databases—detailing their architectures, advantages, disadvantages, best‑practice recommendations, typical use cases, and how they can be combined in polyglot persistence solutions.

ClickHouse · Database Types · HBase

0 likes · 41 min read

DataFunTalk

Jul 11, 2024 · Backend Development

Performance Optimizations and Benchmark Analysis of RaftKeeper v2.1.0

The article presents a detailed engineering analysis of RaftKeeper v2.1.0, describing its benchmark methodology, the performance gains across create, mixed, and list workloads, and the optimizations behind them (parallel response serialization, list‑request handling, system‑call reduction, thread‑pool redesign, and asynchronous snapshot processing), which deliver substantial throughput and latency improvements in large‑scale ClickHouse deployments.

C++ · ClickHouse · RaftKeeper

0 likes · 12 min read

dbaplus Community

Jul 10, 2024 · Databases

Why ClickHouse Dominates OLAP Performance: An In‑Depth Architecture Guide

This article explains ClickHouse’s columnar, MPP‑based design, block compression, LSM‑style pre‑sorting, sparse primary indexes and data‑skipping indexes, and vectorized execution, while also discussing its high‑frequency write challenges, concurrency limits, and production‑grade issues such as ZooKeeper load and resource management.

ClickHouse · Columnar Database · Indexing

0 likes · 11 min read

Aikesheng Open Source Community

Jul 9, 2024 · Databases

Resolving ClickHouse “too many mutations” Errors by Cleaning Mutations and Switching to ReplacingMergeTree

The article describes a real‑world ClickHouse incident where excessive UPDATE‑style mutations caused a “too many mutations(1036)” error, explains the cluster’s configuration, and details a step‑by‑step recovery process that clears pending mutations and migrates tables to the ReplacingMergeTree engine to restore service.

ClickHouse · ReplacingMergeTree · Table Engine

0 likes · 7 min read

JD Cloud Developers

Jul 3, 2024 · Big Data

How to Build a High‑Availability Real‑Time Logistics Dashboard with Flink and ClickHouse

This article details the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard, covering Flink‑based data pipelines, ClickHouse OLAP storage, metric consistency, stability measures, extensible configuration, and comprehensive monitoring to ensure accurate, scalable performance during major promotions.

Big Data · ClickHouse · Flink

0 likes · 9 min read

JD Tech Talk

Jul 3, 2024 · Big Data

Real-time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Processing, and Stability Practices

This article describes the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard using Flink and ClickHouse, covering data processing pipelines, metric consistency, stability mechanisms, extensible configurations, and monitoring techniques to guide similar large‑screen projects.

ClickHouse · Flink · Stability

0 likes · 9 min read

JD Tech

Jul 2, 2024 · Big Data

Real‑Time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Modeling, and Stability Design

This article presents the design and implementation of a high‑availability, real‑time logistics supply‑chain monitoring dashboard, covering its data processing pipeline with Flink, storage choices between Elasticsearch and ClickHouse, multi‑layer architecture, metric consistency, stability mechanisms, extensibility configurations, and monitoring practices.

Big Data · ClickHouse · Elasticsearch

0 likes · 11 min read

DataFunTalk

Jun 28, 2024 · Big Data

Accelerating Spark with ClickHouse: Native Optimization Techniques and Performance Evaluation

This article presents a comprehensive technical overview of using ClickHouse as a native backend to accelerate Spark SQL execution, covering Spark performance bottlenecks, ClickHouse's CPU‑level optimizations, the design and implementation of the Spark‑Native integration, and detailed TPC‑DS benchmark results demonstrating up to 3.5× speedup.

Big Data · ClickHouse · Performance Optimization

0 likes · 33 min read

Baidu Geek Talk

Jun 24, 2024 · Big Data

Accelerating Spark with ClickHouse Native Techniques: Design, Implementation, and Performance Evaluation

The article presents a Spark acceleration framework that replaces Java‑based task operators with a ClickHouse native library, converting plans via Protobuf and JNI and leveraging columnar storage, SIMD, and JIT to achieve up to 3× speed‑up on TPC‑DS workloads, with fallback mechanisms that prevent performance regressions.

Big Data · ClickHouse · Native Acceleration

0 likes · 31 min read

Baidu Intelligent Cloud Tech Hub

Jun 24, 2024 · Big Data

Boost Spark Performance with ClickHouse: Native Acceleration Techniques

This article presents a detailed technical overview of accelerating Spark's compute engine using ClickHouse as a native backend, covering Spark performance background, ClickHouse's advantages, the design and implementation of a Spark‑Native acceleration solution, and extensive performance evaluation results.

ClickHouse · Native Acceleration · Performance Optimization

0 likes · 34 min read

JD Tech Talk

Jun 14, 2024 · Artificial Intelligence

Building a Retrieval‑Augmented Generation (RAG) System with JD Cloud Docs, ClickHouse, LangChain, and FastAPI

This guide explains how to build a Retrieval‑Augmented Generation (RAG) system using JD Cloud documentation as a knowledge base, storing document embeddings in ClickHouse, leveraging LangChain for vector retrieval, and exposing query and answer services via FastAPI and a Gradio UI.

AI · ClickHouse · FastAPI

0 likes · 13 min read

JD Cloud Developers

Jun 14, 2024 · Artificial Intelligence

Build a Retrieval‑Augmented Generation (RAG) System Using JD Cloud Docs and ClickHouse

This guide walks through creating a Retrieval‑Augmented Generation pipeline that harvests JD Cloud documentation, stores vector embeddings in ClickHouse, and serves queries via FastAPI, LangChain, a Qwen LLM, and a Gradio front‑end.

ClickHouse · FastAPI · LLM

0 likes · 14 min read

DataFunTalk

Jun 9, 2024 · Big Data

Optimizing ClickHouse Performance in WeChat: Observation Tools, Lakehouse Reading, Bitmap Acceleration, and AI Integration

This article details how the WeChat team leverages ClickHouse at massive scale, introduces a suite of performance observation tools, describes lakehouse reading and bitmap optimizations, and explains the integration of AI workloads, demonstrating overall query speedups of up to tenfold across diverse scenarios.

Big Data · ClickHouse · Lakehouse

0 likes · 10 min read

ITPUB

Jun 9, 2024 · Databases

Doris vs ClickHouse: Which Database Fits Your Workload?

This article compares Doris and ClickHouse across architecture, table creation, ecosystem integration, management tools, query performance, and join capabilities, offering practical guidance on how to choose the right database based on your specific data processing and operational requirements.

ClickHouse · Data Warehouse · Database Comparison

0 likes · 10 min read

DataFunSummit

Jun 8, 2024 · Big Data

Case Study: Building a High‑Performance Advertising Platform with ClickHouse Enterprise

This article presents a detailed case study of how EasyPoint built a scalable, stable advertising platform using ClickHouse Enterprise, covering company background, data architecture with Kafka and Druid, ClickHouse advantages, serverless resource scaling, and extensive performance benchmarks.

Big Data · ClickHouse · Data Architecture

0 likes · 11 min read

ITPUB

May 26, 2024 · Cloud Native

Containerizing Elasticsearch & ClickHouse on Kubernetes: Bilibili’s Scalable, Low‑Cost Solution

This article details Bilibili’s journey of containerizing Elasticsearch and ClickHouse on Kubernetes, covering the challenges of stateful services, architectural decisions, custom operators, storage and network solutions, deployment steps, observability enhancements, and the resulting cost, quality, and efficiency gains.

ClickHouse · Cloud Native · Elasticsearch

0 likes · 38 min read

ITPUB

May 21, 2024 · Databases

Can ClickHouse Distributed Tables Outperform Single-Node Tables? A Real-World Benchmark

This article presents a systematic benchmark comparing ClickHouse local (single‑node) tables and distributed tables across three data volumes—≈60 billion, 5 billion and 50 million rows—using a variety of aggregation and filter queries, and reveals that distributed tables dominate at large scale while the gap narrows as the dataset shrinks.

ClickHouse · Distributed Tables · Local Tables

0 likes · 13 min read

DataFunSummit

May 18, 2024 · Big Data

Building a User Profile Platform with ClickHouse at 58.com: Architecture and Optimization

This article describes how 58.com designed and implemented a large‑scale user profiling platform using ClickHouse, covering system overview, core modules, major challenges of scale, complexity and performance, and the detailed storage, query, and optimization techniques applied to meet business needs.

Big Data · ClickHouse · Data Architecture

0 likes · 11 min read

Sohu Tech Products

Apr 24, 2024 · Big Data

How to Build a ClickHouse‑Powered Retention Analysis Model for User Behavior

This article explains the concepts, formulas, and step‑by‑step implementation of a user‑retention analysis model, covering both Hive‑based offline processing and ClickHouse‑accelerated real‑time queries, complete with SQL examples, architecture diagrams, and practical optimization tips.

Big Data · ClickHouse · Data Visualization

0 likes · 19 min read

Past Memory Big Data

Apr 23, 2024 · Big Data

ByConity Replaces ClickHouse for OLAP, Cutting Resource Costs Over 50%

MetaApp replaced ClickHouse with the open‑source cloud‑native warehouse ByConity, achieving more than 50% reduction in resource costs while delivering comparable or faster OLAP query performance across distinct, retention, conversion, and point‑lookup workloads, thanks to compute‑storage separation, read/write isolation, and minute‑level elastic scaling.

ByConity · ClickHouse · Cloud Native

0 likes · 15 min read

DataFunSummit

Apr 18, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda‑based design, its shortcomings, the transition to a data‑lake‑integrated architecture, iterative improvements, encountered technical and non‑technical issues, and future outlooks.

ClickHouse · Data Lake · Flink

0 likes · 24 min read

vivo Internet Technology

Apr 17, 2024 · Big Data

Retention Analysis Model Practice Based on ClickHouse

The article explains retention analysis models, their importance for user loyalty, outlines offline Hive architecture, then shows how ClickHouse’s retention() function and columnar storage dramatically speed up multi‑day retention calculations, providing SQL examples and practical guidance for product analytics.

ClickHouse · Hive · Retention Analysis

0 likes · 17 min read

Open Source Tech Hub

Apr 12, 2024 · Databases

Boost AI Vector Search with MyScaleDB: ClickHouse‑Powered SQL Database

MyScaleDB is a high‑performance, cost‑effective SQL vector database built on ClickHouse that lets developers use familiar SQL to store, index, and search billions of vectors alongside structured data, offering fast, accurate AI retrieval and seamless integration with existing tools.

AI · ClickHouse · MyScaleDB

0 likes · 11 min read

ITPUB

Apr 11, 2024 · Big Data

Query 100K Items from 10M+ Records: CK, ES Scroll, HBase, RediSearch

When faced with a business requirement to filter up to 100 000 records from a pool of tens of millions and then sort and de‑duplicate them, this article explores four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan, a combined Elasticsearch‑HBase approach, and RediSearch with RedisJSON—detailing their design, implementation, performance testing, and trade‑offs.

Big Data · ClickHouse · Elasticsearch

0 likes · 12 min read

NetEase Cloud Music Tech Team

Apr 11, 2024 · Backend Development

Design and Implementation of an Online Configurable Data Consumption Service for NetEase Cloud Music Frontend Performance Monitoring (Corona)

The article details NetEase Cloud Music’s end‑to‑end, online‑configurable data‑consumption service and schema‑driven visualization platform that transform raw client logs into ClickHouse records, automatically generate tables and dashboards, and provide observability, dramatically reducing manual effort while supporting over twenty performance metrics for frontend monitoring.

ClickHouse · Data Pipeline · Frontend

0 likes · 17 min read

dbaplus Community

Apr 8, 2024 · Cloud Native

Containerizing Elasticsearch & ClickHouse on Kubernetes: Challenges & Solutions

Facing the complexities of running stateful services like Elasticsearch and ClickHouse in production, Bilibili’s infrastructure team detailed their migration to Kubernetes, describing the architectural design, custom operators, storage provisioning with LVM, network configuration, high‑availability strategies, observability, and the resulting cost, quality, and efficiency gains.

ClickHouse · Cloud Native · Elasticsearch

0 likes · 37 min read
