Explore ClickHouse 25.10: 20 JOIN Boosts, Vector Search & New SQL
ClickHouse 25.10 introduces a suite of enhancements—including 20 JOIN performance upgrades, lazy column replication, Bloom filter runtime filters, disjunction push‑down, automatic column statistics, the QBit vector type, expanded SQL operators, negative LIMIT/OFFSET, Arrow Flight support, and delayed secondary index materialization—backed by detailed benchmarks and contributor acknowledgments.
ClickHouse 25.10 was officially released, bringing 20 new JOIN‑related features, 30 performance optimizations, and 103 bug fixes. The update focuses on reducing memory and CPU consumption during joins, improving query planning, and extending the SQL language.
Lazy Column Replication
This mechanism avoids repeated copying of identical column values during JOINs. Instead of materialising duplicate strings, ClickHouse stores a compact index that points to the original values, copying them only when truly needed.
Enable it with the settings:
SET enable_lazy_columns_replication = 1,
allow_special_serialization_kinds_in_output_formats = 1;Benchmark on an AWS m6i.8xlarge instance (32 vCPU, 128 GiB) using a self‑join on a hits table shows a speed‑up from 83 s to 4 s (≈20×) and a slight reduction in peak memory usage.
Bloom Filter Runtime Filters
ClickHouse now builds a Bloom filter on the join key of the right‑hand table and applies it as a PREWHERE filter on the left‑hand table, skipping irrelevant rows before the join.
Activate with: SET enable_join_runtime_filters = 1; Using the TPC‑H orders and customer tables, the query execution time dropped from 1.005 s to 0.471 s (≈2.1× faster) while peak memory fell from 1.24 GiB to 185 MiB.
Disjunction Push‑Down
Complex OR conditions that contain predicates for both sides of a join are now split and pushed down to each table, allowing early filtering.
Example:
(t1.k IN (1,2) AND t2.x = 100) OR
(t1.k IN (3,4) AND t2.x = 200)After push‑down, the left table receives t1.k IN (1,2,3,4) and the right table receives t2.x IN (100,200), reducing data read and improving performance. Benchmarks on the same TPC‑H dataset show a 24× speed‑up and >60× memory reduction.
Automatic Column Statistics
The new auto_statistics_types table setting automatically builds statistics (e.g., minmax, uniq, countmin) for all applicable columns of a MergeTree table, improving join reordering decisions without manual intervention.
CREATE TABLE tpch.orders (...)
SETTINGS auto_statistics_types = 'minmax, uniq, countmin';QBit Vector Data Type
QBit is a novel type for vector embeddings that stores values bit‑wise, allowing runtime selection of the number of high‑order bits used for search, thus trading precision for speed.
CREATE TABLE vectors (
id UInt64,
name String,
vec QBit(BFloat16, 1536)
) ORDER BY ();Query example:
SELECT id, name FROM vectors
ORDER BY L2DistanceTransposed(vector, target, 10)
LIMIT 10;SQL Extensions
IS NOT DISTINCT FROM ( <=>) is now available in all SQL contexts, treating NULL as equal.
Negative LIMIT and OFFSET values enable retrieving the most recent rows while preserving ascending order. LIMIT BY ALL provides a concise way to return a single row per distinct combination of selected columns.
Examples:
SELECT NULL <=> NULL, NULL = NULL; SELECT date, price FROM uk_price_paid
WHERE date >= '2024-01-01' AND price > 10000000
ORDER BY date DESC LIMIT -10; SELECT town, district, type FROM uk_price_paid
WHERE county = 'GREATER LONDON' AND price > 10000000 AND type <> 'other'
ORDER BY price DESC
LIMIT 1 BY ALL;Arrow Flight Support
ClickHouse now fully supports the Arrow Flight protocol as both client and server. Enable the server with a simple configuration file:
arrowflight_port: 6379
arrowflight:
enable_ssl: false
auth_required: falseQuery via Arrow Flight:
SELECT max(price), count()
FROM arrowflight('localhost:6379', 'uk_price_paid', 'default', '');Delayed Materialisation of Secondary Indexes
A new setting allows postponing the materialisation of costly secondary indexes (e.g., vector similarity indexes) until a later merge phase, dramatically speeding up bulk inserts.
SET exclude_materialize_skip_indexes_on_insert = 'vector_idx';After inserting into a copy table dbpedia2, the write time dropped from ~6 s to ~0.5 s. The index can be materialised later with:
ALTER TABLE dbpedia2 MATERIALIZE INDEX vector_idx SETTINGS mutations_sync = 2;Contributors
The release acknowledges dozens of new contributors who made their first commits in version 25.10, as well as specific contributors for each feature (e.g., Pavel Kruglov for lazy column replication, Alexander Gololobov for Bloom filter runtime filters, Yarik Briukhovetskyi for disjunction push‑down, Anton Popov for automatic statistics, Raufs Dunamalijevs for QBit, Nihal Z. Miaji et al. for SQL extensions, zakr600 & Vitaly Baranov for Arrow Flight, George Larionov for delayed index materialisation).
Overall Impact
These four major join‑related optimisations—lazy column replication, Bloom‑filter runtime filters, disjunction push‑down, and automatic statistics—represent a significant step forward for ClickHouse’s query engine, delivering multi‑fold speed improvements and substantial memory savings while expanding the system’s analytical capabilities.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
