How ByConity 0.3.0 Boosts Text Search and Cold‑Read Performance with Inverted Indexes

ByConity 0.3.0 introduces an inverted index with Chinese tokenization, a leader-election mechanism based on shared storage, better cold-read performance through Prefetch and adaptive mark allocation, and an upgraded ELT pipeline with a BSP mode. This article covers the implementation, a usage example, and the future roadmap.

ITPUB

Dec 28, 2023

Inverted Index Support

Many ByConity workloads need high-performance text search (e.g., StringLike). To meet this demand while staying compatible with ClickHouse's inverted-index feature, ByConity 0.3.0 adds text-search support built on inverted indexes, including Chinese tokenization and I/O optimizations.

The work is planned in two phases:

Phase 1: Enhance the ClickHouse community's inverted-index feature.

Phase 2: Add advanced text-search capabilities such as phrase queries, fuzzy matching, and JSON-type support.

The inverted index maps values to row identifiers, allowing the engine to locate matching rows without scanning large data blocks and reducing filter‑condition computation.

Write-path changes: generate an inverted index for each indexed column at write time and store it in remote storage.

Read-path changes: build expressions from the query's filter conditions and use the index to prune data ranges during query execution.
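The idea behind both paths can be sketched in a few lines of Python. This is a conceptual illustration only, not ByConity's implementation: the whitespace tokenizer, the `GRANULE_ROWS` constant, and all function names are assumptions made for the example.

```python
from collections import defaultdict

GRANULE_ROWS = 4  # hypothetical granule size, kept tiny for illustration


def tokenize(text):
    """Hypothetical whitespace tokenizer; a Chinese tokenizer would differ."""
    return text.lower().split()


def build_inverted_index(docs):
    """Write path: map each token to the sorted row ids that contain it."""
    postings = defaultdict(set)
    for row_id, text in enumerate(docs):
        for token in tokenize(text):
            postings[token].add(row_id)
    return {token: sorted(rows) for token, rows in postings.items()}


def granules_to_read(index, token):
    """Read path: turn a token filter into the granules that can hold matches."""
    rows = index.get(token.lower(), [])
    return sorted({row_id // GRANULE_ROWS for row_id in rows})


docs = [
    "disk full on node a",      # row 0
    "query finished",           # row 1
    "query timeout on node b",  # row 2
    "merge finished",           # row 3
    "merge started",            # row 4
    "disk replaced",            # row 5
    "query finished",           # row 6
    "node c joined",            # row 7
    "query timeout on node c",  # row 8
]
index = build_inverted_index(docs)
print(index["timeout"])                    # row ids containing "timeout"
print(granules_to_read(index, "timeout"))  # granules that still need scanning
```

A filter on a token that appears in only a few rows touches only the granules holding those rows; every other granule is skipped without being read.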

After adding the inverted index, the write and read flows are illustrated below.

Usage example – Chinese token split

CREATE TABLE chinese_token_split (
    `key` UInt64,
    `doc` String,
    -- First argument: the tokenizer to use, here token_chinese_default.
    -- Second argument: the tokenizer configuration; 'default' selects the default one.
    INDEX inv_idx doc TYPE inverted('token_chinese_default', 'default', 1.0) GRANULARITY 1
) ENGINE = MergeTree ORDER BY key;

Chinese tokenization also requires additional dictionary and model configuration in the ByConity config file.

Future work will extend text search with phrase queries, fuzzy matching, and relevance scoring, add JSON-type support, and optimize both index-based row retrieval and index reuse during the merge process.

Shared‑Storage Leader Election

ByConity’s architecture contains multiple control nodes (e.g., Resource Manager, TSO) that need high‑availability leader election. The previous solution used clickhouse‑keeper (Raft‑based), which required three or more nodes and introduced operational complexity.

Because ByConity is a cloud-native service, a new leader-election method based on shared storage and compute-storage separation was designed. The competition for leadership is modeled as a multi-thread synchronization problem: candidates use CAS (compare-and-swap) operations on a KV store to emulate Linux mutex wake-up semantics.

Key ideas:

Each candidate node acts like a thread.

The KV store provides CAS writes that guarantee visibility order.

The node that successfully performs CAS becomes the leader.

The leader’s address is written as the value, enabling readers to discover the service without knowing non‑leader addresses.

The election process includes stages such as candidate selection, campaigning, victory, inauguration, renewal, voluntary and involuntary resignation, and handling term expiration.
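The campaigning, renewal, resignation, and term-expiration stages can be sketched as follows. This is a minimal illustration under stated assumptions, not ByConity's code: `InMemoryKV` stands in for the shared KV store, the `leader` key, the lease length, and every class and method name are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
import threading


class InMemoryKV:
    """Stand-in for a shared KV store that offers compare-and-swap (CAS)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


class Candidate:
    LEADER_KEY = "leader"  # hypothetical key name

    def __init__(self, kv, address, lease_seconds=10.0):
        self.kv = kv
        self.address = address
        self.lease_seconds = lease_seconds

    def _record(self, now):
        # The value holds the leader's address plus the time its term expires.
        return (self.address, now + self.lease_seconds)

    def campaign(self, now):
        """Try to take the seat; works if it is empty or the term has expired."""
        current = self.kv.get(self.LEADER_KEY)
        if current is not None and current[1] > now and current[0] != self.address:
            return False  # another node holds an unexpired term
        return self.kv.compare_and_swap(self.LEADER_KEY, current, self._record(now))

    def renew(self, now):
        """Extend our own term; fails if leadership was lost in the meantime."""
        current = self.kv.get(self.LEADER_KEY)
        if current is None or current[0] != self.address or current[1] <= now:
            return False
        return self.kv.compare_and_swap(self.LEADER_KEY, current, self._record(now))

    def resign(self):
        """Voluntary resignation: clear the seat only if we still hold it."""
        current = self.kv.get(self.LEADER_KEY)
        if current is None or current[0] != self.address:
            return False
        return self.kv.compare_and_swap(self.LEADER_KEY, current, None)


def current_leader(kv, now):
    """Service discovery: read the leader's address from the stored value."""
    record = kv.get(Candidate.LEADER_KEY)
    if record is None or record[1] <= now:
        return None
    return record[0]


kv = InMemoryKV()
a = Candidate(kv, "10.0.0.1:9000")
b = Candidate(kv, "10.0.0.2:9000")
print(a.campaign(now=0.0))          # a campaigns for the empty seat
print(b.campaign(now=1.0))          # b campaigns while a's term is valid
print(current_leader(kv, now=1.0))  # readers discover the leader's address
print(b.campaign(now=11.0))         # b campaigns after a's term has expired
```

Because every state change goes through a single CAS on one key, two candidates that race for the same expired term cannot both succeed: the second CAS sees a value that no longer matches what it read.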

In Kubernetes deployments, scaling the number of replicas automatically adjusts the number of Pods participating in the election. In physical deployments, the scheme works without additional service‑discovery configuration, eliminating the need for clickhouse‑keeper.

Cold‑Read Performance Enhancements

Version 0.2.0 introduced IOScheduler to improve cold reads on S3. Version 0.3.0 adds a ReadBuffer Preload and a Prefetch mechanism that pushes Mark‑Range filtering down to the execution side.

The Prefetch workflow consists of three stages:

Stage 1: Evenly distribute all required marks across threads.

Stage 2: Subdivide each thread’s marks into multiple tasks to enable work‑stealing.

Stage 3: Prefetch data per task.

Larger tasks reduce the number of network requests but leave fewer, coarser units for work-stealing, which can hurt load balancing across threads.

Prefetch does not need to read the entire S3 stream; data can be consumed on‑demand after the request returns.
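The first two stages amount to a two-level split of the mark list. The sketch below illustrates that split under simple assumptions (contiguous slices, a fixed `marks_per_task`); the function names are hypothetical and the actual prefetch request is left out.

```python
def distribute_marks(marks, num_threads):
    """Stage 1: split marks into near-equal contiguous slices, one per thread."""
    base, extra = divmod(len(marks), num_threads)
    slices, start = [], 0
    for thread_id in range(num_threads):
        size = base + (1 if thread_id < extra else 0)
        slices.append(marks[start:start + size])
        start += size
    return slices


def split_into_tasks(thread_marks, marks_per_task):
    """Stage 2: cut one thread's marks into tasks that other threads can steal."""
    return [
        thread_marks[i:i + marks_per_task]
        for i in range(0, len(thread_marks), marks_per_task)
    ]


def plan_prefetch(marks, num_threads, marks_per_task):
    """Per-thread task lists; Stage 3 would issue one prefetch per task."""
    return [
        split_into_tasks(thread_marks, marks_per_task)
        for thread_marks in distribute_marks(marks, num_threads)
    ]


plan = plan_prefetch(list(range(10)), num_threads=3, marks_per_task=2)
for thread_id, tasks in enumerate(plan):
    print(thread_id, tasks)
```

Raising `marks_per_task` from 2 to 4 in this example cuts the plan from six tasks to three, which means fewer requests but also fewer units that an idle thread could steal.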

Adaptive “mark‑per‑task” optimization adjusts the number of marks per task based on query size, using a global mark pool for the initial even distribution and then applying a non‑uniform strategy per thread.
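The article does not give the formula ByConity uses, so the following is only a hypothetical heuristic showing what "marks per task based on query size" could look like: aim for a fixed number of tasks per thread and clamp the result. The function name, the parameters, and their defaults are all assumptions, and the per-thread non-uniform strategy is not modeled.

```python
def adaptive_marks_per_task(total_marks, num_threads,
                            tasks_per_thread=4, min_marks=1, max_marks=64):
    """Hypothetical heuristic: a few tasks per thread, clamped to a range."""
    target = total_marks // (num_threads * tasks_per_thread)
    return max(min_marks, min(max_marks, target))


for total in (8, 320, 1600):
    print(total, adaptive_marks_per_task(total, num_threads=4))
```

Under this heuristic a small query keeps tasks small so that work-stealing stays effective, while a large query gets bigger tasks and therefore fewer network requests.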

These optimizations double cold‑read performance on S3 and improve HDFS cold reads by roughly 20%.

ELT Capability Improvements

ByConity 0.2.0 introduced asynchronous execution, queuing, and disk‑based shuffle for ELT workloads. Version 0.3.0 adds a new BSP (Bulk‑Synchronous‑Parallel) mode that executes queries stage‑by‑stage with an enhanced disk‑based shuffle, increasing throughput under constrained resources.

The physical/logical plan is split into multiple stages (plan segments or fragments). Each stage contains many identical tasks that process different data partitions, with data exchange between dependent stages.

Implementation includes a hierarchical scheduler: StagedSegmentScheduler (similar to DAGScheduler) manages stage state and dispatch, while TaskScheduler creates task sets and assigns them to execution nodes. The BSP mode reuses the asynchronous execution framework from 0.2.0 and will be further refined in future releases.
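Stage-by-stage execution can be illustrated with a toy scheduler. This sketch is not the StagedSegmentScheduler or TaskScheduler: it folds both roles into one hypothetical `run_bsp` function, runs tasks sequentially rather than on worker nodes, and uses an in-memory dict where the real system uses disk-based shuffle.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_bsp(stages, deps, source_rows):
    """Execute stages one at a time, in dependency order.

    stages:      {stage_name: (task_fn, num_tasks)}; task_fn maps rows to rows
    deps:        {stage_name: [parent stage names]}
    source_rows: input rows for stages that have no parents
    """
    shuffle_store = {}  # stands in for disk-based shuffle: stage -> output rows
    for name in TopologicalSorter(deps).static_order():
        task_fn, num_tasks = stages[name]
        parents = deps.get(name, [])
        if parents:
            rows = [row for parent in parents for row in shuffle_store[parent]]
        else:
            rows = list(source_rows)
        # Identical tasks, each working on its own partition of the input.
        partitions = [rows[i::num_tasks] for i in range(num_tasks)]
        outputs = [task_fn(part) for part in partitions]
        # Barrier: output is published only after every task has finished.
        shuffle_store[name] = [row for out in outputs for row in out]
    return shuffle_store


stages = {
    "scan": (lambda rows: [r for r in rows if r % 2 == 0], 3),
    "square": (lambda rows: [r * r for r in rows], 2),
    "total": (lambda rows: [sum(rows)], 1),
}
deps = {"scan": [], "square": ["scan"], "total": ["square"]}
result = run_bsp(stages, deps, range(10))
print(result["total"])
```

Because a stage only reads the persisted output of stages that have completed, at most one stage's tasks need resources at any time, which is the property that helps throughput when resources are constrained.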

For a complete list of 0.3.0 features and optimizations, see the release page: https://github.com/ByConity/ByConity/releases/tag/0.3.0
