
How AutoTiKV’s Machine Learning Optimizes Beaver Search Engine Performance

This article describes how the many performance-related configuration parameters of the Beaver search engine can be tuned automatically with machine-learning techniques drawn from OtterTune and AutoTiKV. It covers the background research, the Gaussian Process regression model, the Bayesian optimization process, the implementation steps, the test results, and future improvements.


Introduction

Beaver is a self‑developed, secure search engine composed of Master, Broker, and Datanode components, and is widely used for storing and analyzing logs from large distributed systems. Because it exposes many performance‑related configuration items, manual tuning is time‑consuming, so the configuration parameters need to be adjusted automatically.

Background Research

Several automatic tuning projects exist, such as CMU’s OtterTune and PingCAP’s AutoTiKV, providing papers and open‑source code.

OtterTune

Database systems have hundreds of parameters that affect performance. OtterTune, developed by CMU, uses machine learning to automatically recommend optimal parameter files, helping DBAs reduce manual effort.

The client collects statistics from the target database and uploads them to the server.

The server trains a model and recommends a configuration file.

The client applies the recommended file, restarts the database, measures performance, and repeats the loop until the result is satisfactory.
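The three-step loop above can be sketched as follows. This is a minimal illustration, not OtterTune's actual code: `run_benchmark`, `recommend_config`, and `apply_and_restart` are hypothetical stand-ins (here stubbed out so the sketch is self-contained).

```python
import random

def run_benchmark(config):
    # Stub: pretend a larger "cache" setting lowers latency.
    base = 100 if config is None else 100 - 10 * config["cache"]
    return {"latency_ms": base + random.random()}

def recommend_config(history):
    # Stub recommender: in OtterTune this is the server-side ML model.
    return {"cache": len(history)}

def apply_and_restart(config):
    pass  # In the real system: write the config file and restart the database.

def tuning_loop(rounds, target_latency):
    history = []          # (config, metrics) observations collected so far
    config = None         # start from the default configuration
    for _ in range(rounds):
        metrics = run_benchmark(config)
        history.append((config, metrics))
        if metrics["latency_ms"] <= target_latency:
            break                              # satisfied; stop tuning
        config = recommend_config(history)     # model recommends next config
        apply_and_restart(config)              # client applies it and restarts
    return history
```

The loop terminates either when the round budget is exhausted or when the measured metric reaches the target.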

AutoTiKV

AutoTiKV is an automatic tuning tool for TiKV, based on a SIGMOD 2017 paper, using machine‑learning models.

It adopts OtterTune’s design and simplifies the architecture. The tuning process loops for a configurable number of rounds (default 200) or until convergence.

Machine‑Learning Model

AutoTiKV uses Gaussian Process Regression (GP), a non‑parametric model that works well with few training samples. GP models the function f: X → Y from configurations to a performance metric (for example, database latency), estimating both the mean m(X) and the standard deviation s(X) of its prediction.

GP is combined with Bayesian Optimization, which consists of two steps: (1) estimate the function distribution via GP, and (2) use an acquisition function to guide the next sample, balancing exploration and exploitation.

The acquisition function U(X)=m(X)+k·s(X) (k>0) selects points with high uncertainty (large s, exploration) or high predicted performance (large m, exploitation), assuming the metric is to be maximized; for a lower‑is‑better metric such as latency, the sign is flipped.
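A minimal sketch of this GP-plus-acquisition step, using scikit-learn's `GaussianProcessRegressor` (an assumption for illustration; AutoTiKV's actual implementation differs) on a toy one-dimensional knob space:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# A few observed (config, performance) samples; higher y is better here.
X_train = np.array([[0.1], [0.4], [0.9]])
y_train = np.array([0.2, 0.8, 0.5])

gp = GaussianProcessRegressor().fit(X_train, y_train)

# Candidate configurations to score.
X_cand = np.linspace(0, 1, 101).reshape(-1, 1)
m, s = gp.predict(X_cand, return_std=True)  # mean m(X) and std s(X)

k = 2.0                        # exploration weight
U = m + k * s                  # U(X) = m(X) + k*s(X)
next_x = X_cand[np.argmax(U)]  # sample the candidate with the highest U
```

For a lower-is-better metric such as latency, one would minimize instead, e.g. pick `np.argmin(m - k * s)`.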

Feasibility Analysis

Existing open‑source auto‑tuning tools rely on machine‑learning‑driven recommendation of configuration parameters for databases or other engines. OtterTune’s generic model can be applied to many scenarios, including OS kernel tuning, as long as metrics are available.

We can adapt AutoTiKV’s code to tune Beaver’s Datanode by replacing the target DB, using the baimi Apache log dataset for performance testing.

Implementation Details

We reuse AutoTiKV’s algorithm and modify the database‑specific code to work with Beaver’s flag‑style configuration.

Key changes include:

Parsing and writing Beaver flags in controller.py instead of YAML.

Declaring the tunable knobs and metrics in settings.py.

Collecting search latency via the Beaver Broker's API: warm up with 20 requests, measure 100 requests, and use the 90th‑percentile latency as the metric.
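The warm-up-then-measure procedure can be sketched as below. `send_search` is a hypothetical stand-in for the actual Beaver Broker search request (stubbed here with a short sleep).

```python
import time

def send_search():
    time.sleep(0.001)  # placeholder for the real broker search request

def measure_p90(warmup=20, samples=100):
    for _ in range(warmup):          # warm caches; results discarded
        send_search()
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        send_search()
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    latencies.sort()
    return latencies[int(0.9 * len(latencies)) - 1]  # 90th percentile
```

Using a high percentile rather than the mean makes the metric less sensitive to a few outlier requests.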

<code># beaver cluster broker address and port
beaver_broker_ip="172.21.16.16"
beaver_broker_port="50061"
# index for search
index_forsearch="ops-http_baimi-20210507"
# pb search query to compute avg(apache.resp_len)
pb_forsearch='search_info {query {type: kQueryMatchAll}fetch_source {fetch: true}size {value: 0} aggregations { aggs { type: kAggAvg name: "av(apache.resp_len)" body { field: "apache.resp_len__l__" } } } query_time_range {time_range {min: 0 max: 1620374828405}}}'
# workload metrics
wl_metrics={"avgsearch": ["search_latency","compaction_mem","compaction_cpu"]}
loadtype = "avgsearch"
wltype = "avgsearch"
target_metric_name="search_latency"
target_knob_set=['--enable_query_cache','--max_concurrency_tasks_per_search','--max_per_search_ram','--max_per_sub_search_ram','--block_ids_per_batch']
</code>
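Writing the flag-style configuration is straightforward: one `--name value` line per knob. The helper below is a hypothetical sketch, not AutoTiKV's actual controller.py code.

```python
def write_flags(path, knobs):
    """Write knobs as one '--name value' line per entry (Beaver flag style)."""
    lines = []
    for name, value in knobs.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # flags use lowercase booleans
        lines.append(f"{name} {value}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_flags("beaver.flags", {
    "--enable_query_cache": False,
    "--max_concurrency_tasks_per_search": 8,
})
```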

The controller functions define the knob metadata and the metric‑reading functions, and restart the Beaver Datanode via OS commands.

<code># example knob definition
knob_set = {
    "--max_concurrency_tasks_per_search": {
        "changebyyml": True,       # applied by rewriting the config file
        "set_func": None,          # no custom setter function
        "minval": 0,               # min/max are unused for enum-type knobs
        "maxval": 0,
        "enumval": [4, 6, 8],      # candidate values to try
        "type": "enum",
        "default": 0               # default value
    }
}
metric_set = {
    "search_latency": {
        "read_func": read_search_latency,  # function that collects the metric
        "lessisbetter": 1,                 # 1 means lower values are better
        "calc": "ins"                      # use the instantaneous value
    }
}
</code>

Test Results

Knobs

The tested knobs are --enable_query_cache, --max_concurrency_tasks_per_search, --max_per_search_ram, --max_per_sub_search_ram, and --block_ids_per_batch (the target_knob_set shown above).

Metrics

search_latency (ms)

compaction_mem (%)

compaction_cpu (%)

After running pipeline.py, the best configurations were found to be either all zeros or a mix such as [0, 1, 0, 0, 1], corresponding to:

<code>--enable_query_cache false
--max_concurrency_tasks_per_search 4
--max_per_search_ram 198m
--max_per_sub_search_ram 99m
--block_ids_per_batch 16
</code>
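A result vector such as [0, 1, 0, 0, 1] is naturally read as one index per knob into that knob's candidate list. The mapping below is an illustrative sketch: the candidate values shown are hypothetical except where they appear in the configuration above (false, 4/6/8, 198m, 99m, 16).

```python
# Candidate values per knob, in the same order as target_knob_set above.
# Entries beyond those quoted in the article are illustrative placeholders.
knob_values = {
    "--enable_query_cache": ["false", "true"],
    "--max_concurrency_tasks_per_search": ["4", "6", "8"],
    "--max_per_search_ram": ["198m", "396m"],
    "--max_per_sub_search_ram": ["99m", "198m"],
    "--block_ids_per_batch": ["16", "32"],
}

def decode(vector):
    """Map an index vector (one index per knob) to concrete flag values."""
    return {name: vals[i]
            for (name, vals), i in zip(knob_values.items(), vector)}

config = decode([0, 1, 0, 0, 1])
```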

Increasing search concurrency or the number of blocks per SubSearch improves search performance.

Open Issues

Add more workload modes to test index and search performance.

Implement a graceful restart for Beaver Datanode.

Adjust wait time for Datanode availability based on environment.

Mitigate metric noise caused by network variability.

References

OtterTune. https://github.com/cmu-db/ottertune

AutoTiKV. https://github.com/tikv/auto-tikv

Automatic Database Management System Tuning Through Large‑scale Machine Learning. https://www.cs.cmu.edu/~ggordon/van-aken-etal-parameters.pdf

Additional references omitted for brevity.

Tags: Machine Learning, Database Performance, Gaussian Process, auto-tuning, Bayesian Optimization, Beaver
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
