Adaptive Sampling: Dynamically Tuning Tracing to Cut Overhead

This article explains how adaptive sampling reduces the performance and storage cost of full‑link tracing by adjusting the sampling rate dynamically based on application QPS. It outlines the mathematical model, discusses implementation details such as warm‑up, QPS lag, and concurrency, and provides a Java reference sampler.

ITFLY8 Architecture Home

Jul 14, 2021

In production, enabling full‑link tracing for every request adds performance overhead and consumes storage, so sampling is essential.

Setting a fixed sampling rate (e.g., 0.5) reduces load, but it has two problems: application owners cannot easily judge which rate is appropriate, and traffic variations leave the samples unbalanced (too few at low traffic, too many at peak).

Adaptive sampling, originally described in Dapper’s “Coping with aggressive sampling”, addresses both issues.

QPS‑SampleCount‑Rate Function

We model the number of samples per second as a function of application QPS, then derive the sampling rate as (samples per second) / QPS.

Minimum threshold: when QPS ≤ 10, sampling rate = 100 % (samples per second = QPS).

Business target: assume average QPS = 200 and aim to cut storage by 40 %; at QPS = 200 we then need 200 × (1 − 0.4) = 120 samples per second.

Maximum threshold: for very high QPS we keep a fixed sample count, so the derivative of the QPS‑samples function must be 0 at the max QPS (e.g., 2000); beyond it the sample count is held at the value reached there.

Assuming a quadratic form samples = a·QPS² + b·QPS + c, we solve:

100a + 10b + c = 10

40000a + 200b + c = 120

4000a + b = 0

Resulting function (coefficients rounded): samples = ‑0.00015·QPS² + 0.611·QPS + 3.905. The three conditions, and therefore the coefficients, can be adjusted per business.
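The three equations above can be solved by simple substitution. The following is a minimal sketch, not part of the original sampler: the class name QuadraticFit and its method names are hypothetical, and the parameters are just the article's example values (min QPS 10, target 120 samples at QPS 200, max QPS 2000).

```java
// Hypothetical helper for illustration; names are assumptions, not the article's API.
final class QuadraticFit {
    /**
     * Solves samples = a*q^2 + b*q + c under the three conditions:
     *   f(minQps) = minQps, f(targetQps) = targetSamples, f'(maxQps) = 0.
     * Returns {a, b, c}.
     */
    static double[] solve(double minQps, double targetQps,
                          double targetSamples, double maxQps) {
        // f'(maxQps) = 0  =>  b = -2 * maxQps * a.
        // Substituting b leaves two equations of the form k * a + c = value.
        double k1 = minQps * minQps - 2 * maxQps * minQps;
        double k2 = targetQps * targetQps - 2 * maxQps * targetQps;
        double a = (targetSamples - minQps) / (k2 - k1);
        double b = -2 * maxQps * a;
        double c = minQps - k1 * a;
        return new double[] {a, b, c};
    }

    /** Evaluates the fitted curve at the given QPS. */
    static double samples(double[] coef, double qps) {
        return coef[0] * qps * qps + coef[1] * qps + coef[2];
    }
}
```

With the example values, QuadraticFit.solve(10, 200, 120, 2000) gives a ≈ ‑0.000153, b ≈ 0.611, c ≈ 3.905; the formula in the text rounds a to ‑0.00015. Changing the business target (for example, a different storage reduction) only requires calling solve with other arguments.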

[Figure: QPS‑samples and QPS‑rate curves]

Calculating QPS

The sampler reuses the 100‑bit BitSet from the reservoir‑sampling algorithm, so every cycle covers exactly 100 requests. We record a timestamp each time a cycle completes; the interval between successive timestamps, in seconds, yields QPS = 100 / interval.
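As a rough illustration of this measurement, here is a small sketch. QpsEstimator is a hypothetical class, not part of the reference sampler; it takes timestamps as arguments (in milliseconds) so the arithmetic is easy to check, and it treats a zero‑length interval as 1 ms, which is an added assumption.

```java
// Hypothetical helper for illustration; not part of the article's sampler.
final class QpsEstimator {
    private static final int WINDOW = 100; // requests per cycle, same as the BitSet size
    private long windowStartMillis;

    QpsEstimator(long nowMillis) {
        this.windowStartMillis = nowMillis;
    }

    /** Call when the 100th request of a cycle arrives; returns the estimated QPS. */
    double onWindowComplete(long nowMillis) {
        // Assumption: clamp to 1 ms so a very fast cycle cannot divide by zero.
        long intervalMillis = Math.max(1L, nowMillis - windowStartMillis);
        windowStartMillis = nowMillis;
        return WINDOW * 1000.0 / intervalMillis;
    }
}
```

For example, if 100 requests arrive within 500 ms, the estimate is 100 × 1000 / 500 = 200 QPS.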

Applying the Sampling Rate

When the counter reaches 99 (the last slot of the current cycle), we compute a new sampling rate from the QPS‑samples function, generate a new 100‑bit BitSet with the corresponding number of bits set, and apply it to the next cycle.
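The reference sampler relies on a RandomBitSet.genBitSet helper that is not shown in this article. The sketch below is one way such a helper could work, using reservoir sampling (Algorithm R) to pick which bit positions are set; the class name DecisionBits and its signature are assumptions, not the original helper.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical stand-in for the RandomBitSet helper, which the article does not show.
final class DecisionBits {
    /** Returns a BitSet with exactly `cardinality` of the first `size` bits set. */
    static BitSet generate(int size, int cardinality, Random rnd) {
        if (cardinality < 0 || cardinality > size) {
            throw new IllegalArgumentException("cardinality must be in [0, size]");
        }
        // Reservoir sampling over indices 0..size-1: start with the first
        // `cardinality` indices, then let each later index replace a random
        // slot with probability cardinality / (i + 1).
        int[] chosen = new int[cardinality];
        for (int i = 0; i < cardinality; i++) {
            chosen[i] = i;
        }
        for (int i = cardinality; i < size; i++) {
            int j = rnd.nextInt(i + 1);
            if (j < cardinality) {
                chosen[j] = i;
            }
        }
        BitSet bits = new BitSet(size);
        for (int index : chosen) {
            bits.set(index);
        }
        return bits;
    }
}
```

Because exactly `cardinality` bits are set, a rate of 61 out of 100 yields exactly 61 sampled requests per cycle rather than 61 on average, which keeps the sample count predictable.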

Warm‑up

Initially the BitSet uses a 100 % rate because QPS is unknown, ensuring the first 100 requests are fully sampled for debugging.

QPS Lag

Because the rate is derived from the previous 100 requests, it lags real traffic by the time it takes to consume those requests (about 0.5 s at 200 QPS); for tracing, this lag is acceptable.

Concurrency

Rate calculation and BitSet regeneration occur only once per 100 requests, inside a synchronized block, so the lock is rarely taken. The other 99 requests in each cycle only perform an atomic increment and a bit lookup.
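To see why one regeneration per cycle holds under contention, the counter logic can be isolated from the sampling itself. The sketch below is a simplified model, not the reference sampler: CycleCounterDemo and its members are hypothetical, and the regeneration step is replaced by a counter increment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the counter cycle only; no tracing or BitSet involved.
final class CycleCounterDemo {
    private final AtomicInteger counter = new AtomicInteger();
    final AtomicInteger regenerations = new AtomicInteger();

    void onRequest() {
        int i;
        do {
            i = counter.getAndIncrement();
            if (i == 99) {
                synchronized (this) {
                    regenerations.incrementAndGet(); // stand-in for rate calc + new BitSet
                    counter.set(0);                  // opens the next cycle
                }
            } else if (i > 99) {
                Thread.onSpinWait(); // cycle is being closed by another thread; retry
            }
        } while (i > 99);
    }

    /** Runs threads * perThread requests and returns how many regenerations happened. */
    static int run(int threads, int perThread) {
        CycleCounterDemo demo = new CycleCounterDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int n = 0; n < perThread; n++) {
                    demo.onRequest();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("demo did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return demo.regenerations.get();
    }
}
```

Each cycle hands out the slots 0 to 99 exactly once, so the number of regenerations equals the number of completed requests divided by 100, regardless of how many threads compete.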

Reference Implementation

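import java.util.BitSet;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Sampler (the tracing library's base class) and RandomBitSet (the helper from the
// reservoir-sampling sampler) are assumed to be on the classpath; neither is shown here.
public class AdvancedAdaptiveSampler extends Sampler {
    private static final int MIN_SAMPLE_LIMIT = 10;   // QPS at or below this is fully sampled
    private static final int MAX_SAMPLE_LIMIT = 2000; // QPS above this no longer adds samples

    private final AtomicInteger counter = new AtomicInteger(0);
    // Per-instance and volatile: read lock-free on the fast path, replaced under the lock.
    private volatile BitSet sampleDecisions;
    private long prevTime; // written only in the constructor and inside synchronized (this)

    public AdvancedAdaptiveSampler() {
        // Warm-up: QPS is unknown, so the first 100 requests are all sampled.
        sampleDecisions = RandomBitSet.genBitSet(100, 100, new Random());
        prevTime = System.currentTimeMillis();
    }

    @Override
    protected boolean doSampled() {
        boolean res = true;
        int i;
        do {
            i = this.counter.getAndIncrement();
            if (i < 99) {
                res = sampleDecisions.get(i);
            } else {
                // i == 99 closes the cycle; i > 99 means the cycle is being closed,
                // so the thread passes through the lock and retries.
                synchronized (this) {
                    if (i == 99) {
                        res = sampleDecisions.get(99);
                        long now = System.currentTimeMillis();
                        int outOf100 = calAdaptiveRateInHundred(now - prevTime);
                        sampleDecisions = RandomBitSet.genBitSet(100, outOf100, new Random());
                        prevTime = now;
                        this.counter.set(0);
                    }
                }
            }
        } while (i > 99);
        return res;
    }

    // Returns how many requests out of 100 to sample in the next cycle.
    private int calAdaptiveRateInHundred(long interval) {
        // 100 requests took `interval` ms; treat sub-millisecond cycles as 1 ms.
        double qps = (100 * 1000.0) / Math.max(1L, interval);
        if (qps <= MIN_SAMPLE_LIMIT) {
            return 100;
        }
        // Clamp only the curve input so the sample count stays flat above the maximum,
        // then divide by the measured QPS to turn that count into a rate.
        double x = Math.min(qps, MAX_SAMPLE_LIMIT);
        double num = -0.00015 * x * x + 0.611 * x + 3.905;
        return (int) Math.round((num / qps) * 100.0);
    }
}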
