Adaptive Sampling: Dynamically Tuning Tracing to Cut Overhead
This article explains how adaptive sampling reduces the performance and storage cost of full-link tracing by adjusting the sampling rate dynamically from the application's QPS. It outlines the mathematical model, discusses implementation details such as warm-up, QPS lag, and concurrency, and provides a Java reference sampler.
In production, enabling full‑link tracing for every request adds performance overhead and consumes storage, so sampling is essential.
Setting a fixed sampling rate (e.g., 0.5) reduces load, but it has two problems: application owners have no easy way to judge which rate is appropriate, and traffic variations produce unbalanced samples (too few traces when traffic is low, too many when it is high).
Adaptive sampling, described in Dapper's "Coping with aggressive sampling", addresses both problems.
QPS‑SampleCount‑Rate Function
We model the number of samples per second as a function of application QPS, then derive the sampling rate as (samples per second) / QPS.
Minimum threshold: when QPS ≤ 10, sampling rate = 100 % (samples per second = QPS).
Business target: assume the average QPS is 200 and the goal is to cut storage by 40 %; at QPS = 200 we therefore keep 60 % of requests, i.e. 120 samples per second.
Maximum threshold: for very high QPS we keep a fixed sample count; the QPS-samples function reaches its peak (derivative = 0) at a max QPS (e.g., 2000) and is held at that value beyond it.
Assuming a quadratic form samples = a·QPS² + b·QPS + c, with derivative 2a·QPS + b, the three conditions give the following equations to solve:
100a + 10b + c = 10
40000a + 200b + c = 120
4000a + b = 0
Resulting function: samples = -0.00015·QPS² + 0.611·QPS + 3.905 (coefficients rounded; the thresholds and target can be adjusted per business).
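To check the coefficients, the three equations can be solved by substitution. The sketch below is only an illustration: the class name `QuadraticFit` and its parameter names are made up for this example and are not part of the original sampler.

```java
// Illustrative sketch: solve the three constraints for a, b, c by substitution.
final class QuadraticFit {

    /** Returns {a, b, c} for samples = a*qps^2 + b*qps + c. */
    static double[] solve(double minQps, double targetQps, double targetSamples, double maxQps) {
        // Constraint 3: derivative 2*a*maxQps + b = 0, so b = -2*a*maxQps.
        // Subtracting constraint 1 from constraint 2 eliminates c:
        //   a*(targetQps^2 - minQps^2) + b*(targetQps - minQps) = targetSamples - minQps
        double dq = targetQps - minQps;
        double a = (targetSamples - minQps)
                / (targetQps * targetQps - minQps * minQps - 2 * maxQps * dq);
        double b = -2 * a * maxQps;
        // Constraint 1: samples(minQps) = minQps.
        double c = minQps - a * minQps * minQps - b * minQps;
        return new double[] {a, b, c};
    }

    public static void main(String[] args) {
        double[] k = solve(10, 200, 120, 2000);
        System.out.println("a=" + k[0] + " b=" + k[1] + " c=" + k[2]);
    }
}
```

With the article's inputs (10, 200, 120, 2000) this gives a ≈ -0.000153, b ≈ 0.611, c ≈ 3.905, which matches the function above up to rounding of a.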
[Figure: QPS-samples curve and QPS-rate curve]
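The shape of both curves can also be read off numerically. The following sketch evaluates the piecewise model (100 % below the minimum threshold, the quadratic in between, a flat sample count above the maximum threshold). `SamplingCurve` and its method names are hypothetical helpers for illustration only.

```java
// Illustrative sketch of the piecewise QPS -> samples -> rate model.
final class SamplingCurve {
    static final double MIN_QPS = 10;    // at or below this, sample everything
    static final double MAX_QPS = 2000;  // above this, the sample count stays flat

    /** Target number of sampled requests per second for a given QPS. */
    static double samplesPerSecond(double qps) {
        if (qps <= MIN_QPS) {
            return qps;
        }
        double q = Math.min(qps, MAX_QPS);
        return -0.00015 * q * q + 0.611 * q + 3.905;
    }

    /** Sampling rate in [0, 1]: samples per second divided by QPS. */
    static double rate(double qps) {
        return qps <= 0 ? 1.0 : samplesPerSecond(qps) / qps;
    }

    public static void main(String[] args) {
        for (double qps : new double[] {5, 10, 50, 200, 1000, 2000, 5000}) {
            System.out.println("qps=" + qps
                    + " samples=" + samplesPerSecond(qps)
                    + " rate=" + rate(qps));
        }
    }
}
```

Under this model, 200 QPS yields about 120 samples per second (a rate of about 60 %), 2000 QPS yields about 626 samples per second (about 31 %), and at 5000 QPS the sample count stays at about 626 while the rate drops to roughly 12.5 %.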
Calculating QPS
The sampler takes its decisions from a 100-bit BitSet filled by reservoir sampling, so requests are processed in cycles of 100. We record a timestamp at every 100th request; the interval between successive timestamps, measured in seconds, yields QPS = 100 / interval.
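The windowed measurement can be sketched on its own as follows. This is an assumption-laden illustration rather than the article's code: `WindowQpsEstimator` is a hypothetical class, and it takes timestamps in nanoseconds (for example from `System.nanoTime()`), whereas the reference implementation below uses `System.currentTimeMillis()`.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch: estimate QPS from the time taken by each block of 100 requests.
final class WindowQpsEstimator {
    private static final int WINDOW = 100;   // requests per measurement window
    private long windowStartNanos;
    private int seen;
    private double lastQps = Double.NaN;     // unknown until the first window completes

    WindowQpsEstimator(long nowNanos) {
        this.windowStartNanos = nowNanos;
    }

    /** Records one request seen at nowNanos; returns true when a window has just closed. */
    synchronized boolean onRequest(long nowNanos) {
        if (++seen < WINDOW) {
            return false;
        }
        // Guard against a zero interval when 100 requests arrive within one clock tick.
        long elapsed = Math.max(1L, nowNanos - windowStartNanos);
        lastQps = WINDOW * (double) TimeUnit.SECONDS.toNanos(1) / elapsed;
        windowStartNanos = nowNanos;
        seen = 0;
        return true;
    }

    synchronized double lastQps() {
        return lastQps;
    }
}
```

A caller would invoke `onRequest(System.nanoTime())` once per request and read `lastQps()` whenever the method returns true. Passing the timestamp in as a parameter keeps the class easy to test with synthetic times.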
Applying the Sampling Rate
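When the counter reaches 99 (the last request of the current cycle), we compute the QPS of that cycle, derive a new sampling rate from the QPS-samples function, generate a new 100-bit BitSet with the corresponding number of bits set, and apply it to the next cycle.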
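The article does not show how the 100-bit BitSet is generated. As a rough sketch, a reservoir-sampling generator that sets exactly `cardinality` of `size` bits could look like the code below. `DecisionBits.generate` is a hypothetical stand-in, not the `RandomBitSet.genBitSet` used by the reference implementation, whose source is not shown here.

```java
import java.util.BitSet;
import java.util.Random;

// Illustrative sketch: choose `cardinality` positions out of `size` uniformly at random
// using reservoir sampling (Algorithm R), and mark them in a BitSet.
final class DecisionBits {

    static BitSet generate(int size, int cardinality, Random rnd) {
        if (cardinality < 0 || cardinality > size) {
            throw new IllegalArgumentException("cardinality must be in [0, size]");
        }
        BitSet bits = new BitSet(size);
        int[] reservoir = new int[cardinality];
        // Fill the reservoir with the first `cardinality` positions.
        for (int i = 0; i < cardinality; i++) {
            reservoir[i] = i;
            bits.set(i);
        }
        // Each later position replaces a random reservoir slot with probability cardinality / (i + 1).
        for (int i = cardinality; i < size; i++) {
            int j = rnd.nextInt(i + 1);
            if (j < cardinality) {
                bits.clear(reservoir[j]);
                bits.set(i);
                reservoir[j] = i;
            }
        }
        return bits;
    }
}
```

Because exactly `cardinality` bits are set, a rate of 37 out of 100 samples exactly 37 requests in each cycle of 100, rather than 37 on average as an independent coin flip per request would.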
Warm‑up
Because QPS is unknown at startup, the initial BitSet uses a 100 % rate. The first 100 requests are therefore fully sampled, which is also convenient for debugging.
QPS Lag
Because the rate is derived from the previous 100 requests, it lags behind the actual traffic by the time it takes to consume them: about 2 seconds at 50 QPS, about 50 ms at 2000 QPS. For tracing, this lag is acceptable.
Concurrency
Rate calculation and BitSet regeneration happen only once per 100 requests, inside a synchronized block. The critical section is short, and synchronized blocks with little contention are cheap in modern JVMs.
Reference Implementation
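import java.util.BitSet;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Sampler and RandomBitSet come from the tracing framework and are not shown in this article.
public class AdvancedAdaptiveSampler extends Sampler {
    private static final int MIN_SAMPLE_LIMIT = 10;    // QPS at or below which every request is sampled
    private static final int MAX_SAMPLE_LIMIT = 2000;  // QPS above which the sample count stops growing

    private final AtomicInteger counter = new AtomicInteger(0);
    // Per instance and volatile, so a regenerated BitSet is visible to all threads.
    private volatile BitSet sampleDecisions;
    // Written in the constructor, afterwards only inside the synchronized block.
    private long prevTime;

    public AdvancedAdaptiveSampler() {
        // Warm-up: QPS is unknown, so the first 100 requests are sampled at 100 %.
        // The BitSet positions are chosen by reservoir sampling.
        sampleDecisions = RandomBitSet.genBitSet(100, 100, new Random());
        prevTime = System.currentTimeMillis();
    }

    @Override
    protected boolean doSampled() {
        boolean res = true;
        int i;
        do {
            i = this.counter.getAndIncrement();
            if (i < 99) {
                res = sampleDecisions.get(i);
            } else {
                synchronized (this) {
                    // Only the thread that drew 99 recomputes the rate and resets the counter.
                    if (i == 99) {
                        res = sampleDecisions.get(99);
                        long now = System.currentTimeMillis();
                        int outOf100 = calAdaptiveRateInHundred(now - prevTime);
                        sampleDecisions = RandomBitSet.genBitSet(100, outOf100, new Random());
                        prevTime = now;
                        this.counter.set(0);
                    }
                }
            }
            // Threads that drew a value above 99 retry once the counter has been reset.
        } while (i > 99);
        return res;
    }

    // interval: milliseconds taken by the last 100 requests. Returns the rate as a count out of 100.
    private int calAdaptiveRateInHundred(long interval) {
        // Treat a 0 ms interval as 1 ms to keep the QPS finite.
        double qps = (100 * 1000.0) / Math.max(1L, interval);
        if (qps <= MIN_SAMPLE_LIMIT) {
            return 100;
        }
        // Beyond MAX_SAMPLE_LIMIT the sample count is held at its value for MAX_SAMPLE_LIMIT.
        double capped = Math.min(qps, MAX_SAMPLE_LIMIT);
        double num = -0.00015 * capped * capped + 0.611 * capped + 3.905;
        // Divide by the actual QPS so that samples per second stay fixed above the cap,
        // as the model requires (dividing by the capped QPS would freeze the rate instead).
        return (int) Math.round((num / qps) * 100.0);
    }
}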
ITFLY8 Architecture Home