Testing Probabilistic Events with Binomial Confidence Intervals

This article explains how to verify that a probabilistic interface behaves as configured by computing binomial confidence intervals: the normal approximation for moderate probabilities and large samples, and the exact Clopper‑Pearson method for extreme probabilities or small samples. It includes Java examples and practical guidelines.

DeWu Technology

Dec 25, 2020

In day-to-day development and testing, some interfaces or methods trigger only with a configured probability. The core problem is deciding how many samples to collect and which confidence interval to use when checking that the observed frequency is consistent with the configured probability.

This article uses the binomial distribution to derive confidence intervals for two scenarios: (a) when the probability is moderate and the sample size is large (np>5 and n(1-p)>5), the Normal Approximation Method can be applied; (b) when the probability is extreme or the sample size is small (np≤5 or n(1-p)≤5), the exact Clopper‑Pearson method is used.
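The rule of thumb above can be written as a small helper. This is only a sketch: the class and method names are hypothetical, and the text does not say whether p should be the configured or the observed probability, so the helper simply uses whatever value the caller passes.

```java
final class IntervalMethodChooser {
    enum Method { NORMAL_APPROXIMATION, CLOPPER_PEARSON }

    /**
     * Applies the np > 5 and n(1-p) > 5 rule of thumb.
     * n is the number of trials; p is the probability supplied by the caller
     * (configured or observed, an assumption left to the caller).
     */
    static Method choose(int n, double p) {
        boolean largeEnough = n * p > 5 && n * (1 - p) > 5;
        return largeEnough ? Method.NORMAL_APPROXIMATION : Method.CLOPPER_PEARSON;
    }

    public static void main(String[] args) {
        System.out.println(choose(50, 0.5));      // 25 expected successes and 25 expected failures
        System.out.println(choose(50, 3 / 50.0)); // only about 3 expected successes
    }
}
```

With 50 trials, a probability of 0.5 satisfies both conditions, while an observed proportion of 3/50 does not and falls back to the exact method.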

For the normal approximation, the confidence interval is p ± Z·√[p(1‑p)/n], where p is the observed success proportion, n is the number of trials, α is the significance level (the type‑I error rate), and Z is the standard normal quantile for confidence level 1‑α (e.g., 1.96 for 95% confidence).
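A minimal sketch of this formula in Java follows. The class name and the clamping of the bounds to [0, 1] are assumptions added for illustration; they are not taken from the article's own code.

```java
import java.util.Locale;

final class NormalApproxInterval {
    /**
     * Returns {lower, upper} of p ± z·sqrt(p(1-p)/n) for the observed proportion p = k/n.
     * Bounds are clamped to [0, 1] (an illustrative choice, not part of the formula).
     */
    static double[] of(int k, int n, double z) {
        if (n <= 0 || k < 0 || k > n) {
            throw new IllegalArgumentException("need n > 0 and 0 <= k <= n");
        }
        double p = (double) k / n;
        double margin = z * Math.sqrt(p * (1 - p) / n);
        return new double[] { Math.max(0.0, p - margin), Math.min(1.0, p + margin) };
    }

    public static void main(String[] args) {
        double[] ci = of(28, 50, 1.96); // 28 successes in 50 trials, 95% confidence
        System.out.println(String.format(Locale.ROOT, "[%.2f, %.2f]", ci[0], ci[1])); // prints [0.42, 0.70]
    }
}
```

A test would then check whether the configured probability lies between the two returned bounds.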

For the exact method, the interval bounds are obtained by inverting the binomial cumulative distribution function, which can be expressed through the inverse Beta distribution function (BetaInv). The resulting formula depends only on n, k (the number of successes), α (the type‑I error rate), and BetaInv. This approach still works at the extremes (p=0 or 1), where the normal approximation breaks down.
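The sketch below computes the same exact interval without a Beta-quantile routine. Instead of calling BetaInv, it solves the two defining binomial tail equations by bisection, P(X ≥ k) = α/2 for the lower bound and P(X ≤ k) = α/2 for the upper bound. This is a swapped-in numerical technique chosen to keep the example dependency-free, and all names are hypothetical.

```java
final class ClopperPearson {
    /** P(X <= k) for X ~ Binomial(n, p), summing terms computed in log space. */
    static double binomialCdf(int k, int n, double p) {
        if (k < 0) return 0.0;
        if (k >= n || p <= 0.0) return 1.0;
        if (p >= 1.0) return 0.0; // k < n here, so all mass is above k
        double logP = Math.log(p);
        double logQ = Math.log1p(-p);
        double logChoose = 0.0; // log C(n, 0)
        double sum = 0.0;
        for (int i = 0; i <= k; i++) {
            sum += Math.exp(logChoose + i * logP + (n - i) * logQ);
            // C(n, i+1) = C(n, i) * (n - i) / (i + 1)
            logChoose += Math.log(n - i) - Math.log(i + 1);
        }
        return Math.min(1.0, sum);
    }

    /** Returns {lower, upper} of the exact interval for k successes in n trials. */
    static double[] interval(int k, int n, double alpha) {
        if (n <= 0 || k < 0 || k > n) {
            throw new IllegalArgumentException("need n > 0 and 0 <= k <= n");
        }
        // Lower bound: P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2.
        double lower = (k == 0) ? 0.0 : solve(k - 1, n, 1 - alpha / 2);
        // Upper bound: P(X <= k) = alpha/2.
        double upper = (k == n) ? 1.0 : solve(k, n, alpha / 2);
        return new double[] { lower, upper };
    }

    /** Bisection on p in [0, 1]; the binomial CDF is decreasing in p. */
    private static double solve(int k, int n, double target) {
        double lo = 0.0;
        double hi = 1.0;
        for (int iter = 0; iter < 100; iter++) {
            double mid = (lo + hi) / 2;
            if (binomialCdf(k, n, mid) > target) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }
}
```

For example, `ClopperPearson.interval(3, 50, 0.05)` returns bounds of roughly 0.0125 and 0.1655. A library Beta quantile function, where one is available, would give the same bounds more directly.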

Two practical cases are presented:

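Case 1 – Normal approximation: an interface configured at 50% is called 50 times and succeeds 28 times. The observed proportion is 0.56 and the 95% confidence interval is [0.42, 0.70]. The configured 0.50 lies inside the interval, so the result is consistent with the configuration.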

Case 2 – Exact method: an interface configured at 17% is called 50 times and succeeds 3 times. The 95% interval is [0.0125, 0.1655]. The configured 0.17 lies outside the interval, so the observed behavior is inconsistent with the configuration.

Code snippets illustrate how to extract reply information from a JSON response and how to compute the probability margin in Java:

// Look up the child replies of the j-th reply once, then check the first one's author.
JSONArray childReplies = replyListResponse.getJSONObject("data").getJSONObject("simpleReply")
    .getJSONArray("list").getJSONObject(j).getJSONObject("childReply")
    .getJSONArray("list");
if (childReplies != null) {
    // "ai鉴别" is the user name the reply is expected to carry.
    DuAssert.getInstance().assertTrue(
        "ai鉴别".equals(childReplies.getJSONObject(0).getString("userName")));
}

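// Margin of error Z·√[p(1−p)/n] for a 95% interval; despite the name, this is not a probability.
public static double getProbability(int times, double targetP) {
    double z = 1.96; // standard normal quantile for 95% confidence
    return Math.sqrt(targetP * (1 - targetP) / times) * z;
}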

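// n: observed success count in 50 calls (inferred from context); 25 is the expected count at 50%.
double realP = n / 50.00;
double margin = Probability.getProbability(50, realP);
// Compare unrounded bounds: applying Math.ceil to both shifts the interval by up to one count.
double max = 50 * (realP + margin);
double min = 50 * (realP - margin);
if (min <= 25 && 25 <= max) {
    DuAssert.getInstance().assertEquals("Within the interval; probability is credible", "Within the interval; probability is credible");
} else {
    // The two strings differ on purpose, so this assertion always fails and reports the message.
    DuAssert.getInstance().assertEquals("Outside the interval; not credible, file a bug!", "Within the interval; probability is credible");
}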

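The article concludes that tests of probabilistic features should run at least 30 trials and should prefer the normal approximation whenever its conditions hold. Testers should also keep the type‑I error in mind when interpreting the result: with a 95% interval, a correctly configured interface will still fall outside the interval in roughly 5% of test runs.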
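On the question of how many trials to run, one option is to rearrange the normal-approximation margin E = Z·√[p(1‑p)/n] into n ≥ Z²·p(1‑p)/E². The sketch below does this. The target margin E is the tester's own choice rather than a value from the article, the names are hypothetical, and the result only makes sense where the normal approximation applies.

```java
final class SampleSize {
    /**
     * Smallest n for which z·sqrt(p(1-p)/n) <= margin.
     * p is the expected probability, margin the acceptable half-width of the interval.
     */
    static int forMargin(double p, double margin, double z) {
        if (p < 0 || p > 1 || margin <= 0) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and margin > 0");
        }
        return (int) Math.ceil(z * z * p * (1 - p) / (margin * margin));
    }

    public static void main(String[] args) {
        System.out.println(forMargin(0.5, 0.1, 1.96));  // prints 97
        System.out.println(forMargin(0.17, 0.1, 1.96)); // prints 55
    }
}
```

A margin of ±0.1 at 95% confidence therefore needs about 97 trials for a 50% interface. The 50 trials used in the cases above correspond to a wider margin of roughly ±0.14.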
