Artificial Intelligence 11 min read

How τ‑FPL Reduces False Positives in High‑Risk Classification Tasks

τ‑FPL introduces a novel ranking‑threshold approach that explicitly incorporates a false‑positive‑rate constraint into binary classifier training, offering linear‑time optimal solutions, theoretical error bounds, and superior experimental performance on high‑risk tasks such as disease monitoring and autonomous driving.

Alibaba Cloud Developer

Aug 15, 2018

How τ‑FPL Reduces False Positives in High‑Risk Classification Tasks

Abstract Many real‑world applications require learning a binary classifier under a false‑positive‑rate (FPR) upper bound. Existing methods adjust standard classifiers or add imbalance‑aware losses, but they do not embed the FPR constraint directly, limiting accuracy. This paper proposes a new ranking‑threshold method, τ‑FPL, which explicitly incorporates the FPR limit, solves the resulting ranking problem in linear time, and converts the learned ranking function into a low‑FPR classifier. Theoretical error analysis and experiments demonstrate τ‑FPL’s superiority over traditional approaches.

Research Background

In high‑risk classification scenarios such as high‑mortality disease monitoring and autonomous driving, the cost of false positives differs greatly from false negatives. Missing a potential patient is far more severe than misdiagnosing a healthy individual. Consequently, the learning objective shifts to minimizing the false‑negative rate while keeping the FPR below a threshold τ, rendering conventional accuracy‑ or AUC‑based objectives unsuitable.

Motivation: From Constrained Classification to Ranking Learning

The empirical Neyman‑Pearson (NP) classification problem seeks a scoring function f and threshold b that satisfy the FPR constraint while minimizing the false‑negative probability. This problem is equivalent to a ranking learning formulation that compares each positive sample with the τ‑th largest negative score, effectively a partial‑AUC optimization around the FPR threshold.

Neyman‑Pearson classification equivalence

Although the ranking problem is NP‑hard even with convex surrogates, we design a convex upper‑bound that remains a ranking problem, aiming to maximize the distance between the centroid of the top‑τ fraction of negative scores and the positives.

τ‑FPL Algorithm Overview

τ‑FPL consists of two stages: ranking and thresholding. In the ranking stage, the algorithm learns a scoring function that pushes positives ahead of the top‑τ negative centroid. In the thresholding stage, an appropriate threshold converts the scoring function into a binary classifier.

Ranking Learning Optimization

We formulate an adversarial learning problem whose dual is smooth and free of nondifferentiable terms. Using projected gradient descent accelerated by Nesterov’s method yields optimal convergence rates.

Linear‑Time Projection

A key step is projecting gradient updates onto the feasible set Γ_k. Existing methods are super‑linear; we propose an O(n) algorithm based on a combination of bisection and divide‑and‑conquer, exploiting KKT conditions to solve a piecewise‑linear system with three dual variables. Experiments show 1–3 orders of magnitude speed‑up over prior solvers.

Threshold Selection

The threshold is chosen by repeatedly splitting the training set, training the ranking function on one part, and selecting the threshold on the other. Averaging thresholds over multiple splits controls bias and variance, and the procedure is parallelizable.

Theoretical Results

Combining accelerated gradient methods with the linear‑time projection guarantees linear‑time per iteration and optimal convergence. We also derive a generalization error bound, showing that with high probability the true error is bounded by the empirical error.

Experimental Results

Figures illustrate that τ‑FPL consistently achieves higher partial‑AUC and lower NP‑score across various τ values compared to cost‑sensitive SVM, binary search ranking, and other baselines. The OOB thresholding further ensures the FPR constraint is respected while improving accuracy.

Conclusion

Controlling the false‑positive rate is crucial in high‑risk classification. This work formulates a ranking problem that explicitly maximizes the probability of placing positives before the τ‑percentile of negatives, solves it in linear time via accelerated gradient and projection, and selects thresholds that yield low‑FPR classifiers. Both theoretical analysis and extensive experiments validate the effectiveness of τ‑FPL.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

ranking machine-learning false-positive-rate linear-time Neyman-Pearson

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.