Which Neural Network Method Best Estimates Uncertainty in Regression? A Comparative Study
This article examines why regression models need uncertainty estimates, explains aleatoric and epistemic uncertainty, compares four neural‑network approaches (Mean + LogStd, Mean + LogVariance, MC Dropout, simplified PPO) on a concrete‑strength dataset, and analyzes their experimental performance and limitations.
Uncertainty in Regression
Regression models used in domains such as weather forecasting, autonomous driving, medical diagnosis, and energy consumption often output a single point estimate. Without an uncertainty measure, users cannot assess the reliability of a prediction, which can be critical in high‑risk applications.
Types of Uncertainty
Aleatoric uncertainty (data uncertainty) originates from inherent noise or randomness in the observations (e.g., sensor error). It cannot be reduced by collecting more data.
Epistemic uncertainty (model uncertainty) stems from limited knowledge or insufficient training coverage. It can be mitigated by improving model capacity or adding diverse data.
Uncertainty Estimation Methods
Mean + LogStd
The network predicts a mean μ and the logarithm of the standard deviation, log σ. Assuming a Gaussian likelihood, the negative log‑likelihood is used as the loss:
import torch
# x: input features, y: targets (torch tensors)
mu, log_std = mean_logstd_model(x)
# Normal distribution with scale = exp(log_std), which keeps the scale positive
dist = torch.distributions.Normal(loc=mu, scale=log_std.exp())
loss = -dist.log_prob(y).mean()
Mean + LogVariance
Similar to the previous method, but the network outputs the logarithm of the variance, log σ². The loss combines the log‑variance term and the squared error scaled by the variance:
import torch
# x: input features, y: targets (torch tensors)
mu, log_variance = mean_logvariance_model(x)
# Gaussian negative log-likelihood with the constant term dropped
loss = 0.5 * log_variance + 0.5 * ((y - mu) ** 2) / torch.exp(log_variance)
loss = loss.mean()
Monte Carlo Dropout (MC Dropout)
Dropout layers remain active during inference. For each input, perform T stochastic forward passes (e.g., T = 30), collect the predictions {ŷ_t}, and compute:
Predictive mean: ŷ = (1/T) Σ_t ŷ_t
Predictive uncertainty (standard deviation): σ = sqrt((1/T) Σ_t (ŷ_t − ŷ)²)
Implementation sketch:
model.train()  # keep dropout active at inference time
samples = []
with torch.no_grad():  # no gradients are needed for sampling
    for _ in range(T):
        samples.append(model(x))
preds = torch.stack(samples)  # shape: (T, batch, ...)
mean = preds.mean(dim=0)
std = preds.std(dim=0, unbiased=False)  # 1/T normalization, matching the formula above
Simplified PPO for Regression
The Proximal Policy Optimization (PPO) algorithm is adapted to a supervised regression setting. The actor network outputs a mean μ and a standard deviation σ. A single‑step “environment” is defined in which the reward is the negative mean squared error of a sampled prediction:
# Sample a prediction from the actor's Gaussian
z = torch.distributions.Normal(mu, sigma).sample()
reward = -((z - y) ** 2).mean()
# Value network predicts expected reward (baseline)
value = value_net(x)
# Detach so the policy loss does not backpropagate into the value network
advantage = (reward - value).detach()
# PPO clipped objective (simplified, no GAE).
# log_prob_new / log_prob_old: log-probability of z under the updated policy
# and under the policy that sampled z, respectively.
ratio = torch.exp(log_prob_new - log_prob_old)
clipped = torch.clamp(ratio, 1 - ε, 1 + ε) * advantage
loss = -torch.min(ratio * advantage, clipped).mean()
Because regression is treated as a one‑step episode, Generalized Advantage Estimation (GAE) is omitted.
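To make the clipping step concrete, the following is a minimal plain‑Python sketch of the clipped objective for a single sample. The helper names (`gaussian_log_prob`, `clipped_surrogate`), the clipping range `eps=0.2`, and all numeric values are assumptions chosen for illustration; they are not taken from the article's repository.

```python
import math


def gaussian_log_prob(z, mu, sigma):
    """Log-density of z under a Normal(mu, sigma) distribution."""
    return (-0.5 * ((z - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2 * math.pi))


def clipped_surrogate(log_prob_new, log_prob_old, advantage, eps=0.2):
    """PPO clipped objective for one sample (a quantity to maximize)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# One-step "episode" with made-up numbers:
# target y, sampled prediction z, value-network baseline v.
y, z, v = 0.40, 0.45, -0.01
reward = -((z - y) ** 2)      # negative squared error of the sample
advantage = reward - v        # treated as a constant in the policy loss

# Log-probability of z under the sampling policy and under an updated policy.
old = gaussian_log_prob(z, mu=0.50, sigma=0.10)
new = gaussian_log_prob(z, mu=0.46, sigma=0.10)

loss = -clipped_surrogate(new, old, advantage)
print(f"advantage={advantage:.4f}  loss={loss:.6f}")
```

Once the probability ratio leaves the interval [1 − eps, 1 + eps] in the direction that would increase the objective, the clipped term takes over and the sample stops contributing further gradient to the actor.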
Experimental Setup
Dataset: Concrete compressive strength (1,030 samples, 8 input features, 1 target). Data are split 70% training / 30% testing. Input features are standardized (mean 0, std 1), and the target is divided by 100 so that it lies in the interval (0, 1).
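The preprocessing described above can be sketched in plain Python as follows. The data here are synthetic stand‑ins with the same shape as the dataset, the helper names are hypothetical, and fitting the standardizer on the training split only is an assumption (the article does not say which split the statistics were computed on).

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(feature_rows):
    """Per-column mean and (population) standard deviation."""
    columns = list(zip(*feature_rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return means, stds


def standardize(feature_rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in feature_rows]


# Synthetic stand-in: 1,030 rows, 8 features, target in arbitrary strength units.
rng = random.Random(42)
rows = [([rng.uniform(0, 500) for _ in range(8)], rng.uniform(2, 82))
        for _ in range(1030)]

train, test = train_test_split(rows)
means, stds = fit_standardizer([x for x, _ in train])  # assumption: train split only
x_train = standardize([x for x, _ in train], means, stds)
x_test = standardize([x for x, _ in test], means, stds)
y_train = [y / 100.0 for _, y in train]  # scale target into (0, 1)
y_test = [y / 100.0 for _, y in test]

print(len(x_train), len(x_test))
```

With 1,030 samples, a 70/30 split leaves 721 training and 309 test rows.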
Model architecture: Fully‑connected network with four hidden layers of 64 neurons each. The output layer varies according to the uncertainty method (e.g., two heads for mean + log‑std).
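One way such a network could look in PyTorch for the Mean + LogStd variant is sketched below. The class name `MeanLogStdNet` and the ReLU activation are assumptions; the article specifies only the layer count, the width, and the two output heads.

```python
import torch
from torch import nn


class MeanLogStdNet(nn.Module):
    """Four hidden layers of 64 units with separate mean and log-std heads."""

    def __init__(self, in_features=8, hidden=64, n_hidden_layers=4):
        super().__init__()
        layers, width = [], in_features
        for _ in range(n_hidden_layers):
            layers += [nn.Linear(width, hidden), nn.ReLU()]  # ReLU is an assumption
            width = hidden
        self.body = nn.Sequential(*layers)
        self.mean_head = nn.Linear(hidden, 1)
        self.log_std_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.log_std_head(h)


model = MeanLogStdNet()
mu, log_std = model(torch.randn(5, 8))  # batch of 5 samples, 8 features
print(mu.shape, log_std.shape)
```

Predicting the log of the standard deviation lets the head output any real number while `log_std.exp()` stays strictly positive.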
Training hyper‑parameters: 2,000 epochs, batch size 256, learning rate 0.0001 (Adam optimizer assumed).
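A training loop using these hyper‑parameters might look like the sketch below, shown for the Mean + LogStd loss. The function name `train_mean_logstd`, the compact stand‑in model with a two‑column output, and the random data are assumptions for illustration; the demo call runs only two epochs so that it finishes quickly.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_mean_logstd(model, x, y, epochs=2000, batch_size=256, lr=1e-4):
    """Minimize the Gaussian negative log-likelihood; returns the update count."""
    loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    steps = 0
    for _ in range(epochs):
        for xb, yb in loader:
            out = model(xb)
            mu, log_std = out[:, :1], out[:, 1:]  # column 0: mean, column 1: log-std
            dist = torch.distributions.Normal(loc=mu, scale=log_std.exp())
            loss = -dist.log_prob(yb).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            steps += 1
    return steps


torch.manual_seed(0)
x = torch.randn(721, 8)  # stand-in for the 721 standardized training rows
y = torch.rand(721, 1)   # stand-in for targets scaled into (0, 1)
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
steps = train_mean_logstd(model, x, y, epochs=2)  # short demo run
print(steps)
```

With 721 training rows and a batch size of 256 there are three mini‑batches per epoch, so the full 2,000‑epoch schedule corresponds to about 6,000 optimizer updates.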
Results Analysis
Mean squared error (MSE) on training and test sets was measured for each method. The baseline regression (no uncertainty) achieved the lowest test MSE, while the Mean + LogStd method had the highest among the uncertainty‑aware approaches. Mean + LogStd and MC Dropout performed similarly and were second best; the simplified PPO ranked third.
Key observations:
When predictions are filtered by an uncertainty threshold, the baseline shows a flat MSE curve (no uncertainty information).
Mean + LogStd continues to reduce MSE beyond an uncertainty threshold of ≈0.55, indicating a more reliable ordering of confidence.
Mean + LogVariance plateaus earlier, suggesting less discriminative uncertainty.
MC Dropout provides reasonable uncertainty estimates comparable to Mean + LogStd.
The simplified PPO produces an inverted uncertainty signal: removing low‑uncertainty predictions increases MSE, implying that its predicted standard deviation does not correlate with true error.
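The uncertainty‑based filtering behind these observations can be sketched as follows: keep only the predictions whose predicted standard deviation is at or below a threshold, then compute the MSE on what remains. The function name `mse_below_threshold` and the toy numbers are assumptions; the exact filtering protocol used in the article is not specified.

```python
def mse_below_threshold(y_true, y_pred, sigma, threshold):
    """MSE over predictions whose predicted std is <= threshold.

    Returns (mse, n_kept); mse is None when nothing is kept.
    """
    kept = [(t - p) ** 2
            for t, p, s in zip(y_true, y_pred, sigma)
            if s <= threshold]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


# Toy data in which larger predicted uncertainty goes with larger error.
y_true = [0.30, 0.45, 0.52, 0.61, 0.20]
y_pred = [0.31, 0.44, 0.47, 0.71, 0.40]
sigma = [0.02, 0.03, 0.06, 0.10, 0.20]

results = [mse_below_threshold(y_true, y_pred, sigma, thr)
           for thr in (0.25, 0.10, 0.05)]
for thr, (mse, n) in zip((0.25, 0.10, 0.05), results):
    print(f"threshold={thr:.2f}  kept={n}  mse={mse:.6f}")
```

When the uncertainty estimate is informative, tightening the threshold lowers the MSE of the retained predictions, as in this toy data. A flat curve means the uncertainty carries no ranking information, and a rising curve corresponds to the inverted signal reported for the simplified PPO.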
Figures (originally included in the article) illustrate the MSE curves, true‑vs‑predicted scatter plots, and the effect of uncertainty‑based filtering.
Conclusion
On the concrete‑strength dataset, the Mean + LogStd and Mean + LogVariance approaches provide the most accurate and useful uncertainty estimates. MC Dropout is an acceptable alternative with comparable performance. The simplified PPO adaptation fails to deliver reliable uncertainty signals, even when its confidence ordering is reversed.
Code repository:
https://github.com/navid-bamdad-roshan/regression-with-uncertainty-methods-comparison
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.