Tagged articles
1 article
Meituan Technology Team
Aug 15, 2019 · Big Data

Inconsistent Predictions in XGBoost on Spark Due to Different Missing Value Handling

Predictions differed between XGBoost’s Java engine (XGBoost4j) and XGBoost on Spark because the two handle missing values differently: XGBoost4j treats zero as the default missing value, while Spark’s sparse vectors use NaN. The inconsistency was resolved by explicitly setting Float.NaN as the missing value, or by converting sparse vectors to dense ones, so that both engines handle zeros uniformly.
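
To make the mechanism concrete, here is a minimal toy sketch in plain Python. It does not call XGBoost4j or Spark; the helper names `to_sparse` and `present_features` are hypothetical and exist only for illustration. It assumes that a sparse vector simply does not store zero entries, and that a feature counts as "missing" when it is either not stored or equal to the chosen missing-value sentinel.

```python
import math

def to_sparse(dense):
    """Mimic a sparse vector: store only the non-zero entries as (index, value)."""
    return [(i, v) for i, v in enumerate(dense) if v != 0.0]

def present_features(entries, missing):
    """Return the indices of stored entries that are NOT treated as missing.

    entries: (index, value) pairs that are actually stored.
    missing: the sentinel value treated as missing (for example NaN or 0.0).
    """
    def is_missing(v):
        # NaN never compares equal to itself, so it needs a dedicated check.
        return math.isnan(v) if math.isnan(missing) else v == missing

    return [i for i, v in entries if not is_missing(v)]

row = [0.0, 2.5, 0.0, 1.0]           # hypothetical feature row with real zeros
dense_entries = list(enumerate(row))  # dense storage keeps every value
sparse_entries = to_sparse(row)       # sparse storage silently drops the zeros

# With NaN as the sentinel, the same row yields different present features
# depending on the storage format.
print(present_features(dense_entries, float("nan")))   # [0, 1, 2, 3]
print(present_features(sparse_entries, float("nan")))  # [1, 3]

# With 0.0 as the sentinel, both storage formats agree.
print(present_features(dense_entries, 0.0))   # [1, 3]
print(present_features(sparse_entries, 0.0))  # [1, 3]
```

In this toy model the disagreement only appears when the storage format and the sentinel are mismatched, which is why making zeros explicit (dense vectors) or aligning the sentinel on both sides removes it.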

Spark · SparseVector · XGBoost
0 likes · 13 min read