Problem regarding the predict_proba function of XGBoost in Python

I'm currently working on a binary classification problem with XGBoost. I want my predictions to come out as probabilities rather than as 1 or 0.

I split the dataset into training, validation, and test sets.
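The split used in the code below shuffles `df` and cuts it at 80% and 95% of its length, which gives 80% train, 15% validation and 5% test. Here is a minimal sketch of the same proportions on a hypothetical 100-row DataFrame. It uses `iloc` slicing instead of `np.split`; that substitution is a choice made for this illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with 100 rows (stand-in for the question's df)
df = pd.DataFrame({"x": np.arange(100), "is_default": np.arange(100) % 2})

# Shuffle once, then cut at 80% and 95% of the row count
shuffled = df.sample(frac=1, random_state=27)
n = len(shuffled)
cut1, cut2 = int(0.8 * n), int(0.95 * n)

train = shuffled.iloc[:cut1]      # first 80% of rows
valid = shuffled.iloc[cut1:cut2]  # next 15%
test = shuffled.iloc[cut2:]       # last 5%

print(len(train), len(valid), len(test))  # 80 15 5
```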

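import numpy as np
from xgboost import XGBClassifier

global label, id_column, features
label = 'is_default'
id_column = 'emp_id'
features = ['age', 'income', 'dependent', 'A', 'B', 'C']

# df is the full DataFrame; shuffle it, then split 80% / 15% / 5% into train / valid / test
train, valid, test = np.split(df.sample(frac=1), [int(.8*len(df)), int(.95*len(df))])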

X_train, y_train = train[features], train[label]
X_valid, y_valid = valid[features], valid[label]
X_test, y_test = test[features], test[label]

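params = {
    'num_class': 2,
    'learning_rate': 0.1,
    'n_estimators': 5,
    'max_depth': 5,
    'min_child_weight': 1,
    'gamma': 2,
    'subsample': 0.8,
    'colsample_bytree': 0.5,
    # multi:softprob treats the binary target as a 2-class multiclass problem;
    # binary:logistic is the usual objective for a 0/1 label
    'objective': 'multi:softprob',
    # scale_pos_weight is intended for binary objectives such as binary:logistic
    'scale_pos_weight': 2.14,
    'nthread': 4,
    'seed': 27}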

# Fit the model
model = XGBClassifier(**params)
model.fit(X_train, y_train)

# predict_proba returns one probability per class for each row: [P(y=0), P(y=1)]
valid_pred = model.predict_proba(X_test)

print(valid_pred)

# My output looks like this:
#
# array([[0.39044815, 0.6095518 ],
#        [0.4008397 , 0.59916025],
#        [0.40074524, 0.5992548 ],
#        ...,
#        [0.3613969 , 0.6386031 ],
#        [0.45495912, 0.5450409 ],
#        [0.41036654, 0.58963346]], dtype=float32)
#
# Applying np.argmax (below) gives me 1 or 0 values, which I don't want.
# I want only the maximum probability, like 0.6095518, 0.59916025, etc.
# How can I do this?

best_valid_preds = [np.argmax(x) for x in valid_pred]
print(best_valid_preds)
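To see why this prints 1s and 0s: `np.argmax` returns the column index of the larger probability, and in a two-column `predict_proba` output that index is the predicted class. A small check using the first three rows printed above:

```python
import numpy as np

# First three rows of the predict_proba output shown in the question
valid_pred = np.array([[0.39044815, 0.6095518],
                       [0.4008397, 0.59916025],
                       [0.40074524, 0.5992548]], dtype=np.float32)

# argmax gives the index of the larger column, i.e. the class label
labels = [int(np.argmax(x)) for x in valid_pred]
print(labels)  # [1, 1, 1]
```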

Since you only want the maximum probability (e.g. 0.6095518, 0.59916025, etc.), you can use the following code:

best_valid_preds = [np.max(x) for x in valid_pred]
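Here is a vectorized sketch of the same idea. It uses the question's first three rows plus one hypothetical row in which class 0 wins. `valid_pred.max(axis=1)` gives the same values as the list comprehension. If what you actually need is the probability of the positive class (label 1) for every row, take column 1 instead. That column differs from the row maximum whenever class 0 has the higher probability:

```python
import numpy as np

# First three rows from the question's output, plus one hypothetical row
# where class 0 has the higher probability
valid_pred = np.array([[0.39044815, 0.6095518],
                       [0.4008397, 0.59916025],
                       [0.40074524, 0.5992548],
                       [0.7, 0.3]], dtype=np.float32)

# Vectorized equivalent of [np.max(x) for x in valid_pred]
best_valid_preds = valid_pred.max(axis=1)

# Probability of the positive class (label 1) for every row
positive_proba = valid_pred[:, 1]

# Predicted label paired with its (maximum) probability
labels = valid_pred.argmax(axis=1)
pairs = list(zip(labels.tolist(), best_valid_preds.tolist()))
print(pairs)
```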

See the toy example below:

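import numpy as np

# Toy stand-in for predict_proba output (random values, so rows don't sum to 1)
preds = np.random.rand(100, 2)

best = [np.max(x) for x in preds]

print(best)
# [0.9935469310532575,
#  0.7121431432601246,
#  0.5863137762128169,
#  0.6562235545646353,
#  0.7955074578808067,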