How come you can get a permutation feature importance greater than 1?

Using this simple code:

import lightgbm as lgb
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# X, y are defined elsewhere (my dataset)
X_train, X_test, y_train, y_test = train_test_split(X, y)
lgbr = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.15)
model_lgb = lgbr.fit(X_train, y_train)

# Permutation importance evaluated on the held-out test set
r = permutation_importance(model_lgb, X_test, y_test, n_repeats=30, random_state=0)
for i in r.importances_mean.argsort()[::-1]:
    print(f"{i} {r.importances_mean[i]:.3f} +/- {r.importances_std[i]:.3f}")

When I run this on my dataset, the highest value is about 1.20.

But I thought that the permutation_importance mean for a feature was the amount that the score changed on average by permuting the feature column, so this can't be more than 1, can it?
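My mental model of what permutation_importance computes is roughly this sketch (manual_permutation_importance is just an illustrative name, and I'm assuming X is a NumPy array):

```python
import numpy as np

def manual_permutation_importance(model, X, y, col, n_repeats=30, seed=0):
    """Mean drop in model.score() caused by shuffling column `col`."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        # Shuffle one column to break its link with the target
        X_perm[:, col] = rng.permutation(X_perm[:, col])
        drops.append(baseline - model.score(X_perm, y))
    return np.mean(drops)
```
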

What am I missing?

(If I replace lightgbm with xgboost I get the same issue, so I don't think it's specific to one particular regression method.)

But I thought that the permutation_importance mean for a feature was the amount that the score was changed on average by permuting the feature column[...]

Correct.

so this can't be more than 1 can it?

That depends on whether the score can "worsen" by more than 1. The scoring parameter of permutation_importance defaults to None, in which case it uses the model's score method. For LGBMRegressor (and most regressors) that is the R² score, which has a maximum of 1 but can take arbitrarily large negative values, so the score can indeed worsen by an arbitrarily large amount.
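To make that concrete, here is a small sketch using sklearn.metrics.r2_score (the numbers are made up for illustration). R² is 1 - SS_res/SS_tot, so any predictions worse than "always predict the mean" push it below zero, with no lower bound:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predicting the mean everywhere gives the R^2 baseline of exactly 0.
y_mean = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, y_mean))   # 0.0

# Wildly wrong predictions drive R^2 far below zero.
y_bad = np.array([10.0, -10.0, 10.0, -10.0])
print(r2_score(y_true, y_bad))    # -93.0
```

So if a model scores, say, 0.9 on the intact test set and -0.3 after a column is permuted, the reported importance for that feature is 1.2.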