How come you can get a permutation feature importance greater than 1?
Using this simple code:
import lightgbm as lgb
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# X, y: my feature matrix and target
X_train, X_test, y_train, y_test = train_test_split(X, y)
lgbr = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.15)
model_lgb = lgbr.fit(X_train, y_train)
r = permutation_importance(model_lgb, X_test, y_test, n_repeats=30, random_state=0)
for i in r.importances_mean.argsort()[::-1]:
    print(f"{i} {r.importances_mean[i]:.3f} +/- {r.importances_std[i]:.3f}")
When I run this on my dataset, the highest value comes out around 1.20. But I thought that the permutation_importance mean for a feature was the amount by which the score changes on average when that feature's column is permuted, so it can't be more than 1, can it? What am I missing?
(I get the same issue if I replace lightgbm with xgboost, so I don't think it's a quirk of the particular regression method.)
But I thought that the permutation_importance mean for a feature was the amount that the score was changed on average by permuting the feature column[...]
Correct.
so this can't be more than 1 can it?
That depends on whether the score can get worse by more than 1. The scoring parameter of permutation_importance defaults to None, in which case it uses the model's own score method. For LGBMRegressor (and most regressors) that is the R² score, which has a maximum of 1 but can take arbitrarily large negative values, so the score can indeed get worse by an arbitrarily large amount.
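Here is a minimal sketch illustrating the point (the synthetic data and the choice of LinearRegression are my own assumptions, not from the question): when a model leans almost entirely on one feature, permuting that feature drives the test-set R² well below 0, so the drop from the baseline score, which is what permutation_importance reports, exceeds 1.

import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
# The target depends almost entirely on feature 0
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # baseline R^2, close to 1

r = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)
print(r.importances_mean)  # feature 0's importance is about 2: R^2 drops from ~1 to ~-1

Since R² is unbounded below, the importance (baseline score minus permuted score) is not capped at 1; only a scorer bounded on both ends, such as accuracy for a classifier, would give that cap.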