How come you can get a permutation feature importance greater than 1?
Using this simple code:
import lightgbm as lgb
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# X, y: my feature matrix and target
X_train, X_test, y_train, y_test = train_test_split(X, y)
lgbr = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.15)
model_lgb = lgbr.fit(X_train, y_train)
r = permutation_importance(model_lgb, X_test, y_test, n_repeats=30, random_state=0)
for i in r.importances_mean.argsort()[::-1]:
    print(f"{i} {r.importances_mean[i]:.3f} +/- {r.importances_std[i]:.3f}")
When I run this on my dataset, the highest value comes out around 1.20. But I thought that the permutation_importance mean for a feature was the amount by which the score changes on average when that feature's column is permuted, so it can't be more than 1, can it? What am I missing?
(I get the same issue if I replace lightgbm with xgboost, so I don't think it's a quirk of the particular regression method.)
But I thought that the permutation_importance mean for a feature was the amount that the score was changed on average by permuting the feature column[...]
Correct.
so this can't be more than 1 can it?
That depends on whether the score can get worse by more than 1. The scoring parameter of permutation_importance defaults to None, in which case it uses the model's own score method. For LGBMRegressor (and most regressors) that is the R² score, which has a maximum of 1 but can take arbitrarily large negative values, so the score can indeed get worse by an arbitrarily large amount.
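Here is a minimal sketch illustrating the point (the synthetic data and the choice of LinearRegression are my own assumptions, not from the question): when a model leans almost entirely on one feature, permuting that feature drives the test-set R² well below 0, so the drop from the baseline score, which is what permutation_importance reports, exceeds 1.

import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
# The target depends almost entirely on feature 0
y = 10 * X[:, 0] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))  # baseline R^2, close to 1

r = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)
print(r.importances_mean)  # feature 0's importance is about 2: R^2 drops from ~1 to ~-1

Since R² is unbounded below, the importance (baseline score minus permuted score) is not capped at 1; only a scorer bounded on both ends, such as accuracy for a classifier, would give that cap.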