XGBoost feature importance - only shows two features

Question

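I want to see the feature importance of all the features in the feature set I am passing to my XGBoost model, but I only seem to get two. The good news is that they do look like 2 of the features in the set that should be identified as important. However, I would really like to see all of the features: there are 20 features in the training set. Any help would be greatly appreciated.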

The default for plot_importance is to show all features (I checked the code to confirm).

https://xgboost.readthedocs.io/en/latest/python/python_api.html

max_num_features (int, default None) - Maximum number of top features displayed on the plot. If None, all features will be displayed.

Code that displays the plot:

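import shap
import numpy as np
import matplotlib.pyplot as pl
import xgboost as xgb  # needed for xgb.plot_importance

# `model` is the trained XGBoost model (trained earlier, not shown)
xgb.plot_importance(model, max_num_features=None)  # None = show every feature
pl.title("xgboost.plot_importance(model)")
pl.show()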

When I look at the tuples in the model or at booster.get_score(), I see the same two:

{'locations': 80, 'avg_loc_dist': 20}

Plot:

[Figure: xgboost.plot_importance(model) output]

Added a plot_tree image:

[Figure: plot_tree output]

Answer 1

I was able to get an answer/help from the folks on the XGBoost forum. I am posting the reply (and link) here for anyone else who runs into the same issue.

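Only these two features show up because they are the only two features used in any split. They suggested dumping the model with dump_model() (a method of the trained Booster) and checking the text dump, where I could indeed see this. It was simply ignorance and a lack of understanding on my part.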

The reply:

Most likely only these two features are being used in the splits. You can verify this by running xgb.dump_model() to get the text representation of the model.

https://discuss.xgboost.ai/t/xgboost-feature-importance-only-shows-two-features/1541/2

Answer 2

You can also try permutation_importance from scikit-learn.

I see you imported the shap package; shap has an importance plot available:

shap.summary_plot(shap_values, X_test, plot_type="bar")

Both methods should help you debug the model. You can read more about the different ways to compute feature importance in XGBoost in my blog post.
