我无法使用随机森林输出最佳选择的特征？

Question

我制作了阈值 = 0.15 的随机森林分类器，但是当我尝试迭代所选模型时，它没有输出最佳选择的特征。

代码：

X = data.loc[:,'IFATHER':'VEREP']
y = data.loc[:,'Criminal']

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score

    # Split the data into 30% test and 70% training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Train the classifier
clf.fit(X_train, y_train)

# Print the name and gini importance of each feature
for feature in zip(X, clf.feature_importances_):
    print(feature)

# Create a selector object that will use the random forest classifier to identify
# features that have an importance of more than 0.15
sfm = SelectFromModel(clf, threshold=0.15)

# Train the selector
sfm.fit(X_train, y_train)

下面的代码不起作用：

# Print the names of the most important features
for feature_list_index in sfm.get_support(indices=True):
    print(X[feature_list_index])

我能够使用随机森林分类器而不是使用阈值来获得每个特征的特征重要性。我认为 get_support() 不是正确的方法。

截图：

Answer 1

要创建包含最重要特征的新 X 数据集：

X_selected_features = sfm.fit_transform(X_train, y_train)

要查看功能名称：

features = np.array(list_of_feature_names)
print(features[sfm.get_support()])

如果 X 是 Pandas.DataFrame:

features = X.columns.values

我无法使用随机森林输出最佳选择的特征？

I am not able to output the best selected features using random forest?

python

pandas

random-forest

scikit-learn