Using decision_path with a decision tree in Python

Question

I want to retrieve the path each instance takes through a decision tree or a random forest. For example, I need output like this:

# 1  1 3 4 8 NA NA
# 2  1 2 5 7 11 NA
# 3  1 3 4 9 10 13
# 4  1 3 4 8 NA NA
# etc

This means that instance #1 follows the path through nodes 1, 3 and 4 to terminal node 8, and so on. Clearly, some instances have shorter paths than others.

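I used decision_path, but it returns a sparse matrix that I cannot interpret, so I cannot recover paths like these. I cannot even read the output. Here is sample code using the Iris dataset: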

from sklearn.datasets import load_iris
iris = load_iris()
import numpy as np
ytrain = iris.target
xtrain = iris.data
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
fitted_tree = dtree.fit(X=xtrain,y=ytrain)
predictiontree = dtree.predict(xtrain)
fitted_tree.decision_path(xtrain)

The output looks like this:

<150x17 sparse matrix of type '<class 'numpy.int64'>'
with 560 stored elements in Compressed Sparse Row format>

Please help me build a matrix like the one I described above. I do not know how to work with sparse matrices.

Answer 1

Thanks to @Patrick Artner's comment, here is the answer:

dense_matrix = fitted_tree.decision_path(xtrain).todense()

It gives output like this:

#matrix([[1, 1, 0, ..., 0, 0, 0],
#        [1, 1, 0, ..., 0, 0, 0],
#        [1, 1, 0, ..., 0, 0, 0],
#        ..., 
#        [1, 0, 1, ..., 0, 0, 1],
#        [1, 0, 1, ..., 0, 0, 1],
#        [1, 0, 1, ..., 0, 0, 1]], dtype=int64)

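Each row corresponds to one instance. For example, the first row is [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], which means the first instance passes through the first two nodes and no others. Note that scikit-learn numbers nodes from zero, so these columns are node ids 0 and 1 (nodes 1 and 2 when counting from one).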
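To get from this 0/1 matrix to the padded layout requested in the question, one option is to read the visited node ids straight from the sparse rows and pad the shorter paths. The following is only a sketch: the names indicator, paths and path_matrix are made up for illustration, NaN stands in for NA, and the ids are shifted by one to match the question's one-based numbering. It also assumes that sorting the node ids gives root-to-leaf order, which relies on scikit-learn assigning child nodes larger ids than their parents.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
fitted_tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# CSR indicator matrix: entry (i, j) is nonzero when sample i visits node j.
indicator = fitted_tree.decision_path(iris.data).tocsr()

# For each row, the stored column indices are the visited node ids (zero-based).
paths = [
    np.sort(indicator.indices[indicator.indptr[i]:indicator.indptr[i + 1]])
    for i in range(indicator.shape[0])
]

# Pad shorter paths with NaN (standing in for NA) and shift to one-based ids.
max_len = max(len(p) for p in paths)
path_matrix = np.full((len(paths), max_len), np.nan)
for i, p in enumerate(paths):
    path_matrix[i, :len(p)] = p + 1

print(path_matrix[:5])
```

The last non-NaN entry in each row is the leaf the sample ends in, so it should agree with fitted_tree.apply(iris.data) + 1.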

Answer 2

Alternatively, if you need finer control over each sample's decision path, you can do the following:

decision_paths = fitted_tree.decision_path(xtrain)
decision_path_list = list(decision_paths.toarray())
for path in decision_path_list:
    pass  # analyse each path here (one 0/1 row per sample)
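The question also mentions random forests. There, decision_path returns a pair: one indicator matrix covering the nodes of all trees side by side, and an array of column offsets marking where each tree's block starts and ends. The sketch below shows one way to split it per tree for a single sample; forest_paths and sample_id are illustrative names, and the forest size of 5 is an arbitrary choice to keep the output short.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=5, random_state=0)
forest.fit(iris.data, iris.target)

# indicator: (n_samples, total nodes over all trees)
# n_nodes_ptr: column offsets, tree t owns columns n_nodes_ptr[t]:n_nodes_ptr[t + 1]
indicator, n_nodes_ptr = forest.decision_path(iris.data)
indicator = indicator.tocsr()

sample_id = 0  # which sample to inspect (arbitrary choice)
forest_paths = {}
for t in range(len(forest.estimators_)):
    start, stop = int(n_nodes_ptr[t]), int(n_nodes_ptr[t + 1])
    # Take this sample's row, restricted to the columns of tree t.
    row = indicator[sample_id:sample_id + 1, start:stop].toarray().ravel()
    # Nonzero positions are node ids local to tree t (zero-based).
    forest_paths[t] = np.flatnonzero(row).tolist()
    print(f"tree {t}: {forest_paths[t]}")
```

Each list starts at node 0, the root of that tree, and ends at the leaf that forest.apply reports for the same sample and tree.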
