DecisionTreeClassifier: why is the sum of values wrong?

Question

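I visualized my decision tree classifier and noticed that the sums look wrong: the numbers in 'value' do not match the sample counts (see screenshot). Am I misunderstanding my decision tree? I thought that if a node has 100 samples, of which 40 are True and 60 are False, the next node would receive 40 (or 60) samples, which are then split again...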

import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier  # missing in the original snippet

# X_train, y_train and Daten come from the asker's own data set (not shown here).
tree1 = DecisionTreeClassifier(criterion="entropy", max_features=13, max_leaf_nodes=75,
                               min_impurity_decrease=0.001, min_samples_leaf=12,
                               min_samples_split=20, splitter="best", max_depth=9)

tree1.fit(X_train, y_train)
# Feature names are all columns except the label-encoded target.
feature_names = Daten.drop("Abwanderung_LabelEncode", axis=1).columns
class_names = ["Keine Abwanderung", "Abwanderung"]
fig = plt.figure(figsize=(25, 20))
_ = tree.plot_tree(tree1,
                   feature_names=feature_names,
                   class_names=class_names,
                   rounded=True,
                   filled=True)

Answer 1

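The plot is correct.

The two numbers in value are not the numbers of samples sent to the child nodes; they are the counts of the negative and positive classes within the node. For example, 748 = 101 + 647: the node holds 748 samples, 647 of which belong to the positive class. The child nodes hold 685 and 63 samples, and 685 + 63 = 748. The left child has 47 negative samples and the right child has 54, and 47 + 54 = 101, the total number of negative samples.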
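The bookkeeping in this answer can be checked with plain arithmetic. The sketch below uses only the counts quoted above; the positive counts of the two children are not stated in the answer and are derived here as samples minus negatives, which is an assumption of this illustration.

```python
# Counts quoted in the answer (read from the plotted tree).
parent_value = [101, 647]      # [negative, positive] counts in the parent node
parent_samples = 748
child_samples = [685, 63]      # left child, right child
child_negatives = [47, 54]     # negative-class counts in each child

# Derived, not quoted: positives in each child = samples - negatives.
child_positives = [s - n for s, n in zip(child_samples, child_negatives)]

print(sum(parent_value) == parent_samples)       # value sums to samples
print(sum(child_samples) == parent_samples)      # children partition the parent
print(sum(child_negatives) == parent_value[0])   # negatives are preserved
print(sum(child_positives) == parent_value[1])   # positives are preserved
```

All four checks hold for these numbers, so value describes the class mix inside a node, while samples describes how many rows reached it.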

Answer 2

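The value field does not show the sizes of the split; it shows how many data points belong to each class. For example, the top node, which splits on voicemail_tarif_labelencode <= 0.5, has 748 samples, of which 101 belong to the class at index 0 and 647 to the class at index 1. It does not show how many data points are <= 0.5 and how many are > 0.5. If you look at the next two nodes, their sample sizes sum to 685 + 63 = 748, which is the number of samples in the parent node.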
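The same relationships can be verified on a fitted tree. This is a minimal sketch on synthetic data from make_classification, since the asker's X_train and y_train are not available; the sample size, depth and random_state are arbitrary choices for illustration. Per-class counts are recomputed from decision_path rather than read from tree_.value, because some scikit-learn releases appear to report value as class fractions instead of counts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the asker's data (assumption: binary target, 13 features).
X, y = make_classification(n_samples=748, n_features=13, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

# decision_path marks, for every sample, each node it passes through.
paths = clf.decision_path(X)                    # sparse, shape (n_samples, n_nodes)
one_hot = np.eye(clf.n_classes_, dtype=int)[y]  # shape (n_samples, n_classes)
class_counts = np.asarray(paths.T @ one_hot)    # per-node class counts

t = clf.tree_
root = 0
left, right = t.children_left[root], t.children_right[root]

# "samples" of a node equals the sum of its per-class counts.
print(class_counts[root].sum() == t.n_node_samples[root])
# The two children together hold exactly the parent's samples.
print(t.n_node_samples[left] + t.n_node_samples[right] == t.n_node_samples[root])
# For each class, the children's counts add up to the parent's count.
print((class_counts[left] + class_counts[right] == class_counts[root]).all())
```

Recomputing the counts from decision_path keeps the check independent of how a given library version formats value in the plot.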

Tags: python, matplotlib, decision-tree, scikit-learn