Python Scikit-Learn DecisionTreeClassifier.fit() throws KeyError: 'default'

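I have a small dataset and am trying to build a decision tree classifier with sklearn. I use sklearn.tree.DecisionTreeClassifier as the model and call its .fit() method to fit the data. After searching around, I could not find anyone else who has run into the same problem.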

After loading the data into one array and the labels into another, printing the two arrays (data, then labels) gives:

[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0.]
 [1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 0. 1. 0. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
 [0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 1. 1.]
 [0. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1. 0. 1. 0. 0.]
 [0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0.]]

['Alcaligenes_faecalis' 'Bacillus_circulans' 'Bacillus_megaterium'
 'Bacillus_sphaericus' 'Citrobacter_freundii' 'Enterobacter_aerogenes'
 'Escherichia_coli' 'Micrococcus_luteus' 'Proteus_mirabilis'
 'Salmonella_arizonae' 'Serratia_marcescens' 'Staphylococcus_epidermidis'
 'Staphylococcus_saprophyticus']

I defined a function to do the fitting (I also tried removing the function and running .fit() directly):

def decisiontree(data, labels, criterion = "gini", splitter = "default", max_depth = None): #expects *2d data and 1d labels

    model = sklearn.tree.DecisionTreeClassifier(criterion = criterion, splitter = splitter, max_depth = max_depth)
    model = model.fit(data,labels)

    return model

Then I called the function:

model = decisiontree(data, labels)

At this point, a KeyError is raised:

KeyError                                  Traceback (most recent call last)
<ipython-input-21-3574397ccfb6> in <module>
----> 1 model = decisiontree(data, labels)

<ipython-input-18-e85883291477> in decisiontree(data, labels, criterion, splitter, max_depth)
      2 
      3     model = sklearn.tree.DecisionTreeClassifier(criterion = criterion, splitter = splitter, max_depth = max_depth)
----> 4     model = model.fit(data,labels)
      5 
      6     return model

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    875             sample_weight=sample_weight,
    876             check_input=check_input,
--> 877             X_idx_sorted=X_idx_sorted)
    878         return self
    879 

~/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    333         splitter = self.splitter
    334         if not isinstance(self.splitter, Splitter):
--> 335             splitter = SPLITTERS[self.splitter](criterion,
    336                                                 self.max_features_,
    337                                                 min_samples_leaf,

KeyError: 'default'

The data is stored in data.csv:

Alcaligenes_faecalis,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0
Bacillus_circulans,1,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0
Bacillus_megaterium,1,1,1,0,1,0,1,0,0,0,0,0,0,0,0,1
Bacillus_sphaericus,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
Citrobacter_freundii,0,1,1,1,1,0,1,1,1,0,1,1,0,1,0,0
Enterobacter_aerogenes,0,1,1,1,1,1,1,1,0,0,1,0,0,1,1,0
Escherichia_coli,0,1,1,1,1,1,1,1,1,0,1,0,1,0,0,0
Micrococcus_luteus,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Proteus_mirabilis,0,1,1,1,0,0,0,0,0,1,1,1,0,1,1,1
Salmonella_arizonae,0,1,1,1,0,0,0,0,1,0,1,1,0,1,0,0
Serratia_marcescens,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1
Staphylococcus_epidermidis,1,0,1,0,1,0,1,0,1,0,1,0,0,0,0,0
Staphylococcus_saprophyticus,1,0,1,0,1,0,1,0,1,1,0,0,0,0,0,0
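The loading code is not shown in the question. A minimal sketch of one way to split such a file into features and labels, assuming the first column is the label and every remaining column is a numeric feature (the helper name load_rows is hypothetical, and only two rows of data.csv are inlined here so the snippet runs on its own):

```python
import csv
import io

# Two rows copied from data.csv above, inlined so the example is self-contained.
# A real script would pass open("data.csv", newline="") instead.
sample = io.StringIO(
    "Alcaligenes_faecalis,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0\n"
    "Bacillus_circulans,1,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0\n"
)


def load_rows(handle):
    """Split CSV rows into (data, labels): first column is the label."""
    data, labels = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        labels.append(row[0])
        data.append([float(value) for value in row[1:]])
    return data, labels


data, labels = load_rows(sample)
print(labels)
print(len(data), len(data[0]))
```

The result is plain Python lists; wrapping them in numpy.asarray would give arrays like the ones printed in the question.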

According to the docs, splitter must be "best" or "random".
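The last frame of the traceback shows a lookup of the form SPLITTERS[self.splitter], which is why the failure surfaces as a KeyError rather than a more descriptive message. The sketch below only imitates that lookup with a plain dictionary; SPLITTERS_STANDIN and its placeholder values are assumptions for illustration, not sklearn's actual objects:

```python
# Hypothetical stand-in for the lookup table seen in the traceback.
# The values are placeholders, not the real sklearn splitter classes.
SPLITTERS_STANDIN = {"best": "best-splitter", "random": "random-splitter"}


def lookup(name):
    """Look up a splitter by name, as a plain dict access would."""
    return SPLITTERS_STANDIN[name]


try:
    lookup("default")
    error = None
except KeyError as exc:
    # A missing dict key raises KeyError carrying the key itself.
    error = exc

print(repr(error))
```

Any string other than the two keys fails the same way, so the error message names the bad value but not the valid ones.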

The splitter parameter of sklearn.tree.DecisionTreeClassifier has no value called "default". Its default value is "best", so you can use:

import sklearn.tree

def decisiontree(data, labels, criterion="gini", splitter="best", max_depth=None):
    # Expects 2D data and 1D labels; splitter must be "best" or "random".
    model = sklearn.tree.DecisionTreeClassifier(criterion=criterion, splitter=splitter, max_depth=max_depth)
    model = model.fit(data, labels)

    return model
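To check which splitter values a given sklearn installation accepts, something like the following can be run. It is a sketch on a tiny made-up dataset (X, y, and the helper try_fit are not from the question). It catches both KeyError, as in the traceback above, and ValueError, on the assumption that newer sklearn releases report an invalid parameter through a ValueError subclass instead:

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up dataset: four samples, two binary features, one label each.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = ["a", "b", "c", "d"]


def try_fit(splitter):
    """Return True if a tree with this splitter value fits, False otherwise."""
    try:
        DecisionTreeClassifier(splitter=splitter, random_state=0).fit(X, y)
        return True
    except (KeyError, ValueError):
        # Older releases raise KeyError; newer ones are assumed to raise
        # a ValueError subclass during parameter validation.
        return False


results = {name: try_fit(name) for name in ("best", "random", "default")}
print(results)
```

Only "best" and "random" should fit; "default" should be rejected.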