Python Scikit-Learn DecisionTreeClassifier.fit() throws KeyError: 'default'
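I have a small dataset and am trying to build a decision tree classifier with sklearn. I use sklearn.tree.DecisionTreeClassifier as the model and call its .fit() method to fit the data. After searching around, I could not find anyone else who has run into the same problem.
After loading the data into one array and the labels into another, printing both arrays (data and labels) gives: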
[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0.]
[1. 1. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 0. 1. 0. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
[0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 1. 1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 1. 1.]
[0. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1. 0. 1. 0. 0.]
[0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1.]
[1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0.]
[1. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 0.]]
['Alcaligenes_faecalis' 'Bacillus_circulans' 'Bacillus_megaterium'
'Bacillus_sphaericus' 'Citrobacter_freundii' 'Enterobacter_aerogenes'
'Escherichia_coli' 'Micrococcus_luteus' 'Proteus_mirabilis'
'Salmonella_arizonae' 'Serratia_marcescens' 'Staphylococcus_epidermidis'
'Staphylococcus_saprophyticus']
I defined a function to do the fitting. I also tried removing the function and running .fit() directly:
def decisiontree(data, labels, criterion="gini", splitter="default", max_depth=None):  # expects 2D data and 1D labels
    model = sklearn.tree.DecisionTreeClassifier(criterion=criterion, splitter=splitter, max_depth=max_depth)
    model = model.fit(data, labels)
    return model
Then I called the function:
model = decisiontree(data, labels)
At that point, a KeyError is raised:
KeyError Traceback (most recent call last)
<ipython-input-21-3574397ccfb6> in <module>
----> 1 model = decisiontree(data, labels)
<ipython-input-18-e85883291477> in decisiontree(data, labels, criterion, splitter, max_depth)
2
3 model = sklearn.tree.DecisionTreeClassifier(criterion = criterion, splitter = splitter, max_depth = max_depth)
----> 4 model = model.fit(data,labels)
5
6 return model
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
875 sample_weight=sample_weight,
876 check_input=check_input,
--> 877 X_idx_sorted=X_idx_sorted)
878 return self
879
~/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
333 splitter = self.splitter
334 if not isinstance(self.splitter, Splitter):
--> 335 splitter = SPLITTERS[self.splitter](criterion,
336 self.max_features_,
337 min_samples_leaf,
KeyError: 'default'
The data is stored in data.csv:
Alcaligenes_faecalis,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0
Bacillus_circulans,1,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0
Bacillus_megaterium,1,1,1,0,1,0,1,0,0,0,0,0,0,0,0,1
Bacillus_sphaericus,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0
Citrobacter_freundii,0,1,1,1,1,0,1,1,1,0,1,1,0,1,0,0
Enterobacter_aerogenes,0,1,1,1,1,1,1,1,0,0,1,0,0,1,1,0
Escherichia_coli,0,1,1,1,1,1,1,1,1,0,1,0,1,0,0,0
Micrococcus_luteus,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Proteus_mirabilis,0,1,1,1,0,0,0,0,0,1,1,1,0,1,1,1
Salmonella_arizonae,0,1,1,1,0,0,0,0,1,0,1,1,0,1,0,0
Serratia_marcescens,0,1,1,0,0,0,1,0,0,0,1,0,0,1,0,1
Staphylococcus_epidermidis,1,0,1,0,1,0,1,0,1,0,1,0,0,0,0,0
Staphylococcus_saprophyticus,1,0,1,0,1,0,1,0,1,1,0,0,0,0,0,0
According to the docs, splitter must be either "best" or "random". The splitter parameter of sklearn.tree.DecisionTreeClassifier does not accept the value "default"; its default value is "best", so you can use:
def decisiontree(data, labels, criterion="gini", splitter="best", max_depth=None):  # expects 2D data and 1D labels
    model = sklearn.tree.DecisionTreeClassifier(criterion=criterion, splitter=splitter, max_depth=max_depth)
    model = model.fit(data, labels)
    return model
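To make the fix concrete, here is a minimal self-contained sketch that fits the classifier on three rows copied from data.csv. The up-front check against VALID_SPLITTERS and the fixed random_state are my own additions for illustration, not part of the original function, and the exact exception scikit-learn itself raises for a bad splitter may differ between versions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The two splitter values named in the docs; "default" is not one of them.
VALID_SPLITTERS = ("best", "random")


def decisiontree(data, labels, criterion="gini", splitter="best", max_depth=None):
    """Fit a decision tree; expects 2D data and 1D labels."""
    # Illustrative early check (an assumption, not in the original code):
    # fail with a readable message instead of a KeyError deep inside fit().
    if splitter not in VALID_SPLITTERS:
        raise ValueError(
            f"splitter must be one of {VALID_SPLITTERS}, got {splitter!r}"
        )
    model = DecisionTreeClassifier(
        criterion=criterion,
        splitter=splitter,
        max_depth=max_depth,
        random_state=0,  # fixed only to make this sketch reproducible
    )
    return model.fit(data, labels)


# Three rows taken from data.csv above (features only) and their labels.
data = np.array(
    [
        [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
        [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ],
    dtype=float,
)
labels = np.array(
    ["Alcaligenes_faecalis", "Bacillus_circulans", "Micrococcus_luteus"]
)

model = decisiontree(data, labels)  # splitter defaults to "best"
predictions = model.predict(data)   # predictions on the training rows
```

Calling decisiontree(data, labels, splitter="default") with this sketch raises the ValueError from the early check before fit() is ever reached.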