Error while splitting the dataset to pass to machine learning algorithms
This is strange, because the error occurs only for 3 specific columns; the same code works with the other columns.
Error:
Traceback (most recent call last):
File "/version1/analyze.py", line 447, in <module>
cv_results = model_selection.cross_val_score(model, X_train, Y_train,cv=kfold, scoring=scoring)
File "/usr/lib64/python2.7/site-packages/sklearn/model_selection/_validation.py", line 140, in cross_val_score
for train, test in cv_iter)
fac = 1. / (n_samples - n_classes)
ZeroDivisionError: float division by zero
My code:
from sklearn import model_selection

validation_size = 0.20
seed = 10
# Hold out 20% of the rows as a validation set.
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)
scoring = 'accuracy'
kfold = model_selection.KFold(n_splits=10, random_state=seed)
cv_results = model_selection.cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)  # error occurs here
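The error above occurs only when I select a few specific columns of my dataset; it works for the other columns!
All columns have the same number of entries and similar types of values.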
# Code for the split-out validation dataset
array = dataset.values
if field == "rh":
    X = array[:, 0:8]
elif field == "rm":
    X = array[:, 0:8]
elif field == "wh":
    X = array[:, 0:8]
elif field == "wm":
    X = array[:, 0:8]
else:
    print("wrong field")
if field == "rh":
    Y = array[:, 0]  # works fine; it also works for columns 4, 5, 6, 7
elif field == "rm":  # the error occurs only for columns 1, 2, 3
    Y = array[:, 1]  # gives the above error
elif field == "wh":
    Y = array[:, 2]  # gives the above error
elif field == "wm":
    Y = array[:, 3]  # gives the above error
else:
    print("wrong field")
Here is my dataset:
index,1column,2column,3column,….,8column
0,238,240,1103,409,1038,4,67,0
1,41,359,995,467,1317,8,71,0
2,102,616,1168,480,1206,7,59,0
3,0,34,994,181,1115,4,68,0
4,88,1419,1175,413,1060,8,71,0
5,826,10886,1316,6885,2086,263,119,0
6,88,472,1200,652,1047,7,64,0
7,0,322,957,533,1062,11,73,0
8,0,200,1170,421,1038,5,63,0
9,103,1439,1085,1638,1151,29,66,0
10,0,1422,1074,4832,1084,27,74,0
11,1828,754,11030,263845,1209,10,79,0
12,340,1644,11181,175099,4127,13,136,0
13,71,1018,1029,2480,1276,18,66,1
14,0,3077,1116,1696,1129,6,62,0
...
105 data records in total.
For column 1, that is, when Y = column 1, the error above does not occur,
but when I select any of the other columns 2, 3 or 4, the same error occurs.
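n_samples is the number of rows in the dataset, and n_classes is the number of unique class labels in the label array. The error above occurs because the selected column does not contain class labels! When every value in Y is distinct, each row counts as its own class, so n_classes equals n_samples and the denominator n_samples - n_classes shown in the traceback becomes zero.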
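A quick way to check this explanation is to compute n_samples - n_classes for each candidate target column. The sketch below is only an illustration: it uses the 15 rows shown in the question, assumes the index column is not part of the array, and the helper name class_gap is hypothetical. The real failure happens on each training fold of the full 105-row dataset, but the same arithmetic applies there.

```python
# First 15 rows from the question, with the index column dropped (assumption).
rows = [
    (238, 240, 1103, 409, 1038, 4, 67, 0),
    (41, 359, 995, 467, 1317, 8, 71, 0),
    (102, 616, 1168, 480, 1206, 7, 59, 0),
    (0, 34, 994, 181, 1115, 4, 68, 0),
    (88, 1419, 1175, 413, 1060, 8, 71, 0),
    (826, 10886, 1316, 6885, 2086, 263, 119, 0),
    (88, 472, 1200, 652, 1047, 7, 64, 0),
    (0, 322, 957, 533, 1062, 11, 73, 0),
    (0, 200, 1170, 421, 1038, 5, 63, 0),
    (103, 1439, 1085, 1638, 1151, 29, 66, 0),
    (0, 1422, 1074, 4832, 1084, 27, 74, 0),
    (1828, 754, 11030, 263845, 1209, 10, 79, 0),
    (340, 1644, 11181, 175099, 4127, 13, 136, 0),
    (71, 1018, 1029, 2480, 1276, 18, 66, 1),
    (0, 3077, 1116, 1696, 1129, 6, 62, 0),
]


def class_gap(labels):
    """Return n_samples - n_classes for a sequence of labels (hypothetical helper)."""
    return len(labels) - len(set(labels))


# Treat each column in turn as the label vector Y.
columns = list(zip(*rows))
gaps = [class_gap(column) for column in columns]

# Columns whose values are all distinct would make the denominator zero.
zero_gap_columns = [i for i, gap in enumerate(gaps) if gap == 0]

print(gaps)              # [5, 0, 0, 0, 1, 3, 2, 13]
print(zero_gap_columns)  # [1, 2, 3]
```

On this sample, the columns at positions 1, 2 and 3 have no repeated values, which matches the columns the question reports as failing. Such columns look like continuous measurements rather than class labels.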