Python - Error with scikit-learn Random Forest about values format
When I run the command:
clf.fit(train_data, train_label)
I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The problem is with the array train_data, which has shape (18000, 20). I tried using this command:
clf.fit(np.float32(train_data), train_label)
or
train_data = np.array([s[0].astype('float32') for s in train_data])
The dataset (train_data and train_label) can be found in the train file (a Python pickle) at the following link:
https://www.dropbox.com/s/b3017gi18x6x325/train?dl=0
However, I cannot get all the values in the array train_data to be valid for the clf.fit function. Any help?
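A minimal sketch of one possible reason the float32 casts above do not help, using made-up values rather than the actual dataset: a number that is finite in float64 but beyond the float32 range turns into inf when cast, so the finiteness check inside clf.fit would still fail. The array x and the value 1e40 are assumptions for illustration only.

```python
import numpy as np

# Hypothetical values for illustration; the real train_data is not used here.
x = np.array([1.0, 2.5, 1e40], dtype=np.float64)

# Largest finite float32 value (roughly 3.4e38); 1e40 is beyond it.
float32_max = np.finfo(np.float32).max

with np.errstate(over="ignore"):  # silence the overflow warning from the cast
    x32 = x.astype(np.float32)

# Finite as float64, but no longer finite after the float32 cast.
all_finite_before = bool(np.isfinite(x).all())
all_finite_after = bool(np.isfinite(x32).all())
print(all_finite_before, all_finite_after)
```

If this is what happens with train_data, casting to float32 by hand only reproduces the overflow that triggers the error.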
I just found a way to resolve this error. You need to scale the data:
Code:
from sklearn.ensemble import RandomForestClassifier
import pickle
import numpy as np
from sklearn.preprocessing import scale
# Load the pickled (train_data, train_label) pair from the train file
with open('train', 'rb') as f:
    train_data, train_label = pickle.load(f)
# Some diagnostics to check for NaNs. No NaNs were found!
print(np.isnan(train_data))
print(np.where(np.isnan(train_data)))
print(np.nan_to_num(train_data))
print(np.isnan(train_label))
print(np.where(np.isnan(train_label)))
# So the data needs to be scaled
train_data = scale(train_data)
clf = RandomForestClassifier()
clf.fit(train_data, train_label)
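The diagnostics above only look for NaNs, while the error message also mentions infinity and values too large for float32. Since scale accepts the data, the values are presumably finite in float64 but too large for float32. A rough sketch of a broader check follows; find_float32_problems is a hypothetical helper name and the sample array is made up, not taken from the dataset.

```python
import numpy as np

def find_float32_problems(X):
    """Return (row, column) indices of entries that are NaN, infinite,
    or finite but too large to fit in float32."""
    X = np.asarray(X, dtype=np.float64)
    float32_max = np.finfo(np.float32).max
    nan_mask = np.isnan(X)
    inf_mask = np.isinf(X)
    # Finite in float64, but would overflow to inf after a float32 cast.
    too_large_mask = np.isfinite(X) & (np.abs(X) > float32_max)
    return {
        "nan": np.argwhere(nan_mask),
        "inf": np.argwhere(inf_mask),
        "too_large": np.argwhere(too_large_mask),
    }

# Hypothetical sample with one entry of each problematic kind.
sample = np.array([[1.0, 2.0], [np.nan, 1e40], [np.inf, 3.0]])
problems = find_float32_problems(sample)
for kind, indices in problems.items():
    print(kind, indices.tolist())
```

Running it on train_data before scaling should show which rows and columns hold the offending values, which can help decide whether scaling is the right fix or whether those entries are bad data.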