Python - Error with scikit-learn Random Forest about the format of values

When I run the command:

clf.fit(train_data, train_label)

I get the following error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The problem is with the array train_data, which has shape (18000, 20). I have tried the following commands:

clf.fit(np.float32(train_data), train_label)

train_data = np.array([s[0].astype('float32') for s in train_data])

The train_data and train_label datasets are in the train file (a Python pickle), available at the following link:

https://www.dropbox.com/s/b3017gi18x6x325/train?dl=0

However, I cannot get all the values in the array "train_data" into a form that clf.fit accepts. Any help?

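I just found a way to resolve this error. You need to scale the data. The diagnostics in the code find no NaNs, so the likely culprit is values too large for float32, and scaling brings them back into range.

Code: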

from sklearn.ensemble import RandomForestClassifier
import pickle
import numpy as np
from sklearn.preprocessing import scale

# Load the pickled (train_data, train_label) pair from the 'train' file
with open('train', 'rb') as f:
    train_data, train_label = pickle.load(f)

# Diagnostics to check for NaNs in the data and labels; none were found.
print(np.isnan(train_data))
print(np.where(np.isnan(train_data)))
print(np.nan_to_num(train_data))
print(np.isnan(train_label))
print(np.where(np.isnan(train_label)))

# No NaNs, so standardize the features (zero mean, unit variance per column)
train_data = scale(train_data)

clf = RandomForestClassifier()
clf.fit(train_data, train_label)
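
One possible reason scaling helps, assuming the data contain finite values above the float32 maximum (about 3.4e38): the forest converts its input to float32, so such values would overflow to infinity even though np.isnan reports nothing. The sketch below is only an illustration on a small synthetic array, not the real dataset. The helper name float32_problems and the demo values are made up for this example, and the standardization is written in plain NumPy to mirror what scale does by default.

```python
import numpy as np


def float32_problems(X):
    """Count entries that are NaN, infinite, or finite but beyond the float32 range."""
    X = np.asarray(X, dtype=np.float64)
    max32 = np.finfo(np.float32).max  # largest finite float32, about 3.4e38
    finite = np.isfinite(X)
    return {
        "nan": int(np.isnan(X).sum()),
        "inf": int(np.isinf(X).sum()),
        "too_large": int((finite & (np.abs(X) > max32)).sum()),
    }


# Synthetic stand-in for train_data (hypothetical values, not the real dataset).
# 1e40 is a valid float64 but does not fit in float32.
demo = np.array([[1.0, 2.0], [3.0, 1e40], [5.0, 6.0]])

# Column-wise standardization (zero mean, unit variance), as scale does by default.
demo_scaled = (demo - demo.mean(axis=0)) / demo.std(axis=0)

report_before = float32_problems(demo)
report_after = float32_problems(demo_scaled)
print(report_before)
print(report_after)
```

If a check like this reports too_large entries on the real data, it may be worth finding out where those values come from before scaling, since they could be sentinel values or data-entry errors rather than genuine measurements.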