sklearn Imputer() returned features do not fit in fit function
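I have a feature matrix containing missing values (NaN), so I need to impute those missing values first. However, the last line fails with the following error:
Expected sequence or array-like, got Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)
From what I can tell, train_fea_imputed is not a np.array but a sklearn.preprocessing.imputation.Imputer object. How should I fix this?
By the way, if I use train_fea_imputed = imp.fit_transform(train_fea), the code runs fine, but train_fea_imputed comes back as an array with one dimension smaller than train_fea.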
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
train_fea_imputed = imp.fit(train_fea)
# train_fea_imputed = imp.fit_transform(train_fea)
rf = RandomForestClassifier(n_estimators=5000,n_jobs=1, min_samples_leaf = 3)
rf.fit(train_fea_imputed, train_label)
Update: I changed it to
imp = Imputer(missing_values='NaN', strategy='mean', axis=1)
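Now the dimension problem no longer occurs. I think there are some inherent issues in the imputing function. I will come back to this once I finish the project.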
In scikit-learn, initialising the model, training the model, and getting the predictions are separate steps. In your case you have:
train_fea = np.array([[1,1,0],[0,0,1],[1,np.nan,0]])
train_fea
array([[ 1., 1., 0.],
[ 0., 0., 1.],
[ 1., nan, 0.]])
#initialise the model
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
#train the model
imp.fit(train_fea)
#get the predictions
train_fea_imputed = imp.transform(train_fea)
train_fea_imputed
array([[ 1. , 1. , 0. ],
[ 0. , 0. , 1. ],
[ 1. , 0.5, 0. ]])
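For readers on a newer scikit-learn release, where Imputer is no longer available, here is a minimal sketch of the same three steps using sklearn.impute.SimpleImputer. It assumes a release that ships SimpleImputer and reuses the toy train_fea from above. SimpleImputer always works column-wise, so there is no axis argument.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumed available (newer scikit-learn)

train_fea = np.array([[1, 1, 0], [0, 0, 1], [1, np.nan, 0]])

# initialise the model
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# train the model: fit() learns the column means and returns the
# imputer object itself, not an array
fitted = imp.fit(train_fea)
print(fitted is imp)

# get the imputed data: transform() is the call that returns an array
train_fea_imputed = imp.transform(train_fea)
print(type(train_fea_imputed).__name__)
print(train_fea_imputed)
```

Because fit() hands back the estimator, assigning its result to train_fea_imputed and passing that to rf.fit() is what triggers the "Expected sequence or array-like" error in the question.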
I think axis = 1 is not correct in this case, since you want to take the mean across the values of each feature vector/column (axis = 0) and not across rows (axis = 1).
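As for the smaller array returned by fit_transform, one possible explanation (a guess, since the post does not show the data) is that train_fea has a column made up entirely of NaN. A column-wise mean cannot be computed for such a column, so the imputer discards it. The sketch below uses a made-up matrix X and SimpleImputer with its default settings to show the effect.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumed available (newer scikit-learn)

# Hypothetical data: the middle column has no observed values at all.
X = np.array([[1.0, np.nan, 4.0],
              [3.0, np.nan, np.nan],
              [5.0, np.nan, 6.0]])

# With default settings the all-NaN column is dropped, so the result
# has one column fewer than the input (a warning may be emitted).
imputed = SimpleImputer(strategy="mean").fit_transform(X)
print(X.shape, imputed.shape)
print(imputed)
```

If that is what happened, switching to axis=1 only hides the problem: the gaps get filled from the other features in the same row, which mixes unrelated quantities. Dropping or separately handling the empty column before imputing is probably the safer route.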