RandomForestRegressor - unhashable type: 'Int64Index' error

Question

I am fitting a model to predict the true values using RandomForestRegressor in Python on a three-column dataset (click the link to download the full CSV). The dataset format is as follows:

t_stamp,X,Y
0.000543,0,10
0.000575,0,10
0.041324,1,10
0.041331,2,10
0.041336,3,10
0.04134,4,10
0.041345,5,10
0.04135,6,10
0.041354,7,10

Here is how we make the prediction:

import pandas as pd
import numpy as np
import glob, os
from io import StringIO
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import accuracy_score
import math
from math import sqrt
from sklearn.cross_validation import train_test_split

df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "data.csv"))))

for i in range(1,10):
    df['X_t'+str(i)] = df['X'].shift(i)

print(df)

df.dropna(inplace=True)

X = pd.DataFrame({ 'X_%d'%i : df['X'].shift(i) for i in range(10)}).apply(np.nan_to_num, axis=0).values
y = df['Y'].values

train_index, test_index = train_test_split(df.index, test_size=0.40)

X_train = df.X[[train_index]]
y_train = df.Y[[train_index]]
X_test = df.X[[test_index]]
y_test = df.Y[[test_index]]


#X_train = df.X[train_index].values
#y_train = df.Y[train_index].values
#X_train = df.X[test_index].values
#y_test = df.Y[test_index].values


#X = X[:, None]
#y = df['Y'][:, None]
print(X.shape)
print(df['Y'].shape)

print()
print("Size of X_train:",(len(X_train)))
print("Size of Y_train:",(len(y_train)))
print("Size of X_test:",(len(X_test)))
print("Size of Y_test:",(len(y_test)))

print()
reg = RandomForestRegressor(criterion='mse')
reg.fit(X_train,y_train)

However, when I call reg.fit(X_train,y_train), I get this error:

    raise TypeError("unhashable type: %r" % type(self).__name__)

TypeError: unhashable type: 'Int64Index'

How can we solve this problem?

Thanks in advance.

Answer 1

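I think the problem here is that your train/test split is set up incorrectly. train_test_split can return the four objects you need directly (X_train, X_test, y_train and y_test), which is what your code tries to build by hand after splitting the index. I would set up your data like this: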

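from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation

X = df.drop('Y', axis=1)  # features: every column except the target
y = df['Y']               # target

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)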

Then you can pass these straight to your model.
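As a rough end-to-end sketch of the above: the error most likely arises because df.X[[train_index]] wraps the whole index object in a list, so pandas appears to treat it as a single label and tries to hash it. The example below avoids that in two ways: by splitting the features and target directly, and by selecting rows with .loc if you prefer to keep splitting the index. The data here is synthetic (an assumption standing in for data.csv, with the same t_stamp, X, Y columns), and the names features, target, X_train_alt and y_train_alt are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.csv (assumption: same t_stamp, X, Y columns).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "t_stamp": np.sort(rng.uniform(0.0, 1.0, n_rows)),
    "X": np.arange(n_rows) % 50,
    "Y": 10 + (np.arange(n_rows) // 50),
})

# Lagged copies of X, as in the question; the first 9 rows contain NaN.
for i in range(1, 10):
    df[f"X_t{i}"] = df["X"].shift(i)
df = df.dropna()

# Features are 2-D (every column except the target); the target is 1-D.
features = df.drop(columns=["Y"])
target = df["Y"]

# Option 1: let train_test_split return the four objects directly.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.40, random_state=0
)

# Option 2: split the index labels, then select rows with .loc
# (pass the index itself, not a list containing it).
train_index, test_index = train_test_split(
    df.index, test_size=0.40, random_state=0
)
X_train_alt = features.loc[train_index]
y_train_alt = target.loc[train_index]

# Default criterion is used here; the 'mse' spelling from the question
# is not accepted by recent scikit-learn releases.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, reg.predict(X_test)))
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("RMSE:", rmse)
```

Either option gives the regressor a 2-D feature table and a 1-D target with matching rows, which is what reg.fit expects.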

python

numpy

python-3.x

pandas

random-forest