I am having trouble working with dataframe(X.inverseTransform(data_out), columns=data_out.columns)

I have been working on this linear regression case and I am having trouble validating my work. To validate it I have to use:

sns.regplot(x=X_2["pk"], y=y_2)

scaler_2 = StandardScaler()
scaler_2.fit(df)
# type(scaler_2)

X_2 = df.drop(['prijs'], axis=1)
# print(X_2.shape)
# type(X_2)

y_2 = df['prijs']
# print(y_2.shape)
# type(y_2)

#======================
test_data = 0.30
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_2,y_2, test_size=test_data, random_state=12)
# print(f"formaat X_train_2 {X_train_2.shape}")
# print(f"formaat y_train_2 {y_train_2.shape}")
# print(f"formaat X_test_2  {X_test_2.shape}")
# print(f"formaat y_test_2  {y_test_2.shape}")

# X_train_2 = None
# X_test_2 = None
# y_train_2 = None
# y_test_2 = None
model_2 = LinearRegression()
X_train_simpel = X_train_2[['pk']]
X_test_simpel = X_test_2[['pk']]
fit_2 = model_2.fit(X_train_simpel, y_train_2)
uitkomst_2 = fit_2.predict(X_train_simpel)
uitkomst_3 = fit_2.predict(X_test_simpel)

data_out = X_train_2
data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
data_out['groep'] = uitkomst_2

data_out.head(5)

But running the last lines of the code (the inverse_transform call and what follows) raises this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [136]
      1 #haal de originele ongeschaalde waardes terug
----> 2 data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
      3 data_out['groep'] = uitkomst_2
      4 data_out.head(5)

File C:\Python310\lib\site-packages\sklearn\preprocessing\_data.py:1035, in StandardScaler.inverse_transform(self, X, copy)
   1033 else:
   1034     if self.with_std:
-> 1035         X *= self.scale_
   1036     if self.with_mean:
   1037         X += self.mean_

ValueError: operands could not be broadcast together with shapes (11484,7) (8,) (11484,7)

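'scaler_2' was fitted on all columns of df (8, including 'prijs'), but 'scaler_2.inverse_transform(data_out)' is asked to transform a data frame with fewer columns (7). That is what the shapes (11484,7) and (8,) in the error message refer to.

In other words, the 'prijs' column is dropped after 'scaler_2' has been fitted, and that later produces the error at 'scaler_2.inverse_transform(data_out)'. You have to drop the 'prijs' column first and then fit scaler_2 on the remaining data.

The following code should solve your problem: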
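The mismatch can be reproduced without scikit-learn. The sketch below only mimics the `X *= self.scale_` step from the traceback with NumPy; the array sizes and the helper name `try_inverse_scale` are made up for illustration and are not part of the question's code.

```python
import numpy as np

# Hypothetical stand-ins: 5 rows with 7 feature columns (the real data has 11484 rows),
# and a per-column scale vector of length 8, as a scaler would learn from a frame
# that still contained the 'prijs' column.
X = np.ones((5, 7))
scale = np.full(8, 2.0)


def try_inverse_scale(X, scale):
    """Mimic the in-place `X *= self.scale_` step and report whether it broadcasts."""
    X = X.copy()  # leave the caller's array untouched
    try:
        X *= scale
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


mismatch = try_inverse_scale(X, scale)      # 7 columns against 8 scale values
match = try_inverse_scale(X, scale[:7])     # 7 columns against 7 scale values

print(mismatch)
print(match)
```

The first call fails because a length-8 vector cannot be broadcast across 7 columns; the second succeeds once both sides agree on the number of columns.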

...
scaler_2 = StandardScaler()
X_2 = df.drop(['prijs'], axis=1)
scaler_2.fit(X_2)
...
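For a complete round trip, here is a minimal sketch on made-up data. Only the column names 'pk' and 'prijs' come from the question; the 'km' column, the toy values and the shortened variable names are assumptions. It also calls transform before inverse_transform, which the question's snippet does not show, on the assumption that the intent is to scale the features and later recover the original values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 'pk' and 'prijs' mirror the question, 'km' is invented.
rng = np.random.default_rng(12)
df = pd.DataFrame({
    "pk": rng.integers(60, 300, size=50).astype(float),
    "km": rng.integers(1_000, 200_000, size=50).astype(float),
})
df["prijs"] = 100.0 * df["pk"] - 0.05 * df["km"] + 5_000.0

# Drop the target first, then fit the scaler on the feature columns only.
X = df.drop(columns=["prijs"])
y = df["prijs"]
scaler = StandardScaler().fit(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=12
)

# Scale the training features, keeping column names and the original index.
X_train_scaled = pd.DataFrame(
    scaler.transform(X_train), columns=X_train.columns, index=X_train.index
)

# inverse_transform now receives the same number of columns the scaler was fitted on.
restored = pd.DataFrame(
    scaler.inverse_transform(X_train_scaled),
    columns=X_train.columns,
    index=X_train.index,
)

print(scaler.n_features_in_)            # number of columns seen during fit
print(np.allclose(restored, X_train))   # whether the round trip recovers the inputs
```

Passing `index=X_train.index` keeps the restored frame aligned with the training rows, which matters if a pandas Series is later assigned as a new column.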