I am having trouble working with pd.DataFrame(scaler_2.inverse_transform(data_out), columns=data_out.columns)
I have been working on this linear regression case and I am having trouble validating my work. To validate it I have to use:
sns.regplot(x=X_2["pk"], y=y_2)
scaler_2 = StandardScaler()
scaler_2.fit(df)
# type(scaler_2)
X_2 = df.drop(['prijs'], axis=1)
# print(X_2.shape)
# type(X_2)
y_2 = df['prijs']
# print(y_2.shape)
# type(y_2)
#======================
test_data = 0.30
X_train_2, X_test_2, y_train_2, y_test_2 = train_test_split(X_2,y_2, test_size=test_data, random_state=12)
# print(f"formaat X_train_2 {X_train_2.shape}")
# print(f"formaat y_train_2 {y_train_2.shape}")
# print(f"formaat X_test_2 {X_test_2.shape}")
# print(f"formaat y_test_2 {y_test_2.shape}")
# X_train_2 = None
# X_test_2 = None
# y_train_2 = None
# y_test_2 = None
model_2 = LinearRegression()
X_train_simpel = X_train_2[['pk']]
X_test_simpel = X_test_2[['pk']]
fit_2 = model_2.fit(X_train_simpel, y_train_2)
uitkomst_2 = fit_2.predict(X_train_simpel)
uitkomst_3 = fit_2.predict(X_test_simpel)
data_out = X_train_2
data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
data_out['groep'] = uitkomst_2
data_out.head(5)
But when I run the last lines of this code, I get this error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [136], in
      1 #haal de originele ongeschaalde waardes terug
----> 2 data_out = pd.DataFrame(scaler_2.inverse_transform(data_out),columns=data_out.columns)
      3 data_out['groep'] = uitkomst_2
      4 data_out.head(5)

File C:\Python310\lib\site-packages\sklearn\preprocessing_data.py:1035, in StandardScaler.inverse_transform(self, X, copy)
   1033 else:
   1034     if self.with_std:
-> 1035         X *= self.scale_
   1036     if self.with_mean:
   1037         X += self.mean_

ValueError: operands could not be broadcast together with shapes (11484,7) (8,) (11484,7)
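'scaler_2' was fitted on all 8 columns of df, but 'scaler_2.inverse_transform(data_out)' is asked to transform a DataFrame with only 7 columns.
In other words, the 'prijs' column was dropped after 'scaler_2' had already been fitted, which later produces the error at 'scaler_2.inverse_transform(data_out)'. You therefore have to drop the 'prijs' column first and only then fit scaler_2 on the remaining data.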
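The mismatch can be checked directly by comparing the number of columns the scaler saw during fit with the number of columns passed in later. The sketch below is only an illustration on made-up random data: df_demo, scaler_demo, X_demo and the f0..f5 column names are hypothetical stand-ins, not the original data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df: 7 feature columns plus the target 'prijs'.
rng = np.random.default_rng(0)
cols = [f"f{i}" for i in range(6)] + ["pk", "prijs"]
df_demo = pd.DataFrame(rng.normal(size=(20, 8)), columns=cols)

# Same order of operations as in the question: fit first, drop afterwards.
scaler_demo = StandardScaler().fit(df_demo)   # fitted on 8 columns
X_demo = df_demo.drop(["prijs"], axis=1)      # only 7 columns remain

print("columns seen during fit:", scaler_demo.n_features_in_)
print("shape of scale_:", scaler_demo.scale_.shape)
print("columns passed later:", X_demo.shape[1])

# inverse_transform needs one scale/mean value per column, so this fails.
try:
    scaler_demo.inverse_transform(X_demo)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```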
The following code should solve your problem:
...
scaler_2 = StandardScaler()
X_2 = df.drop(['prijs'], axis=1)
scaler_2.fit(X_2)
...
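As a rough end-to-end check of this fix, the sketch below fits the scaler on the feature columns only, scales them, and then recovers the unscaled values with inverse_transform. It assumes the features are actually passed through transform before being inverted, which the question's snippet does not show. The data, the 'km' column and the *_demo names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df (made-up values, not the original data).
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "pk": rng.uniform(50, 300, size=10),
    "km": rng.uniform(0, 200_000, size=10),   # hypothetical extra feature
    "prijs": rng.uniform(1_000, 50_000, size=10),
})

# Drop the target first, then fit the scaler on the features only.
X_demo = df_demo.drop(["prijs"], axis=1)
scaler_demo = StandardScaler()
scaler_demo.fit(X_demo)

# Scale the features, keeping column names and index.
X_scaled = pd.DataFrame(
    scaler_demo.transform(X_demo),
    columns=X_demo.columns,
    index=X_demo.index,
)

# The column count now matches what the scaler was fitted on,
# so inverse_transform returns the original, unscaled values.
restored = pd.DataFrame(
    scaler_demo.inverse_transform(X_scaled),
    columns=X_scaled.columns,
    index=X_scaled.index,
)

print(np.allclose(restored.to_numpy(), X_demo.to_numpy()))
```

Keeping the original index on the restored frame makes it easier to line the rows up with y_train_2 afterwards.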