Python: can't plot multiple linear regression results
I am trying to run a multiple linear regression, but I'm having trouble plotting the results. When I try to draw my 3D plot, I get this output: ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,) and requested shape (34,)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.iloc[:, 0], X.iloc[:, 1], Y)
ax.plot(X.iloc[:, 0], X.iloc[:, 1], y_pred, color='red')
ax.set_xlabel('Annual Income (k$)')
ax.set_ylabel('Age')
ax.set_zlabel('Spending Score')
plt.show()
The plotting command should be:
ax.plot(X_test.iloc[:, 0], X_test.iloc[:, 1], y_pred, color='red')
because y_pred only contains y values for the subset X_test, not for the whole input X.
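A quick way to confirm the mismatch is to compare lengths before plotting. The sketch below uses made-up data (the column names 'foo' and 'bar' and the sample size of 20 are assumptions for illustration, not the asker's dataset) and simply shows that y_pred lines up with X_test rather than with X.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical example data: two features and a noisy linear target.
rng = np.random.RandomState(1)
X = pd.DataFrame(rng.uniform(size=(20, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2 * X['bar'] + rng.normal(scale=0.2, size=20)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# y_pred has one value per row of X_test, not one per row of X,
# so it can only be plotted against the X_test columns.
print('rows in X:     ', len(X))
print('rows in X_test:', len(X_test))
print('values in y_pred:', len(y_pred))
```

If the last two numbers agree and the first one does not, passing the columns of X together with y_pred to ax.plot is what triggers the broadcast error.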
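Plotting with connecting lines (ax.plot) does not make sense here, though: the input data is probably not ordered in any meaningful way, and even if the input were sorted, the test set definitely is not, because train_test_split shuffles it.
I would plot it like this: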
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# generate some data as an example.
np.random.seed(1)
n = 20
X = pd.DataFrame(np.random.uniform(size=(n, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2*X['bar'] + np.random.normal(scale=0.2, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# all observed data points
ax.scatter(X['foo'], X['bar'], Y, label='data')
# one vertical line per test point, from the true value to the prediction
for x0, x1, yt, yp in zip(X_test['foo'], X_test['bar'], y_test, y_pred):
    ax.plot([x0, x0], [x1, x1], [yt, yp], color='red')
# predicted values for the test points
ax.scatter(X_test['foo'], X_test['bar'], y_pred, color='red', marker='s', label='prediction')
ax.set_xlabel('X0')
ax.set_ylabel('X1')
ax.set_zlabel('y')
ax.legend()
plt.show()
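There are other ways to visualize this. You could use np.meshgrid to generate X values on a grid, get the y values for them from the predictor, and draw the result with plot_wireframe, then plot the training and test data with vertical lines that show their vertical distance from the wireframe. Which approach makes sense depends on the data.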
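The wireframe idea could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the original answer: the helper name plot_fit_wireframe, its grid_size parameter, the colors, and the synthetic 'foo'/'bar' data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def plot_fit_wireframe(regressor, X_train, y_train, X_test, y_test, grid_size=10):
    """Hypothetical helper: fitted plane as a wireframe plus residual lines."""
    c0, c1 = X_train.columns[:2]
    all_X = pd.concat([X_train, X_test])

    # Regular grid spanning the observed range of both features.
    g0 = np.linspace(all_X[c0].min(), all_X[c0].max(), grid_size)
    g1 = np.linspace(all_X[c1].min(), all_X[c1].max(), grid_size)
    G0, G1 = np.meshgrid(g0, g1)

    # Predict on the flattened grid, then restore the 2D grid shape.
    grid = pd.DataFrame({c0: G0.ravel(), c1: G1.ravel()})
    Z = regressor.predict(grid).reshape(G0.shape)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.plot_wireframe(G0, G1, Z, color='gray', linewidth=0.5)

    # Scatter each subset and connect every point to the plane vertically.
    subsets = [(X_train, y_train, 'blue', 'train'), (X_test, y_test, 'red', 'test')]
    for Xs, ys, color, label in subsets:
        ax.scatter(Xs[c0], Xs[c1], ys, color=color, label=label)
        for x0, x1, yt, yp in zip(Xs[c0], Xs[c1], ys, regressor.predict(Xs)):
            ax.plot([x0, x0], [x1, x1], [yt, yp], color=color, linewidth=0.8)

    ax.set_xlabel(c0)
    ax.set_ylabel(c1)
    ax.set_zlabel('y')
    ax.legend()
    return fig, ax, Z


# Synthetic example data (an assumption, not the asker's dataset).
rng = np.random.RandomState(1)
X = pd.DataFrame(rng.uniform(size=(20, 2)), columns=['foo', 'bar'])
Y = X['foo'] + 2 * X['bar'] + rng.normal(scale=0.2, size=20)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)
fig, ax, Z = plot_fit_wireframe(regressor, X_train, y_train, X_test, y_test)
```

Call plt.show() afterwards to display the figure. Because the model is linear in two features, the wireframe is a flat plane; a coarse grid is enough, and a finer one mostly adds clutter.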