I get a linear regression from scikit-learn's SVR in Python when the data is not linear

Question

import matplotlib.pyplot as plt
from sklearn.svm import SVR

# `train` is my pandas DataFrame with 'mass' and 'pa' columns
train.sort_values(by=['mass'], ascending=True, inplace=True)
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features);
# pandas Series no longer have .reshape, so convert to NumPy first
x = train['mass'].to_numpy().reshape(-1, 1)
y = train['pa'].to_numpy()

# Fit regression model
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_lin = SVR(kernel='linear', C=1e3)
svr_poly = SVR(kernel='poly', C=1e3, degree=2)
y_rbf = svr_rbf.fit(x, y).predict(x)
y_lin = svr_lin.fit(x, y).predict(x)
y_poly = svr_poly.fit(x, y).predict(x)

# look at the results (plots overlay by default; plt.hold is not needed)
plt.scatter(x, y, c='k', label='data')
plt.plot(x, y_rbf, c='g', label='RBF model')
plt.plot(x, y_lin, c='r', label='Linear model')
plt.plot(x, y_poly, c='b', label='Polynomial model')
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.show()

The code is copied from http://scikit-learn.org/stable/auto_examples/svm/plot_svm_regression.html, and the only thing I changed is the dataset. I don't understand what is going on.

Answer 1

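This is most likely related to the scale of your data. You are using the same penalty hyperparameter C as the example, but your y values are several orders of magnitude larger. As a result, the SVR algorithm favors simplicity over accuracy, because the penalty on errors is now tiny compared with your y values. You would need to increase C to, say, 1e6 (or standardize your y values).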
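
A minimal sketch of the standardization fix, assuming hypothetical data (the example's sine curve multiplied by 1e6, since the question's dataset isn't available). It fits the example's RBF model once on the raw targets and once through scikit-learn's TransformedTargetRegressor with a StandardScaler, which standardizes y before fitting and maps predictions back to the original units:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data: the example's sine curve, but with targets
# about six orders of magnitude larger.
X = np.linspace(0, 5, 100).reshape(-1, 1)
y = 1e6 * np.sin(X).ravel()

# Same hyperparameters as the example, fitted on the raw (large) targets.
raw = SVR(kernel='rbf', C=1e3, gamma=0.1).fit(X, y)

# Same hyperparameters, but y is standardized before fitting and
# predictions are transformed back to the original units.
scaled = TransformedTargetRegressor(
    regressor=SVR(kernel='rbf', C=1e3, gamma=0.1),
    transformer=StandardScaler(),
).fit(X, y)

r2_raw = raw.score(X, y)        # R^2 on the training data
r2_scaled = scaled.score(X, y)  # R^2 in the original units
print(f"R^2 raw targets:          {r2_raw:.3f}")
print(f"R^2 standardized targets: {r2_scaled:.3f}")
```

Standardizing y keeps the example's C meaningful whatever the units of the target, which is usually easier than guessing a new C for every dataset.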

You can see this happen in their example code if you make C very small, say C=.00001. You then get the same kind of result you are getting with your code.
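
A sketch of that experiment, assuming data shaped like the scikit-learn example's (40 sorted points on [0, 5) with a sine target; the example's added noise is omitted here). With C=1e-5 the predictions collapse to an almost flat line:

```python
import numpy as np
from sklearn.svm import SVR

# Data in the spirit of the scikit-learn example (noise omitted).
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

# With a reasonable C the RBF model follows the curve.
y_good = SVR(kernel='rbf', C=1e3, gamma=0.1).fit(X, y).predict(X)

# With a tiny C, errors cost almost nothing, so the model
# degenerates to a nearly constant prediction.
y_flat = SVR(kernel='rbf', C=1e-5, gamma=0.1).fit(X, y).predict(X)

print("spread of predictions, C=1e3: ", np.ptp(y_good))
print("spread of predictions, C=1e-5:", np.ptp(y_flat))
```

The flat line is forced by the dual problem: each dual coefficient is bounded by C, so the kernel part of the prediction can move at most about n·C (here 40 × 1e-5) away from the intercept.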

(More on the algorithm here.)
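
The scaling argument can be made explicit with the ε-insensitive SVR primal problem (the formulation given in the scikit-learn user guide). The scaling factor s below is introduced only for illustration: multiply every target by s > 0 and the tube width by the same factor, then substitute the correspondingly scaled variables.

```latex
% epsilon-insensitive SVR primal
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\\
\text{s.t.}\quad & y_i - w^\top\phi(x_i) - b \le \varepsilon + \xi_i,\\
& w^\top\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0,\qquad i = 1,\dots,n.
\end{aligned}

% scale y_i -> s y_i and epsilon -> s epsilon;
% (w, b, xi, xi^*) is feasible for the original problem
% iff (s w, s b, s xi, s xi^*) is feasible for the scaled one
\tfrac{1}{2}\lVert s w\rVert^2 + C\sum_{i=1}^{n} s\left(\xi_i + \xi_i^*\right)
= s^2\left[\tfrac{1}{2}\lVert w\rVert^2 + \frac{C}{s}\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right)\right]
```

So fitting targets that are s times larger with penalty C is the same problem as fitting the original targets with penalty C/s: the larger the targets, the weaker the effective penalty on errors. Keeping the same behavior means scaling C (and ε) up with the targets, or rescaling the targets instead. scikit-learn keeps epsilon at its default of 0.1 unless you change it, so in practice this equivalence is only approximate.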

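As a side note, a large part of machine-learning practice is hyperparameter tuning. This is a good example of how even a good underlying model can produce poor results when it is given the wrong hyperparameters.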
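
One common way to do that tuning is a cross-validated grid search. The sketch below assumes the same hypothetical large-scale targets as before and standardizes y inside the model, so the grid values for C (chosen here for illustration) don't depend on the target units:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical large-scale targets, as in the question.
X = np.linspace(0, 5, 100).reshape(-1, 1)
y = 1e6 * np.sin(X).ravel()

# Standardize y inside the model; SVR parameters are reached
# through the "regressor__" prefix.
model = TransformedTargetRegressor(
    regressor=SVR(kernel='rbf'),
    transformer=StandardScaler(),
)

param_grid = {
    'regressor__C': [0.1, 1, 10, 100, 1000],
    'regressor__gamma': [0.01, 0.1, 1],
}

# Shuffled folds: X is sorted, so unshuffled folds would test extrapolation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(model, param_grid, cv=cv)  # default scoring: R^2
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV R^2:", round(search.best_score_, 3))
```

The grid is deliberately small; on real data you would widen it (often on a log scale) around whatever values come out best.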
