R: `Error in xy.coords(x, y) : 'x' and 'y' lengths differ`

Question

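I am using the R programming language. I am trying to create a regression model and plot the results by following the instructions in this tutorial (https://rdrr.io/cran/kernlab/man/gausspr.html):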

#load library
library(kernlab)

# create regression data
x <- seq(-20,20,0.1)
y <- sin(x)/x + rnorm(401,sd=0.03)


# regression with gaussian processes
foo <- gausspr(x, y)
foo

# predict and plot
ytest <- predict(foo, x)
plot(x, y, type ="l")
lines(x, ytest, col="red")


#predict and variance
x = c(-4, -3, -2, -1,  0, 0.5, 1, 2)
y = c(-2,  0,  -0.5,1,  2, 1, 0, -1)

plot(x,y)

foo2 <- gausspr(x, y, variance.model = TRUE)

xtest <- seq(-4,2,0.2)

lines(xtest, predict(foo2, xtest))
lines(xtest,
      predict(foo2, xtest)+2*predict(foo2,xtest, type="sdeviation"),
      col="red")
lines(xtest,
      predict(foo2, xtest)-2*predict(foo2,xtest, type="sdeviation"),
      col="red")

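This works fine, but the code above is for a regression problem with only two variables. I am trying to extend it to a regression problem with three variables. Below, I tried to recreate the code above for three variables (x, y, z: the response variable is z, and the predictors are x and y):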

# create regression data for new problem
x <- seq(-20,20,0.1)
y <- sin(x)/x + rnorm(401,sd=0.03)
z <- sin(x)/x + rnorm(401,sd=0.01)

#put into data frame
my_data = data.frame(x,y,z)

# regression with gaussian processes 
foo <- gausspr(z ~., data = my_data)
foo

# predict and plot (this is where the error is)
ytest <- predict(foo, c(x,y))

#plot
plot(x, y, type ="l")
lines(x, ytest, col="red")

This produces the following error: Error in xy.coords(x, y) : 'x' and 'y' lengths differ

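Is there another way to specify that the prediction should use both the "x" and "y" variables? I thought that in R you could use the c command for cases like this: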

ytest <- predict(foo, c(x,y))

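This prevents me from going further and making two separate plots for the Gaussian process (foo2), one against xtest and one against ytest, showing the confidence intervals: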

foo2 <- gausspr(z ~., data = my_data, variance.model = TRUE)

xtest <- seq(-4,2,0.2)
ytest <- seq(-4,2,0.2)

#first plot
lines(xtest, predict(foo2, xtest))

lines(xtest,
      predict(foo2, xtest)+2*predict(foo2,xtest, type="sdeviation"),
      col="red")

lines(xtest,
      predict(foo2, xtest)-2*predict(foo2,xtest, type="sdeviation"),
      col="red")


#second plot
lines(ytest, predict(foo2, ytest))

lines(ytest,
      predict(foo2, ytest)+2*predict(foo2,ytest, type="sdeviation"),
      col="red")

lines(ytest,
      predict(foo2, ytest)-2*predict(foo2,ytest, type="sdeviation"),
      col="red")

Can someone please tell me what I am doing wrong?

Thanks

Answer 1

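There are a couple of things to consider in your code: there are NaN values, which lead to different vector lengths, and you are passing newdata to predict incorrectly.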

Using your data and model:

library(kernlab)
x <- seq(-20,20,0.1)
y <- sin(x)/x + rnorm(401,sd=0.03)
z <- sin(x)/x + rnorm(401,sd=0.01)
my_data <- data.frame(x,y,z)
foo <- gausspr(z ~., data = my_data)

Note that at this stage gausspr uses 400 data points, not 401.

foo
... Number of training instances learned : 400

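This is because the NaN values in y and z are removed automatically. y and z are NaN at x = 0 (run y[x==0] and z[x==0] to see this), because the sin(x)/x term is 0/0 there. So this points to where the different number of observations comes from.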
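The NaN is easy to confirm in base R, without fitting any model. This is a minimal check that reuses the data-generating code from the question; the `set.seed(1)` call is added here only to make the noise reproducible and does not affect where the NaN appears.

```r
set.seed(1)  # added only for reproducibility
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)
z <- sin(x)/x + rnorm(401, sd = 0.01)

length(x)        # 401 values, one of which is exactly 0
sin(0)/0         # 0/0 is NaN
y[x == 0]        # NaN: adding noise to NaN still gives NaN
z[x == 0]        # NaN for the same reason

my_data <- data.frame(x, y, z)
nrow(my_data)                  # 401 rows
sum(complete.cases(my_data))   # 400 rows without NA/NaN, matching gausspr's count
```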

Next, you are using predict incorrectly. According to ?predict.gausspr, newdata should be

a data frame or matrix containing new data

but you passed a vector; in fact, you concatenated x and y into a single vector with c(x,y). So change

ytest <- predict(foo, c(x,y))

to

ytest <- predict(foo, data.frame(x=x, y=y)) # or cbind(x,y)
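To see why `c(x, y)` cannot work, compare the shapes: `c()` appends one vector after the other, while `data.frame()` (or `cbind()`) keeps one row per observation with named columns. A quick base-R comparison, no kernlab needed:

```r
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)

# c() does not pair x with y: it appends y after x in one long vector
v <- c(x, y)
length(v)    # 802

# data.frame() keeps one row per observation, with columns named x and y,
# the names that the formula z ~ . refers to
nd <- data.frame(x = x, y = y)
dim(nd)      # 401 2
names(nd)    # "x" "y"

# cbind(x, y) gives the equivalent two-column matrix
m <- cbind(x, y)
dim(m)       # 401 2
colnames(m)  # "x" "y"
```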

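Note that there are 400 in-sample predictions (length(ytest)), because one of the y values is NaN and no prediction is made for that row. For plot, x and y must have the same length, so the value corresponding to the troublesome x = 0 term has to be removed.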

plot(x, y, type ="l") # x and y are both length 401
lines(x[x != 0], ytest, col="red") # both length 400
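The error message itself is raised by `xy.coords()`, which `plot()` and `lines()` use to pair up their coordinates. This small base-R sketch reproduces the message and shows a slightly more general way to line the vectors up: instead of hard-coding `x != 0`, keep the rows where both predictors are present with `complete.cases()`. Treating these as exactly the rows that `predict()` keeps is an assumption based on the automatic NA/NaN removal described above.

```r
# plot() and lines() pair their coordinates with xy.coords(),
# which stops when the two vectors have different lengths
msg <- tryCatch(grDevices::xy.coords(1:401, 1:400),
                error = function(e) conditionMessage(e))
msg              # "'x' and 'y' lengths differ"

x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)

# Keep the rows where both predictors are usable instead of hard-coding x != 0
keep <- complete.cases(x, y)
sum(keep)        # 400
length(x[keep])  # 400, matching the 400 in-sample predictions
```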

There are also some errors in the next block of code in your question.

If there were only one predictor, then

predict(foo2, xtest)

should be

predict(foo2, data.frame(x=xtest))

However, since y is also in your model, you also need to pass one or more values of y to predict. You need to decide which value to use; perhaps the mean?
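For the two plots in the question, one way to do this is to build a separate newdata data frame for each: vary x over `xtest` while holding y at its mean, and vary y over `ytest` while holding x at its mean. This is a sketch under that assumption (holding the other predictor at its mean is only one reasonable choice), and `newdata_x`, `newdata_y`, `x_fixed` and `y_fixed` are illustrative names. Note that a plain `mean(y)` would be NaN because of the x = 0 row.

```r
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)

xtest <- seq(-4, 2, 0.2)   # grids taken from the question
ytest <- seq(-4, 2, 0.2)

mean(y)                            # NaN, because of the x = 0 row
y_fixed <- mean(y, na.rm = TRUE)   # na.rm = TRUE also drops NaN
x_fixed <- mean(x)

# First plot: x varies, y is held at its mean
newdata_x <- data.frame(x = xtest, y = y_fixed)
# Second plot: y varies, x is held at its mean
newdata_y <- data.frame(x = x_fixed, y = ytest)

nrow(newdata_x)   # 31, one row per grid point
```

Each data frame would then be passed to every predict() call for its plot, both for the default response and for type = "sdeviation". Keep in mind that y here is sin(x)/x plus small noise, so it lies roughly between -0.3 and 1.1; a ytest grid from -4 to 2 asks the model to extrapolate far outside the training data.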

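A slightly simpler workflow is to prepare the data before you start modelling, since this gives you more control over how NA/NaN values are handled. For example: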

# remove NA and NaN
my_data <- data.frame(x,y,z)
model_data <- na.omit(my_data)
# run model and predict
foo <- gausspr(z ~., data = model_data)
model_data$ytest <- predict(foo, model_data) # predict on the same cleaned rows, so the lengths match

# plot
plot(y ~ x, data=model_data, type ="l") 
lines(ytest ~ x, data=model_data, col="red")
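Putting the pieces together, here is a sketch of the question's first confidence-interval plot with the fixes applied: NaN rows removed up front, the same data frame passed to every `predict()` call, y held at its mean (an assumed choice, as discussed above), and `plot()` called before `lines()`, since `lines()` only adds to an existing plot. The kernlab part is wrapped in `requireNamespace()` so the data-preparation steps still run where the package is not installed.

```r
set.seed(1)  # added only for reproducibility
x <- seq(-20, 20, 0.1)
y <- sin(x)/x + rnorm(401, sd = 0.03)
z <- sin(x)/x + rnorm(401, sd = 0.01)
model_data <- na.omit(data.frame(x, y, z))  # drops the single x = 0 row

# Grid for the first plot: vary x, hold y at its mean (an assumed choice)
xtest <- seq(-4, 2, 0.2)
newdata_x <- data.frame(x = xtest, y = mean(model_data$y))

if (requireNamespace("kernlab", quietly = TRUE)) {
  suppressPackageStartupMessages(library(kernlab))
  foo2 <- gausspr(z ~ ., data = model_data, variance.model = TRUE)

  # Pass the same data frame to every predict() call, never a bare vector
  fit   <- as.vector(predict(foo2, newdata_x))
  sdev  <- as.vector(predict(foo2, newdata_x, type = "sdeviation"))
  upper <- fit + 2 * sdev
  lower <- fit - 2 * sdev

  # lines() only adds to an existing plot, so open one with plot() first;
  # ylim is widened so the +/- 2 sd bands are not clipped
  in_range <- model_data$x >= -4 & model_data$x <= 2
  plot(z ~ x, data = model_data[in_range, ],
       ylim = range(model_data$z[in_range], lower, upper))
  lines(xtest, fit)
  lines(xtest, upper, col = "red")
  lines(xtest, lower, col = "red")
}
```

The second plot follows the same pattern, using a data frame that varies y over ytest and holds x fixed.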

statistics

regression

r

data-visualization