Support Vector Machine - R code - Predicting the residual error of a time series
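I am trying to predict the residual error of a time series using R code. My data set has the following two columns (here are the first 10 rows as a sample):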
Observation Residuals
1 -0,087527458
2 -0,06907199
3 -0,066604145
4 -0,07796713
5 -0,081723932
6 -0,094046868
7 -0,101535816
8 -0,101884203
9 -0,11131246
10 -0,092548176
To make the prediction, I am building a support vector machine in R:
# Load the data from the csv file
dataDirectory <- "C://"
data <- read.csv(paste(dataDirectory, "Data_SVM_Test.csv", sep=""),sep=";", header = TRUE)
head(data)
# Plot the data
plot(data, pch=16)
# Create a linear regression model
model <- lm(Residuals ~ Observation, data)
# Add the fitted line
abline(model)
predictedY <- predict(model, data)
# display the predictions
points(data$Observation, predictedY, col = "blue", pch=4)
# This function will compute the RMSE
rmse <- function(error)
{
  sqrt(mean(error^2))
}
error <- model$residuals # same as data$Y - predictedY
predictionRMSE <- rmse(error) # 5.70377
plot(data, pch=16)
plot.new()
# svr model ==============================================
if(require(e1071)){
  model <- svm(Residuals ~ Observation , data)
  predictedY <- predict(model, data)
  points(data$Observation, predictedY, col = "red", pch=4)
  # /!\ this time svrModel$residuals is not the same as data$Y - predictedY
  # so we compute the error like this
  error <- data$Residuals - predictedY
  svrPredictionRMSE <- rmse(error) # 3.157061
}
When I run the code above, I get the following warning and no output:
Warning message:
In Ops.factor(data$Residuals, predictedY) : ‘-’ not meaningful for factors
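Does anyone know how to solve this error?
Thanks a lot!
When svm is used for classification, the output type is a factor. This is from the documentation: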
Output of svm: A vector of predicted values (for classification: a vector of labels, for density estimation: a logical vector).
This can be seen in the following example:
library(e1071)
model <- svm(Species ~ ., data = iris)
> str( predict(model, iris))
Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
The same is true of your data. The levels show that predictedY is a factor:
> predictedY <- predict(model, df)
> predictedY
1 2 3 4 5 6 7 8 9 10
-0,087527458 -0,06907199 -0,066604145 -0,07796713 -0,081723932 -0,094046868 -0,101535816 -0,101884203 -0,11131246 -0,092548176
Levels: -0,066604145 -0,06907199 -0,07796713 -0,081723932 -0,087527458 -0,092548176 -0,094046868 -0,101535816 -0,101884203 -0,11131246
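The comma-formatted levels above suggest a likely root cause: the file uses a decimal comma, but read.csv expects a decimal point by default, so Residuals is not parsed as a number. The following is a minimal sketch under that assumption. The inline csv_text is a stand-in for the real file, and stringsAsFactors = TRUE is set explicitly to reproduce the default of R versions before 4.0.

```r
# Inline stand-in for the question's file: semicolon-separated, decimal comma
csv_text <- "Observation;Residuals
1;-0,087527458
2;-0,06907199
3;-0,066604145"

# Default dec = ".": the comma values cannot be parsed as numbers.
# stringsAsFactors = TRUE mimics the read.csv default before R 4.0.
bad <- read.csv(text = csv_text, sep = ";", header = TRUE,
                stringsAsFactors = TRUE)
class(bad$Residuals)   # "factor"

# Telling read.csv about the decimal comma gives a numeric column
good <- read.csv(text = csv_text, sep = ";", dec = ",", header = TRUE)
class(good$Residuals)  # "numeric"

# read.csv2 uses sep = ";" and dec = "," by default
good2 <- read.csv2(text = csv_text)
```

With a numeric Residuals column, svm receives a numeric response and fits a regression rather than a classification, so the subtraction in the RMSE step should work without any conversion.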
In your line predictedY <- predict(model, data), predictedY is of type factor. If you try to subtract a factor from a number (or vice versa), you get NA values and a warning:
> 1:10 - as.factor(1:10)
[1] NA NA NA NA NA NA NA NA NA NA
Warning message:
In Ops.factor(1:10, as.factor(1:10)) : ‘-’ not meaningful for factors
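If you want it to work, you need to convert the factor to numbers with as.numeric: 1:10 - as.numeric(as.factor(1:10)). Be aware that as.numeric applied to a factor returns the integer level codes, not the original values; the two coincide here only because the levels are 1 to 10.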
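To illustrate the conversion, here is a small base R sketch with made-up values (f and g are hypothetical names, not from the question). It shows that as.numeric on a factor yields level codes, and how to recover the values instead, including the decimal-comma case.

```r
# A factor whose levels are number-like strings
f <- factor(c("0.5", "0.25", "0.75"))

codes  <- as.numeric(f)                # level codes: 2 1 3
values <- as.numeric(as.character(f))  # original values: 0.50 0.25 0.75
same   <- as.numeric(levels(f))[f]     # equivalent, converts each level once

# With decimal commas, replace the comma before converting
g <- factor(c("-0,087527458", "-0,06907199"))
g_values <- as.numeric(sub(",", ".", as.character(g), fixed = TRUE))
```

This only repairs a factor after the fact. If the factor came from reading the file, fixing the import is usually the cleaner route.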
I don't know what your data looks like, but judging from the title of the question, svm may not be a good idea for a time series.
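For reference, here is a minimal sketch of the SVR step from the question once Residuals is numeric. It assumes the e1071 package is installed and retypes the ten sample rows by hand instead of reading the original file; it says nothing about whether the fit is useful for forecasting.

```r
library(e1071)

# The ten sample rows from the question, with Residuals as numeric values
data <- data.frame(
  Observation = 1:10,
  Residuals = c(-0.087527458, -0.06907199, -0.066604145, -0.07796713,
                -0.081723932, -0.094046868, -0.101535816, -0.101884203,
                -0.11131246, -0.092548176)
)

rmse <- function(error) sqrt(mean(error^2))

# A numeric response makes svm fit a regression instead of a classification
model <- svm(Residuals ~ Observation, data = data)
predictedY <- predict(model, data)

# predictedY is now numeric, so the subtraction is meaningful
error <- data$Residuals - predictedY
svrPredictionRMSE <- rmse(error)
```

Note that this RMSE is computed on the same rows the model was trained on, so it measures fit, not predictive accuracy on future observations.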