My rows are mismatched in my SVM scripting code for Kaggle

Question

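I'm reviewing my e1071 SVM code for the Kaggle Titanic data. Last I knew, this part was working, but now I'm getting a rather strange error. When I try to build the data.frame to submit to Kaggle, my predictions appear to be the size of my training set rather than my test set.

Problem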

Error in data.frame(PassengerId = test$passengerid, Survived = prediction) : arguments imply differing number of rows: 418, 714

Clearly both should be 418, and I don't understand what is going wrong.

Details

Here is my script:

setwd("Path/To/Data")
train <- read.csv("train.csv")
test <- read.csv("test.csv")

library("e1071")
bestModel = svm(Survived ~ Pclass + Sex + Age + Sex * Pclass, data = train, kernel = "linear", cost = 1)

prediction <- predict(bestModel, newData=test, type="response")
prediction[prediction >= 0.5] <- 1
prediction[prediction != 1] <- 0
prediction[is.na(prediction)] <- 0

This is the line that gives me the error:

predictionSubmit <- data.frame(PassengerId = test$passengerid, Survived = prediction)

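Attempts

I have used names(train) and names(test) to verify that my column names are the same. The data can be found here. I know my prediction code could be condensed into a single line, but that is not the issue here. I would appreciate a second pair of eyes on this. I'm considering switching to the kernlab library, but I'd like to know whether there is a syntax issue I'm overlooking. Thanks for any suggestions and clues.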

Answer 1

library(e1071)

#10 items in training set
y <- sample(0:1, 10, TRUE)
x <- rnorm(10)
bestModel <- svm(y ~ x, kernel = "linear", cost = 1)

#Six in test set
prediction <- predict(bestModel, newdata=rnorm(6), type="response")

#Output has 10 values (unexpected)
prediction
#           1          2          3          4          5          6       <NA>       <NA> 
#  0.05163974 0.58048905 0.49524846 0.13524885 0.12592718 0.06082822 0.55393256 1.08488424 
#        <NA>       <NA> 
#  0.94836026 0.47679646 

#For correct output, remove names with <NA>
prediction[na.omit(names(prediction))]
#         1          2          3          4          5          6 
#0.05163974 0.58048905 0.49524846 0.13524885 0.12592718 0.06082822
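
One more thing worth checking in the question's script, offered as a sketch and not a confirmed diagnosis: R argument names are case-sensitive, and the script calls predict() with newData = test, whereas predict methods name that argument newdata. A misspelled name is silently absorbed by `...`, so the method behaves as if no new data had been supplied and returns the fitted values for the training rows. The example below uses lm() from base R as a stand-in for svm() so that it runs without extra packages; train_demo, test_demo, wrong and right are made-up names for illustration.

```r
# Base-R stand-in: lm() replaces svm() here so the sketch needs no packages.
set.seed(1)
train_demo <- data.frame(x = rnorm(10))          # 10 training rows
train_demo$y <- 2 * train_demo$x + rnorm(10)
test_demo <- data.frame(x = rnorm(6))            # 6 test rows

fit <- lm(y ~ x, data = train_demo)

# Misspelled argument name: `newData` falls into `...`, so predict() acts as
# if no new data were given and returns the fitted values of the training set.
wrong <- predict(fit, newData = test_demo)

# Correct argument name: one prediction per row of test_demo.
right <- predict(fit, newdata = test_demo)

length(wrong)  # 10, the training-set size
length(right)  # 6, the test-set size
```

If the same thing is happening in the question, the 714 in the error message would be consistent with the number of rows in train.csv that have a non-missing Age, since rows with missing values are dropped when the model is fitted. Note that test.csv also has missing Age values, so even with newdata = test the number of predictions may need to be checked against nrow(test) before building the submission data.frame.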


r

svm

kaggle