What might be the error in this line of code in R?
fit <- randomForest(class~. ,data = train_data)
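Can anyone tell me what is wrong with this line of code?
Here train_data is the training data for predicting whether income is >50k or <50k. The error I get on this line is: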
Error in y - ymean : non-numeric argument to binary operator
In addition: Warning messages:
1: In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
2: In mean.default(y) : argument is not numeric or logical: returning NA
It looks like you are trying to do classification with a character response variable. Suppose we use this nice dataset from Kaggle:
library(randomForest)
train_data = read.csv("credit_train.csv",stringsAsFactors=FALSE)
str(train_data)
'data.frame': 808 obs. of 17 variables:
$ Class : chr "Good" "Bad" "Good" "Good" ...
$ Duration : int 6 48 12 36 24 12 30 48 12 24 ...
$ Amount : int 1169 5951 2096 9055 2835 3059 5234 4308 1567 1199 ...
$ InstallmentRatePercentage : int 4 2 2 2 3 2 4 3 1 4 ...
$ ResidenceDuration : int 4 2 3 4 4 4 2 4 1 4 ...
$ Age : int 67 22 49 35 53 61 28 24 22 60 ...
$ NumberExistingCredits : int 2 1 1 1 1 1 2 1 1 2 ...
$ NumberPeopleMaintenance : int 1 1 2 2 1 1 1 1 1 1 ...
$ Telephone : int 0 1 1 0 1 1 1 1 0 1 ...
$ ForeignWorker : int 1 1 1 1 1 1 1 1 1 1 ...
$ CheckingAccountStatus.lt.0 : int 1 0 0 0 0 0 0 1 0 1 ...
$ CheckingAccountStatus.0.to.200: int 0 1 0 0 0 0 1 0 1 0 ...
$ CheckingAccountStatus.gt.200 : int 0 0 0 0 0 0 0 0 0 0 ...
$ CreditHistory.ThisBank.AllPaid: int 0 0 0 0 0 0 0 0 0 0 ...
$ CreditHistory.PaidDuly : int 0 1 0 1 1 1 0 1 1 0 ...
$ CreditHistory.Delay : int 0 0 0 0 0 0 0 0 0 0 ...
$ CreditHistory.Critical : int 1 0 1 0 0 0 1 0 0 1 ...
fit <- randomForest(Class~. ,data = train_data)
Error in y - ymean : non-numeric argument to binary operator
In addition: Warning messages:
1: In randomForest.default(m, y, ...) :
The response has five or fewer unique values. Are you sure you want to do regression?
2: In mean.default(y) : argument is not numeric or logical: returning NA
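Both messages can be reproduced without randomForest, which helps show where they come from. The error text suggests that, when the response is not a factor, randomForest takes the regression path and subtracts the mean of the response (`y - ymean`). A minimal base-R sketch, using hypothetical labels in place of the real `Class` column:

```r
# Hypothetical character labels standing in for a character response column
y <- c("Good", "Bad", "Good", "Good")

# mean() of a character vector warns "argument is not numeric or logical:
# returning NA" (the second warning above); the warning is suppressed here
ymean <- suppressWarnings(mean(y))
print(ymean)

# Subtracting a number from a character vector raises the same error message
msg <- tryCatch(y - ymean, error = function(e) conditionMessage(e))
print(msg)
```

So the failure is not in the formula itself: the labels are text, and text cannot be averaged or centred.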
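As you can see, I get the same error. Your response variable is a character vector. randomForest only fits a classification forest when the response is a factor and otherwise assumes regression, which is why it warns about doing regression and then fails when it tries to take the mean of the text labels. Convert the response to a factor and it works: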
train_data$Class = factor(train_data$Class)
fit <- randomForest(Class~. ,data = train_data)
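Since R 4.0.0, read.csv defaults to stringsAsFactors = FALSE, so every text column of an income dataset, not just the response, usually arrives as character. A sketch of a more general fix, assuming a data frame shaped roughly like yours: the columns `class`, `age`, `hours` and `workclass` and their random values below are all hypothetical. It converts every character column to a factor, so the predictors are handled as categorical variables too, and then checks that randomForest chose classification:

```r
library(randomForest)

set.seed(1)
n <- 200

# Hypothetical stand-in for train_data: column names and values are invented
toy <- data.frame(
  class     = sample(c(">50K", "<=50K"), n, replace = TRUE),
  age       = sample(18:70, n, replace = TRUE),
  hours     = sample(20:60, n, replace = TRUE),
  workclass = sample(c("Private", "Self-emp", "Gov"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Show each column's type; the "character" ones are the ones to convert
print(sapply(toy, class))

# Convert every character column (response and predictors) to a factor,
# keeping toy a data frame by assigning into toy[]
toy[] <- lapply(toy, function(col) if (is.character(col)) factor(col) else col)

# With a factor response, randomForest fits a classification forest
fit <- randomForest(class ~ ., data = toy, ntree = 100)
print(fit$type)
```

Because the labels here are random, the fitted forest is meaningless as a model; the point is only that `fit$type` now reports classification instead of failing on regression.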