Online machine learning in R

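I have an ensemble algorithm in R's caret package that works well, but I want to take newly arriving data into account. I want to avoid retraining the algorithm on all of the old and new data.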

library(caret)
data <- iris
model <- train(Species ~.,data=data[1:145,],method="rf",trControl=trainControl(method="boot",number=10))

## now assume that we get data[146,1:4] after we have fitted our model, and
## after some time we learn the correct outcome. I want to include that
## knowledge in the existing model.

# I want to avoid the following call because it is too time-consuming:
train(Species ~.,data=data[1:146,],method="rf",trControl=trainControl(method="boot",number=10))

I am looking for something like the partial_fit method of Python's SGDClassifier, or any other suggestion.
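A partial_fit-style update only works if the model itself supports incremental learning, and a random forest fitted through caret does not. Purely as an illustration, the base-R sketch below implements a multinomial logistic (softmax) classifier trained by stochastic gradient descent, where a hypothetical partial_fit() updates the existing weights from a new batch (even a single row) without revisiting old data. All function names (sgd_init, partial_fit, predict_sgd), the learning rate and the epoch count are assumptions for this sketch, not part of caret or any package.

```r
# Hypothetical partial_fit-style learner (base R only): multinomial logistic
# regression (softmax) updated by stochastic gradient descent.

sgd_init <- function(n_features, classes) {
  # one weight column per class; the first row holds the intercepts
  list(W = matrix(0, nrow = n_features + 1, ncol = length(classes)),
       classes = as.character(classes))
}

softmax_rows <- function(z) {
  z <- z - apply(z, 1, max)  # subtract each row's maximum for numerical stability
  e <- exp(z)
  e / rowSums(e)
}

partial_fit <- function(model, X, y, lr = 0.5) {
  X1 <- cbind(1, as.matrix(X))                           # add intercept column
  Y <- outer(as.character(y), model$classes, "==") * 1   # one-hot targets
  P <- softmax_rows(X1 %*% model$W)                      # current class probabilities
  grad <- t(X1) %*% (P - Y) / nrow(X1)                   # gradient of mean cross-entropy
  model$W <- model$W - lr * grad                         # one SGD step on this batch only
  model
}

predict_sgd <- function(model, X) {
  P <- softmax_rows(cbind(1, as.matrix(X)) %*% model$W)
  factor(model$classes[max.col(P, ties.method = "first")], levels = model$classes)
}

set.seed(42)
y <- iris$Species
# standardise with statistics from the initial rows only
mu <- colMeans(iris[1:145, 1:4])
s <- apply(iris[1:145, 1:4], 2, sd)
Xs <- scale(iris[, 1:4], center = mu, scale = s)

# initial training on rows 1:145: 200 passes over shuffled mini-batches of 15 rows
m <- sgd_init(ncol(Xs), levels(y))
for (epoch in 1:200) {
  idx <- sample(145)
  for (b in split(idx, ceiling(seq_along(idx) / 15))) {
    m <- partial_fit(m, Xs[b, , drop = FALSE], y[b])
  }
}

# later the true label of row 146 arrives: update with that row only
W_before <- m$W
m <- partial_fit(m, Xs[146, , drop = FALSE], y[146])

acc <- mean(predict_sgd(m, Xs) == y)
acc
```

Incorporating row 146 here costs a single gradient step instead of a full retrain. The trade-off is that this is a linear model, a different learner from the random forest in the question.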

Thanks.

Edit: following the answer, I tried this:

library(caret)
data <- iris
model <- train(Species ~.,data=data[1:120,],method="rf",trControl=trainControl(method="boot",number=10))
a <- (predict(model,newdata=data[121:150,1:4])==data[121:150,5])
print(a)
previousModel <- model  # stands in for loading a previously saved model object
previousModel$trainingData <- data  # replace the training data with all 150 rows
newModel <- update(object = previousModel, forceRefit = TRUE)
b <- (predict(newModel,newdata=data[121:150,1:4])==data[121:150,5])
all(a==b)  # TRUE: the "updated" model predicts exactly as before


 [1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE
[13]  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
### the predictions are not perfect
[1] TRUE # after including the new data, nothing changes. Why?

I think there is only a function for updating the training parameters:

library(caret)
previousModel <- readRDS("....xxx.rds")  # load the previously saved model object
previousModel$trainingData <- trainData  # replace the training data with the new data
newModel <- update(object = previousModel)
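Before assigning anything to $trainingData, it is worth checking that the replacement has the same layout as the data frame caret stored. In a train object the response is kept in a column named .outcome (with the formula interface the predictors keep their own names), whereas data in the edit above still calls it Species. The sketch below prints the stored layout and builds a replacement with matching columns; newTrainingData is a hypothetical name. Whether update() then actually refits when the tuning parameters are unchanged is a separate question, and is what the forceRefit fix mentioned below is about.

```r
library(caret)  # method = "rf" also needs the randomForest package installed
data <- iris

set.seed(1)
model <- train(Species ~ ., data = data[1:120, ], method = "rf",
               trControl = trainControl(method = "boot", number = 10))

# Layout caret stored: predictor columns plus the response renamed ".outcome"
print(names(model$trainingData))

# Hypothetical replacement covering all 150 rows, with the same layout
newTrainingData <- data[, 1:4]
newTrainingData$.outcome <- data$Species
```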

Regarding updating the model with new training data, I found an issue and a fix for this in update.train: an option forceRefit, with which the model is refit even if the training parameters have not changed (the code can be seen here).
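If refitting on all the data is acceptable and it is the bootstrap tuning that makes the call slow, caret can skip resampling: trainControl(method = "none") fits a single model using the one tuning-parameter combination supplied in tuneGrid. The sketch below reuses the mtry chosen by the original model. This is still a full refit of the forest on rows 1:146, not online learning; it only avoids repeating the 10 bootstrap resamples for every candidate mtry.

```r
library(caret)  # method = "rf" also needs the randomForest package installed
data <- iris

set.seed(1)
model <- train(Species ~ ., data = data[1:145, ], method = "rf",
               trControl = trainControl(method = "boot", number = 10))

# Row 146 has been labelled: refit once on rows 1:146 with the mtry already
# selected, skipping the bootstrap resampling and the search over mtry.
set.seed(1)
refit <- train(Species ~ ., data = data[1:146, ], method = "rf",
               trControl = trainControl(method = "none"),
               tuneGrid = model$bestTune)

pred <- predict(refit, newdata = data[147:150, 1:4])
pred
```

The original call grows a forest for each bootstrap resample and each candidate mtry, plus a final one; the refit above grows only one.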

I hope this helps to some extent and that you can start from there.