RevoScaleR rxDataStep rowSelection fails when using a variable
I am trying to select rows from an xdf file using rxDataStep.
I am using rowSelection, which works when I use an explicit value but not when I use a variable. For example:
This works:
tmp <- rxDataStep(alias.Xdf,
                  transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                                    UNIT_PRICE = RT_VALUE / TT_AMOUNT),
                  varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
                  rowSelection = (A_ID == 1646041))
But this does not:
x <- 1646041
tmp <- rxDataStep(alias.Xdf,
                  transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                                    UNIT_PRICE = RT_VALUE / TT_AMOUNT),
                  varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
                  rowSelection = (A_ID == x))
It fails with the message:
ERROR: The sample data set for the analysis has no variables.
Caught exception in file: CxAnalysis.cpp, line: 3848. ThreadID: 31156 Rethrowing.
Caught exception in file: CxAnalysis.cpp, line: 5375. ThreadID: 31156 Rethrowing.
What is wrong here? I have been struggling with this for hours and have tried every syntax I could find online.
Thanks
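We may need to pass it in through transformObjects, since the rowSelection expression does not appear to see variables defined in the calling environment: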
library(RevoScaleR)
rxDataStep(alias.Xdf,
           transforms = list(TT_AMOUNT = DC_AMOUNT * RT_AMOUNT,
                             UNIT_PRICE = RT_VALUE / TT_AMOUNT),
           varsToKeep = c('DC_AMOUNT', 'RT_AMOUNT', 'RT_VALUE'),
           rowSelection = (A_ID == x1),
           transformObjects = list(x1 = x))
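To see why the explicit name mapping might matter, here is a rough base R analogy. It is a sketch under assumptions, not RevoScaleR's actual implementation: `select_rows` is a hypothetical helper that evaluates a filter expression in an environment containing only the data columns plus whatever objects are passed in, which is roughly how `transformObjects` behaves from the caller's point of view.

```r
# Hypothetical helper (an analogy only, not RevoScaleR internals):
# evaluates a row filter that can see only the data columns and the
# objects passed in explicitly.
select_rows <- function(data, row_selection, objects = list()) {
  # baseenv() as the parent keeps operators such as `==` available
  # but hides variables defined in the caller's workspace.
  env <- list2env(c(as.list(data), objects), parent = baseenv())
  keep <- eval(row_selection, envir = env)
  data[keep, , drop = FALSE]
}

df <- data.frame(A_ID = c(1, 2, 3), RT_VALUE = c(10, 20, 30))
x <- 2

# Fails: `x` is not visible inside the isolated environment,
# so the error message is captured instead of a result.
res_fail <- tryCatch(select_rows(df, quote(A_ID == x)),
                     error = function(e) conditionMessage(e))

# Works: the value is handed over explicitly under the name x1.
res_ok <- select_rows(df, quote(A_ID == x1), objects = list(x1 = x))
```

In this sketch `res_fail` holds an error message and `res_ok` holds the single matching row, which mirrors the difference between the failing and working rxDataStep calls.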
Using a reproducible example:
set.seed(100)
myData <- data.frame(x = 1:100, y = rep(c("a", "b", "c", "d"), 25),
                     z = rnorm(100), w = runif(100))
z1 <- 2
myDataSubset <- rxDataStep(inData = myData,
                           varsToKeep = c("x", "w", "z"),
                           rowSelection = z > zNew,
                           transformObjects = list(zNew = z1))
#Rows Read: 100, Total Rows Processed: 100, Total Chunk Time: 0.007 seconds
myDataSubset
#   x          w        z
#1 20 0.03609544 2.310297
#2 64 0.79408518 2.581959
#3 96 0.07123327 2.445683
This can also be done with dplyr:
library(dplyr)
myData %>%
  select(x, w, z) %>%
  filter(z > z1)
#   x          w        z
#1 20 0.03609544 2.310297
#2 64 0.79408518 2.581959
#3 96 0.07123327 2.445683
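If neither RevoScaleR nor dplyr is at hand and the data fits in memory, the same filter can be sketched in base R with `subset()`. This assumes an in-memory data frame like `myData` above rather than an xdf file; unlike rowSelection, `subset()` looks up `z1` in the calling environment, so no explicit mapping is needed.

```r
# Rebuild the example data so the snippet is self-contained.
set.seed(100)
myData <- data.frame(x = 1:100, y = rep(c("a", "b", "c", "d"), 25),
                     z = rnorm(100), w = runif(100))
z1 <- 2

# Keep rows where z exceeds the threshold, and only columns x, w, z.
baseSubset <- subset(myData, z > z1, select = c(x, w, z))
baseSubset
```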