Column type set by first element being evaluated in r/data.table

Question

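I have a function that returns NA under certain conditions and an integer otherwise (actually a vector of integers, but that doesn't matter for now).

When I apply this function by group to a data.table and the first group returns NA, the whole column is wrongly set to logical, which messes up the values for the following groups. How can I prevent this behaviour?

Example: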

library(data.table)

myfun <- function(x) {
    if(x == 0) {
        return(NA)
    } else {
        return(x*2)
    }
}

DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT
   x y
1: 0 A
2: 1 B
3: 2 C
4: 3 D

The following should assign the values c(NA, 2, 4, 6) to column x2. Instead, I get c(NA, TRUE, TRUE, TRUE) along with warnings:

DT[, x2 := myfun(x), by= y]
Warning messages:
1: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 2 column 'x2': 2.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
2: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 3 column 'x2': 4.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'
3: In `[.data.table`(DT, , `:=`(x2, myfun(x)), by = y) :
  Group 4 column 'x2': 6.000000 (type 'double') at RHS position 1 taken as TRUE when assigning to type 'logical'

DT
   x y   x2
1: 0 A   NA
2: 1 B TRUE
3: 2 C TRUE
4: 3 D TRUE

Changing the order of the rows gives the expected result:

DT <- data.table(x= c(1, 2, 3, 0), y= LETTERS[1:4])
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 1 A  2
2: 2 B  4
3: 3 C  6
4: 0 D NA

I can create column x2 with the right type beforehand:

DT <- data.table(x= c(0, 1, 2, 3), y= LETTERS[1:4])
DT[, x2 := integer()]
DT[, x2 := myfun(x), by= y]
DT
   x y x2
1: 0 A NA
2: 1 B  2
3: 2 C  4
4: 3 D  6

But I'm wondering whether there is a better option that doesn't require me to set the column type beforehand.

This is data.table v1.14.0, R 3.6.3.

Answer 1

Don't let your function return NA; return NA_integer_ or NA_real_ instead. Problem solved ;-)

myfun <- function(x) {
  if(x == 0) {
    return(NA_integer_)  # typed NA, so the first group does not make the column logical
  } else {
    return(x*2)
  }
}
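
For what it's worth, the reason this helps is that a plain NA is a logical constant in R, so when the first group returns it, the new column starts out as logical. Below is a minimal sketch of that, assuming the same DT as in the question. The name myfun_real is made up for illustration; it uses NA_real_ because x is double in the example, so x*2 is double as well and NA_real_ matches the other branch exactly.

```r
library(data.table)

# Plain NA is logical; the typed variants carry the intended type.
class(NA)           # "logical"
class(NA_integer_)  # "integer"
class(NA_real_)     # "numeric"

# Hypothetical variant of myfun that returns a double NA,
# matching the double returned by x * 2.
myfun_real <- function(x) {
  if (x == 0) {
    return(NA_real_)
  } else {
    return(x * 2)
  }
}

# Same data as in the question: the NA group comes first.
DT <- data.table(x = c(0, 1, 2, 3), y = LETTERS[1:4])
DT[, x2 := myfun_real(x), by = y]

class(DT$x2)  # "numeric"
DT$x2         # NA 2 4 6
```

With this, the column type no longer depends on which group happens to be evaluated first, and there is no need to create x2 beforehand.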
