R data.table - different join behavior for integer/numeric and character columns

I have two data.tables, DT and ADT, and I want to join them on the columns a and new.a:

R> DT
   a   b
1: 1 1.0
2: 1 1.0
3: 2 2.0
4: 3 3.5
5: 4 4.5
6: 5 5.5

R> ADT
   new.a type
1:     1    3
2:     1    5
3:     2    3
4:     4    5
5:     4    3

R> setkey(DT, a)
R> DT[ADT[, new.a]]
# This is the desired result:
   a   b
1: 1 1.0
2: 1 1.0
3: 1 1.0
4: 1 1.0
5: 2 2.0
6: 4 4.5
7: 4 4.5

Instead of giving the desired result, data.table treats the numeric values in ADT[, new.a] as a set of row numbers.

DT[ADT[, new.a]] # taking row numbers... even truncating decimal values!
setkey(DT, a)
DT[ADT[, new.a]] # the key sorts the DT, so slightly different result, still using row numbers
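To make the row-number behavior reproducible, here is a minimal sketch. The construction of DT is an assumption: it mirrors the DTchar definition from the question, with a kept numeric. The names idx, by_position and by_position_keyed are only illustrative.

```r
library(data.table)

# Assumed construction: same values as DTchar/ADTchar, but numeric columns.
DT  <- data.table(a = c(1, 2, 1, 3, 4, 5), b = c(1, 2, 1, 3.5, 4.5, 5.5))
ADT <- data.table(new.a = c(1, 1, 2, 4, 4), type = c(3, 5, 3, 5, 3))

idx <- ADT[, new.a]           # a plain numeric vector, not a data.table
by_position <- DT[idx]        # picks rows 1, 1, 2, 4, 4 of the unkeyed table

setkey(DT, a)                 # sorts DT by a
by_position_keyed <- DT[idx]  # still positional, now on the sorted rows
```

Neither result is a join on a: in both cases the values of new.a are used as row positions, which is why the output changes once setkey() reorders the rows.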

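If I instead define the data.tables differently, now with character columns, I correctly get an error when I try to join before setting the key, and after setting the key I get the desired result. But is there a way to use the numeric index directly? Converting the whole DT to character beforehand could be slow...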

DTchar <- data.table(
  a = as.character(c(1, 2, 1, 3, 4, 5)),
  b = c(1, 2, 1, 3.5, 4.5, 5.5)
)
ADTchar <- data.table(
  new.a = as.character(c(1, 1, 2, 4, 4)),
  type  = as.character(c(3, 5, 3, 5, 3))
)

DTchar[ADTchar[, new.a]] # error - correctly
setkey(DTchar, a)
DTchar[ADTchar[, new.a]] # desired result

First, you should use ADT[, list(new.a)], which returns a data.table, rather than ADT[, new.a], which returns a vector. You are also missing the argument allow.cartesian = TRUE.

DT[ADT[, list(new.a)], allow.cartesian = TRUE]
##    a   b
## 1: 1 1.0
## 2: 1 1.0
## 3: 1 1.0
## 4: 1 1.0
## 5: 2 2.0
## 6: 4 4.5
## 7: 4 4.5
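As a hedged sketch of the same idea, the snippet below checks what each form of ADT[...] returns and shows two equivalent ways to pass a data.table as i: the .() alias for list(), and the on= argument (available in data.table 1.9.6 and later), which needs no key. The table construction is assumed to match the question's values, and the names keyed_join and on_join are illustrative.

```r
library(data.table)

# Assumed construction: numeric versions of the tables in the question.
DT  <- data.table(a = c(1, 2, 1, 3, 4, 5), b = c(1, 2, 1, 3.5, 4.5, 5.5))
ADT <- data.table(new.a = c(1, 1, 2, 4, 4), type = c(3, 5, 3, 5, 3))

is.data.table(ADT[, list(new.a)])  # TRUE: list() in j keeps a data.table
is.data.table(ADT[, new.a])        # FALSE: a bare column name gives a vector

# Ad hoc join without a key: match DT$a against ADT$new.a.
on_join <- DT[ADT, on = .(a = new.a), allow.cartesian = TRUE]

# Keyed join: .() wraps the vector in a list so i is treated as join values.
setkey(DT, a)
keyed_join <- DT[.(ADT$new.a), allow.cartesian = TRUE]
```

Both joins return seven rows with a equal to 1, 1, 1, 1, 2, 4, 4; the on= form also carries the type column from ADT along.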

From the data.table documentation for i:

integer and logical vectors work the same way they do in [.data.frame.

character is matched to the first column of x's key.

When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned.
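The character case quoted above can be illustrated with the DTchar table from the question. This is only a sketch; the name char_lookup is illustrative.

```r
library(data.table)

# DTchar as defined in the question.
DTchar <- data.table(
  a = as.character(c(1, 2, 1, 3, 4, 5)),
  b = c(1, 2, 1, 3.5, 4.5, 5.5)
)

setkey(DTchar, a)                   # a becomes the first (and only) key column
char_lookup <- DTchar[c("1", "4")]  # character i is matched against the key
```

Here the strings are looked up as key values rather than used as row positions, so char_lookup holds the two rows with a equal to "1" followed by the row with a equal to "4".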