R data.table - different join behavior for integer/numeric and character columns
I have two data.tables, DT and ADT, which I want to join on the columns a and new.a:
R> DT
a b
1: 1 1.0
2: 1 1.0
3: 2 2.0
4: 3 3.5
5: 4 4.5
6: 5 5.5
R> ADT
new.a type
1: 1 3
2: 1 5
3: 2 3
4: 4 5
5: 4 3
R> setkey(DT, a)
R> DT[ADT[, new.a]]
# This is the desired result:
a b
1: 1 1.0
2: 1 1.0
3: 1 1.0
4: 1 1.0
5: 2 2.0
6: 4 4.5
7: 4 4.5
Instead of the desired result, data.table treats the numeric values in ADT[, new.a] as a set of row numbers:
DT[ADT[, new.a]] # taking row numbers... even truncating comma-values!
setkey(DT, a)
DT[ADT[, new.a]] # the key sorts the DT, so slightly different result, still using row numbers
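The row-number behaviour can be reproduced in a few lines. This is a sketch that rebuilds DT and ADT as numeric (double) columns from the printed values (the question does not show the construction code, so the row order used here is assumed to match DTchar further below). It shows that the same positions pick different rows before and after setkey(), because setkey() reorders DT.

```r
library(data.table)

# Assumed reconstruction of the question's data: a and new.a are numeric (double)
DT  <- data.table(a = c(1, 2, 1, 3, 4, 5), b = c(1, 2, 1, 3.5, 4.5, 5.5))
ADT <- data.table(new.a = c(1, 1, 2, 4, 4), type = c(3, 5, 3, 5, 3))

idx <- ADT[, new.a]   # a plain numeric vector, not a data.table

# A numeric vector in i selects rows by position, as in a data.frame
unkeyed <- DT[idx]    # rows 1, 1, 2, 4, 4 of DT in its original order

setkey(DT, a)         # sorts DT by a, so the same positions hit other rows
keyed <- DT[idx]

print(unkeyed$a)      # 1 1 2 3 3
print(keyed$a)        # 1 1 1 3 3
```

Neither result is a join: the values 1, 2 and 4 are used as row indices, not looked up in column a.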
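If I instead define the data.tables differently, so that they contain character columns, I correctly get an error when I try to join before setting the key, and after setting it I get the desired result. But is there a way to use the numeric values directly? Converting the whole DT to character beforehand could be slow...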
library(data.table)

DTchar <- data.table(
a = as.character(c(1, 2, 1, 3, 4, 5)),
b = c(1, 2, 1, 3.5, 4.5, 5.5)
)
ADTchar <- data.table(
new.a = as.character(c(1, 1, 2, 4, 4)),
type = as.character(c(3, 5, 3, 5, 3))
)
DTchar[ADTchar[, new.a]] # error - correctly
setkey(DTchar, a)
DTchar[ADTchar[, new.a]] # desired result
First, you should use ADT[, list(new.a)], which returns a data.table, instead of ADT[, new.a], which returns a vector.
You are also missing the argument allow.cartesian = TRUE.
DT[ADT[, list(new.a)], allow.cartesian = TRUE]
## a b
## 1: 1 1.0
## 2: 1 1.0
## 3: 1 1.0
## 4: 1 1.0
## 5: 2 2.0
## 6: 4 4.5
## 7: 4 4.5
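A self-contained sketch of the keyed join, again assuming DT and ADT are rebuilt from the printed values with numeric columns. It compares two ways of passing the lookup values as a list-like i rather than a bare vector: a one-column data.table from ADT[, list(new.a)], and J(), which data.table accepts inside i as an alias for list(). No conversion to character is needed; the numeric key column is matched directly.

```r
library(data.table)

# Assumed reconstruction of the question's data (numeric columns)
DT  <- data.table(a = c(1, 2, 1, 3, 4, 5), b = c(1, 2, 1, 3.5, 4.5, 5.5))
ADT <- data.table(new.a = c(1, 1, 2, 4, 4), type = c(3, 5, 3, 5, 3))

setkey(DT, a)

# A one-column data.table in i is joined to DT's key instead of used as row numbers
res1 <- DT[ADT[, list(new.a)], allow.cartesian = TRUE]

# J() wraps the numeric vector so it is also treated as join values
res2 <- DT[J(ADT$new.a), allow.cartesian = TRUE]

print(res1)
```

Each value in new.a returns every matching row of DT, so the two 1s each match both a == 1 rows, giving the seven rows shown above.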
From the data.table documentation for i:
integer and logical vectors work the same way they do in [.data.frame.
character is matched to the first column of x's key.
When i is a data.table, x must have a key. i is joined to x using x's key and the rows in x that match are returned.
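The documentation quoted above requires x to be keyed when i is a data.table. Newer data.table releases also provide an on= argument that names the join columns explicitly, so neither setkey() nor a character conversion is needed. The sketch below assumes such a release and the same reconstructed numeric data; c(a = "new.a") means "column a of DT equals column new.a of ADT". Because the whole ADT is passed as i, its type column is carried into the result.

```r
library(data.table)

# Assumed reconstruction of the question's data (numeric columns, DT unkeyed)
DT  <- data.table(a = c(1, 2, 1, 3, 4, 5), b = c(1, 2, 1, 3.5, 4.5, 5.5))
ADT <- data.table(new.a = c(1, 1, 2, 4, 4), type = c(3, 5, 3, 5, 3))

# Ad hoc join on numeric columns: no setkey(), no as.character()
res <- DT[ADT, on = c(a = "new.a"), allow.cartesian = TRUE]

print(res)
```

Rows come out in the order of ADT, with each row of ADT repeated once per matching row of DT.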