Splitting a data.table in R by key

Question

I have a data.table object in R that I want to split along its key.

> myTable[1:11]
      ID length      hash
 1: 2578   52.5  26566273
 2: 4066   52.5  26566273
 3: 2578   53.5  26566273
 4: 4066   53.5  26566273
 5: 2207   29.5  54352910
 6: 3719   29.5  54352910
 7: 5166    9.5 613353882
 8: 5167    9.5 613353882
 9: 5169    9.5 613353882
10: 5170    9.5 613353882
11: 5171    9.5 613353882
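To make the example reproducible, the eleven printed rows can be rebuilt as a data.table. This is a sketch: the column types (integer ID and hash, numeric length) are assumed from the str() output in the answer, and any rows beyond the first 11 are not shown in the question.

```r
library(data.table)

# Rebuild the 11 rows printed above.
# Assumed types: ID and hash are integer, length is numeric.
myTable <- data.table(
  ID     = c(2578L, 4066L, 2578L, 4066L, 2207L, 3719L,
             5166L, 5167L, 5169L, 5170L, 5171L),
  length = c(52.5, 52.5, 53.5, 53.5, 29.5, 29.5,
             9.5, 9.5, 9.5, 9.5, 9.5),
  hash   = c(rep(26566273L, 4), rep(54352910L, 2), rep(613353882L, 5))
)
myTable
```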

The hash (first_hash) and length columns are my 2 keys. The output I want is a list of the ID values for each key combination.

So it might look like this:

first_hash length ID_list
26566273   52.5   [1] 2578 4066
26566273   53.5   [1] 2578 4066
54352910   29.5   [1] 2207 3719
613353882   9.5   [1] 5166 5167 5168 5169 5170 5171

or some kind of list...

I think plyr could provide an answer, but I would prefer a data.table approach.

The end goal is to create all pairs of IDs that share the same key. I know about the function expand.grid.
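For that end goal, one possible sketch generates the pairs directly inside each group, skipping the intermediate list. It assumes the sample data above and that each unordered pair should appear once; combn() is swapped in here for the expand.grid() mentioned in the question, and the output names ID1/ID2 are hypothetical.

```r
library(data.table)

# Sample data from the question (column types assumed).
myTable <- data.table(
  ID     = c(2578L, 4066L, 2578L, 4066L, 2207L, 3719L,
             5166L, 5167L, 5169L, 5170L, 5171L),
  length = c(52.5, 52.5, 53.5, 53.5, 29.5, 29.5,
             9.5, 9.5, 9.5, 9.5, 9.5),
  hash   = c(rep(26566273L, 4), rep(54352910L, 2), rep(613353882L, 5))
)

# For each (hash, length) group, emit every unordered pair of distinct IDs.
# combn() on a single number would enumerate 1:x, so groups with fewer than
# two distinct IDs return NULL and are dropped from the result.
pairs <- myTable[, {
  if (uniqueN(ID) >= 2L) {
    p <- combn(unique(ID), 2L)   # 2-row matrix, one column per pair
    list(ID1 = p[1L, ], ID2 = p[2L, ])
  }
}, by = .(hash, length)]
pairs
```

expand.grid(ids, ids) would instead return every ordered pair, including each ID paired with itself; keeping only rows where Var1 < Var2 reduces it to the same unordered pairs.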

Thanks

Answer 1

We group by 'hash' and 'length' and wrap 'ID' in a list.

DT <- myTable[, list(ID_list = list(ID)), by = .(first_hash = hash, length)]
DT
#   first_hash length                  ID_list
#1:   26566273   52.5                2578,4066
#2:   26566273   53.5                2578,4066
#3:   54352910   29.5                2207,3719
#4:  613353882    9.5 5166,5167,5169,5170,5171

str(DT)
# Classes ‘data.table’ and 'data.frame':  4 obs. of  3 variables:
# $ first_hash: int  26566273 26566273 54352910 613353882
# $ length    : num  52.5 53.5 29.5 9.5
# $ ID_list   :List of 4
# ..$ : int  2578 4066
# ..$ : int  2578 4066
# ..$ : int  2207 3719
# ..$ : int  5166 5167 5169 5170 5171
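Since the question is phrased in terms of the table's key, the same grouping can also be driven by a key declared with setkey(). The sketch below (assuming the sample data from the question) groups by key(myTable), pulls one group out of the list column, and also shows data.table's split() method with a by argument for a literal split into one data.table per key combination.

```r
library(data.table)

# Sample data from the question (column types assumed).
myTable <- data.table(
  ID     = c(2578L, 4066L, 2578L, 4066L, 2207L, 3719L,
             5166L, 5167L, 5169L, 5170L, 5171L),
  length = c(52.5, 52.5, 53.5, 53.5, 29.5, 29.5,
             9.5, 9.5, 9.5, 9.5, 9.5),
  hash   = c(rep(26566273L, 4), rep(54352910L, 2), rep(613353882L, 5))
)

# Declare the two key columns; setkey() sorts the table by them in place.
setkey(myTable, hash, length)
key(myTable)

# Group by the key columns and collect each group's IDs in a list column.
DT <- myTable[, .(ID_list = list(ID)), by = key(myTable)]

# A single group's IDs can be extracted from the list column with [[ ]].
DT$ID_list[[4]]

# A literal split: a named list with one data.table per key combination.
parts <- split(myTable, by = key(myTable))
names(parts)
```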

Or, as @Frank mentioned, instead of a list we can paste the 'ID' values together by group to create a character column:

myTable[, list(ID_list = toString(ID)), by = .(first_hash = hash, length)]
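A self-contained version of this variant, assuming the sample data from the question, shows that ID_list is now an ordinary character column rather than a list column.

```r
library(data.table)

# Sample data from the question (column types assumed).
myTable <- data.table(
  ID     = c(2578L, 4066L, 2578L, 4066L, 2207L, 3719L,
             5166L, 5167L, 5169L, 5170L, 5171L),
  length = c(52.5, 52.5, 53.5, 53.5, 29.5, 29.5,
             9.5, 9.5, 9.5, 9.5, 9.5),
  hash   = c(rep(26566273L, 4), rep(54352910L, 2), rep(613353882L, 5))
)

# toString() collapses each group's IDs into one string separated by ", ".
DT2 <- myTable[, list(ID_list = toString(ID)), by = .(first_hash = hash, length)]
DT2
str(DT2)
```

A character column is easier to print or write to a CSV file, but the IDs would have to be split apart again (for example with strsplit()) before building pairs, so the list-column form is probably the more convenient starting point for the pairing goal in the question.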
