Conditionally changing the contents of a column using max() in data.table in R

Question

I have a data.table containing the following information:

   library(data.table)
   DT <- data.table(id = c(rep(1,5)),
                    year = c(rep(2015,3), rep(2016,2)),
                    class = c(rep("A", 3), rep("B", 2)),
                    origin = c("Europe", "Asia", "Africa", "Europe", "Asia"),
                    count = c(30299, 3, 34, 2, 800))

   id year class origin count
1:  1 2015     A Europe 30299
2:  1 2015     A   Asia     3
3:  1 2015     A Africa    34
4:  1 2016     B Europe     2
5:  1 2016     B   Asia   800

However, only one origin is allowed for each combination of id, year, and class. Here, the first combination has three origins:

1:  1 2015     A Europe 30299
2:  1 2015     A   Asia     3
3:  1 2015     A Africa    34

The second combination has two origins:

4:  1 2016     B Europe     2
5:  1 2016     B   Asia   800

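I want to change the origin so that, within each id, year, class combination, every row uses the origin that has the highest count. The resulting table should look like this: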

   id year class origin count
1:  1 2015     A Europe 30299
2:  1 2015     A Europe     3
3:  1 2015     A Europe    34
4:  1 2016     B   Asia     2
5:  1 2016     B   Asia   800

How can I achieve this? I was thinking of splitting the data.table into a list of lists and then using lapply, but I'm sure there is a better/simpler solution. Any tips?

Answer 1

DT[, origin := origin[which.max(count)], by = .(id, year, class)]
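
For reference, here is a self-contained sketch of the one-liner above applied to the sample data from the question. It assumes the table is stored in a variable named DT (the question does not show an assignment). which.max(count) returns the position of the first maximum, so if two rows in a group tie on count, the origin of the earlier row is used.

```r
library(data.table)

# Sample data copied from the question; the name DT is an assumption
DT <- data.table(id = rep(1, 5),
                 year = c(rep(2015, 3), rep(2016, 2)),
                 class = c(rep("A", 3), rep("B", 2)),
                 origin = c("Europe", "Asia", "Africa", "Europe", "Asia"),
                 count = c(30299, 3, 34, 2, 800))

# Within each id/year/class group, which.max(count) is the position of the
# largest count; origin[...] picks the matching origin, and := recycles that
# single value over every row of the group, updating DT by reference.
DT[, origin := origin[which.max(count)], by = .(id, year, class)]

print(DT)
```

Because := modifies DT in place, no reassignment is needed. Use copy(DT) first if the original table should be kept.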

Answer 2

You can also use which.max in a dplyr workflow. The which.max solution was already posted by sindri_baldur (credit to him).

library(dplyr)
df %>% 
  group_by(id, year, class) %>% 
  mutate(origin = origin[which.max(count)])

Output:

     id  year class origin count
  <dbl> <dbl> <chr> <chr>  <dbl>
1     1  2015 A     Europe 30299
2     1  2015 A     Europe     3
3     1  2015 A     Europe    34
4     1  2016 B     Asia       2
5     1  2016 B     Asia     800
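
The snippet above does not define df. A minimal runnable sketch, assuming df is the question's data held as a plain data frame, could look like this. The trailing ungroup() is an addition that is not part of the answer; it only drops the grouping so that later operations are not applied per group.

```r
library(dplyr)

# Assumption: df holds the question's data as a plain data frame
df <- data.frame(id = rep(1, 5),
                 year = c(rep(2015, 3), rep(2016, 2)),
                 class = c(rep("A", 3), rep("B", 2)),
                 origin = c("Europe", "Asia", "Africa", "Europe", "Asia"),
                 count = c(30299, 3, 34, 2, 800),
                 stringsAsFactors = FALSE)

result <- df %>%
  group_by(id, year, class) %>%
  # Give every row in the group the origin of the row with the largest count
  mutate(origin = origin[which.max(count)]) %>%
  ungroup()  # not in the original answer; removes the grouping afterwards

print(result)
```

Unlike the data.table version, this returns a new tibble and leaves df unchanged.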
