Give an ID based on the total number of unique appearances in each group using dplyr

I have been struggling with this problem and would appreciate your guidance and help. I have a data.frame that looks like this:

col1 <- c("a","a","b", "a","b","c","a","c","d")
replicate <- c("rep1","rep1","rep1","rep2","rep2","rep2","rep3","rep3","rep3")
df = data.frame(col1, replicate)

  col1 replicate
1    a      rep1
2    a      rep1
3    b      rep1
4    a      rep2
5    b      rep2
6    c      rep2
7    a      rep3
8    c      rep3
9    d      rep3

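I want to create another column that holds, for each element of col1, the number of replicates it appears in. Duplicates within the same replicate should not be counted. I would like my data to look like this: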

  col1 replicate  ID
1    a      rep1  3
2    a      rep1  3
3    b      rep1  2
4    a      rep2  3
5    b      rep2  2
6    c      rep2  2
7    a      rep3  3
8    c      rep3  2
9    d      rep3  1

This is because "a" appears in all 3 replicates, "b" is present in rep1 and rep2, "c" in rep2 and rep3, and "d" only in rep3.

library(dplyr)
# Within each col1 group, count the distinct replicates it appears in
df %>% group_by(col1) %>%
  mutate(ID = n_distinct(col1, replicate))

# A tibble: 9 x 3
# Groups:   col1 [4]
  col1  replicate    ID
  <chr> <chr>     <int>
1 a     rep1          3
2 a     rep1          3
3 b     rep1          2
4 a     rep2          3
5 b     rep2          2
6 c     rep2          2
7 a     rep3          3
8 c     rep3          2
9 d     rep3          1
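
If it helps to see the logic spelled out, the same result can probably be reached in two explicit steps: first reduce the data to one row per (col1, replicate) pair, then count rows per col1 value and join the counts back. This is only a sketch on the example data from the question; the names `counts` and `result` are made up for illustration.

```r
library(dplyr)

col1 <- c("a","a","b","a","b","c","a","c","d")
replicate <- c("rep1","rep1","rep1","rep2","rep2","rep2","rep3","rep3","rep3")
df <- data.frame(col1, replicate)

# Step 1: keep one row per (col1, replicate) pair, then count rows per col1
counts <- df %>%
  distinct(col1, replicate) %>%
  count(col1, name = "ID")

# Step 2: attach the per-value counts back to every original row
result <- df %>% left_join(counts, by = "col1")
result
```

Unlike the grouped mutate, this version returns an ungrouped data frame, which may or may not matter for later steps.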

Using uniqueN from data.table:

library(data.table)
setDT(df)[, ID := uniqueN(paste(col1, replicate)), col1]

-output

df
   col1 replicate ID
1:    a      rep1  3
2:    a      rep1  3
3:    b      rep1  2
4:    a      rep2  3
5:    b      rep2  2
6:    c      rep2  2
7:    a      rep3  3
8:    c      rep3  2
9:    d      rep3  1
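
For completeness, here is a minimal base R sketch of the same idea, assuming no extra packages are wanted. It uses `tapply` to count the distinct replicates per col1 value and then looks each row's value up in that table; `n_reps` is a made-up name for illustration.

```r
col1 <- c("a","a","b","a","b","c","a","c","d")
replicate <- c("rep1","rep1","rep1","rep2","rep2","rep2","rep3","rep3","rep3")
df <- data.frame(col1, replicate)

# Number of distinct replicates for each value of col1 (named by col1 value)
n_reps <- tapply(df$replicate, df$col1, function(x) length(unique(x)))

# Look up the count for every row by name; as.integer() drops the names
df$ID <- as.integer(n_reps[as.character(df$col1)])
df
```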