R - data frame to frequency table

Question

I have the following data frame test in R:

    test <- data.frame(
      ID = c(1,1,2,2,2),
      Color = c("green","blue",rep("green",3)))

    > test
      ID Color
    1  1 green
    2  1  blue
    3  2 green
    4  2 green
    5  2 green

The output I want is a frequency table of the number of distinct colors per ID. For example:

    > desired_output
    
       1    2 <NA> 
       1    1    0

I produced this result with the following dplyr code:

    test_2 <- test %>% 
      group_by(ID) %>% 
      mutate(nDistColors = n_distinct(Color)) %>% 
      ungroup() %>% 
      as.data.frame() %>% 
      select(ID,nDistColors) %>% 
      distinct()

    desired_output <- table(test_2$nDistColors, useNA = "always")

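I look at this kind of summary often, so I would like to know whether there is a better way to write the code. In particular, I seem to remember once using a function that made these lines unnecessary: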

      select(ID,nDistColors) %>% 
      distinct()

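It would also be nice not to have to store the object test_2, but when I pipe directly into table, the output changes to a 2-way frequency table, which I don't want. Can this be avoided? I haven't found a way to specify, within the pipe chain, which column I want the frequencies of: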

    test %>% 
      group_by(ID) %>% 
      mutate(nDistColors = n_distinct(Color)) %>% 
      ungroup() %>% 
      as.data.frame() %>% 
      select(ID,nDistColors) %>% 
      distinct() %>% 
      table(useNA = "always")

          nDistColors
    ID     1 2 <NA>
      1    0 1    0
      2    1 0    0
      <NA> 0 0    0

Answer 1

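This can be simplified by using summarise instead of mutate, which avoids the distinct step. Also, instead of storing the output in a temporary object, you can pull the column 'n' and apply table on it: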

    library(dplyr)
    test %>% 
        group_by(ID) %>% 
        summarise(n = n_distinct(Color), .groups = 'drop') %>%
        pull(n) %>% 
        table(useNA = 'always')
    #    1    2 <NA> 
    #    1    1    0
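
The 2-way table in the question appears because table() was handed a two-column data frame and cross-tabulated its columns; pull(n) passes a single vector instead. As a rough sketch of the same idea without dplyr (an alternative, not part of the answer above; the name n_per_id is only illustrative), base R's tapply can count the distinct colors per ID:

```r
# Rebuild the example data from the question
test <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  Color = c("green", "blue", rep("green", 3))
)

# Number of distinct colors within each ID (a named array, one value per ID)
n_per_id <- tapply(test$Color, test$ID, function(x) length(unique(x)))

# One-way frequency table of those counts, keeping the <NA> column
result <- table(n_per_id, useNA = "always")
print(result)
```

Because only one vector reaches table(), the result stays one-dimensional, and no intermediate data frame such as test_2 needs to be stored.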
