How to return the number of unique observations in each group of a data frame

Question

I have a data frame like this:

data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

I want to return a table that has each unique Location in one column and the count of unique IDs within each Location in another column, so that it looks like this:

Location   Count
   a         4
   b         4
   ...      ...

Note: in my actual data set, the number of IDs differs between Locations, and there are other variables in other columns.

What is the best way to do this?

Answer 1

After grouping by 'Location', we can use n_distinct on the 'ID' column. In the example, the count is 4 for every Location.

library(dplyr)
data %>% 
    group_by(Location) %>%
    summarise(Count = n_distinct(ID))

If we need to add the count as a new column while keeping all the original rows, use mutate instead of summarise.
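
A minimal sketch of the mutate variant, assuming the example data from the question (the column name Count is only illustrative). Every original row is kept and the per-Location count is repeated on each row of that group:

```r
library(dplyr)

# Example data from the question: 10 Locations, 4 distinct IDs in each
data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

# mutate keeps all rows and adds the per-group distinct count as a new column
result <- data %>%
  group_by(Location) %>%
  mutate(Count = n_distinct(ID)) %>%
  ungroup()  # drop the grouping so later steps are not applied per group

head(result)
```

The ungroup() call is optional here, but leaving a data frame grouped can cause surprises in later steps.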

With data.table, uniqueN can be used:

library(data.table)
setDT(data)[, .(Count = uniqueN(ID)), Location]

Answer 2

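Objects of class table have an as.data.frame method. Note that table(data$Location) counts rows per Location rather than unique IDs, which is why the output shows 20 instead of 4: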

as.data.frame(table(data$Location))
   Var1 Freq
1     a   20
2     b   20
3     c   20
4     d   20
5     e   20
6     f   20
7     g   20
8     h   20
9     i   20
10    j   20
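
Since the call above counts rows (20 per Location), one possible adjustment is to drop duplicate Location/ID pairs before tabulating. This is a base R sketch under the assumption that the example data from the question is used. The names pairs and counts are hypothetical helpers, not part of the original answer:

```r
# Example data from the question
data <- data.frame(
  Location = rep(letters[1:10], each = 20),
  ID = rep(1:40, each = 5)
)

# Keep one row per Location/ID combination so each ID is counted once per Location
pairs <- unique(data[, c("Location", "ID")])

# Tabulate the de-duplicated Locations and convert the table to a data frame
counts <- as.data.frame(table(pairs$Location))
names(counts) <- c("Location", "Count")

counts
```

Selecting only the Location and ID columns before calling unique matters when the real data has other variables, because otherwise rows that differ only in those other columns would still be counted separately.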
