How do I compute the frequency/table of categorical variables by group with R data.table?

I have the following data.table in R:

library(data.table)
dt = data.table(ID = c("person1", "person1", "person1", "person2", "person2", "person2", "person2", "person2", ...), category = c("red", "red", "blue", "red", "red", "blue", "green", "green", ...))

dt
ID         category
person1    red
person1    red
person1    blue
person2    red
person2    red
person2    blue
person2    green
person2    green
person3    blue
....

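For each unique ID, I want to compute the "frequency" of each value of the categorical variable (red, blue, green) and then spread these into separate columns that record each count. The resulting data.table should look like this: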

dt
ID        red    blue    green
person1   2      1       0
person2   2      1       2    
...

I mistakenly assumed the right starting point with data.table was to compute table() by group, e.g.

dt[, counts :=table(category), by=ID]

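But this seems to compute the total number of category values within each ID group. It also doesn't solve my problem of "expanding" the counts into columns of the data.table.

What is the right way to do this?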
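For reference, the by-group `table()` idea can be made to work if each group's counts are returned as a one-row list instead of being assigned with `:=` (which expects one value per column, while `table()` produces several counts per group). A minimal sketch using only the non-elided sample rows; `lv` and `wide` are names introduced here for illustration. Converting `category` to a factor with fixed levels makes `table()` report a zero for colours missing from a group, and `as.list()` turns the named counts into one column per colour.

```r
library(data.table)

# Sample rows from the question (the elided "..." rows are left out)
dt <- data.table(
  ID = c("person1", "person1", "person1",
         "person2", "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red", "red", "blue", "green", "green")
)

# Fix the set of levels so every group reports every colour, including zeros
lv <- c("red", "blue", "green")

# One row per ID: table() counts within the group, as.list() spreads them into columns
wide <- dt[, as.list(table(factor(category, levels = lv))), by = ID]
print(wide)
```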

Something like this?

library(data.table)
library(dplyr)  # only needed for the %>% pipe; dcast() here is data.table's
dt[, .N, by = .(ID, category)] %>% dcast(ID ~ category)
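The same reshape can be written with data.table alone, since data.table ships its own `dcast()`. A sketch using the sample rows from the question: `value.var = "N"` names the count column explicitly instead of letting `dcast()` guess it, and `fill = 0` writes zeros rather than `NA` for ID/category combinations that never occur.

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1",
         "person2", "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red", "red", "blue", "green", "green")
)

# Count rows per (ID, category) pair, then spread categories into columns;
# combinations that never occur are filled with 0 instead of NA
counts <- dcast(dt[, .N, by = .(ID, category)],
                ID ~ category, value.var = "N", fill = 0)
print(counts)
```

Note that `dcast()` orders the new columns alphabetically (blue, green, red).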

If you want to add these columns to the original data.table:

counts <- dt[, .N, by = .(ID, category)] %>% dcast(ID ~ category)
counts[is.na(counts)] <- 0              # combinations that never occur come back as NA
output <- merge(dt, counts, by = "ID")  # attach the per-ID counts to every original row
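If the goal is to keep every original row and attach the per-ID counts, the counts can also be assigned by reference with `:=` and `by`, without a separate `merge()`. A minimal sketch on the sample rows; `lv` is a name introduced here for the distinct categories. Each `sum(category == x)` is a single number per group, which `:=` recycles across that group's rows.

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1",
         "person2", "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red", "red", "blue", "green", "green")
)

# Distinct categories, used as the names of the new count columns
lv <- sort(unique(dt$category))

# For each ID, count each category and store it (by reference) in a column named after it
dt[, (lv) := lapply(lv, function(x) sum(category == x)), by = ID]
print(dt)
```

Unlike `merge()`, this modifies `dt` in place and keeps the original row order.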

This does it in an imperative way; there are probably more concise, functional ways to do it.

library(data.table)
library(dplyr)  # provides filter() and the %>% pipe used below
dt = data.table(ID = c("person1", "person1", "person1", "person2", "person2", "person2", "person2", "person2"), 
                category = c("red", "red", "blue", "red", "red", "blue", "green", "green"))

# One row per ID, one column per category
ids <- unique(dt$ID)
categories <- unique(dt$category)
counts <- matrix(nrow=length(ids), ncol=length(categories))
rownames(counts) <- ids
colnames(counts) <- categories

# Fill each cell with the number of rows matching that ID/category pair
for (i in seq_along(ids)) {
  for (j in seq_along(categories)) {
    count <- dt %>%
      filter(ID == ids[i], category == categories[j]) %>%
      nrow()

    counts[i, j] <- count
  }
}
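The double loop filters the whole table once per ID/category pair. A sketch of a vectorized equivalent on the same sample rows: `table()` cross-tabulates in one pass, the fixed factor levels keep rows and columns in the same order as the loop's matrix, and `matrix()` drops the `table` class to leave a plain integer matrix. The name `tab` is introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1",
         "person2", "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red", "red", "blue", "green", "green")
)
ids <- unique(dt$ID)
categories <- unique(dt$category)

# Cross-tabulate in one pass; the factor levels fix row and column order
tab <- table(factor(dt$ID, levels = ids),
             factor(dt$category, levels = categories))

# Same layout and names as the matrix built by the loop
counts <- matrix(tab, nrow = length(ids),
                 dimnames = list(ids, categories))
print(counts)
```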

Then:

>counts
##         red blue green
##person1   2    1     0
##person2   2    1     2

With the reshape2 package it takes a single line:

library(reshape2)
dcast(data=dt,
      ID ~ category,
      fun.aggregate = length,
      value.var = "category")

       ID blue green red
1 person1    1     0   2
2 person2    1     2   2

Also, if you just need a simple two-way table, you can use R's built-in table() function:

table(dt$ID, dt$category)
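`table()` returns a contingency table rather than a data.table. If the wide data.table shape from the question is wanted, one way (a sketch on the sample rows; `tab` and `wide` are names introduced here) is to go through `as.data.frame.matrix()`, which keeps the table's wide layout, and then move the row names into an `ID` column with `keep.rownames`.

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1",
         "person2", "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red", "red", "blue", "green", "green")
)

tab <- table(dt$ID, dt$category)

# as.data.frame.matrix() keeps one column per category (as.data.frame() would go long);
# keep.rownames turns the IDs from row names into a regular "ID" column
wide <- as.data.table(as.data.frame.matrix(tab), keep.rownames = "ID")
print(wide)
```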