How do I compute the frequency/table of categorical variables by group with R data.table?

Question

I have the following data.table in R:

library(data.table)
dt = data.table(ID = c("person1", "person1", "person1", "person2", "person2", "person2", "person2", "person2", ...), category = c("red", "red", "blue", "red", "red", "blue", "green", "green", ...))

dt
ID         category
person1    red
person1    red
person1    blue
person2    red
person2    red
person2    blue
person2    green
person2    green
person3    blue
....

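I'm trying to work out how to compute the "frequency" of the categorical values red, blue and green for each unique ID, and then expand these into columns that record the count of each. The resulting data.table would look like this: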

dt
ID        red    blue    green
person1   2      1       0
person2   2      1       2    
...

I incorrectly assumed that the right way to start with data.table was to compute table() by group, e.g.

dt[, counts :=table(category), by=ID]

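But this seems to compute the total number of category values per ID group. It also doesn't solve my problem of "expanding" the data.table into one column per category.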

What is the correct way to do this?

Answer 1

Like this?

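library(data.table)
library(dplyr)  # used here only for the %>% pipe
# Count rows per (ID, category), then spread the categories into columns
dt[, .N, by = .(ID, category)] %>% dcast(ID ~ category, value.var = "N")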
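For reference, here is a minimal sketch of the same count-then-reshape step using data.table alone. It assumes the eight sample rows shown in the question (the elided `...` rows are left out), and `long` and `wide` are hypothetical names. Passing `fill = 0L` to `dcast` should record ID/category combinations that never occur as 0 rather than NA:

```r
library(data.table)

# Sample data from the question, without the elided rows
dt <- data.table(
  ID = c("person1", "person1", "person1", "person2",
         "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red",
               "red", "blue", "green", "green")
)

# Long format: one row per (ID, category) pair with its count N
long <- dt[, .N, by = .(ID, category)]

# Wide format: one column per category; absent pairs are filled with 0
wide <- dcast(long, ID ~ category, value.var = "N", fill = 0L)
print(wide)
```

The category columns come out in sorted order (blue, green, red), so reorder them with `setcolorder` if the order matters.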

If you want to add these columns to the original data.table:

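# Count per (ID, category) and reshape to one column per category
counts <- dt[, .N, by = .(ID, category)] %>% dcast(ID ~ category, value.var = "N")
# Combinations that never occur come back as NA; record them as 0
counts[is.na(counts)] <- 0
# Attach the per-ID counts to every row of the original table
output <- merge(dt, counts, by = "ID")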
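Note that `merge` returns a new table, sorted by ID by default. If you would rather keep the original row order and add the columns to `dt` in place, one possible alternative is a join followed by assignment by reference. This is only a sketch on the eight sample rows from the question; `cols` and `matched` are hypothetical names:

```r
library(data.table)

# Sample data from the question, without the elided rows
dt <- data.table(
  ID = c("person1", "person1", "person1", "person2",
         "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red",
               "red", "blue", "green", "green")
)

# Per-ID counts in wide form, with 0 for absent combinations
counts <- dcast(dt[, .N, by = .(ID, category)],
                ID ~ category, value.var = "N", fill = 0L)

# Names of the count columns (everything except the key)
cols <- setdiff(names(counts), "ID")

# counts[dt, on = "ID"] returns one row per row of dt, in dt's order
matched <- counts[dt, on = "ID"]

# Add the count columns to dt by reference
dt[, (cols) := matched[, cols, with = FALSE]]
print(dt)
```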

Answer 2

This does it imperatively; there is probably a cleaner, more functional way to do it.

library(data.table)
library(dplyr)  # provides %>% and filter(), which the loop below relies on
dt = data.table(ID = c("person1", "person1", "person1", "person2", "person2", "person2", "person2", "person2"), 
                category = c("red", "red", "blue", "red", "red", "blue", "green", "green"))


ids <- unique(dt$ID)
categories <- unique(dt$category)
counts <- matrix(nrow=length(ids), ncol=length(categories))
rownames(counts) <- ids
colnames(counts) <- categories

for (i in seq_along(ids)) {
  for (j in seq_along(categories)) {
    count <- dt %>%
      filter(ID == ids[i], category == categories[j]) %>%
      nrow()

    counts[i, j] <- count
  }
}

Then:

> counts
##         red blue green
##person1   2    1     0
##person2   2    1     2
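As a rough idea of what a more functional version might look like, the nested loop can be replaced by nested `sapply` calls that build the same matrix. This is a sketch on the same eight rows, not a tuned solution; the argument names `ct` and `id` are hypothetical:

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1", "person2",
         "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red",
               "red", "blue", "green", "green")
)

ids <- unique(dt$ID)
categories <- unique(dt$category)

# Outer sapply builds one column per category, inner sapply one row per ID;
# sum() over the logical vector counts the matching rows
counts <- sapply(categories, function(ct) {
  sapply(ids, function(id) sum(dt$ID == id & dt$category == ct))
})
print(counts)
```

Because `sapply` keeps the names of character inputs, the row and column names are set without the explicit `rownames<-` and `colnames<-` calls.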

Answer 3

This can be done in one line with the reshape2 library:

library(reshape2)
dcast(data=dt,
      ID ~ category,
      fun.aggregate = length,
      value.var = "category")

       ID blue green red
1 person1    1     0   2
2 person2    1     2   2

Also, if you only need a simple 2-way table, you can use R's built-in table function.

table(dt$ID,dt$category)
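If you then want that 2-way table back as a data.table in the wide layout from the question, one option is to go through `as.data.frame.matrix`. This is a sketch on the eight sample rows; `tab` and `wide` are hypothetical names:

```r
library(data.table)

dt <- data.table(
  ID = c("person1", "person1", "person1", "person2",
         "person2", "person2", "person2", "person2"),
  category = c("red", "red", "blue", "red",
               "red", "blue", "green", "green")
)

# 2-way contingency table: rows are IDs, columns are categories
tab <- table(dt$ID, dt$category)

# as.data.frame.matrix keeps the wide layout (a plain as.data.frame
# would give long format); the row names become the ID column
wide <- as.data.table(as.data.frame.matrix(tab), keep.rownames = "ID")
print(wide)
```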
