Calculate order of items in R using data.table

Question

I would like to know how many IDs have the different types in different orders. For example, I have a dataset as follows:

library(data.table)
data <- data.table( id      = c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5),
                    type    = c(3,3,3,3,3,2,3,1,2,3,2,2,3,3,3,2,3),
                    nr_item = c(1,1,2,3,1,2,3,1,2,3,1,2,3,1,2,2,3))

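First, I'd like to know how many ids are type 1 at the first position (nr_item = 1). Then I'd like to know how many ids are type 1 first (nr_item = 1) and then type 1 again (nr_item = 2). Then, how many are type 1 first (nr_item = 1), then type 1 again (nr_item = 2), and then type 1 once more (nr_item = 3), and so on.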

I'd like to know all the possible combinations and successions.

The result should contain the number of IDs for each order/shape, as shown below:

1
1 -> 1
1 -> 1 -> 1
1 -> 1 -> 2
1 -> 1 -> 3
1 -> 2
1 -> 2 -> 1
1 -> 2 -> 2
1 -> 2 -> 3
etc..

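Note that some IDs are mentioned twice for the same nr_item. If the duplicates lead to different type switches, they should be counted twice as well. For example, id 1 starts with type 3 twice, which can be ignored. But id 5 has both type 2 and type 3 as its second nr_item, and this should be counted as two different occasions.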

Answer 1

You can simply use the table function to get a pivot table:

table(data$type, data$nr_item)

By the way, if you just want to summarise the data, you can use either of the following. They give almost the same result.

table(data$type)

or summarise from the dplyr package:

library(dplyr)
data %>%
   group_by(type) %>%
   summarise(count = n())

Finally, if you want to stay with data.table itself, you can use this:

data[,.(count = .N), by = type]
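
Note that table() counts rows, so an id that repeats the same type at the same position (id 1 at nr_item 1) is counted twice. A minimal sketch of the difference, assuming the goal is the number of distinct ids per position and type; the names `row_counts`, `id_counts` and `n_ids` are chosen here for illustration only:

```r
library(data.table)

# Data copied from the question
data <- data.table(id      = c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5),
                   type    = c(3,3,3,3,3,2,3,1,2,3,2,2,3,3,3,2,3),
                   nr_item = c(1,1,2,3,1,2,3,1,2,3,1,2,3,1,2,2,3))

# Row counts per type (rows) and position (columns), as returned by table()
row_counts <- table(data$type, data$nr_item)

# Distinct ids per position and type, using data.table's uniqueN()
id_counts <- data[, .(n_ids = uniqueN(id)), by = .(nr_item, type)][order(nr_item, type)]
print(id_counts)
```

This still treats each position separately; it does not follow an id from one position to the next.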

Answer 2

If I understand correctly, the OP wants to count all possible paths given by data.

Here is a first attempt:

library(data.table)
data[, CJ(type1 = .SD[nr_item == 1, type], 
          type2 = .SD[nr_item == 2, type], 
          type3 = .SD[nr_item == 3, type], unique = TRUE), by = id][
            , rollup(.SD, list(count=.N), by = c("type1", "type2", "type3"))][
              order(type1, type2, type3, na.last = FALSE)]

    type1 type2 type3 count
 1:    NA    NA    NA     6
 2:     1    NA    NA     1
 3:     1     2    NA     1
 4:     1     2     3     1
 5:     2    NA    NA     1
 6:     2     2    NA     1
 7:     2     2     3     1
 8:     3    NA    NA     4
 9:     3     2    NA     2
10:     3     2     3     2
11:     3     3    NA     2
12:     3     3     3     2
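
The rows containing NA are the subtotals that rollup() adds: it aggregates by (type1, type2, type3), then by (type1, type2), then by type1, and finally over all rows. A minimal sketch of that step in isolation, assuming a hand-typed table `paths` (a name used here for illustration only) that holds the six per-id paths listed further down in this answer:

```r
library(data.table)

# Hand-typed copy of the six paths, one row per path; illustration only
paths <- data.table(
  type1 = c(3, 3, 1, 2, 3, 3),
  type2 = c(3, 2, 2, 2, 2, 3),
  type3 = c(3, 3, 3, 3, 3, 3)
)

# rollup() counts rows at each prefix level; rolled-up columns are set to NA
res <- rollup(paths, j = list(count = .N), by = c("type1", "type2", "type3"))
print(res[order(type1, type2, type3, na.last = FALSE)])
```

The row with NA in all three type columns is the grand total, and a row such as type1 = 3 with NA in the other two columns counts every path that starts with type 3.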

The possible paths are created by

data[, CJ(type1 = .SD[nr_item == 1, type], 
          type2 = .SD[nr_item == 2, type], 
          type3 = .SD[nr_item == 3, type], unique = TRUE), by = id]

   id type1 type2 type3
1:  1     3     3     3
2:  2     3     2     3
3:  3     1     2     3
4:  4     2     2     3
5:  5     3     2     3
6:  5     3     3     3

So, there are 6 paths in total, of which 4 paths start with type 3 and 2 paths start with types 3 -> 2, for instance.
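
To see why id 5 contributes two paths while id 1 contributes only one, the cross join can be run by hand for a single id. A minimal sketch, assuming the type vectors are typed in directly from the question's data rather than subset via .SD; `paths_id5` and `paths_id1` are illustrative names:

```r
library(data.table)

# id 5: type 3 at nr_item 1, types 3 and 2 at nr_item 2, type 3 at nr_item 3
paths_id5 <- CJ(type1 = 3, type2 = c(3, 2), type3 = 3, unique = TRUE)
print(paths_id5)

# id 1: type 3 appears twice at nr_item 1; unique = TRUE collapses the repeat
paths_id1 <- CJ(type1 = c(3, 3), type2 = 3, type3 = 3, unique = TRUE)
print(paths_id1)
```

CJ() builds every combination of the supplied vectors, so two distinct types at one position double the number of paths, while unique = TRUE keeps repeated identical types from doing the same.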

Edit: Fine-tuning the output

To make the result look more like the OP's expected result, the output of rollup() can be modified:

library(data.table)
library(magrittr)
library(stringr)
cols <- paste0("type", 1:3)
data[, CJ(type1 = .SD[nr_item == 1, type], 
          type2 = .SD[nr_item == 2, type], 
          type3 = .SD[nr_item == 3, type], unique = TRUE), by = id][
            , rollup(.SD, list(count = .N), by = cols)][
              , .(path = unlist(.SD) %>% na.omit() %>% paste(collapse = " -> "), count), 
              .SDcols = cols, by = .(rn = seq_along(count))][
                , path := path %>% str_pad(max(str_length(.)), "right")][
                  order(path), -"rn"]

           path count
 1:                 6
 2: 1               1
 3: 1 -> 2          1
 4: 1 -> 2 -> 3     1
 5: 2               1
 6: 2 -> 2          1
 7: 2 -> 2 -> 3     1
 8: 3               4
 9: 3 -> 2          2
10: 3 -> 2 -> 3     2
11: 3 -> 3          2
12: 3 -> 3 -> 3     2
