How to cross-tabulate (xtabs) multiple variables with the same breakdown

I have a data frame that looks like this:

  SubjectID Activity        V1          V2          V3
1         2        S 0.2571778 -0.02328523 -0.01465376
2         2        W 0.2860267 -0.01316336 -0.11908252
3         3        R 0.2754848 -0.02605042 -0.11815167
4         3        W 0.2702982 -0.03261387 -0.11752018
5         4        A 0.2748330 -0.02784779 -0.12952716
6         4        S 0.2792199 -0.01862040 -0.11390197
...

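(There are actually many more Vn variables, but this illustrates the problem.)

I'd like to run xtabs() over all of the Vn variables while keeping SubjectID and Activity fixed, with something like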

xtabs(c(V1, V2, V3) ~ SubjectID + Activity, data = DF)

lapply(c(V1, V2, V3), function(x) xtabs(x ~ SubjectID + Activity, data = DF))

But of course neither of these works. What is the right approach here?


Edit: What I want is the output of
xtabs(V1 ~ SubjectID + Activity, data = DF)
xtabs(V2 ~ SubjectID + Activity, data = DF)
xtabs(V3 ~ SubjectID + Activity, data = DF)
...

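You should be able to use get() once you supply a character vector of the columns of interest. Passing names as strings matters because c(V1, V2, V3) is evaluated before lapply() runs, and there are no objects called V1, V2, V3 outside the data frame: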

lapply(c("V1", "V2", "V3"), function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF))

Trying it out with the "airquality" dataset (setNames() labels each table in the list with its variable name):

setNames(lapply(names(airquality)[1:4], 
                function(x) xtabs(get(x) ~ Month + Day, airquality)), 
         names(airquality)[1:4])
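A variant of the same idea, in case you would rather not rely on get(): build each formula from the column name with reformulate(). This is only a sketch. The choice of "Wind" and "Temp" is mine (they have no missing values in airquality), and sapply(..., simplify = FALSE) is used just to get a named list without a separate setNames() call.

```r
# Columns to tabulate; picked for illustration because they contain no NAs.
vars <- c("Wind", "Temp")

# reformulate() turns the strings into e.g. Wind ~ Month + Day,
# so xtabs() sees an ordinary formula rather than get(x).
tabs <- sapply(vars, function(v) {
  f <- reformulate(c("Month", "Day"), response = v)
  xtabs(f, data = airquality)
}, simplify = FALSE)

names(tabs)        # one table per variable, named after the variable
tabs$Wind["5", "1"]  # Wind total for Month 5, Day 1
```

Each element of tabs is a regular Month x Day table, so tabs$Temp or tabs[["Wind"]] can be used exactly like the result of a single xtabs() call.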

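Based on your comments, if what you need is a wide dataset, I'd suggest looking at "data.table" and dcast().

Here's an example (note that it computes group means, whereas xtabs() computes sums):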

set.seed(1)
DF <- cbind(warpbreaks, V2 = sample(100, nrow(warpbreaks)), V3 = sample(100, nrow(warpbreaks)))
library(data.table)
setDT(DF)
lapply(c("breaks", "V2", "V3"), function(x) {
  dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, value.var = x) 
})
# [[1]]
#    wool        L        M        H
# 1:    A 44.55556 24.00000 24.55556
# 2:    B 28.22222 28.77778 18.77778
# 
# [[2]]
#    wool        L        M        H
# 1:    A 59.22222 46.33333 33.22222
# 2:    B 49.44444 44.77778 43.22222
# 
# [[3]]
#    wool  L        M        H
# 1:    A 40 68.11111 74.22222
# 2:    B 48 40.11111 37.77778

Alternatively, you can get a single, fully wide "data.table" by passing several columns to value.var:

dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, 
      value.var = c("breaks", "V2", "V3"))
#    wool breaks_L breaks_M breaks_H     V2_L     V2_M     V2_H V3_L     V3_M     V3_H
# 1:    A 44.55556 24.00000 24.55556 59.22222 46.33333 33.22222   40 68.11111 74.22222
# 2:    B 28.22222 28.77778 18.77778 49.44444 44.77778 43.22222   48 40.11111 37.77778
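If a single base R object is preferred and sums (as in the question) are what you want, xtabs() also accepts a matrix on the left-hand side, so cbind() may be enough on its own. A minimal sketch, where DF is my reconstruction of the six sample rows shown in the question:

```r
# Stand-in for the question's data frame (only the six rows that were shown).
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

# A matrix response yields a 3-d table: SubjectID x Activity x variable.
tab <- xtabs(cbind(V1, V2, V3) ~ SubjectID + Activity, data = DF)

dim(tab)        # SubjectID levels, Activity levels, number of variables
tab[, , "V2"]   # the slice that xtabs(V2 ~ SubjectID + Activity, DF) would give
```

Slicing on the third dimension recovers the per-variable tables, so this is an alternative to the list of tables rather than a replacement for the wide layout above.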

Using a tidy approach, this is how I would solve the problem:

library(tidyr)
library(dplyr)
library(purrr)

df <- tribble(
  ~SubjectID, ~Activity,       ~V1,         ~V2,         ~V3,
           2,       "S", 0.2571778, -0.02328523, -0.01465376,
           2,       "W", 0.2860267, -0.01316336, -0.11908252,
           3,       "R", 0.2754848, -0.02605042, -0.11815167,
           3,       "W", 0.2702982, -0.03261387, -0.11752018,
           4,       "A", 0.2748330, -0.02784779, -0.12952716,
           4,       "S", 0.2792199, -0.01862040, -0.11390197
)

df %>%
  select(starts_with("V")) %>%
  map(~{
    as_tibble(xtabs(.x ~ SubjectID + Activity, data = df))
  }) %>%
  bind_rows(.id = "var") %>%
  spread(Activity, n)

# # A tibble: 9 x 6
#     var SubjectID           A           R           S           W
# * <chr>     <chr>       <dbl>       <dbl>       <dbl>       <dbl>
# 1    V1         2  0.00000000  0.00000000  0.25717780  0.28602670
# 2    V1         3  0.00000000  0.27548480  0.00000000  0.27029820
# 3    V1         4  0.27483300  0.00000000  0.27921990  0.00000000
# 4    V2         2  0.00000000  0.00000000 -0.02328523 -0.01316336
# 5    V2         3  0.00000000 -0.02605042  0.00000000 -0.03261387
# 6    V2         4 -0.02784779  0.00000000 -0.01862040  0.00000000
# 7    V3         2  0.00000000  0.00000000 -0.01465376 -0.11908252
# 8    V3         3  0.00000000 -0.11815167  0.00000000 -0.11752018
# 9    V3         4 -0.12952716  0.00000000 -0.11390197  0.00000000
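Since spread() has been superseded in tidyr, the same table can probably be produced without xtabs() at all by reshaping long and then wide. This is a sketch that assumes tidyr 1.1.0 or later (for names_sort and a plain function in values_fn); df is the same tibble as above, rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

df <- tribble(
  ~SubjectID, ~Activity,       ~V1,         ~V2,         ~V3,
           2,       "S", 0.2571778, -0.02328523, -0.01465376,
           2,       "W", 0.2860267, -0.01316336, -0.11908252,
           3,       "R", 0.2754848, -0.02605042, -0.11815167,
           3,       "W", 0.2702982, -0.03261387, -0.11752018,
           4,       "A", 0.2748330, -0.02784779, -0.12952716,
           4,       "S", 0.2792199, -0.01862040, -0.11390197
)

wide <- df %>%
  # one row per SubjectID / Activity / variable
  pivot_longer(starts_with("V"), names_to = "var", values_to = "value") %>%
  # one column per Activity; sum duplicates and fill absent cells with 0,
  # mirroring what xtabs() does
  pivot_wider(names_from = Activity, values_from = value,
              values_fn = sum, values_fill = 0, names_sort = TRUE) %>%
  select(var, SubjectID, everything()) %>%
  arrange(var, SubjectID)

wide
```

One difference from the xtabs() route: SubjectID keeps its numeric type here instead of becoming a character column.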