Binary Variables Combinations Analysis in R

Question

I have a dataset with a large number of binary variables. For illustration, here is a smaller version with only 4 variables:

set.seed(5)
my_data<-data.frame("Slept Well"=sample(c(0,1),10,TRUE),
                    "Had Breakfast"=sample(c(0,1),10,TRUE),
                    "Worked out"=sample(c(0,1),10,TRUE),
                    "Meditated"=sample(c(0,1),10,TRUE))

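In the data above, each row corresponds to one observation. I am interested in the frequency of each unique combination of the variables. For example, how many observations report having slept well and meditated, but not having had breakfast or worked out?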

I would like to rank the unique combinations from most frequent to least frequent. What is the best way to code this?

Answer 1

How about a dplyr solution:

library(dplyr)

my_data %>%
  # group by every column, so each group is one unique combination
  group_by_all() %>%
  # count the rows in each group
  summarise(freq = n()) %>%
  # sort by frequency, most frequent first
  arrange(-freq)

# A tibble: 9 x 5
  Slept.Well Had.Breakfast Worked.out Meditated  freq
  <chr>      <chr>         <chr>      <chr>     <int>
1 0          1             1          0             2
2 0          0             0          0             1
3 0          0             0          1             1
4 0          0             1          0             1
5 0          0             1          1             1
6 0          1             0          1             1
7 0          1             1          1             1
8 1          0             0          1             1
9 1          1             0          0             1
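
In dplyr 1.0.0 and later, group_by_all() is superseded by across(). The following is a minimal sketch of the same idea in that style, assuming dplyr >= 1.0.0 is installed; the name combo_freq is only illustrative and the data is recreated from the question so the snippet runs on its own:

```r
library(dplyr)

# Recreate the example data from the question.
set.seed(5)
my_data <- data.frame("Slept Well"    = sample(c(0, 1), 10, TRUE),
                      "Had Breakfast" = sample(c(0, 1), 10, TRUE),
                      "Worked out"    = sample(c(0, 1), 10, TRUE),
                      "Meditated"     = sample(c(0, 1), 10, TRUE))

combo_freq <- my_data %>%
  # group by every column, whatever the columns are called
  group_by(across(everything())) %>%
  # one row per unique combination; drop the grouping afterwards
  summarise(freq = n(), .groups = "drop") %>%
  # most frequent combination first
  arrange(desc(freq))

combo_freq
```

Because the grouping is expressed with everything(), the same pipeline should work unchanged when the data has many more binary columns.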

Or data.table:

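library(data.table)

# count rows (.N) per unique combination of all columns, then sort by descending frequency
res <- setorder(data.table(my_data)[, .(freq = .N), by = names(my_data)], -freq)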
res
   Slept.Well Had.Breakfast Worked.out Meditated freq
1:          0             1          1         0    2
2:          1             0          0         1    1
3:          0             0          1         0    1
4:          0             0          0         0    1
5:          0             1          0         1    1
6:          0             1          1         1    1
7:          0             0          1         1    1
8:          0             0          0         1    1
9:          1             1          0         0    1

Answer 2

You can use aggregate:

x <- aggregate(list(n=rep(1, nrow(my_data))), my_data, length)
#x <- aggregate(list(n=my_data[,1]), my_data, length) #Alternative
x[order(-x$n),]
#  Slept.Well Had.Breakfast Worked.out Meditated n
#4          0             1          1         0 2
#1          0             0          0         0 1
#2          1             1          0         0 1
#3          0             0          1         0 1
#5          0             0          0         1 1
#6          1             0          0         1 1
#7          0             1          0         1 1
#8          0             0          1         1 1
#9          0             1          1         1 1
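
As a further base R sketch (not part of the answer above), each row can be collapsed into a single pattern string such as "1001" and tabulated. The names pattern, pattern_freq and n_target are illustrative only, and the approach assumes every column holds a single-character 0/1 value:

```r
# Recreate the example data from the question.
set.seed(5)
my_data <- data.frame("Slept Well"    = sample(c(0, 1), 10, TRUE),
                      "Had Breakfast" = sample(c(0, 1), 10, TRUE),
                      "Worked out"    = sample(c(0, 1), 10, TRUE),
                      "Meditated"     = sample(c(0, 1), 10, TRUE))

# Collapse each row into a key with one digit per column, in column order.
pattern <- do.call(paste0, my_data)

# Tabulate the keys and sort from most to least frequent.
pattern_freq <- sort(table(pattern), decreasing = TRUE)
pattern_freq

# The specific example from the question: slept well and meditated,
# but did not have breakfast or work out.
n_target <- sum(my_data$Slept.Well == 1 & my_data$Had.Breakfast == 0 &
                my_data$Worked.out == 0 & my_data$Meditated == 1)
n_target
```

The pattern string is compact to read, but unlike aggregate() it does not keep the variables as separate columns, so it is better suited to a quick look than to further processing.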
