Count and filter data based on paired data/every two rows?

Question

I'm trying to set up a McNemar test, but I can't get the code right (using R).

My data is paired and 1000 pairs long, so I have one column specifying the pair number, e.g.

c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)

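one column specifying which member of the pair is in the control group and which is in the treatment group (every pair has one of each, but in random order), e.g.: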

c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)

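and a column called response, which records whether each member received a response, so in a given pair both, one, or neither member may have responded, e.g.: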

c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)

I'm trying to create a matrix of the counted outcomes, like this:

a <- count of pairs in which both members received a response
b <- count of pairs in which the control only received a response
c <- treatment only response
d <- Neither response
matrix(c(a, b, c, d), 2, 2)

What lines of code can I run to filter my data to get a, b, c, and d? I've been trying to use the tidyverse packages, but either base R or tidyverse is fine.

Answer 1

This tidyverse/dplyr approach works:

1. Load your data:

library(tidyverse)

pair <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
treat <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
response <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
data <- data.frame(pair, treat, response)

2. Compute the counts you want:

d <- data %>%
    group_by(pair) %>%
    mutate(total_response = sum(response)) %>%
    ungroup() %>%
    mutate(
        a = case_when(total_response == 2 ~ 1, TRUE ~ 0),
        b = case_when(total_response == 1 & treat == 0 & response == 1 ~ 1, TRUE ~ 0),
        c = case_when(total_response == 1 & treat == 1 & response == 1 ~ 1, TRUE ~ 0),
        d = case_when(total_response == 0 ~ 1, TRUE ~ 0)
    ) %>%
    group_by(pair) %>%
    summarise(a = max(a), b = max(b), c = max(c), d = max(d)) %>%
    ungroup() %>%
    summarise(a = sum(a), b = sum(b), c = sum(c), d = sum(d))

3. Your matrix:

matrix(c(d$a, d$b, d$c, d$d), 2, 2)

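4. Explanation of the calculation:

First, you sum the responses within each pair group;
then you ungroup and set a = 1 when both members of the pair responded, b = 1 when exactly one member responded and it was the control, c = 1 when exactly one member responded and it was the treatment, and d = 1 when neither responded;
then you group by pair again and take the maximum of each letter column, so each pair contributes exactly one value per letter;
finally, you ungroup and sum each variable (which is equivalent to counting the pairs in each category).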
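For readers without the tidyverse, the four steps above can be mirrored in base R. This is a minimal sketch, not the answer's own code: it assumes the same pair, treat and response vectors, and swaps in ave() for the grouped sum and aggregate() for the per-pair maximum.

```r
# Example vectors from the question
pair     <- c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)
treat    <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1)
response <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)

# Step 1: sum responses within each pair (the total is repeated on both rows)
total_response <- ave(response, pair, FUN = sum)

# Step 2: row-level 0/1 flags; b and c are set only on the row that responded
flags <- data.frame(
  a = as.integer(total_response == 2),
  b = as.integer(total_response == 1 & treat == 0 & response == 1),
  c = as.integer(total_response == 1 & treat == 1 & response == 1),
  d = as.integer(total_response == 0)
)

# Step 3: one row per pair, keeping the maximum of each flag
per_pair <- aggregate(flags, by = list(pair = pair), FUN = max)

# Step 4: sum over pairs to count the pairs in each category
counts <- colSums(per_pair[c("a", "b", "c", "d")])
counts  # a = 1, b = 0, c = 3, d = 1 for the example data
```

The same lines apply unchanged to the full 1000-pair data; only the input vectors differ.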

Answer 2

Assuming your data frame looks like this:

> d
   group treatment response
1      0         0        0
2      0         1        1
3      1         1        1
4      1         0        1
5      2         1        1
6      2         0        0
7      3         0        0
8      3         1        0
9      4         0        0
10     4         1        1

then you can try this:

d <- within(d, {
  response <- factor(response, levels = c(1, 0), labels = c("positive", "negative"))
  treatment <- as.logical(treatment)
})

with(d, table(response[!treatment], response[treatment], dnn = c("control", "treatment")))

Output:

          treatment
control    positive negative
  positive        1        0
  negative        3        1
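Subsetting with response[!treatment] and response[treatment] pairs controls with treatments purely by position, so it relies on both subsets listing the pairs in the same order. That holds for the example, where rows are sorted by group, but may not hold for real data. A minimal base R sketch, assuming the same column names as above and using a deliberately shuffled copy of the example rows, sorts by group first and checks the alignment before tabulating:

```r
# Example data from the question (column names as in this answer)
d <- data.frame(
  group     = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Shuffle the rows to mimic data that is not stored pair by pair
d <- d[c(10, 3, 6, 1, 8, 5, 2, 9, 4, 7), ]

# Sort by group so the i-th control row and the i-th treatment row
# belong to the same pair
d <- d[order(d$group), ]
ctrl <- d[d$treatment == 0, ]
trt  <- d[d$treatment == 1, ]

# Stop early if the two subsets are not aligned pair by pair
stopifnot(identical(ctrl$group, trt$group))

ctrl_resp <- factor(ctrl$response, levels = c(1, 0), labels = c("positive", "negative"))
trt_resp  <- factor(trt$response,  levels = c(1, 0), labels = c("positive", "negative"))
tab <- table(control = ctrl_resp, treatment = trt_resp)
tab
```

Without the order() step, the shuffled rows would be tabulated against the wrong partners.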

Answer 3

Here is an approach using the dplyr R package:

library(dplyr)

# your data
df <- data.frame(
    pair = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
    treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1), 
    response = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1))

# data management
df2 <- df %>% 
    group_by(pair) %>% 
    arrange(treatment) %>% 
    summarise_all(~ toString(na.omit(.)))  # lambda form; funs() is deprecated in dplyr
df2
## A tibble: 5 x 3
#   pair treatment response
#  <dbl> <chr>     <chr>   
#1     0 0, 1      0, 1    
#2     1 0, 1      1, 1    
#3     2 0, 1      0, 1    
#4     3 0, 1      0, 0    
#5     4 0, 1      0, 1 

# contingency table
df2 %>% summarise(
    a = sum(response == '1, 1'), # count of pairs in which both members received a response
    b = sum(response == '1, 0'), # count of pairs in which the control only received a response
    c = sum(response == '0, 1'), # count of pairs in which the treatment only received a response
    d = sum(response == '0, 0')  # count of pairs in which neither members received a response
) %>%
    unlist() %>%  # one-row tibble -> named integer vector, so matrix() is numeric
    matrix(2, 2)
#     [,1] [,2]
#[1,]    1    3
#[2,]    0    1

Explanation: data management

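The goal here is to collapse the two rows of each pair into one, using summarise_all() with toString(na.omit(.)), so the pair's response values become a single string such as "0, 1". This lets you count how many pairs in the data have c(1, 1), c(1, 0), c(0, 1) and c(0, 0) responses.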

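group_by(pair) makes all further operations happen within each pair group.
arrange(treatment) reorders the rows by the treatment column so that the control and treatment responses of every pair are always in the same order: control first, then treatment. (arrange() ignores the grouping by default, but sorting the whole table by treatment has the same effect within each pair.)
summarise_all() with toString(na.omit(.)) joins all non-NA values within each pair group into one comma-separated string.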

Because of group_by(pair) and summarise_all(...), df2 has exactly one row per pair identifier.
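The same collapse can be sketched in base R without dplyr. This is an alternative, not the answer's code: it assumes the df built above and uses aggregate() with toString, sorting by pair and then treatment so the control value always comes first. The formula interface of aggregate() drops rows with NA by default (na.action = na.omit), which plays the role of na.omit(.).

```r
# Example data from the question
df <- data.frame(
  pair      = c(0, 0, 1, 1, 2, 2, 3, 3, 4, 4),
  treatment = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1),
  response  = c(0, 1, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Control row (treatment == 0) first within each pair
df_sorted <- df[order(df$pair, df$treatment), ]

# One row per pair; response becomes "control, treatment", e.g. "0, 1"
df2 <- aggregate(response ~ pair, data = df_sorted, FUN = toString)
df2

# Count the pairs per response pattern
table(df2$response)
```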

Explanation: contingency table

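Inside summarise(...), each comparison such as response == '1, 1' is TRUE for the pairs with that pattern, and sum() counts the TRUE values, so each count is assigned to its own letter. The contingency table (matrix) is then built from these counts, organized the same way as matrix(c(a, b, c, d), 2, 2) in the question.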
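Because matrix() fills column by column, a lands in [1, 1], b in [2, 1], c in [1, 2] and d in [2, 2]: rows are the treatment outcome and columns the control outcome, with the discordant pairs (b and c) off the diagonal, which is the layout McNemar's test expects. A minimal sketch, hard-coding the counts for the five example pairs and adding dimnames (labels chosen here for illustration) before passing the table to base R's mcnemar.test():

```r
# Counts for the five example pairs: a = 1, b = 0, c = 3, d = 1
counts <- c(a = 1, b = 0, c = 3, d = 1)

# matrix() fills column-wise: rows = treatment outcome, columns = control outcome
tab <- matrix(counts, 2, 2,
              dimnames = list(treatment = c("response", "no response"),
                              control   = c("response", "no response")))
tab

# McNemar's test uses only the discordant cells tab[1, 2] (c) and tab[2, 1] (b)
res <- mcnemar.test(tab)
res$statistic  # (|3 - 0| - 1)^2 / (3 + 0) = 4/3 with the default continuity correction
```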
