Apply operation on all pairs of rows

Question

I have a tibble in this format:

   position condition replicate  value
   <dbl>    <chr>     <chr>      <dbl>
 1 10       1         a          0.16
 2 10       1         b          0.21
 3 10       2         a          0.19
 4 10       2         b          0.38
 5 10       3         a          0.12
 6 10       3         b          0.35
 7 20       1         a          0.22
 8 20       1         b          0.24
 9 20       2         a          0.56
10 20       2         b          0.47
11 20       3         a          0.14
12 20       3         b          0.23
 ...

From this I want to get the differences between all pairs of replicates, for each pair of conditions at each position:

   position  1.a-2.a  1.a-2.b  1.b-2.a  1.b-2.b  1.a-3.a  1.a-3.b  1.b-3.a  1.b-3.b ...
   <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
 1 10        0.13     0.21     0.13     0.16     ...      ...      ...      ...
 2 20        ...      ...      ...      ...      ...      ...      ...      ...
 3 30        ...      ...      ...      ...      ...      ...      ...      ...
 ...

Then summarise to get the median of the absolute differences for each pair of conditions at each position:

   position  median(abs(1.a-2.a), abs(1.a-2.b), abs(1.b-2.a), abs(1.b-2.b)) ...
   <dbl>     <dbl>
 1 10        0.0161
 2 20        ...
 3 30        ...
 ...

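I have tried table %>% spread(replicate, value) to spread the replicate values into columns, but I don't know where to go from there. The solution needs to be general, because I don't know in advance how many conditions or replicates I will have. How can I do this?

Edit:

Something like this: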

table %>%
  unite(condition.replicate, c(condition, replicate), sep = ".") %>%
  spread(condition.replicate, value) %>% group_by(position) %>%
  uncount(2)

gives me this:

    position  `1.a`   `1.b`  `2.a`  `2.b`  ...
    <dbl>     <dbl>   <dbl>  <dbl>  <dbl>
1   10        0.16    0.14   0.61   0.86
2   10        0.16    0.14   0.61   0.86

Maybe there is a way to repeat the columns so that they line up against each other like this:

    position  `1.a`   `1.b`  `2.a`  `2.b`  `1.a`   `1.b`  `2.a`  `2.b` ...
    <dbl>     <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
1   10        0.16    0.14   0.61   0.86   0.16    0.14   0.61   0.86
2   10        0.16    0.16   0.16   0.16   0.14    0.14   0.14   0.14
    position  `1.a`   `1.a`  `1.a`  `1.a`  `1.b`   `1.b`  `1.b`  `1.b` ...

Then I could summarize to get the differences between the rows.

Answer 1

You can easily make the matrix of differences using nested lapply:

# Make result matrix
result <- do.call(rbind, 
                  lapply(split(df, df$position), function(x)
                    do.call(c, lapply(x$value, 
                                      function(y) y - x$value))))

The names are a bit harder:

# Names for result matrix
# Each value is (outer element) - (inner element), with the inner index varying
# fastest, so the slower-varying grid column (combos[,2]) must come first in the label.
# This assumes the rows are in the same condition/replicate order within every position.
df$unique_mix <- paste0(df$condition, ".", df$replicate)
combos <- expand.grid(unique(df$unique_mix), unique(df$unique_mix))
colnames(result) <- paste(combos[,2], "-", combos[,1])

But this should give you something you can work with:

#>    1.a - 1.a 1.a - 1.b 1.a - 2.a 1.a - 2.b 1.a - 3.a 1.a - 3.b 1.b - 1.a 1.b - 1.b
#> 10         0     -0.05     -0.03     -0.22      0.04     -0.19      0.05         0
#> 20         0     -0.02     -0.34     -0.25      0.08     -0.01      0.02         0
#>    1.b - 2.a 1.b - 2.b 1.b - 3.a 1.b - 3.b 2.a - 1.a 2.a - 1.b 2.a - 2.a 2.a - 2.b
#> 10      0.02     -0.17      0.09     -0.14      0.03     -0.02         0     -0.19
#> 20     -0.32     -0.23      0.10      0.01      0.34      0.32         0      0.09
#>    2.a - 3.a 2.a - 3.b 2.b - 1.a 2.b - 1.b 2.b - 2.a 2.b - 2.b 2.b - 3.a 2.b - 3.b
#> 10      0.07     -0.16      0.22      0.17      0.19         0      0.26      0.03
#> 20      0.42      0.33      0.25      0.23     -0.09         0      0.33      0.24
#>    3.a - 1.a 3.a - 1.b 3.a - 2.a 3.a - 2.b 3.a - 3.a 3.a - 3.b 3.b - 1.a 3.b - 1.b
#> 10     -0.04     -0.09     -0.07     -0.26         0     -0.23      0.19      0.14
#> 20     -0.08     -0.10     -0.42     -0.33         0     -0.09      0.01     -0.01
#>    3.b - 2.a 3.b - 2.b 3.b - 3.a 3.b - 3.b
#> 10      0.16     -0.03      0.23         0
#> 20     -0.33     -0.24      0.09         0
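
The matrix above also contains self-differences and pairs within the same condition, whereas the question only asks for pairs across different conditions. Here is a minimal base R sketch of one way to keep just those pairs. It rebuilds the two positions shown in the question as `df`; the helper `cross_diffs` and the `label` column are hypothetical names introduced for illustration, and the sketch assumes every position lists the same condition/replicate combinations in the same order.

```r
# Example data copied from the question (positions 10 and 20 only)
df <- data.frame(
  position  = rep(c(10, 20), each = 6),
  condition = rep(rep(c("1", "2", "3"), each = 2), times = 2),
  replicate = rep(c("a", "b"), times = 6),
  value     = c(0.16, 0.21, 0.19, 0.38, 0.12, 0.35,
                0.22, 0.24, 0.56, 0.47, 0.14, 0.23),
  stringsAsFactors = FALSE
)
df$label <- paste(df$condition, df$replicate, sep = ".")

# Hypothetical helper: signed differences for one position,
# restricted to pairs whose conditions differ (each unordered pair once)
cross_diffs <- function(x) {
  d    <- outer(x$value, x$value, "-")                 # d[i, j] = value[i] - value[j]
  keep <- outer(x$condition, x$condition, "<")         # condition i sorts before condition j
  nm   <- outer(x$label, x$label, paste, sep = "-")    # "label[i]-label[j]"
  setNames(d[keep], nm[keep])
}

# One row per position, one column per cross-condition pair
result <- do.call(rbind, lapply(split(df, df$position), cross_diffs))
```

With three conditions and two replicates this gives 12 columns per position, starting with `1.a-2.a` and `1.b-2.a`. Comparing conditions with `<` is only a device for picking each unordered pair once; it does not depend on the conditions being numeric.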

Answer 2

If you are using the tidyverse, you can try this approach. It isn't that pretty, but it may give you what you want.

library(tidyverse)

wide_df <- df %>%
  unite(con_rep, c(condition, replicate), sep = ".") %>%
  pivot_wider(id_cols = position, names_from = con_rep, values_from = value) %>%
  as.data.frame(.)

data.frame(position = wide_df$position, combn(wide_df[-1], 2, function(x) x[,1]-x[,2])) %>%
  setNames(c("position", apply(combn(names(wide_df[-1]), 2), 2, paste0, collapse = "-")))

Output

  position 1.a-1.b 1.a-2.a 1.a-2.b 1.a-3.a 1.a-3.b 1.b-2.a 1.b-2.b 1.b-3.a 1.b-3.b 2.a-2.b 2.a-3.a 2.a-3.b 2.b-3.a
1       10   -0.05   -0.03   -0.22    0.04   -0.19    0.02   -0.17    0.09   -0.14   -0.19    0.07   -0.16    0.26
2       20   -0.02   -0.34   -0.25    0.08   -0.01   -0.32   -0.23    0.10    0.01    0.09    0.42    0.33    0.33
  2.b-3.b 3.a-3.b
1    0.03   -0.23
2    0.24   -0.09
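
For the second step in the question, the median of the absolute differences per pair of conditions, one possible sketch is to stay in long format and join the data to itself by position. This is not part of the answer above, just an illustration under some assumptions: `df` rebuilds the two positions shown in the question, `pair_medians`, `pair` and `abs_diff` are names made up here, and the `relationship` argument requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Example data copied from the question (positions 10 and 20 only)
df <- tibble(
  position  = rep(c(10, 20), each = 6),
  condition = rep(rep(c("1", "2", "3"), each = 2), times = 2),
  replicate = rep(c("a", "b"), times = 6),
  value     = c(0.16, 0.21, 0.19, 0.38, 0.12, 0.35,
                0.22, 0.24, 0.56, 0.47, 0.14, 0.23)
)

pair_medians <- df %>%
  # Pair every row with every other row at the same position
  inner_join(df, by = "position", suffix = c(".x", ".y"),
             relationship = "many-to-many") %>%
  # Keep each unordered pair of distinct conditions once
  filter(condition.x < condition.y) %>%
  mutate(pair     = paste(condition.x, condition.y, sep = "-"),
         abs_diff = abs(value.x - value.y)) %>%
  group_by(position, pair) %>%
  summarise(median_abs_diff = median(abs_diff), .groups = "drop")
```

The result has one row per position and condition pair, so it does not need to know the number of conditions or replicates in advance. If a wide layout is preferred, `tidyr::pivot_wider(names_from = pair, values_from = median_abs_diff)` should reshape it to one column per pair.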

r

dplyr

tidyr

tibble