R tidyverse - correlation by group, comparing multiple columns against a single column and returning a single data frame
I have a dataset of count data, structured like this:
Sample | ID | Expected | Observed_A | Observed_B |
---|---|---|---|---|
A | id1 | 10 | 8 | 10 |
A | id2 | 6 | 8 | 4 |
B | id1 | 15 | 12 | 18 |
B | id2 | 1 | 2 | 4 |
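What I'd like to get with tidyr/dplyr is, for each sample, the correlation between each observed count column and the expected counts (i.e. I'm not interested in the correlations between the observed columns themselves):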
Sample | Dataset | Correlation |
---|---|---|
A | Observed_A | 0.99 |
A | Observed_B | 0.93 |
B | Observed_A | 0.89 |
B | Observed_B | 0.91 |
I can do this with a loop, but is there a 'clearer' way using tidyverse functions?
Any help much appreciated!!
How about this:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
d <- tibble::tribble(~Sample, ~ID, ~Expected, ~Observed_A, ~Observed_B,
"A", "id1", 10, 8, 10,
"A", "id2", 6, 8, 4,
"B", "id1", 15, 12, 18,
"B", "id2", 1, 2, 4)
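# For each sample, correlate Expected with both observed columns at once;
# cor() returns a 1 x 2 matrix that becomes one row with one column per dataset
d %>%
  group_by(Sample) %>%
  summarise(as.data.frame(cor(Expected, cbind(Observed_A, Observed_B)))) %>%
  # reshape the per-dataset columns into Dataset/Correlation pairs
  pivot_longer(-Sample, names_to = "Dataset", values_to = "Correlation")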
#> Warning in cor(Expected, cbind(Observed_A, Observed_B)): the standard deviation
#> is zero
#> # A tibble: 4 × 3
#> Sample Dataset Correlation
#> <chr> <chr> <dbl>
#> 1 A Observed_A NA
#> 2 A Observed_B 1
#> 3 B Observed_A 1
#> 4 B Observed_B 1
Created on 2022-03-04 by the reprex package (v2.0.1)
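A note on the numbers: each sample in the example has only two IDs, and a Pearson correlation computed from two points is always exactly 1 or -1, or NA when one of the columns is constant (as Observed_A is for sample A, which triggers the "standard deviation is zero" warning). That is why the reprex prints 1 and NA rather than values like the 0.99 in the desired output table, which are presumably illustrative. A quick base R check using sample A's values:

```r
# Sample A from the question: two IDs, so each correlation uses only two points
expected   <- c(10, 6)  # Expected
observed_a <- c(8, 8)   # Observed_A: constant, so its standard deviation is zero
observed_b <- c(10, 4)  # Observed_B: moves in the same direction as Expected

r1 <- suppressWarnings(cor(expected, observed_a))  # NA (zero standard deviation)
r2 <- cor(expected, observed_b)                    # 1
r3 <- cor(expected, c(4, 10))                      # -1, opposite direction

print(c(r1, r2, r3))
```

With real data containing more IDs per sample, both answers return ordinary values between -1 and 1.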
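Alternatively, with `across()`:
# `d` is the example data built in the answer above
d %>%
  group_by(Sample) %>%
  # correlate each observed column with Expected, within each sample
  summarize(across(Observed_A:Observed_B, ~cor(.x, Expected))) %>%
  pivot_longer(!Sample, values_to = "Correlation", names_to = "Dataset")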
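If more observed columns are added later, one more variant (a sketch, not one of the original answers) is to reshape first and then group by both the sample and the observed column's name, so each group yields exactly one correlation. `starts_with("Observed")` assumes every observed column shares that prefix, as in the question's data:

```r
library(dplyr)
library(tidyr)

# Example data from the question
d <- tibble::tribble(~Sample, ~ID, ~Expected, ~Observed_A, ~Observed_B,
  "A", "id1", 10,  8, 10,
  "A", "id2",  6,  8,  4,
  "B", "id1", 15, 12, 18,
  "B", "id2",  1,  2,  4)

# Stack every Observed_* column into long format (Dataset = original column
# name), then compute one correlation per (Sample, Dataset) group
res <- d %>%
  pivot_longer(starts_with("Observed"),
               names_to = "Dataset", values_to = "Observed") %>%
  group_by(Sample, Dataset) %>%
  summarise(Correlation = cor(Expected, Observed), .groups = "drop")

print(res)
```

On the example data this gives the same four rows as the reprex above (NA for sample A / Observed_A, 1 elsewhere), and `.groups = "drop"` returns an ungrouped tibble.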