tidyverse

Question

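I want to compute a correlation matrix across multiple variables in the tidyverse, but separately for each group defined by another column. For example, suppose I have a data frame df with a column year, and I want to see the correlations among V1, V2, and V3 within each year.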

year    V1    V2    V3    misc_var
2018    5     6     5     a
2018    4     6     4     b
2018    3     2     3     NA
2013    5     8     2     4
2013    6     3     8     8
2013    4     7     5     NA

I tried something along the lines of:

cor_output = df %>%
  group_by(year) %>%
  select(V1, V2, V3, year) %>%
  cor(use = "pairwise.complete.obs")

However, instead of computing the correlations among V1 to V3 for each year, this includes the year variable in the correlation matrix.

The desired output should look like this (note that the correlations shown are made up):

year    var    V1    V2    V3
2013    V1     1    0.7    0.3
2013    V2     ...    1    ...
...
...
2018    V2    0.6    1    0.7
...

Any ideas?

Answer 1

One way to do this is to combine the corrr package with tidyr::nest() and purrr::map():

library(tidyverse)
library(corrr)

df <- tribble(
    ~year, ~V1, ~V2, ~V3, ~misc_var,
     2018,   5,   6,   5,       "a",
     2018,   4,   6,   4,       "b",
     2018,   3,   2,   3,        NA,
     2013,   5,   8,   2,       "4",
     2013,   6,   3,   8,       "8",
     2013,   4,   7,   5,        NA
    )

df %>%
  select_if(is.numeric) %>%
  group_by(year) %>%
  nest() %>%
  mutate(
    correlations = map(data, correlate)
  ) %>%
  unnest(correlations)
#> 
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'
#> 
#> 
#> Correlation method: 'pearson'
#> Missing treated using: 'pairwise.complete.obs'
#> # A tibble: 6 x 5
#>    year rowname     V1     V2     V3
#>   <dbl> <chr>    <dbl>  <dbl>  <dbl>
#> 1  2018 V1      NA      0.866  1    
#> 2  2018 V2       0.866 NA      0.866
#> 3  2018 V3       1      0.866 NA    
#> 4  2013 V1      NA     -0.756  0.5  
#> 5  2013 V2      -0.756 NA     -0.945
#> 6  2013 V3       0.5   -0.945 NA

Alternatively, you can use dplyr's more experimental group_map or group_modify functions:

df %>%
  select_if(is.numeric) %>%
  group_by(year) %>%
  group_modify(~ correlate(.x))   # returns one tibble, with year kept as a column
  # In current dplyr, group_map(~ correlate(.x)) returns a list of tibbles instead
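
For comparison, the per-year computation that nest() and map() carry out can be sketched in base R with split() and lapply(). This is only an illustration under the assumption that the data matches the example above; the names vars, cor_by_year, and cor_long are made up for this sketch. Unlike corrr::correlate(), base cor() puts 1 on the diagonal rather than NA.

```r
# Same example data as above, built with base R only
df <- data.frame(
  year = c(2018, 2018, 2018, 2013, 2013, 2013),
  V1   = c(5, 4, 3, 5, 6, 4),
  V2   = c(6, 6, 2, 8, 3, 7),
  V3   = c(5, 4, 3, 2, 8, 5)
)

# Columns to correlate (illustrative name)
vars <- c("V1", "V2", "V3")

# Split the rows by year, then compute one correlation matrix per piece
cor_by_year <- lapply(
  split(df[vars], df$year),
  cor,
  use = "pairwise.complete.obs"
)

# Stack the matrices into one data frame with year and var columns
cor_long <- do.call(rbind, lapply(names(cor_by_year), function(y) {
  m <- cor_by_year[[y]]
  out <- data.frame(year = as.numeric(y), var = rownames(m), m)
  rownames(out) <- NULL
  out
}))
rownames(cor_long) <- NULL

print(cor_long)
```

The result has one row per year and variable, which is the layout asked for in the question.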

Answer 2

More generally:

dataframe %>%
  select(grouping_variable, columns) %>%
  group_by(grouping_variable) %>%
  group_modify(~ corrr::correlate(.x))

where columns can be c(col_1, col_2, ...) or a range such as col_1:col_10.
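
As a concrete instance of this pattern, here is a minimal sketch applied to the example data from the question, with year as the grouping variable and V1:V3 as the column range. It assumes dplyr and corrr are installed; quiet = TRUE only suppresses the method message. Depending on the corrr version, the label column in the result may be called rowname or term.

```r
library(dplyr)
library(corrr)

# Example data from the question (misc_var is left in to show it gets dropped)
df <- tibble(
  year     = c(2018, 2018, 2018, 2013, 2013, 2013),
  V1       = c(5, 4, 3, 5, 6, 4),
  V2       = c(6, 6, 2, 8, 3, 7),
  V3       = c(5, 4, 3, 2, 8, 5),
  misc_var = c("a", "b", NA, "4", "8", NA)
)

res <- df %>%
  select(year, V1:V3) %>%                        # grouping variable plus a column range
  group_by(year) %>%
  group_modify(~ correlate(.x, quiet = TRUE)) %>% # .x holds the columns without year
  ungroup()

print(res)
```

Selecting the grouping variable explicitly, together with only the columns of interest, keeps non-numeric columns such as misc_var out of correlate().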

tidyverse - Correlations among multiple columns grouped by other column

r

correlation