Rolling correlation using group_by on two variables

Question

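I am trying to compute a rolling correlation with the correlation package and group_by. I have both a year and a product ID. My solution works when grouping by product ID only, but not for rolling over the years. Any suggestions on how to get the rolling part to work, or am I doing something wrong in the group_by call?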

library(correlation)
library(dplyr)

dk <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 
2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L
), Products = c("apple", "orange", "melon", 
"apple", "orange", "melon", "apple", 
"orange", "melon", "apple", "orange", 
"melon", "apple", "orange", "melon"
), Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13, 
60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87, 72737.28, 
8183.69, 21953.6), Sales = c(11811, 1300.46, 32134, 11069, 1194.63, 
35909.37, 11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54, 
11937, 2066.28, 40234.98)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

dk %>% 
    group_by(Products) %>%
    correlation(select = c("Quantity", "Sales"))

Group  | Parameter1 | Parameter2 |     r |        95% CI |  t(3) |     p
------------------------------------------------------------------------
apple  |   Quantity |      Sales |  0.44 | [-0.72, 0.95] |  0.86 | 0.455
melon  |   Quantity |      Sales |  0.74 | [-0.41, 0.98] |  1.89 | 0.155
orange |   Quantity |      Sales | -0.23 | [-0.93, 0.82] | -0.42 | 0.705

# How can this work?
dk %>% 
    group_by(Year, Products) %>%
    correlation(select = c("Quantity", "Sales"))

Answer 1

Here is a solution using slider. I assume the "rolling" correlation is based on three consecutive years.
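Note that the data has exactly one row per Year and Products combination (15 rows, 5 years, 3 products), so grouping by both leaves a single observation per group and no correlation can be computed there. The rolling therefore has to happen over the years within each product. As a minimal base R sketch of the three-year windows assumed here (the names `years`, `half` and `windows` are only illustrative):

```r
# Years present in the data for each product
years <- 2015:2019

# Assumption: a centered window of one year before and one year after
half <- 1

# Only keep centers that have a complete window on both sides
centers <- seq(1 + half, length(years) - half)
windows <- lapply(centers, function(i) years[(i - half):(i + half)])

# Label each window by its years
labels <- vapply(windows, paste, character(1), collapse = "-")
print(labels)
```

With five years this gives three complete windows per product, nine in total across the three products.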

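I had to give up on correlation::correlation and use cor.test to get similar results, because I had trouble combining slider (for the rolling windows) and correlation in the same command. I hope there is an easier way to do this: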

library(slider)
library(tidyr)  # unnest() comes from tidyr

dk %>% 
  group_by(Products) %>%
  summarize(res = slide(
    Year, 
    ~ with(cor.test(x = Quantity[Year %in% .x], y = Sales[Year %in% .x]),
           tibble(
             Years = paste0(.x, collapse = "-"),
             r = estimate,
             `95 % CI` = ifelse(exists("conf.int"), sprintf("[%0.3f, %0.3f]", conf.int[1], conf.int[2]), NA),
             t = statistic,
             df = parameter,
             p = p.value)),
    .before = 1, 
    .after = 1,
    .complete = TRUE)) %>%
  ungroup() %>%
  unnest(res)

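But the result looks tidy! (There are no confidence intervals, though, because cor.test needs at least 4 points to compute them.)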

# A tibble: 9 × 7
  Products Years               r `95 % CI`      t    df     p
  <chr>    <chr>           <dbl> <lgl>      <dbl> <int> <dbl>
1 apple    2015-2016-2017 -0.419 NA        -0.462     1 0.724
2 apple    2016-2017-2018  0.860 NA         1.69      1 0.340
3 apple    2017-2018-2019  0.531 NA         0.626     1 0.644
4 melon    2015-2016-2017  0.966 NA         3.75      1 0.166
5 melon    2016-2017-2018  0.675 NA         0.915     1 0.528
6 melon    2017-2018-2019  0.859 NA         1.67      1 0.343
7 orange   2015-2016-2017  0.708 NA         1.00      1 0.499
8 orange   2016-2017-2018 -0.337 NA        -0.358     1 0.781
9 orange   2017-2018-2019 -0.840 NA        -1.55      1 0.365
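As a cross-check without slider, the same rolling computation can be sketched in base R. This is only a sketch: `roll_cor` is a hypothetical helper, the data frame is rebuilt from the question's values, and the four-year window is an assumption added here to show when cor.test reports a confidence interval.

```r
# Data from the question, rebuilt as a plain data frame
dk <- data.frame(
  Year = rep(2015:2019, each = 3),
  Products = rep(c("apple", "orange", "melon"), times = 5),
  Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13,
               60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87,
               72737.28, 8183.69, 21953.6),
  Sales = c(11811, 1300.46, 32134, 11069, 1194.63, 35909.37,
            11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54,
            11937, 2066.28, 40234.98)
)

# Hypothetical helper: rolling Pearson correlation for one product
roll_cor <- function(d, width) {
  d <- d[order(d$Year), ]
  starts <- seq_len(nrow(d) - width + 1)
  rows <- lapply(starts, function(i) {
    w <- d[i:(i + width - 1), ]
    ct <- cor.test(w$Quantity, w$Sales)
    # cor.test omits conf.int when there are fewer than 4 observations
    ci <- if (is.null(ct$conf.int)) c(NA_real_, NA_real_) else as.numeric(ct$conf.int)
    data.frame(
      Products = w$Products[1],
      Years = paste(w$Year, collapse = "-"),
      r = unname(ct$estimate),
      ci_low = ci[1],
      ci_high = ci[2],
      p = ct$p.value
    )
  })
  do.call(rbind, rows)
}

# Three-year windows, as in the slider solution
res3 <- do.call(rbind, lapply(split(dk, dk$Products), roll_cor, width = 3))
rownames(res3) <- NULL

# Four-year windows (an assumption, not part of the answer above)
res4 <- do.call(rbind, lapply(split(dk, dk$Products), roll_cor, width = 4))
rownames(res4) <- NULL

print(res3)
print(res4)
```

With width = 3 the interval columns stay NA, which matches the table above. With width = 4 cor.test does return an interval, although with so few points it will be very wide.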

