Use group_by on two variables to do a rolling correlation
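I am trying to do a rolling correlation using the correlation package together with group_by. I have both a year and a product ID. My solution works when I group by product ID only, but not for the rolling years. Any suggestions on how to get the rolling part to work, or am I doing something wrong in group_by?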
library(correlation)
library(dplyr)
dk <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L
), Products = c("apple", "orange", "melon",
"apple", "orange", "melon", "apple",
"orange", "melon", "apple", "orange",
"melon", "apple", "orange", "melon"
), Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13,
60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87, 72737.28,
8183.69, 21953.6), Sales = c(11811, 1300.46, 32134, 11069, 1194.63,
35909.37, 11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54,
11937, 2066.28, 40234.98)), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
dk %>%
  group_by(Products) %>%
  correlation(select = c("Quantity", "Sales"))
Group | Parameter1 | Parameter2 | r | 95% CI | t(3) | p
------------------------------------------------------------------------
apple | Quantity | Sales | 0.44 | [-0.72, 0.95] | 0.86 | 0.455
melon | Quantity | Sales | 0.74 | [-0.41, 0.98] | 1.89 | 0.155
orange | Quantity | Sales | -0.23 | [-0.93, 0.82] | -0.42 | 0.705
# How can this work?
dk %>%
  group_by(Year, Products) %>%
  correlation(select = c("Quantity", "Sales"))
Here is a solution using slider. I am assuming the "rolling" correlation is computed over three consecutive years.
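As for the group_by(Year, Products) attempt: the data has exactly one row per year and product, so each such group would contain a single observation and there is nothing to correlate within it. A quick check, as a minimal sketch that rebuilds only the grouping columns of dk (groups_only and rows_per_group are names introduced here for illustration):

```r
library(dplyr)

# Only the grouping columns of dk are rebuilt here (same values as in the question)
groups_only <- tibble(
  Year = rep(2015:2019, each = 3),
  Products = rep(c("apple", "orange", "melon"), times = 5)
)

# Count the rows that fall into each Year/Products group
rows_per_group <- groups_only %>%
  count(Year, Products)

# Smallest and largest group size; both are 1 for this data
range(rows_per_group$n)
```

That is why the year has to enter as a sliding window inside each product group rather than as a second grouping variable.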
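I had to drop correlation::correlation and use cor.test to get similar results, because I had trouble combining the rolling windows from slider with correlation in the same command. I hope there is a simpler way to do this: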
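library(slider)
library(tidyr)  # provides unnest()
dk %>%
  group_by(Products) %>%
  summarize(res = slide(
    Year,
    # .x holds the years in the current window; test only the rows of those years
    ~ with(cor.test(x = Quantity[Year %in% .x], y = Sales[Year %in% .x]),
           tibble(
             Years = paste0(.x, collapse = "-"),
             r = estimate,
             # cor.test only returns conf.int with at least 4 observations
             `95 % CI` = ifelse(exists("conf.int"), sprintf("[%0.3f, %0.3f]", conf.int[1], conf.int[2]), NA),
             t = statistic,
             df = parameter,
             p = p.value)),
    .before = 1,       # one year before the current one
    .after = 1,        # one year after the current one
    .complete = TRUE)) %>%  # skip windows with fewer than three years
  ungroup() %>%
  unnest(res)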
The result looks tidy, though! (There is no confidence interval, because cor.test needs at least 4 points to compute one.)
# A tibble: 9 × 7
Products Years r `95 % CI` t df p
<chr> <chr> <dbl> <lgl> <dbl> <int> <dbl>
1 apple 2015-2016-2017 -0.419 NA -0.462 1 0.724
2 apple 2016-2017-2018 0.860 NA 1.69 1 0.340
3 apple 2017-2018-2019 0.531 NA 0.626 1 0.644
4 melon 2015-2016-2017 0.966 NA 3.75 1 0.166
5 melon 2016-2017-2018 0.675 NA 0.915 1 0.528
6 melon 2017-2018-2019 0.859 NA 1.67 1 0.343
7 orange 2015-2016-2017 0.708 NA 1.00 1 0.499
8 orange 2016-2017-2018 -0.337 NA -0.358 1 0.781
9 orange 2017-2018-2019 -0.840 NA -1.55 1 0.365
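If only the correlation coefficient is needed (no t, df or p), a shorter route might be slide2_dbl() from slider, which slides over two vectors in parallel. The following is a sketch under the same three-year-window assumption, and it further assumes one row per product and year; the data is rebuilt compactly with the question's values, and rolling_r is a name introduced here for illustration:

```r
library(dplyr)
library(slider)

# Same values as dk in the question, rebuilt compactly
dk <- tibble(
  Year = rep(2015:2019, each = 3),
  Products = rep(c("apple", "orange", "melon"), times = 5),
  Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13,
               60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87,
               72737.28, 8183.69, 21953.6),
  Sales = c(11811, 1300.46, 32134, 11069, 1194.63, 35909.37,
            11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54,
            11937, 2066.28, 40234.98)
)

rolling_r <- dk %>%
  arrange(Products, Year) %>%   # windows are row-based, so order by year first
  group_by(Products) %>%
  mutate(
    # Pearson r over the previous, current and next year;
    # incomplete windows at the edges become NA
    r = slide2_dbl(Quantity, Sales, cor,
                   .before = 1, .after = 1, .complete = TRUE)
  ) %>%
  ungroup()

rolling_r
```

Here each r sits on the middle year of its window, so the first and last year of every product are NA. Because the window counts rows rather than calendar years, a product with a missing year would silently get a window spanning a gap.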