Day to day rolling correlations by matching name
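Suppose I have the data frame below. (My real data set is not necessarily this small.)
library(lubridate)
library(dplyr)  # provides filter(), left_join(), transmute() and %>% used further down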
x <- data.frame(
date = c(rep(ymd(20160601), 4), rep(ymd(20160602), 3), rep(ymd(20160603), 3)),
name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
observation = sample(1:10)  # random draw; the printout below shows one possible result
)
# date name observation
# 1 2016-06-01 a 10
# 2 2016-06-01 b 7
# 3 2016-06-01 c 3
# 4 2016-06-01 d 2
# 5 2016-06-02 a 8
# 6 2016-06-02 b 6
# 7 2016-06-02 c 4
# 8 2016-06-03 b 5
# 9 2016-06-03 c 1
# 10 2016-06-03 d 9
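I want to find the day-to-day correlation of observations with matching names. For example, for date 2016-06-02 I want the correlation between <8, 6, 4> and <10, 7, 3>, because a, b and c are the only names common to 2016-06-02 and 2016-06-01. I can do it like this (there may be a better way):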
filter(x, date %in% ymd(20160601)) %>%
left_join(filter(x, date %in% ymd(20160602)), by = "name") %>%
transmute(
date = ymd(20160602),
correlation = cor(observation.x, observation.y, use = "complete.obs")) %>%
`[`(1, )
# date correlation
# 1 2016-06-02 0.9966159
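But how can I do this for the whole data frame with a window function, so that I get a data frame containing every date and its correlation with the previous date? I would prefer a dplyr/RcppRoll solution!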
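dplyr does not have rolling merges. Assuming you do need one (this is not clear from the OP, since the sample data has no gaps), you can do this with data.table: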
library(data.table)
dt = as.data.table(x)        # or setDT(x) to convert in place
dt[, date := as.Date(date)]  # the OP does not say whether these are dates or datetimes,
                             # so make sure they are dates
# Look up each row's name on the previous day (date - 1); roll = TRUE carries the
# last earlier observation forward when that name has no row on that day
dt[.(name = name, old.date = date - 1, obs = observation),
   on = c(name = 'name', date = 'old.date'), roll = TRUE][
   , cor(obs, observation, use = 'pairwise.complete.obs'), by = date]
# date V1
#1: 2016-06-01 NA
#2: 2016-06-02 0.9966159
#3: 2016-06-03 -0.5000000
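To make the effect of roll = TRUE concrete, here is a minimal sketch on a cut-down table. The table names `history` and `lookup` are made up for illustration; the values are d's two rows from the sample data, where d has no row on 2016-06-02.

```r
library(data.table)

# Hypothetical cut-down table: name "d" is observed on 06-01 and 06-03 only
history <- data.table(
  name = c("d", "d"),
  date = as.Date(c("2016-06-01", "2016-06-03")),
  observation = c(2, 9)
)

# We ask for d's value on 2016-06-02, a day on which it has no row
lookup <- data.table(name = "d", date = as.Date("2016-06-02"))

# Exact join: no row matches, so observation is NA
exact <- history[lookup, on = c("name", "date")]

# Rolling join: the last earlier observation (2016-06-01) is carried forward
rolled <- history[lookup, on = c("name", "date"), roll = TRUE]

exact$observation   # NA
rolled$observation  # 2
```

This is why the 2016-06-03 correlation in the answer includes d: its 2016-06-01 value is rolled forward and paired with its 2016-06-03 value. If stale values should not be carried over arbitrarily long gaps, roll also accepts a number that limits how far back the join may reach.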
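If no rolling is needed, that is, if a name should only be compared when it appears on both a date and the calendar day before it, a plain dplyr self-join may be enough. The following is a sketch under that assumption, not a rolling merge. The names `prev`, `prev_observation` and `res` are made up for illustration, and the observations are hard-coded to the printed sample so the result is reproducible.

```r
library(dplyr)

# Same values as the printed sample in the question
x <- data.frame(
  date = as.Date(c(rep("2016-06-01", 4), rep("2016-06-02", 3), rep("2016-06-03", 3))),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)
)

# Shift every row forward by one day so it lines up with the next day's rows
prev <- x %>%
  transmute(date = date + 1, name, prev_observation = observation)

res <- x %>%
  inner_join(prev, by = c("date", "name")) %>%  # keep names present on both days
  group_by(date) %>%
  summarise(correlation = cor(observation, prev_observation))

res
```

The first date drops out because it has no previous day to join to. For 2016-06-02 this reproduces the 0.9966159 from the question. For 2016-06-03 only b and c have a row on the previous day, so the correlation is computed from two points and is 1, whereas the rolling join above also pairs d with its 2016-06-01 value and gets -0.5. Which of the two is wanted depends on how gaps in the data should be treated.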