Day-to-day rolling correlations by matching name

Suppose I have the following data frame. (My real dataset is not necessarily this small.)

library(lubridate)
library(dplyr)  # provides filter(), left_join(), transmute() and %>% used further down

x <- data.frame(
  date = c(rep(ymd(20160601), 4), rep(ymd(20160602), 3), rep(ymd(20160603), 3)),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = sample(1:10)  # random permutation; the printout below is one particular draw
)

#          date name observation
# 1  2016-06-01    a          10
# 2  2016-06-01    b           7
# 3  2016-06-01    c           3
# 4  2016-06-01    d           2
# 5  2016-06-02    a           8
# 6  2016-06-02    b           6
# 7  2016-06-02    c           4
# 8  2016-06-03    b           5
# 9  2016-06-03    c           1
# 10 2016-06-03    d           9

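I want to find the day-to-day correlation of observations with matching names. For example, for date 2016-06-02 I want the correlation between <8, 6, 4> and <10, 7, 3>, because a, b and c are the only names common to 2016-06-02 and 2016-06-01. I can do it like this (there may be a better way):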

filter(x, date %in% ymd(20160601)) %>%
  left_join(filter(x, date %in% ymd(20160602)), by = "name") %>%
  transmute(
    date = ymd(20160602),
    correlation = cor(observation.x, observation.y, use = "complete.obs")) %>%
  `[`(1, )

#         date correlation
# 1 2016-06-02   0.9966159
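
As a quick sanity check, the same number should come out of calling cor directly on the two matched vectors. This is only a sketch: the vectors are typed in by hand from the printout above, and `check` is a name made up for illustration.

```r
# Observations for names a, b, c on 2016-06-02 and 2016-06-01,
# copied by hand from the printed data frame (not derived from x)
check <- cor(c(8, 6, 4), c(10, 7, 3))
print(check)
```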

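But how do I do this for the whole data frame using a window function, so that I get a data frame containing every date and its correlation with the previous date? I would prefer a dplyr/RcppRoll solution!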

dplyr has no rolling joins. Assuming you really do need one (it's not clear from the OP, since the sample data has no holes), you can do this:

library(data.table)
dt = as.data.table(x) # or setDT to convert in place

dt[, date := as.Date(date)] # not very clear from OP if you have dates or datetimes
                            # let's make sure it's dates

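# For every row, look up the same name's observation at date - 1, or failing that
# the most recent earlier one (roll = TRUE), then correlate the pairs per date group
dt[.(name = name, old.date = date - 1, obs = observation),
     on = c(name = 'name', date = 'old.date'), roll = TRUE][
   , cor(obs, observation, use = 'pairwise.complete.obs'), by = date]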
#         date         V1
#1: 2016-06-01         NA
#2: 2016-06-02  0.9966159
#3: 2016-06-03 -0.5000000
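
If you only want to compare each date with the immediately preceding date in the data, and drop names that are missing there instead of rolling them forward, a plain self-join in dplyr may be enough. The following is a minimal sketch, not a tested drop-in: the names `date_map`, `prev`, `prev_date`, `prev_observation` and `result` are invented for illustration, and the observations are typed in from the printout in the question so that the example is reproducible.

```r
library(dplyr)

# Sample data with the observation values copied from the question's printout
x <- data.frame(
  date = as.Date(c(rep("2016-06-01", 4), rep("2016-06-02", 3), rep("2016-06-03", 3))),
  name = c("a", "b", "c", "d", "a", "b", "c", "b", "c", "d"),
  observation = c(10, 7, 3, 2, 8, 6, 4, 5, 1, 9)
)

# Map each distinct date to the previous date that occurs in the data
date_map <- x %>%
  distinct(date) %>%
  arrange(date) %>%
  mutate(prev_date = lag(date))

# Copy of the data relabelled so it can serve as the "previous day" side of the join
prev <- x %>%
  rename(prev_date = date, prev_observation = observation)

result <- x %>%
  inner_join(date_map, by = "date") %>%               # attach prev_date to every row
  inner_join(prev, by = c("prev_date", "name")) %>%   # keep only names present on both days
  group_by(date) %>%
  summarise(correlation = cor(observation, prev_observation))

print(result)
```

The first date has no predecessor, so it drops out of the result. On the sample data, 2016-06-03 comes out as 1 rather than -0.5: only b and c match exactly, and two points are always perfectly correlated, whereas the rolling join also carries d forward from 2016-06-01.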