Find the next available date when filtering a data.table

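I need to filter a data.table for one specific date, which is always the 15th of the month. If the 15th falls on a weekend, that date will not be in my dataset. In that case the filter should move to the 17th or the 16th, depending on whether the 15th of that month is a Saturday or a Sunday.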

library(data.table)
library(lubridate)

dt.test <- structure(list(Date = structure(c(18536, 18537, 18540, 18541, 
                                             18542, 18543, 18544, 18547, 18548, 18549, 18550, 18551, 18554, 
                                             18555, 18556, 18557, 18558, 18561, 18562, 18563, 18564, 18565, 
                                             18568, 18569, 18570, 18571, 18572, 18575, 18576, 18577, 18578, 
                                             18579, 18582, 18583, 18584, 18585, 18586, 18589, 18590, 18591, 
                                             18592, 18593, 18596, 18597, 18598, 18599, 18600, 18603, 18604, 
                                             18605, 18606, 18607, 18610, 18611, 18612, 18613, 18614, 18617, 
                                             18618, 18619, 18624, 18625, 18626), class = "Date")
                          , Week.Day = c(5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 
                                         2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 
                                         3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 2, 3, 4), 
                          A = 1:63), row.names = c(NA, -63L), class = c("data.table", 
                                                                        "data.frame"))




dt.test[day(Date) == 15]
         Date Week.Day  A
1: 2020-10-15        5 11
2: 2020-12-15        3 54

Expected output:

         Date Week.Day  A
1: 2020-10-15        5 11
2: 2020-11-16        2 33
3: 2020-12-15        3 54

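I could of course write several if conditions that first pick out the months in which the 15th falls on a weekend, but I am sure there is a more elegant data.table or dplyr approach.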

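Here is how I would do it, using slice and which.min. You could also filter out any weekend dates first, but it sounds like weekends never appear in your dataset.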

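library(dplyr)

dt.test %>%
    group_by(year(Date), month(Date)) %>%
    filter(day(Date) >= 15) %>%           # keep the 15th onward within each month
    slice(which.min(day(Date) - 15))      # first available date on or after the 15th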

  Date       Week.Day     A `year(Date)` `month(Date)`
  <date>        <dbl> <int>        <dbl>         <dbl>
1 2020-10-15        5    11         2020            10
2 2020-11-16        2    33         2020            11
3 2020-12-15        3    54         2020            12

Here is a data.table solution using similar logic:

dt.test[day(Date) >= 15, .SD[which.min(day(Date) - 15)], by = .(year(Date), month(Date))]
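A rolling join is another way to express "the next available date" in data.table. This is a different technique from the grouped which.min above, not part of the original answer. The sketch below is self-contained and uses hypothetical toy data (dt, targets and res are illustrative names): a weekday-only calendar for October to December 2020, with one target date per month.

```r
library(data.table)

# Hypothetical toy data: every weekday from 2020-10-01 to 2020-12-31,
# so 2020-11-15 (a Sunday) is absent, as in the question
dt <- data.table(Date = seq(as.Date("2020-10-01"), as.Date("2020-12-31"), by = "day"))
dt <- dt[!(wday(Date) %in% c(1, 7))]   # data.table::wday: 1 = Sunday, 7 = Saturday
dt[, A := .I]

# One target per month: the 15th
targets <- data.table(Target = seq(as.Date("2020-10-15"), by = "month", length.out = 3))

# roll = -Inf carries the next observation backward, so each target is
# matched to the first row whose Date is on or after it
res <- dt[targets, on = .(Date = Target), roll = -Inf,
          .(Target = i.Target, Date = x.Date, A)]
print(res)
```

One difference worth noting: the rolling join does not stop at the month boundary, so if a month had no rows on or after the 15th, the target would roll into the following month, whereas the grouped version would simply return nothing for that month.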