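Find the next available date when filtering a data.table
I need to filter a data.table for a specific date, which is always the 15th of the month. If the 15th falls on a weekend, that date will not be in my dataset. In that case the filter should move to the 16th or the 17th, depending on whether the 15th is a Sunday or a Saturday in that month.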
library(data.table)
library(lubridate)
dt.test <- structure(list(Date = structure(c(18536, 18537, 18540, 18541,
18542, 18543, 18544, 18547, 18548, 18549, 18550, 18551, 18554,
18555, 18556, 18557, 18558, 18561, 18562, 18563, 18564, 18565,
18568, 18569, 18570, 18571, 18572, 18575, 18576, 18577, 18578,
18579, 18582, 18583, 18584, 18585, 18586, 18589, 18590, 18591,
18592, 18593, 18596, 18597, 18598, 18599, 18600, 18603, 18604,
18605, 18606, 18607, 18610, 18611, 18612, 18613, 18614, 18617,
18618, 18619, 18624, 18625, 18626), class = "Date")
, Week.Day = c(5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6,
2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2,
3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 2, 3, 4),
A = 1:63), row.names = c(NA, -63L), class = c("data.table",
"data.frame"))
dt.test[day(Date) == 15]
Date Week.Day A
1: 2020-10-15 5 11
2: 2020-12-15 3 54
Expected output:
Date Week.Day A
1: 2020-10-15 5 11
2: 2020-11-16 2 33
3: 2020-12-15 3 54
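I could of course write several if conditions that first pick out the months in which the 15th falls on a weekend, but I am sure there is a more elegant data.table or dplyr function.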
Here is my approach, using slice and which.min. You could also filter out any weekend dates first, but it sounds like weekends never appear in your dataset.
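library(dplyr)
dt.test %>%
  group_by(year(Date), month(Date)) %>%
  filter(day(Date) >= 15) %>%         # keep the 15th and later within each month
  slice(which.min(day(Date) - 15))    # first available date on or after the 15th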
Date Week.Day A `year(Date)` `month(Date)`
<date> <dbl> <int> <dbl> <dbl>
1 2020-10-15 5 11 2020 10
2 2020-11-16 2 33 2020 11
3 2020-12-15 3 54 2020 12
Here is a data.table solution using similar logic:
dt.test[day(Date) >= 15, .SD[which.min(day(Date) - 15)], by = .(year(Date), month(Date))]
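Another way to express "the 15th, or the next date that exists" is a rolling join, which was not part of the answers above. The following is a minimal sketch under a few assumptions: `dt` is stand-in data containing every Monday to Friday in the last quarter of 2020 (not the original `dt.test`), `targets` is a hypothetical table holding the 15th of each month, and `Found` is a helper column added only so the matched date stays visible after the join. With `roll = -Inf`, a target date that has no exact match picks up the next later row instead.

```r
library(data.table)

# Stand-in data (assumption): weekdays only, Oct-Dec 2020
all_days <- seq(as.Date("2020-10-01"), as.Date("2020-12-31"), by = "day")
dt <- data.table(Date = all_days[as.POSIXlt(all_days)$wday %in% 1:5])
dt[, A := .I]          # row counter, like column A in the question
dt[, Found := Date]    # copy of the date; the join column itself shows the target

# Hypothetical lookup table: the 15th of each month
targets <- data.table(
  Date = seq(as.Date("2020-10-15"), as.Date("2020-12-15"), by = "month")
)

# roll = -Inf: when the target is missing, use the next available later row
res <- dt[targets, on = "Date", roll = -Inf]
print(res)
```

In the result, `Date` holds the requested 15th and `Found` holds the date that was actually matched, so a weekend 15th is easy to spot because the two columns differ.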