Generate sequence of dates for given frequency as per days of occurrence
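I am trying to generate a sequence of dates in R (using lubridate) from a given start date and a frequency, where the frequency is not a number but the set of weekdays on which a date may occur.
Given is the table below, which defines the group, the start date, the day of the week, and an occurrence flag (Y/N).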
+-------+------------+-----+-----+
| Group | start_date | Day | Y/N |
+-------+------------+-----+-----+
| foo | 02-06-2021 | Mon | 0 |
| foo | 02-06-2021 | Tue | 1 |
| foo | 02-06-2021 | Wed | 0 |
| foo | 02-06-2021 | Thu | 1 |
| foo | 02-06-2021 | Fri | 1 |
| foo | 02-06-2021 | Sat | 1 |
| foo | 02-06-2021 | Sun | 0 |
| bar | 02-06-2021 | Mon | 1 |
| bar | 02-06-2021 | Tue | 0 |
| bar | 02-06-2021 | Wed | 0 |
| bar | 02-06-2021 | Thu | 1 |
| bar | 02-06-2021 | Fri | 1 |
| bar | 02-06-2021 | Sat | 0 |
| bar | 02-06-2021 | Sun | 0 |
+-------+------------+-----+-----+
The required output is as follows.
+-------+------------+---------------------+
| Group | given_date | next_available_date |
+-------+------------+---------------------+
| foo | 02-06-2021 | 03-06-2021 |
| foo | 04-06-2021 | 04-06-2021 |
| foo | 06-06-2021 | 08-06-2021 |
| bar | 02-06-2021 | 03-06-2021 |
| bar | 05-06-2021 | 07-06-2021 |
+-------+------------+---------------------+
I have some ideas involving a while loop, but I think that could be tedious.
for each given_date {
    inputdate = given_date
    while (true) {
        if (group == "foo" & weekday(inputdate) in ('Tue','Thu','Fri','Sat')) {
            next_available_date = inputdate
            break
        }
        else {
            inputdate = inputdate + (1 day)   (repeat the loop until the if condition is satisfied)
        }
    }
}
The condition may differ from group to group.
I can't figure out how to use an uneven frequency to get the next available date.
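Working on a larger sample, as discussed in the comments earlier. The strategy followed:
- Your Day column always starts from Mon, which is not necessarily the weekday of start_date, so the rows have to be matched to the actual weekday.
- So I converted the Day field to an ordered factor, so that it can be manipulated as an integer.
- Arranged the data frame so that each group starts from the weekday of its start_date. Modulo division (%%) is used for this.
- Once arranged, the task was much easier. I created seven dates, one per weekday, for each group and each start_date.
- Filtered out the rows where Y/N (y_n in the code) is 0.
- Now only the top row of each group is needed, so I used slice_head().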
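To illustrate the modulo step in isolation, here is a minimal base R sketch. It assumes Monday = 1 through Sunday = 7, the same numbering that wday(..., week_start = 1) produces; the variable names (days, start_wday, offset) are only illustrative and do not appear in the answer's code.

```r
# Weekday labels in the same order as the Day column (Mon = 1 ... Sun = 7)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# ISO weekday number of the start date; "%u" gives Monday = 1 ... Sunday = 7
start_wday <- as.integer(format(as.Date("2021-06-02"), "%u"))

# Number of days from the start date to each weekday, wrapping around the week
offset <- (seq_along(days) + 7 - start_wday) %% 7

# Weekdays reordered so that the start date's own weekday comes first
days[order(offset)]
```

Sorting by this offset puts the start date's weekday in the first row, so adding 0:6 to the start date lines each row up with its calendar date.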
df <- data.frame(
stringsAsFactors = FALSE,
Group = c("foo","foo","foo",
"foo","foo","foo","foo","foo","foo","foo",
"foo","foo","foo","foo","foo","foo","foo",
"foo","foo","foo","foo","bar","bar","bar",
"bar","bar","bar","bar","bar","bar","bar","bar",
"bar","bar","bar"),
start_date = c("02-06-2021",
"02-06-2021","02-06-2021","02-06-2021","02-06-2021",
"02-06-2021","02-06-2021","04-06-2021",
"04-06-2021","04-06-2021","04-06-2021","04-06-2021",
"04-06-2021","04-06-2021","06-06-2021","06-06-2021",
"06-06-2021","06-06-2021","06-06-2021",
"06-06-2021","06-06-2021","02-06-2021","02-06-2021",
"02-06-2021","02-06-2021","02-06-2021","02-06-2021",
"02-06-2021","05-06-2021","05-06-2021",
"05-06-2021","05-06-2021","05-06-2021","05-06-2021",
"05-06-2021"),
Day = c("Mon","Tue","Wed",
"Thu","Fri","Sat","Sun","Mon","Tue","Wed",
"Thu","Fri","Sat","Sun","Mon","Tue","Wed",
"Thu","Fri","Sat","Sun","Mon","Tue","Wed",
"Thu","Fri","Sat","Sun","Mon","Tue","Wed","Thu",
"Fri","Sat","Sun"),
y_n = c(0L,1L,0L,1L,1L,
1L,0L,0L,1L,0L,1L,1L,1L,0L,0L,1L,0L,1L,
1L,1L,0L,1L,0L,0L,1L,1L,0L,0L,1L,0L,
0L,1L,1L,0L,0L)
)
library(lubridate)
library(tidyverse)
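df %>%
  group_by(Group, start_date) %>%
  # ordered factor so that Mon..Sun map to the integers 1..7
  mutate(Day = factor(Day, levels = Day, ordered = TRUE)) %>%
  # sort each group so that the weekday of start_date comes first
  arrange(Group, (as.numeric(Day) + 7 - wday(dmy(start_date), week_start = 1)) %% 7, .by_group = TRUE) %>%
  # the seven rows now correspond to start_date, start_date + 1, ...
  mutate(next_available_date = dmy(start_date) + 0:6) %>%
  filter(y_n != 0) %>%
  slice_head()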
#> # A tibble: 5 x 5
#> # Groups: Group, start_date [5]
#> Group start_date Day y_n next_available_date
#> <chr> <chr> <ord> <int> <date>
#> 1 bar 02-06-2021 Thu 1 2021-06-03
#> 2 bar 05-06-2021 Mon 1 2021-06-07
#> 3 foo 02-06-2021 Thu 1 2021-06-03
#> 4 foo 04-06-2021 Fri 1 2021-06-04
#> 5 foo 06-06-2021 Tue 1 2021-06-08
With the data as provided in the question:
df <- data.frame(
stringsAsFactors = FALSE,
Group = c("foo","foo","foo",
"foo","foo","foo","foo","bar","bar","bar",
"bar","bar","bar","bar"),
start_date = c("02-06-2021",
"02-06-2021","02-06-2021","02-06-2021","02-06-2021",
"02-06-2021","02-06-2021","02-06-2021",
"02-06-2021","02-06-2021","02-06-2021","02-06-2021",
"02-06-2021","02-06-2021"),
Day = c("Mon","Tue","Wed",
"Thu","Fri","Sat","Sun","Mon","Tue","Wed",
"Thu","Fri","Sat","Sun"),
y_n = c(0L,1L,0L,1L,1L,
1L,0L,1L,0L,0L,1L,1L,0L,0L)
)
library(lubridate)
library(tidyverse)
df %>%
  group_by(Group, start_date) %>%
  mutate(Day = factor(Day, levels = Day, ordered = TRUE)) %>%
  arrange(Group, (as.numeric(Day) + 7 - wday(dmy(start_date), week_start = 1)) %% 7, .by_group = TRUE) %>%
  mutate(next_available_date = dmy(start_date) + 0:6) %>%
  filter(y_n != 0) %>%
  slice_head()
#> # A tibble: 2 x 5
#> # Groups: Group, start_date [2]
#> Group start_date Day y_n next_available_date
#> <chr> <chr> <ord> <int> <date>
#> 1 bar 02-06-2021 Thu 1 2021-06-03
#> 2 foo 02-06-2021 Thu 1 2021-06-03
Created on 2021-06-02 by the reprex package (v2.0.0)
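For comparison, the day-by-day search described in the question can also be sketched without an explicit loop. This is not part of the answer above, just a rough base R alternative under the assumption that each group's schedule is given as a vector of ISO weekday numbers (1 = Mon ... 7 = Sun). The helper next_available() and the vectors foo and bar are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: first date on or after given_date whose weekday is allowed.
# allowed is an integer vector of ISO weekdays (1 = Mon ... 7 = Sun).
next_available <- function(given_date, allowed) {
  stopifnot(length(allowed) > 0)
  # Any non-empty weekly pattern repeats within seven days
  candidates <- given_date + 0:6
  candidates[as.integer(format(candidates, "%u")) %in% allowed][1]
}

foo <- c(2L, 4L, 5L, 6L)  # Tue, Thu, Fri, Sat (rows with Y/N = 1 for foo)
bar <- c(1L, 4L, 5L)      # Mon, Thu, Fri (rows with Y/N = 1 for bar)

next_available(as.Date("2021-06-02"), foo)  # 2021-06-03
next_available(as.Date("2021-06-04"), foo)  # 2021-06-04
next_available(as.Date("2021-06-06"), foo)  # 2021-06-08
next_available(as.Date("2021-06-05"), bar)  # 2021-06-07
```

Unlike the pipeline above, this does not need a full block of seven rows per start date in the data; the trade-off is that the weekday flags have to be converted to weekday numbers first.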