给定年、月和月份的周数,为每个观察结果找到中间日期
Find middle date for each observation given year, month, and week number of the month
我的日期数据分为三列,如下所示:
年
月份
周
2019
4 月
1
2019
4 月
2
2019
4 月
3
2019
4 月
4
2019
五月
1
2019
五月
2
2019
五月
3
2019
五月
4
其中 'week' 对应于该月的第 (1-4) 周位置。我想要的是计算相应日期的中间日期。例如,对于第一行(2019 年、4 月和第 1 周),这对应于 2019 年 4 月 1 日 - 2019 年 4 月 6 日,所以理想情况下我想要本周中点的一列,这样(大约)是 4 月 3 日或 2019 年 4 日。不需要确切的日期。
所以结尾 table 应该是这样的:
年
月份
周
中点
2019
4 月
1
2019-04-03
下面是一些示例数据:
structure(list(Month = c("May", "May", "August", "April", "May",
"May", "July", "August", "July", "May", "July", "May", "July",
"April", "May", "July", "May", "April", "August", "July"), Week = c("wk 4",
"wk 4", "wk 2", "wk 4", "wk 4", "wk 5", "wk 4", "wk 2", "wk 3",
"wk 4", "wk 1", "wk 3", "wk 3", "wk 3", "wk 2", "wk 3", "wk 5",
"wk 4", "wk 3", "wk 1"), Year = c("2016", "2007", "2010", "1991",
"2012", "1990", "2011", "2014", "2020", "2011", "2010", "1992",
"2017", "2020", "2014", "1996", "2012", "1995", "2018", "2019"
)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
创建一个函数来查找中点,如下所示(my_df
如果您的示例数据):
library(tidyverse)
library(lubridate)
# <-- Function to find midpoint of week -->
midweek <- \(year, month, month_week) {
# first day of the month
date <- paste(year, month, "1st") |> lubridate::ymd()
# return middle date of the month week
date + 7 * month_week - 4
}
my_df |> mutate(
Week = gsub("wk ", "", Week) |> as.integer(),
midpoint = midweek(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year midpoint
#> <chr> <int> <chr> <date>
#> 1 May 4 2016 2016-05-25
#> 2 May 4 2007 2007-05-25
#> 3 August 2 2010 2010-08-11
#> 4 April 4 1991 1991-04-25
#> 5 May 4 2012 2012-05-25
#> 6 May 5 1990 1990-06-01
#> 7 July 4 2011 2011-07-25
#> 8 August 2 2014 2014-08-11
#> 9 July 3 2020 2020-07-18
#> 10 May 4 2011 2011-05-25
#> 11 July 1 2010 2010-07-04
#> 12 May 3 1992 1992-05-18
#> 13 July 3 2017 2017-07-18
#> 14 April 3 2020 2020-04-18
#> 15 May 2 2014 2014-05-11
#> 16 July 3 1996 1996-07-18
#> 17 May 5 2012 2012-06-01
#> 18 April 4 1995 1995-04-25
#> 19 August 3 2018 2018-08-18
#> 20 July 1 2019 2019-07-04
下面应该做的。虽然它非常 hacky。
shift_to_middle_of_week <- function(Year, Month, Week) {
Week = str_extract(Week, "\d+") %>% as.numeric()
date_first_week = (paste(Year, Month, "1", sep = "-")) %>% ymd()
dow = wday(date_first_week)
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
)
shift = floor((7-dow)/2)
ceil = ceiling_date(date_first_week, unit = "month")-1
diff_ceil_week_start = ceil - week_start
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift)
return(week_start + shift)
}
df %>%
mutate(
date_mid_week = shift_to_middle_of_week(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year date_mid_week
#> <chr> <chr> <chr> <date>
#> 1 May wk 4 2016 2016-05-25
#> 2 May wk 4 2007 2007-05-22
#> 3 August wk 2 2010 2010-08-11
#> 4 April wk 4 1991 1991-04-23
#> 5 May wk 4 2012 2012-05-22
#> 6 May wk 5 1990 1990-05-29
#> 7 July wk 4 2011 2011-07-17
#> 8 August wk 2 2014 2014-08-03
#> 9 July wk 3 2020 2020-07-13
#> 10 May wk 4 2011 2011-05-25
#> 11 July wk 1 2010 2010-07-02
#> 12 May wk 3 1992 1992-05-10
#> 13 July wk 3 2017 2017-07-09
#> 14 April wk 3 2020 2020-04-13
#> 15 May wk 2 2014 2014-05-05
#> 16 July wk 3 1996 1996-07-16
#> 17 May wk 5 2012 2012-05-29
#> 18 April wk 4 1995 1995-04-16
#> 19 August wk 3 2018 2018-08-13
#> 20 July wk 1 2019 2019-07-03
更多细节。
将所有内容放在 mutate()
中以查看中间步骤:
df %>% mutate(
Week = str_extract(Week, "\d+") %>% as.numeric(),
date_first_week = (paste(Year, Month, "01", sep = "-")) %>% ymd(),
dow = wday(date_first_week),
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
),
week_start_2 = date_first_week + ((Week-1)*7)-dow+1,
shift = floor((7-dow)/2),
ceil = ceiling_date(date_first_week, unit = "month")-1,
diff_ceil_week_start = ceil - week_start,
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift),
week_mid = week_start + shift
)
# A tibble: 20 x 11
Month Week Year date_first_week dow week_start week_start_2 shift ceil diff_ceil_week_start week_mid
<chr> <dbl> <chr> <date> <dbl> <date> <date> <drtn> <date> <drtn> <date>
1 May 4 2016 2016-05-01 1 2016-05-22 2016-05-22 3 days 2016-05-31 9 days 2016-05-25
2 May 4 2007 2007-05-01 3 2007-05-20 2007-05-20 2 days 2007-05-31 11 days 2007-05-22
3 August 2 2010 2010-08-01 1 2010-08-08 2010-08-08 3 days 2010-08-31 23 days 2010-08-11
4 April 4 1991 1991-04-01 2 1991-04-21 1991-04-21 2 days 1991-04-30 9 days 1991-04-23
5 May 4 2012 2012-05-01 3 2012-05-20 2012-05-20 2 days 2012-05-31 11 days 2012-05-22
6 May 5 1990 1990-05-01 3 1990-05-27 1990-05-27 2 days 1990-05-31 4 days 1990-05-29
7 July 4 2011 2011-07-01 6 2011-07-17 2011-07-17 0 days 2011-07-31 14 days 2011-07-17
8 August 2 2014 2014-08-01 6 2014-08-03 2014-08-03 0 days 2014-08-31 28 days 2014-08-03
9 July 3 2020 2020-07-01 4 2020-07-12 2020-07-12 1 days 2020-07-31 19 days 2020-07-13
10 May 4 2011 2011-05-01 1 2011-05-22 2011-05-22 3 days 2011-05-31 9 days 2011-05-25
11 July 1 2010 2010-07-01 5 2010-07-01 2010-06-27 1 days 2010-07-31 30 days 2010-07-02
12 May 3 1992 1992-05-01 6 1992-05-10 1992-05-10 0 days 1992-05-31 21 days 1992-05-10
13 July 3 2017 2017-07-01 7 2017-07-09 2017-07-09 0 days 2017-07-31 22 days 2017-07-09
14 April 3 2020 2020-04-01 4 2020-04-12 2020-04-12 1 days 2020-04-30 18 days 2020-04-13
15 May 2 2014 2014-05-01 5 2014-05-04 2014-05-04 1 days 2014-05-31 27 days 2014-05-05
16 July 3 1996 1996-07-01 2 1996-07-14 1996-07-14 2 days 1996-07-31 17 days 1996-07-16
17 May 5 2012 2012-05-01 3 2012-05-27 2012-05-27 2 days 2012-05-31 4 days 2012-05-29
18 April 4 1995 1995-04-01 7 1995-04-16 1995-04-16 0 days 1995-04-30 14 days 1995-04-16
19 August 3 2018 2018-08-01 4 2018-08-12 2018-08-12 1 days 2018-08-31 19 days 2018-08-13
20 July 1 2019 2019-07-01 2 2019-07-01 2019-06-30 2 days 2019-07-31 30 days 2019-07-03
我的日期数据分为三列,如下所示:
年 | 月份 | 周 |
---|---|---|
2019 | 4 月 | 1 |
2019 | 4 月 | 2 |
2019 | 4 月 | 3 |
2019 | 4 月 | 4 |
2019 | 五月 | 1 |
2019 | 五月 | 2 |
2019 | 五月 | 3 |
2019 | 五月 | 4 |
其中 'week' 对应于该月的第 (1-4) 周位置。我想要的是计算相应日期的中间日期。例如,对于第一行(2019 年、4 月和第 1 周),这对应于 2019 年 4 月 1 日 - 2019 年 4 月 6 日,所以理想情况下我想要本周中点的一列,这样(大约)是 4 月 3 日或 2019 年 4 日。不需要确切的日期。
所以结尾 table 应该是这样的:
年 | 月份 | 周 | 中点 |
---|---|---|---|
2019 | 4 月 | 1 | 2019-04-03 |
下面是一些示例数据:
structure(list(Month = c("May", "May", "August", "April", "May",
"May", "July", "August", "July", "May", "July", "May", "July",
"April", "May", "July", "May", "April", "August", "July"), Week = c("wk 4",
"wk 4", "wk 2", "wk 4", "wk 4", "wk 5", "wk 4", "wk 2", "wk 3",
"wk 4", "wk 1", "wk 3", "wk 3", "wk 3", "wk 2", "wk 3", "wk 5",
"wk 4", "wk 3", "wk 1"), Year = c("2016", "2007", "2010", "1991",
"2012", "1990", "2011", "2014", "2020", "2011", "2010", "1992",
"2017", "2020", "2014", "1996", "2012", "1995", "2018", "2019"
)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
创建一个函数来查找中点,如下所示(my_df
如果您的示例数据):
library(tidyverse)
library(lubridate)
# <-- Function to find midpoint of week -->
midweek <- \(year, month, month_week) {
# first day of the month
date <- paste(year, month, "1st") |> lubridate::ymd()
# return middle date of the month week
date + 7 * month_week - 4
}
my_df |> mutate(
Week = gsub("wk ", "", Week) |> as.integer(),
midpoint = midweek(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year midpoint
#> <chr> <int> <chr> <date>
#> 1 May 4 2016 2016-05-25
#> 2 May 4 2007 2007-05-25
#> 3 August 2 2010 2010-08-11
#> 4 April 4 1991 1991-04-25
#> 5 May 4 2012 2012-05-25
#> 6 May 5 1990 1990-06-01
#> 7 July 4 2011 2011-07-25
#> 8 August 2 2014 2014-08-11
#> 9 July 3 2020 2020-07-18
#> 10 May 4 2011 2011-05-25
#> 11 July 1 2010 2010-07-04
#> 12 May 3 1992 1992-05-18
#> 13 July 3 2017 2017-07-18
#> 14 April 3 2020 2020-04-18
#> 15 May 2 2014 2014-05-11
#> 16 July 3 1996 1996-07-18
#> 17 May 5 2012 2012-06-01
#> 18 April 4 1995 1995-04-25
#> 19 August 3 2018 2018-08-18
#> 20 July 1 2019 2019-07-04
下面应该做的。虽然它非常 hacky。
shift_to_middle_of_week <- function(Year, Month, Week) {
Week = str_extract(Week, "\d+") %>% as.numeric()
date_first_week = (paste(Year, Month, "1", sep = "-")) %>% ymd()
dow = wday(date_first_week)
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
)
shift = floor((7-dow)/2)
ceil = ceiling_date(date_first_week, unit = "month")-1
diff_ceil_week_start = ceil - week_start
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift)
return(week_start + shift)
}
df %>%
mutate(
date_mid_week = shift_to_middle_of_week(Year, Month, Week)
)
#> # A tibble: 20 x 4
#> Month Week Year date_mid_week
#> <chr> <chr> <chr> <date>
#> 1 May wk 4 2016 2016-05-25
#> 2 May wk 4 2007 2007-05-22
#> 3 August wk 2 2010 2010-08-11
#> 4 April wk 4 1991 1991-04-23
#> 5 May wk 4 2012 2012-05-22
#> 6 May wk 5 1990 1990-05-29
#> 7 July wk 4 2011 2011-07-17
#> 8 August wk 2 2014 2014-08-03
#> 9 July wk 3 2020 2020-07-13
#> 10 May wk 4 2011 2011-05-25
#> 11 July wk 1 2010 2010-07-02
#> 12 May wk 3 1992 1992-05-10
#> 13 July wk 3 2017 2017-07-09
#> 14 April wk 3 2020 2020-04-13
#> 15 May wk 2 2014 2014-05-05
#> 16 July wk 3 1996 1996-07-16
#> 17 May wk 5 2012 2012-05-29
#> 18 April wk 4 1995 1995-04-16
#> 19 August wk 3 2018 2018-08-13
#> 20 July wk 1 2019 2019-07-03
更多细节。
将所有内容放在 mutate()
中以查看中间步骤:
df %>% mutate(
Week = str_extract(Week, "\d+") %>% as.numeric(),
date_first_week = (paste(Year, Month, "01", sep = "-")) %>% ymd(),
dow = wday(date_first_week),
week_start = if_else(
Week == 1,
date_first_week,
date_first_week + ((Week-1)*7)-dow+1
),
week_start_2 = date_first_week + ((Week-1)*7)-dow+1,
shift = floor((7-dow)/2),
ceil = ceiling_date(date_first_week, unit = "month")-1,
diff_ceil_week_start = ceil - week_start,
shift = if_else(diff_ceil_week_start<6,
floor(diff_ceil_week_start/2),
shift),
week_mid = week_start + shift
)
# A tibble: 20 x 11
Month Week Year date_first_week dow week_start week_start_2 shift ceil diff_ceil_week_start week_mid
<chr> <dbl> <chr> <date> <dbl> <date> <date> <drtn> <date> <drtn> <date>
1 May 4 2016 2016-05-01 1 2016-05-22 2016-05-22 3 days 2016-05-31 9 days 2016-05-25
2 May 4 2007 2007-05-01 3 2007-05-20 2007-05-20 2 days 2007-05-31 11 days 2007-05-22
3 August 2 2010 2010-08-01 1 2010-08-08 2010-08-08 3 days 2010-08-31 23 days 2010-08-11
4 April 4 1991 1991-04-01 2 1991-04-21 1991-04-21 2 days 1991-04-30 9 days 1991-04-23
5 May 4 2012 2012-05-01 3 2012-05-20 2012-05-20 2 days 2012-05-31 11 days 2012-05-22
6 May 5 1990 1990-05-01 3 1990-05-27 1990-05-27 2 days 1990-05-31 4 days 1990-05-29
7 July 4 2011 2011-07-01 6 2011-07-17 2011-07-17 0 days 2011-07-31 14 days 2011-07-17
8 August 2 2014 2014-08-01 6 2014-08-03 2014-08-03 0 days 2014-08-31 28 days 2014-08-03
9 July 3 2020 2020-07-01 4 2020-07-12 2020-07-12 1 days 2020-07-31 19 days 2020-07-13
10 May 4 2011 2011-05-01 1 2011-05-22 2011-05-22 3 days 2011-05-31 9 days 2011-05-25
11 July 1 2010 2010-07-01 5 2010-07-01 2010-06-27 1 days 2010-07-31 30 days 2010-07-02
12 May 3 1992 1992-05-01 6 1992-05-10 1992-05-10 0 days 1992-05-31 21 days 1992-05-10
13 July 3 2017 2017-07-01 7 2017-07-09 2017-07-09 0 days 2017-07-31 22 days 2017-07-09
14 April 3 2020 2020-04-01 4 2020-04-12 2020-04-12 1 days 2020-04-30 18 days 2020-04-13
15 May 2 2014 2014-05-01 5 2014-05-04 2014-05-04 1 days 2014-05-31 27 days 2014-05-05
16 July 3 1996 1996-07-01 2 1996-07-14 1996-07-14 2 days 1996-07-31 17 days 1996-07-16
17 May 5 2012 2012-05-01 3 2012-05-27 2012-05-27 2 days 2012-05-31 4 days 2012-05-29
18 April 4 1995 1995-04-01 7 1995-04-16 1995-04-16 0 days 1995-04-30 14 days 1995-04-16
19 August 3 2018 2018-08-01 4 2018-08-12 2018-08-12 1 days 2018-08-31 19 days 2018-08-13
20 July 1 2019 2019-07-01 2 2019-07-01 2019-06-30 2 days 2019-07-31 30 days 2019-07-03