给定年、月和月份的周数,为每个观察结果找到中间日期

Find middle date for each observation given year, month, and week number of the month

我的日期数据分为三列,如下所示:

月份
2019 4 月 1
2019 4 月 2
2019 4 月 3
2019 4 月 4
2019 五月 1
2019 五月 2
2019 五月 3
2019 五月 4

其中 'week' 对应于该月的第 (1-4) 周位置。我想要的是计算相应日期的中间日期。例如,对于第一行(2019 年、4 月和第 1 周),这对应于 2019 年 4 月 1 日 - 2019 年 4 月 6 日,所以理想情况下我想要本周中点的一列,这样(大约)是 4 月 3 日或 2019 年 4 日。不需要确切的日期。

所以结尾 table 应该是这样的:

月份 中点
2019 4 月 1 2019-04-03

下面是一些示例数据:

structure(list(Month = c("May", "May", "August", "April", "May", 
"May", "July", "August", "July", "May", "July", "May", "July", 
"April", "May", "July", "May", "April", "August", "July"), Week = c("wk 4", 
"wk 4", "wk 2", "wk 4", "wk 4", "wk 5", "wk 4", "wk 2", "wk 3", 
"wk 4", "wk 1", "wk 3", "wk 3", "wk 3", "wk 2", "wk 3", "wk 5", 
"wk 4", "wk 3", "wk 1"), Year = c("2016", "2007", "2010", "1991", 
"2012", "1990", "2011", "2014", "2020", "2011", "2010", "1992", 
"2017", "2020", "2014", "1996", "2012", "1995", "2018", "2019"
)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))

创建一个函数来查找中点,如下所示(my_df 如果您的示例数据):

library(tidyverse)
library(lubridate)

# <-- Function to find midpoint of week -->
midweek <- \(year, month, month_week) {
  # first day of the month
  date <- paste(year, month, "1st") |> lubridate::ymd()
  
  # return middle date of the month week
  date + 7 * month_week - 4
}


my_df |> mutate(
  Week = gsub("wk ", "", Week) |> as.integer(), 
  midpoint = midweek(Year, Month, Week)
)

#> # A tibble: 20 x 4
#>    Month   Week Year  midpoint  
#>    <chr>  <int> <chr> <date>    
#>  1 May        4 2016  2016-05-25
#>  2 May        4 2007  2007-05-25
#>  3 August     2 2010  2010-08-11
#>  4 April      4 1991  1991-04-25
#>  5 May        4 2012  2012-05-25
#>  6 May        5 1990  1990-06-01
#>  7 July       4 2011  2011-07-25
#>  8 August     2 2014  2014-08-11
#>  9 July       3 2020  2020-07-18
#> 10 May        4 2011  2011-05-25
#> 11 July       1 2010  2010-07-04
#> 12 May        3 1992  1992-05-18
#> 13 July       3 2017  2017-07-18
#> 14 April      3 2020  2020-04-18
#> 15 May        2 2014  2014-05-11
#> 16 July       3 1996  1996-07-18
#> 17 May        5 2012  2012-06-01
#> 18 April      4 1995  1995-04-25
#> 19 August     3 2018  2018-08-18
#> 20 July       1 2019  2019-07-04

下面应该做的。虽然它非常 hacky。

shift_to_middle_of_week <- function(Year, Month, Week) {
  
  Week = str_extract(Week, "\d+") %>% as.numeric()
  date_first_week = (paste(Year, Month, "1", sep = "-")) %>% ymd()
  dow = wday(date_first_week)
  
  week_start = if_else(
    Week == 1, 
    date_first_week, 
    date_first_week + ((Week-1)*7)-dow+1
  ) 
  
  shift = floor((7-dow)/2)
  
  ceil = ceiling_date(date_first_week, unit = "month")-1
  diff_ceil_week_start = ceil - week_start
  
  shift = if_else(diff_ceil_week_start<6, 
                  floor(diff_ceil_week_start/2), 
                  shift)
  return(week_start + shift)
}

df %>% 
  mutate(
    date_mid_week = shift_to_middle_of_week(Year, Month, Week)
  )
#> # A tibble: 20 x 4
#>    Month  Week  Year  date_mid_week
#>    <chr>  <chr> <chr> <date>       
#>  1 May    wk 4  2016  2016-05-25   
#>  2 May    wk 4  2007  2007-05-22   
#>  3 August wk 2  2010  2010-08-11   
#>  4 April  wk 4  1991  1991-04-23   
#>  5 May    wk 4  2012  2012-05-22   
#>  6 May    wk 5  1990  1990-05-29   
#>  7 July   wk 4  2011  2011-07-17   
#>  8 August wk 2  2014  2014-08-03   
#>  9 July   wk 3  2020  2020-07-13   
#> 10 May    wk 4  2011  2011-05-25   
#> 11 July   wk 1  2010  2010-07-02   
#> 12 May    wk 3  1992  1992-05-10   
#> 13 July   wk 3  2017  2017-07-09   
#> 14 April  wk 3  2020  2020-04-13   
#> 15 May    wk 2  2014  2014-05-05   
#> 16 July   wk 3  1996  1996-07-16   
#> 17 May    wk 5  2012  2012-05-29   
#> 18 April  wk 4  1995  1995-04-16   
#> 19 August wk 3  2018  2018-08-13   
#> 20 July   wk 1  2019  2019-07-03

更多细节。

将所有内容放在 mutate() 中以查看中间步骤:

df %>% mutate(
  Week = str_extract(Week, "\d+") %>% as.numeric(),
  date_first_week = (paste(Year, Month, "01", sep = "-")) %>% ymd(), 
  dow = wday(date_first_week),
  week_start = if_else(
    Week == 1, 
    date_first_week, 
    date_first_week + ((Week-1)*7)-dow+1
  ), 
  week_start_2 = date_first_week + ((Week-1)*7)-dow+1,

  shift = floor((7-dow)/2),
  ceil = ceiling_date(date_first_week, unit = "month")-1, 
  diff_ceil_week_start = ceil - week_start,
  shift = if_else(diff_ceil_week_start<6, 
                  floor(diff_ceil_week_start/2), 
                  shift), 
  week_mid = week_start + shift
)

# A tibble: 20 x 11
   Month   Week Year  date_first_week   dow week_start week_start_2 shift  ceil       diff_ceil_week_start week_mid  
   <chr>  <dbl> <chr> <date>          <dbl> <date>     <date>       <drtn> <date>     <drtn>               <date>    
 1 May        4 2016  2016-05-01          1 2016-05-22 2016-05-22   3 days 2016-05-31  9 days              2016-05-25
 2 May        4 2007  2007-05-01          3 2007-05-20 2007-05-20   2 days 2007-05-31 11 days              2007-05-22
 3 August     2 2010  2010-08-01          1 2010-08-08 2010-08-08   3 days 2010-08-31 23 days              2010-08-11
 4 April      4 1991  1991-04-01          2 1991-04-21 1991-04-21   2 days 1991-04-30  9 days              1991-04-23
 5 May        4 2012  2012-05-01          3 2012-05-20 2012-05-20   2 days 2012-05-31 11 days              2012-05-22
 6 May        5 1990  1990-05-01          3 1990-05-27 1990-05-27   2 days 1990-05-31  4 days              1990-05-29
 7 July       4 2011  2011-07-01          6 2011-07-17 2011-07-17   0 days 2011-07-31 14 days              2011-07-17
 8 August     2 2014  2014-08-01          6 2014-08-03 2014-08-03   0 days 2014-08-31 28 days              2014-08-03
 9 July       3 2020  2020-07-01          4 2020-07-12 2020-07-12   1 days 2020-07-31 19 days              2020-07-13
10 May        4 2011  2011-05-01          1 2011-05-22 2011-05-22   3 days 2011-05-31  9 days              2011-05-25
11 July       1 2010  2010-07-01          5 2010-07-01 2010-06-27   1 days 2010-07-31 30 days              2010-07-02
12 May        3 1992  1992-05-01          6 1992-05-10 1992-05-10   0 days 1992-05-31 21 days              1992-05-10
13 July       3 2017  2017-07-01          7 2017-07-09 2017-07-09   0 days 2017-07-31 22 days              2017-07-09
14 April      3 2020  2020-04-01          4 2020-04-12 2020-04-12   1 days 2020-04-30 18 days              2020-04-13
15 May        2 2014  2014-05-01          5 2014-05-04 2014-05-04   1 days 2014-05-31 27 days              2014-05-05
16 July       3 1996  1996-07-01          2 1996-07-14 1996-07-14   2 days 1996-07-31 17 days              1996-07-16
17 May        5 2012  2012-05-01          3 2012-05-27 2012-05-27   2 days 2012-05-31  4 days              2012-05-29
18 April      4 1995  1995-04-01          7 1995-04-16 1995-04-16   0 days 1995-04-30 14 days              1995-04-16
19 August     3 2018  2018-08-01          4 2018-08-12 2018-08-12   1 days 2018-08-31 19 days              2018-08-13
20 July       1 2019  2019-07-01          2 2019-07-01 2019-06-30   2 days 2019-07-31 30 days              2019-07-03