Given a date range, how to expand to the number of days per month in that range?

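Case:
The given data frame df has (among others) a startDate and an endDate column. My goal is to turn df into a new data frame df_res with one row per month covered by the range, with the columns year, month and numberOfDaysInMonth, all of type int.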

Example:
Input: df

  id    startDate     endDate  someOtherCol
   1   2017-09-23  2018-02-01          val1
   2   2018-01-01  2018-03-31          val2
 ...          ...         ...           ...

Expected output: df_res

  id  year  month  numberOfDaysInMonth  someOtherCol
   1  2017      9                    8          val1
   1  2017     10                   31          val1
   1  2017     11                   30          val1
   1  2017     12                   31          val1
   1  2018      1                   31          val1
   1  2018      2                    1          val1
   2  2018      1                   31          val2
   2  2018      2                   28          val2
   2  2018      3                   31          val2
 ...   ...    ...                  ...           ... 

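Background:
I am relatively new to R, but I know the great dplyr and lubridate packages. I just haven't managed to achieve the above in a neat way, even with those packages. The closest I have found is "Expand rows by date range using start and end date", but that does not give the number of days per month contained in the range.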

Any help is greatly appreciated.

If you don't mind a data.table solution, you can create a sequence of consecutive dates between startDate and endDate and then aggregate by id, someOtherCol, year and month, as follows:

dat[, .(Dates=seq(startDate, endDate, by="1 day")), by=.(id, someOtherCol)][,
    .N, by=.(id, someOtherCol, year(Dates), month(Dates))]

Output:

   id someOtherCol year month  N
1:  1         val1 2017     9  8
2:  1         val1 2017    10 31
3:  1         val1 2017    11 30
4:  1         val1 2017    12 31
5:  1         val1 2018     1 31
6:  1         val1 2018     2  1
7:  2         val2 2018     1 31
8:  2         val2 2018     2 28
9:  2         val2 2018     3 31

Data:

library(data.table)    
dat <- fread("id    startDate     endDate  someOtherCol
1   2017-09-23  2018-02-01          val1
2   2018-01-01  2018-03-31          val2")
datecols <- c("startDate", "endDate")
dat[, (datecols) := lapply(.SD, as.Date, format="%Y-%m-%d"), .SDcols=datecols]
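
The count column above is called N, while the question asks for numberOfDaysInMonth. A minimal sketch of the same daily-expansion approach with the column named and ordered as in the expected output follows. It assumes each (id, someOtherCol) pair identifies exactly one row; the name res is only illustrative.

```r
library(data.table)

# Same example data as above, built directly as a data.table
dat <- data.table(
  id = 1:2,
  startDate = as.Date(c("2017-09-23", "2018-01-01")),
  endDate = as.Date(c("2018-02-01", "2018-03-31")),
  someOtherCol = c("val1", "val2")
)

# Step 1: expand each range to one row per day.
# Step 2: count the days per id/year/month and name the count column.
res <- dat[, .(Dates = seq(startDate, endDate, by = "1 day")),
           by = .(id, someOtherCol)][
  , .(numberOfDaysInMonth = .N),
  by = .(id, someOtherCol, year = year(Dates), month = month(Dates))]

# Reorder the columns to match the layout asked for in the question
setcolorder(res, c("id", "year", "month", "numberOfDaysInMonth", "someOtherCol"))
```

data.table's year() and month() return integers and .N is an integer, so the three new columns should all be of type int as requested.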

A tidyverse solution:

# example data
df = read.table(text = "
id    startDate     endDate  someOtherCol
1   2017-09-23  2018-02-01          val1
2   2018-01-01  2018-03-31          val2
", header=TRUE, stringsAsFactors=FALSE)

library(tidyverse)
library(lubridate)


df %>%
  mutate_at(vars(startDate, endDate), ymd) %>%                  # update to date columns (if needed)
  group_by(id) %>%                                              # for each id
  mutate(d = list(seq(startDate, endDate, by="1 day"))) %>%     # create a sequence of dates (as a list)
  unnest(d) %>%                                                 # unnest the list column d (naming it avoids a warning in tidyr >= 1.0)
  group_by(id, year=year(d), month=month(d), someOtherCol) %>%  # group by those variables (while getting year and month of each date in the sequence)
  summarise(numberOfDaysInMonth = n()) %>%                      # count days
  ungroup()                                                     # forget the grouping

# # A tibble: 9 x 5
#      id  year month someOtherCol numberOfDaysInMonth
#   <int> <dbl> <dbl> <chr>                      <int>
# 1     1  2017     9 val1                           8
# 2     1  2017    10 val1                          31
# 3     1  2017    11 val1                          30
# 4     1  2017    12 val1                          31
# 5     1  2018     1 val1                          31
# 6     1  2018     2 val1                           1
# 7     2  2018     1 val2                          31
# 8     2  2018     2 val2                          28
# 9     2  2018     3 val2                          31
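
Both answers above create one row per day before counting, which can get large for long ranges. As an alternative, here is a rough base R sketch that steps through the range month by month and computes the overlap arithmetically. It is not taken from either answer; the helper name days_per_month is hypothetical, and it assumes startDate <= endDate on every row.

```r
# Hypothetical helper: days per calendar month for a single date range
days_per_month <- function(start, end) {
  # First day of every month touched by the range
  first <- seq(as.Date(format(start, "%Y-%m-01")), end, by = "month")
  # Last day of each month = first day of the following month minus one day
  nxt <- seq(first[1], by = "month", length.out = length(first) + 1)[-1]
  last <- nxt - 1
  # Clip each month to the range and count the days, both ends inclusive
  days <- as.integer(pmin(last, end) - pmax(first, start)) + 1L
  data.frame(
    year = as.integer(format(first, "%Y")),
    month = as.integer(format(first, "%m")),
    numberOfDaysInMonth = days
  )
}

# Same example data as in the question
df <- data.frame(
  id = 1:2,
  startDate = as.Date(c("2017-09-23", "2018-01-01")),
  endDate = as.Date(c("2018-02-01", "2018-03-31")),
  someOtherCol = c("val1", "val2"),
  stringsAsFactors = FALSE
)

# Apply the helper row by row and stack the pieces
df_res <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  m <- days_per_month(df$startDate[i], df$endDate[i])
  data.frame(id = df$id[i], m, someOtherCol = df$someOtherCol[i],
             stringsAsFactors = FALSE)
}))
```

For the example data this should give the same nine rows as the two answers, with year, month and numberOfDaysInMonth stored as integers.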