Given a date range, how can I expand it to the number of days per month within that range?

Question

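Problem:
The given data frame df has (among other columns) a startDate and an endDate column. My goal is to expand each row into one row per calendar month in its range, with the columns year, month and numberOfDaysInMonth, all of type int.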

Example:
Input: df

  id    startDate     endDate  someOtherCol
   1   2017-09-23  2018-02-01          val1
   2   2018-01-01  2018-03-31          val2
 ...          ...         ...           ...

Desired output: df_res

  id  year  month  numberOfDaysInMonth  someOtherCol
   1  2017      9                    8          val1
   1  2017     10                   31          val1
   1  2017     11                   30          val1
   1  2017     12                   31          val1
   1  2018      1                   31          val1
   1  2018      2                    1          val1
   2  2018      1                   31          val2
   2  2018      2                   28          val2
   2  2018      3                   31          val2
 ...   ...    ...                  ...           ...

Background:
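I am fairly new to R, but I know the excellent dplyr and lubridate packages. Even with them, I haven't managed to do the above in a neat way. The closest I got is Expand rows by date range using start and end date, but that does not give the number of days per month contained in the range.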

Any help is much appreciated.
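The counting step at the core of this problem can be illustrated for a single range in base R: generate every day in the range with seq(), then tabulate those days by a year-month key. A minimal sketch, using the row-1 dates from the example as assumed input:

```r
# One hypothetical range, taken from row 1 of the example
start <- as.Date("2017-09-23")
end   <- as.Date("2018-02-01")

# One Date per day in the closed interval [start, end]
days <- seq(start, end, by = "1 day")

# Count days per "YYYY-MM" key; table() sorts the keys,
# and for this zero-padded format that order is chronological
counts <- table(format(days, "%Y-%m"))
print(counts)
```

Both ends of the range are counted, which is why September contributes 8 days (the 23rd through the 30th) and February contributes 1.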

Answer 1

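If you don't mind a data.table solution, you can create a sequence of consecutive dates between startDate and endDate and then count those dates grouped by id, someOtherCol, year and month, as follows: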

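# Step 1: expand each (id, someOtherCol) row into one row per day from startDate to endDate
# Step 2: count the days (.N) in each id / someOtherCol / year / month group
dat[, .(Dates=seq(startDate, endDate, by="1 day")), by=.(id, someOtherCol)][,
    .N, by=.(id, someOtherCol, year(Dates), month(Dates))]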

Output:

   id someOtherCol year month  N
1:  1         val1 2017     9  8
2:  1         val1 2017    10 31
3:  1         val1 2017    11 30
4:  1         val1 2017    12 31
5:  1         val1 2018     1 31
6:  1         val1 2018     2  1
7:  2         val2 2018     1 31
8:  2         val2 2018     2 28
9:  2         val2 2018     3 31

Data:

library(data.table)    
dat <- fread("id    startDate     endDate  someOtherCol
1   2017-09-23  2018-02-01          val1
2   2018-01-01  2018-03-31          val2")
datecols <- c("startDate", "endDate")
dat[, (datecols) := lapply(.SD, as.Date, format="%Y-%m-%d"), .SDcols=datecols]
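For comparison, the same expand-then-count idea can be sketched in base R without data.table. This is an illustrative alternative rather than part of the answer above, and it assumes the date columns have already been parsed as Date. It builds one row per day, then counts rows per group with aggregate():

```r
# Input data from the question, with the date columns parsed as Date
df <- data.frame(
  id           = c(1L, 2L),
  startDate    = as.Date(c("2017-09-23", "2018-01-01")),
  endDate      = as.Date(c("2018-02-01", "2018-03-31")),
  someOtherCol = c("val1", "val2"),
  stringsAsFactors = FALSE
)

# Expand each input row into one row per day of its range,
# keeping the year, the month and the columns to carry along
daily <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  d <- seq(df$startDate[i], df$endDate[i], by = "1 day")
  data.frame(
    id           = df$id[i],
    year         = as.integer(format(d, "%Y")),
    month        = as.integer(format(d, "%m")),
    someOtherCol = df$someOtherCol[i],
    stringsAsFactors = FALSE
  )
}))

# Count the days (rows) in each id / year / month / someOtherCol group
daily$numberOfDaysInMonth <- 1L
df_res <- aggregate(numberOfDaysInMonth ~ id + year + month + someOtherCol,
                    data = daily, FUN = length)

# Sort chronologically within each id and use the question's column order
df_res <- df_res[order(df_res$id, df_res$year, df_res$month),
                 c("id", "year", "month", "numberOfDaysInMonth", "someOtherCol")]
rownames(df_res) <- NULL
print(df_res)
```

Expanding to one row per day keeps the logic simple, but the intermediate table grows with the total number of days across all ranges.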

Answer 2

A tidyverse solution:

# example data
df = read.table(text = "
id    startDate     endDate  someOtherCol
1   2017-09-23  2018-02-01          val1
2   2018-01-01  2018-03-31          val2
", header = TRUE, stringsAsFactors = FALSE)

library(tidyverse)
library(lubridate)


df %>%
  mutate_at(vars(startDate, endDate), ymd) %>%                  # convert to Date columns (if needed)
  group_by(id) %>%                                              # for each id
  mutate(d = list(seq(startDate, endDate, by="1 day"))) %>%     # create a sequence of dates (as a list)
  unnest(d) %>%                                                 # unnest the list column d: one row per date
  group_by(id, year=year(d), month=month(d), someOtherCol) %>%  # group by those variables (while getting year and month of each date in the sequence)
  summarise(numberOfDaysInMonth = n()) %>%                      # count days
  ungroup()                                                     # forget the grouping

# # A tibble: 9 x 5
#      id  year month someOtherCol numberOfDaysInMonth
#   <int> <dbl> <dbl> <chr>                      <int>
# 1     1  2017     9 val1                           8
# 2     1  2017    10 val1                          31
# 3     1  2017    11 val1                          30
# 4     1  2017    12 val1                          31
# 5     1  2018     1 val1                          31
# 6     1  2018     2 val1                           1
# 7     2  2018     1 val2                          31
# 8     2  2018     2 val2                          28
# 9     2  2018     3 val2                          31
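Both answers build one row per day before counting. As an alternative, the per-month counts can also be computed arithmetically: for each month touched by the range, the number of days is the overlap between [startDate, endDate] and [first day of the month, last day of the month], counting both ends. A minimal base-R sketch; month_days() is a hypothetical helper, and it assumes startDate <= endDate for every row:

```r
# Hypothetical helper: days per calendar month in [start, end], both inclusive,
# computed from interval overlaps instead of expanding to one row per day
month_days <- function(start, end) {
  first <- as.Date(format(start, "%Y-%m-01"))  # first day of the start month
  last  <- as.Date(format(end, "%Y-%m-01"))    # first day of the end month
  m_start <- seq(first, last, by = "1 month")
  # Last day of each month = first day of the following month minus one day
  m_end <- seq(first, by = "1 month", length.out = length(m_start) + 1)[-1] - 1
  # Overlap length of [start, end] with [m_start, m_end], inclusive
  n <- as.integer(pmin(end, m_end) - pmax(start, m_start)) + 1L
  data.frame(year  = as.integer(format(m_start, "%Y")),
             month = as.integer(format(m_start, "%m")),
             numberOfDaysInMonth = n)
}

df <- data.frame(id = c(1L, 2L),
                 startDate = as.Date(c("2017-09-23", "2018-01-01")),
                 endDate   = as.Date(c("2018-02-01", "2018-03-31")),
                 someOtherCol = c("val1", "val2"),
                 stringsAsFactors = FALSE)

# Apply the helper row by row and carry id / someOtherCol along
df_res <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  out <- month_days(df$startDate[i], df$endDate[i])
  data.frame(id = df$id[i], out, someOtherCol = df$someOtherCol[i],
             stringsAsFactors = FALSE)
}))
print(df_res)
```

The amount of work here depends on the number of months rather than the number of days, which may help when ranges span many years. For short ranges like the example, either approach is fine.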

Tags: r, lubridate, dplyr