Count the number of days in each month of a date range

Question

I have a data frame with start and end dates, like this:

id <- c(1, 1, 2)
start <- c("2014-01-05", "2014-02-04", "2014-02-06")
end <- c("2014-02-03", "2014-04-29", "2014-03-07")
df <- data.frame(id, start, end)

 id        start          end
  1    2014-01-05   2014-02-03
  1    2014-02-04   2014-04-29
  2    2014-02-06   2014-03-07

I'm trying to work out how to count, for each row, how many of the days between the start and end dates fall in each month, like this:

id    month_yyyy_mm count
 1          2014-01    27
 1          2014-02     3
 1          2014-02    25
 1          2014-03    31
 1          2014-04    29
 2          2014-02    23
 2          2014-03     7

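I can convert the strings to dates and use difftime to get the total difference between start and end, but I don't know how to break it down by month. Is there anything in the lubridate package that could help?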
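For reference, the difftime route gives the gap between the two endpoints, which is one less than the number of calendar days in the desired table, because that table counts both the start day and the end day. A minimal sketch on the second row (the variable names are illustrative only):

```r
d_start <- as.Date("2014-02-04")
d_end <- as.Date("2014-04-29")

# difftime measures the gap between the two dates, not the days covered
gap <- as.numeric(difftime(d_end, d_start, units = "days"))

# counting both endpoints, as the desired table does, adds one day;
# the per-month counts for this row (25 + 31 + 29) should sum to this
n_days <- gap + 1
```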

Answer 1

Consider the functions f1, f2 and f3 below:

f1 <- function(d_first, d_last){
  d_first <- as.Date(d_first)
  d_last <- as.Date(d_last)

  D <- seq(d_first, d_last, 1) # generate all days in [d_first, d_last]
  M <- unique(format(D, "%m")) # all months in [d_first, d_last]

  f2 <- function(x) sum(format(D, "%m") == x) # returns number of days in month x
  res <- vapply(M, f2, numeric(1))
  return(cbind(unique(format(D, "%Y-%m")), res))
}
f3 <- function(k) f1(df$start[k], df$end[k])

# lapply always returns a list, which do.call(rbind, ...) needs below
output <- lapply(1:nrow(df), f3)

which produces

> output 
[[1]]
             res 
01 "2014-01" "27"
02 "2014-02" "3" 

[[2]]
             res 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"

[[3]]
             res 
02 "2014-02" "23"
03 "2014-03" "7"

From here on, what's left is just formatting. In fact, a simple do.call(rbind, output) does the trick:

> do.call(rbind, output)
             res 
01 "2014-01" "27"
02 "2014-02" "3" 
02 "2014-02" "25"
03 "2014-03" "31"
04 "2014-04" "29"
02 "2014-02" "23"
03 "2014-03" "7"
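Note that stacking with do.call(rbind, ...) needs a list of matrices. sapply() only returns such a list when the periods span different numbers of months, as the three here happen to; if every period covered the same number of months, it would simplify the results into a single matrix instead. lapply() always returns a list. A toy illustration, with a hypothetical function g standing in for f3:

```r
# Hypothetical stand-in for f3: every call returns a 2 x 2 character matrix
g <- function(k) cbind(c("2014-01", "2014-02"), c("10", "20"))

s <- sapply(1:2, g)  # simplified: each call's matrix is flattened into one column
l <- lapply(1:2, g)  # always a list holding the original matrices

stacked <- do.call(rbind, l)  # month/count rows stacked, as the answer relies on
```

In s, the first column mixes months and counts, so it no longer has the month/count layout.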

Off the top of my head, you can attach the id with f4 <- function(k) cbind(df$id[k], f3(k)), so that

> do.call(rbind, lapply(1:nrow(df), f4))
                 res 
01 "1" "2014-01" "27"
02 "1" "2014-02" "3" 
02 "1" "2014-02" "25"
03 "1" "2014-03" "31"
04 "1" "2014-04" "29"
02 "2" "2014-02" "23"
03 "2" "2014-03" "7"

But there may well be a cleverer solution.
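Because f1 tallies days by format(D, "%m") alone, a period longer than about a year would fold, say, January 2014 and January 2015 into a single count. A minimal base-R variant, assuming it is fine to key on "%Y-%m" and to let table() do the tallying in place of f2, returns a typed data frame directly; count_by_month is a hypothetical name:

```r
df <- data.frame(id = c(1, 1, 2),
                 start = c("2014-01-05", "2014-02-04", "2014-02-06"),
                 end = c("2014-02-03", "2014-04-29", "2014-03-07"))

# Count days per calendar month for one period, keyed by "%Y-%m"
# so that months with the same number in different years stay separate
count_by_month <- function(id, start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  tab <- table(format(days, "%Y-%m"))  # names sort chronologically
  data.frame(id = id, month = names(tab), count = as.integer(tab))
}

res <- do.call(rbind, lapply(seq_len(nrow(df)), function(k)
  count_by_month(df$id[k], df$start[k], df$end[k])))
```

The result has one row per id and month, with integer counts rather than the character matrix produced by cbind().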

Answer 2

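Here is a different approach, which uses the foverlaps() function from the data.table package.

foverlaps() finds the overlaps between the given periods and a generated sequence of months, each month described by its first and last day.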
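Once a month and a period are known to overlap, the number of shared days is the length of the intersection of two closed intervals: the smaller of the two ends minus the larger of the two starts, plus one. This is what the j expression applied to the foverlaps() result computes with pmin(), pmax() and difftime(). A base-R sketch for the period 2014-02-04 to 2014-04-29 against February to April, with the month bounds written out by hand for illustration:

```r
p_start <- as.Date("2014-02-04")
p_end <- as.Date("2014-04-29")

# first and last day of February, March and April 2014
m_start <- seq(as.Date("2014-02-01"), by = "month", length.out = 3)
m_end <- seq(as.Date("2014-03-01"), by = "month", length.out = 3) - 1

# overlap of [p_start, p_end] with each [m_start, m_end], both ends inclusive
overlap <- as.integer(pmin(p_end, m_end) - pmax(p_start, m_start)) + 1L
```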

library(data.table)
library(lubridate)

# coerce dates from character to IDate
cols <- c("start", "end")
DT <- as.data.table(df)[, (cols) := lapply(.SD, as.IDate), .SDcols = cols]

# create sequence of months which cover all periods
mon_seq <- DT[, as.IDate(seq(floor_date(min(start), unit = "months"), 
                             ceiling_date(max(end), unit = "months"),
                             by = "month"))]
# create helper data.table with first and last day of months
mDT <- data.table(start = head(mon_seq, -1L), end = tail(mon_seq, -1L) - 1L)
setkeyv(DT, cols)
# find overlapping pieces for each month
foverlaps(mDT, DT, nomatch = 0L)[
  # compute count of days in each month
  , {tmp <- pmax(start, i.start)
  .(id = id, month = format(tmp, "%Y-%m"), 
    count = as.integer(difftime(pmin(end, i.end), tmp, units = "days")) + 1L)
  }][
    # reorder conveniently
    order(id, month)]

   id   month count
1:  1 2014-01    27
2:  1 2014-02     3
3:  1 2014-02    25
4:  1 2014-03    31
5:  1 2014-04    29
6:  2 2014-02    23
7:  2 2014-03     7
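The helper table mDT above relies on lubridate's floor_date() and ceiling_date(). If you would rather not depend on lubridate, a base-R sketch can build the same first-day/last-day pairs by formatting each date to the first of its month. month_bounds is a hypothetical helper; its output would presumably still need converting to IDate columns in a data.table before being used in place of mDT:

```r
# Hypothetical helper: first and last day of every month touched by [from, to]
month_bounds <- function(from, to) {
  # snap both endpoints to the first day of their month
  first <- as.Date(format(from, "%Y-%m-01"))
  last_first <- as.Date(format(to, "%Y-%m-01"))
  starts <- seq(first, last_first, by = "month")
  # the day before the next month's first day is the last day of each month
  next_firsts <- seq(first, by = "month", length.out = length(starts) + 1L)[-1L]
  data.frame(start = starts, end = next_firsts - 1L)
}

mb <- month_bounds(as.Date("2014-01-05"), as.Date("2014-04-29"))
```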
