Getting values between Start and End Dates in Table 1 From Table 2 in R

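I have a table (let's call it TBL1) with a start column and an end column, both containing dates.

Table 2 (TBL2) was created by pulling a year of session data from Google Analytics.

The only variables in TBL2 are Date (daily) and sessions (one value per day).

The variables in TBL1 are StartDate, EndDate, and Sessions (empty).

I want to sum the sessions in TBL2 that fall between TBL1$StartDate and TBL1$EndDate, and put that sum into TBL1$Sessions.

Is this possible in R?

Example code is below. Note that the TBL2 data actually comes from Google Analytics. In the real problem I also have 100+ start and end dates spanning a year.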


StartDate <-c("2017-01-01", "2017-01-09","2017-01-18", "2017-01-07")

EndDate <- c("2017-01-05", "2017-01-11", "2017-01-25", "2017-01-28" )

Sessions <- c(" ", " ", " ", " ")

TBL1 <- data.frame(StartDate, EndDate, Sessions)
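# as.Date() returns a new vector, so assign the result back to the column.
TBL1$StartDate <- as.Date(TBL1$StartDate)
TBL1$EndDate <- as.Date(TBL1$EndDate)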


StartDate   EndDate      Sessions
2017-01-01  2017-01-05          
2017-01-09  2017-01-11          
2017-01-18  2017-01-25          
2017-01-07  2017-01-28

Date <- c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06","2017-01-07","2017-01-08","2017-01-09","2017-01-10","2017-01-11","2017-01-12","2017-01-13","2017-01-14","2017-01-15","2017-01-16","2017-01-17","2017-01-18","2017-01-19","2017-01-20","2017-01-21","2017-01-22","2017-01-23","2017-01-24","2017-01-25","2017-01-26","2017-01-27","2017-01-28","2017-01-29","2017-01-30","2017-01-31")

sessions <- sample(200:5000,31)

TBL2 <- data.frame(Date, sessions)
TBL2$Date <- as.Date(TBL2$Date)


Date      sessions
2017-01-01  1920            
2017-01-02  1276            
2017-01-03  1604            
2017-01-04  4283            
2017-01-05  4170            
2017-01-06  2870            
2017-01-07  2255            
2017-01-08  3660            
2017-01-09  290         
2017-01-10  4024
2017-01-11  1433            
2017-01-12  2168            
2017-01-13  2096            
2017-01-14  4649            
2017-01-15  836         
2017-01-16  3354            
2017-01-17  2366            
2017-01-18  1450            
2017-01-19  2067            
2017-01-20  4172            
2017-01-21  3081            
2017-01-22  3060            
2017-01-23  417         
2017-01-24  3422            
2017-01-25  2905            
2017-01-26  427         
2017-01-27  2163            
2017-01-28  2221            
2017-01-29  2350            
2017-01-30  3529            
2017-01-31  4156                

The desired output (EndOutput) would look like this:


StartDate    EndDate   Session
2017-01-01  2017-01-05  13253       
2017-01-09  2017-01-11  5747        
2017-01-18  2017-01-25  20574       
2017-01-07  2017-01-28  52516

I'm not sure how to vectorize this properly, but here's a quick and dirty way to do it using lubridate's %within% operator:

library(lubridate)

# Convert our character dates into Date objects.
TBL1$StartDate <- ymd(TBL1$StartDate)
TBL1$EndDate   <- ymd(TBL1$EndDate)
TBL2$Date      <- ymd(TBL2$Date)

# Define a helper function to sum sessions between the start and end date of row x.
sum_of_sessions <- function (x) {sum(TBL2$sessions[TBL2$Date %within% interval(TBL1$StartDate[x], TBL1$EndDate[x])])}

# Store the results in TBL1$Sessions (reset the column to numeric first).
TBL1$Sessions <- NA_real_
for (i in seq_len(nrow(TBL1))) {TBL1$Sessions[i] <- sum_of_sessions(i)}
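
For what it's worth, the loop can probably be avoided in base R with a running total. This is only a sketch, not a tested drop-in for the real Google Analytics data: it rebuilds TBL1 and TBL2 from the values printed in the question, and the helper names (running, n_through_end, n_before_start) are made up for illustration. It assumes the dates are whole days and that TBL2 can be sorted by Date.

```r
# Daily sessions copied from the TBL2 printout in the question.
TBL2 <- data.frame(
  Date = seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "day"),
  sessions = c(1920, 1276, 1604, 4283, 4170, 2870, 2255, 3660, 290, 4024,
               1433, 2168, 2096, 4649, 836, 3354, 2366, 1450, 2067, 4172,
               3081, 3060, 417, 3422, 2905, 427, 2163, 2221, 2350, 3529, 4156)
)
TBL1 <- data.frame(
  StartDate = as.Date(c("2017-01-01", "2017-01-09", "2017-01-18", "2017-01-07")),
  EndDate   = as.Date(c("2017-01-05", "2017-01-11", "2017-01-25", "2017-01-28"))
)

# Sort by date so the cumulative sum runs in calendar order.
TBL2 <- TBL2[order(TBL2$Date), ]
running <- c(0, cumsum(TBL2$sessions))  # running[k + 1] = total of the first k rows

# findInterval(x, vec) counts how many elements of the sorted vec are <= x.
dates <- as.numeric(TBL2$Date)
n_through_end  <- findInterval(as.numeric(TBL1$EndDate), dates)        # rows with Date <= EndDate
n_before_start <- findInterval(as.numeric(TBL1$StartDate) - 1, dates)  # rows with Date <  StartDate

# Sum over [StartDate, EndDate] = total through EndDate minus total before StartDate.
TBL1$Sessions <- running[n_through_end + 1] - running[n_before_start + 1]
print(TBL1)
```

Both endpoints are treated as inclusive, which matches the sums in the question's desired output. Overlapping ranges (like the fourth row) are fine, because each row is computed independently from the same running total.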