Getting values between Start and End Dates in Table 1 From Table 2 in R
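I have a table (call it TBL1) with a start date column and an end date column.
Table 2 (TBL2) was created by pulling a year of session data from Google Analytics.
The only variables in TBL2 are Date (daily) and sessions (one value per day).
The variables in TBL1 are StartDate, EndDate, and Sessions (empty).
I want to sum the sessions in TBL2 that fall between TBL1$StartDate and TBL1$EndDate, and put that sum into TBL1$Sessions.
Is this possible in R?
Sample code is below.
Note that the TBL2 data actually comes from Google Analytics. In the real problem, which spans a year, I also have more than 100 start and end dates.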
StartDate <- c("2017-01-01", "2017-01-09", "2017-01-18", "2017-01-07")
EndDate <- c("2017-01-05", "2017-01-11", "2017-01-25", "2017-01-28")
Sessions <- c(" ", " ", " ", " ")
TBL1 <- data.frame(StartDate, EndDate, Sessions)
# Assign the converted dates back; as.Date() alone does not modify the column.
TBL1$StartDate <- as.Date(TBL1$StartDate)
TBL1$EndDate <- as.Date(TBL1$EndDate)
StartDate EndDate Sessions
2017-01-01 2017-01-05
2017-01-09 2017-01-11
2017-01-18 2017-01-25
2017-01-07 2017-01-28
Date <- c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06","2017-01-07","2017-01-08","2017-01-09","2017-01-10","2017-01-11","2017-01-12","2017-01-13","2017-01-14","2017-01-15","2017-01-16","2017-01-17","2017-01-18","2017-01-19","2017-01-20","2017-01-21","2017-01-22","2017-01-23","2017-01-24","2017-01-25","2017-01-26","2017-01-27","2017-01-28","2017-01-29","2017-01-30","2017-01-31")
sessions <- sample(200:5000,31)
TBL2 <- data.frame(Date, sessions)
# Assign the converted dates back; as.Date() alone does not modify the column.
TBL2$Date <- as.Date(TBL2$Date)
Date sessions
2017-01-01 1920
2017-01-02 1276
2017-01-03 1604
2017-01-04 4283
2017-01-05 4170
2017-01-06 2870
2017-01-07 2255
2017-01-08 3660
2017-01-09 290
2017-01-10 4024
2017-01-11 1433
2017-01-12 2168
2017-01-13 2096
2017-01-14 4649
2017-01-15 836
2017-01-16 3354
2017-01-17 2366
2017-01-18 1450
2017-01-19 2067
2017-01-20 4172
2017-01-21 3081
2017-01-22 3060
2017-01-23 417
2017-01-24 3422
2017-01-25 2905
2017-01-26 427
2017-01-27 2163
2017-01-28 2221
2017-01-29 2350
2017-01-30 3529
2017-01-31 4156
EndOutput <- data.frame(StartDate, EndDate, Session)
StartDate EndDate Session
2017-01-01 2017-01-05 13253
2017-01-09 2017-01-11 5747
2017-01-18 2017-01-25 20574
2017-01-07 2017-01-28 49094
I don't know how to vectorize this properly, but here is a quick and dirty way to do it using lubridate's %within% operator:
library(lubridate)
# Convert our character dates into Date objects.
TBL1$StartDate <- ymd(TBL1$StartDate)
TBL1$EndDate <- ymd(TBL1$EndDate)
TBL2$Date <- ymd(TBL2$Date)
# Define a helper function to sum sessions between the start and end date of row x.
sum_of_sessions <- function(x) {
  sum(TBL2$sessions[TBL2$Date %within% interval(TBL1$StartDate[x], TBL1$EndDate[x])])
}
# Replace the blank placeholder column with a numeric one, then store the results in TBL1$Sessions.
TBL1$Sessions <- NA_real_
for (i in seq_len(nrow(TBL1))) {TBL1$Sessions[i] <- sum_of_sessions(i)}
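For anyone who does want a vectorized version, here is a minimal base R sketch, not part of the original answer. It swaps the row-by-row loop for a cumulative-sum lookup: sort TBL2 by date, take running totals, and subtract the total before each start date from the total up to each end date. It assumes both end dates are inclusive (as in the expected output above) and that TBL2 has at most one row per date. The names ord, dates, running, n_before_start and n_up_to_end are made up for illustration.

```r
# Sample data from the question, with dates stored as Date objects.
TBL1 <- data.frame(
  StartDate = as.Date(c("2017-01-01", "2017-01-09", "2017-01-18", "2017-01-07")),
  EndDate   = as.Date(c("2017-01-05", "2017-01-11", "2017-01-25", "2017-01-28"))
)
TBL2 <- data.frame(
  Date = seq(as.Date("2017-01-01"), as.Date("2017-01-31"), by = "day"),
  sessions = c(1920, 1276, 1604, 4283, 4170, 2870, 2255, 3660, 290, 4024,
               1433, 2168, 2096, 4649, 836, 3354, 2366, 1450, 2067, 4172,
               3081, 3060, 417, 3422, 2905, 427, 2163, 2221, 2350, 3529, 4156)
)

# Sort TBL2 by date and build running totals with a leading 0,
# so running[k + 1] is the sum of the first k days.
ord     <- order(TBL2$Date)
dates   <- as.numeric(TBL2$Date[ord])
running <- c(0, cumsum(TBL2$sessions[ord]))

# Number of TBL2 dates strictly before each start date, and on or before each end date.
n_before_start <- findInterval(as.numeric(TBL1$StartDate), dates, left.open = TRUE)
n_up_to_end    <- findInterval(as.numeric(TBL1$EndDate), dates)

# Sum over the closed range [StartDate, EndDate] for every row at once.
TBL1$Sessions <- running[n_up_to_end + 1] - running[n_before_start + 1]
print(TBL1)
```

On the sample data this gives 13253, 5747 and 20574 for the first three rows, matching the expected output in the question. For the fourth row the sample data sum to 52516 rather than the 49094 listed there; the difference is exactly the 2017-01-24 value (3422), so that figure in the question seems to have left out one day.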