Events in last 21 days for every row by Name
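This is what my data frame looks like. The two rightmost columns are the ones I want: they check whether there was an "Email" ActivityType in the last 21 days and whether there was a "Webinar" ActivityType in the last 21 days.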
Name  ActivityType  ActivityDate  Email(last21days)  Webinar(last21days)
John  Email         1/1/2014      TRUE               NA
John  Webinar       1/5/2014      TRUE               TRUE
John  Sale          1/20/2014     TRUE               TRUE
John  Webinar       3/25/2014     NA                 TRUE
John  Sale          4/1/2014      NA                 TRUE
John  Sale          7/1/2014      NA                 NA
Tom   Email         1/1/2015      TRUE               NA
Tom   Webinar       1/5/2015      TRUE               TRUE
Tom   Sale          1/20/2015     TRUE               TRUE
Tom   Webinar       3/25/2015     NA                 TRUE
Tom   Sale          4/1/2015      NA                 TRUE
Tom   Sale          7/1/2015      NA                 NA
Based on the help here, I tried:
# the dates are in month/day/year form, so the format must be given
df$ActivityDate <- as.Date(df$ActivityDate, format = "%m/%d/%Y")
library(data.table)
setDT(df)
setkey(df, Name, ActivityDate)
Elsetemp <- df[, .(Name, ActivityDate, ActivityType)]
df[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")),
                  Webinar21 = as.logical(which(i.ActivityType == "Webinar"))),
   roll = -21, by = .EACHI]
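This didn't help: I only get TRUE for the "Sale" rows. For example, in the second row (ActivityType = Webinar), Email21 and Webinar21 should both be TRUE. When I define the last 21 days, I want to include the day on which the event occurred.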
A base R solution:
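# New type of sequence function that can accept vectors:
# for every date in v1, generate that date and the 21 days after it
seq2 <- function(v1) {
  res <- list()
  for(i in seq_along(v1)) {
    res[[i]] <- seq(v1[i], v1[i]+21, by='day')
  }
  # unlist() drops the Date class, so convert the day counts back to Dates
  as.Date(unlist(res), origin='1970-01-01')
}
# Keep only the input columns (drop the example output columns)
df <- df[ ,1:3]
df$ActivityDate <- as.Date(df$ActivityDate, format='%m/%d/%Y')
# Email column: TRUE if the row's date is 0-21 days after any Email date
emailed <- df[df$ActivityType == 'Email', 'ActivityDate']
df$Email <- df$ActivityDate %in% seq2(emailed)
# Webinar column: same check against the Webinar dates
webbed <- df[df$ActivityType == 'Webinar', 'ActivityDate']
df$Webinar <- df$ActivityDate %in% seq2(webbed)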
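First, we subset the first three columns, leaving out the example output. Then we convert the date column with as.Date (it is a factor in R versions before 4.0, where read.table defaults to stringsAsFactors = TRUE). The vector emailed holds the ActivityDate of every row whose ActivityType is the string "Email". The function seq2 produces each of those dates plus the 21 days after it, creating a sequence that every ActivityDate can be checked against with %in%.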
df
# Name ActivityType ActivityDate Email Webinar
# 1 John Email 2014-01-01 TRUE FALSE
# 2 John Webinar 2014-01-05 TRUE TRUE
# 3 John Sale 2014-01-20 TRUE TRUE
# 4 John Webinar 2014-03-25 FALSE TRUE
# 5 John Sale 2014-04-01 FALSE TRUE
# 6 John Sale 2014-07-01 FALSE FALSE
# 7 Tom Email 2015-01-01 TRUE FALSE
# 8 Tom Webinar 2015-01-05 TRUE TRUE
# 9 Tom Sale 2015-01-20 TRUE TRUE
# 10 Tom Webinar 2015-03-25 FALSE TRUE
# 11 Tom Sale 2015-04-01 FALSE TRUE
# 12 Tom Sale 2015-07-01 FALSE FALSE
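The seq2 approach pools the event dates of all Names, so an Email from Tom would also flag a John row that happens to fall in the same window; the sample data does not trigger this. A minimal sketch of a per-Name variant, assuming the same column names, compares day differences directly instead of building 22-day sequences. `flag_recent` is a hypothetical helper, and it compares every row against every event of the same Name, so it is meant for modest data sizes.

```r
# Hypothetical helper: for each row, TRUE if the same Name has an event of
# `type` on that row's date or up to `window` days earlier.
flag_recent <- function(df, type, window = 21) {
  vapply(seq_len(nrow(df)), function(i) {
    ev  <- df$ActivityDate[df$Name == df$Name[i] & df$ActivityType == type]
    lag <- as.numeric(df$ActivityDate[i] - ev)   # days since each event
    any(lag >= 0 & lag <= window)
  }, logical(1))
}

# Input columns from the question's example
df <- data.frame(
  Name = rep(c("John", "Tom"), each = 6),
  ActivityType = rep(c("Email", "Webinar", "Sale", "Webinar", "Sale", "Sale"), 2),
  ActivityDate = as.Date(c("1/1/2014", "1/5/2014", "1/20/2014", "3/25/2014",
                           "4/1/2014", "7/1/2014", "1/1/2015", "1/5/2015",
                           "1/20/2015", "3/25/2015", "4/1/2015", "7/1/2015"),
                         format = "%m/%d/%Y")
)

df$Email   <- flag_recent(df, "Email")
df$Webinar <- flag_recent(df, "Webinar")
```

Like the base R answer, it returns FALSE rather than NA where no event qualifies.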
Data
df <- read.table(text=' Name ActivityType ActivityDate Email(last21days) Webinar(last21days)
John Email 1/1/2014 TRUE NA
John Webinar 1/5/2014 TRUE TRUE
John Sale 1/20/2014 TRUE TRUE
John Webinar 3/25/2014 NA TRUE
John Sale 4/1/2014 NA TRUE
John Sale 7/1/2014 NA NA
Tom Email 1/1/2015 TRUE NA
Tom Webinar 1/5/2015 TRUE TRUE
Tom Sale 1/20/2015 TRUE TRUE
Tom Webinar 3/25/2015 NA TRUE
Tom Sale 4/1/2015 NA TRUE
Tom Sale 7/1/2015 NA NA', header=TRUE)
How about this?
Using a rolling join from data.table:
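library(data.table)
# Assumes df is the data frame from the "Data" section; keep the input columns
dt <- as.data.table(df[, 1:3])
dt[, ActivityDate := as.Date(ActivityDate, format="%m/%d/%Y")]
setkey(dt, Name, ActivityDate)
# For each type, join every row of x to the latest row of that type for the
# same Name at most `roll` days earlier; which = TRUE returns the matched row
# number, or NA when there is no such row
roll_index <- function(x, types, roll=21) {
  lapply(types, function(type) {
    idx <- x[ActivityType == type][x, roll=roll, which=TRUE]
    as.logical(idx)   # any row number -> TRUE, NA stays NA
  })
}
dt[, c("Email_21", "Webinar_21") := roll_index(dt, c("Email", "Webinar"))]
dt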
# Name ActivityType ActivityDate Email_21 Webinar_21
# 1: John Email 2014-01-01 TRUE NA
# 2: John Webinar 2014-01-05 TRUE TRUE
# 3: John Sale 2014-01-20 TRUE TRUE
# 4: John Webinar 2014-03-25 NA TRUE
# 5: John Sale 2014-04-01 NA TRUE
# 6: John Sale 2014-07-01 NA NA
# 7: Tom Email 2015-01-01 TRUE NA
# 8: Tom Webinar 2015-01-05 TRUE TRUE
# 9: Tom Sale 2015-01-20 TRUE TRUE
# 10: Tom Webinar 2015-03-25 NA TRUE
# 11: Tom Sale 2015-04-01 NA TRUE
# 12: Tom Sale 2015-07-01 NA NA
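For reference, the sign of `roll` decides the direction of the match: per the data.table documentation, a positive value carries the previous observation forward (LOCF) and a negative value carries the next observation backward (NOCB), with a finite number limiting how far. That is why this answer uses `roll = 21`, whereas the question's `roll = -21` looks ahead to a row in the following 21 days instead. A small sketch with hypothetical tables `events` and `queries`:

```r
library(data.table)

# Hypothetical mini example: a single Email event for John...
events <- data.table(Name = "John", ActivityDate = as.Date("2014-01-01"),
                     key = c("Name", "ActivityDate"))
# ...and some dates to look up: 7 days before, same day, 19 and 22 days after
queries <- data.table(Name = "John",
                      ActivityDate = as.Date(c("2013-12-25", "2014-01-01",
                                               "2014-01-20", "2014-01-23")),
                      key = c("Name", "ActivityDate"))

# roll = 21: carry the last event forward for at most 21 days (LOCF)
fwd <- events[queries, roll = 21, which = TRUE]    # NA 1 1 NA
# roll = -21: carry the next event backward for at most 21 days (NOCB)
bwd <- events[queries, roll = -21, which = TRUE]   # 1 1 NA NA
```

Wrapping either result in `as.logical()` gives the TRUE/NA flags used in the answer.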