Using historical data [prior period] to create new variables
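I want to create two new columns that take the prior period [t-1] into account, counting how many new and how many old (repeated) activities are performed in the current period [see the data structure below]. For example, in row 5 the algorithm should compare the new event 'think' against the previous period's events [read, write]. Since there is no 'think' at t-1, it is counted as 1 [for new], and since period 3 [row 5] repeats none of the old events, old is 0.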
event <- c('read', 'write', 'read', 'write', 'think', 'read', 'think', 'read')
person <- c('arun', 'arun', 'arun', 'arun', 'arun', 'john', 'john', 'john')
time <- c(1, 1, 2, 2, 3, 1, 2, 3)
df <- data.frame(event, person, time)
event person time new old
read arun 1 . .
write arun 1 . .
read arun 2 0 2
write arun 2 0 2
think arun 3 1 0
read john 1 . .
think john 2 1 0
read john 3 1 0
Any suggestions on how to achieve this?
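The counting rule described above can be sketched for a single pair of consecutive periods. This is a minimal illustration, assuming events are compared as plain character vectors with `%in%` (so a duplicated event within a period would be counted once per occurrence); `count_new_old` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: given the events of the current period and of the
# previous period, count how many current events were already seen ("old")
# and how many were not ("new").
count_new_old <- function(current, previous) {
  old <- sum(current %in% previous)
  c(new = length(current) - old, old = old)
}

# Row 5: arun's period 3 ("think") vs. his period 2 (read, write): new = 1, old = 0
count_new_old("think", c("read", "write"))

# Rows 3-4: arun's period 2 (read, write) vs. his period 1 (read, write): new = 0, old = 2
count_new_old(c("read", "write"), c("read", "write"))
```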
event <- c('read', 'write', 'read', 'write', 'think', 'read', 'think', 'read')
person <- c('arun', 'arun', 'arun', 'arun', 'arun', 'john', 'john', 'john')
time <- c(1, 1, 2, 2, 3, 1, 2, 3)
df <- data.frame(event, person, time)
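library(tidyverse, warn.conflicts = FALSE)
df %>%
  # one row per person and period, holding that period's events in a list column
  group_by(person, time) %>%
  summarise(new = list(event), .groups = 'drop') %>%
  group_by(person) %>%
  # old = events also seen in the person's previous period; new = the rest
  mutate(old = map2_int(new, lag(new), ~ sum(.x %in% .y)),
         new = map_int(new, length) - old) %>%
  # the first period has no history (assumes it is time == 1 for everyone)
  mutate(across(new:old, ~ifelse(time == 1, NA, .))) %>%
  # attach the per-period counts back onto every event row
  left_join(df, ., by = c('person', 'time'))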
#> event person time new old
#> 1 read arun 1 NA NA
#> 2 write arun 1 NA NA
#> 3 read arun 2 0 2
#> 4 write arun 2 0 2
#> 5 think arun 3 1 0
#> 6 read john 1 NA NA
#> 7 think john 2 1 0
#> 8 read john 3 1 0
Created on 2021-07-21 by the reprex package (v2.0.0)
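For comparison, here is a base R sketch of the same idea without the tidyverse. It is an alternative, not the answerer's method. One design difference: instead of hard-coding `time == 1` as the period without history, it treats each person's earliest observed period as that period, so it should also work if some person's data starts later. The intermediate name `per_period` is illustrative only.

```r
# Example data from the question
df <- data.frame(
  event  = c("read", "write", "read", "write", "think", "read", "think", "read"),
  person = c("arun", "arun", "arun", "arun", "arun", "john", "john", "john"),
  time   = c(1, 1, 2, 2, 3, 1, 2, 3)
)

# For each person, compare every period's events with that person's previous
# observed period; the earliest period per person gets NA (no history).
per_period <- do.call(rbind, lapply(split(df, df$person), function(d) {
  times  <- sort(unique(d$time))
  events <- lapply(times, function(t) d$event[d$time == t])
  old <- c(NA, vapply(seq_along(times)[-1],
                      function(i) sum(events[[i]] %in% events[[i - 1]]),
                      integer(1)))
  data.frame(person = d$person[1], time = times,
             new = lengths(events) - old, old = old)
}))

# Attach the period-level counts to every event row, keeping the original row order
idx <- match(paste(df$person, df$time, sep = "|"),
             paste(per_period$person, per_period$time, sep = "|"))
df$new <- per_period$new[idx]
df$old <- per_period$old[idx]
df
```

The `match()` lookup is used instead of `merge()` because `merge()` does not guarantee that the original row order is preserved.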