R data frame: split by factor, then apply and reshape with tidyr
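I have an 8-week internet experiment. Data is collected for each participant, and participants can start the experiment on any date. The idea is to count how many exercises each participant did in their first week, second week, and so on. The result should therefore be a participants-by-8 matrix/data frame.
- Each participant can start on any date, but the experiment ends for them 8 weeks later.
- Each participant can do as many exercises as he/she wants.
Here is an example: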
df <- data.frame(
fac=c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","d","d","d","d","d","d"),
date=c("2017-01-01","2017-01-05","2017-01-13","2017-01-25","2017-02-10","2017-01-06","2017-01-16","2017-01-28","2017-02-02","2017-02-07","2017-01-11","2017-01-19","2017-01-24","2017-01-31","2017-02-09","2017-01-12","2017-01-24","2017-01-29","2017-02-04","2017-02-19","2017-03-08"),
sessions=c(1,2,3,6,5,1,3,2,3,3,1,5,3,2,4,1,3,5,2,6,6)
)
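My idea was:
- Add a column of zeros (df$count <- 0)
- Split the data frame by the factor [split(df, df$fac)]
- Take each date, subtract the date of the participant's first entry, add 1, divide by 7, and round up [ceiling((date2 - date1 + 1) / 7)]. This gives me the week number in which the participant did the exercises.
- Use tidyr to reshape the whole data frame so that the values for each week are summed (a participants-by-8 data frame)
But I don't know how to do step 3 properly, or how to combine it with step 4.
Thanks a lot!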
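Something like:
library(dplyr)
df <- df %>%
  group_by(fac) %>%
  # week number = whole weeks since the participant's first date, plus 1 (days 0-6 are week 1)
  mutate(time = floor(as.numeric(difftime(as.Date(date), min(as.Date(date)), units = 'weeks'))) + 1)
# total sessions per participant and week
df %>% group_by(fac, time) %>% summarize(total_sessions = sum(sessions))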
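To get all the way to the participants-by-8 layout asked for in the question, the weekly totals can be reshaped with tidyr. The following is only a sketch: the column name `week`, the `week_` prefix, and the result name `weekly` are illustrative choices, dropping rows after week 8 is an assumption about how late entries should be handled, and it assumes dplyr >= 1.0.0 and tidyr >= 1.0.0 (for `.groups` and `pivot_wider()`).

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  fac = c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c","d","d","d","d","d","d"),
  date = c("2017-01-01","2017-01-05","2017-01-13","2017-01-25","2017-02-10","2017-01-06","2017-01-16","2017-01-28","2017-02-02","2017-02-07","2017-01-11","2017-01-19","2017-01-24","2017-01-31","2017-02-09","2017-01-12","2017-01-24","2017-01-29","2017-02-04","2017-02-19","2017-03-08"),
  sessions = c(1,2,3,6,5,1,3,2,3,3,1,5,3,2,4,1,3,5,2,6,6)
)

weekly <- df %>%
  group_by(fac) %>%
  # days since the participant's first date, cut into 7-day blocks (days 0-6 -> week 1)
  mutate(
    days = as.numeric(difftime(as.Date(date), min(as.Date(date)), units = "days")),
    week = as.integer(days %/% 7) + 1L
  ) %>%
  # assumption: anything after the 8th week is outside the experiment
  filter(week <= 8L) %>%
  group_by(fac, week) %>%
  summarise(total_sessions = sum(sessions), .groups = "drop") %>%
  # add the weeks in which a participant did nothing, with a total of 0
  complete(fac, week = 1:8, fill = list(total_sessions = 0)) %>%
  # one row per participant, one column per week
  pivot_wider(names_from = week, values_from = total_sessions, names_prefix = "week_")

weekly
```

`complete()` is what guarantees all eight week columns exist even when nobody was active in a given week; without it, `pivot_wider()` would only create columns for weeks that appear in the data and leave `NA` in the gaps.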