如何根据条件使用 R 插入缺失的 dates/times？

Question

如下所示的数据框。 3名工作人员有天时读，但不完整（每个工作人员每天应有24次读）。

了解到工作人员每天阅读的次数不同。现在只对当天阅读量最多的员工感兴趣。

有很多天。想要为当天最多的行插入缺失的（每小时）行。也就是说，2018-03-02 仅为 Jack's 插入，2018-03-03 仅为 David 插入，2018-03-04 仅为 Kate 插入。

我尝试了 this question 中的这些行（即使它们全部填满而没有区别）但没有到达那里。

如何在 R 中完成？

date_time <- c("2/3/2018 0:00","2/3/2018 1:00","2/3/2018 2:00","2/3/2018 3:00","2/3/2018 5:00","2/3/2018 6:00","2/3/2018 7:00","2/3/2018 8:00","2/3/2018 9:00","2/3/2018 10:00","2/3/2018 11:00","2/3/2018 12:00","2/3/2018 13:00","2/3/2018 14:00","2/3/2018 16:00","2/3/2018 17:00","2/3/2018 18:00","2/3/2018 19:00","2/3/2018 21:00","2/3/2018 22:00","2/3/2018 23:00","3/3/2018 0:00","3/3/2018 0:00","3/3/2018 1:00","3/3/2018 2:00","3/3/2018 4:00","3/3/2018 5:00","3/3/2018 7:00","3/3/2018 8:00","3/3/2018 9:00","3/3/2018 11:00","3/3/2018 12:00","3/3/2018 14:00","3/3/2018 15:00","3/3/2018 17:00","3/3/2018 18:00","3/3/2018 20:00","3/3/2018 22:00","3/3/2018 23:00","4/3/2018 0:00","4/3/2018 0:00","4/3/2018 1:00","4/3/2018 2:00","4/3/2018 3:00","4/3/2018 5:00","4/3/2018 6:00","4/3/2018 7:00","4/3/2018 8:00","4/3/2018 10:00","4/3/2018 11:00","4/3/2018 12:00","4/3/2018 14:00","4/3/2018 15:00","4/3/2018 16:00","4/3/2018 17:00","4/3/2018 19:00","4/3/2018 20:00","4/3/2018 22:00","4/3/2018 23:00")
staff <- c("Jack","Jack","Kate","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Kate","Jack","Jack","Jack","David","David","Jack","Kate","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","Jack","Kate","David","David","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Jack")
reading <- c(7.5,8.3,7,6.9,7.1,8.1,8.4,8.8,6,7.1,8.9,7.3,7.4,6.9,11.3,18.8,4.6,6.7,7.7,7.8,7,7,6.6,6.8,6.7,6.1,7.1,6.3,7.2,6,5.8,6.6,6.5,6.4,7.2,8.4,6.5,6.5,5.5,6.7,7,7.5,6.5,7.5,7.2,6.3,7.3,8,7,8.2,6.5,6.8,7.5,7,6.1,5.7,6.7,4.3,6.3)
df <- data.frame(date_time, staff, reading)

Answer 1

试试这个代码：

确定每个小时和所有员工

date_h<-seq(as.POSIXlt(min(date_time),format="%d/%m/%Y %H:%M"),as.POSIXlt(max(date_time),format="%d/%m/%Y %H:%M"),by=60*60)
staff_u<-unique(staff)
comb<-expand.grid(staff_u,date_h)
colnames(comb)<-c("staff","date_time")

df

统一日期格式

df$date_time<-as.POSIXlt(df$date_time,format="%d/%m/%Y %H:%M")

合并信息

out<-merge(comb,df,all.x=T)

你的输出：

head(out)
  staff           date_time reading
1  Jack 2018-03-02 00:00:00     7.5
2  Jack 2018-03-02 01:00:00     8.3
3  Jack 2018-03-02 02:00:00      NA
4  Jack 2018-03-02 03:00:00     6.9
5  Jack 2018-03-02 04:00:00      NA
6  Jack 2018-03-02 05:00:00     7.1

Answer 2

选项是单独执行此操作。创建感兴趣日期的 data.table 和相应的 'staff'，并获得日期时间的完整序列，然后我们 rbind 使用原始数据集并使用条件，我们总结了数据

library(data.table)
stf <- c("Jack", "David", "Kate")
date <- as.Date(c("2018-03-02", "2018-03-03", "2018-03-04"))
df1 <- data.table(date, staff= stf)[, .(date_time = seq(as.POSIXct(paste(date, "00:00:00"), 
       tz = "GMT"),
           length.out = 24, by = "1 hour")), staff]

setDT(df)[, date_time := as.POSIXct(date_time, "%d/%m/%Y %H:%M", tz = "GMT")]
res <- rbindlist(list(df, df1), fill = TRUE)[, 
     .(reading = if(any(is.na(reading))) sum(reading, na.rm = TRUE) else reading),
         .(staff, date_time)]

table(res$staff, as.Date(res$date_time))

#         2018-03-02 2018-03-03 2018-03-04
#  David          3         24          2
#  Jack          24          1          1
#  Kate           3          1         24

head(res)
#   staff           date_time reading
#1:  Jack 2018-03-02 00:00:00     7.5
#2:  Jack 2018-03-02 01:00:00     8.3
#3:  Kate 2018-03-02 02:00:00     7.0
#4:  Jack 2018-03-02 03:00:00     6.9
#5:  Jack 2018-03-02 05:00:00     7.1
#6:  Jack 2018-03-02 06:00:00     8.1

tail(res)
#   staff           date_time reading
#1:  Kate 2018-03-04 04:00:00       0
#2:  Kate 2018-03-04 09:00:00       0
#3:  Kate 2018-03-04 13:00:00       0
#4:  Kate 2018-03-04 18:00:00       0
#5:  Kate 2018-03-04 21:00:00       0
#6:  Kate 2018-03-04 23:00:00       0

如何根据条件使用 R 插入缺失的 dates/times？

How to insert missing dates/times using R based on criteria?

r

time-series

missing-data

dataframe