New Variable Based on Previous Observation

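I have sleep data in which one field records the time my son went to bed and another records the time he woke up. I want to create a new variable that computes total sleep time. In other words, for each day I need the difference between that day's wake time and the previous day's bedtime.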

library(lubridate)
library(tibble)  # provides tibble()

day <- mdy("8/27/19","8/28/19","8/29/19")
wake <- hms("7:35:00","7:45:00","7:30:00")
bed <- hms("19:45:00","20:15:00","20:00:00")
toy_data <- tibble(day, wake, bed)

I tried to do this with a loop:

toy_data$sleeptime <- NA

for(i in 1:nrow(toy_data)){
  toy_data$sleeptime[i] <- as.duration(toy_data$bed[i-1] - toy_data$wake[i])
}

But I get an error:

replacement has length zero

library(dplyr)
toy_data %>%
  mutate(sleep_time = as.duration(wake) - lag(as.duration(bed)) + dhours(24))

# A tibble: 3 x 4
  day        wake         bed          sleep_time           
  <date>     <S4: Period> <S4: Period> <S4: Duration>       
1 2019-08-27 7H 35M 0S    19H 45M 0S   NA                   
2 2019-08-28 7H 45M 0S    20H 15M 0S   43200s (~12 hours)   
3 2019-08-29 7H 30M 0S    20H 0M 0S    40500s (~11.25 hours)

I had no luck taking the lag of the original columns in their S4: Period form, but it worked when I did the calculation between two durations.
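For anyone who prefers to keep the loop from the question, here is a minimal sketch of why it fails and one way it could be repaired. On the first iteration `i - 1` is 0, and indexing with 0 returns a zero-length object, which is what triggers "replacement has length zero". The sketch starts the loop at the second row and adds the 24-hour offset. The names `sleep_hours` and `secs` are illustrative, and it assumes the rows are consecutive days and that bedtime falls before midnight.

```r
library(lubridate)

day  <- mdy(c("8/27/19", "8/28/19", "8/29/19"))
wake <- hms(c("7:35:00", "7:45:00", "7:30:00"))
bed  <- hms(c("19:45:00", "20:15:00", "20:00:00"))

# Sleep time in hours; the first day has no previous bedtime, so it stays NA
sleep_hours <- rep(NA_real_, length(day))

# Start at the second row so that bed[i - 1] always exists
for (i in seq_along(day)[-1]) {
  # Seconds from the previous bedtime to midnight, plus seconds from midnight to waking
  secs <- 24 * 3600 - period_to_seconds(bed[i - 1]) + period_to_seconds(wake[i])
  sleep_hours[i] <- secs / 3600
}

sleep_hours
```

The vectorised `lag()` version above does the same arithmetic without the explicit loop.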

ind = match(toy_data$day - 1, toy_data$day)
as.numeric(ymd_hms(paste(toy_data$day, toy_data$wake)) - 
               ymd_hms(paste(toy_data$day[ind], toy_data$bed[ind])))
#[1]    NA 12.00 11.25
#Warning message:
# 1 failed to parse. 
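The warning comes from the first row: there is no previous day, so `ind[1]` is `NA` and `paste()` produces the string "NA NA", which cannot be parsed. A possible variant, sketched here under the assumption that `day` is a Date vector and `wake` and `bed` are Period vectors as in the question, builds the date-times arithmetically instead of through strings. The names `wake_dt`, `bed_dt` and `sleep_hours` are illustrative.

```r
library(lubridate)

day  <- mdy(c("8/27/19", "8/28/19", "8/29/19"))
wake <- hms(c("7:35:00", "7:45:00", "7:30:00"))
bed  <- hms(c("19:45:00", "20:15:00", "20:00:00"))

# Full date-times: midnight of each day (UTC) plus the time-of-day Period
wake_dt <- as_datetime(day) + wake
bed_dt  <- as_datetime(day) + bed

# Row index of the previous calendar day; NA when that day is not in the data
ind <- match(day - 1, day)

# Indexing with NA yields an NA date-time, so no parsing warning is raised
sleep_hours <- as.numeric(difftime(wake_dt, bed_dt[ind], units = "hours"))

sleep_hours
```

Because `match()` looks up the previous calendar day explicitly, a gap in the data gives `NA` for the following day rather than a difference spanning several days.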

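One option that may make things simpler is to pass strings rather than mdy and hms objects. Each column can then easily be converted into a full date-time object, which is easy to manipulate: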

library(tidyverse)
library(lubridate)

day <- c("8/27/19","8/28/19","8/29/19")
wake <- c("7:35:00","7:45:00","7:30:00")
bed <- c("19:45:00","20:15:00","20:00:00")
toy_data <- tibble(day, wake, bed)

toy_data %>% 
  mutate(
    wake = mdy_hms(str_c(day, wake, sep = " ")),
    bed = mdy_hms(str_c(day, bed, sep = " ")),
    sleep = lead(wake) - bed
  )

# A tibble: 3 x 4
  day     wake                bed                 sleep      
  <chr>   <dttm>              <dttm>              <drtn>     
1 8/27/19 2019-08-27 07:35:00 2019-08-27 19:45:00 12.00 hours
2 8/28/19 2019-08-28 07:45:00 2019-08-28 20:15:00 11.25 hours
3 8/29/19 2019-08-29 07:30:00 2019-08-29 20:00:00    NA hours

You can then use lead to access the next day's wake time, or make a small change to use lag if you want the sleep time to appear on the following day.
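The lag variant could look something like the sketch below. It assumes the rows are sorted by day with no missing days, and it uses `difftime()` with explicit units (a choice made here, not part of the code above) so that the result is always reported in hours. The name `result` is illustrative.

```r
library(dplyr)
library(stringr)
library(lubridate)

toy_data <- tibble(
  day  = c("8/27/19", "8/28/19", "8/29/19"),
  wake = c("7:35:00", "7:45:00", "7:30:00"),
  bed  = c("19:45:00", "20:15:00", "20:00:00")
)

result <- toy_data %>%
  mutate(
    wake = mdy_hms(str_c(day, wake, sep = " ")),
    bed = mdy_hms(str_c(day, bed, sep = " ")),
    # This row's wake time minus the previous row's bedtime
    sleep = difftime(wake, lag(bed), units = "hours")
  )

result
```

With this version the first row is `NA` and each night's sleep is attached to the morning on which it ended.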