Set date value in time series based on value in previous row

I have the following:

df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA

I want to update new.date so that, for any given row, new.date equals ref.date if ref.date is not NA, and equals the value of new.date in the previous row if ref.date is NA. So the result would be:

A ref.date new.date
1     <NA>     <NA>
2 10/12/18 10/12/18
3     <NA> 10/12/18
4     <NA> 10/12/18
5 12/15/19 12/15/19
6     <NA> 12/15/19
7     <NA> 12/15/19
8     <NA> 12/15/19

I tried:

library(dplyr)
df <- df %>% mutate(new.date = ifelse(is.na(ref.date), lag(new.date), ref.date))
df$new.date <- as.Date(df$new.date, format = "%m/%d/%y")

But this produced dates in numeric format and did not correctly fill the rows where ref.date is NA.
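A likely explanation for both symptoms, sketched in base R: ifelse() builds its result from the logical test vector, so the Date class is dropped and only the underlying day counts remain. In addition, lag(new.date) inside mutate() reads the existing new.date column, which is all NA at that point, so nothing is carried forward. The vector name `ref` and the ISO date strings below are only for illustration.

```r
# Minimal reproduction of the class loss (ISO dates used for brevity)
ref <- as.Date(c(NA, "2018-10-12", NA))

# ifelse() starts from the logical test vector, so the "Date" class
# of `ref` is lost and the result holds days since 1970-01-01
res <- ifelse(is.na(ref), NA, ref)
class(res)
#> [1] "numeric"

# Converting the numbers back requires an origin rather than a format string
back <- as.Date(res, origin = "1970-01-01")
class(back)
#> [1] "Date"
```

This only recovers the class; the NA rows still have to be filled by one of the approaches in the answers.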

I think this should do it:

df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

df %>%
  mutate(new.date = ref.date) %>% 
  fill(`new.date`, .direction = "down")
#>   A   ref.date   new.date
#> 1 1       <NA>       <NA>
#> 2 2 2018-10-12 2018-10-12
#> 3 3       <NA> 2018-10-12
#> 4 4       <NA> 2018-10-12
#> 5 5 2019-12-15 2019-12-15
#> 6 6       <NA> 2019-12-15
#> 7 7       <NA> 2019-12-15
#> 8 8       <NA> 2019-12-15

We can copy ref.date into the new.date column and then use fill from tidyr:

library(dplyr)
df %>% mutate(new.date = ref.date) %>% tidyr::fill(new.date)

#  A   ref.date   new.date
#1 1       <NA>       <NA>
#2 2 2018-10-12 2018-10-12
#3 3       <NA> 2018-10-12
#4 4       <NA> 2018-10-12
#5 5 2019-12-15 2019-12-15
#6 6       <NA> 2019-12-15
#7 7       <NA> 2019-12-15
#8 8       <NA> 2019-12-15
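
For readers who want to see what a downward fill amounts to, here is a minimal base R sketch of the same "last observation carried forward" idea. The helper name `fill_down` is hypothetical (it is not part of tidyr or base R), and this is an illustration of the technique rather than how tidyr implements fill().

```r
# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))                 # count of non-NA values seen so far
  lookup <- c(x[NA_integer_], x[!is.na(x)]) # typed NA first, then the non-NA values
  lookup[seen + 1L]                         # seen == 0 maps to the leading NA
}

df <- data.frame(A = 1:8,
                 ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")

# Indexing a Date vector keeps its class, so no conversion back is needed
df$new.date <- fill_down(df$ref.date)
```

Rows before the first non-NA value stay NA, which matches the expected output in the question.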

Here are some base R solutions.

  • Using rle() + cumsum():
df$new.date <- with(rle(cumsum(!is.na(df$ref.date))),
                    rep(df$ref.date[c(0,cumsum(lengths[-length(lengths)]))+1],lengths))
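  • Using split() + rbind():
# Assumes df does not already contain a new.date column;
# otherwise cbind() adds a second column with the same name
df <- do.call(rbind,
              c(make.row.names = FALSE,
                lapply(split(df, cumsum(!is.na(df$ref.date))),
                       function(v) cbind(v, new.date = head(v$ref.date, 1)))))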

which gives

> df
  A   ref.date   new.date
1 1       <NA>       <NA>
2 2 2018-10-12 2018-10-12
3 3       <NA> 2018-10-12
4 4       <NA> 2018-10-12
5 5 2019-12-15 2019-12-15
6 6       <NA> 2019-12-15
7 7       <NA> 2019-12-15
8 8       <NA> 2019-12-15
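
The rle() + cumsum() one-liner above is dense, so here is the same computation unrolled step by step on the question's data. The intermediate names (`ref`, `grp`, `runs`, `starts`, `filled`) are introduced only for this walkthrough.

```r
ref <- as.Date(c(NA, "2018-10-12", NA, NA, "2019-12-15", NA, NA, NA))

# Each non-NA date starts a new group; leading NAs fall into group 0
grp <- cumsum(!is.na(ref))        # 0 1 1 1 2 2 2 2

# Run lengths of the group ids give the size of each group
runs <- rle(grp)                  # runs$lengths: 1 3 4

# Index of the first row of each group
starts <- c(0, cumsum(runs$lengths[-length(runs$lengths)])) + 1   # 1 2 5

# Repeat each group's first value (NA for group 0) over the whole group
filled <- rep(ref[starts], runs$lengths)
```

Because every step subsets or repeats the Date vector itself, the result keeps the Date class without any conversion.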