Set date value in time series based on value in previous row
I have the following:
df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA
I want to update new.date so that, for any given row, new.date equals ref.date if ref.date is not NA, and equals the value of new.date in the previous row if ref.date is NA. So the result would be:
A ref.date new.date
1 <NA> NA
2 10/12/18 10/12/18
3 <NA> 10/12/18
4 <NA> 10/12/18
5 12/15/19 12/15/19
6 <NA> 12/15/19
7 <NA> 12/15/19
8 <NA> 12/15/19
I tried:
library(dplyr)
df <- df %>% mutate(new.date = ifelse(is.na(ref.date), lag(new.date), ref.date))
df$new.date <- as.Date(df$new.date, format = "%m/%d/%y")
But this produced dates in numeric format and did not correctly fill the rows where ref.date is NA.
I think this should do it:
df <- data.frame(A = c(1:8), ref.date = c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA))
df$ref.date <- as.Date(df$ref.date, format = "%m/%d/%y")
df$new.date <- NA
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
df %>%
mutate(new.date = ref.date) %>%
fill(`new.date`, .direction = "down")
#> A ref.date new.date
#> 1 1 <NA> <NA>
#> 2 2 2018-10-12 2018-10-12
#> 3 3 <NA> 2018-10-12
#> 4 4 <NA> 2018-10-12
#> 5 5 2019-12-15 2019-12-15
#> 6 6 <NA> 2019-12-15
#> 7 7 <NA> 2019-12-15
#> 8 8 <NA> 2019-12-15
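For context on why the attempt in the question came back as numbers: base R's ifelse() builds its result from the test vector and does not keep the Date class of its yes/no arguments. The sketch below uses a small made-up vector (the names ref, out and back are only for illustration) to show that effect. Separately, lag(new.date) inside mutate() reads the original new.date column, which is all NA at that point, so it cannot carry a value down row by row.

```r
# Illustrative vector, not the full data from the question
ref <- as.Date(c(NA, "10/12/18", NA), format = "%m/%d/%y")

# ifelse() returns a plain numeric vector: the Date class is dropped
out <- ifelse(is.na(ref), NA, ref)
class(out)

# The day counts can be turned back into dates by giving the origin explicitly
back <- as.Date(out, origin = "1970-01-01")
back
```

Converting back with an explicit origin restores the class, but the NA rows still stay NA, which is why a fill step is needed.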
We can copy ref.date to the new.date column and then use fill from tidyr:
library(dplyr)
df %>% mutate(new.date = ref.date) %>% tidyr::fill(new.date)
# A ref.date new.date
#1 1 <NA> <NA>
#2 2 2018-10-12 2018-10-12
#3 3 <NA> 2018-10-12
#4 4 <NA> 2018-10-12
#5 5 2019-12-15 2019-12-15
#6 6 <NA> 2019-12-15
#7 7 <NA> 2019-12-15
#8 8 <NA> 2019-12-15
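To make the "carry the previous row's value forward" rule explicit, here is a minimal base R sketch of what a downward fill does. The helper name fill_down is hypothetical (it is not part of tidyr or dplyr), and the loop is meant as an illustration of the logic rather than a replacement for fill().

```r
# Hypothetical helper: replace each NA with the value from the row above it
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

ref <- as.Date(c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA),
               format = "%m/%d/%y")
new <- fill_down(ref)
new
```

Because each step reads the already-updated previous element, the value propagates across a run of NAs. A leading NA has nothing above it and stays NA, matching the expected output in the question.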
Here are some base R solutions.
- Using rle() + cumsum():
df$new.date <- with(rle(cumsum(!is.na(df$ref.date))),
rep(df$ref.date[c(0,cumsum(lengths[-length(lengths)]))+1],lengths))
- Using split() + rbind():
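# Split rows into groups that each start at a non-NA ref.date,
# give every row in a group that group's first ref.date, then stack the groups.
# Note: cbind() appends a column, so this assumes df has no new.date column yet.
df <- do.call(rbind,
c(make.row.names = FALSE,
lapply(split(df,cumsum(!is.na(df$ref.date))),
function(v) cbind(v,new.date = head(v$ref.date,1)))))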
which gives
> df
A ref.date new.date
1 1 <NA> <NA>
2 2 2018-10-12 2018-10-12
3 3 <NA> 2018-10-12
4 4 <NA> 2018-10-12
5 5 2019-12-15 2019-12-15
6 6 <NA> 2019-12-15
7 7 <NA> 2019-12-15
8 8 <NA> 2019-12-15
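The rle() + cumsum() one-liner above is dense, so here is a sketch that breaks it into named steps on the same dates. The intermediate names (grp, runs, starts, filled) are introduced only for illustration.

```r
ref <- as.Date(c(NA, "10/12/18", NA, NA, "12/15/19", NA, NA, NA),
               format = "%m/%d/%y")

# Group id increases by one at every non-NA date: 0 1 1 1 2 2 2 2
grp <- cumsum(!is.na(ref))

# Run lengths of each group id: 1 3 4
runs <- rle(grp)

# Index of the first row of each run: 1 2 5
starts <- c(0, cumsum(runs$lengths[-length(runs$lengths)])) + 1

# Repeat each run's first date across the whole run
filled <- rep(ref[starts], runs$lengths)
filled
```

The first run (group 0) starts at a row whose ref.date is NA, so that NA is what gets repeated there. Every later run starts at a real date.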