Date closest but earlier than a given date based on another factor

Question

My data looks like this:

df <- data.frame(date=c("2013-07","2013-08","2013-09","2013-11",
                        "2013-11","2013-11","2014-02","2014-03"),
                 X=c("0","1","0","0","1","0","1","0"),
                 stringsAsFactors=TRUE)  # factor columns were the default before R 4.0.0

     date X
1 2013-07 0
2 2013-08 1
3 2013-09 0
4 2013-11 0
5 2013-11 1
6 2013-11 0
7 2014-02 1
8 2014-03 0

I want to create a new column that shows, for each row, the date closest to (but not later than) that row's date at which X = 1:

     date X lastdate
1 2013-07 0       NA
2 2013-08 1  2013-08
3 2013-09 0  2013-08
4 2013-11 0  2013-11
5 2013-11 1  2013-11
6 2013-11 0  2013-11
7 2014-02 1  2014-02
8 2014-03 0  2014-02

Answer 1

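One efficient solution is to use findInterval and search only within the rows where X == "1". I prepend NA_character_ to cover the case where findInterval returns zero (as in the first row).

Some explanation of the method:

The basic idea is to prepend an NA to df$date[df$X == "1"] and then search within the original df$date[df$X == "1"] vector. Whenever a value in df$date precedes every value in df$date[df$X == "1"], findInterval assigns it 0. That zero should end up as NA, so we add 1 to every index findInterval returns and use the result to index the new vector (the one that starts with NA). This way every 0 becomes 1 and is therefore assigned NA, because NA is the first value in the new vector.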

df[["lastdate"]] <- c(NA_character_, 
                      as.character(df$date[df$X == "1"]))[findInterval(df$date, df$date[df$X == "1"]) + 1]
df
#      date X lastdate
# 1 2013-07 0     <NA>
# 2 2013-08 1  2013-08
# 3 2013-09 0  2013-08
# 4 2013-11 0  2013-11
# 5 2013-11 1  2013-11
# 6 2013-11 0  2013-11
# 7 2014-02 1  2014-02
# 8 2014-03 0  2014-02
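
To see why the + 1 shift works, the one-liner can be unpacked into its intermediate steps. The sketch below is only an illustration: it keeps date as character and compares dates through an integer key of the form YYYYMM (the names key, ones, idx and lookup are made up here, not part of the answer), and it assumes the X == "1" dates already appear in chronological order, as they do in the sample data.

```r
# Same sample data as in the question, with date kept as plain character.
df <- data.frame(
  date = c("2013-07", "2013-08", "2013-09", "2013-11",
           "2013-11", "2013-11", "2014-02", "2014-03"),
  X    = c("0", "1", "0", "0", "1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

# Illustrative key: "YYYY-MM" -> integer YYYYMM, which sorts chronologically.
key <- as.integer(sub("-", "", df$date, fixed = TRUE))

ones <- key[df$X == "1"]         # keys of the rows where X == "1"
idx  <- findInterval(key, ones)  # index of the last such key <= each row's key
idx
# [1] 0 1 1 2 2 2 3 3

# Prepending NA shifts every position by one, so idx 0 lands on the NA slot.
lookup <- c(NA_character_, df$date[df$X == "1"])
lookup[idx + 1]
```

The first row gets idx 0 because no X == "1" date is at or before 2013-07, and after the shift it picks up the leading NA.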

Or (since this has a dplyr tag)

library(dplyr)
df %>%
   mutate(lastdate = c(NA_character_, as.character(date[X == "1"]))[findInterval(date, date[X == "1"]) + 1])

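As a side note, it would probably be easier to work with a numeric X rather than a character one, and with a character or zoo::yearmon date column rather than a factor (which is hard to modify).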
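
As a rough sketch of what that could look like with a numeric X and a character date column, the same findInterval idea can be wrapped in a small helper. The function last_flag_date and the data frame df2 are hypothetical names introduced for this illustration only. It uses base R and compares dates through an integer YYYYMM key, which assumes every date is a well-formed "YYYY-MM" string.

```r
# Hypothetical helper, not part of the original answer.
# date: character vector of "YYYY-MM" strings; flag: numeric 0/1 vector.
last_flag_date <- function(date, flag) {
  key  <- as.integer(sub("-", "", date, fixed = TRUE))  # "2013-07" -> 201307
  ones <- sort(unique(key[flag == 1]))                  # findInterval needs a sorted vec
  labs <- sprintf("%04d-%02d", ones %/% 100L, ones %% 100L)  # keys back to "YYYY-MM"
  # Leading NA covers rows that precede every flagged date (findInterval gives 0).
  c(NA_character_, labs)[findInterval(key, ones) + 1]
}

df2 <- data.frame(
  date = c("2013-07", "2013-08", "2013-09", "2013-11",
           "2013-11", "2013-11", "2014-02", "2014-03"),
  X    = c(0, 1, 0, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)
df2$lastdate <- last_flag_date(df2$date, df2$X)
df2
```

Sorting the flagged keys inside the helper means the rows themselves do not have to be in date order.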


r

data-manipulation

dplyr