Create a filter for dates with a condition

Question

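I want to filter a data frame using dates and a categorical factor (stand). I have the raw data in the mydf data frame and the conditions in a second data frame (mycond). I want to create a new data frame (newdf) such that, when the categorical factor (mydf$stand) is present in mycond$stand, only the rows for that factor dated on or before (<=) the matching mycond$dates are kept. If the categorical factor is not in mycond$stand, nothing should be filtered and all rows for that factor are kept. In my example: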

#Packages
library(dplyr)
library(lubridate) 

# Create my complete data frame
N<-100
ID<-1:N
stand <- rep(c("KBFAS1755G","DDHOF8674C","WJFZM8772L","EOFDS2812H","VMZWG2258I"),20)
variable <- rnorm(N)
mydf <- data.frame(ID=ID,stand=stand,variable=variable)
mydf$dates <- sample(seq(as.Date('2019/06/01'), Sys.Date(), by="day"), N)
mydf <- mydf %>%
         mutate(dates = ymd(dates)) 
str(mydf) 

#'data.frame':  100 obs. of  5 variables:
# $ ID      : chr  "1" "2" "3" "4" ...
# $ stand   : chr  "KBFAS1755G" "DDHOF8674C" "WJFZM8772L" "EOFDS2812H" ...
# $ date    : Date, format: NA NA NA NA ...
# $ variable: chr  "-1.07890610087943" "0.290143376807384" "0.395138836710153" 
#"-0.310578696329384" ...
# $ dates   : Date, format: "2021-02-23" "2020-04-23" "2019-04-03" "2020-02-19" ...   

 

# Create my conditional data frame
stand<-c("KBFAS1755G","DDHOF8674C","EOFDS2812H")
mycond <- data.frame(stand=stand)
mycond$dates<-c('2021/05/03','2021/01/01','2021/02/12')
mycond <- mycond %>%
         mutate(dates = ymd(dates)) 
str(mycond)

#'data.frame':  3 obs. of  2 variables:
# $ stand: chr  "KBFAS1755G" "DDHOF8674C" "EOFDS2812H"
# $ dates: Date, format: "2021-05-03" "2021-01-01" "2021-02-12"

# Create a new data frame with the rows on or before the cutoff dates, using filter() or another approach
newdf %>% group_by(stand,dates) %>% dplyr::filter(...)

My brain is frozen here: I need a new data frame (newdf) containing the data on or before the dates in mycond. Is there a way to do this?

Answer 1

You can join the two datasets by 'stand' and keep the rows where the date in mydf is less than or equal to the date in mycond.

library(dplyr)

mydf %>% inner_join(mycond, by = 'stand') %>% filter(dates.x <= dates.y)
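
Note that inner_join() returns only rows whose stand appears in mycond, so stands without a cutoff date are dropped. The question asks to keep every row for those stands, so a possible variant is a left join followed by a filter that lets unmatched rows through. The sketch below uses a small made-up data set and a hypothetical column name (cutoff); it is one way to do this, not the only one.

```r
library(dplyr)

# Small hypothetical data with the same column names as the question
mydf <- data.frame(
  ID    = 1:6,
  stand = c("KBFAS1755G", "KBFAS1755G", "DDHOF8674C",
            "DDHOF8674C", "WJFZM8772L", "WJFZM8772L"),
  dates = as.Date(c("2021-05-01", "2021-06-01", "2020-12-31",
                    "2021-01-02", "2019-07-01", "2021-08-01"))
)
mycond <- data.frame(
  stand = c("KBFAS1755G", "DDHOF8674C"),
  dates = as.Date(c("2021-05-03", "2021-01-01"))
)

newdf <- mydf %>%
  # Rename the cutoff column first so the join creates no .x/.y suffixes
  left_join(rename(mycond, cutoff = dates), by = "stand") %>%
  # cutoff is NA for stands absent from mycond: keep all of their rows
  filter(is.na(cutoff) | dates <= cutoff) %>%
  select(-cutoff)

newdf
```

Here WJFZM8772L has no entry in mycond, so both of its rows survive, while the other two stands keep only the rows on or before their cutoff.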

In base R:

subset(merge(mydf, mycond, by = 'stand'), dates.x <= dates.y)
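
merge() performs an inner join by default, so stands that are missing from mycond do not appear in this result. If those rows should be kept, as the question describes, one option is all.x = TRUE plus an is.na() check. The sketch below uses made-up data and a hypothetical column name (cutoff), under the assumption that each stand has at most one row in mycond.

```r
# Small hypothetical data with the same column names as the question
mydf <- data.frame(
  ID    = 1:6,
  stand = c("KBFAS1755G", "KBFAS1755G", "DDHOF8674C",
            "DDHOF8674C", "WJFZM8772L", "WJFZM8772L"),
  dates = as.Date(c("2021-05-01", "2021-06-01", "2020-12-31",
                    "2021-01-02", "2019-07-01", "2021-08-01"))
)
mycond <- data.frame(
  stand = c("KBFAS1755G", "DDHOF8674C"),
  dates = as.Date(c("2021-05-03", "2021-01-01"))
)

# Rename the cutoff column so merge() does not produce dates.x / dates.y
cutoffs <- setNames(mycond, c("stand", "cutoff"))

# all.x = TRUE keeps every row of mydf; unmatched stands get cutoff = NA
merged <- merge(mydf, cutoffs, by = "stand", all.x = TRUE)

# Keep rows with no cutoff, or dated on or before the cutoff
newdf <- subset(merged, is.na(cutoff) | dates <= cutoff, select = -cutoff)

# merge() sorts by the key column, so restore the original order by ID
newdf <- newdf[order(newdf$ID), ]
newdf
```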

If you use fuzzyjoin, you can join on a range directly and avoid the filter step.

fuzzyjoin::fuzzy_inner_join(mydf, mycond, 
            by = c('stand', 'dates'), match_fun = c(`==`, `<=`))
