Filter a specific range of dates across a multi-year data set

Question

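I have a data frame that contains multiple years of data. I am trying to filter for specific dates in each of these years, but I am unsure how to do this. In the code below I filtered the data that falls into two different ranges (i.e., df2 and df3) for one year. How can I modify this code so that it works for all the years in my dataset?

I would like the code to keep only IDs that have all the dates in the range, and to exclude any data where a day is missing within the range.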

library(dplyr)
library(lubridate)

ID <-  rep(c("A","B","C", "D"), 5000)
date <-  rep_len(seq(dmy("01-01-2010"), dmy("31-01-2015"), by = "days"), 5000)
x <-  runif(length(date), min = 60000, max = 80000)
y <-  runif(length(date), min = 800000, max = 900000)

df <- data.frame(date = date,
                 x = x,
                 y = y,
                 ID)


df2 <- df %>% 
  filter(date >= "2010-01-01", date <= "2010-01-31")

df3 <- df %>% 
  filter(date >= "2010-07-01", date <= "2010-07-31")

Answer 1

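Convert 'date' to Date class (ymd), use group_split on the year of 'date' to split the data into a list, then loop over that list with map, create the two datasets by filtering on dates built with make_date, and return a list of 'df2' and 'df3' for each 'year' (a nested list).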

library(dplyr)
library(lubridate)
library(purrr)
out <- df %>%
  # Make sure 'date' is of class Date
  mutate(date = ymd(date)) %>%
  # Split into a list of data frames, one per year
  group_split(yr = year(date), .keep = FALSE) %>%
  map(~ {
    # 1 to 31 January of this piece's year
    df2 <- .x %>%
      filter(date >= make_date(year(first(date)), 1, 1),
             date <= make_date(year(first(date)), 1, 31))
    # 1 to 31 July of this piece's year
    df3 <- .x %>%
      filter(date >= make_date(year(first(date)), 7, 1),
             date <= make_date(year(first(date)), 7, 31))
    list(df2, df3)
  })
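
The result is a list with one element per year, each holding the two filtered data frames. As a minimal sketch of how such a nested list might be collapsed back into one data frame per range, the example below repeats the same split-and-filter idea on a small made-up data frame. The names `toy`, `by_year`, `jan`, `jul`, `jan_all` and `jul_all` are hypothetical and not from the original post; naming the list elements is an assumption made here so they can be extracted by name.

```r
library(dplyr)
library(lubridate)
library(purrr)

# Hypothetical toy data: one row per day over two years
toy <- data.frame(date = seq(ymd("2010-01-01"), ymd("2011-12-31"), by = "days"))
toy$x <- seq_len(nrow(toy))

# Split by year, then filter each piece with dates built by make_date()
by_year <- toy %>%
  group_split(yr = year(date), .keep = FALSE) %>%
  map(~ {
    yr <- year(first(.x$date))  # every row in this piece shares one year
    list(
      jan = filter(.x, date >= make_date(yr, 1, 1), date <= make_date(yr, 1, 31)),
      jul = filter(.x, date >= make_date(yr, 7, 1), date <= make_date(yr, 7, 31))
    )
  })

# Pull the same-named element out of every year and stack the rows
jan_all <- bind_rows(map(by_year, "jan"))
jul_all <- bind_rows(map(by_year, "jul"))
```

Whether to keep the nested list or stack it like this depends on whether later steps work per year or across all years.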

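Another option is to create an additional column in which the 'year' is the same for every row (choosing a leap year so that 29 February still parses)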

library(stringr)
df1 <- df %>%
  # Replace the four-digit year with 2020 (a leap year) and re-parse
  mutate(date1 = ymd(str_replace(as.character(date), '^\\d{4}', '2020')))

Then use the OP's code to subset on 'date1'.
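
As a hedged illustration of that last step, the sketch below builds the helper column on a small made-up data frame and applies the question's filter to it. The final step, which keeps only ID and year combinations that contain every day of the range, is one possible way to meet the "no missing days" requirement from the question; it is not part of the original answer. The names `toy`, `jan`, `jan_complete` and `yr`, and the rule "complete means 31 distinct days", are assumptions made for this example.

```r
library(dplyr)
library(lubridate)
library(stringr)

# Hypothetical toy data: two IDs, daily rows over two years
dates <- seq(ymd("2010-01-01"), ymd("2011-12-31"), by = "days")
toy <- data.frame(
  ID   = rep(c("A", "B"), each = length(dates)),
  date = rep(dates, times = 2)
)
# Drop one January day for ID "B" in 2011 to simulate a gap
toy <- toy %>% filter(!(ID == "B" & date == ymd("2011-01-15")))

jan <- toy %>%
  # Map every date onto the leap year 2020 so one range covers all years
  mutate(date1 = ymd(str_replace(as.character(date), "^\\d{4}", "2020"))) %>%
  filter(date1 >= ymd("2020-01-01"), date1 <= ymd("2020-01-31"))

# Assumption: a range is complete when all 31 distinct days are present
jan_complete <- jan %>%
  group_by(ID, yr = year(date)) %>%
  filter(n_distinct(date) == 31) %>%
  ungroup()
```

Here the January rows for ID "B" in 2011 are dropped entirely because one day is missing, while the other ID and year combinations are kept.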

Tags: r, filter, lubridate, dplyr