How do I record in a new column whether a value has been seen previously in R?

I hope this makes sense. I'm starting with a data frame (df) in R that looks like this:

Sample Type  Date
A            2020-10-05
B            2020-10-05
A            2020-10-06
B            2020-10-06
B            2020-10-06
B            2020-10-06
A            2020-10-10
A            2020-10-11
A            2020-10-11
A            2020-10-15
A            2020-10-16
A            2020-10-17

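I want to create a 'rolling data frame' that tells me whether sample type "A" or "B" was sampled within the previous 7 days. The first column would be "Sample Type", the second "Date", and the third "Sampled in last 7 days". The last column would be filled with "Yes" or "No".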

I can get as far as counting each sample type per day with the following:

library(dplyr)

count_sampletype_day <- df %>%
  group_by(sample, date) %>%
  tally

But I may be going down the wrong path!

My expected output is:

Sample Type  Date        Sampled in last 7 days
A            2020-10-05
B            2020-10-05
A            2020-10-06
B            2020-10-06
A            2020-10-07
B            2020-10-07
A            2020-10-08
B            2020-10-08
A            2020-10-09
B            2020-10-09
A            2020-10-10
B            2020-10-10
A            2020-10-11
B            2020-10-11
A            2020-10-12
B            2020-10-12
A            2020-10-13
B            2020-10-13  No
A            2020-10-14
B            2020-10-14  No
A            2020-10-15
B            2020-10-15  No
A            2020-10-16
B            2020-10-16  No
A            2020-10-17
B            2020-10-17  No

Try this solution, which uses zoo (and dplyr, which I'm inferring you are already using):

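library(dplyr)
# Build every Sample.Type x date combination over the observed date range.
eg <- expand.grid(Sample.Type = unique(dat$Sample.Type),
                  date = seq(min(dat$date), max(dat$date), by = "day"),
                  stringsAsFactors = FALSE)
dat %>%
  # Flag the rows that were actually sampled.
  mutate(a=TRUE) %>%
  full_join(eg, by = c("Sample.Type", "date")) %>%
  # Rows that exist only in the grid have a = NA, so they become FALSE.
  mutate(a=!is.na(a)) %>%
  arrange(date) %>%
  group_by(Sample.Type) %>%
  # Right-aligned window of 7 rows per type: TRUE if any row in it was sampled.
  mutate(last7 = zoo::rollapplyr(a, 7, any, partial = TRUE)) %>%
  select(-a) %>%
  ungroup() %>%
  print(n=99)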
# # A tibble: 29 x 3
#    Sample.Type date       last7
#    <chr>       <date>     <lgl>
#  1 A           2020-10-05 TRUE 
#  2 B           2020-10-05 TRUE 
#  3 A           2020-10-06 TRUE 
#  4 B           2020-10-06 TRUE 
#  5 B           2020-10-06 TRUE 
#  6 B           2020-10-06 TRUE 
#  7 A           2020-10-07 TRUE 
#  8 B           2020-10-07 TRUE 
#  9 A           2020-10-08 TRUE 
# 10 B           2020-10-08 TRUE 
# 11 A           2020-10-09 TRUE 
# 12 B           2020-10-09 TRUE 
# 13 A           2020-10-10 TRUE 
# 14 B           2020-10-10 TRUE 
# 15 A           2020-10-11 TRUE 
# 16 A           2020-10-11 TRUE 
# 17 B           2020-10-11 TRUE 
# 18 A           2020-10-12 TRUE 
# 19 B           2020-10-12 TRUE 
# 20 A           2020-10-13 TRUE 
# 21 B           2020-10-13 FALSE
# 22 A           2020-10-14 TRUE 
# 23 B           2020-10-14 FALSE
# 24 A           2020-10-15 TRUE 
# 25 B           2020-10-15 FALSE
# 26 A           2020-10-16 TRUE 
# 27 B           2020-10-16 FALSE
# 28 A           2020-10-17 TRUE 
# 29 B           2020-10-17 FALSE
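
To see what the rolling step does in isolation, here is a minimal sketch on a single made-up vector. The names `sampled_today`, `last7` and `last7_label` are hypothetical and not part of the answer above; the vector mimics sample type B, which is sampled on the first two days and then never again. The last line shows one possible way to turn the logical result into the "Yes"/"No" labels the question asks for.

```r
library(zoo)

# Hypothetical daily flags for one sample type: TRUE = sampled on that day.
sampled_today <- c(TRUE, TRUE, rep(FALSE, 8))

# Right-aligned window of width 7. partial = TRUE lets the first six
# positions use however many earlier values are available.
last7 <- rollapplyr(sampled_today, 7, any, partial = TRUE)

# Convert the logical flags to the labels requested in the question.
last7_label <- ifelse(last7, "Yes", "No")
last7_label
```

Positions 1 to 8 still have a sampled day inside their window, while positions 9 and 10 do not, which mirrors how B flips to FALSE on 2020-10-13 in the output above.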

Data

dat <- structure(list(Sample.Type = c("A", "B", "A", "B", "B", "B", "A", "A", "A", "A", "A", "A"), date = structure(c(18540, 18540, 18541, 18541, 18541, 18541, 18545, 18546, 18546, 18550, 18551, 18552), class = "Date")), row.names = c(NA, -12L), class = "data.frame")

Once you group by Sample.Type, all you need is lag().

  1. A toy dataset. I just added a third Sample.Type.
library(dplyr)
library(lubridate)

typeday <- tibble(
    Sample.Type = c("A", "B", "A", "B", "A", "A","B", "C", "C"),
    date = as.Date(c("2020-10-05", "2020-10-05", "2020-10-06",
                     "2020-10-06", "2020-10-11", "2020-10-17",
                     "2020-10-17", "2020-10-17", "2020-10-18"))
    )

typeday
#> # A tibble: 9 x 2
#>   Sample.Type date      
#>   <chr>       <date>    
#> 1 A           2020-10-05
#> 2 B           2020-10-05
#> 3 A           2020-10-06
#> 4 B           2020-10-06
#> 5 A           2020-10-11
#> 6 A           2020-10-17
#> 7 B           2020-10-17
#> 8 C           2020-10-17
#> 9 C           2020-10-18
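  2. Then make sure the rows are ordered correctly by type and date. After grouping by Sample.Type, check whether the previous date (lag(date)) falls within 7 days of the current date. From there it is just a matter of tidying up the sampled column; the first row of each type has no previous date, so it gets "no". After ungrouping you can also arrange by date alone.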
typeday %>% 
    arrange(Sample.Type, date) %>% 
    group_by(Sample.Type) %>% 
    mutate(
        sampled = lag(date) >= date - days(7),
        sampled = case_when(
            sampled ~ "yes",
            !sampled | is.na(sampled) ~ "no"
        )
    ) %>% 
    ungroup() %>% 
    arrange(date)
#> # A tibble: 9 x 3
#>   Sample.Type date       sampled
#>   <chr>       <date>     <chr>  
#> 1 A           2020-10-05 no     
#> 2 B           2020-10-05 no     
#> 3 A           2020-10-06 yes    
#> 4 B           2020-10-06 yes    
#> 5 A           2020-10-11 yes    
#> 6 A           2020-10-17 yes    
#> 7 B           2020-10-17 no     
#> 8 C           2020-10-17 no     
#> 9 C           2020-10-18 yes
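
The comparison inside mutate() can be checked by hand on a single group. The sketch below uses the three dates of type B from the toy data; the names `dates`, `prev`, `within7` and `sampled` are hypothetical, and it subtracts the plain number 7 from a Date (base R treats that as days) rather than using lubridate's days(7).

```r
library(dplyr)

# Dates for one sample type (type B in the toy data), already sorted.
dates <- as.Date(c("2020-10-05", "2020-10-06", "2020-10-17"))

# lag() shifts the vector by one position, so the first element is NA.
prev <- lag(dates)

# TRUE when the previous sample is no more than 7 days before the current one.
within7 <- prev >= dates - 7

# Treat the NA for the first observation as "no".
sampled <- ifelse(!is.na(within7) & within7, "yes", "no")
sampled
```

The result is "no", "yes", "no", which matches the rows for B in the output above.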

Created on 2021-06-01 by the reprex package (v2.0.0)