Create a column for days sampled, e.g. 0, 10, 30 days, starting with 0 days for every study area?

Question

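I would like to create some sampling effort curves for species data. There are several study areas, each with multiple sampling plots that were resampled over a certain period of time. My dataset looks similar to this: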

    df1 <- data.frame(PlotID = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E"),
                  species = c("x","x","x1","x","x1","x2","x1","x3","x4","x4","x5","x5","x","x3","x","x3","x3","x4","x5","x","x1","x2","x3"),
                  date = as.Date(c("27-04-1995",    "26-05-1995",   "02-08-1995",   "02-05-1995",   "28-09-1995",   "02-08-1994",   "31-05-1995",   "27-07-1995",   "06-12-1995",   "03-05-1996",   "27-04-1995",   "31-05-1995",   "29-06-1994",   "30-08-1995",   "26-05-1994",   "30-05-1995",   "30-06-1995",   "30-06-1995",   "30-06-1995",   "30-08-1995",   "31-08-1995",   "01-09-1995","02-09-1995"),'%d-%m-%Y'),
                  area= c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C"))

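What I really want is an output with an additional column for the sampling time, e.g. 0, 10, 30 days across the whole data frame, but with the time starting at 0 for each area. I tried this: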

effort<-df1%>% arrange(PlotID, date,species) %>% group_by(area) %>%
  mutate(diffDate = difftime(date, lag(date,1))) %>% ungroup()

But somehow my code produces nonsense. Could anyone advise?

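In the end I want to achieve something like the example below: a list of matrices, one per study area, with species as rows, but with time (in days, showing the increasing sampling effort) as columns instead of sampling plots. The example shows a dataset from the iNEXT package. However, I am stuck on calculating the number of sampling days between the sampling dates for each area. For now I only want this additional column showing, for each area, the days between sampling events and the species found. I hope this is a bit clearer now?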

Edit: these are the dates in my real dataset:

output from dput(head(my.data))
date= structure(c(801878400, 798940800, 780710400, 769910400, 775785600, 798940800), class = c("POSIXct", "POSIXt"), tzone = "UTC")

Answer 1

I solved it with a for loop:

areas <- unique(df1$area)

df1$diffdate <- 0
for (i in 1:length(areas)){
  df1$diffdate[df1$area == areas[i]] <- df1$date[df1$area == areas[i]] - min(df1$date[df1$area == areas[i]])
}
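The same per-area offset can also be computed without an explicit loop. The following is only a sketch using base R's `ave()`; the `toy` data frame and its values are made up for illustration and merely mimic the `area` and `date` columns of `df1`.

```r
# Toy data (hypothetical values; same column names as df1 in the question)
toy <- data.frame(
  area = c("A", "A", "A", "B", "B"),
  date = as.Date(c("1995-04-27", "1994-08-02", "1995-05-02",
                   "1994-05-26", "1994-06-29"))
)

# ave() applies FUN within each level of `area` and returns the result
# in the original row order, so no sorting is needed beforehand.
# as.numeric() on a Date gives days since 1970-01-01.
toy$diffdate <- ave(as.numeric(toy$date), toy$area,
                    FUN = function(d) d - min(d))
toy
```

Each area's earliest sample gets 0 and every other row gets the number of days since that first sample.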

Answer 2

One possible tidyverse solution is:

library(dplyr)

df1 %>% arrange(area, date) %>% 
  group_by(area)  %>% 
  mutate(diff_date_from_start = date - min(date), 
         diff_date_from_prev = date - lag(date))
#> # A tibble: 23 x 6
#> # Groups:   area [3]
#>    PlotID species date       area  diff_date_from_start diff_date_from_prev
#>    <chr>  <chr>   <date>     <chr> <drtn>               <drtn>             
#>  1 B      x2      1994-08-02 A       0 days              NA days           
#>  2 A      x       1995-04-27 A     268 days             268 days           
#>  3 A      x       1995-05-02 A     273 days               5 days           
#>  4 A      x       1995-05-26 A     297 days              24 days           
#>  5 B      x1      1995-05-31 A     302 days               5 days           
#>  6 B      x3      1995-07-27 A     359 days              57 days           
#>  7 A      x1      1995-08-02 A     365 days               6 days           
#>  8 A      x1      1995-09-28 A     422 days              57 days           
#>  9 B      x4      1995-12-06 A     491 days              69 days           
#> 10 B      x4      1996-05-03 A     640 days             149 days           
#> # … with 13 more rows

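The diff_date_from_prev variable may make more sense if you also group by other variables, e.g. species and PlotID.
diff_date_from_start gives the difference in days between the current sample and the first sample in each area.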
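To illustrate the point about grouping by additional variables, here is a minimal sketch that computes the gap to the previous sample within each combination of area, PlotID and species. The `toy` data frame is a hypothetical subset shaped like `df1`, and choosing these three grouping columns is an assumption about what "previous sample" should mean.

```r
library(dplyr)

# Toy data (hypothetical subset with the same columns as df1)
toy <- data.frame(
  PlotID  = c("A", "A", "A", "B", "B"),
  species = c("x", "x", "x1", "x4", "x4"),
  date    = as.Date(c("1995-04-27", "1995-05-02", "1995-08-02",
                      "1995-12-06", "1996-05-03")),
  area    = "A"
)

res <- toy %>%
  arrange(area, PlotID, species, date) %>%
  group_by(area, PlotID, species) %>%
  # days since the previous record of the same species in the same plot;
  # the first record of each group has no predecessor and stays NA
  mutate(diff_date_from_prev = as.integer(date - lag(date))) %>%
  ungroup()
res
```

With this grouping, a gap is only measured between repeated records of the same species in the same plot, rather than between unrelated consecutive rows of an area.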

Edit in response to a comment:

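Your date is stored as a POSIXct object rather than as class Date. If the time zone is not relevant, I find Date easier to work with, so one option is to convert it with as.Date() and then apply the operations described above. Alternatively, as @Rui Barradas suggested in the comments, you can use the difftime() function and specify the units accordingly.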

df1 <- data.frame(PlotID = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E"),
                  species = c("x","x","x1","x","x1","x2","x1","x3","x4","x4","x5","x5","x","x3","x","x3","x3","x4","x5","x","x1","x2","x3"),
                  # date stored as POSIXct, not as Date; these are different classes.
                  # `format` must be named: the second positional argument of as.POSIXct() is `tz`.
                  date = as.POSIXct(c("27-04-1995",    "26-05-1995",   "02-08-1995",   "02-05-1995",   "28-09-1995",   "02-08-1994",   "31-05-1995",   "27-07-1995",   "06-12-1995",   "03-05-1996",   "27-04-1995",   "31-05-1995",   "29-06-1994",   "30-08-1995",   "26-05-1994",   "30-05-1995",   "30-06-1995",   "30-06-1995",   "30-06-1995",   "30-08-1995",   "31-08-1995",   "01-09-1995","02-09-1995"), format = '%d-%m-%Y', tz = "UTC"),
                  area= c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C"))


library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df1 %>% arrange(area, date) %>% 
  group_by(area)  %>% 
  mutate(
    date = as.Date(date),
    diff_date_from_start = date - min(date)
  )
#> # A tibble: 23 x 5
#> # Groups:   area [3]
#>    PlotID species date       area  diff_date_from_start
#>    <chr>  <chr>   <date>     <chr> <drtn>              
#>  1 B      x2      1994-08-02 A       0 days            
#>  2 A      x       1995-04-27 A     268 days            
#>  3 A      x       1995-05-02 A     273 days            
#>  4 A      x       1995-05-26 A     297 days            
#>  5 B      x1      1995-05-31 A     302 days            
#>  6 B      x3      1995-07-27 A     359 days            
#>  7 A      x1      1995-08-02 A     365 days            
#>  8 A      x1      1995-09-28 A     422 days            
#>  9 B      x4      1995-12-06 A     491 days            
#> 10 B      x4      1996-05-03 A     640 days            
#> # … with 13 more rows

# or, as suggested by Rui Barradas, use the difftime() function and keep your date as POSIXct
df1 %>% arrange(area, date) %>% 
  group_by(area)  %>% 
  mutate(
    diff_date_time = difftime(date, min(date), units = "days")
  )
#> # A tibble: 23 x 5
#> # Groups:   area [3]
#>    PlotID species date                area  diff_date_time
#>    <chr>  <chr>   <dttm>              <chr> <drtn>        
#>  1 B      x2      1994-08-02 00:00:00 A       0 days      
#>  2 A      x       1995-04-27 00:00:00 A     268 days      
#>  3 A      x       1995-05-02 00:00:00 A     273 days      
#>  4 A      x       1995-05-26 00:00:00 A     297 days      
#>  5 B      x1      1995-05-31 00:00:00 A     302 days      
#>  6 B      x3      1995-07-27 00:00:00 A     359 days      
#>  7 A      x1      1995-08-02 00:00:00 A     365 days      
#>  8 A      x1      1995-09-28 00:00:00 A     422 days      
#>  9 B      x4      1995-12-06 00:00:00 A     491 days      
#> 10 B      x4      1996-05-03 00:00:00 A     640 days      
#> # … with 13 more rows

Created on 2021-06-13 by the reprex package (v2.0.0)
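As a quick check on the asker's real data, the sketch below applies both options to the six timestamps from the dput() output in the question. It assumes the time of day can be ignored (all six values fall on midnight UTC). One reason to pass `units = "days"` explicitly is that plain subtraction of POSIXct values lets R pick the unit automatically, which may turn out to be seconds.

```r
# The six timestamps from the dput() output in the question
date <- structure(c(801878400, 798940800, 780710400,
                    769910400, 775785600, 798940800),
                  class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Option 1: convert to Date first, then subtract (Date differences are in days)
d <- as.Date(date)
days_date <- as.numeric(d - min(d))

# Option 2: keep POSIXct and request days explicitly from difftime()
days_posix <- as.numeric(difftime(date, min(date), units = "days"))

days_date
days_posix
```

Both vectors should agree here; with timestamps that carry a time of day, the second option would return fractional days while the first would not.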

Answer 3

Do you want a sequence of dates for each area group, counted in steps of 10 days?

library(dplyr)
library(tidyr)

df1 %>% 
  arrange(PlotID, date, species) %>% 
  group_by(area) %>%
  complete(date = full_seq(date, 1)) %>%
  mutate(species = zoo::na.locf(species),
         PlotID = zoo::na.locf(PlotID),
         diffDate = 10*as.integer(date - first(date)) %/% 10) %>%
  ungroup() %>%
  group_by(diffDate) %>%
  filter(row_number() == 1)
## A tibble: 65 x 5
## Groups:   diffDate [65]
#   area  date       PlotID species diffDate
#   <chr> <date>     <chr>  <chr>      <dbl>
# 1 A     1994-08-02 B      x2             0
# 2 A     1994-08-12 B      x2            10
# 3 A     1994-08-22 B      x2            20
# 4 A     1994-09-01 B      x2            30
# 5 A     1994-09-11 B      x2            40
# 6 A     1994-09-21 B      x2            50
# 7 A     1994-10-01 B      x2            60
# 8 A     1994-10-11 B      x2            70
# 9 A     1994-10-21 B      x2            80
#10 A     1994-10-31 B      x2            90
## … with 55 more rows
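If the aim is instead to keep only the observed samples and assign each one to a 10-day bin counted from its own area's first sample, a simpler variant may be enough. This is only a sketch on made-up data; the `toy` data frame and the `days` helper column are hypothetical, and the bin width of 10 is taken from the question's example.

```r
library(dplyr)

# Toy data (hypothetical values; same column names as df1)
toy <- data.frame(
  area = c("A", "A", "A", "B", "B"),
  date = as.Date(c("1994-08-02", "1994-08-09", "1994-08-27",
                   "1995-04-27", "1995-05-31"))
)

binned <- toy %>%
  group_by(area) %>%
  mutate(days     = as.integer(date - min(date)),  # days since the area's first sample
         diffDate = 10L * (days %/% 10L)) %>%      # floor to the start of the 10-day bin
  ungroup()
binned
```

Because the grouping is by area, every area starts its own count at 0 and no rows from other areas are dropped.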
