Create a column for days sampled, e.g. 0, 10, 30 days, starting at 0 days for every study area?
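I would like to create some sampling effort curves for species data. There are several study areas, each with a number of sampling plots that were resampled over a certain period. My data set looks similar to this: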
df1 <- data.frame(PlotID = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E"),
species = c("x","x","x1","x","x1","x2","x1","x3","x4","x4","x5","x5","x","x3","x","x3","x3","x4","x5","x","x1","x2","x3"),
date = as.Date(c("27-04-1995", "26-05-1995", "02-08-1995", "02-05-1995", "28-09-1995", "02-08-1994", "31-05-1995", "27-07-1995", "06-12-1995", "03-05-1996", "27-04-1995", "31-05-1995", "29-06-1994", "30-08-1995", "26-05-1994", "30-05-1995", "30-06-1995", "30-06-1995", "30-06-1995", "30-08-1995", "31-08-1995", "01-09-1995","02-09-1995"),'%d-%m-%Y'),
area= c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C"))
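What I really want is an output with an additional column of sampling time, e.g. 0, 10, 30 days across the whole data frame, but with time starting at 0 for each area. I tried this: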
effort<-df1%>% arrange(PlotID, date,species) %>% group_by(area) %>%
mutate(diffDate = difftime(date, lag(date,1))) %>% ungroup()
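But somehow my code produces nonsense. Could anyone advise?
In the end I would like to achieve something like a data set from the iNEXT package: a list of matrices, one per study area, with species as rows and, instead of sampling plots, time as columns (in days, showing the increasing sampling effort). But I am stuck on calculating the days between sampling dates for each area. For now, I just want this additional column showing, per area, the days between sampling events and the species found. I hope that is a bit clearer now?
EDIT: These are the dates in my real data set: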
output from dput(head(my.data))
date= structure(c(801878400, 798940800, 780710400, 769910400, 775785600, 798940800), class = c("POSIXct", "POSIXt"), tzone = "UTC")
I solved it with a for loop:
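areas <- unique(df1$area)
df1$diffdate <- 0
for (a in areas) {
in_area <- df1$area == a # rows belonging to this area
# days since the first sampling date in this area; explicit units so POSIXct dates also give days
df1$diffdate[in_area] <- as.numeric(difftime(df1$date[in_area], min(df1$date[in_area]), units = "days"))
}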
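The same per-area subtraction can also be written without an explicit loop. The following is only a sketch on a small, hypothetical data frame (`toy` is not part of the question's data); it uses base R's `ave()`, which applies a function within each group and returns the result in the original row order.

```r
# Hypothetical toy data, cut down from the structure of df1
toy <- data.frame(
  area = c("A", "A", "A", "B", "B"),
  date = as.Date(c("1995-04-27", "1994-08-02", "1995-05-02",
                   "1994-05-26", "1995-08-30"))
)

# as.numeric() on a Date gives days since 1970-01-01, so subtracting the
# group minimum yields days since the first sampling date in each area.
toy$diffdate <- ave(as.numeric(toy$date), toy$area,
                    FUN = function(d) d - min(d))

toy$diffdate
#> [1] 268   0 273   0 461
```

Because `ave()` keeps the row order, the data frame does not need to be sorted first.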
A possible tidyverse solution is:
library(dplyr)
df1 %>% arrange(area, date) %>%
group_by(area) %>%
mutate(diff_date_from_start = date - min(date),
diff_date_from_prev = date - lag(date))
#> # A tibble: 23 x 6
#> # Groups: area [3]
#> PlotID species date area diff_date_from_start diff_date_from_prev
#> <chr> <chr> <date> <chr> <drtn> <drtn>
#> 1 B x2 1994-08-02 A 0 days NA days
#> 2 A x 1995-04-27 A 268 days 268 days
#> 3 A x 1995-05-02 A 273 days 5 days
#> 4 A x 1995-05-26 A 297 days 24 days
#> 5 B x1 1995-05-31 A 302 days 5 days
#> 6 B x3 1995-07-27 A 359 days 57 days
#> 7 A x1 1995-08-02 A 365 days 6 days
#> 8 A x1 1995-09-28 A 422 days 57 days
#> 9 B x4 1995-12-06 A 491 days 69 days
#> 10 B x4 1996-05-03 A 640 days 149 days
#> # … with 13 more rows
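- The diff_date_from_prev variable might make more sense if you also group by other variables, e.g. species and PlotID.
- diff_date_from_start calculates the difference in days between the current sample and the first sample of each area.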
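Once a days-since-start column exists, the list of species-by-time matrices mentioned in the question could be built roughly as follows. This is a sketch under assumptions: the data frame `toy`, the column name `days`, and the 0/1 incidence coding are illustrative choices, and whether this layout matches what iNEXT expects should be checked against that package's documentation.

```r
# Hypothetical toy data: one row per species record
toy <- data.frame(
  area    = c("A", "A", "A", "B", "B"),
  species = c("x", "x1", "x", "x3", "x"),
  date    = as.Date(c("1994-08-02", "1994-08-12", "1994-08-12",
                      "1994-05-26", "1994-06-05"))
)

# Days since the first sampling date within each area
toy$days <- ave(as.numeric(toy$date), toy$area,
                FUN = function(d) d - min(d))

# One matrix per area: species as rows, days since start as columns,
# 1 if the species was recorded on that day and 0 otherwise
inc_list <- lapply(split(toy, toy$area), function(d) {
  m <- unclass(table(d$species, d$days))
  (m > 0) * 1L
})

inc_list$A
```

Each element of `inc_list` only contains the species and sampling days that occur in that area, so the matrices can differ in size.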
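Edit in reply to the comments:
Your date is stored as POSIX rather than as the Date class. If time zones are not relevant, I find it easier to work with Date, so one option is to convert with as.Date() and then apply the operations described above. Alternatively, you can use the difftime() function, as suggested by @Rui Barradas in the comments, and specify the units accordingly.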
df1 <- data.frame(PlotID = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D","E","E","E"),
species = c("x","x","x1","x","x1","x2","x1","x3","x4","x4","x5","x5","x","x3","x","x3","x3","x4","x5","x","x1","x2","x3"),
# date stored as POSIXct, not as Date. They are different classes.
# format must be passed by name: the second positional argument of as.POSIXct() is tz, not format.
date = as.POSIXct(c("27-04-1995", "26-05-1995", "02-08-1995", "02-05-1995", "28-09-1995", "02-08-1994", "31-05-1995", "27-07-1995", "06-12-1995", "03-05-1996", "27-04-1995", "31-05-1995", "29-06-1994", "30-08-1995", "26-05-1994", "30-05-1995", "30-06-1995", "30-06-1995", "30-06-1995", "30-08-1995", "31-08-1995", "01-09-1995","02-09-1995"), format = '%d-%m-%Y', tz = "UTC"),
area= c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C"))
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df1 %>% arrange(area, date) %>%
group_by(area) %>%
mutate(
date = as.Date(date),
diff_date_from_start = date - min(date)
)
#> # A tibble: 23 x 5
#> # Groups: area [3]
#> PlotID species date area diff_date_from_start
#> <chr> <chr> <date> <chr> <drtn>
#> 1 B x2 1994-08-02 A 0 days
#> 2 A x 1995-04-27 A 268 days
#> 3 A x 1995-05-02 A 273 days
#> 4 A x 1995-05-26 A 297 days
#> 5 B x1 1995-05-31 A 302 days
#> 6 B x3 1995-07-27 A 359 days
#> 7 A x1 1995-08-02 A 365 days
#> 8 A x1 1995-09-28 A 422 days
#> 9 B x4 1995-12-06 A 491 days
#> 10 B x4 1996-05-03 A 640 days
#> # … with 13 more rows
# or, as suggested by Rui Barradas, use the difftime() function and keep your date as a POSIXct class
df1 %>% arrange(area, date) %>%
group_by(area) %>%
mutate(
diff_date_time = difftime(date, min(date), units = "days")
)
#> # A tibble: 23 x 5
#> # Groups: area [3]
#> PlotID species date area diff_date_time
#> <chr> <chr> <dttm> <chr> <drtn>
#> 1 B x2 1994-08-02 00:00:00 A 0 days
#> 2 A x 1995-04-27 00:00:00 A 268 days
#> 3 A x 1995-05-02 00:00:00 A 273 days
#> 4 A x 1995-05-26 00:00:00 A 297 days
#> 5 B x1 1995-05-31 00:00:00 A 302 days
#> 6 B x3 1995-07-27 00:00:00 A 359 days
#> 7 A x1 1995-08-02 00:00:00 A 365 days
#> 8 A x1 1995-09-28 00:00:00 A 422 days
#> 9 B x4 1995-12-06 00:00:00 A 491 days
#> 10 B x4 1996-05-03 00:00:00 A 640 days
#> # … with 13 more rows
Created on 2021-06-13 by the reprex package (v2.0.0)
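As a quick check on the real data, both options can be tried on the six timestamps from the question's dput() output. This is only a sketch: the dput() output contains no area column, so the minimum is taken over all six values rather than per area.

```r
# The six timestamps from the question's dput(head(my.data)) output
date <- structure(c(801878400, 798940800, 780710400,
                    769910400, 775785600, 798940800),
                  class = c("POSIXct", "POSIXt"), tzone = "UTC")

# Option 1: convert to Date first; Date subtraction is always in days
days_a <- as.Date(date) - min(as.Date(date))

# Option 2: keep POSIXct and ask difftime() for days explicitly,
# instead of letting it pick the units automatically
days_b <- difftime(date, min(date), units = "days")

as.numeric(days_a)
#> [1] 370 336 125   0  68 336
as.numeric(days_b)
#> [1] 370 336 125   0  68 336
```

Both routes agree here because all timestamps fall on midnight UTC.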
Do you want a sequence of dates in 10-day steps for each area group?
library(dplyr)
library(tidyr)
df1 %>%
arrange(PlotID, date, species) %>%
group_by(area) %>%
complete(date = full_seq(date, 1)) %>%
mutate(species = zoo::na.locf(species),
PlotID = zoo::na.locf(PlotID),
diffDate = 10*as.integer(date - first(date)) %/% 10) %>%
ungroup() %>%
group_by(diffDate) %>%
filter(row_number() == 1)
## A tibble: 65 x 5
## Groups: diffDate [65]
# area date PlotID species diffDate
# <chr> <date> <chr> <chr> <dbl>
# 1 A 1994-08-02 B x2 0
# 2 A 1994-08-12 B x2 10
# 3 A 1994-08-22 B x2 20
# 4 A 1994-09-01 B x2 30
# 5 A 1994-09-11 B x2 40
# 6 A 1994-09-21 B x2 50
# 7 A 1994-10-01 B x2 60
# 8 A 1994-10-11 B x2 70
# 9 A 1994-10-21 B x2 80
#10 A 1994-10-31 B x2 90
## … with 55 more rows
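If the gaps between sampling dates do not need to be filled in, the 10-day binning step on its own can be sketched like this. The data frame `toy` and the column name `bin10` are hypothetical. In R, `%/%` binds more tightly than `*`, so the answer's `10*as.integer(...) %/% 10` computes the same thing; the parentheses below just make the order explicit.

```r
# Hypothetical toy data
toy <- data.frame(
  area = c("A", "A", "A", "B", "B"),
  date = as.Date(c("1994-08-02", "1994-08-09", "1994-08-27",
                   "1994-05-26", "1994-06-08"))
)

# Days since the first sampling date within each area
days <- ave(as.numeric(toy$date), toy$area, FUN = function(d) d - min(d))

# Integer division drops the remainder; multiplying by 10 gives the
# lower edge of the 10-day bin each record falls into
toy$bin10 <- (days %/% 10) * 10

toy$bin10
#> [1]  0  0 20  0 10
```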