Create start and endtime columns based on multiple conditions in R (dplyr, lubridate)
I have a dataset, df:
Read  Box  ID     Time
T     out         10/1/2019 9:00:01 AM
T     out         10/1/2019 9:00:02 AM
T     out         10/1/2019 9:00:03 AM
T     out         10/1/2019 9:02:59 AM
T     out         10/1/2019 9:03:00 AM
F                 10/1/2019 9:05:00 AM
T     out         10/1/2019 9:06:00 AM
T     out         10/1/2019 9:06:02 AM
T     in          10/1/2019 9:07:00 AM
T     in          10/1/2019 9:07:02 AM
T     out         10/1/2019 9:07:04 AM
T     out         10/1/2019 9:07:05 AM
T     out         10/1/2019 9:07:06 AM
           hello  10/1/2019 9:07:08 AM
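Based on certain conditions in this dataset, I would like to create a starttime column and an endtime column.
I would like a 'starttime' to be generated when all of the following hold: Read == "T", Box == "out" and ID == "".
The start time is taken from the first row where this occurs. For this dataset, the first starttime is 10/1/2019 9:00:01 AM, since that is where the required conditions are first met (Read = T, Box = out and ID = "").
An endtime is generated as soon as any one of these conditions stops holding. So the first endtime occurs just before row 6, at 10/1/2019 9:03:00 AM. My end goal is to create a duration column from these two.
This is my desired output: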
starttime              endtime                duration
10/01/2019 9:00:01 AM  10/01/2019 9:03:00 AM  179 secs
10/1/2019 9:06:00 AM   10/1/2019 9:06:02 AM   2 secs
10/1/2019 9:07:04 AM   10/1/2019 9:07:06 AM   2 secs
dput output:
structure(list(Read = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 3L, 1L), .Label = c("", "F", "T", "T "), class = "factor"),
Box = structure(c(3L, 3L, 3L, 3L, 3L, 1L, 3L, 3L, 2L, 2L,
3L, 3L, 3L, 1L), .Label = c("", "in", "out"), class = "factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L), .Label = c("", "hello"), class = "factor"),
Time = structure(1:14, .Label = c("10/1/2019 9:00:01 AM",
"10/1/2019 9:00:02 AM", "10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM",
"10/1/2019 9:03:00 AM", "10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM",
"10/1/2019 9:06:02 AM", "10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM",
"10/1/2019 9:07:04 AM", "10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM",
"10/1/2019 9:07:08 AM"), class = "factor")), class = "data.frame", row.names = c(NA,
-14L))
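I think that overall I will have to write a loop. I believe my thought process is right, I am just unsure how to write the code. This is what I am trying: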
df2 <- mutate(df,
              Time = lubridate::mdy_hms(Time))  # the column is named Time, not Date
for (i in 2:nrow(df2))
{
  if (df2$Read[[i]] == 'T') {
    # not sure what should go here
  }
}
I think this may be a start (just putting my conditions inside the loop), but I am not sure how to complete it.
Any suggestions are welcome.
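You can do this without a loop. I am using dplyr here, since pipes make it easy to chain several steps.
We first convert the Time column to POSIXct class, then create a cond column that holds a logical value for the conditions we want to check, and then create a grouping column, grp, as the cumulative sum of the negated cond column. We keep only the rows that satisfy the condition and, for each group, take the first and last values of Time and the difference between them.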
library(dplyr)

df %>%
  mutate(Time = lubridate::mdy_hms(Time),                  # parse text into POSIXct
         cond = Read == "T" & Box == "out" & ID == "",     # TRUE where all conditions hold
         grp = cumsum(!cond)) %>%                          # counter increases at every failing row
  filter(cond) %>%                                         # keep only the matching rows
  group_by(grp) %>%
  summarise(starttime = first(Time),
            endtime = last(Time),
            duration = difftime(endtime, starttime, units = "secs")) %>%
  select(-grp)

# A tibble: 3 x 3
#  starttime           endtime             duration
#  <dttm>              <dttm>              <drtn>
#1 2019-10-01 09:00:01 2019-10-01 09:03:00 179 secs
#2 2019-10-01 09:06:00 2019-10-01 09:06:02   2 secs
#3 2019-10-01 09:07:04 2019-10-01 09:07:06   2 secs
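To see why cumsum(!cond) works as a grouping key, it may help to look at the intermediate values on a tiny example. The following is only a sketch in base R; the cond vector is made up for illustration and is not the question's data.

```r
# Hypothetical logical vector: TRUE where all three conditions hold
cond <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

# Every FALSE increases the counter by one, so each unbroken run of
# TRUE rows shares a single value of grp
grp <- cumsum(!cond)

# Show the two vectors side by side
print(data.frame(cond, grp))

# Keep only the TRUE positions and split them by their grp label:
# each list element is one run of consecutive matching rows
runs <- split(which(cond), grp[cond])
print(runs)
```

The group labels are not consecutive (a run that follows two failing rows skips a number), which does not matter here because grp is only used for grouping and is dropped at the end.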
Data
I have cleaned up your data a little and used it as df.
df <- structure(list(Read = c("T", "T", "T", "T", "T", "F", "T", "T",
"T", "T", "T", "T", "T", ""), Box = c("out", "out", "out", "out",
"out", "", "out", "out", "in", "in", "out", "out", "out", "hello"
), ID = c("", "", "", "", "", "", "", "", "", "", "", "", "",
""), Time = c("10/1/2019 9:00:01 AM", "10/1/2019 9:00:02 AM",
"10/1/2019 9:00:03 AM", "10/1/2019 9:02:59 AM", "10/1/2019 9:03:00 AM",
"10/1/2019 9:05:00 AM", "10/1/2019 9:06:00 AM", "10/1/2019 9:06:02 AM",
"10/1/2019 9:07:00 AM", "10/1/2019 9:07:02 AM", "10/1/2019 9:07:04 AM",
"10/1/2019 9:07:05 AM", "10/1/2019 9:07:06 AM", "10/1/2019 9:07:08 AM"
)), row.names = c(NA, -14L), class = "data.frame")
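For reference, the dput in the question stores every column as a factor, and its Read column contains a "T " level with a trailing space, which would not compare equal to "T". One possible way to do a cleanup of this kind (not necessarily how the data above was produced) is sketched below in base R. The raw data frame is a small made-up stand-in, not the full dataset.

```r
# Small hypothetical stand-in for the factor-based data in the question
raw <- data.frame(
  Read = factor(c("T", "T ", "F", "")),
  Box  = factor(c("out", "out", "", "")),
  ID   = factor(c("", "", "", "hello")),
  Time = factor(c("10/1/2019 9:00:01 AM", "10/1/2019 9:00:02 AM",
                  "10/1/2019 9:05:00 AM", "10/1/2019 9:07:08 AM"))
)

# Convert every column to character and strip leading/trailing whitespace;
# assigning into clean[] keeps the data frame shape and column names
clean <- raw
clean[] <- lapply(raw, function(col) trimws(as.character(col)))

# "T " has become "T", so a comparison with "T" now matches both rows
print(clean$Read == "T")
```

Without the trimming step, rows whose Read value is "T " would fail the Read == "T" check and be treated as breaks between groups.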