R: Determine if each date interval overlaps with all other date intervals in a dataframe
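For each date interval row in my data frame, I would like to determine whether it overlaps with all the other date intervals or not, excluding itself.
A data frame with start and end dates, representing intervals: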
`data <- read.table(header=TRUE,text="
start.date end.date
2019-09-01 2019-09-10
2019-09-05 2019-09-07
2019-08-25 2019-09-05
2019-10-10 2019-10-15
")`
The function lubridate::int_overlaps() checks whether two date intervals overlap, returning a logical TRUE or FALSE.
`int_overlaps(interval(ymd("2019-09-01"),ymd("2019-09-10")), interval(ymd("2019-09-05"), ymd("2019-09-07")))
[1] TRUE
int_overlaps(interval(ymd("2019-09-01"),ymd("2019-09-10")), interval(ymd("2019-10-10"), ymd("2019-10-15")))
[1] FALSE`
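One detail worth checking before applying this to every row is how shared endpoints are treated. The snippet below is a small sketch, not part of the original question: the names `a`, `b` and `touching` are made up for illustration. As far as I can tell, `int_overlaps()` compares the boundaries inclusively, so two intervals that only share a single day should still be reported as overlapping.

```r
library(lubridate)

# Two intervals that only touch on 2019-09-05 (hypothetical names a and b)
a <- interval(ymd("2019-08-25"), ymd("2019-09-05"))
b <- interval(ymd("2019-09-05"), ymd("2019-09-07"))

# Check whether a shared endpoint counts as an overlap
touching <- int_overlaps(a, b)
print(touching)
```

If your end dates are meant to be exclusive, you may want to subtract one day from them first.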
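I would like to use int_overlaps() to compare each date interval against every other date interval except itself, to determine whether or not it overlaps with another interval.
The output should look like this: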
`data <- read.table(header=TRUE,text="
start.date end.date overlaps
2019-09-01 2019-09-10 TRUE
2019-09-05 2019-09-07 TRUE
2019-08-25 2019-09-05 TRUE
2019-10-10 2019-10-15 FALSE
")
`
Here is an option using dplyr and purrr, where we loop over the indexes of Int and compare the current interval with the others.
library(dplyr)
library(purrr)
library(lubridate)
data %>% mutate(Int = interval(start.date, end.date),
                # map_lgl() returns a plain logical column instead of a list column
                overlaps = map_lgl(seq_along(Int), function(x){
                  # Get all Int indexes other than the current one
                  y = setdiff(seq_along(Int), x)
                  # The interval overlaps with all other intervals
                  # return(all(int_overlaps(Int[x], Int[y])))
                  # The interval overlaps with any other interval
                  return(any(int_overlaps(Int[x], Int[y])))
                }))
start.date end.date Int overlaps
1 2019-09-01 2019-09-10 2019-09-01 UTC--2019-09-10 UTC TRUE
2 2019-09-05 2019-09-07 2019-09-05 UTC--2019-09-07 UTC TRUE
3 2019-08-25 2019-09-05 2019-08-25 UTC--2019-09-05 UTC TRUE
4 2019-10-10 2019-10-15 2019-10-10 UTC--2019-10-15 UTC FALSE
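For comparison, the same "any other interval" check can be sketched in base R without looping over indexes. This is not from the answer above, just a minimal alternative under two assumptions: the end dates are inclusive, and the data is small enough that an n x n matrix is acceptable. The names `s`, `e` and `m` are made up for illustration.

```r
data <- read.table(header = TRUE, text = "
start.date end.date
2019-09-01 2019-09-10
2019-09-05 2019-09-07
2019-08-25 2019-09-05
2019-10-10 2019-10-15
")

# Work with day numbers so the comparisons are plain numeric
s <- as.numeric(as.Date(data$start.date))
e <- as.numeric(as.Date(data$end.date))

# m[i, j] is TRUE when interval i starts on or before the end of interval j
# and ends on or after the start of interval j (closed intervals)
m <- outer(s, e, `<=`) & outer(e, s, `>=`)

# Every interval overlaps itself, so exclude the diagonal
diag(m) <- FALSE

# TRUE when a row overlaps at least one other row
data$overlaps <- rowSums(m) > 0
print(data)
```

Swapping `rowSums(m) > 0` for `rowSums(m) == nrow(data) - 1` would instead ask whether a row overlaps all of the other rows.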
I think this can be done nicely with a combination of dplyr and the ivs package, which is a package for working with interval vectors, which is what you have here.
library(ivs)
library(dplyr)
data <- tribble(
~start.date, ~end.date,
"2019-09-01", "2019-09-10",
"2019-09-05", "2019-09-07",
"2019-08-25", "2019-09-05",
"2019-10-10", "2019-10-15"
)
# Parse the dates and then convert them into an interval vector
data <- data %>%
mutate(
start = as.Date(start.date),
end = as.Date(end.date),
.keep = "unused"
) %>%
mutate(interval = iv(start, end), .keep = "unused")
# Note that interval vectors are half-open! You may need to adjust your end
# dates by 1 depending on how you interpret them.
data
#> # A tibble: 4 × 1
#> interval
#> <iv<date>>
#> 1 [2019-09-01, 2019-09-10)
#> 2 [2019-09-05, 2019-09-07)
#> 3 [2019-08-25, 2019-09-05)
#> 4 [2019-10-10, 2019-10-15)
# Use `iv_identify_group()` to identify the wider "overlap group" that rows 1-3
# fall in, noting that row 4 gets its own group. Then it is just a matter of
# grouping by `groups` and checking if there is more than one value in each group
data %>%
mutate(groups = iv_identify_group(interval)) %>%
group_by(groups) %>%
mutate(overlaps = n() > 1)
#> # A tibble: 4 × 3
#> # Groups: groups [2]
#> interval groups overlaps
#> <iv<date>> <iv<date>> <lgl>
#> 1 [2019-09-01, 2019-09-10) [2019-08-25, 2019-09-10) TRUE
#> 2 [2019-09-05, 2019-09-07) [2019-08-25, 2019-09-10) TRUE
#> 3 [2019-08-25, 2019-09-05) [2019-08-25, 2019-09-10) TRUE
#> 4 [2019-10-10, 2019-10-15) [2019-10-10, 2019-10-15) FALSE
Created on 2022-04-05 by the reprex package (v2.0.1)
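If the end dates are meant to be inclusive, the half-open note above matters. Here is a hedged sketch of one way to handle it, assuming inclusive end dates and using `iv_count_overlaps()` from ivs to count matches per row rather than grouping. The names `x`, `counts` and `overlaps` are made up for illustration.

```r
library(ivs)

start <- as.Date(c("2019-09-01", "2019-09-05", "2019-08-25", "2019-10-10"))
end <- as.Date(c("2019-09-10", "2019-09-07", "2019-09-05", "2019-10-15"))

# Assumption: end dates are inclusive, so shift them by one day to build
# the half-open interval [start, end + 1)
x <- iv(start, end + 1)

# Count how many intervals in x each interval overlaps. Every interval
# overlaps itself, so a count above 1 means some other row overlaps it.
counts <- iv_count_overlaps(x, x)
overlaps <- counts > 1
print(overlaps)
```

Counting overlaps directly skips the grouping step, although the grouped version also gives you the merged range for free.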