What is happening here in lubridate::unique.Interval?
Example data adapted from: https://www.reddit.com/r/rstats/comments/4j2efe/help_counting_unique_days_in_r_with_overlap_and/
library(dplyr)
library(lubridate)

df <- read.table(text = "Start End
1/8/2015 1/9/2015
1/8/2015 1/9/2015
1/13/2015 1/15/2015
1/7/2015 1/17/2015
1/12/2015 1/22/2015
1/8/2015 1/16/2015", header = TRUE)
Create the intervals:
df %>% transmute(Start = mdy(Start), End = mdy(End), Interval = interval(Start, End))
Start End Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC
Find the unique intervals. What happened to the interval 2015-01-12 UTC--2015-01-22 UTC? It is gone. Is this intended behavior?
.Last.value %>% select(Interval) %>% unique
Interval
1 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
6 2015-01-08 UTC--2015-01-16 UTC
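2015-01-12 UTC--2015-01-22 UTC was dropped because it is treated as a duplicate of 2015-01-07 UTC--2015-01-17 UTC: the two are not identical objects, but they compare equal under the == operator. Both intervals are exactly 10 days long, so the comparison appears to look at the interval's length rather than at its start and end points.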
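To see how these two intervals can compare equal without being identical, a small check with lubridate's int_length(), int_start() and int_end() accessors compares their lengths and endpoints directly. This is a sketch that rebuilds only rows 4 and 5 of the example data with ymd(); the variable names are illustrative:

```r
library(lubridate)

# Rows 4 and 5 of the example data
i4 <- interval(ymd("2015-01-07"), ymd("2015-01-17"))
i5 <- interval(ymd("2015-01-12"), ymd("2015-01-22"))

# Same length: both span 10 days, i.e. 864000 seconds
len4 <- int_length(i4)
len5 <- int_length(i5)
same_length <- len4 == len5

# Different positions on the time line
same_start <- int_start(i4) == int_start(i5)
same_end   <- int_end(i4) == int_end(i5)
```

Equal lengths but different starts and ends is consistent with an equality test that only looks at how long an interval is.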
> intervalDf
Start End Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC
> intervalDf[4,3]
[1] 2015-01-07 UTC--2015-01-17 UTC
> intervalDf[5,3]
[1] 2015-01-12 UTC--2015-01-22 UTC
> intervalDf[4,3] == intervalDf[5,3]
[1] TRUE
However:
> identical(intervalDf[4,3], intervalDf[5,3])
[1] FALSE
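This may also mean that unique uses == as its comparison function. If you want to keep both intervals, you can convert the Interval column to character and then apply unique.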
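A minimal sketch of that idea builds a text key from each interval's endpoints with lubridate's int_start() and int_end(), instead of converting the Interval object itself, so that two intervals only count as duplicates when they start and end at the same instant. The names starts, ends, iv, key and keep are illustrative:

```r
library(lubridate)

# The six example intervals
starts <- mdy(c("1/8/2015", "1/8/2015", "1/13/2015", "1/7/2015", "1/12/2015", "1/8/2015"))
ends   <- mdy(c("1/9/2015", "1/9/2015", "1/15/2015", "1/17/2015", "1/22/2015", "1/16/2015"))
iv <- interval(starts, ends)

# Key on both endpoints rather than on the interval's length
key  <- paste(int_start(iv), int_end(iv), sep = "--")
keep <- !duplicated(key)

# Only the true copy (row 2) is removed; row 5 survives
unique_iv <- iv[keep]
```

The same logical vector can subset a data frame, e.g. intervalDf[keep, , drop = FALSE], to keep whole rows.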
Update:
unique behaves inconsistently on single-column and multi-column data frames:
> dfTest
x Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
2 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest)
x Interval
1 1 2015-01-08 UTC--2015-01-09 UTC
3 1 2015-01-13 UTC--2015-01-15 UTC
4 1 2015-01-07 UTC--2015-01-17 UTC
5 1 2015-01-12 UTC--2015-01-22 UTC
6 1 2015-01-08 UTC--2015-01-16 UTC
> dfTest1
Interval
1 2015-01-08 UTC--2015-01-09 UTC
2 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
5 2015-01-12 UTC--2015-01-22 UTC
6 2015-01-08 UTC--2015-01-16 UTC
> unique(dfTest1)
Interval
1 2015-01-08 UTC--2015-01-09 UTC
3 2015-01-13 UTC--2015-01-15 UTC
4 2015-01-07 UTC--2015-01-17 UTC
6 2015-01-08 UTC--2015-01-16 UTC
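The two method definitions below explain the difference. unique.data.frame delegates to duplicated.data.frame. When the data frame has more than one column, that method pastes each row into a single string and compares the strings; when it has exactly one column, it calls duplicated() directly on that column, here the Interval vector itself.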
> getAnywhere("unique.data.frame")
A single object matching ‘unique.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for unique from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
}
<bytecode: 0x10c2ab0a0>
<environment: namespace:base>
> getAnywhere("duplicated.data.frame")
A single object matching ‘duplicated.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for duplicated from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    if (length(x) != 1L)
        duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    else duplicated(x[[1L]], fromLast = fromLast, ...)
}
<bytecode: 0x10c33a4b0>
<environment: namespace:base>
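The two branches of duplicated.data.frame can be mirrored by hand with base R alone. The sketch below uses a hypothetical helper, dup_rows(), that copies the two branches of the definition quoted above (newer R releases may implement that method differently internally). It models each example interval by its start date and its length in days, on the assumption that this mirrors how a lubridate Interval is stored (a start time plus a length in seconds):

```r
# Hypothetical helper copying the two branches of the quoted
# duplicated.data.frame definition
dup_rows <- function(x) {
  if (length(x) != 1L) {
    # Multi-column branch: compare each row as one pasted string
    duplicated(do.call("paste", c(x, sep = "\r")))
  } else {
    # Single-column branch: delegate to duplicated() on the column itself
    duplicated(x[[1L]])
  }
}

# Start date and length (in days) of the six example intervals
starts <- as.Date(c("2015-01-08", "2015-01-08", "2015-01-13",
                    "2015-01-07", "2015-01-12", "2015-01-08"))
lengths_days <- c(1, 1, 2, 10, 10, 8)

# Comparing lengths only: row 5 looks like a copy of row 4
by_length <- dup_rows(data.frame(len = lengths_days))

# Comparing start and length together: only the true copy (row 2) is flagged
by_start_and_length <- dup_rows(data.frame(start = starts, len = lengths_days))
```

With only the length to go on, rows 2 and 5 are flagged, matching the one-column unique(dfTest1) output; once the start is part of each row, only row 2 is flagged, matching unique(dfTest).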