Compare a Series of intervals with itself
Given a Series of Interval objects:
s = pd.Series([
pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-05'), closed='both'),
pd.Interval(left=pd.Timestamp('2020-01-01'), right=pd.Timestamp('2020-01-02'), closed='both'),
pd.Interval(left=pd.Timestamp('2020-01-04'), right=pd.Timestamp('2020-01-05'), closed='both'),
])
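I want to check whether each pair of intervals overlaps, pairing every interval with every other one as in an outer product. Interval provides the overlaps() method for this.
The result should be an l x l matrix/DataFrame for a Series of length l, indicating whether each pair overlaps. For example: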
+--------------------------+--------------------------+--------------------------+--------------------------+
| | [2020-01-01, 2020-01-05] | [2020-01-01, 2020-01-02] | [2020-01-04, 2020-01-05] |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-05] | True | True | True |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-01, 2020-01-02] | True | True | False |
+--------------------------+--------------------------+--------------------------+--------------------------+
| [2020-01-04, 2020-01-05] | True | False | True |
+--------------------------+--------------------------+--------------------------+--------------------------+
Because the Series is fairly large, I am looking for something more performant and efficient than itertuples().
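You can use pd.IntervalIndex to get the right and left bounds easily, and then use NumPy's ufunc.outer with greater_equal and less_equal. Because the intervals are closed on both sides, intervals i and j overlap exactly when right[i] >= left[j] and left[i] <= right[j].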
import numpy as np
import pandas as pd
# work with an IntervalIndex built from the Series
idx = pd.IntervalIndex(s)
# get the right and left bounds
right = idx.right
left = idx.left
# arr[i, j] is True when right[i] >= left[j] and left[i] <= right[j]
arr = np.greater_equal.outer(right, left) & np.less_equal.outer(left, right)
# create the DataFrame if needed
print(pd.DataFrame(arr, index=s.values, columns=s.values))
[2020-01-01, 2020-01-05] [2020-01-01, 2020-01-02] \
[2020-01-01, 2020-01-05] True True
[2020-01-01, 2020-01-02] True True
[2020-01-04, 2020-01-05] True False
[2020-01-04, 2020-01-05]
[2020-01-01, 2020-01-05] True
[2020-01-01, 2020-01-02] False
[2020-01-04, 2020-01-05] True
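The >= and <= comparisons above match overlaps() only because every interval here is closed on both sides. As a rough sketch, assuming all intervals share one closed setting (which an IntervalIndex guarantees), the same idea can be wrapped in a helper that switches to strict comparisons otherwise. The name overlap_matrix is hypothetical, not a pandas function, and the bounds are converted with to_numpy() so the ufuncs operate on plain arrays:

```python
import numpy as np
import pandas as pd


def overlap_matrix(idx: pd.IntervalIndex) -> np.ndarray:
    """Hypothetical helper: boolean matrix m with m[i, j] = idx[i] overlaps idx[j]."""
    left = idx.left.to_numpy()
    right = idx.right.to_numpy()
    if idx.closed == "both":
        # shared endpoints count as overlap only when both sides are closed
        return np.less_equal.outer(left, right) & np.greater_equal.outer(right, left)
    # 'left', 'right' or 'neither': touching endpoints do not overlap
    return np.less.outer(left, right) & np.greater.outer(right, left)


# same intervals as in the question, built directly as an IntervalIndex
idx = pd.IntervalIndex.from_arrays(
    pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-04"]),
    pd.to_datetime(["2020-01-05", "2020-01-02", "2020-01-05"]),
    closed="both",
)
mat = overlap_matrix(idx)
print(pd.DataFrame(mat, index=idx, columns=idx))
```

For half-open intervals such as [0, 1) and [1, 2), the strict branch reports no overlap, which is also what Interval.overlaps returns for that pair.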
It looks like you can also use overlaps directly on the IntervalIndex and do something like this:
np.stack([idx.overlaps(interval) for interval in idx])
# or, as a DataFrame
pd.DataFrame([idx.overlaps(interval) for interval in idx],
             index=s.values, columns=s.values)
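Since the question mentions a fairly large Series, keep in mind that either approach materializes a full l x l boolean array. If that does not fit in memory, one option is to compute the matrix a block of rows at a time. The following is only a sketch under the assumption that all intervals are closed on both sides; iter_overlap_blocks and block_size are made-up names for illustration:

```python
import numpy as np
import pandas as pd


def iter_overlap_blocks(idx: pd.IntervalIndex, block_size: int = 1000):
    """Hypothetical helper: yield (start_row, block) pieces of the overlap matrix.

    Assumes idx.closed == 'both', as in the question.
    """
    left = idx.left.to_numpy()
    right = idx.right.to_numpy()
    for start in range(0, len(idx), block_size):
        stop = min(start + block_size, len(idx))
        # rows start..stop-1 compared against every interval
        block = (np.less_equal.outer(left[start:stop], right)
                 & np.greater_equal.outer(right[start:stop], left))
        yield start, block


idx = pd.IntervalIndex.from_arrays(
    pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-04"]),
    pd.to_datetime(["2020-01-05", "2020-01-02", "2020-01-05"]),
    closed="both",
)
# example use: count overlapping pairs without holding the full matrix
total = sum(int(block.sum()) for _, block in iter_overlap_blocks(idx, block_size=2))
print(total)
```

Each block has shape (rows in block, l), so peak memory is governed by block_size rather than by l squared.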