Finding overlapping tuples in a Python list and shuffling them
I have the following list of tuples in Python 3.x, where each tuple consists of two integers in the form (start, end):
list_tuple = [(20, 35), (125, 145), (156, 178), (211, 233), (220, 321),
              (227, 234), (230, 231), (472, 498), (4765, 8971)]
## list already sorted except for last tuple
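These tuples represent intervals along the real line; for example, (1, 10) is the interval from 1 to 10.
I can sort this list of tuples in three ways: by the first element alone, by the second element alone, or by both the first and second elements.
Sorting by the first element only: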
sorted_by_first = sorted(list_tuple, key=lambda element: (element[0]) ) ## (first_element, second_element)
Output:
print(sorted_by_first)
[(20, 35), (125, 145), (156, 178), (211, 233), (220, 321), (227, 234), (230, 231), (472, 498), (4765, 8971)]
And sorting by the second element:
sorted_by_second = sorted(list_tuple, key=lambda element: (element[1]) )
Output:
print(sorted_by_second)
[(20, 35), (125, 145), (156, 178), (230, 231), (211, 233), (227, 234), (220, 321), (472, 498), (4765, 8971)]
And sorting by both:
sorted_by_both = sorted(list_tuple, key=lambda element: (element[0], element[1]) )
Output:
print(sorted_by_both)
[(20, 35), (125, 145), (156, 178), (211, 233), (220, 321), (227, 234), (230, 231), (472, 498), (4765, 8971), ...]
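Note that these sorts do not all produce the same order: sorting by the second element moves (230, 231) ahead of (211, 233). The tuples whose order differs are the "overlapping intervals"; for example, (227, 234) could be placed either before or after (230, 231), because these intervals overlap.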
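My goal is to create a function that (1) searches the sorted output for "overlapping intervals" and (2) then randomly re-orders the tuples within each set of overlaps.
I can think of a function that outputs all the tuples overlapping a given tuple, e.g.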
def find_overlaps(input_tuple_list, search_interval):
    results = []
    for tup in input_tuple_list:
        if ((tup[0] >= search_interval[0] and tup[0] <= search_interval[1]) or (tup[1] >= search_interval[0] and tup[1] <= search_interval[1])):
            results.append(tup)
    return results
It works as follows:
foo = (130, 150)
overlapping_foo = find_overlaps(list_tuple, foo)
print(overlapping_foo)
[(125, 145)]
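One thing worth checking in the test above: it only looks at whether an endpoint of tup falls inside search_interval, so a tuple that completely contains the search interval is not reported. A minimal sketch of the usual symmetric test for closed intervals follows; the names intervals_overlap and find_overlaps_symmetric are made up for this illustration and are not part of the question.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    # Two intervals overlap exactly when each one starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]


def find_overlaps_symmetric(intervals, search_interval):
    # Hypothetical variant of find_overlaps that uses the symmetric test.
    return [tup for tup in intervals if intervals_overlap(tup, search_interval)]


list_tuple = [(20, 35), (125, 145), (156, 178), (211, 233), (220, 321),
              (227, 234), (230, 231), (472, 498), (4765, 8971)]

print(find_overlaps_symmetric(list_tuple, (130, 150)))
# [(125, 145)]
print(find_overlaps_symmetric(list_tuple, (228, 229)))
# [(211, 233), (220, 321), (227, 234)]
```

The second call is the case the endpoint-based test misses: (228, 229) lies strictly inside three of the tuples, so none of their endpoints fall within it.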
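However, to achieve goal (1), I need to write a function that finds all the overlapping tuples in list_tuple.
What I tried: I originally thought I could search the original list with each of its own tuples, e.g.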
total_overlaps = []
for tupp in list_tuple:
    total_overlaps.append(find_overlaps(list_tuple, tupp))
This is clearly wrong, because every tuple matches itself, so the output contains the original tuples themselves.
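One way to avoid the self-matches is to compare each tuple only with the tuples that come after it in the list. The sketch below does this with itertools.combinations and assumes closed intervals, so touching endpoints count as an overlap; overlapping_pairs is a name invented for this example.

```python
from itertools import combinations


def overlapping_pairs(intervals):
    """Return every pair of distinct list entries whose closed intervals overlap."""
    # combinations(intervals, 2) yields each unordered pair once and never
    # pairs an entry with itself.
    return [(a, b) for a, b in combinations(intervals, 2)
            if a[0] <= b[1] and b[0] <= a[1]]


list_tuple = [(20, 35), (125, 145), (156, 178), (211, 233), (220, 321),
              (227, 234), (230, 231), (472, 498), (4765, 8971)]

for pair in overlapping_pairs(list_tuple):
    print(pair)
```

This compares all pairs, so it is quadratic in the length of the list; that is fine for a list this size, but it reports pairs rather than the groups needed for goal (2).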
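The bigger problem is that I can't see how to accomplish goal (2). I must shuffle/re-order only the tuples that overlap one another. Suppose I have a list of overlapping tuples found in (1):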
overlap_list = [(211, 233), (220, 321), (227, 234), (230, 231), (6491, 7000), (6800, 7200)]
The following list comprehension fails:
from random import shuffle
reordered = [shuffle(tupp) for tupp in overlap_list]
giving
TypeError: 'tuple' object does not support item assignment
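The error arises because the comprehension passes each individual tuple to shuffle, which tries to swap items inside that immutable tuple. random.shuffle also works in place and returns None, so a comprehension would not collect anything useful even with lists. A small sketch of two ways to shuffle the list of tuples itself, using only the first group of overlaps as sample data:

```python
import random

overlap_list = [(211, 233), (220, 321), (227, 234), (230, 231)]

# Option 1: copy the list, then shuffle the copy in place.
shuffled = list(overlap_list)  # overlap_list itself is left untouched
random.shuffle(shuffled)

# Option 2: random.sample returns a new list in random order.
also_shuffled = random.sample(overlap_list, k=len(overlap_list))

print(shuffled)
print(also_shuffled)
```

Both options re-order the tuples while leaving each (start, end) pair intact.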
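It is also important that I don't mix (6491, 7000) with (211, 233), because they are unrelated.
How can I find the overlapping intervals in a list of tuples, and then shuffle each set of mutually overlapping tuples separately?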
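I'm not quite sure I understand your requirement for shuffling, but you can use the itertools recipe pairwise to pair up the elements and then use itertools.groupby() to group the sequential overlaps, i.e. split (211, 233) from (6491, 7000):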
import itertools as it

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = it.tee(iterable)
    next(b, None)
    return zip(a, b)

>>> overlap_list = [(211, 233), (220, 321), (227, 234), (230, 231), (6491, 7000), (6800, 7200)]
>>> [list(p) for k, p in it.groupby(pairwise(overlap_list), lambda x: x[0][0] < x[1][0] < x[0][1]) if k]
[[((211, 233), (220, 321)), ((220, 321), (227, 234)), ((227, 234), (230, 231))],
[((6491, 7000), (6800, 7200))]]
You can then unpairwise these lists:
def unpairwise(iterable):
    # Split the pairs back into their first and second members.
    a, b = zip(*iterable)
    yield a[0]
    yield from b
So:
>>> [list(unpairwise(p)) for k, p in it.groupby(pairwise(overlap_list), lambda x: x[0][0] < x[1][0] < x[0][1]) if k]
[[(211, 233), (220, 321), (227, 234), (230, 231)], [(6491, 7000), (6800, 7200)]]
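The groupby key above compares only adjacent tuples, so it relies on the list being sorted and on each overlap showing up between neighbours. As an alternative sketch (not the method used in this answer), a single sweep over the sorted list that remembers the largest end seen so far also catches a tuple that overlaps an earlier, longer interval but not its immediate predecessor. The name group_overlaps and the extra tuple (300, 310) are invented for this example, and intervals are assumed to be closed, so touching endpoints count as overlapping.

```python
def group_overlaps(intervals):
    """Group closed (start, end) intervals into clusters of connected overlaps."""
    groups = []
    current_end = None  # largest end value seen in the current cluster
    for interval in sorted(intervals):
        if groups and interval[0] <= current_end:
            # Starts before the current cluster ends: same cluster.
            groups[-1].append(interval)
            current_end = max(current_end, interval[1])
        else:
            # Starts after everything seen so far: open a new cluster.
            groups.append([interval])
            current_end = interval[1]
    return groups


intervals = [(211, 233), (220, 321), (227, 234), (230, 231), (300, 310),
             (6491, 7000), (6800, 7200)]

print(group_overlaps(intervals))
# [[(211, 233), (220, 321), (227, 234), (230, 231), (300, 310)], [(6491, 7000), (6800, 7200)]]
```

Here (300, 310) does not overlap (230, 231), but it does overlap (220, 321), so it stays in the first cluster. Intervals with no overlaps come back as single-element groups, which can be filtered out with a length check if only real overlaps are wanted.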
Extending the answer above, it should be easy to shuffle each list in the list of overlapping tuples to get what you want:
>>> import random
>>> overlaps = [[(211, 233), (220, 321), (227, 234), (230, 231)], [(6491, 7000), (6800, 7200)]]
>>> for x in overlaps:
...     random.shuffle(x)
...
>>> overlaps
[[(227, 234), (230, 231), (220, 321), (211, 233)], [(6491, 7000), (6800, 7200)]]
>>> for x in overlaps:
...     random.shuffle(x)
...
>>> overlaps
[[(220, 321), (227, 234), (230, 231), (211, 233)], [(6491, 7000), (6800, 7200)]]
>>> for x in overlaps:
...     random.shuffle(x)
...
>>> overlaps
[[(227, 234), (211, 233), (220, 321), (230, 231)], [(6800, 7200), (6491, 7000)]]
Note that random.shuffle works in place.
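Putting the two steps together, here is a rough sketch of a single function that sorts the list, shuffles each run of overlapping tuples, and leaves everything else in sorted position. The function name shuffle_within_overlaps and its optional rng parameter are assumptions made for this example; overlaps are detected with a running maximum end on closed intervals rather than with the pairwise/groupby approach.

```python
import random


def shuffle_within_overlaps(intervals, rng=None):
    """Sort intervals, then randomly re-order each cluster of overlapping ones."""
    if rng is None:
        rng = random  # the module-level functions share one default generator
    result = []
    group = []        # the cluster currently being collected
    group_end = None  # largest end value seen in that cluster
    for interval in sorted(intervals):
        if group and interval[0] <= group_end:
            group.append(interval)
            group_end = max(group_end, interval[1])
        else:
            rng.shuffle(group)  # shuffling an empty or 1-item list is a no-op
            result.extend(group)
            group = [interval]
            group_end = interval[1]
    rng.shuffle(group)
    result.extend(group)
    return result


list_tuple = [(20, 35), (125, 145), (156, 178), (211, 233), (220, 321),
              (227, 234), (230, 231), (472, 498), (4765, 8971)]

# Passing a seeded random.Random makes a run repeatable.
print(shuffle_within_overlaps(list_tuple, rng=random.Random(0)))
```

Only the four tuples between (156, 178) and (472, 498) can change places; the other tuples keep their sorted positions, and unrelated clusters are never mixed.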