Compare all lists within a list of lists based on a condition and group them by their difference
I have the following list:
a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]
For ease of understanding, here it is with indices:
0 [1, 2, 3, 4, 5]
1 [4, 5, 6, 7, 8]
2 [1, 2, 3, 4]
3 [4, 5, 6, 7, 8, 9]
4 [2, 3, 4, 5, 6, 7, 8]
5 [6, 7, 8, 9]
6 [5, 6, 7, 8, 9]
7 [2, 3, 4, 5, 6]
8 [3, 4, 5, 6]
9 [11, 12, 13, 14, 15]
10 [13, 14, 15]
I expect the output to be a list of tuples like the following:
output = [(0,2,1), (3,1,1), (4,7,2), (4,1,2), (6,5,1), (3,5,2), (3,6,1), (7,8,1), (9,10,2)]
For example, to explain the first item of the output, i.e. (0,2,1):
0 ---> index of the longer list in the comparison
2 ---> index of the shorter list in the comparison
1 ---> difference in length between lists 0 and 2
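Now, the problem:
I have some lists that contain similar items but differ in length by one or two (or three) elements at the start or end of the list.
I want to sort, group, and identify the indices of the lists and their length difference as tuples.
I looked at multiple Stack Overflow questions but couldn't find a similar one.
I'm new to Python; I started with the following bit of code and got stuck: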
from itertools import groupby

a = sorted(a, key=len)
incr = [list(g) for k, g in groupby(a, key=len)]
decr = list(reversed(incr))
ndecr = [i for j in decr for i in j]

for i in range(len(ndecr)-1):
    if len(ndecr[i]) - len(ndecr[i+1]) == 1:
        print(ndecr[i])

for i in range(len(ndecr)-2):
    if len(ndecr[i]) - len(ndecr[i+2]) == 2:
        print(ndecr[i])

for i in ndecr:
    ele = i
    ndecr.remove(i)
    for j in ndecr:
        if ele[:-1] == j:
            print(j)

for i in ndecr:
    ele = i
    ndecr.remove(i)
    for j in ndecr:
        if ele[:-2] == j:
            print(i)
Please help me with an approach to produce this output.
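Edit (original below):
Now I think I understand you better (thanks to @vash_the_stampede's clarifying comment). This approach nests two loops to compare every list in the list of lists against every other and checks whether one is a subset of the other. If the compared lists are a superset/subset pair, it appends a tuple to the output list containing the indices of the two lists, longest first, and the difference in their lengths.
Important: this approach does not compare list order, so it may produce output you don't want. For example, [1,2,4,5] is a subset of [1,2,3,4,5] with a length difference of 1. Specifically for your example, compared with your sample output this approach emits one extra tuple, because [3,4,5,6] at index 8 is a subset of [2,3,4,5,6,7,8] at index 4 with a length difference of 3. I think @DSM's answer handles this, so it is probably closer to what you need.
Sample output for the current dataset: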
a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

output = []
for i in range(len(a)):
    for j in range(i + 1, len(a)):
        if set(a[i]).issubset(a[j]) or set(a[i]).issuperset(a[j]):
            diff = abs(len(a[i]) - len(a[j]))
            if len(a[i]) > len(a[j]):
                output.append((i, j, diff))
            else:
                output.append((j, i, diff))

print(output)
# OUTPUT
# [(0, 2, 1), (3, 1, 1), (4, 1, 2), (3, 5, 2), (3, 6, 1), (4, 7, 2), (4, 8, 3), (6, 5, 1), (7, 8, 1), (9, 10, 2)]
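To make the caveat about ordering concrete, here is a small sketch that puts the set-based test next to a positional one. The helper names is_subset and is_prefix_or_suffix are made up for illustration and are not part of any answer here; the positional check assumes that "similar" means the shorter list sits exactly at the start or the end of the longer one.

```python
def is_subset(short, long):
    """Order-insensitive test: every element of `short` occurs somewhere in `long`."""
    return set(short).issubset(long)


def is_prefix_or_suffix(short, long):
    """Positional test: `short` must sit at the very start or very end of `long`."""
    n = len(short)
    return long[:n] == short or long[len(long) - n:] == short


# The two cases called out in the note pass the set test but fail the positional one.
print(is_subset([1, 2, 4, 5], [1, 2, 3, 4, 5]))                  # True
print(is_prefix_or_suffix([1, 2, 4, 5], [1, 2, 3, 4, 5]))        # False
print(is_subset([3, 4, 5, 6], [2, 3, 4, 5, 6, 7, 8]))            # True
print(is_prefix_or_suffix([3, 4, 5, 6], [2, 3, 4, 5, 6, 7, 8]))  # False

# A genuine "same list minus the last element" pair passes both.
print(is_prefix_or_suffix([1, 2, 3, 4], [1, 2, 3, 4, 5]))        # True
```

Whether the positional test is the right one depends on what the asker means by "similar"; it is shown only to illustrate why (4, 8, 3) appears with the set-based approach.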
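Original:
If I understand correctly, you can nest two loops to compare every list in the list of lists. Then build an output list of tuples, each containing the indices of the two compared lists and the difference in their lengths. For example: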
a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8], [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

output = []
for i in range(len(a)):
    for j in range(i + 1, len(a)):
        diff = abs(len(a[i]) - len(a[j]))
        output.append((i, j, diff))

print(output)
# OUTPUT
# [(0, 1, 0), (0, 2, 1), (0, 3, 1), (0, 4, 2), (0, 5, 1), (0, 6, 0), (0, 7, 0), (0, 8, 1), (0, 9, 0), (0, 10, 2), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 1), (1, 6, 0), (1, 7, 0), (1, 8, 1), (1, 9, 0), (1, 10, 2), (2, 3, 2), (2, 4, 3), (2, 5, 0), (2, 6, 1), (2, 7, 1), (2, 8, 0), (2, 9, 1), (2, 10, 1), (3, 4, 1), (3, 5, 2), (3, 6, 1), (3, 7, 1), (3, 8, 2), (3, 9, 1), (3, 10, 3), (4, 5, 3), (4, 6, 2), (4, 7, 2), (4, 8, 3), (4, 9, 2), (4, 10, 4), (5, 6, 1), (5, 7, 1), (5, 8, 0), (5, 9, 1), (5, 10, 1), (6, 7, 0), (6, 8, 1), (6, 9, 0), (6, 10, 2), (7, 8, 1), (7, 9, 0), (7, 10, 2), (8, 9, 1), (8, 10, 1), (9, 10, 2)]
IIUC, and assuming the total number of lists is small enough that len(lists)^2 is still small, something like
from itertools import combinations

# sort by length but preserve the index
ax = sorted(enumerate(a), key=lambda x: len(x[1]))

done = []
for (i0, seq0), (i1, seq1) in combinations(ax, 2):
    if seq1[:len(seq0)] == seq0 or seq1[-len(seq0):] == seq0:
        done.append((i1, i0, len(seq1)-len(seq0)))
gives me
In [117]: sorted(done)
Out[117]:
[(0, 2, 1),
 (3, 1, 1),
 (3, 5, 2),
 (3, 6, 1),
 (4, 1, 2),
 (4, 7, 2),
 (6, 5, 1),
 (7, 8, 1),
 (9, 10, 2)]
This matches your output except for the order and the fact that you listed (4, 7, 2) twice.
seq1[:len(seq0)] == seq0 is the "does seq1 start with seq0?" condition, and seq1[-len(seq0):] == seq0 is the "does seq1 end with seq0?" condition.
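Since the question mentions differences of "one and two (or three)", the same start/end test could be wrapped in a function with an optional cap on the length difference. This is only a sketch: the name prefix_suffix_pairs and the max_diff parameter are hypothetical, and unlike the loop above it deliberately skips pairs of equal length, on the assumption that identical lists are not of interest.

```python
from itertools import combinations


def prefix_suffix_pairs(lists, max_diff=None):
    """Return sorted (longer_index, shorter_index, length_difference) tuples.

    A pair is reported when the shorter list equals the start or the end of
    the longer one. `max_diff` (hypothetical parameter) caps the difference.
    """
    # Sort by length but keep each list's original index.
    indexed = sorted(enumerate(lists), key=lambda pair: len(pair[1]))
    result = []
    for (i0, short), (i1, long_) in combinations(indexed, 2):
        diff = len(long_) - len(short)
        # Assumption: equal-length pairs are skipped, as are pairs over the cap.
        if diff == 0 or (max_diff is not None and diff > max_diff):
            continue
        # long_[:len(short)] is the prefix, long_[diff:] the suffix of equal length.
        if long_[:len(short)] == short or long_[diff:] == short:
            result.append((i1, i0, diff))
    return sorted(result)


a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8],
     [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

print(prefix_suffix_pairs(a))
# [(0, 2, 1), (3, 1, 1), (3, 5, 2), (3, 6, 1), (4, 1, 2), (4, 7, 2), (6, 5, 1), (7, 8, 1), (9, 10, 2)]
print(prefix_suffix_pairs(a, max_diff=1))
# [(0, 2, 1), (3, 1, 1), (3, 6, 1), (6, 5, 1), (7, 8, 1)]
```

Using long_[diff:] instead of a negative slice gives the same suffix here and avoids the seq[-0:] corner case, where the slice would return the whole list.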
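Hmm, I'm sure this can be done more efficiently, but what I did was take a copy of the original list, shorten each item by one or two elements at either end, compare the result with the other items, and record the indices together with the length by which they differ. It works, but it's fairly bulky; I'll see how I can reduce it.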
l1 = a[:]
tups = []

for idx, item in enumerate(l1):
    for x, i in enumerate(a):
        if sorted(item[:-1]) == sorted(i):
            tups.append((idx, x, 1))
        elif sorted(item[:-2]) == sorted(i):
            tups.append((idx, x, 2))
        elif sorted(item[1:]) == sorted(i):
            tups.append((idx, x, 1))
        elif sorted(item[2:]) == sorted(i):
            tups.append((idx, x, 2))

print(tups)
[(0, 2, 1), (3, 1, 1), (3, 5, 2), (3, 6, 1), (4, 1, 2), (4, 7, 2), (6, 5, 1), (7, 8, 1), (9, 10, 2)]
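One possible way to shrink the chain of elif branches above is to note that the trim amount is fixed by the length difference, so it can be computed rather than tried case by case. The sketch below is an assumption about what a reduced version might look like, not the answerer's own follow-up; trimmed_matches and max_trim are invented names, and it compares the slices directly instead of through sorted(), so element order matters.

```python
def trimmed_matches(lists, max_trim=2):
    """Return (longer_index, shorter_index, k) where trimming k elements from
    the end or the start of the longer list yields the shorter one."""
    tups = []
    for idx, item in enumerate(lists):
        for x, other in enumerate(lists):
            # The only trim that can possibly match is the length difference.
            k = len(item) - len(other)
            if 1 <= k <= max_trim and other in (item[:-k], item[k:]):
                tups.append((idx, x, k))
    return tups


a = [[1,2,3,4,5], [4,5,6,7,8], [1,2,3,4], [4,5,6,7,8,9], [2,3,4,5,6,7,8],
     [6,7,8,9], [5,6,7,8,9], [2,3,4,5,6], [3,4,5,6], [11,12,13,14,15], [13,14,15]]

print(trimmed_matches(a))
# [(0, 2, 1), (3, 1, 1), (3, 5, 2), (3, 6, 1), (4, 1, 2), (4, 7, 2), (6, 5, 1), (7, 8, 1), (9, 10, 2)]
```

Raising max_trim to 3 would cover the "(or three)" case from the question; on this dataset it adds nothing, because no list is another list with three elements removed from one end.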