两个数组累积交集的快速算法
Fast algorithm for cumulative intersection of two arrays
我想计算两个数组的累积交集。例如
> cumulative_intersection([1,3,4,5,2,6], [4,1,5,2,3,7])
[[], [1], [1,4], [1,4,5], [1,4,5,3,2], [1,4,5,3,2]]
我可以使用循环暴力实现它。例如,在 Python、
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
for i in range(n):
intersect = np.intersect1d(a1[:i], a2[:i])
all_intersects.append(intersect)
有没有更有效的算法来计算这个?我的直觉是计算可以按对数方式减少,因为每个连续列表都是前一个列表的超集。
如果您将交集算法替换为不对结果进行排序的算法(因此需要线性时间),则您的蛮力算法的渐近复杂度变为 O(n2)这已经是您问题的下限渐近复杂性。
为了证明这个下限,为了论证,我们假设您有一个 O(1) 方法来确定需要将哪些数字添加到第 i 个交点以获得第 (i+1) 个交点.因此,如果 C(i) 是复制第 i 个路口的时间,则您的总 运行 时间将为
现在,复制一个数组是一个 O(n) 操作,在最坏的情况下,你的两个数组相同,第 i 个交集的长度为 i+1。因此,你的最坏情况 运行-time 是
也就是说,您可以通过修改用于生成两个列表的交集的 O(n) 贪心算法,以常数因子击败您的原始算法的 运行 时间。
def intersection(a: list,b: list):
''' Computes the interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
return result
您需要做的就是修改算法以存储和 return 中间结果。
def cumulative_intersection(a: list,b: list):
''' Computes the cumulative interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
results = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
results.append(result[:])
return results
也许是这样的:
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
current_intersect = list()
s1 = set()
s2 = set()
for i in range(n):
if a2[i] not in s2 and a2[i] in s1:
current_intersect.append(a2[i])
if a1[i] not in s1 and a1[i] in s2:
current_intersect.append(a1[i])
all_intersects.append(list(current_intersect))
s1.add(a1[i])
s2.add(a2[i])
大意是保留两组看过的数。 Set s1
跟踪列表 a1
中出现的数字。 Set s2
跟踪列表 a2
.
中出现的数字
处理列表a1
中的数字时,如果该数字尚未出现,但在列表a2
中出现,那么我们可以将其添加到数字列表中在 current_intersect
。同样对于列表 a2
.
中的数字
更新 current_intersect
后,current_intersect
添加到 all_intersects
。同时,将当前数组元素添加到两个集合中。
我想计算两个数组的累积交集。例如
> cumulative_intersection([1,3,4,5,2,6], [4,1,5,2,3,7])
[[], [1], [1,4], [1,4,5], [1,4,5,3,2], [1,4,5,3,2]]
我可以使用循环暴力实现它。例如,在 Python、
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
for i in range(n):
intersect = np.intersect1d(a1[:i], a2[:i])
all_intersects.append(intersect)
有没有更有效的算法来计算这个?我的直觉是计算可以按对数方式减少,因为每个连续列表都是前一个列表的超集。
如果您将交集算法替换为不对结果进行排序的算法(因此需要线性时间),则您的蛮力算法的渐近复杂度变为 O(n2)这已经是您问题的下限渐近复杂性。
为了证明这个下限,为了论证,我们假设您有一个 O(1) 方法来确定需要将哪些数字添加到第 i 个交点以获得第 (i+1) 个交点.因此,如果 C(i) 是复制第 i 个路口的时间,则您的总 运行 时间将为
现在,复制一个数组是一个 O(n) 操作,在最坏的情况下,你的两个数组相同,第 i 个交集的长度为 i+1。因此,你的最坏情况 运行-time 是
也就是说,您可以通过修改用于生成两个列表的交集的 O(n) 贪心算法,以常数因子击败您的原始算法的 运行 时间。
def intersection(a: list,b: list):
''' Computes the interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
return result
您需要做的就是修改算法以存储和 return 中间结果。
def cumulative_intersection(a: list,b: list):
''' Computes the cumulative interection of lists a and b. Assumes that a and b have the same
length'''
a_leftover = set()
b_leftover = set()
stored = set() # We only want unique values
n = len(a)
result = []
results = []
for i in range(n):
a_elm = a[i]
b_elm = b[i]
if a_elm not in stored and a_elm == b_elm:
result.append(a_elm)
stored.add(a_elm)
else:
if a_elm not in stored:
if a_elm in b_leftover:
# a_elm was previously encountered in b
result.append(a_elm)
stored.add(a_elm)
b_leftover.remove(a_elm)
else:
a_leftover.add(a_elm)
if b_elm not in stored:
if b_elm in a_leftover:
# b_elf was previously encountered in a
result.append(b_elm)
stored.add(b_elm)
a_leftover.remove(b_elm)
else:
b_leftover.add(b_elm)
results.append(result[:])
return results
也许是这样的:
a1 = [1,3,4,5,2,6]
a2 = [4,1,5,2,3,7]
n = len(a1)
all_intersects = list()
current_intersect = list()
s1 = set()
s2 = set()
for i in range(n):
if a2[i] not in s2 and a2[i] in s1:
current_intersect.append(a2[i])
if a1[i] not in s1 and a1[i] in s2:
current_intersect.append(a1[i])
all_intersects.append(list(current_intersect))
s1.add(a1[i])
s2.add(a2[i])
大意是保留两组看过的数。 Set s1
跟踪列表 a1
中出现的数字。 Set s2
跟踪列表 a2
.
处理列表a1
中的数字时,如果该数字尚未出现,但在列表a2
中出现,那么我们可以将其添加到数字列表中在 current_intersect
。同样对于列表 a2
.
更新 current_intersect
后,current_intersect
添加到 all_intersects
。同时,将当前数组元素添加到两个集合中。