Generator object using recursion
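Python 2.7

Given the list alleles and the array length numb_alleles, for example:

alleles = [11, 12, 13, 14, 15, 16]
numb_alleles = 8

I have been trying to iterate over every Cartesian product and select the alleles relevant to my research according to the following selection criteria:

- Every second value in the Cartesian product must be larger than the value before it. For example, given the values above, the Cartesian product [13, 15, 11, 12, 14, 15, 16, 16] satisfies this criterion, whereas [13, 15, 16, 12, 14, 15, 16, 16] does not, because of indices 2 and 3.
- Every value in alleles must be present in the Cartesian product. For example, [13, 15, 11, 12, 14, 15, 16, 16] satisfies this criterion, whereas [13, 15, 11, 12, 14, 15, 11, 13] does not, because 16 is not in the product.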
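I have been using itertools.product(alleles, repeat=numb_alleles) to iterate over every possible Cartesian product for further analysis. However, as numb_alleles increases to 10 or 12, the overall computation grows significantly.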
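For a sense of scale, the unfiltered product contains len(alleles) ** numb_alleles tuples. The short sketch below prints that size for the six alleles above; the helper name product_size is made up for illustration.

```python
alleles = [11, 12, 13, 14, 15, 16]

def product_size(n_values, length):
    """Number of tuples itertools.product(values, repeat=length) yields."""
    return n_values ** length

sizes = {n: product_size(len(alleles), n) for n in (8, 10, 12)}
for n in sorted(sizes):
    print(n, sizes[n])
# 8 1679616
# 10 60466176
# 12 2176782336
```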
I tried to address this by using the recursive function below to select only the relevant Cartesian products.

def check_allele(allele_combination, alleles):
    """Check if all the alleles are present in allele_combination"""
    for allele in alleles:
        if allele not in allele_combination:
            return False
    return True

def recursive_product(alleles, numb_alleles, result):
    current_len = len(result[0])
    new_result = []
    final_result = []

    for comb in result:
        for allele in alleles:
            if current_len % 2 == 0:
                new_result.append(comb + [allele])
            elif current_len % 2 == 1:
                if comb[-1] <= allele:
                    new_result.append(comb + [allele])
                    if (check_allele(comb + [allele], alleles)):
                        final_result.append(comb + [allele])

    if current_len + 1 < numb_alleles:
        return recursive_product(alleles, numb_alleles, new_result)
    else:
        return final_result

a = (recursive_product(alleles, numb_alleles, [[]]))

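However, with this approach I still cannot handle numb_alleles = 12, or a longer alleles list, because I am using return instead of yield. As a result, it causes an out-of-memory error.

I would like to know whether I can turn this function into a generator, or whether anyone can suggest a different approach, so that I can compute the output for numb_alleles = 12 and for longer alleles arrays.

Thanks a lot!
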
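You say: "Every second value in the Cartesian product must be larger than the value before it." But in your example [13, 15, 11, 12, 14, 15, 16, 16], the item in slot 7 (16) equals the item in the previous slot, so I assume you mean that the item at each odd index must be >= the item at the preceding even index.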
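To pin down that reading, here is a minimal sketch of a checker for both criteria, using >= for the pair comparison. The name is_valid is hypothetical and only serves this illustration; the three inputs are the examples from the question.

```python
def is_valid(combo, alleles):
    """Return True if combo meets both selection criteria (>= reading)."""
    # Criterion 1: each odd-index item is >= the even-index item before it.
    pairs_ok = all(combo[i] <= combo[i + 1]
                   for i in range(0, len(combo) - 1, 2))
    # Criterion 2: every allele appears at least once in combo.
    return pairs_ok and set(alleles).issubset(combo)

alleles = [11, 12, 13, 14, 15, 16]
print(is_valid([13, 15, 11, 12, 14, 15, 16, 16], alleles))  # True
print(is_valid([13, 15, 16, 12, 14, 15, 16, 16], alleles))  # False: indices 2 and 3
print(is_valid([13, 15, 11, 12, 14, 15, 11, 13], alleles))  # False: 16 is missing
```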
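The generator below is more efficient than your current approach, and it avoids holding large temporary lists in RAM. The core idea is to use itertools.product to generate the combinations for the even slots, and then to use product again to fill the odd slots so that they satisfy selection criterion #1. We use set operations to make sure the final combination contains every item in alleles.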

from itertools import product

def combine_alleles(alleles, numb_alleles):
    ''' Make combinations that conform to the selection criteria. First create
        the items for the even slots, then create items for the odd slots such
        that each odd slot item >= the corresponding even slot item. Then test
        that the whole combination contains each item in alleles.
    '''
    # If the number of unique items in the even slots is < min_len, then it's
    # impossible to make a full combination containing all of the alleles.
    min_len = len(alleles) - numb_alleles // 2

    # Create a function to test if a given combination
    # contains all of the alleles.
    alleles_set = set(alleles)
    complete = alleles_set.issubset

    # Make lists of alleles that are >= the current allele number
    higher = {k: [u for u in alleles if u >= k] for k in alleles}

    # Make combinations for the even slots
    for evens in product(alleles, repeat=numb_alleles // 2):
        if len(set(evens)) < min_len:
            continue
        # Make combinations for the odd slots that go with this
        # combination of evens.
        a = [higher[u] for u in evens]
        for odds in product(*a):
            if complete(evens + odds):
                yield [u for pair in zip(evens, odds) for u in pair]

# test
alleles = [11, 12, 13, 14, 15, 16]
numb_alleles = 8

for i, t in enumerate(combine_alleles(alleles, numb_alleles), 1):
    print(i, t)

This code finds 16020 combinations, so the output is too large to include here.
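The figure of 16020 can be cross-checked without enumerating anything. Under the >= reading of criterion 1, and assuming the alleles are distinct and numb_alleles is even, each (even, odd) pair drawn from k allowed alleles can be chosen in k(k+1)/2 ways, so inclusion-exclusion over the set of missing alleles gives the count directly. The helper name count_valid is made up for this sketch, and math.comb needs Python 3.8 or later.

```python
from math import comb

def count_valid(n_alleles, numb_alleles):
    """Count valid combinations by inclusion-exclusion (hypothetical helper).

    Assumes distinct alleles, an even numb_alleles, and that each odd-slot
    item must be >= the even-slot item before it.
    """
    n_pairs = numb_alleles // 2
    total = 0
    for missing in range(n_alleles + 1):
        k = n_alleles - missing          # alleles still allowed
        pair_choices = k * (k + 1) // 2  # (even, odd) pairs with even <= odd
        sign = -1 if missing % 2 else 1
        total += sign * comb(n_alleles, missing) * pair_choices ** n_pairs
    return total

print(count_valid(6, 8))  # 16020
```

The same function can give a rough idea of the output size for numb_alleles = 12 before committing to a long run.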
Here is an alternative generator that is closer to your version, but in my tests it is a little slower than my first version.

def combine_alleles(alleles, numb_alleles):
    total_len = len(alleles)

    # Make lists of alleles that are >= the current allele number
    higher = {k: [u for u in alleles if u >= k] for k in alleles}

    def combos(i, base):
        remaining = numb_alleles - i
        if len(set(base)) + remaining < total_len:
            return

        if remaining == 0:
            yield base
            return

        ii = i + 1
        for u in higher[base[-1]] if i % 2 else alleles:
            yield from combos(ii, base + [u])

    yield from combos(0, [])

This version is for Python 3. Python 2 does not have yield from, but that is easy to fix:

yield from some_iterable

is equivalent to

for t in some_iterable:
    yield t
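
Applying that substitution to the recursive generator gives a version that should run on both Python 2.7 and Python 3. This is a sketch of the rewrite, with the logic otherwise unchanged; the local name candidates is introduced here only for readability.

```python
def combine_alleles(alleles, numb_alleles):
    total_len = len(alleles)
    # For each allele, the alleles that are >= it
    higher = {k: [u for u in alleles if u >= k] for k in alleles}

    def combos(i, base):
        remaining = numb_alleles - i
        # Prune: too few slots left to fit every allele still missing
        if len(set(base)) + remaining < total_len:
            return
        if remaining == 0:
            yield base
            return
        # Odd slots are restricted to values >= the previous item
        candidates = higher[base[-1]] if i % 2 else alleles
        for u in candidates:
            # Explicit loop replaces "yield from" for Python 2
            for t in combos(i + 1, base + [u]):
                yield t

    return combos(0, [])

# Count lazily so the combinations are never all held in memory
count = sum(1 for _ in combine_alleles([11, 12, 13, 14, 15, 16], 8))
print(count)  # 16020
```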