Modify the range on every iteration of a loop over that range
I have a groups.txt file that contains ortholog groups, each listing the species and gene IDs it holds. It looks like this:
OG_117996: R_baltica_p|32476565 V_spinosum_v|497645257
OG_117997: R_baltica_p|32476942 S_pleomorpha_s|374317197
OG_117998: R_baltica_p|32477405 V_bacterium_v|198258541
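I wrote a function that builds a list, called listOfAllSpecies, of every species in the whole file (66 in total). I now need a function that gives me all groups containing 1 of these 66 species, then all groups containing 2 of these 66 species, and so on.
To simplify: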
OG_1: A|1 A|3 B|1 C|2
OG_2: A|4 B|6
OG_3: C|8 B|9 A|10
For this example I need to get:
(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) A,C (are in groups) OG_1, OG_3
(species) B,C (are in groups) OG_1, OG_2, OG_3
(species) A,B,C (are in groups) OG_1, OG_3
(species) B (is in groups) OG_1, OG_2, OG_3
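I wanted to try
    for species in range(start, end=None):
        if end == None:
            start = 0
            end = start + 1
to take the first species in my listOfAllSpecies and tell me which groups OG_XXXX contain it. Then take the first and second species, and so on, until it takes all 66 species. How can I modify the range inside the for loop, or is there a different approach?
Here is my actual code, with the functions I need but without the part I am asking about: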
import sys

if len(sys.argv) != 2:
    print("Error, file name to open is missing")
    sys.exit(1)

def readGroupFile(groupFileName):
    # maps group name -> {species name: [gene ids]}
    dict_gene_taxonomy = {}
    with open(groupFileName, "r") as fh:
        for line in fh:
            liste = line.split(": ")
            groupName = liste[0]
            genesAsString = liste[1]
            dict_taxon = {}
            liste_gene = genesAsString.split()
            for item in liste_gene:
                # each item looks like "species|geneId"
                taxonomy_gene = item.split("|")
                taxonomy = taxonomy_gene[0]
                geneId = taxonomy_gene[1]
                if taxonomy not in dict_taxon:
                    dict_taxon[taxonomy] = []
                dict_taxon[taxonomy].append(geneId)
            dict_gene_taxonomy[groupName] = dict_taxon
    return dict_gene_taxonomy

def showListOfAllSpecies(dictio):
    # collect every species name once, in order of first appearance
    listAllSpecies = []
    for groupName in dictio:
        dictio_in_dictio = dictio[groupName]
        for speciesName in dictio_in_dictio:
            if speciesName not in listAllSpecies:
                listAllSpecies.append(speciesName)
    return listAllSpecies

dico = readGroupFile(sys.argv[1])
listAllSpecies = showListOfAllSpecies(dico)
How about using a while loop to control the range() arguments?
    start = 0
    end = 0
    while end < 1000:
        for species in range(start, end):
            ...  # do something with species
        end += 1
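If the goal is literally "the first species, then the first two, and so on", the growing range can also be written as one for loop over the end index with a slice of the list. The following is only a sketch under assumptions: `groups` is hypothetical sample data in the nested-dict shape that readGroupFile returns, and the helper names `groups_containing` and `prefix_report` are made up for illustration.

```python
# Hypothetical sample data, shaped like the return value of readGroupFile():
# {group name: {species name: [gene ids]}}
groups = {
    "OG_1": {"A": ["1", "3"], "B": ["1"], "C": ["2"]},
    "OG_2": {"A": ["4"], "B": ["6"]},
    "OG_3": {"C": ["8"], "B": ["9"], "A": ["10"]},
}
list_of_all_species = ["A", "B", "C"]


def groups_containing(species_subset, groups):
    """Return the sorted names of groups that contain every given species."""
    wanted = set(species_subset)
    # set(taxa) is the set of species names (dict keys) in that group
    return sorted(name for name, taxa in groups.items() if wanted <= set(taxa))


def prefix_report(all_species, groups):
    """Pair each prefix of all_species (first 1, first 2, ...) with its groups."""
    report = []
    for end in range(1, len(all_species) + 1):
        prefix = all_species[:end]  # the slice grows by one species per iteration
        report.append((prefix, groups_containing(prefix, groups)))
    return report


for prefix, names in prefix_report(list_of_all_species, groups):
    print("(species) " + ",".join(prefix) + " (are in groups) " + ", ".join(names))
```

Note that this only walks the prefixes A, then A,B, then A,B,C. It does not cover combinations such as B,C from the expected output in the question.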
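The number of non-empty subsets of a set of N items (the set of all your species) is 2^N - 1.
That is because a subset is like an N-bit binary number, where each bit is either 1 (the species is in the subset) or 0 (the species is excluded from the subset). The -1 excludes the empty set (all bits 0).
So you can enumerate all subsets of species with a simple integer loop: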
    # sample data
    listOfAllSpecies = ['A', 'B', 'C']
    # enumerate all subsets of listOfAllSpecies, 0 excluded (the empty set)
    for bits in range(1, 2**len(listOfAllSpecies)):
        # build the subset
        subset = []
        for n in range(len(listOfAllSpecies)):
            # test if the current subset includes bit n
            if bits & 2**n:
                subset.append(listOfAllSpecies[n])
        # see which groups contain the given subset
        print("species", ",".join(subset), "are in groups TODO")
Result:
species A are in groups TODO
species B are in groups TODO
species A,B are in groups TODO
species C are in groups TODO
species A,C are in groups TODO
species B,C are in groups TODO
species A,B,C are in groups TODO
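If you also need code to test whether a group contains a subset, you need to specify how the groups are stored in your program.
If this post answers your question, you should click the green check mark ✔ at the top left.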
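For the subset test left as TODO above, one possible sketch follows. It assumes the groups are stored as the nested dict that the question's readGroupFile returns. Each group is encoded as a bitmask over the species list, so "group contains subset" becomes a single AND comparison. The sample `groups` dict and the helper names `species_mask` and `groups_by_subset` are hypothetical and not part of the original answer.

```python
# Hypothetical sample data, shaped like the return value of readGroupFile():
# {group name: {species name: [gene ids]}}
groups = {
    "OG_1": {"A": ["1", "3"], "B": ["1"], "C": ["2"]},
    "OG_2": {"A": ["4"], "B": ["6"]},
    "OG_3": {"C": ["8"], "B": ["9"], "A": ["10"]},
}
list_of_all_species = ["A", "B", "C"]


def species_mask(taxa, all_species):
    """Encode a collection of species as an int with bit n set for all_species[n]."""
    mask = 0
    for n, name in enumerate(all_species):
        if name in taxa:
            mask |= 1 << n
    return mask


def groups_by_subset(all_species, groups):
    """Map every non-empty subset of all_species to the groups that contain it."""
    group_masks = {name: species_mask(taxa, all_species) for name, taxa in groups.items()}
    result = {}
    for bits in range(1, 2 ** len(all_species)):
        subset = tuple(s for n, s in enumerate(all_species) if bits & (1 << n))
        # a group contains the subset when every subset bit is also set in its mask
        result[subset] = sorted(g for g, m in group_masks.items() if (m & bits) == bits)
    return result


for subset, names in groups_by_subset(list_of_all_species, groups).items():
    print("species", ",".join(subset), "are in groups", ", ".join(names))
```

Precomputing one mask per group keeps the inner test cheap, but the outer loop still visits all 2^N - 1 subsets.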
Not sure if this is exactly what you want, but it's a start :)
    from itertools import combinations
    # Assume input is a list of strings called input_list
    input_list = ['OG_1: A|1 A|3 B|1 C|2','OG_2: A|4 B|6','OG_3: C|8 B|9 A|10']
    # Create a dict to store relationships and a set to store species
    rels = {}
    species = set()
    # Populate the dict
    for item in input_list:
        params = item.split(': ')
        og = params[0]
        raw_species = params[1].split()
        s = [rs.split('|')[0] for rs in raw_species]
        rels[og] = s
        for name in s:
            species.add(name)
    # Get the possible combinations of species, from size 1 up to all species:
    combos = [c for limit in range(1, len(species) + 1) for c in combinations(species, limit)]
    def combo_in_og(combo, og):
        for item in combo:
            if item not in rels[og]:
                return False
        return True
    # Loop over the combinations and print
    for combo in combos:
        valid_ogs = []
        for og in rels:
            if combo_in_og(combo, og):
                valid_ogs.append(og)
        print('(species) ' + ','.join(combo) + ' (are in groups) ' + ', '.join(valid_ogs))
Produces (the order of the species may differ between runs, because species is a set):
(species) C (are in groups) OG_1, OG_3
(species) A (are in groups) OG_1, OG_2, OG_3
(species) B (are in groups) OG_1, OG_2, OG_3
(species) C,A (are in groups) OG_1, OG_3
(species) C,B (are in groups) OG_1, OG_3
(species) A,B (are in groups) OG_1, OG_2, OG_3
(species) C,A,B (are in groups) OG_1, OG_3
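Just a warning: if the input is large enough, what you are trying to do will start to take forever, because its complexity is 2^N. You can't get around it (that is what the problem demands), but it's there.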
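To put a number on that warning: with the 66 species from the question there are 2^66 - 1 non-empty subsets, which is far too many to loop over. If it turns out that only the species sets that actually occur in the file matter, one alternative (an assumption about the requirements, not something the question states) is to index the groups by their exact species set, which needs one pass over the groups. The sample `groups` dict and the function name `groups_by_exact_species_set` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data, shaped like the return value of readGroupFile():
# {group name: {species name: [gene ids]}}
groups = {
    "OG_1": {"A": ["1", "3"], "B": ["1"], "C": ["2"]},
    "OG_2": {"A": ["4"], "B": ["6"]},
    "OG_3": {"C": ["8"], "B": ["9"], "A": ["10"]},
}


def groups_by_exact_species_set(groups):
    """Map each species set that occurs in the data to the groups having exactly it."""
    index = defaultdict(list)
    for name in sorted(groups):
        # frozenset of the dict keys: hashable, and independent of species order
        index[frozenset(groups[name])].append(name)
    return dict(index)


index = groups_by_exact_species_set(groups)
for species_set, names in index.items():
    print(",".join(sorted(species_set)), "->", ", ".join(names))

# Size of the full enumeration for 66 species, for comparison
print("non-empty subsets of 66 species:", 2 ** 66 - 1)
```

This answers a narrower question than the original one: it reports which groups have exactly a given species set, not every group that contains a given subset.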