Create nested-list from unique values of first value
I'm trying to build a list of gene mutations.
To do this, I made a list of two-value lists (gene and alteration), like this:
Gene_names = ['GPL2', 'GWEL', 'VREI', 'GPL2', 'GPL2', 'VREI']
Mutation_names = ['R278W', 'L72K', 'M939I', 'I354S', 'F472M', 'H8F']
my_list = []
for i in range(len(Gene_names)):
    my_list.append([Gene_names[i], Mutation_names[i]])
print(my_list)
[['GPL2', 'R278W'], ['GWEL', 'L72K'], ['VREI', 'M939I'], ['GPL2', 'I354S'], ['GPL2', 'F472M'], ['VREI', 'H8F']]
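As you can see, some gene names are repeated with new alterations. What I would like to get from the previous list is another list of lists holding each gene with its alterations, like this:
new_list = [['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
That way, each inner list consists of 'First value = Gene_names' followed by 'Next values = Mutation_names'.
This would be ideal for my particular purpose, but anything similar might work too.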
As @PeterWood suggested in a comment, you can create a dictionary to group the items and later convert it to a list.
You can use collections.defaultdict to make it simpler:
from collections import defaultdict
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]
my_dict = defaultdict(list)
for gene, mutation in my_list:
    my_dict[gene].append(mutation)
print(my_dict)
Result:
defaultdict(<class 'list'>, {'GPL2': ['R278W', 'I354S', 'F472M'], 'GWEL': ['L72K'], 'VREI': ['M939I', 'H8F']})
And convert it to a list (I use [gene] to create a one-element list because all_mutations is also a list, so the two can be joined with +):
result = [[gene] + all_mutations for gene, all_mutations in my_dict.items()]
print(result)
Result:
[['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
Full example:
from collections import defaultdict
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]
my_dict = defaultdict(list)
for gene, mutation in my_list:
    my_dict[gene].append(mutation)
print(my_dict)
result = [[gene] + all_mutations for gene, all_mutations in my_dict.items()]
print(result)
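If the data still lives in the two parallel lists from the question, the intermediate my_list is not strictly needed. Here is a minimal sketch of the same grouping fed directly from Gene_names and Mutation_names. It assumes both lists have the same length (zip silently stops at the shorter one), and the function name group_mutations is hypothetical, not something from the original post:

```python
from collections import defaultdict


def group_mutations(gene_names, mutation_names):
    """Group mutations by gene; genes keep their first-seen order (Python 3.7+)."""
    grouped = defaultdict(list)
    # zip pairs the i-th gene with the i-th mutation
    for gene, mutation in zip(gene_names, mutation_names):
        grouped[gene].append(mutation)
    # put each gene name in front of its list of mutations
    return [[gene] + mutations for gene, mutations in grouped.items()]


Gene_names = ['GPL2', 'GWEL', 'VREI', 'GPL2', 'GPL2', 'VREI']
Mutation_names = ['R278W', 'L72K', 'M939I', 'I354S', 'F472M', 'H8F']

result = group_mutations(Gene_names, Mutation_names)
print(result)
```

Wrapping the loop in a function is only a convenience; the grouping logic is the same as in the full example above.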
The same with a plain dict instead of defaultdict:
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]
my_dict = dict()
for gene, mutation in my_list:
    # create an empty list the first time a gene is seen
    if gene not in my_dict:
        my_dict[gene] = []
    my_dict[gene].append(mutation)
print(my_dict)
result = [[gene] + all_mutations for gene, all_mutations in my_dict.items()]
print(result)
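The "if gene not in my_dict" check in the plain-dict version can also be folded into a single call with dict.setdefault. This is a variant I am adding for comparison, not part of the original answer, and it should behave the same way:

```python
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]

my_dict = {}
for gene, mutation in my_list:
    # setdefault returns the list already stored for `gene`,
    # or stores a new empty list first and returns that
    my_dict.setdefault(gene, []).append(mutation)

# unpacking with * is equivalent to [gene] + all_mutations
result = [[gene, *all_mutations] for gene, all_mutations in my_dict.items()]
print(result)
```

Which of the three forms to use is mostly a matter of taste; defaultdict tends to read best when the grouping happens in more than one place.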
EDIT:
Example using pandas and groupby:
import pandas as pd
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]
# convert list to dataframe
df = pd.DataFrame(my_list, columns=['gene', 'mutation'])
print(df)
# group by `gene` (groupby sorts the group keys by default)
groups = df.groupby('gene')
# collect the mutations of every group into a list `all_mutations`
df_groups = groups['mutation'].apply(list).reset_index(name='all_mutations')
print(df_groups)
# merge the two columns into a single Series of lists
data = df_groups.apply(lambda row: [row['gene']] + row['all_mutations'], axis=1)
# convert the Series to a plain Python list
result = data.to_list()
print(result)