Create a nested list from the unique values of the first element

I'm trying to build a list of gene mutations. To do this, I made a list of two-element lists (gene, alteration). The result is:

Gene_names = ['GPL2', 'GWEL', 'VREI', 'GPL2', 'GPL2', 'VREI']
Mutation_names = ['R278W', 'L72K', 'M939I', 'I354S', 'F472M', 'H8F']

my_list = []
for i in range(len(Gene_names)):
    my_list.append([Gene_names[i], Mutation_names[i]])

print(my_list)
[['GPL2', 'R278W'], ['GWEL', 'L72K'], ['VREI', 'M939I'], ['GPL2', 'I354S'], ['GPL2', 'F472M'], ['VREI', 'H8F']]
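A more idiomatic way to build the same pairs is to iterate over both lists in parallel with the built-in zip instead of indexing with range(len(...)). A minimal sketch, assuming both lists have the same length (zip silently stops at the end of the shorter one):

```python
Gene_names = ['GPL2', 'GWEL', 'VREI', 'GPL2', 'GPL2', 'VREI']
Mutation_names = ['R278W', 'L72K', 'M939I', 'I354S', 'F472M', 'H8F']

# zip yields (gene, mutation) tuples; wrap each in a list to match the original format
my_list = [[gene, mutation] for gene, mutation in zip(Gene_names, Mutation_names)]

print(my_list)
# [['GPL2', 'R278W'], ['GWEL', 'L72K'], ['VREI', 'M939I'], ['GPL2', 'I354S'], ['GPL2', 'F472M'], ['VREI', 'H8F']]
```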

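As you can see, some gene names are repeated with new alterations. From the list above, I'd like to get another list of lists that holds each gene followed by all of its alterations, like this: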

new_list = [['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]

That is, in each inner list the first value is the gene name (from Gene_names) and the remaining values are its mutations (from Mutation_names).

This would be ideal for my particular purpose, but anything similar could also work.

As @PeterWood suggested in the comments, you can create a dictionary to group the items and then convert it to a list.

You can simplify this with collections.defaultdict:

from collections import defaultdict

my_list = [
    ['GPL2', 'R278W'], 
    ['GWEL', 'L72K'], 
    ['VREI', 'M939I'], 
    ['GPL2', 'I354S'], 
    ['GPL2', 'F472M'], 
    ['VREI', 'H8F']
]

my_dict = defaultdict(list)

for gene, mutation in my_list:
    my_dict[gene].append(mutation)

print(my_dict)

Result:

defaultdict(<class 'list'>, {'GPL2': ['R278W', 'I354S', 'F472M'], 'GWEL': ['L72K'], 'VREI': ['M939I', 'H8F']})
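defaultdict(list) works because looking up a missing key calls the factory (here list) and stores the returned empty list under that key, so my_dict[gene].append(...) never raises KeyError. A small illustration of that behavior, including the side effect that even a plain read of a missing key inserts it:

```python
from collections import defaultdict

d = defaultdict(list)

# 'GPL2' is not a key yet: d['GPL2'] calls list() -> [], stores it, then append runs on it
d['GPL2'].append('R278W')
d['GPL2'].append('I354S')

print(dict(d))     # {'GPL2': ['R278W', 'I354S']}

# Reading a missing key also inserts an empty list
print(d['XYZ'])    # []
print('XYZ' in d)  # True
```

A membership test with `in` does not insert anything, so use it (or a plain dict) when you only want to check whether a gene exists.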

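Then convert it to a list. I wrap the gene name as [gene] because all_mutations is also a list, and + can only concatenate a list with another list: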

result = [ [gene]+all_mutations for gene, all_mutations in my_dict.items() ]

print(result)

Result:

[['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
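On Python 3.5+ the same conversion can also be written with iterable unpacking, [gene, *all_mutations], which avoids building the temporary one-element list. The groups come out in the order in which each gene first appears, because dicts (including defaultdict) preserve insertion order from Python 3.7 on. A sketch, using a plain dict literal that holds the grouped data shown above:

```python
# Grouped data, equivalent to the defaultdict built above
my_dict = {
    'GPL2': ['R278W', 'I354S', 'F472M'],
    'GWEL': ['L72K'],
    'VREI': ['M939I', 'H8F'],
}

# [gene, *all_mutations] places the gene name first, then unpacks its mutations
result = [[gene, *all_mutations] for gene, all_mutations in my_dict.items()]

print(result)
# [['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
```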

Full example:

from collections import defaultdict

my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]

my_dict = defaultdict(list)

for gene, mutation in my_list:
    my_dict[gene].append(mutation)

print(my_dict)

result = [ [gene]+all_mutations for gene, all_mutations in my_dict.items() ]

print(result)

The same with a plain dict instead of defaultdict:

my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]

my_dict = dict()

for gene, mutation in my_list:
    if gene not in my_dict:
        my_dict[gene] = []
        
    my_dict[gene].append(mutation)

print(my_dict)

result = [ [gene]+all_mutations for gene, all_mutations in my_dict.items() ]

print(result)
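With a plain dict, the if gene not in my_dict check can also be folded into a single call with dict.setdefault, which returns the existing value for the key or, if the key is missing, inserts the given default and returns it. A sketch of the same loop:

```python
my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F'],
]

my_dict = {}

for gene, mutation in my_list:
    # returns my_dict[gene], inserting an empty list first if the gene is new
    my_dict.setdefault(gene, []).append(mutation)

result = [[gene] + all_mutations for gene, all_mutations in my_dict.items()]

print(result)
# [['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
```

One small trade-off: the default [] argument is evaluated on every call, even when the key already exists, whereas defaultdict only creates a list for new keys.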

Edit:

Example using pandas groupby:

import pandas as pd

my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F']
]

# convert list to dataframe
df = pd.DataFrame(my_list, columns=['gene', 'mutation'])

print(df)

# group by `gene`
groups = df.groupby('gene')
# convert every group to list `all_mutations`
df_groups = groups['mutation'].apply(list).reset_index(name='all_mutations')

print(df_groups)

# convert two columns to one column
data = df_groups.apply(lambda row: [row['gene']] + row['all_mutations'], axis=1)
# convert dataframe back to list
result = data.to_list()

print(result)
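Without pandas, the standard library's itertools.groupby can produce the same result, but it only groups consecutive items, so the list has to be sorted by gene first. As a consequence the genes come out in alphabetical order rather than first-appearance order (similar to pandas groupby, which sorts group keys by default); for this data the two orders happen to coincide. A sketch:

```python
from itertools import groupby
from operator import itemgetter

my_list = [
    ['GPL2', 'R278W'],
    ['GWEL', 'L72K'],
    ['VREI', 'M939I'],
    ['GPL2', 'I354S'],
    ['GPL2', 'F472M'],
    ['VREI', 'H8F'],
]

# groupby only merges adjacent items with equal keys, so sort by gene first;
# sorted() is stable, so each gene's mutations keep their original order
sorted_pairs = sorted(my_list, key=itemgetter(0))

result = [
    [gene] + [mutation for _, mutation in pairs]
    for gene, pairs in groupby(sorted_pairs, key=itemgetter(0))
]

print(result)
# [['GPL2', 'R278W', 'I354S', 'F472M'], ['GWEL', 'L72K'], ['VREI', 'M939I', 'H8F']]
```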