如何从嵌套列表和 return 那些列表中找到包含较高值的列表?
How to find the list that contains the higher value from nested list and return those lists?
我有这个包含重复条目的嵌套列表:
[['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
我想通过 i[3] 过滤嵌套列表,所以最终输出将是这样的
[['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
我尝试了 for 循环,但我不知道如何获得重复列表的最大值
这个问题可能有更优雅/pythonic 的解决方案,但这是一个可能的途径:
my_list = [...] # Nested list here
def compare_duplicates(nested_list, name_index=0, compare_index=3):
max_values = dict() # Used two dictionaries for readability
final_indexes = dict()
for i, item in enumerate(nested_list):
name, value = item[name_index], item[compare_index]
if value > max_values.get(name, 0):
max_values[name] = value
final_indexes[name] = i
return [nested_list[i] for i in final_indexes.values()]
print(compare_duplicates(my_list))
像这样:
_DATA = [
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
]
def print_highest(data):
list_map = {}
for d in data:
key = str(d[0:3] + d[4:])
if key not in list_map:
list_map[key] = d
continue
if d[3] > list_map[key][3]:
list_map[key] = d
for l in list_map.values():
print(l)
print_highest(_DATA)
输出:
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
这是我能想到的最 pythonic 的方法。我的做法是先对列表的列表进行排序,按sublist[3]
排序,这意味着当我们遍历列表时,我们最终会在遇到重复项之前遇到具有最大评论数的子列表。这个技巧将用于构建最终列表。
meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
# Sort the list by review count and review name - make sure the highest review is first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)
# This is the list we'll use to store the final data in
final_list = []
# Go through all the items in the meta_list
for meta in meta_list:
if not meta[0] in [item[0] for item in final_list]:
'''
If another meta with the same name (0th index)
doesn't already exist in final_list, add it
'''
final_list.append(meta)
输出-
[['Instagram',
'SOCIAL',
'4.5',
66577446,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Teen',
'Social',
'July 31, 2018',
'Varies with device',
'Varies with device'],
['Gmail',
'COMMUNICATION',
'4.3',
4604483,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Everyone',
'Communication',
'August 2, 2018',
'Varies with device',
'Varies with device'],
['Coloring book moana',
'FAMILY',
'3.9',
974,
'14M',
'500,000+',
'Free',
'0',
'Everyone',
'Art & Design;Pretend Play',
'January 15, 2018',
'2.0.0',
'4.0.3 and up']]
基本上它会将所有不存在的元数据添加到 final_list
。为什么这行得通?因为您在循环时遇到的第一个元 是具有最高评论数 的元。所以一旦添加了那个,它的复制品就无法添加,我们就完成了。
注意:这不会保留评论本身的顺序。它只会确保只保留评论数最高的评论,以防出现同名的重复评论。
我有这个包含重复条目的嵌套列表:
[['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
我想通过 i[3] 过滤嵌套列表,所以最终输出将是这样的
[['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
我尝试了 for 循环,但我不知道如何获得重复列表的最大值
这个问题可能有更优雅/pythonic 的解决方案,但这是一个可能的途径:
my_list = [...] # Nested list here
def compare_duplicates(nested_list, name_index=0, compare_index=3):
max_values = dict() # Used two dictionaries for readability
final_indexes = dict()
for i, item in enumerate(nested_list):
name, value = item[name_index], item[compare_index]
if value > max_values.get(name, 0):
max_values[name] = value
final_indexes[name] = i
return [nested_list[i] for i in final_indexes.values()]
print(compare_duplicates(my_list))
像这样:
_DATA = [
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
]
def print_highest(data):
list_map = {}
for d in data:
key = str(d[0:3] + d[4:])
if key not in list_map:
list_map[key] = d
continue
if d[3] > list_map[key][3]:
list_map[key] = d
for l in list_map.values():
print(l)
print_highest(_DATA)
输出:
['Coloring book moana', 'ART_AND_DESIGN', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
这是我能想到的最 pythonic 的方法。我的做法是先对列表的列表进行排序,按sublist[3]
排序,这意味着当我们遍历列表时,我们最终会在遇到重复项之前遇到具有最大评论数的子列表。这个技巧将用于构建最终列表。
meta_list = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', 967, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Coloring book moana', 'FAMILY', '3.9', 974, '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
['Gmail', 'COMMUNICATION', '4.3', 4604324, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Gmail', 'COMMUNICATION', '4.3', 4604483, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577313, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66577446, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device'],
['Instagram', 'SOCIAL', '4.5', 66509917, 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']]
# Sort the list by review count and review name - make sure the highest review is first
meta_list.sort(key=lambda x: (int(x[3]), x[0]), reverse=True)
# This is the list we'll use to store the final data in
final_list = []
# Go through all the items in the meta_list
for meta in meta_list:
if not meta[0] in [item[0] for item in final_list]:
'''
If another meta with the same name (0th index)
doesn't already exist in final_list, add it
'''
final_list.append(meta)
输出-
[['Instagram',
'SOCIAL',
'4.5',
66577446,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Teen',
'Social',
'July 31, 2018',
'Varies with device',
'Varies with device'],
['Gmail',
'COMMUNICATION',
'4.3',
4604483,
'Varies with device',
'1,000,000,000+',
'Free',
'0',
'Everyone',
'Communication',
'August 2, 2018',
'Varies with device',
'Varies with device'],
['Coloring book moana',
'FAMILY',
'3.9',
974,
'14M',
'500,000+',
'Free',
'0',
'Everyone',
'Art & Design;Pretend Play',
'January 15, 2018',
'2.0.0',
'4.0.3 and up']]
基本上它会将所有不存在的元数据添加到 final_list
。为什么这行得通?因为您在循环时遇到的第一个元 是具有最高评论数 的元。所以一旦添加了那个,它的复制品就无法添加,我们就完成了。
注意:这不会保留评论本身的顺序。它只会确保只保留评论数最高的评论,以防出现同名的重复评论。