Filtering a list of dictionaries based on multiple values
I have a list of dictionaries that I want to filter based on multiple criteria. A simplified version of the list looks like this:
orders = [{"name": "v", "price": 123, "location": "Mars"},
{"name": "x", "price": 223, "location": "Mars"},
{"name": "x", "price": 124, "location": "Mars"},
{"name": "y", "price": 456, "location": "Mars"},
{"name": "z", "price": 123, "location": "Mars"},
{"name": "z", "price": 5623, "location": "Mars"}]
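I would like to end up with a list that contains, for each group of dictionaries sharing the same "name" value, the dictionary with the lowest price.
For example, the list above would become: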
minimums = [{"name": "v", "price": 123, "location": "Mars"},
{"name": "x", "price": 124, "location": "Mars"},
{"name": "y", "price": 456, "location": "Mars"},
{"name": "z", "price": 123, "location": "Mars"}]
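I have achieved this with nasty nested if statements and for loops, but I was hoping there is a more "Pythonic" way to do it.
Either reusing the same list or creating a new one is fine.
Thanks for your help.
Edit:
Thanks for your answers. I tried timing each of them with the following code: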
import time
from itertools import groupby
import pandas as pd

print("Number of dictionaries in orders: " + str(len(orders)))
t0 = time.time()
sorted_orders = sorted(orders, key=lambda i: i["name"])
t1 = time.time()
sorting_time = (t1 - t0)
t0 = time.time()
listcomp_wikiben = [x for x in orders if all(x["price"] <= y["price"] for y in orders if x["name"] == y["name"])]
t1 = time.time()
print("listcomp_wikiben: " + str(t1 - t0))
t0 = time.time()
itertools_MrGeek = [min(g[1], key=lambda x: x['price']) for g in groupby(sorted_orders, lambda o: o['name'])]
t1 = time.time()
print("itertools_MrGeek: " + str(t1 - t0 + sorting_time))
t0 = time.time()
itertools_Cory = [min(g, key=lambda j: j["price"]) for k,g in groupby(sorted_orders, key=lambda i: i["name"])]
t1 = time.time()
print("itertools_CoryKramer: " + str(t1 - t0 + sorting_time))
t0 = time.time()
pandas_Trenton = pd.DataFrame(orders)
pandas_Trenton.groupby(['name'])['price'].min()
t1 = time.time()
print("pandas_Trenton_M: " + str(t1 - t0))
The results were:
Number of dictionaries in orders: 20867
listcomp_wikiben: 39.78123s
itertools_MrGeek: 0.01562s
itertools_CoryKramer: 0.01565s
pandas_Trenton_M: 0.29685s
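The two itertools figures above are nearly identical and very small, which may reflect the resolution of time.time() on the machine used rather than a real measurement. A minimal sketch of a more repeatable comparison using the standard timeit module follows. The sample data and the function name groupby_min are assumptions for illustration, not part of the original benchmark.

```python
import timeit
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; substitute the real list of orders.
orders = [{"name": n, "price": p, "location": "Mars"}
          for n in "vxyz" for p in (500, 123, 300)]

def groupby_min(rows):
    # Sort inside the function so the sorting cost is part of the measurement.
    ordered = sorted(rows, key=itemgetter("name"))
    return [min(g, key=itemgetter("price"))
            for _, g in groupby(ordered, key=itemgetter("name"))]

# Run 100 calls per trial, 5 trials, and keep the fastest trial.
calls = 100
best = min(timeit.repeat(lambda: groupby_min(orders), number=calls, repeat=5))
print(f"groupby_min: {best / calls:.6f}s per call")
```

Taking the minimum over several trials reduces noise from other processes; the same pattern can be applied to each candidate solution.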
If you first sort the list by "name", you can use itertools.groupby to group the entries, then use min with a lambda to find the dictionary with the lowest "price" in each group.
>>> from itertools import groupby
>>> sorted_orders = sorted(orders, key=lambda i: i["name"])
>>> [min(g, key=lambda j: j["price"]) for k,g in groupby(sorted_orders , key=lambda i: i["name"])]
[{'name': 'v', 'price': 123, 'location': 'Mars'},
{'name': 'x', 'price': 124, 'location': 'Mars'},
{'name': 'y', 'price': 456, 'location': 'Mars'},
{'name': 'z', 'price': 123, 'location': 'Mars'}]
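The same idea can be wrapped in a reusable function. This is only a sketch: the name cheapest_per_name is hypothetical, and operator.itemgetter is swapped in for the lambdas purely as a matter of style.

```python
from itertools import groupby
from operator import itemgetter

def cheapest_per_name(rows):
    """Return the lowest-priced dict for each distinct "name"."""
    by_name = itemgetter("name")
    # groupby only merges adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=by_name)
    return [min(group, key=itemgetter("price"))
            for _, group in groupby(ordered, key=by_name)]

orders = [{"name": "v", "price": 123, "location": "Mars"},
          {"name": "x", "price": 223, "location": "Mars"},
          {"name": "x", "price": 124, "location": "Mars"},
          {"name": "y", "price": 456, "location": "Mars"},
          {"name": "z", "price": 123, "location": "Mars"},
          {"name": "z", "price": 5623, "location": "Mars"}]

print(cheapest_per_name(orders))
```

Skipping the sort would make groupby start a new group every time the name changes, so an unsorted list could yield several entries for the same name.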
You can use itertools.groupby:
from itertools import groupby
print(
    [
        min(g[1], key=lambda x: x['price'])
        for g in groupby(sorted(orders, key=lambda o: o['name']), lambda o: o['name'])
    ]
)
Output:
[
{'name': 'v', 'price': 123, 'location': 'Mars'},
{'name': 'x', 'price': 124, 'location': 'Mars'},
{'name': 'y', 'price': 456, 'location': 'Mars'},
{'name': 'z', 'price': 123, 'location': 'Mars'}
]
A solution without itertools:
[x for x in orders if all(x["price"] <= y["price"] for y in orders if x["name"] == y["name"])]
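The list comprehension above compares every dictionary against every other one, so its cost grows quadratically with the length of the list, which is consistent with the timing reported in the question. It also keeps every dictionary that ties for the lowest price. A single-pass alternative that still avoids itertools might look like the sketch below; the function name min_by_name is hypothetical, and on ties it keeps only the first dictionary seen.

```python
def min_by_name(rows):
    """Keep, for each name, the first dict seen with the lowest price."""
    best = {}
    for row in rows:
        current = best.get(row["name"])
        # Replace the stored dict only when a strictly cheaper one appears.
        if current is None or row["price"] < current["price"]:
            best[row["name"]] = row
    # Dicts preserve insertion order (Python 3.7+), so names come out
    # in the order they first appeared in the input.
    return list(best.values())

orders = [{"name": "v", "price": 123, "location": "Mars"},
          {"name": "x", "price": 223, "location": "Mars"},
          {"name": "x", "price": 124, "location": "Mars"},
          {"name": "y", "price": 456, "location": "Mars"},
          {"name": "z", "price": 123, "location": "Mars"},
          {"name": "z", "price": 5623, "location": "Mars"}]

print(min_by_name(orders))
```

No sorting is needed here, since the dictionary lookup does the grouping.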
Using pandas:
orders = [{"name": "v", "price": 123, "location": "Mars"},
{"name": "x", "price": 223, "location": "Mars"},
{"name": "x", "price": 124, "location": "Mars"},
{"name": "y", "price": 456, "location": "Mars"},
{"name": "z", "price": 123, "location": "Pluto"},
{"name": "z", "price": 5623, "location": "Mars"}]
import pandas as pd
df = pd.DataFrame(orders)
df.groupby(['name', 'location'])['price'].min()
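The groupby call above returns a Series of minimum prices rather than the original dictionaries. If the full rows are wanted, grouping by "name" alone as in the question, one possible sketch uses idxmin to pick the row labels and to_dict("records") to convert back to a list of dictionaries. The variable names cheapest and minimums are chosen here for illustration.

```python
import pandas as pd

orders = [{"name": "v", "price": 123, "location": "Mars"},
          {"name": "x", "price": 223, "location": "Mars"},
          {"name": "x", "price": 124, "location": "Mars"},
          {"name": "y", "price": 456, "location": "Mars"},
          {"name": "z", "price": 123, "location": "Pluto"},
          {"name": "z", "price": 5623, "location": "Mars"}]

df = pd.DataFrame(orders)
# idxmin gives, for each name, the index label of the row with the lowest price.
cheapest = df.loc[df.groupby("name")["price"].idxmin()]
# Convert the selected rows back into a list of dictionaries.
minimums = cheapest.to_dict("records")
print(minimums)
```

Because whole rows are selected, the "location" of each cheapest entry is carried along with it.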