Python function to find the min/max based on a single attribute from a nested dictionary structure
The following data representation:
[
{u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
{u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
{u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},
{u'0x79dcc6ab82b2024a0d4135d4fa3a5cd62ab740f28fffa3fc4dfdb8b00430baab': u'158971'},
{u'0x034c9e7f28f136188ebb2a2630c26183b3df90c387490159b411cf7326764341': u'21000'},
{u'0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e': u'1000000'},
{u'0x90ca439b7daa648fafee829d145adefa1dc17c064f43db77f573da873b641f19': u'90000'},
{u'0x7cba9f140ab0b3ec360e0a55c06f75b51c83b2e97662736523c26259a730007f': u'40000'},
{u'0x92dedff7dab405220c473aefd12e2e41d260d2dff7816c26005f78d92254aba2': u'21000'},
{u'0x0abe75e40a954d4d355e25e4498f3580e7d029769897d4187c323080a0be0fdd': u'21000'},
{u'0x22c2b6490900b21d67ca56066e127fa57c0af973b5d166ca1a4bf52fcb6cf81c': u'90000'},
{u'0x8570106b0385caf729a17593326db1afe0d75e3f8c6daef25cd4a0499a873a6f': u'90000'},
{u'0x8adfe7fc3cf0eb34bb56c59fa3dc4fdd3ec3f3514c0100fef800f065219b7707': u'40000'},
{u'0x8b0fe2b7727664a14406e7377732caed94315b026b37577e2d9d258253067553': u'21000'},
{u'0x244b29b60c696f4ab07c36342344fe6116890f8056b4abc9f734f7a197c93341': u'50000'},
{u'0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63': u'121000'},
{u'0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26': u'121000'}
]
is generated from this loop:
dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append({resource['first']: resource['second']})
based on data that looks, more or less, like this:
{"first":"A","second":"1","third":"2"}
{"first":"B","second":"1","third":"2"}
{"first":"C","second":"2","third":"2"}
{"first":"D","second":"3","third":"2"}
{"first":"E","second":"3","third":"2"}
{"first":"F","second":"3","third":"2"}
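I am trying to find the maximum of the second value in each of the dictionaries, i.e.
{"first":"A","second":"LOOKING_FOR_MAX"}
How do I access all of the second values (the values that look like u'90000') from that group of nested dictionaries, and record and output the max and min?
To define terms precisely: in the above example, i.e.: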
{u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
{u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
{u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},
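I want to search based on u'90000', u'40000', and u'21000'; this is what I am calling the "second" value.
I want the selection of max to be based on the number alone, so in this case u'90000'.
EDIT:
When I try to call it in the following manner, I get an error: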
def _main():
    with open('transactions000000000029.json', 'rb') as inpt:
        dict_hash_gas = list()
        for line in inpt:
            resource = json.loads(line)
            dict_hash_gas.append({resource['hash']: resource['gas']})
    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))
Are you tied to using dictionaries here? A list of tuples may be easier to work with:
dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append((resource['first'], resource['second']))
# Sort numerically: the values are strings, so convert them in the key.
sorted_data = sorted(dict_hash_gas, key=lambda x: int(x[1]))
minimum = sorted_data[0]
maximum = sorted_data[-1]
This yields:
('0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5', '21000')
for the minimum and
('0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e', '1000000')
for the maximum.
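If you only need the two extremes, a full sort is not strictly necessary. Here is a minimal sketch, assuming the same list of (hash, gas) tuples as above; the shortened hashes and the helper name gas_as_int are made up for illustration. min() and max() accept the same kind of key function as sorted() and each make a single pass over the list.

```python
# Hypothetical, shortened sample of (hash, gas) tuples; gas values are strings.
dict_hash_gas = [
    ('0xaaa', '90000'),
    ('0xbbb', '21000'),
    ('0xccc', '1000000'),
    ('0xddd', '21000'),
]

def gas_as_int(item):
    """Compare items by their numeric gas value, not as strings."""
    return int(item[1])

minimum = min(dict_hash_gas, key=gas_as_int)
maximum = max(dict_hash_gas, key=gas_as_int)

print(minimum)  # ('0xbbb', '21000')
print(maximum)  # ('0xccc', '1000000')
```

When several items tie, min() and max() return the first one they encounter, which is why '0xbbb' rather than '0xddd' is reported as the minimum here.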
Edited to show an example using collections.namedtuple:
from collections import namedtuple
DataItem = namedtuple('DataItem', ['first', 'second'])
dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append(DataItem(resource['first'], resource['second']))
sorted(dict_hash_gas, key=lambda x: int(x.second))
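The int() conversion in the key function matters because the gas values are strings, and strings compare character by character. A small sketch with three values taken from the question (the variable name gas_values is just for illustration):

```python
gas_values = ['90000', '1000000', '21000']

# Compared as strings, '90000' wins because '9' > '1' and '9' > '2'.
print(max(gas_values))           # 90000
# Compared as integers, the numerically largest value wins.
print(max(gas_values, key=int))  # 1000000
print(min(gas_values, key=int))  # 21000
```

Note that key=int only changes how items are compared; the returned items are still the original strings.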
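Once your data is in a form that's easy to work with, it's a one-liner.
In this case, since these dictionaries are obviously records of some sort, the ideal data type is either a custom class or a collections.namedtuple.
I chose namedtuple since all the values are atomic and immutable.
(It also has a number of convenience features, such as decent __str__ and __hash__ methods, and it's more efficient.)
All the work below is in _as_pairs, which generates immutable key-value pairs from the frustrating list of one-item dictionaries.
It also converts the stringified integers (value) into the actual integers you'd expect.
After that, working with the data is easy.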
import collections

# FIXME: Use more descriptive names than "Pair", "key", and "value".
Pair = collections.namedtuple('Pair', ['key', 'value'])

def _as_pairs(pairs):
    """Yield a Pair for each item of each dict, converting the value to int."""
    for pair in pairs:
        # TODO: Verify the dict contains exactly one item?
        for k, v in pair.items():
            # Should the `key` string also be an integer?
            #yield Pair(key=int(k, base=16), value=int(v))
            yield Pair(key=k, value=int(v))

def _main():
    # Abbreviated below, but contains the same inputs as your example.
    dict_hash_gas = [
        # ...
        {u'0xffda...606e': u'1000000'},
        {u'0x90ca...1f19': u'90000'},
        # ...
    ]
    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))

if __name__ == '__main__':
    _main()
Output (Python 3):
Pair(key='0xc3f6...f9b5', value=21000)
Pair(key='0xffda...606e', value=1000000)
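The same idea can be applied while reading the JSON lines, skipping the intermediate list of one-item dictionaries. This is only a sketch: it assumes each line is a JSON object with "hash" and "gas" fields, as in the edit to the question; the function name pairs_from_lines is hypothetical, and io.StringIO with made-up hashes stands in for the real file.

```python
import collections
import io
import json

Pair = collections.namedtuple('Pair', ['key', 'value'])

def pairs_from_lines(lines):
    """Yield one Pair per non-blank JSON line, converting gas to int."""
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines rather than failing in json.loads().
        resource = json.loads(line)
        yield Pair(key=resource['hash'], value=int(resource['gas']))

# Stand-in for the opened file; the hashes are made-up placeholders.
sample = io.StringIO(
    '{"hash": "0xaaa", "gas": "90000"}\n'
    '{"hash": "0xbbb", "gas": "21000"}\n'
    '{"hash": "0xccc", "gas": "1000000"}\n'
)

pairs = list(pairs_from_lines(sample))
if pairs:
    # Avoid a ValueError from min() and max() if there were no records.
    print(min(pairs, key=lambda pair: pair.value))
    print(max(pairs, key=lambda pair: pair.value))
```

With a real file, you would pass the open file object to pairs_from_lines in place of sample.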
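I raised a few questions in the code comments:
Does it matter that there is only one item in each of those dictionaries?
Should those hex strings (which I called key) also be converted into integers?
I have no idea what you're doing with this, so I can't answer either of those questions.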
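If it turns out that the answer to both questions is yes, one possible variant looks like this. It is only a sketch under that assumption; the name as_strict_pairs and the sample hashes are made up for illustration.

```python
import collections

Pair = collections.namedtuple('Pair', ['key', 'value'])

def as_strict_pairs(dicts):
    """Yield Pair(int key, int value); reject dicts without exactly one item."""
    for index, record in enumerate(dicts):
        if len(record) != 1:
            raise ValueError(
                'record %d has %d items, expected 1' % (index, len(record)))
        # Unpack the single (key, value) item of the dict.
        (k, v), = record.items()
        # int() with base 16 accepts an optional '0x' prefix.
        yield Pair(key=int(k, 16), value=int(v))

pairs = list(as_strict_pairs([{'0xff': '90000'}, {'0x10': '21000'}]))
print(pairs)  # [Pair(key=255, value=90000), Pair(key=16, value=21000)]
```

Whether raising ValueError is the right reaction to a malformed record depends on your data; skipping or logging such records would be equally reasonable.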