Python function to find the min/max based on a single attribute from a nested dictionary structure

Question

The following data:

[
 {u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
 {u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
 {u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},
 {u'0x79dcc6ab82b2024a0d4135d4fa3a5cd62ab740f28fffa3fc4dfdb8b00430baab': u'158971'},
 {u'0x034c9e7f28f136188ebb2a2630c26183b3df90c387490159b411cf7326764341': u'21000'},
 {u'0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e': u'1000000'},
 {u'0x90ca439b7daa648fafee829d145adefa1dc17c064f43db77f573da873b641f19': u'90000'},
 {u'0x7cba9f140ab0b3ec360e0a55c06f75b51c83b2e97662736523c26259a730007f': u'40000'},
 {u'0x92dedff7dab405220c473aefd12e2e41d260d2dff7816c26005f78d92254aba2': u'21000'},
 {u'0x0abe75e40a954d4d355e25e4498f3580e7d029769897d4187c323080a0be0fdd': u'21000'},
 {u'0x22c2b6490900b21d67ca56066e127fa57c0af973b5d166ca1a4bf52fcb6cf81c': u'90000'},
 {u'0x8570106b0385caf729a17593326db1afe0d75e3f8c6daef25cd4a0499a873a6f': u'90000'},
 {u'0x8adfe7fc3cf0eb34bb56c59fa3dc4fdd3ec3f3514c0100fef800f065219b7707': u'40000'},
 {u'0x8b0fe2b7727664a14406e7377732caed94315b026b37577e2d9d258253067553': u'21000'},
 {u'0x244b29b60c696f4ab07c36342344fe6116890f8056b4abc9f734f7a197c93341': u'50000'},
 {u'0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63': u'121000'},
 {u'0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26': u'121000'}
]

is generated by this loop:

dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append({resource['first']:resource['second']})

from input data that looks, more or less, like this:

{"first":"A","second":"1","third":"2"} 
{"first":"B","second":"1","third":"2"} 
{"first":"C","second":"2","third":"2"} 
{"first":"D","second":"3","third":"2"} 
{"first":"E","second":"3","third":"2"} 
{"first":"F","second":"3","third":"2"}

I'm trying to find the maximum of the second value across these dictionaries, i.e.

{"first":"A","second":"LOOKING_FOR_MAX"}

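How do I access all the second values (the ones that look like u'90000') in that list of nested dictionaries, collect them, and output the max and min?

To define the terms precisely, in the example above, i.e.: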

{u'0xbd4f1cc0da707c5712651b659b86766ec6f25af5e388fc82474523339dd1da37': u'90000'},
{u'0x05a04a7bb2500087c14bc89eb6a49cd4c5afcac63270aff2d4508e610f606eed': u'40000'},
{u'0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5': u'21000'},

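I want to search based on u'90000', u'40000', and u'21000'; that is what I mean by the "second" value.

The max should be chosen based on the number alone, so in that case u'90000'.

Edit:

When I tried calling it as follows, I got the error reproduced below: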

def _main():

    with open('transactions000000000029.json', 'rb') as inpt:
        dict_hash_gas = list()
        for line in inpt:
            resource = json.loads(line)
            dict_hash_gas.append({resource['hash']:resource['gas']})

    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))

Answer 1

Are you restricted to using dictionaries here? A list of tuples might be easier to work with:

dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append((resource['first'], resource['second']))

sorted_data = sorted(dict_hash_gas, key=lambda x: int(x[1]))
minimum = sorted_data[0]
maximum = sorted_data[-1]

This yields ('0xc3f68d46b9e462110e4897a41b573a10fef72747fd4c9e8413eb2e4cba0af9b5', '21000') as the minimum and ('0xffda7269775dcd710565c5e0289a2254c195e006f34cafc80c4a3c89f479606e', '1000000') as the maximum.

Edited to show an example using collections.namedtuple:
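
If only the two extremes are needed, a full sort is not required. Below is a minimal sketch using min() and max() with a key function. The short hashes ('0xaaa' and so on) are hypothetical placeholders, not values from the question. It also shows why the int() conversion matters: compared as text, '90000' ranks above '1000000'.

```python
# Hypothetical sample mirroring the (hash, gas) tuples built above.
dict_hash_gas = [
    ('0xaaa', '90000'),
    ('0xbbb', '21000'),
    ('0xccc', '1000000'),
]

# Comparing the raw strings is lexicographic, so '90000' beats '1000000'.
max_as_text = max(dict_hash_gas, key=lambda item: item[1])

# Converting with int() inside the key function compares numerically.
minimum = min(dict_hash_gas, key=lambda item: int(item[1]))
maximum = max(dict_hash_gas, key=lambda item: int(item[1]))

print(max_as_text)
print(minimum)
print(maximum)
```

Note that when several entries tie, min() and max() return the first one encountered, whereas sorted_data[-1] returns the last of the tied maxima.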

from collections import namedtuple

DataItem = namedtuple('DataItem', ['first', 'second'])

dict_hash_gas = list()
for line in inpt:
    resource = json.loads(line)
    dict_hash_gas.append(DataItem(resource['first'], resource['second']))

sorted_data = sorted(dict_hash_gas, key=lambda x: int(x.second))
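
For a version that can be run end to end, here is a rough sketch assuming Python 3. io.StringIO stands in for the open file (an assumption, since the real file is not shown), and the three lines are a subset of the sample records from the question. The list name items is also only illustrative.

```python
import io
import json
from collections import namedtuple

DataItem = namedtuple('DataItem', ['first', 'second'])

# Stand-in for the open input file; lines are sample records from the question.
inpt = io.StringIO(
    '{"first":"A","second":"1","third":"2"}\n'
    '{"first":"C","second":"2","third":"2"}\n'
    '{"first":"D","second":"3","third":"2"}\n'
)

items = []
for line in inpt:
    resource = json.loads(line)
    items.append(DataItem(resource['first'], resource['second']))

# Sort numerically on the "second" field, then take both ends.
sorted_data = sorted(items, key=lambda x: int(x.second))
minimum = sorted_data[0]
maximum = sorted_data[-1]

print(minimum.first, minimum.second)
print(maximum.first, maximum.second)
```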

Answer 2

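Once your data is in a form that's easy to work with, it's a one-liner. In this case, since those dictionaries are clearly records of some kind, the ideal data type is a custom class or a collections.namedtuple. I chose namedtuple because all the values are atomic and immutable. (It also comes with many conveniences, such as decent __str__ and __hash__ methods, and it's more efficient.)

All the work below happens in _as_pairs, which generates immutable key-value pairs from the frustrating list of single-item dictionaries. It also converts the stringified integer (value) into the actual integer you want. After that, working with the data is easy.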

import collections

# FIXME:  Use more descriptive names than "Pair", "key", and "value".
Pair = collections.namedtuple('Pair', ['key', 'value'])

def _as_pairs(pairs):
    for pair in pairs:
        # TODO:  Verify the dict contains exactly one item?
        for k, v in pair.items():
            # Should the `key` string also be an integer?
            #yield Pair(key=int(k, base=16), value=int(v))
            yield Pair(key=k, value=int(v))

def _main():
    # Abbreviated below, but contains the same inputs as your example.
    dict_hash_gas = [
      ...,
      {u'0xffda...606e': u'1000000'},
      {u'0x90ca...1f19': u'90000'},
      ...,
      ]
    pairs = list(_as_pairs(dict_hash_gas))
    if pairs:
        # Avoid a ValueError from min() and max() if the list is empty.
        print(min(pairs, key=lambda pair: pair.value))
        print(max(pairs, key=lambda pair: pair.value))

if '__main__' == __name__:
    _main()

Output (Python 3):

Pair(key='0xc3f6...f9b5', value=21000)
Pair(key='0xffda...606e', value=1000000)

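I raised a couple of questions in the comments:

Is it important that each of these dictionaries contains exactly one item?
Should those hexadecimal strings (which I'd call ids) also be converted to integers?

I don't know what you're doing with this, so I can't answer either question.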
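
If the answer to both questions turns out to be yes, a stricter variant might look like the sketch below. The name as_pairs_strict and the short sample hashes are hypothetical, chosen only for illustration. It raises on any dictionary that does not hold exactly one item and parses the hex id with int(k, 16), which accepts the '0x' prefix.

```python
import collections

Pair = collections.namedtuple('Pair', ['key', 'value'])

def as_pairs_strict(dicts):
    """Yield one Pair per dict, rejecting dicts that are not single-item."""
    for d in dicts:
        if len(d) != 1:
            raise ValueError('expected exactly one item, got %d' % len(d))
        # Unpack the only (key, value) item of the dictionary.
        (k, v), = d.items()
        # Both the hex id and the stringified number become integers.
        yield Pair(key=int(k, 16), value=int(v))

# Hypothetical short ids standing in for the full transaction hashes.
pairs = list(as_pairs_strict([{'0xff': '90000'}, {'0x10': '21000'}]))
smallest = min(pairs, key=lambda pair: pair.value)
print(smallest)
```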
