Summing up a list of dictionaries based on a condition and deleting a few keys

I have a list of dictionaries with dynamic keys (the keys are generated by code), as follows:

l=[{"key1":1,"author":"test","year":"2011"},{"key2":5,"author":"test","year":"2012"},
{"key1":3,"author":"test1","year":"2012"},
{"key1":1,"author":"test","year":"2012"}]

Now, if the keys are the same, I want to sum the values of the first key and then group the results. So my final list should look like this:

l=[{"key1":2,"author":"test","year":["2011","2012"]},{"key2":5,"author":"test","year":"2012"},{"key1":3,"author":"test1","year":"2012"}]

I tried pandas groupby, but I can't use it because the key is auto-generated. In any case, the code is below:

(pd.DataFrame(l)
   .groupby(['author', 'year'], as_index=False)
   .key1.sum()
   .to_dict('records'))

Is there a better way? The rules:

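  1. If the first key in the dictionaries is the same, and the other keys (author and year) are also the same, add the two values together
  2. If the authors differ, do not add them up
  3. If the author is the same but the years differ, group the years and add up the key values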

Try a groupby-agg on the result of .groupby(['author', 'year']). The aggregation is applied in a separate step for each key other than author and year.

import pandas as pd

df = pd.DataFrame(l)
df_gp = df.groupby(['author', 'year'], as_index=False).sum()

def agg_key(df, key):
    return df[df[key] != 0].groupby("author", as_index=False).agg({
        # collect the years
        "year": lambda sr: [str(el) for el in sr],
        # sum the key
        key: "sum",
    }).to_dict(orient="records")

# all columns except author and year, i.e. the dynamic keys
keys = df.columns[~df.columns.isin(["author", "year"])]

# apply aggregation and flatten list of lists
ans = [el for key in keys for el in agg_key(df_gp, key)]

Output

print(ans)

[{'author': 'test', 'year': ['2011', '2012'], 'key1': 2.0},
 {'author': 'test1', 'year': ['2012'], 'key1': 3.0},
 {'author': 'test', 'year': ['2012'], 'key2': 5.0}]

For type consistency (recommended), a single "year" is returned as a one-element list rather than a str.
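If the exact shape from the question is needed (a bare string when there is only one year), the one-element lists can be unwrapped afterwards. The following is a minimal sketch, not part of the answer above; the helper name `unwrap_single_year` is made up for illustration, and the sample `ans` simply repeats the output printed above.

```python
# Sample records shaped like the `ans` output shown above.
ans = [
    {"author": "test", "year": ["2011", "2012"], "key1": 2.0},
    {"author": "test1", "year": ["2012"], "key1": 3.0},
    {"author": "test", "year": ["2012"], "key2": 5.0},
]


def unwrap_single_year(record):
    """Return a copy of `record` whose one-element 'year' list is replaced by its item.

    Hypothetical helper: records with several years are returned unchanged.
    """
    out = dict(record)  # shallow copy so the input is not mutated
    years = out["year"]
    if isinstance(years, list) and len(years) == 1:
        out["year"] = years[0]
    return out


unwrapped = [unwrap_single_year(r) for r in ans]
print(unwrapped)
```

The trade-off is the one noted above: after unwrapping, "year" is sometimes a str and sometimes a list, so downstream code has to handle both.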

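You would be better off with a cleaner data structure, one in which the first mapping of your dict is nothing special and is instead split into, for example, 'key': first_mapping_key and 'count': first_mapping_value.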

One way to get there from your list-of-dicts structure (where "the first key is special") is:

def transform(d):
    (k, v), *t = d.items()
    return dict(key=k, count=v, **dict(t))

lmod = [transform(d) for d in l]
lmod
# out:
[{'key': 'key1', 'count': 0, 'author': 'test', 'year': '2010'},
 {'key': 'key1', 'count': 1, 'author': 'test', 'year': '2011'},
 {'key': 'key2', 'count': 5, 'author': 'test', 'year': '2012'},
 {'key': 'key1', 'count': 3, 'author': 'test1', 'year': '2012'},
 {'key': 'key1', 'count': 1, 'author': 'test', 'year': '2012'}]
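Note that transform relies on the dynamic key being the first item of each dict, which in turn relies on dicts preserving insertion order (guaranteed since Python 3.7). If that ordering cannot be trusted, a variant can pick out the dynamic key by exclusion instead. This is only a sketch under the assumption that "author" and "year" are the only fixed keys; `FIXED_KEYS` and `transform_by_name` are illustrative names, not part of the answer above.

```python
# Assumption: every record has exactly these fixed keys plus one dynamic key.
FIXED_KEYS = ("author", "year")


def transform_by_name(d):
    """Normalize a record to key/count/author/year without relying on key order."""
    dynamic = [k for k in d if k not in FIXED_KEYS]
    if len(dynamic) != 1:
        raise ValueError(f"expected exactly one dynamic key, got {dynamic!r}")
    k = dynamic[0]
    return {"key": k, "count": d[k], **{f: d[f] for f in FIXED_KEYS}}


# The dynamic key is deliberately not first in this sample record.
sample = {"author": "test", "year": "2011", "key1": 1}
print(transform_by_name(sample))
```

Raising on anything other than exactly one dynamic key makes malformed records fail loudly rather than being silently mis-grouped.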

Now you can easily group and aggregate however you like. For example:

(pd.DataFrame(lmod)
 .query('count != 0')
 .groupby(['key', 'author'])
 .agg({'count': sum, 'year': set})
)

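The second topic is how to group and aggregate without pandas. Here is one way to do it from first principles (using only core library functions):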

def grp_key(d):
    return d['key'], d['author']

def expect_single(a):
    values = set(a)
    assert len(values) == 1
    return next(iter(values))

_funcdict = {
    'key': expect_single,
    'author': expect_single,
    'count': sum,
}
def agg(lod):
    keys = {k: 1 for d in lod for k in d}  # insertion-order union of all keys
    d = {k: _funcdict.get(k, set)(d.get(k) for d in lod) for k in keys}
    return d

Applying it:

from itertools import groupby

out = [
    agg(list(g))
    for k, g in groupby(sorted([
        d for d in lmod if d['count'] != 0
    ], key=grp_key), key=grp_key)
]
out
# output:
[{'key': 'key1', 'count': 2, 'author': 'test', 'year': {'2011', '2012'}},
 {'key': 'key1', 'count': 3, 'author': 'test1', 'year': {'2012'}},
 {'key': 'key2', 'count': 5, 'author': 'test', 'year': {'2012'}}]
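If the result has to go back into the layout from the question (the dynamic key first, and a bare string for a single year), the normalized rows can be converted back. This is a rough sketch rather than part of the answer above; `restore` is a hypothetical helper, and the sample `out` repeats the output shown above.

```python
# Sample rows shaped like the `out` result shown above.
out = [
    {"key": "key1", "count": 2, "author": "test", "year": {"2011", "2012"}},
    {"key": "key1", "count": 3, "author": "test1", "year": {"2012"}},
    {"key": "key2", "count": 5, "author": "test", "year": {"2012"}},
]


def restore(d):
    """Rebuild the question's layout: dynamic key first, then author and year."""
    years = sorted(d["year"])  # sets are unordered, so sort for a stable result
    return {
        d["key"]: d["count"],
        "author": d["author"],
        # a single year becomes a plain string, several years stay a list
        "year": years[0] if len(years) == 1 else years,
    }


restored = [restore(d) for d in out]
print(restored)
```

This assumes the dynamic key never collides with "author" or "year"; if it could, the later entries in the dict literal would overwrite it.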