Average of the values in a dictionary

Question

I have a dictionary named model_scores_for_datasets that looks like this:

{'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}

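I want to get the average of the values in each of the inner dictionaries. There are 4 of them ('Unprocessed', 'Standardisation', 'Normalisation', 'Rescale'), and each one holds 8 scores, like this: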

{'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}

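So each of the 4 scaling methods has scores for 8 different ML algorithms, and I want a single average per method. For example, if 'Standardisation' had the highest average score, it would be the one used in the machine learning pipeline.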

Here is my code, but it gives me an error: TypeError: can't convert type 'str' to numerator/denominator


avgDict = model_scores_for_datasets
for st,vals in avgDict.items():
    print(st,(vals))
    #print (st)
    for st,vals in avgDict.items():
        print("Average for {} is {}".format(st,mean(vals)))

Answer 1

First you have to convert the scores to the correct type. They are stored as strings, and mean() needs numbers:

from statistics import mean

avgDict = model_scores_for_datasets
# conversion: turn each inner dictionary into a list of float scores
avgDict = {st: [float(v) for v in vals.values()] for st, vals in avgDict.items()}

# one loop is enough: st is the dataset name, vals its list of scores
for st, vals in avgDict.items():
    print("Average for {} is {:.5f}".format(st, mean(vals)))

Output:

Average for Unprocessed is 0.96250
Average for Standardisation is 0.95425
Average for Normalisation is 0.95850
Average for Rescale is 0.95425
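
To see where the TypeError comes from, here is a minimal sketch. It assumes mean is statistics.mean (the error message in the question matches that function), and the two-entry scores dictionary is a shortened, illustrative excerpt of one inner dictionary, not the full data.

```python
from statistics import mean

# Illustrative excerpt of one inner dictionary; scores are strings.
scores = {'Logistic Regression': '0.967', 'Decision Tree': '0.933'}

# Iterating over a dict yields its keys, so mean() receives model names.
try:
    mean(scores)
except TypeError as exc:
    key_error = str(exc)

# Passing the values does not help while they are still strings.
try:
    mean(scores.values())
except TypeError as exc:
    value_error = str(exc)

# Converting each value to float first makes the call valid.
average = mean(float(v) for v in scores.values())
print(key_error)
print(f"{average:.3f}")
```

Both failing calls hit the same problem: the function is handed str objects, either the keys or the unconverted values.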

Answer 2

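import numpy as np

# results is the dictionary from the question
results = model_scores_for_datasets
for mode in results.keys():
    # convert the string scores to float before averaging
    avg = np.mean([float(value) for value in results[mode].values()])
    print(f"{mode}: {avg}")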

Output:

Unprocessed: 0.9624999999999999
Standardisation: 0.9542499999999999
Normalisation: 0.9584999999999999
Rescale: 0.9542499999999999

For the Python-crazy, the same thing as a one-liner:

print({mode: np.mean([float(value) for value in results[mode].values()]) for mode in results.keys()})
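
Since the goal in the question is to pick the preprocessing step with the highest average, the per-mode averages can be fed to max(). The following is a sketch under a couple of assumptions: it uses statistics.mean instead of NumPy to stay within the standard library, and the names averages and best_mode are introduced here for illustration only.

```python
from statistics import mean

# Scores copied from the question; values are strings.
results = {
    'Unprocessed': {
        'Logistic Regression': '0.967', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.933',
        'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
    'Standardisation': {
        'Logistic Regression': '0.933', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.967',
        'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
    'Normalisation': {
        'Logistic Regression': '0.967', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.967',
        'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
    'Rescale': {
        'Logistic Regression': '0.967', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.933',
        'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
}

# Average score per preprocessing mode (strings converted to float first).
averages = {
    mode: mean(float(value) for value in scores.values())
    for mode, scores in results.items()
}

# The mode whose average score is highest.
best_mode = max(averages, key=averages.get)
print(best_mode)
```

With the scores posted in the question, this selects 'Unprocessed', whose average is about 0.9625.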

Answer 3

An easy-to-read solution is:

data = {'Unprocessed': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Standardisation': {'Logistic Regression': '0.933', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Normalisation': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.967', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}, 'Rescale': {'Logistic Regression': '0.967', 'Support Vector Machine': '0.967', 'Decision Tree': '0.933', 'Random Forest': '0.933', 'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967', 'Naive Bayes': '0.967', 'XGBoost': '0.933'}}

dicts = list(data.keys())
keys = list(data['Unprocessed'].keys())

r = {}
for k in keys:
    r[k] = sum([float(data[d][k]) for d in dicts])/len(dicts)
    
print(r)
#{'Logistic Regression': 0.9585, 'Support Vector Machine': 0.967, 'Decision Tree': 0.933, 'Random Forest': 0.95, 'LinearDiscriminant': 0.9752500000000001, 'K-Nearest Neighbour': 0.9752500000000001, 'Naive Bayes': 0.967, 'XGBoost': 0.933}

Similarly, if you want the average per dictionary instead:

r2 = {}
for d in dicts:
    r2[d] = sum([float(data[d][k]) for k in keys])/len(keys)
    
print(r2)
#{'Unprocessed': 0.9624999999999999, 'Standardisation': 0.9542499999999999, 'Normalisation': 0.9584999999999999, 'Rescale': 0.9542499999999998}
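
The trailing digits in results such as 0.9624999999999999 come from binary floating point. If exact decimal averages matter, one option (an alternative to the float-based approach above, not part of the original answer) is to parse the strings as decimal.Decimal, which statistics.mean accepts. The sketch below uses only two of the four inner dictionaries to stay short, and the name exact is illustrative.

```python
from decimal import Decimal
from statistics import mean

# Two of the four inner dictionaries from the question, as an excerpt.
data = {
    'Unprocessed': {
        'Logistic Regression': '0.967', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.933',
        'LinearDiscriminant': '1.000', 'K-Nearest Neighbour': '1.000',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
    'Standardisation': {
        'Logistic Regression': '0.933', 'Support Vector Machine': '0.967',
        'Decision Tree': '0.933', 'Random Forest': '0.967',
        'LinearDiscriminant': '0.967', 'K-Nearest Neighbour': '0.967',
        'Naive Bayes': '0.967', 'XGBoost': '0.933'},
}

# Decimal parses the score strings exactly, so the averages carry
# no binary floating-point noise.
exact = {
    name: mean(Decimal(value) for value in scores.values())
    for name, scores in data.items()
}

for name, avg in exact.items():
    print(name, avg)
```

For ranking the preprocessing modes the float versions are normally good enough; Decimal mainly helps when the numbers are shown to readers.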
