Adding dictionary with unique keys to DataFrame without unique keys

Question

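I am trying to compute descriptive statistics on a DataFrame using GroupBy and put the values back into the DataFrame.

My DataFrame contains a non-unique running number (an anonymized ID) and some values associated with each person.

For example: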

RunNr    Value
1        126
1        158
1        18
2        65
3        31   
3        4

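Using GroupBy, I can compute descriptive statistics for each person (i.e. each running number), such as the standard deviation. I would like to add these back to the DataFrame for further processing (for example, producing a report in Word).

The result should look like this: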

RunNr    Value    Std
1        126      59,9
1        158      59,9
1        18       59,9
2        65       NaN
3        31       13,5
3        4        13,5

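The best solution I have come up with is to compute the standard deviation (and other statistics) and put them in a dictionary, where the running number is the key and the standard deviation is the value.

I now have a dictionary in which the running number is a unique key, whereas it is not unique in the DataFrame. My next step is to iterate over the dictionary and use .loc to insert the corresponding values into the right rows: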

for key, value in self.dict_of_std:
    self.internal_main_df.loc[self.internal_main_df.Fnr == key] = value

I get this error:

TypeError: cannot unpack non-iterable float object

Suggestions for improving my code or my overall approach are appreciated.

Answer 1

If you need to fill a column with the std of each group, use GroupBy.transform with an aggregation function, here std:

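import pandas as pd

# Sample data from the question
df = pd.DataFrame({'RunNr': [1, 1, 1, 2, 3, 3],
                   'Value': [126, 158, 18, 65, 31, 4]})

# Broadcast each group's standard deviation back to its rows
df['Std'] = df.groupby('RunNr')['Value'].transform('std')
print(df)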
   RunNr  Value        Std
0      1    126  73.357572
1      1    158  73.357572
2      1     18  73.357572
3      2     65        NaN
4      3     31  19.091883
5      3      4  19.091883
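
Note that these numbers differ from the expected output in the question (59.9 and 13.5). Those values match the population standard deviation (ddof=0), whereas pandas computes the sample standard deviation (ddof=1) by default. A minimal sketch, assuming the population version is what is wanted; the column name StdPop is made up for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'RunNr': [1, 1, 1, 2, 3, 3],
                   'Value': [126, 158, 18, 65, 31, 4]})

# Sample standard deviation (pandas default, ddof=1)
df['Std'] = df.groupby('RunNr')['Value'].transform('std')

# Population standard deviation (ddof=0); 'StdPop' is a hypothetical column name
df['StdPop'] = df.groupby('RunNr')['Value'].transform(lambda s: s.std(ddof=0))

print(df.round(1))
```

With ddof=0, a group that has a single row gets 0.0 rather than NaN, so the two variants also differ for RunNr 2.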

If you need more statistics, use DataFrameGroupBy.describe together with DataFrame.join to add them as new columns:

df1 = df.join(df.groupby('RunNr')['Value'].describe(), on='RunNr')
print(df1)
   RunNr  Value  count        mean        std   min    25%    50%     75%  \
0      1    126    3.0  100.666667  73.357572  18.0  72.00  126.0  142.00   
1      1    158    3.0  100.666667  73.357572  18.0  72.00  126.0  142.00   
2      1     18    3.0  100.666667  73.357572  18.0  72.00  126.0  142.00   
3      2     65    1.0   65.000000        NaN  65.0  65.00   65.0   65.00   
4      3     31    2.0   17.500000  19.091883   4.0  10.75   17.5   24.25   
5      3      4    2.0   17.500000  19.091883   4.0  10.75   17.5   24.25   

     max  
0  158.0  
1  158.0  
2  158.0  
3   65.0  
4   31.0  
5   31.0

Alternatively, specify the aggregation functions you want in GroupBy.agg:

df2 = df.join(df.groupby('RunNr')['Value'].agg(['mean', 'max', 'std']), on='RunNr')
print(df2)
   RunNr  Value        mean  max        std
0      1    126  100.666667  158  73.357572
1      1    158  100.666667  158  73.357572
2      1     18  100.666667  158  73.357572
3      2     65   65.000000   65        NaN
4      3     31   17.500000   31  19.091883
5      3      4   17.500000   31  19.091883
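
As for the error in the question: iterating over a dict directly yields only its keys, so `for key, value in self.dict_of_std` tries to unpack each (float) key into two names, which raises the TypeError. Iterating over .items() avoids that, and .loc should also name the target column so that only one column is written instead of the whole row. If a dictionary is needed anyway, Series.map does the lookup without a loop. A sketch, where dict_of_std stands in for the question's self.dict_of_std and the column names Std_loop and Std_map are invented for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'RunNr': [1, 1, 1, 2, 3, 3],
                   'Value': [126, 158, 18, 65, 31, 4]})

# Stand-in for the question's dictionary: {RunNr: std of that group}
dict_of_std = df.groupby('RunNr')['Value'].std().to_dict()

# Loop version: .items() yields (key, value) pairs,
# and .loc gets a column label so only that column is assigned
df['Std_loop'] = float('nan')  # hypothetical column name
for key, value in dict_of_std.items():
    df.loc[df['RunNr'] == key, 'Std_loop'] = value

# Vectorized version: look up each row's RunNr in the dictionary
df['Std_map'] = df['RunNr'].map(dict_of_std)  # hypothetical column name

print(df)
```

Both columns should hold the same values as the transform('std') result; the map version is usually the simpler choice when the statistics already live in a dictionary.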

