Finding min, max, avg in Pandas, Python for all rows with the same first column

Question

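Is it possible to find the min, max, and average of all data that share the same first-column value?

For example, for the first-column value 1_204192587:

Consider all rows, and columns 4 through n.
Find the min, max, and average of all entries in columns 4+ across all rows whose first column is **1_204192587**.

In other words, I want some kind of describe over the data for each unique Start value shown below.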

    In:  data.groupby(["Start"]).groups.keys()
    Out: dict_keys(['1_204192587', '1_204197200'])

This is what the data frame looks like.

I tried

    df = data.groupby(["Start"]).describe()

but that is not what I want.

I also tried specifying the axis when calling describe,

    data.apply.(pd.DataFrame.describe, axis=1)

but I got an error.

Expected output

    unique key/first column value   MIN   MAX   AVG
    1_204192587                     *     *     *
    1_204197200                     *     *     *

I am a beginner; thanks for any replies.

Answer 1

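You can use the following:

    df.loc[4:].describe()

df is your data frame.
.loc[4:] selects the rows from index label 4 onward (the fifth row onward with the default integer index).
.describe() gives you a statistical summary (count, mean, std, min, quartiles, max).

You can also add .transpose() at the end so that the statistics become columns, as in the output you asked for.

If you want to assign the result to another variable (a new data frame), it looks like this:

    new_df = df.loc[4:].describe().transpose()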
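
Since the question asks about columns 4 onward while `.loc[4:]` slices rows, here is a minimal sketch of the difference. The frame below is hypothetical (the column names `End`, `Label`, `v1`, `v2` and all values are invented for illustration); it only assumes that the key sits in the first column and the numeric data starts at the fourth.

```python
import pandas as pd

# Hypothetical data: key in the first column, numeric values from the 4th column on.
df = pd.DataFrame({
    "Start": ["1_204192587", "1_204192587", "1_204197200", "1_204197200"],
    "End":   ["x", "y", "x", "y"],
    "Label": ["p", "q", "r", "s"],
    "v1":    [1.0, 2.0, 10.0, 20.0],
    "v2":    [3.0, 4.0, 30.0, 40.0],
})

# Row slice by label: with the default RangeIndex this keeps rows 2 and 3.
rows_from_label_2 = df.loc[2:]

# Column slice by position: keeps the 4th column onward (v1 and v2).
values = df.iloc[:, 3:]

# All entries in the 4th+ columns for a single key, pooled into one array.
key = "1_204192587"
block = values[df["Start"] == key].to_numpy()
summary = {
    "MIN": float(block.min()),
    "MAX": float(block.max()),
    "AVG": float(block.mean()),
}
print(summary)
```

This handles one key at a time; to cover every key in one pass, a groupby over the Start column is the more natural tool.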

Answer 2

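I think you want to compare all numeric columns within each group, so convert the Start column to the index, select the numeric columns with DataFrame.select_dtypes, reshape with DataFrame.stack, and finally call GroupBy.describe, grouping by the index level: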

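    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        'A': list('abcdef'),
        'B': [4, 5, 4, 5, 5, 4],
        'C': [7, 8, 9, 4, 2, 3],
        'D': [1, 3, 5, 7, 1, 0],
        'E': [5, 3, 6, 9, 2, 4],
        'Start': list('aaabbb')
    })

    # Start becomes the index, only numeric columns are kept, stack() pools them
    # into one long Series, and describe() then runs once per Start value.
    df1 = data.set_index("Start").select_dtypes(np.number).stack().groupby(level=0).describe()
    print(df1)
           count      mean       std  min   25%  50%   75%  max
    Start                                                      
    a       12.0  5.000000  2.256304  1.0  3.75  5.0  6.25  9.0
    b       12.0  3.833333  2.516611  0.0  2.00  4.0  5.00  9.0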

Or specify a list of aggregation functions with GroupBy.agg:

    df2 = (data.set_index("Start")
               .select_dtypes(np.number)
               .stack()
               .groupby(level=0)
               .agg(['min', 'max', 'mean']))
    print(df2)
           min  max      mean
    Start                    
    a        1    9  5.000000
    b        0    9  3.833333
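
To get column headers that match the MIN / MAX / AVG layout from the question, one possible variant is sketched below. It swaps set_index plus stack for DataFrame.melt and uses named aggregation; this is an alternative route to the same numbers, not part of the original answer, and the names `numeric_cols`, `long_form`, and `out` are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1, 3, 5, 7, 1, 0],
    "E": [5, 3, 6, 9, 2, 4],
    "Start": list("aaabbb"),
})

# Names of the numeric columns (B, C, D, E); the text column A is skipped.
numeric_cols = data.select_dtypes("number").columns.tolist()

# Long format: one row per (Start, column) pair, with the number in "value".
long_form = data.melt(id_vars="Start", value_vars=numeric_cols)

# Named aggregation: each keyword becomes a column in the result.
out = long_form.groupby("Start")["value"].agg(MIN="min", MAX="max", AVG="mean")
print(out)
```

The result has one row per Start value and the three columns MIN, MAX, and AVG, pooled over every numeric column in that group.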
