Pandas: Multiple Grand Totals in a Summary Dataframe

Apologies for the newbie question as I try to learn Python. I'm looking forward to getting up to speed and giving back to the community.

Suppose I have the following data:

YEAR         SECTOR    PROFIT   STARTMVYEAR TOTALPROFIT STARTMV
IBM         TECHNOLOGY  -500    2500        500         1500
APPLE       TECHNOLOGY   800    4000        300         4500
GM          INDUSTRIAL   250    1000          0         1250
CHRYSLER    INDUSTRIAL   600    3000        100         3500

I would like to create a summary like the following:

SECTOR      PROFITYEAR  TOTALPROFIT
TECHNOLOGY     .046       .133
INDUSTRIAL     .213       .021

For each group, the two columns are sum(PROFIT)/sum(STARTMVYEAR) and sum(TOTALPROFIT)/sum(STARTMV).

If I only wanted the first ratio, I could do:

by_profit_totals = (df.groupby(['SECTOR'])['PROFIT'].sum() / df.groupby(['SECTOR'])['STARTMVYEAR'].sum())

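But how do I do this for both ratios at once? Also, is there a simple function I could use that takes, for example, profit and startmvyear and returns the summarized value?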

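You can use groupby with the Cython-optimized sum aggregation and then div by the NumPy array obtained from values. Passing a plain array matters here: div between two DataFrames aligns on column labels, and the numerator and denominator columns have different names, so dividing the DataFrames directly would produce only NaN.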

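# Sum every column once per sector
g = df.groupby('SECTOR').sum()
# Divide the numerator columns by the denominator columns position by position
print(g[['PROFIT','TOTALPROFIT']].div(g[['STARTMVYEAR','STARTMV']].values).reset_index())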
       SECTOR    PROFIT  TOTALPROFIT
0  INDUSTRIAL  0.212500     0.021053
1  TECHNOLOGY  0.046154     0.133333
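
As for the second part of the question (a simple reusable function), the same idea could be wrapped up roughly as follows. This is only a sketch: ratio_of_sums and its pairs argument are hypothetical names made up for illustration, not part of pandas, and the DataFrame is rebuilt by hand from the table in the question. It divides Series by Series, which aligns on the SECTOR index, so .values is not needed in this variant.

```python
import pandas as pd

# Sample data retyped from the question. The first column is labelled YEAR
# there, although it holds company names; the label is kept as-is.
df = pd.DataFrame({
    "YEAR": ["IBM", "APPLE", "GM", "CHRYSLER"],
    "SECTOR": ["TECHNOLOGY", "TECHNOLOGY", "INDUSTRIAL", "INDUSTRIAL"],
    "PROFIT": [-500, 800, 250, 600],
    "STARTMVYEAR": [2500, 4000, 1000, 3000],
    "TOTALPROFIT": [500, 300, 0, 100],
    "STARTMV": [1500, 4500, 1250, 3500],
})


def ratio_of_sums(frame, by, pairs):
    """Hypothetical helper: per group, sum(numerator) / sum(denominator).

    pairs maps an output column name to a (numerator, denominator) tuple.
    """
    # Sum only the columns that are actually needed, once per group.
    needed = sorted({col for pair in pairs.values() for col in pair})
    sums = frame.groupby(by)[needed].sum()

    # Start from an empty frame that shares the group index, then add one
    # ratio column per requested pair.
    out = pd.DataFrame(index=sums.index)
    for name, (num, den) in pairs.items():
        out[name] = sums[num] / sums[den]
    return out.reset_index()


summary = ratio_of_sums(
    df,
    "SECTOR",
    {
        "PROFITYEAR": ("PROFIT", "STARTMVYEAR"),
        "TOTALPROFIT": ("TOTALPROFIT", "STARTMV"),
    },
)
print(summary.round(3))
```

The output columns are named after the summary the question asks for (PROFITYEAR and TOTALPROFIT); adding another ratio only means adding another entry to the dictionary.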