在 Pandas pivot_table 中排序

Sorting in a Pandas pivot_table

我一直在四处寻找,试图弄清楚如何正确地对我的枢轴 table 进行排序,但我没有任何运气。

    client          unit    task                hours   month
0   A               DVADA   Account Management  6.50    January     
1   A               DVADA   Buying              1.25    January 
2   A               DVADA   Meeting / Call      0.50    January 
3   A               DVADA   Account Management  3.00    January 
4   A               DVADA   Billing             2.50    February    
5   A               DVADA   Account Management  6.50    February        
6   A               DVADA   Buying              1.25    February    
7   A               DVADA   Meeting / Call      0.50    February    
8   A               DVADA   Account Management  3.00    February    
9   A               DVADA   Billing             2.50    February
10  A               DVADA   Billing             2.50    December    
11  A               DVADA   Account Management  6.50    December        
12  A               DVADA   Buying              1.25    December    
13  A               DVADA   Meeting / Call      0.50    December    
14  A               DVADA   Account Management  3.00    December    
15  A               DVADA   Billing             2.50    December
16  A               DVADA   Account Management  6.50    August      
17  A               DVADA   Buying              1.25    August  
18  A               DVADA   Meeting / Call      0.50    August  
19  A               DVADA   Account Management  3.00    August
20  A               DVADA   Account Management  6.50    April       
21  A               DVADA   Buying              1.25    April   
22  A               DVADA   Meeting / Call      0.50    April   
23  A               DVADA   Account Management  3.00    April
24  B               DVADA   Account Management  6.50    January     
25  B               DVADA   Buying              1.25    January 
26  B               DVADA   Meeting / Call      0.50    January 
27  B               DVADA   Account Management  3.00    January 
28  B               DVADA   Billing             2.50    February    
29  B               DVADA   Account Management  6.50    February        
30  B               DVADA   Buying              1.25    February    
31  B               DVADA   Meeting / Call      0.50    February    
32  B               DVADA   Account Management  3.00    February    
33  B               DVADA   Billing             2.50    February
34  B               DVADA   Billing             2.50    December    
35  B               DVADA   Account Management  6.50    December        
36  B               DVADA   Buying              1.25    December    
37  B               DVADA   Meeting / Call      0.50    December    
38  B               DVADA   Account Management  3.00    December    
39  B               DVADA   Billing             2.50    December
40  B               DVADA   Account Management  6.50    August      
41  B               DVADA   Buying              1.25    August  
42  B               DVADA   Meeting / Call      0.50    August  
43  B               DVADA   Account Management  3.00    August
44  B               DVADA   Account Management  6.50    April       
45  B               DVADA   Buying              1.25    April   
46  B               DVADA   Meeting / Call      0.50    April   
47  C               DVADA   Account Management  3.00    April
48  C               DVADA   Account Management  6.50    January     
49  C               DVADA   Buying              1.25    January 
50  C               DVADA   Meeting / Call      0.50    January 
51  C               DVADA   Account Management  3.00    January 
52  C               DVADA   Billing             2.50    February    
53  C               DVADA   Account Management  6.50    February        
54  C               DVADA   Buying              1.25    February    
55  C               DVADA   Meeting / Call      0.50    February    
56  C               DVADA   Account Management  3.00    February    
57  C               DVADA   Billing             2.50    February
58  C               DVADA   Billing             2.50    December    
59  C               DVADA   Account Management  6.50    December        
60  C               DVADA   Buying              1.25    December    
61  C               DVADA   Meeting / Call      0.50    December    
62  C               DVADA   Account Management  3.00    December    
63  C               DVADA   Billing             2.50    December
64  C               DVADA   Account Management  6.50    August      
65  C               DVADA   Buying              1.25    August  
66  C               DVADA   Meeting / Call      0.50    August  
67  C               DVADA   Account Management  3.00    August
68  C               DVADA   Account Management  6.50    April       
69  C               DVADA   Buying              1.25    April   
70  C               DVADA   Meeting / Call      0.50    April   
71  C               DVADA   Account Management  3.00    April

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

其中 returns 一个具有三列(客户、月份、小时)的数据透视表 table。每个客户有 12 个月(1 月至 12 月),每个月都有一个小时。

                        hours
client          month

A               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

B               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

C               April   203.50
                August  227.75
                December 159.75
                February 203.25
                January 199.25

我想按月份对这个数据透视表 table 进行排序,但保留客户列。

                           hours
client           month

A               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

B               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

C               January 203.50
                February 227.75
                March    159.75
                April    203.25
                May     199.90

排序问题已通过 Scott 的以下回答得到解决。现在我想为每个客户添加一行,其中包含使用的总小时数。

                           hours
client           month

A               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May        199.90
                Total     1000.34

B               January    203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

C               January   203.50
                February   227.75
                March      159.75
                April      203.25
                May       199.90
                Total     1000.34

任何帮助将不胜感激

更新以在每个客户末尾添加总计

vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                     ordered=True, 
                                     categories=['January','February','March',
                                                 'April','May','June','July',
                                                 'August','September','October',
                                                 'November','December','Total'])

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

df = df.dropna()

pd.concat([df,df.sum(level=0).assign(month='Total').set_index('month', append=True)]).sort_index()

输出:

                 hours
client month          
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
       Total     66.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
       Total     63.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25
       Total     69.25

让我们使用 pd.Categorical:

vp_clients['month'] = pd.Categorical(vp_clients['month'], 
                                     ordered=True, 
                                     categories=['January','February','March',
                                                 'April','May','June','July',
                                                 'August','September','October',
                                                 'November','December'])

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc=sum)

df.dropna()

输出:

                 hours
client month          
A      January   11.25
       February  16.25
       April     11.25
       August    11.25
       December  16.25
B      January   11.25
       February  16.25
       April      8.25
       August    11.25
       December  16.25
C      January   11.25
       February  16.25
       April     14.25
       August    11.25
       December  16.25

此外,如评论所述,由于您没有将值旋转到宽格式的新列,请考虑简单地使用 groupby()。 re-consider reindex() 用于自定义 January-December 订单,指定 级别 并与 python 的 built-in calendar模块。

import calendar
...

grp_df = df.groupby(['client', 'month']).agg({'hours': 'sum'})\
           .reindex(level=1, labels=calendar.month_name)

#                  hours
# client month          
# A      January   11.25
#        February  16.25
#        April     11.25
#        August    11.25
#        December  16.25
# B      January   11.25
#        February  16.25
#        April      8.25
#        August    11.25
#        December  16.25
# C      January   11.25
#        February  16.25
#        April     14.25
#        August    11.25
#        December  16.25