Pandas Custom Sort Row in Multiindex

Given the following:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
s

first  second
bar    total     0.334158
       two      -0.267854
       one       1.161727
baz    two      -0.748685
       four     -0.888634
       total     0.383310
       five      0.506120
dtype: float64

How can I make sure the 'total' row (in the second index level) always ends up at the bottom of each group, like this?

first  second
bar    one       0.210911
       two       0.628357
       total    -0.911331
baz    two       0.315396
       four     -0.195451
       five      0.060159
       total     0.638313
dtype: float64

Solution 1

I'm not happy with this one. I'm working on a different solution.

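unstacked = s.unstack(0)
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead;
# .loc[['total']] keeps the row as a one-row DataFrame
total = unstacked.loc[['total']]
pd.concat([unstacked.drop('total'), total]).unstack().dropna()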

first  second
bar    one       1.682996
       two       0.343783
       total     1.287503
baz    five      0.360170
       four      1.113498
       two       0.083691
       total    -0.377132
dtype: float64
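
For comparison, here is a minimal sketch of the same "move total to the end" idea without unstacking. It is not part of the solution above. It assumes you only need 'total' to come last and want the other rows to keep their original relative order, and it relies on np.lexsort being a stable sort. The names is_total, group_codes and order are made up for this example, and the values are np.arange(7.0) rather than random numbers so the result is easy to follow.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.arange(7.0), index=index)

# 1 where the second level is 'total', 0 elsewhere
is_total = (s.index.get_level_values('second') == 'total').astype(np.int8)
# Integer code per group, numbered in order of first appearance
group_codes = pd.factorize(s.index.get_level_values('first'))[0]

# np.lexsort treats the LAST key as the primary key and is stable,
# so rows are ordered by group first, then non-total before total,
# and ties keep their original order
order = np.lexsort((is_total, group_codes))
result = s.iloc[order]
print(result)
```

Note that this keeps 'two' before 'one' in the bar group, because that is the order they appear in. If the numeric order of the labels matters as well, an ordered categorical is the better fit.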

Solution 2

I feel much better about this one.

second = pd.Categorical(
    s.index.levels[1].values,
    categories=['one', 'two', 'three', 'four', 'five', 'total'],
    ordered=True
)
# set_levels returns a new MultiIndex (the inplace argument was removed in pandas 2.0)
s.index = s.index.set_levels(second, level='second')

cols = s.index.names
s.reset_index().sort_values(cols).set_index(cols)

                     0
first second          
bar   one     1.682996
      two     0.343783
      total   1.287503
baz   two     0.083691
      four    1.113498
      five    0.360170
      total  -0.377132
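
The same ordered-categorical idea can also be sketched without touching the index levels, by converting the column after reset_index. This is only an illustration under a few assumptions: the category list is the one used above, the names label_order, frame and sorted_s are invented for the example, the Series is given the name 'value' so the column is easy to select, and the values are np.arange(7.0) so the result is deterministic.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(np.arange(7.0), index=index, name='value')

# Ordered dtype: 'total' is listed last, so it sorts after every other label
label_order = pd.CategoricalDtype(
    ['one', 'two', 'three', 'four', 'five', 'total'], ordered=True
)

frame = s.reset_index()
frame['second'] = frame['second'].astype(label_order)

# Sort by group, then by the categorical order, and rebuild the MultiIndex
cols = ['first', 'second']
sorted_s = frame.sort_values(cols).set_index(cols)['value']
print(sorted_s)
```

Since this returns a Series rather than a one-column DataFrame, it may be a little easier to use as a drop-in replacement for s.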

Use unstack to create a DataFrame whose columns come from the second level of the MultiIndex, then reorder the columns so that total is the last column, and finally convert the columns to an ordered CategoricalIndex.

That way, after stack, total is the last entry in each group.

import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
print (s)
first  second
bar    total    -1.085631
       two       0.997345
       one       0.282978
baz    two      -1.506295
       four     -0.578600
       total     1.651437
       five     -2.426679
dtype: float64
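# Pivot the second level into columns (unstack sorts them alphabetically)
df = s.unstack()
# Move the 'total' column to the end
df = df[df.columns[df.columns != 'total'].tolist() + ['total']]
# Pass categories explicitly so the category order follows the column order
# (categories inferred from the values would be sorted alphabetically)
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns, ordered=True)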
print (df)
second      five    four       one       two     total
first                                                 
bar          NaN     NaN  0.282978  0.997345 -1.085631
baz    -2.426679 -0.5786       NaN -1.506295  1.651437
s1 = df.stack()
print (s1)
first  second
bar    one       0.282978
       two       0.997345
       total    -1.085631
baz    five     -2.426679
       four     -0.578600
       two      -1.506295
       total     1.651437
dtype: float64

print (s1.sort_index())
first  second
bar    one       0.282978
       two       0.997345
       total    -1.085631
baz    five     -2.426679
       four     -0.578600
       two      -1.506295
       total     1.651437
dtype: float64
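
If you build the index yourself, the unstack/stack round trip may not be needed at all. The following is a rough sketch, assuming the full list of labels is known up front: an ordered pd.Categorical is passed as the second level and sort_index then follows the category order. The category list is borrowed from the question's second solution, s_sorted is a name invented here, and the values are np.arange(7.0) so the output is predictable.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]

# Ordered categorical for the second level, with 'total' as the last category
second = pd.Categorical(
    arrays[1],
    categories=['one', 'two', 'three', 'four', 'five', 'total'],
    ordered=True,
)
index = pd.MultiIndex.from_arrays([arrays[0], second], names=['first', 'second'])
s = pd.Series(np.arange(7.0), index=index)

# Sorting the index now uses the category order within each group
s_sorted = s.sort_index()
print(s_sorted)
```

Unlike the stacked result above, this also puts the other labels in numeric order (two, four, five), because the categories are listed that way.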