Pandas: Custom Sort Order for Rows in a MultiIndex
Given the following:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
s
first second
bar total 0.334158
two -0.267854
one 1.161727
baz two -0.748685
four -0.888634
total 0.383310
five 0.506120
dtype: float64
How can I make sure the 'total' row (by the second index level) always ends up at the bottom of each group, like this?
first second
bar one 0.210911
two 0.628357
total -0.911331
baz two 0.315396
four -0.195451
five 0.060159
total 0.638313
dtype: float64
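With pandas 1.1 or later, sort_index also accepts a key function, which avoids reshaping the data at all. A minimal sketch, assuming the desired order of the second level is one, two, three, four, five, total (the order used in Solution 2 below) and that every label in that level appears in the hypothetical order list (a missing label would map to NaN). Deterministic values replace the random ones so the result is easy to check:

```python
import pandas as pd

# Same index as in the question; deterministic values instead of random ones
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(range(7), index=index, dtype='float64')

# Assumed order of the 'second' level; 'total' goes last
order = ['one', 'two', 'three', 'four', 'five', 'total']
rank = {label: i for i, label in enumerate(order)}

def sort_key(level_values):
    # Without a level argument, sort_index calls the key once per level;
    # only the 'second' level is remapped to its position in `order`
    if level_values.name == 'second':
        return level_values.map(rank)
    return level_values

result = s.sort_index(key=sort_key)
print(result)
```

The 'first' level is still sorted normally, so the groups come out as bar, then baz.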
Solution 1
I'm not happy with this one. I'm working on a different solution.
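unstacked = s.unstack(0)
# Keep 'total' as a one-row DataFrame so it can be concatenated back at the end
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
total = unstacked.loc[['total']]
pd.concat([unstacked.drop('total'), total]).unstack().dropna()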
first second
bar one 1.682996
two 0.343783
total 1.287503
baz five 0.360170
four 1.113498
two 0.083691
total -0.377132
dtype: float64
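One side effect of the unstack/dropna round trip is that any genuine NaN values in s are dropped along with the padding. A minimal sketch that reorders the existing index labels directly instead, assuming only 'total' needs to move and the other rows may keep their original order within each group (the NaN in the sample data is illustrative):

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
# A genuine NaN in a non-total row, to show that it survives the reordering
s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0], index=index)

# Python's sort is stable: rows keep their original order within each
# 'first' group, except that 'total' (True sorts after False) moves last
order = sorted(s.index, key=lambda t: (t[0], t[1] == 'total'))
result = s.reindex(pd.MultiIndex.from_tuples(order, names=s.index.names))
print(result)
```

Because only a boolean is used as the secondary key, the non-total rows are not sorted among themselves; combine it with a rank mapping if a full custom order is needed.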
Solution 2
I feel much better about this one.
second = pd.Categorical(
s.index.levels[1].values,
categories=['one', 'two', 'three', 'four', 'five', 'total'],
ordered=True
)
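# set_levels maps labels by position, so `second` is built from the existing level;
# inplace=True was removed in pandas 2.0, so assign the returned index instead
s.index = s.index.set_levels(second, level='second')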
cols = s.index.names
s.reset_index().sort_values(cols).set_index(cols)
0
first second
bar one 1.682996
two 0.343783
total 1.287503
baz two 0.083691
four 1.113498
five 0.360170
total -0.377132
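Once the second level is an ordered categorical, sort_index should follow the category order by itself, which makes the reset_index/sort_values/set_index round trip unnecessary. A sketch of that variant, assuming the same category list as in Solution 2 and deterministic values; it builds the level with pd.CategoricalIndex rather than pd.Categorical:

```python
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
s = pd.Series(range(7), index=index, dtype='float64')

order = ['one', 'two', 'three', 'four', 'five', 'total']
# Replace the 'second' level with an ordered categorical; set_levels maps
# labels by position, so the new level is built from the existing one
second = pd.CategoricalIndex(s.index.levels[1], categories=order, ordered=True)
s.index = s.index.set_levels(second, level='second')

# sort_index orders a categorical level by its categories, not alphabetically
result = s.sort_index()
print(result)
```

The result stays a Series, so there is no extra 0 column to drop afterwards.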
Use unstack to create a DataFrame whose columns are the second level of the MultiIndex, then reorder the columns so that total is the last column, and finally convert the columns to an ordered CategoricalIndex.
When you stack again, total is the last value in each group.
import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
print (s)
first second
bar total -1.085631
two 0.997345
one 0.282978
baz two -1.506295
four -0.578600
total 1.651437
five -2.426679
dtype: float64
df = s.unstack()
df = df[df.columns[df.columns != 'total'].tolist() + ['total']]
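# Pass the current column order as the categories; otherwise they are
# inferred in alphabetical order and 'total' would sort before 'two'
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns, ordered=True)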
print (df)
second five four one two total
first
bar NaN NaN 0.282978 0.997345 -1.085631
baz -2.426679 -0.5786 NaN -1.506295 1.651437
# .dropna() removes the NaN padding explicitly; newer stack implementations may keep it
s1 = df.stack().dropna()
print (s1)
first second
bar one 0.282978
two 0.997345
total -1.085631
baz five -2.426679
four -0.578600
two -1.506295
total 1.651437
dtype: float64
print (s1.sort_index())
first second
bar one 0.282978
two 0.997345
total -1.085631
baz five -2.426679
four -0.578600
two -1.506295
total 1.651437
dtype: float64
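Whether 'total' ends up last after sorting depends on how the categories of the CategoricalIndex are defined. When no categories argument is passed, pandas infers the categories in sorted (alphabetical) order, so ordered=True alone does not make 'total' the largest value. A small sketch comparing the two, using the column order from the answer:

```python
import pandas as pd

# Column order produced by the answer's reordering step
cols = pd.Index(['five', 'four', 'one', 'two', 'total'], name='second')

# Without explicit categories, the categories are inferred in sorted
# (alphabetical) order, so 'total' sorts before 'two'
inferred = pd.CategoricalIndex(cols, ordered=True)
print(list(inferred.sort_values()))  # ['five', 'four', 'one', 'total', 'two']

# Passing the desired order as the categories keeps 'total' as the largest value
explicit = pd.CategoricalIndex(cols, categories=cols, ordered=True)
print(list(explicit.sort_values()))  # ['five', 'four', 'one', 'two', 'total']
```

Only the second form guarantees that a later sort_index on the stacked Series keeps 'total' at the bottom of each group.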