Getting Pandas.groupby.shift() results with groupby vars as cols / index?

Question

Given this simple dataset:

import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

Grouping on one / two and applying .max() returns a Series indexed by the groupby vars, as expected...

df.groupby(['one', 'two'])['three'].max()

Output:

one  two
a    c      3
b    c      4
     d      6
Name: three, dtype: int64

...In my case, I want to shift() my records by group. But for some reason, when I apply .shift() to the groupby object, my results don't include the groupby variables:

df.groupby(['one', 'two'])['three'].shift()

Output:

0    NaN
1    1.0
2    2.0
3    NaN
4    NaN
5    5.0
Name: three, dtype: float64

Is there a way to keep these groupby variables in the result, either as columns or as a MultiIndexed Series (like .max())? Thanks!

Answer 1

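The difference between max and shift: max aggregates values (it returns an aggregated Series, one value per group), while shift does not; it returns a Series of the same size as the original.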
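A minimal sketch of that size difference on the question's data: the reduced result has one row per group and is indexed by (one, two), while the shifted result has one row per original row and reuses the DataFrame's own index. The names `reduced` and `shifted` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

g = df.groupby(['one', 'two'])['three']

reduced = g.max()     # one value per group, indexed by (one, two)
shifted = g.shift()   # one value per row, indexed like df itself

print(len(reduced), list(reduced.index))
print(len(shifted), list(shifted.index))
```

Because `shifted` shares `df`'s index, it can be assigned straight back as a column, which is what the next step does.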

So you can assign the output to a new column:

df['shifted'] = df.groupby(['one', 'two'])['three'].shift()

In theory you could use agg, but in pandas 0.20.3 it raises an error:

df1 = df.groupby(['one', 'two'])['three'].agg(['max', lambda x: x.shift()])
print (df1)

ValueError: Function does not reduce
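agg expects every function to collapse each group to a single value. As a sketch (not part of the original answer), a shift can still be used inside agg if it is followed by a reduction; here `.iloc[-1]` keeps the last shifted value of each group. It uses named aggregation, which assumes pandas 0.25 or later, and the column names `max_three` and `last_shifted` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

g = df.groupby(['one', 'two'])['three']

# Each function must return a scalar per group. x.shift() alone returns
# a same-length Series (hence "Function does not reduce"); .iloc[-1]
# reduces it to the group's last shifted value, so agg accepts it.
result = g.agg(max_three='max',
               last_shifted=lambda x: x.shift().iloc[-1])
print(result)
```

The result keeps (one, two) as its index, like .max(), but it only carries one shifted value per group, not the full shifted column.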

If you need both max and shift, one possible solution is transform:

g = df.groupby(['one', 'two'])['three']
df['max'] = g.transform('max')
df['shifted'] = g.shift()
print (df)
  one  three two  max  shifted
0   a      1   c    3      NaN
1   a      2   c    3      1.0
2   a      3   c    3      2.0
3   b      4   c    4      NaN
4   b      5   d    6      NaN
5   b      6   d    6      5.0
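transform can also carry the shift itself: a function that returns a same-length result is passed through unchanged, while a reducer such as 'max' is broadcast back to every row of its group. A minimal sketch on the question's data, reusing the column names `max` and `shifted` from the answer above:

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

g = df.groupby(['one', 'two'])['three']

# A reducer is broadcast to every row of its group...
df['max'] = g.transform('max')
# ...while a same-size function is returned as-is, aligned on df's index.
df['shifted'] = g.transform(lambda s: s.shift())
print(df)
```

The `shifted` column matches `g.shift()`; the explicit lambda simply makes it visible that transform accepts both kinds of function.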

Answer 2

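As Jez explained, shift returns a Series with the same length as the DataFrame, so if you try to use it like max() (for example inside agg), you will get the error

Function does not reduce

Instead, you can assign it as a new column and set the groupby variables as the index: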

df.assign(shifted=df.groupby(['one', 'two'])['three'].shift()).set_index(['one','two'])
Out[57]: 
         three  shifted
one two                
a   c        1      NaN
    c        2      1.0
    c        3      2.0
b   c        4      NaN
    d        5      NaN
    d        6      5.0
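If you want a Series rather than a DataFrame, another option (a sketch, not from the original answers) is to move the groupby variables into the index first and then group on those index levels. shift keeps the index of the object it is called on, so the result is a Series with a (one, two) MultiIndex like the one .max() produces, but with one entry per row.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

# Put the grouping columns into the index, then group on those levels.
s = df.set_index(['one', 'two'])['three']
shifted = s.groupby(level=['one', 'two']).shift()
print(shifted)
```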

Using max as the key, select the shifted value on the row where each group reaches its maximum:

df.groupby(['one', 'two'])['three'].apply(lambda x : x.shift()[x==x.max()])
Out[58]: 
one  two   
a    c    2    2.0
b    c    3    NaN
     d    5    5.0
Name: three, dtype: float64
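The same selection can be sketched without apply: compute the shifted column once, then keep only the rows where `three` equals its group max (broadcast with transform). Unlike the apply version, the result is indexed by the groupby variables only, without the original row number. As with the apply version, a group whose max appears on several rows keeps all of them. The names `is_max` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one':   ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two':   ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1,   2,    3,   4,   5,   6]})

g = df.groupby(['one', 'two'])['three']

# Boolean mask: True on the row(s) holding each group's maximum.
is_max = df['three'].eq(g.transform('max'))

# Attach the shifted values, filter to the max rows, index by the group vars.
result = (df.assign(shifted=g.shift())[is_max]
            .set_index(['one', 'two'])['shifted'])
print(result)
```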
