Getting pandas groupby().shift() results with the groupby variables as columns / index?
Given this simple dataset:
import pandas as pd

df = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two': ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})
grouping on one / two and applying .max() returns a Series indexed on the groupby variables, as expected...
df.groupby(['one', 'two'])['three'].max()
Output:
one  two
a    c      3
b    c      4
     d      6
Name: three, dtype: int64
...in my case, I want to shift() my records, by group. But for some reason, when I apply .shift() to the groupby object, my results don't include the groupby variables:
df.groupby(['one', 'two'])['three'].shift()
Output:
0 NaN
1 1.0
2 2.0
3 NaN
4 NaN
5 5.0
Name: three, dtype: float64
Is there a way to preserve those groupby variables in the results, either as columns or as a multi-indexed Series (as with .max())? Thanks!
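The difference between max and shift is that max aggregates values (it returns an aggregated Series with one row per group), while shift does not: it returns a Series of the same size as the original.
So the output can be appended as a new column: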
df['shifted'] = df.groupby(['one', 'two'])['three'].shift()
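To make the size difference concrete, here is a minimal sketch on the question's data. The names reduced and shifted are only illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two': ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']

# Aggregation: one value per group, indexed by the group keys.
reduced = g.max()

# Non-reducing operation: one value per input row, indexed like df.
shifted = g.shift()

print(len(reduced), list(reduced.index.names))
print(len(shifted), shifted.index.equals(df.index))
```

Because the shifted Series keeps the original row index, assigning it back to df aligns row by row without any merge.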
In theory it is possible to use agg, but it raises an error in pandas 0.20.3:
df1 = df.groupby(['one', 'two'])['three'].agg(['max', lambda x: x.shift()])
print (df1)
ValueError: Function does not reduce
A possible solution is transform, if you need both max and shift:
g = df.groupby(['one', 'two'])['three']
df['max'] = g.transform('max')
df['shifted'] = g.shift()
print (df)
  one  three two  max  shifted
0   a      1   c    3      NaN
1   a      2   c    3      1.0
2   a      3   c    3      2.0
3   b      4   c    4      NaN
4   b      5   d    6      NaN
5   b      6   d    6      5.0
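As a side note, shift can also be routed through transform with a lambda, since transform accepts a function that returns a result of the same length as its input. The sketch below assumes the question's data; the names via_transform and direct are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two': ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

g = df.groupby(['one', 'two'])['three']

# transform calls the lambda once per group and stitches the
# same-length pieces back together on the original row index.
via_transform = g.transform(lambda x: x.shift())

# The built-in groupby shift gives the same values directly.
direct = g.shift()

print(via_transform.equals(direct))
```

Calling g.shift() directly is the simpler spelling; the transform form is mainly useful when the per-group function has no built-in groupby method.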
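As Jez explained, shift returns a Series that keeps the same length as the DataFrame; if you try to use it as an aggregation like max(), you will get the error
Function does not reduce
Instead, assign the shifted values as a column and set the groupby variables as the index: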
df.assign(shifted=df.groupby(['one', 'two'])['three'].shift()).set_index(['one','two'])
Out[57]:
         three  shifted
one two
a   c        1      NaN
    c        2      1.0
    c        3      2.0
b   c        4      NaN
    d        5      NaN
    d        6      5.0
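If a multi-indexed Series is wanted rather than a DataFrame, one option is to move the keys into the index first and then group on the index levels. This is a sketch under the assumption that the question's df is used; the names s and shifted are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two': ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

# Put the group keys into the index so they survive the shift.
s = df.set_index(['one', 'two'])['three']

# Group by the index levels; shift keeps the input index,
# which is now the (one, two) MultiIndex.
shifted = s.groupby(level=['one', 'two']).shift()

print(shifted)
```

The result has one row per original record, so the index contains repeated (one, two) pairs, unlike the unique index produced by .max().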
Or, using max as the key, slice the shifted values at each group's max row:
df.groupby(['one', 'two'])['three'].apply(lambda x : x.shift()[x==x.max()])
Out[58]:
one  two
a    c    2    2.0
b    c    3    NaN
     d    5    5.0
Name: three, dtype: float64
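A similar result can be sketched without apply by locating each group's max row with idxmax. This is an alternative, not the answer's method, and the names keys, with_shift, max_rows and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'two': ['c', 'c', 'c', 'c', 'd', 'd'],
                   'three': [1, 2, 3, 4, 5, 6]})

keys = ['one', 'two']

# Attach the per-group shifted values as a regular column.
with_shift = df.assign(shifted=df.groupby(keys)['three'].shift())

# idxmax returns, per group, the row label of the first maximum.
max_rows = with_shift.groupby(keys)['three'].idxmax()

# Select those rows and expose the keys as a MultiIndex.
result = with_shift.loc[max_rows].set_index(keys)

print(result)
```

One difference to keep in mind: idxmax keeps only the first maximum in each group, whereas the boolean mask x == x.max() keeps every tied row.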