Python Pandas if statement based on group by sum
Using this Python pandas DataFrame df:
CategoryA | CategoryB | Count
1         | A         | 0
1         | A         | -1
2         | B         | 1
2         | B         | 1
3         | C         | 1
3         | C         | -1
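I basically want to mark for deletion every CategoryA/CategoryB group whose Count total is not positive (0 or less), i.e. keep only groups whose sum is greater than 0. My attempt: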
df['decision'] = np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].sum()>0, 'keep', 'delete')
But I get this error: ValueError: Length of values does not match length of index
The desired output is:
CategoryA | CategoryB | Count | decision
1         | A         | 0     | delete
1         | A         | -1    | delete
2         | B         | 1     | keep
2         | B         | 1     | keep
3         | C         | 1     | delete
3         | C         | -1    | delete
I'd prefer to do this with df.loc, but I'm not sure how.
In [67]: df['decision'] = \
np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum') > 0,
'keep', 'delete')
In [68]: df
Out[68]:
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
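Since the question asks for a df.loc version, here is a minimal sketch, assuming the same six-row DataFrame, that uses the same transform('sum') result as a boolean row selector for .loc instead of np.where:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

# Group total broadcast back to every row (6 values, aligned with df's index)
group_sum = df.groupby(["CategoryA", "CategoryB"])["Count"].transform("sum")

# Default every row to 'delete', then overwrite rows whose group total is positive
df["decision"] = "delete"
df.loc[group_sum > 0, "decision"] = "keep"

print(df)
```

Because the mask has the same index as df, .loc aligns it row by row; groups with a total of exactly 0 (3/C here) stay marked 'delete'.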
You're on the right track.
import numpy as np

# Select 'Count' so transform returns a Series aligned with df's rows
m = df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum').gt(0)
df['decision'] = np.where(m, 'keep', 'delete')
df
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
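Use transform to get a result the same size as the original DataFrame (one value per row) rather than one value per group.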
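To see why the original attempt raised ValueError, a minimal sketch on the question's data compares the two shapes: .sum() aggregates to one row per group (3 values), so np.where produces only 3 labels for a 6-row column, while .transform('sum') repeats each group total on every row of that group.

```python
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

keys = ["CategoryA", "CategoryB"]

# Aggregation: one value per (CategoryA, CategoryB) group, indexed by the group keys
per_group = df.groupby(keys)["Count"].sum()

# Transform: the group total broadcast back to each row, indexed like df
per_row = df.groupby(keys)["Count"].transform("sum")

print(len(per_group), len(per_row), len(df))  # 3 6 6
print(per_row.tolist())  # [-1, -1, 2, 2, 0, 0]
```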
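# Note: groups by CategoryB only, which matches here because each CategoryB value has a single CategoryA
df['decision'] = df['CategoryB'].map(
    df.groupby('CategoryB')['Count'].apply(lambda x: 'keep' if x.sum() > 0 else 'delete'))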
df
Out[573]:
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
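The map answer keys on CategoryB alone, which only works because each CategoryB value appears with a single CategoryA in this data. A minimal sketch of the same lookup idea keyed on both columns, assuming the question's data, builds one label per group and looks it up through a MultiIndex made from each row's (CategoryA, CategoryB) pair:

```python
import pandas as pd

df = pd.DataFrame({
    "CategoryA": [1, 1, 2, 2, 3, 3],
    "CategoryB": ["A", "A", "B", "B", "C", "C"],
    "Count": [0, -1, 1, 1, 1, -1],
})

keys = ["CategoryA", "CategoryB"]

# One label per (CategoryA, CategoryB) group
labels = (
    df.groupby(keys)["Count"].sum()
      .gt(0)
      .map({True: "keep", False: "delete"})
)

# Look up each row's label by its (CategoryA, CategoryB) pair; reindex allows repeated keys
row_keys = pd.MultiIndex.from_frame(df[keys])
df["decision"] = labels.reindex(row_keys).to_numpy()

print(df)
```

In practice the transform-based answers are simpler; this variant is mainly useful when you already have a per-group summary and need to attach it back to the rows.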