Python Pandas: count function based on a condition and a subset
I have a DataFrame like this:
F_Class Product Packages
Apple Apple_A 1
Apple Apple_A 2
Apple Apple_A 1
Apple Apple_B 2
Bananas Banana_A n.a.
Bananas Banana_A n.a.
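To make the sample reproducible, the frame above can be built as follows. This is a minimal sketch: it assumes Packages is stored as strings, with the literal text "n.a." marking the missing entries, which the question does not state explicitly.

```python
import pandas as pd

# Sample data from the question. Assumption: Packages holds strings,
# and "n.a." is plain text rather than a real missing value.
df = pd.DataFrame({
    "F_Class": ["Apple", "Apple", "Apple", "Apple", "Bananas", "Bananas"],
    "Product": ["Apple_A", "Apple_A", "Apple_A", "Apple_B", "Banana_A", "Banana_A"],
    "Packages": ["1", "2", "1", "2", "n.a.", "n.a."],
})

print(df)
```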
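I want to build a counting function that numbers the items in the DataFrame as follows:
- The function should count within each subset ['F_Class','Product']
- If df['Packages'] == 2, increment by 2; otherwise, increment by 1
The result should look like this: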
F_Class Product Packages Counter
Apple Apple_A 1 1
Apple Apple_A 2 3
Apple Apple_A 1 4
Apple Apple_B 2 2
Bananas Banana_A n.a. 1
Bananas Banana_A n.a. 2
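If you need to sum the numbers in Packages, use DataFrameGroupBy.cumsum and replace missing values with 1 (this matches the requested rule as long as Packages only contains 1 and 2):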
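import pandas as pd

# Convert Packages to numbers; non-numeric entries such as 'n.a.' become NaN
df['Packages'] = pd.to_numeric(df['Packages'], errors='coerce')
# Treat missing values as 1, then take the running total within each group
df['Counter'] = (df.assign(Packages=df['Packages'].fillna(1).astype(int))
                   .groupby(['F_Class', 'Product'])['Packages'].cumsum())
print(df)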
F_Class Product Packages Counter
0 Apple Apple_A 1.0 1
1 Apple Apple_A 2.0 3
2 Apple Apple_A 1.0 4
3 Apple Apple_B 2.0 2
4 Bananas Banana_A NaN 1
5 Bananas Banana_A NaN 2
Details:
print(df.assign(Packages=df['Packages'].fillna(1).astype(int)))
F_Class Product Packages
0 Apple Apple_A 1
1 Apple Apple_A 2
2 Apple Apple_A 1
3 Apple Apple_B 2
4 Bananas Banana_A 1
5 Bananas Banana_A 1
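Summing Packages reproduces the requested counter only while the column holds nothing but 1 and 2. The sketch below uses hypothetical data (a row with Packages equal to 3, which does not appear in the question) to show where the running sum and the stated rule diverge. The column names Sum and Counter and the use of np.where are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the question: one row has Packages == 3
df = pd.DataFrame({
    "F_Class": ["Apple", "Apple", "Apple"],
    "Product": ["Apple_A", "Apple_A", "Apple_A"],
    "Packages": [1, 3, 2],
})
keys = ["F_Class", "Product"]

# Running total of the Packages values themselves
df["Sum"] = df.groupby(keys)["Packages"].cumsum()

# The rule as stated in the question: +2 where Packages == 2, otherwise +1
step = pd.Series(np.where(df["Packages"].eq(2), 2, 1), index=df.index)
df["Counter"] = step.groupby([df[k] for k in keys]).cumsum()

print(df)
```

Here Sum comes out as 1, 4, 6 while Counter is 1, 2, 4, so the two approaches are interchangeable only for data like the sample.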
Use df.groupby() together with transform(), as follows:
df['Counter'] = (df.groupby(['F_Class', 'Product'])['Packages']
                   .transform(lambda x: x.eq('2').add(1).cumsum()))
print(df)
F_Class Product Packages Counter
0 Apple Apple_A 1 1
1 Apple Apple_A 2 3
2 Apple Apple_A 1 4
3 Apple Apple_B 2 2
4 Bananas Banana_A n.a. 1
5 Bananas Banana_A n.a. 2
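To see why the lambda yields these numbers, it may help to run its three steps on a single group. The sketch below takes the Apple / Apple_A rows as a standalone Series; the variable names are illustrative only.

```python
import pandas as pd

# One group from the question (Apple / Apple_A), with Packages as strings
x = pd.Series(["1", "2", "1"])

is_two = x.eq("2")        # boolean mask: False, True, False
step = is_two.add(1)      # booleans count as 0/1, so this gives 1, 2, 1
counter = step.cumsum()   # running total within the group: 1, 3, 4

print(pd.DataFrame({"Packages": x, "is_two": is_two, "step": step, "Counter": counter}))
```

Rows whose Packages value is "n.a." simply compare unequal to "2", so they fall into the +1 branch without any special handling.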
If the values in your Packages column are integers rather than strings, change '2' to 2:
df['Counter'] = (df.groupby(['F_Class', 'Product'])['Packages']
                   .transform(lambda x: x.eq(2).add(1).cumsum()))
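If it is unclear whether Packages holds strings or integers, one possible variation is to coerce the column to numbers first and build the increment outside the groupby, which also avoids the lambda. The helper name add_counter and its parameters are hypothetical, introduced here only for illustration.

```python
import pandas as pd

def add_counter(df, keys=("F_Class", "Product"), col="Packages"):
    """Return a copy of df with a per-group Counter column (hypothetical helper)."""
    # Coerce to numbers so "2" and 2 are treated alike; "n.a." becomes NaN
    is_two = pd.to_numeric(df[col], errors="coerce").eq(2)
    # True/False plus 1 gives the increment 2/1; accumulate it within each group
    counter = is_two.add(1).groupby([df[k] for k in keys]).cumsum()
    return df.assign(Counter=counter)

# Sample data from the question, assuming Packages is stored as strings
df = pd.DataFrame({
    "F_Class": ["Apple", "Apple", "Apple", "Apple", "Bananas", "Bananas"],
    "Product": ["Apple_A", "Apple_A", "Apple_A", "Apple_B", "Banana_A", "Banana_A"],
    "Packages": ["1", "2", "1", "2", "n.a.", "n.a."],
})

print(add_counter(df))
```

Because the helper returns a new frame through assign, the original Packages column keeps its "n.a." entries instead of being overwritten with NaN.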