Pandas apply function to unique values in column
I've been stuck on a Pandas problem that I can't seem to figure out.
I have a dataframe like this:
ref, value, rule, result, new_column
a100, 25, high, fail, nan
a100, 25, high, pass, nan
a100, 25, medium, fail, nan
a100, 25, medium, pass, nan
a101, 15, high, fail, nan
a101, 15, high, pass, nan
a102, 20, high, pass, nan
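I want to add a new column to this dataframe using the following pseudocode:
for each unique value in ref, if result = fail, then new_column = no for all subsequent rows with the same "ref" value.
The new dataframe should look like this: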
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, no
a100, 25, medium, fail, no
a100, 25, medium, pass, no
a101, 15, high, fail, no
a101, 15, high, pass, no
a102, 20, high, pass, yes
I managed to get the following:
ref, value, rule, result, new_column
a100, 25, high, fail, no
a100, 25, high, pass, yes
This was done using df.loc.
But I need the function to be applied per unique value of ref, not per row.
I think you can use transform:
print (df)
ref value rule result new_column
0 a100 25 high pass NaN
1 a100 25 high fail NaN
2 a100 25 medium fail NaN
3 a100 25 medium pass NaN
4 a101 15 high fail NaN
5 a101 15 high pass NaN
6 a102 20 high pass NaN
# Parentheses let the method chain span several lines.
df['new_column'] = (
    df.groupby('ref')['result']
      .transform(lambda x: 'no' if (x == 'fail').any() else 'yes')
)
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes
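For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the dataframe is built by hand from the rows printed above; the construction code is not from the original post.

```python
import pandas as pd

# Rebuild the sample data shown above (assumed construction, for illustration).
df = pd.DataFrame({
    "ref": ["a100", "a100", "a100", "a100", "a101", "a101", "a102"],
    "value": [25, 25, 25, 25, 15, 15, 20],
    "rule": ["high", "high", "medium", "medium", "high", "high", "high"],
    "result": ["pass", "fail", "fail", "pass", "fail", "pass", "pass"],
})

# transform evaluates the lambda once per ref group and broadcasts the
# scalar it returns to every row of that group.
df["new_column"] = (
    df.groupby("ref")["result"]
      .transform(lambda x: "no" if (x == "fail").any() else "yes")
)
print(df)
```

Because the check is made per group, the position of the failing row within the group does not matter: every row sharing that ref gets the same label.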
Thanks for another solution with replace:
# Compute a per-group boolean first, then translate it to labels.
df['new_column'] = (
    df.groupby('ref')['result']
      .transform(lambda L: (L == 'fail').any())
      .replace({True: 'no', False: 'yes'})
)
print (df)
ref value rule result new_column
0 a100 25 high pass no
1 a100 25 high fail no
2 a100 25 medium fail no
3 a100 25 medium pass no
4 a101 15 high fail no
5 a101 15 high pass no
6 a102 20 high pass yes
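The same idea can probably be written without a lambda. The sketch below is a variation on the answers above, not the method from either of them: it builds a boolean mask first, groups that mask by ref, and uses the built-in "any" aggregation. The helper name flag_refs_with_failure is hypothetical, and the sample data is rebuilt by hand.

```python
import pandas as pd

def flag_refs_with_failure(df: pd.DataFrame) -> pd.Series:
    """Return 'no' for every row whose ref has at least one 'fail', else 'yes'.

    Hypothetical helper for illustration; the column names 'ref' and
    'result' are taken from the question.
    """
    # Boolean mask of failing rows, grouped by ref, broadcast back to rows.
    has_fail = df["result"].eq("fail").groupby(df["ref"]).transform("any")
    # Translate the boolean into the labels used in the question.
    return has_fail.map({True: "no", False: "yes"})

df = pd.DataFrame({
    "ref": ["a100", "a100", "a100", "a100", "a101", "a101", "a102"],
    "value": [25, 25, 25, 25, 15, 15, 20],
    "rule": ["high", "high", "medium", "medium", "high", "high", "high"],
    "result": ["pass", "fail", "fail", "pass", "fail", "pass", "pass"],
})
df["new_column"] = flag_refs_with_failure(df)
print(df)
```

Passing the string "any" lets pandas use its built-in group reduction instead of calling a Python function for each group, which tends to matter only when there are many distinct ref values.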