How to append a third column to a dataframe using conditions on two other columns
I have the following dataframe:
uuid variable value
AAS Highly_Active False
AAS Highly_Active True
SAP Highly_Active False
SAP Multiple_days True
YAS Highly_Active False
YAS Highly_Active False
YAS Busi_weekday False
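I need to define a third column, Activity, using the values in the variable and value columns.
I have the following plain Python code to do this, but my main dataframe is 121 MB, so it takes a long time. Any pandas solution would be great.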
def activity(row):
    # Chained with elif so a later check cannot overwrite an earlier match.
    if row['variable'] == "Highly_Active" and row['value'] == True:
        val = "Highly_Active"
    elif row['variable'] == "Multiple_days" and row['value'] == True:
        val = "Multiple_days"
    elif row['variable'] == "Busi_weekday" and row['value'] == True:
        val = "Busi_weekday"
    else:
        val = "NO"
    return val
Keep it simple and use np.where:
import numpy as np

status = ["Highly_Active", "Multiple_days", "Busi_weekday"]
df['Activity'] = np.where(
    df['variable'].isin(status) & df['value'],
    df['variable'],
    'NO'
)
df
uuid variable value Activity
0 AAS Highly_Active False NO
1 AAS Highly_Active True Highly_Active
2 SAP Highly_Active False NO
3 SAP Multiple_days True Multiple_days
4 YAS Highly_Active False NO
5 YAS Highly_Active False NO
6 YAS Busi_weekday False NO
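For anyone who wants to run the np.where approach end to end, here is a minimal sketch that rebuilds the sample frame from the question. The way `df` is constructed is an assumption on my part (plain strings for `variable`, real booleans for `value`); the helper name `keep` is also just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; dtypes are assumed (str / bool).
df = pd.DataFrame({
    "uuid": ["AAS", "AAS", "SAP", "SAP", "YAS", "YAS", "YAS"],
    "variable": ["Highly_Active", "Highly_Active", "Highly_Active",
                 "Multiple_days", "Highly_Active", "Highly_Active",
                 "Busi_weekday"],
    "value": [False, True, False, True, False, False, False],
})

status = ["Highly_Active", "Multiple_days", "Busi_weekday"]

# Keep the label where it is a known status AND value is True, else "NO".
keep = df["variable"].isin(status) & df["value"]
df["Activity"] = np.where(keep, df["variable"], "NO")

print(df)
```

Because the whole column is evaluated in one vectorized pass, this avoids calling a Python function once per row, which is usually where the row-wise version spends its time.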
If @Paul H is right and the isin check in the solution above is redundant, you can just use pd.Series.where / pd.Series.mask:
df['Activity'] = df['variable'].where(df['value'], 'NO')
Or,
df['Activity'] = df['variable'].mask(~df['value'], 'NO')
df
uuid variable value Activity
0 AAS Highly_Active False NO
1 AAS Highly_Active True Highly_Active
2 SAP Highly_Active False NO
3 SAP Multiple_days True Multiple_days
4 YAS Highly_Active False NO
5 YAS Highly_Active False NO
6 YAS Busi_weekday False NO
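To make the relationship between the two calls concrete: `where` keeps values where the condition is True, while `mask` replaces values where the condition is True, so negating the condition makes them equivalent. A small sketch, assuming `value` is a genuine boolean column (the toy Series below are made up for illustration, not taken from the question):

```python
import pandas as pd

# Toy inputs, assumed for illustration only.
variable = pd.Series(["Highly_Active", "Multiple_days", "Busi_weekday"])
value = pd.Series([True, True, False])

# where: keep entries where the condition holds, fill the rest with "NO".
kept_where = variable.where(value, "NO")

# mask: replace entries where the condition holds, hence the negation.
kept_mask = variable.mask(~value, "NO")

print(kept_where.tolist())
print(kept_where.equals(kept_mask))
```

If `value` were stored as strings such as "True"/"False" rather than booleans, the `~` negation would not behave this way, so it is worth checking the dtype first.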
IIUC
df['Active']=(df.variable*df.value).replace('','No')
df
Out[653]:
uuid variable value Active
0 AAS Highly_Active False No
1 AAS Highly_Active True Highly_Active
2 SAP Highly_Active False No
3 SAP Multiple_days True Multiple_days
4 YAS Highly_Active False No
5 YAS Highly_Active False No
6 YAS Busi_weekday False No