How to use multiple values in a single np.where condition?

I have a DataFrame like the one shown below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

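Instead of chaining multiple np.where calls, I would like to use the OR operator (|) to check for several values inside a single np.where condition, like this: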

df['labels'] = np.where(df['tokens'] == ('test'|'correct'|is.na()),'new_label',df['labels'])

However, this raises an error:

TypeError: unsupported operand type(s) for |: 'str' and 'str'

I expect my output to look as shown below. How can I do this efficiently on big data with millions of records?

The first idea is to replace missing values with one of the values from the list, e.g. test, and then compare with Series.isin:

df['labels'] = np.where(df['tokens'].fillna('test').isin(['test','correct']),
                        'new_label',
                        df['labels'])
print(df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label

Or chain a second mask with | (bitwise OR) that tests for NaNs:

df['labels'] = np.where(df['tokens'].isin(['test','correct']) | df['tokens'].isna(),
                        'new_label',
                        df['labels'])
print(df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label
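
For context, the original attempt fails because Python evaluates 'test'|'correct' before any comparison happens, and str does not support the | operator; | only combines boolean masks that have already been built. The following is a minimal sketch, not part of the original answer, that builds both masks shown above, checks that they select the same rows, and then assigns through the mask with df.loc as one possible alternative to np.where. The variable names mask_fillna and mask_or are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

# Mask 1: fill NaN with a value from the list, then test membership.
mask_fillna = df['tokens'].fillna('test').isin(['test', 'correct'])

# Mask 2: test membership, then OR with a separate NaN mask.
mask_or = df['tokens'].isin(['test', 'correct']) | df['tokens'].isna()

# Both masks should flag exactly the same rows.
assert mask_fillna.equals(mask_or)

# Overwrite only the flagged rows; the other labels stay untouched.
df.loc[mask_or, 'labels'] = 'new_label'
print(df)
```

Keeping the mask in its own variable also makes it easy to reuse, for example to count the affected rows with mask_or.sum() before overwriting anything.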