How to use multiple values in a single np.where condition?
I have a DataFrame like the one shown below:
import numpy as np
import pandas as pd
df = pd.DataFrame({'text': ["Hi how","I am fine","Ila say Hi","hello"],
                   'tokens':["test","correct","Tim",np.nan],
                   'labels':['A','B','C','D']})
Instead of multiple np.where conditions, I would like to use the Or (|) operator to check for multiple values in a single np.where condition, as shown below:
df['labels'] = np.where(df['tokens'] == ('test'|'correct'|is.na()),'new_label',df['labels'])
However, this results in an error:
TypeError: unsupported operand type(s) for |: 'str' and 'str'
I would like my output to look like the one shown below. How can I do this efficiently for big data with millions of records?
The first idea is to replace missing values with one of the values from the list, e.g. test, and then compare with Series.isin:
df['labels'] = np.where(df['tokens'].fillna('test').isin(['test','correct']),
                        'new_label',
                        df['labels'])
print(df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label
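As a minimal, self-contained sketch of the fillna approach above (assuming the sample frame from the question; the variable name mask is only illustrative), the following checks that fillna returns a new Series, so the NaN in the tokens column itself is left untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

# fillna returns a new Series; df['tokens'] is not modified,
# the NaN is only treated as 'test' for the purpose of the comparison
mask = df['tokens'].fillna('test').isin(['test', 'correct'])

df['labels'] = np.where(mask, 'new_label', df['labels'])

print(mask.tolist())              # [True, True, False, True]
print(df['tokens'].isna().sum())  # 1, the missing value is still there
```

Note that the fill value has to be one of the values being matched; filling with anything else would leave the NaN row unselected.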
Or chain another mask with | (bitwise OR) to test for NaNs:
df['labels'] = np.where(df['tokens'].isin(['test','correct']) | df['tokens'].isna(),
                        'new_label',
                        df['labels'])
print(df)
         text   tokens     labels
0      Hi how     test  new_label
1   I am fine  correct  new_label
2  Ila say Hi      Tim          C
3       hello      NaN  new_label
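The attempt in the question failed because | was applied to two plain Python strings before any comparison took place. Applied to two boolean Series, the same operator is an element-wise OR. The sketch below, which assumes the sample frame from the question and uses illustrative names (values, mask_or, mask_fill), builds both masks and checks that they select the same rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi", "hello"],
                   'tokens': ["test", "correct", "Tim", np.nan],
                   'labels': ['A', 'B', 'C', 'D']})

values = ['test', 'correct']

# Both operands are boolean Series, so | combines them row by row
mask_or = df['tokens'].isin(values) | df['tokens'].isna()

# The same selection expressed through the fillna approach
mask_fill = df['tokens'].fillna(values[0]).isin(values)

print(mask_or.equals(mask_fill))  # True

new_labels = np.where(mask_or, 'new_label', df['labels'])
print(list(new_labels))           # ['new_label', 'new_label', 'C', 'new_label']
```

Both masks are computed with vectorized operations, so neither version loops over rows in Python; which one reads better is largely a matter of taste.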