Pandas mask with composite expression behaviour

Question

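This question was previously asked by another user (and then deleted). I was working on a solution so that I could post an answer, but the question disappeared, and since I can't seem to make sense of pandas' behaviour I would like some clarity. The original question went something like this: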

How can I replace every negative value except those in a given list with NaN in a Pandas dataframe?

My setup to reproduce the scenario is as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A' : [x for x in range(4)],
    'B' : [x for x in range(-2, 2)]
})

Technically, this should just be a matter of passing the right boolean expression to pd.where. My attempted solution looks like this:

df[df >= 0 | df.isin([-2])]

which produces:

index	A	B
0	0	NaN
1	1	NaN
2	2	0
3	3	1

This also replaces the number in the list with NaN!

Moreover, if I mask the DataFrame with each of the two conditions separately, I get the correct behaviour:

With `df[df >= 0]` (same as the compound result):

index	A	B
0	0	NaN
1	1	NaN
2	2	0
3	3	1

With `df[df.isin([-2])]`:

index	A	B
0	NaN	-2.0
1	NaN	NaN
2	NaN	NaN
3	NaN	NaN

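It seems that either:

I am running into some undefined behaviour as a result of performing logic on NaN values, or
I have got something wrong.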

Can anyone explain this situation to me?

Answer 1

Solution

df[(df >= 0) | (df.isin([-2]))]
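As a minimal, self-contained sketch of the line above, the snippet below rebuilds the DataFrame from the question and applies the parenthesised mask. The names `mask` and `result` are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Same data as in the question: A = 0..3, B = -2..1
df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Parenthesise each condition so that | combines two boolean DataFrames
mask = (df >= 0) | (df.isin([-2]))

# Boolean-DataFrame indexing keeps values where the mask is True, NaN elsewhere
result = df[mask]
print(result)
# Column B keeps -2 (it is in the allowed list) and only -1 becomes NaN
```

With this mask, the only entry replaced is the -1 in column B, which is negative and not in the allowed list.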

Explanation

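In Python, bitwise OR (|) has higher operator precedence than comparison operators such as >=, so `df >= 0 | df.isin([-2])` is parsed as `df >= (0 | df.isin([-2]))`. See https://docs.python.org/3/reference/expressions.html#operator-precedence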
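One way to check the grouping without running pandas at all is to let Python parse the expression and inspect the syntax tree. This is only a sketch using the standard `ast` module; the variable names are illustrative, and `df` is never evaluated because the expression is parsed, not executed.

```python
import ast

# Parse the unparenthesised expression from the question (no evaluation happens)
expr = ast.parse("df >= 0 | df.isin([-2])", mode="eval").body

top_level = type(expr).__name__              # outermost operation
right_operand = expr.comparators[0]          # right-hand side of >=
right_kind = type(right_operand).__name__
right_op = type(right_operand.op).__name__

print(top_level, right_kind, right_op)
# prints: Compare BinOp BitOr

# With explicit parentheses, | becomes the outermost operation instead
fixed = ast.parse("(df >= 0) | (df.isin([-2]))", mode="eval").body
print(type(fixed).__name__, type(fixed.op).__name__)
# prints: BinOp BitOr
```

The first tree shows that the comparison is the outermost node and that its right-hand side is the `0 | df.isin([-2])` sub-expression, which is not the intended combination of two masks.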

When filtering a pandas DataFrame on multiple boolean conditions, you need to enclose each condition in parentheses. From the boolean indexing section of the pandas user guide:

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
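The example quoted from the user guide can be tried on the DataFrame from the question. The sketch below assumes the same `df` as in the setup; `raised` and `combined` are hypothetical names used only for illustration. Without parentheses, Python treats the expression as a chained comparison, which requires the truth value of a whole Series and therefore fails.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Unparenthesised: evaluated as df['A'] > (2 & df['B']) < 3, a chained
# comparison, so Python needs bool() of a Series and pandas raises ValueError
raised = False
try:
    df['A'] > 2 & df['B'] < 3
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Parenthesised: two boolean Series combined element-wise with &
combined = (df['A'] > 2) & (df['B'] < 3)
print(combined.tolist())
# prints: [False, False, False, True]
```

Note that the failure mode differs from the one in the question: there the unparenthesised expression ran without error and silently produced a different mask, which is why the mistake was harder to spot.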


