Pandas mask with composite expression behaviour
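This question was asked before by a user (and then deleted). I was working on a solution so that I could post an answer when the question disappeared. On top of that, I can't make sense of pandas' behaviour, so I would appreciate some clarity. The original question was along the lines of: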
How can I replace every negative value except those in a given list with NaN in a Pandas dataframe?
My setup to reproduce the scenario is the following:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [x for x in range(4)],
    'B': [x for x in range(-2, 2)]
})
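Technically this should just be a matter of passing the right boolean expression to DataFrame.where (which is what indexing with a boolean DataFrame does). My attempted solution looks like this: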
df[df >= 0 | df.isin([-2])]
Which yields:
index | A | B |
---|---|---|
0 | 0 | NaN |
1 | 1 | NaN |
2 | 2 | 0 |
3 | 3 | 1 |
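This also nullifies the number that is in the list!
Moreover, if I mask the dataframe with each of the two conditions separately, I get the correct behaviour: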
with df[df >= 0] (identical to the compound result):
index | A | B |
---|---|---|
0 | 0 | NaN |
1 | 1 | NaN |
2 | 2 | 0 |
3 | 3 | 1 |
and with df[df.isin([-2])] (only the listed value -2 is kept, as expected):
index | A | B |
---|---|---|
0 | NaN | -2.0 |
1 | NaN | NaN |
2 | NaN | NaN |
3 | NaN | NaN |
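So it seems that either
- I am running into some undefined behaviour as a result of performing logic on NaN values, or
- I have got something wrong.
Can anyone explain what is going on here?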
Solution
df[(df >= 0) | (df.isin([-2]))]
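A minimal runnable sketch of this fix against the question's setup. The variable names keep, mask and result are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Negative values that should survive the masking (hypothetical name).
keep = [-2]

# Each condition is parenthesized so that | combines two boolean frames.
mask = (df >= 0) | (df.isin(keep))

# Indexing with a boolean DataFrame keeps values where the mask is True
# and fills the rest with NaN, like df.where(mask).
result = df[mask]
print(result)
```

With this mask, only the -1 in column B fails both conditions, so it is the only value replaced by NaN.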
Explanation
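In Python, the bitwise OR operator, |, has higher operator precedence than comparison operators such as >=: https://docs.python.org/3/reference/expressions.html#operator-precedence
As a result, df >= 0 | df.isin([-2]) is evaluated as df >= (0 | df.isin([-2])), which is not the intended mask.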
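The precedence rule can be checked directly. The sketch below, which assumes the same df as in the question, uses the standard ast module to compare how Python parses the unparenthesized expression, and then compares the accidental mask with the intended one. All variable names here are illustrative.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Parentheses do not appear in the AST, so identical dumps mean
# both strings are grouped the same way by the parser.
unparenthesized = ast.dump(ast.parse("df >= 0 | df.isin([-2])", mode="eval"))
explicit = ast.dump(ast.parse("df >= (0 | df.isin([-2]))", mode="eval"))
same_parse = unparenthesized == explicit
print(same_parse)

# What the question actually computed versus what was intended.
accidental = df >= 0 | df.isin([-2])
intended = (df >= 0) | (df.isin([-2]))

print(accidental['B'].tolist())
print(intended['B'].tolist())
```

In the accidental version, the cell holding -2 ends up compared against True rather than 0, which is why it is masked out along with -1.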
When filtering a pandas DataFrame on multiple boolean conditions, you need to enclose each condition in parentheses. More from the boolean indexing section of the pandas user guide:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
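If the parentheses feel easy to forget, two alternatives avoid the pitfall. This is only a sketch, not something the original answer proposes: either name each condition before combining them, or use the method form DataFrame.ge, since a method call binds more tightly than |. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list(range(4)),
    'B': list(range(-2, 2)),
})

# Option 1: give each condition a name, then combine the names.
non_negative = df >= 0
in_keep_list = df.isin([-2])
by_names = df[non_negative | in_keep_list]

# Option 2: use the method form of >=, so no comparison operator
# competes with | for precedence.
by_methods = df[df.ge(0) | df.isin([-2])]

print(by_names.equals(by_methods))
```

Both options build the same mask as the parenthesized expression in the solution.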