Proper way to use "opposite boolean" in Pandas data frame boolean indexing

Question

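I want to use boolean indexing to look at the rows of my data frame where a particular column does not have NaN values. So, I did the following: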

import pandas as pd
my_df.loc[pd.isnull(my_df['col_of_interest']) == False].head()

to see a snippet of that data frame, including only the values that are not NaN (most values are NaN).

It works, but it seems less than elegant. I'd like to type:

my_df.loc[!pd.isnull(my_df['col_of_interest'])].head()

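However, that produced an error. I also spend a lot of time in R, so maybe I'm mixing things up. In Python, I usually use the "not" syntax wherever I can, for instance if x is not None:, but I couldn't really do that here. Is there a more elegant way? I don't like having to write senseless comparisons.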

Answer 1

Instead of using pandas.isnull(), you should use pandas.notnull() to find the rows where the column does not have null values. Example:

import pandas as pd
my_df.loc[pd.notnull(my_df['col_of_interest'])].head()

pandas.notnull() is the boolean inverse of pandas.isnull(), as given in the documentation:

See also
pandas.notnull
boolean inverse of pandas.isnull
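
For anyone who wants to try this on real data, here is a minimal, self-contained sketch. The contents of my_df below are made up for illustration (the question never shows the actual frame), and the "other" column is a hypothetical extra column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_df; most values in the column are NaN.
my_df = pd.DataFrame({
    "col_of_interest": [np.nan, 1.5, np.nan, np.nan, 2.5],
    "other": ["a", "b", "c", "d", "e"],
})

# Keep only the rows where col_of_interest is not NaN.
non_null_rows = my_df.loc[pd.notnull(my_df["col_of_interest"])]

# The Series method form builds the same mask and selects the same rows.
same_rows = my_df.loc[my_df["col_of_interest"].notnull()]

print(non_null_rows.head())
```

Both spellings build the same boolean mask, so which one to use is mostly a matter of taste.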

Answer 2

In general with pandas (and numpy), we use the bitwise NOT ~ instead of ! or not (whose behaviour can't be overridden by types).

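Although in this case we have notnull, ~ can come in handy in situations where there is no special opposite method.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1, 2, np.nan, 3]})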
>>> df.a.isnull()
0    False
1    False
2     True
3    False
Name: a, dtype: bool
>>> ~df.a.isnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
>>> df.a.notnull()
0     True
1     True
2    False
3     True
Name: a, dtype: bool
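
As an illustration of a case with no dedicated opposite method, here is a small sketch using isin(), which has no "not in" counterpart. The frame, the column names and the excluded list are all hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["ann", "bob", "cy", "dee"],
    "score": [3, 7, 5, 9],
})

# isin() has no built-in inverse, so flip its mask with ~.
excluded = ["bob", "dee"]
kept = df[~df["name"].isin(excluded)]

# When combining masks, wrap each comparison in parentheses:
# ~, & and | bind more tightly than ==, < and >.
kept_low = df[~df["name"].isin(excluded) & (df["score"] < 5)]

print(kept)
print(kept_low)
```

The same pattern should work for any method that returns a boolean Series.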

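(For completeness, I'll note that -, the unary negation operator, also works on a boolean Series, but ~ is the canonical choice, and - has been deprecated for numpy boolean arrays; newer numpy releases reject it with a TypeError.)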
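
To make the point about not from the start of this answer concrete, here is a short sketch of what happens when it is applied to a Series. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3]})
mask = df["a"].isnull()

# `not` asks Python for a single truth value for the whole Series,
# which pandas refuses to give for a Series with several elements.
try:
    not mask
    not_raised = False
except ValueError as exc:
    not_raised = True
    print("not mask failed:", exc)

# `~` is dispatched to the Series itself (via __invert__), so it can
# work element by element.
inverted = ~mask
print(inverted)
```

This is why ~ is the operator to reach for: the Series type can define what it means, whereas not always reduces its operand to one bool.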

