Pandas None logical indexing confusion

Question

I'm new to pandas and I don't understand why this code behaves the way it does. Why does the comparison return True when the elements actually are equal to None?

In [14]:
import pandas as pd
tweets = pd.DataFrame([None, None], columns=['country'])
print(tweets['country'] != None)

Out[14]:
0    True
1    True
Name: country, dtype: bool

Thanks.

Answer 1

I'm not sure why the expression returns True, but you can use pandas' built-in null checks to determine whether a value is null:

print(tweets.notnull())

country
0   False
1   False

The inverse is:

print(tweets.isnull())

country
0   True
1   True
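
Since the question is about logical indexing, here is a minimal sketch of using the notnull() result as a boolean mask to keep only the rows that have a value. The frame contents ('US' and one missing entry) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: one real value and one missing value.
tweets = pd.DataFrame({'country': ['US', None]})

# notnull() returns a boolean Series that is False where the value is missing.
mask = tweets['country'].notnull()

# Indexing the frame with the mask keeps only the rows that have a country.
with_country = tweets[mask]
print(with_country)
```

Using ~mask, or tweets['country'].isnull(), selects the rows with a missing country instead.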

Answer 2

In short, this happens because pandas treats None as largely equivalent to NaN, and np.nan == np.nan is False. As @economy and others have said, use the isnull() or notnull() methods to do what you're trying to do.
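
A small sketch of the two facts this relies on: NaN does not compare equal to itself, and pandas' null checks treat None and NaN the same way. The Series here is illustrative and not taken from the question.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan_equals_itself = (np.nan == np.nan)
nan_differs_from_itself = (np.nan != np.nan)
print(nan_equals_itself, nan_differs_from_itself)

# pandas' null check reports both None and NaN as missing.
mixed = pd.Series([np.nan, None], dtype=object)
missing_flags = mixed.isnull().tolist()
print(missing_flags)
```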

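Now, an explanation of why this isn't a bug. The equality operators are defined by Cython code in pandas.lib. Specifically, when you write tweets['country'] == None (or != None, as in the question), pandas.lib.scalar_compare is called. Note how scalar_compare behaves: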

>>> import operator
>>> import numpy as np
>>> pd.lib.scalar_compare(np.array([None]), None, operator.ne)
array([ True], dtype=bool)

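This is the behavior you're seeing. It is unlikely to be a bug: the code for scalar_compare handles None explicitly and points us to a _checknull function. If we look at that code, we find that it essentially (and quite deliberately) says that None == None is False.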
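
To make that logic concrete, here is a rough pure-Python sketch of a null-aware comparison. The function null_aware_compare is a hypothetical stand-in written for this answer, not the pandas source, and it assumes pd.isnull gives the same verdict as the internal null check for these inputs.

```python
import operator

import numpy as np
import pandas as pd


def null_aware_compare(values, scalar, op):
    """Hypothetical illustration of the comparison described above (not pandas code)."""
    result = np.empty(len(values), dtype=bool)
    scalar_is_null = pd.isnull(scalar)
    for i, item in enumerate(values):
        if scalar_is_null or pd.isnull(item):
            # A null on either side: "!=" is True, every other comparison is False.
            result[i] = op is operator.ne
        else:
            result[i] = bool(op(item, scalar))
    return result


values = np.array([None, 'US'], dtype=object)
ne_none = null_aware_compare(values, None, operator.ne)
eq_none = null_aware_compare(values, None, operator.eq)
eq_us = null_aware_compare(values, 'US', operator.eq)
print(ne_none, eq_none, eq_us)
```

Under these assumptions, any comparison against None takes the null branch for every element, which is why != None comes out True across the whole column.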
