Pandas/NumPy NaN and None comparison

Question

In Python, pandas and NumPy, why do these comparisons give different results?

from pandas import Series
from numpy import NaN

NaN is not equal to NaN:

>>> NaN == NaN
False

But a list or tuple containing NaN is equal to itself:

>>> [NaN] == [NaN], (NaN,) == (NaN,)
(True, True)

While Series containing NaN are, again, not equal:

>>> Series([NaN]) == Series([NaN])
0    False
dtype: bool

And with None:

>>> None == None, [None] == [None]
(True, True)

While:

>>> Series([None]) == Series([None])
0    False
dtype: bool

This answer explains why NaN == NaN is False in general, but it does not explain the behaviour inside Python/pandas collections.

Answer 1

As explained here and in the Python docs, when checking sequence equality:

element identity is compared first, and element comparison is performed only for distinct elements.

Because np.nan and np.NaN refer to the same object, i.e. (np.nan is np.nan is np.NaN) == True, the equality [np.nan] == [np.nan] holds. On the other hand, float('nan') creates a new object on every call, so [float('nan')] == [float('nan')] is False.
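The identity-first rule quoted above can be checked without NumPy at all. The following is a minimal sketch in plain Python; `elementwise_equal` is a hypothetical helper written only to illustrate the rule, and it approximates rather than reproduces CPython's list comparison.

```python
# Two separate NaN objects, as produced by two float('nan') calls.
nan_a = float("nan")
nan_b = float("nan")

# Same object on both sides: the identity check succeeds,
# so the elements are treated as equal without calling ==.
same_object = [nan_a] == [nan_a]

# Distinct objects: identity fails, the comparison falls back to
# nan_a == nan_b, which is False for NaN.
distinct_objects = [nan_a] == [nan_b]


def elementwise_equal(xs, ys):
    """Hypothetical helper: identity first, then ==, element by element."""
    return len(xs) == len(ys) and all(x is y or x == y for x, y in zip(xs, ys))


print(same_object, distinct_objects)
print(elementwise_equal([nan_a], [nan_a]), elementwise_equal([nan_a], [nan_b]))
```

This is why [NaN] == [NaN] in the question returns True: both lists hold the very same NaN object imported from NumPy.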

Pandas/NumPy do not have this problem:

>>> pd.Series([np.NaN]).eq(pd.Series([np.NaN]))[0], (pd.Series([np.NaN]) == pd.Series([np.NaN]))[0]
(False, False)

However, the special equals method treats NaNs in the same position as equal:

>>> pd.Series([np.NaN]).equals(pd.Series([np.NaN]))
True
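Since equals returns a single boolean for the whole Series, an element-wise comparison that also treats matching NaNs as equal has to be built by hand. One possible sketch, assuming float Series with identical indexes (the names `a`, `b` and `nan_aware` are illustrative only), combines == with isna:

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 4.0])

# Plain element-wise comparison: the NaN position compares False.
plain = a == b

# NaN-aware element-wise comparison: equal values,
# or missing on both sides at the same position.
nan_aware = (a == b) | (a.isna() & b.isna())

# Whole-object comparison: one boolean, NaNs in the same position match.
whole = a.equals(b)

print(plain.tolist())
print(nan_aware.tolist())
print(whole)
```

Here the two Series differ in the last position, so equals reports them as different, while the NaN-aware mask shows exactly which positions match.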

None is treated differently. NumPy considers them equal:

>>> pd.Series([None, None]).values == (pd.Series([None, None])).values
array([ True,  True])

While pandas does not:

>>> pd.Series([None, None]) == (pd.Series([None, None]))
0    False
1    False
dtype: bool

In addition, the == operator and the eq method are inconsistent, as discussed here:

>>> pd.Series([None, None]).eq(pd.Series([None, None]))
0    True
1    True
dtype: bool
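Because == and eq disagree here, code that needs to know whether both sides are missing may be more predictable if it tests for missingness explicitly. A hedged sketch, assuming object-dtype Series with identical indexes (the variable names are illustrative):

```python
import pandas as pd

s1 = pd.Series([None, None])
s2 = pd.Series([None, None])

# The underlying object-dtype NumPy arrays fall back to Python's ==
# element by element, and None == None is True.
raw_equal = s1.values == s2.values

# isna() treats None as a missing value, so this mask marks the
# positions where both Series are missing, without relying on == or eq.
both_missing = s1.isna() & s2.isna()

print(raw_equal.tolist())
print(both_missing.tolist())
```

The results of == and eq on None shown above come from the versions noted in this answer, so the explicit isna check avoids depending on them.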

Tested with pandas: 0.23.4 numpy: 1.15.0
