Set of repeated NaNs in pandas Series

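I want to check whether a DataFrame column contains more than one distinct value, so I take the column, turn it into a set, and check its length. But I run into trouble with NaN. I expected a column of all NaNs to give a set of length one, but it doesn't. Why?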

import numpy as np
import pandas as pd
from numpy import nan

set([nan, nan, nan])             # set has one element
set(pd.Series([nan, nan, nan]))  # set has three elements

The same thing happens with a NumPy array:

set(np.array([nan, nan, nan]))   # set has three elements

It does not happen with other values:

set(np.array([1, 1, 1]))         # set has one element
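If the goal is only to count distinct values with every NaN treated as a single value, pandas' own hash-based methods sidestep the identity problem entirely. A minimal sketch, assuming a reasonably recent pandas; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan, np.nan])
ones = pd.Series([1, 1, 1])

# unique() keeps NaN as one value, so the result has length 1.
n_unique_all_nan = len(all_nan.unique())

# nunique() drops NaN by default; dropna=False counts it once.
n_dropped = all_nan.nunique()
n_kept = all_nan.nunique(dropna=False)
n_ones = ones.nunique(dropna=False)

print(n_unique_all_nan, n_dropped, n_kept, n_ones)  # 1 0 1 1
```

With this, "does the column hold more than one distinct value?" becomes s.nunique(dropna=False) > 1, with no set involved.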

In the Python world: object identity

>>> L = [nan, nan, nan]
>>> L[0] is L[1]
True
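A small sketch of why the plain list collapses to one element: numpy.nan is a single module-level Python float, so the list holds three references to the same object, while separately created NaN floats are distinct objects. The names L and fresh are illustrative:

```python
import numpy as np

# numpy.nan is one Python float object, referenced three times here.
L = [np.nan, np.nan, np.nan]
print(isinstance(np.nan, float), len(set(L)))   # True 1

# Three separately created NaN floats are three different objects.
fresh = [float("nan"), float("nan"), float("nan")]
print(len(set(fresh)))                          # 3
```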

Outside the Python world: values in a pandas Series are copies

>>> s = pd.Series([nan, nan, nan])
>>> s[0] is s[1]
False
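A sketch showing where the copies come from, assuming a float64 Series: the values live in a raw float64 buffer rather than as Python objects, so iterating boxes each one into a new object. The list vals keeps every object alive, which makes the id comparison meaningful:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan])

# The Series stores raw float64 values, not Python objects.
print(s.dtype)                      # float64

# Iterating creates a fresh Python object for every element.
vals = list(s)
print(vals[0] is vals[1])           # False
print(len({id(v) for v in vals}))   # 3: three distinct objects
print(len(set(vals)))               # 3
```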

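NaN never compares equal, not even to itself. A set treats two elements as duplicates only if they are the same object or compare equal, so only NaNs that are the very same object collapse into one element: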

>>> s[0] == s[1]
False

>>> L[0] == L[1]
False
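The identity-before-equality rule can be written out explicitly. Python documents `x in container` for built-in containers as equivalent to `any(e is x or e == x for e in container)`; the helper contains below is a hypothetical name that just spells out that rule:

```python
import math

def contains(container, x):
    # Identity is checked before equality, as for built-in containers.
    return any(e is x or e == x for e in container)

nan = float("nan")
print(nan == nan)                      # False: NaN never equals itself
print(contains([nan], nan))            # True: same object, `is` succeeds
print(contains([float("nan")], nan))   # False: different object, == fails
print(math.isnan(nan))                 # True: the reliable NaN test
```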

You can compare nunique with count:

import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, np.nan, 1, 1, 2])
s2 = pd.Series([np.nan, np.nan, 1, 2, 3])

>>> s1.count() == s1.nunique()
False

>>> s2.count() == s2.nunique()
True

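Both methods exclude NA/null values: count returns the number of non-NA/null observations, and nunique ignores NaN unless it is called with dropna=False.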
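The same comparison works column-wise on a whole DataFrame, because DataFrame.count() and DataFrame.nunique() both return one value per column. A sketch with illustrative column names; note that an all-NaN column compares 0 == 0 and therefore reports True:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "s1": [np.nan, np.nan, 1, 1, 2],
    "s2": [np.nan, np.nan, 1, 2, 3],
    "all_nan": [np.nan] * 5,
})

# True where every non-NaN value in the column is distinct.
all_distinct = df.count() == df.nunique()
print(all_distinct)
```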