Excluding 'None' when checking for 'NaN' values in pandas

Question

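I am cleaning the NaNs out of a dataset so that I can run a linear regression on it, and in the process I replaced some NaNs with None. After doing this, I use the following code to check for the remaining columns that have NaN values, where houseprice is the name of the dataframe: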

def cols_NaN():
    return houseprice.columns[houseprice.isnull().any()].tolist()
print(houseprice[cols_NaN()].isnull().sum())

The problem is that the result above also counts the None values. I want to select only the columns that have NaN values. How can I do that?
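The behaviour described in the question can be reproduced in a few lines. isnull() treats None and NaN alike. Also, None only survives as a distinct value in object columns, because pandas converts it to NaN in a float column. This is a small illustrative sketch; the names obj and flt are hypothetical.

```python
import numpy as np
import pandas as pd

# In an object column, None and np.nan are stored as two different objects...
obj = pd.Series([1.0, None, np.nan], dtype=object)
obj_missing = obj.isnull().tolist()          # ...but isnull() flags both of them
obj_types = [type(x).__name__ for x in obj]  # 'NoneType' vs 'float'

# In a float column, pandas converts None to NaN when the Series is built,
# so afterwards there is no None left to tell apart from NaN.
flt = pd.Series([1.0, None, np.nan])
flt_dtype = str(flt.dtype)

print(obj_missing)  # [False, True, True]
print(obj_types)    # ['float', 'NoneType', 'float']
print(flt_dtype)    # float64
```

This means that telling None from NaN is only meaningful for object-dtype columns.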

Answer 1

The only thing I can think of is to check whether the element is a float, because np.nan is of type float and is also null.

Consider the dataframe df:

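import numpy as np
import pandas as pd

# dtype=object keeps None and NaN as distinct values
# (np.object was removed in NumPy 1.24; plain object works everywhere)
df = pd.DataFrame(dict(A=[1., None, np.nan]), dtype=object)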

print(df)

      A
0     1
1  None
2   NaN

Then we test whether each element is both a float and null:

df.A.apply(lambda x: isinstance(x, float)) & df.A.isnull()

0    False
1    False
2     True
Name: A, dtype: bool
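The same test can be applied to every column as a replacement for the question's cols_NaN(). The sketch below assumes the float-and-null check above is the criterion you want. The helper name cols_real_nan and the sample data are hypothetical. None values in a float64 column have already been converted to NaN by pandas, so they will still be reported.

```python
import numpy as np
import pandas as pd

def cols_real_nan(frame: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns holding at least one float NaN.

    None values are ignored because None is not a float, following the
    isinstance(x, float) & isnull() test above.
    """
    result = []
    for name, col in frame.items():
        # Series.map applies the function element-wise, including to NaN/None
        is_float = col.map(lambda x: isinstance(x, float))
        if (is_float & col.isnull()).any():
            result.append(name)
    return result

houseprice = pd.DataFrame({
    "num": [1.0, 2.0, np.nan],                            # float NaN
    "txt": pd.Series(["a", "b", None], dtype=object),     # None only
    "mix": pd.Series([1.0, None, np.nan], dtype=object),  # both
})
print(cols_real_nan(houseprice))  # ['num', 'mix']
```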

Answer 2

Checking the column names works a little differently, because you need map and pandas.isnull:

In the pandas version used here, houseprice.columns.apply() and houseprice.columns.isnull() raise these errors:

AttributeError: 'Index' object has no attribute 'apply'

AttributeError: 'Index' object has no attribute 'isnull'

import numpy as np
import pandas as pd

houseprice = pd.DataFrame(columns = [np.nan, None, 'a'])

print (houseprice)
Empty DataFrame
Columns: [nan, None, a]

print (houseprice.columns[(houseprice.columns.map(type) == float) & 
                          (pd.isnull(houseprice.columns))].tolist())
[nan]
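For column labels, a plain-Python filter gives the same result as the map(type)/pd.isnull mask above. This is a minimal alternative sketch using math.isnan. The isinstance check runs first, so math.isnan is never called on None or a string.

```python
import math

import numpy as np
import pandas as pd

houseprice = pd.DataFrame(columns=[np.nan, None, "a"])

# Keep labels that are floats and NaN; None and strings fail isinstance first
nan_labels = [c for c in houseprice.columns
              if isinstance(c, float) and math.isnan(c)]
print(nan_labels)  # [nan]
```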

To check all the values in the DataFrame, applymap is needed:

houseprice = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[np.nan,8,9],
                   'D':[1,3,5],
                   'E':['a','s',None],
                   'F':[np.nan,4,3]})

print (houseprice)
   A  B    C  D     E    F
0  1  4  NaN  1     a  NaN
1  2  5  8.0  3     s  4.0
2  3  6  9.0  5  None  3.0

print (houseprice.columns[(houseprice.applymap(lambda x: isinstance(x, float)) & 
                           houseprice.isnull()).any()])
Index(['C', 'F'], dtype='object')

If you only need the number of such columns, it is simpler to sum the True values in the boolean mask:

print ((houseprice.applymap(lambda x: isinstance(x, float)) & 
        houseprice.isnull()).any().sum())
2
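In pandas 2.1, DataFrame.applymap was deprecated and renamed to DataFrame.map. The sketch below runs the same column selection and count on either API. The helper name elementwise is hypothetical, and the hasattr check is one possible way of choosing between the two methods.

```python
import numpy as np
import pandas as pd

def elementwise(frame: pd.DataFrame, func) -> pd.DataFrame:
    """Hypothetical helper: apply func to every cell.

    DataFrame.map exists from pandas 2.1 on; older versions only
    provide DataFrame.applymap.
    """
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

houseprice = pd.DataFrame({'A': [1, 2, 3],
                           'B': [4, 5, 6],
                           'C': [np.nan, 8, 9],
                           'D': [1, 3, 5],
                           'E': ['a', 's', None],
                           'F': [np.nan, 4, 3]})

# True where a cell is a float NaN (None in column E is not a float)
mask = elementwise(houseprice, lambda x: isinstance(x, float)) & houseprice.isnull()
print(houseprice.columns[mask.any()].tolist())  # ['C', 'F']
print(int(mask.any().sum()))                     # 2
```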
