df.isna().sum() not counting NaN values

Question

I have several values in a DataFrame, like this:

    Price_(zł)    Area_(m2) Rooms   Market      Building_type      Flat_level
0   1264850       62        3       secondary   apartment building  7
1   790000        80        4       secondary   block               0
2   606128        73,28     3       new         block               5
3   499000        70,50     4       secondary   nan                 nan
4   519000        40,86     2       new         block               5
5   508240        58,40     4       new         block               0
6   447568        50,86     3       new         block               0
7   Zapytajocenę  58,50     3       new         nan                 6
8   739375        84,50     4       new         apartment building  3
9   322400        52        3       new         nan                 1

produced by:

df['Flat_level'] = df['Flat_level'].apply(lambda x: str(x).replace (' parter', '0') if x != np.NaN else x == np.NaN)
df['Flat_level'] = df['Flat_level'].apply(lambda x: str(x).replace (' suterena', '-1') if x != np.NaN else x == np.NaN)
df['Flat_level'] = df['Flat_level'].apply(lambda x: str(x).replace (' > 10', '20') if x != np.NaN else x == np.NaN)
df['Flat_level'] = df['Flat_level'].apply(lambda x: str(x).replace (' poddasze', '30') if x != np.NaN else x == np.NaN)

Before these changes, the type was:

type(df['Flat_level'][3])

float

When I try to count the NaN values:

df.isna().sum()

the 'Flat_level' column shows no NaN values:

Price_(zł)                0   
Area_(m2)                 0   
Rooms                     0   
Market                    0   
Building_type             0   
Flat_level                0   
Building_flat_levels      1249
Windows                   0   
Heating                   0   
Year_of_construction      1734
Finishing_level           0   
Property_form             0   
Construction_materials    0   
latitude                  0   
longitude                 0   
link                      0   
dtype: int64

Any idea why? Thanks
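A minimal reproduction, assuming a hypothetical four-row column with raw labels such as ' parter' (the real raw data isn't shown), and using np.nan because the np.NaN alias was removed in NumPy 2.0. It suggests that the guard x != np.nan never sends NaN to the else branch, so str() turns it into the text 'nan', which isna() treats as an ordinary string:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw 'Flat_level' column before cleaning
df = pd.DataFrame({'Flat_level': [' parter', ' 5', np.nan, ' suterena']})
before = df['Flat_level'].isna().sum()

# Same pattern as the question: the condition is always True, so str(x) also runs on NaN
df['Flat_level'] = df['Flat_level'].apply(
    lambda x: str(x).replace(' parter', '0') if x != np.nan else x == np.nan
)
after = df['Flat_level'].isna().sum()

print(before, after)              # 1 0
print(df['Flat_level'].tolist())  # ['0', ' 5', 'nan', ' suterena']
```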

Answer 1

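You'd be better off using the NumPy function np.isnan() rather than a plain Python comparison to check whether a value is NaN. NaN is not equal to anything, including itself, so x != np.NaN is always True: every value, NaN included, goes through str(), which turns NaN into the string 'nan', and isna() does not count that string. You also need to change the else part of the apply() call; otherwise your DataFrame will contain booleans instead of NaN values. You can do it like this: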
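A quick check of the comparison behaviour behind this advice, as a small sketch: a plain != or == against NaN gives the same answer no matter what, while np.isnan (or the standard library's math.isnan) actually detects it. np.isnan raises TypeError on strings, which is why the answer's code checks type(x) == float before calling it.

```python
import math
import numpy as np

x = float('nan')

# Native comparison: NaN is unequal to everything, including itself
print(x != np.nan)  # True
print(x == np.nan)  # False

# np.isnan (or math.isnan) is the reliable test for a float NaN
print(np.isnan(x), math.isnan(x))  # True True

# np.isnan rejects strings, so non-float values must be filtered out first
try:
    np.isnan(' parter')
    rejects_str = False
except TypeError:
    rejects_str = True
print(rejects_str)  # True
```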

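import numpy as np

# Convert only real values; np.isnan is called on floats only, because it raises TypeError on strings
df['Flat_level'] = df['Flat_level'].apply(
    lambda x: str(x).replace(' parter', '0') if (type(x) == float and not np.isnan(x)) or type(x) != float else np.nan
)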
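A possible variant, not part of the original answer: pandas' pd.isna works on strings, floats and None alike, so it can stand in for the type check, and all four replacements from the question can run in one pass. The REPLACEMENTS dict and the normalize_level function are hypothetical names, and the sample column is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical mapping built from the four replace() calls in the question
REPLACEMENTS = {' parter': '0', ' suterena': '-1', ' > 10': '20', ' poddasze': '30'}

def normalize_level(x):
    """Apply every replacement to a value, passing missing values through as NaN."""
    if pd.isna(x):  # works for str, float and None without a type check
        return np.nan
    s = str(x)
    for old, new in REPLACEMENTS.items():
        s = s.replace(old, new)
    return s

df = pd.DataFrame({'Flat_level': [' parter', ' 5', np.nan, ' > 10', ' poddasze']})
df['Flat_level'] = df['Flat_level'].apply(normalize_level)
print(df['Flat_level'].isna().sum())  # 1
```

If the column has already been through the original apply() calls, df['Flat_level'].replace('nan', np.nan) turns the literal 'nan' strings back into real missing values before counting.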
