Convert np.nan to pd.NA

Given a pd.DataFrame that contains floats, how can I convert np.nan to the new pd.NA format?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

df = df.convert_dtypes()

type(df.iloc[0, 0])  # numpy.float64 - I was expecting pd.NA

Calling df.convert_dtypes() does not seem to work when df contains floats. However, the conversion works fine when df contains ints.

Would fillna work for you?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

df = df.fillna(pd.NA)

df

      A     B
0  <NA>   1.5
1  <NA>  <NA>
2  <NA>  <NA>
3   4.7  <NA>

Checking the type:

type(df.iloc[0, 0]) 

Output:

pandas._libs.missing.NAType
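As a possible alternative to fillna, the columns can be cast explicitly to the nullable Float64 extension dtype, which stores missing values as pd.NA. This is only a sketch and not part of the answer above; it assumes pandas 1.2 or later (where Float64 exists), and the name nullable is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Explicit cast to the nullable float dtype (assumes pandas >= 1.2).
nullable = df.astype("Float64")

print(nullable.dtypes)            # both columns should report Float64
print(type(nullable.iloc[0, 0]))  # the missing entry is now pd.NA
```

Unlike fillna, this states the target dtype directly, so the result does not depend on how a given pandas version treats pd.NA as a fill value.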

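As of pandas v1.2, convert_dtypes() converts float columns to the nullable Float64 dtype by default. If you want integers, use the convert_floating=False parameter (this yields Int64 only when every non-missing value is a whole number).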

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

df = df.convert_dtypes()
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       1 non-null      Float64
 1   B       1 non-null      Float64
dtypes: Float64(2)
memory usage: 104.0 bytes
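To confirm that the missing entries really are pd.NA after the conversion, a quick check along these lines may help. It assumes pandas 1.2 or later, where convert_dtypes() maps float columns to Float64; the names converted and missing are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

converted = df.convert_dtypes()  # assumes pandas >= 1.2

# Repeat the check from the question on the converted frame.
missing = converted.iloc[0, 0]
print(type(missing))
print(missing is pd.NA)

# pd.isna() still recognises the nullable missing values.
print(pd.isna(converted))
```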

With integers:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4

df = df.convert_dtypes(convert_floating=False)
df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Int64
 1   B       1 non-null      Int64
dtypes: Int64(2)
memory usage: 104.0 bytes
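The effect of convert_floating depends on the values in each column, which a small comparison can illustrate. This is a sketch assuming pandas 1.2 or later; the column names whole and fractional are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'whole': [1.0, np.nan, 4.0],       # floats that are all whole numbers
    'fractional': [1.5, np.nan, 4.7],  # floats with a fractional part
})

default = df.convert_dtypes()
no_float = df.convert_dtypes(convert_floating=False)

# Whole-number floats become Int64 in both cases.
print(default.dtypes)
# With convert_floating=False, fractional floats are left as plain float64.
print(no_float.dtypes)
```

In other words, convert_floating=False does not force integers; it only stops fractional columns from being converted to Float64, so their missing values remain np.nan.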