Convert np.nan to pd.NA
Given a pd.DataFrame containing floats, how can I convert np.nan to the new pd.NA format?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
type(df.iloc[0, 0])  # numpy.float64 - I am expecting pd.NA
When df contains floats, df.convert_dtypes() does not seem to work. However, when df contains ints, the conversion works fine.
Does fillna work for you?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.fillna(pd.NA)
df
      A     B
0  <NA>   1.5
1  <NA>  <NA>
2  <NA>  <NA>
3   4.7  <NA>
Check the type:
type(df.iloc[0, 0])
Output:
pandas._libs.missing.NAType
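Not part of the original answer: how fillna treats pd.NA in a float column may differ between pandas versions, so an explicit cast is another option worth trying. The sketch below assumes pandas 1.2 or later, where the nullable "Float64" extension dtype is available; the name df_nullable is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Cast every column to the nullable float extension dtype.
# Missing values in an extension-dtype column are reported as pd.NA.
df_nullable = df.astype("Float64")

print(df_nullable.dtypes)
print(df_nullable.iloc[0, 0] is pd.NA)
```

Unlike a column that ends up as object dtype, a Float64 column remains numeric, so arithmetic and aggregations keep working on it.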
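As of pandas v1.2, convert_dtypes() handles floats by default, converting them to the nullable Float64 dtype. If you want integers, use the convert_floating=False parameter.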
import numpy as np
import pandas as pd
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Float64
 1   B       1 non-null      Float64
dtypes: Float64(2)
memory usage: 104.0 bytes
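To confirm that the missing entries really are pd.NA after the conversion, and not just that the dtype label changed, a quick check along these lines may help. It assumes pandas 1.2 or later; the names converted, missing_is_na and na_counts are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

converted = df.convert_dtypes()

# A missing cell in a Float64 column is returned as the pd.NA singleton.
missing_is_na = converted.iloc[0, 0] is pd.NA

# isna() treats pd.NA as missing, so per-column counts still work.
na_counts = converted.isna().sum()

print(converted.dtypes)
print(missing_is_na)
print(na_counts)
```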
Using integers:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4
df = df.convert_dtypes(convert_floating=False)
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Int64
 1   B       1 non-null      Int64
dtypes: Int64(2)
memory usage: 104.0 bytes
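One thing to keep in mind with convert_floating=False: as far as I understand the documented behavior, only floats that can be cast to integers without loss become Int64, while columns with fractional values keep their original NumPy float64 dtype and therefore still hold np.nan. A small sketch, with the frames whole and fractional made up for illustration:

```python
import numpy as np
import pandas as pd

# Floats that are all whole numbers (plus missing values).
whole = pd.DataFrame({"A": [np.nan, np.nan, np.nan, 4.0],
                      "B": [1.0, np.nan, np.nan, np.nan]})

# Floats with a fractional part.
fractional = pd.DataFrame({"A": [np.nan, 4.7],
                           "B": [1.5, np.nan]})

whole_converted = whole.convert_dtypes(convert_floating=False)
fractional_converted = fractional.convert_dtypes(convert_floating=False)

print(whole_converted.dtypes)       # nullable integer columns
print(fractional_converted.dtypes)  # left as plain float64
```

So if the goal is pd.NA in a column that holds real fractional values, leaving convert_floating at its default seems to be the safer choice.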