I am trying to fill all NaN values in numeric-dtype columns with zero in pandas

Question

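I have a DataFrame with a mix of string and float columns. The float columns really hold integers; they were only converted to float because of missing values. I want to fill every NaN in those numeric columns with zero while leaving NaN in the string columns. Here is what I have so far: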

df.select_dtypes(include=['int', 'float']).fillna(0, inplace=True)

This doesn't work. I think it's because .select_dtypes() returns a view of the DataFrame, so .fillna() has no effect. Is there a similar way to fill all NaN values only in the float columns?

Answer 1

Your pandas.DataFrame.select_dtypes approach is fine; you just need to carry it across the finish line:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'], 'B': [np.nan, np.nan, 3, 4], 'C': [4, np.nan, 5, 6]})
>>> df
             A    B    C
0          NaN  NaN  4.0
1       string  NaN  NaN
2       string  3.0  5.0
3  more string  4.0  6.0

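Don't try to do an in-place fillna here (inplace=True has its time and place, but this isn't it). select_dtypes actually returns a new DataFrame rather than a view of the original, so an in-place fillna only modifies that new object and never reaches df. Instead, create a new DataFrame called filled and join the filled (or "fixed") columns back onto the original data: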

>>> filled = df.select_dtypes(include=['int', 'float']).fillna(0)
>>> filled
     B    C
0  0.0  4.0
1  0.0  0.0
2  3.0  5.0
3  4.0  6.0
>>> df = df.join(filled, rsuffix='_filled')
>>> df
             A    B    C  B_filled  C_filled
0          NaN  NaN  4.0       0.0       4.0
1       string  NaN  NaN       0.0       0.0
2       string  3.0  5.0       3.0       5.0
3  more string  4.0  6.0       4.0       6.0

Then you can drop whichever original columns you no longer need, keeping only the "filled" ones:

>>> df.drop([x[:x.find('_filled')] for x in df.columns if '_filled' in x], axis=1, inplace=True)
>>> df
             A  B_filled  C_filled
0          NaN       0.0       4.0
1       string       0.0       0.0
2       string       3.0       5.0
3  more string       4.0       6.0
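The join-then-drop steps can be collapsed into a single expression. The sketch below rebuilds the example frame so it runs on its own. It drops the unfilled numeric originals first and then joins the filled copies under new names using `add_suffix('_filled')`. That should give the same final layout without the string-slicing list comprehension. The variable name `result` is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'],
                   'B': [np.nan, np.nan, 3, 4],
                   'C': [4, np.nan, 5, 6]})

# Zero-fill only the numeric columns; this produces a new DataFrame.
filled = df.select_dtypes(include=['int', 'float']).fillna(0)

# Drop the unfilled originals, then join the filled copies under new names.
result = df.drop(columns=filled.columns).join(filled.add_suffix('_filled'))
print(result)
```

Dropping before joining means there are no overlapping column names, so `rsuffix` is not needed.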

Answer 2

Use DF.combine_first (not in place; it returns a new DataFrame):

df = df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))

Or DF.update (modifies df in place):

df.update(df.select_dtypes(include=[np.number]).fillna(0))
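Here is a runnable comparison of the two calls. It rebuilds the example frame from Answer 1 so the block stands alone. combine_first fills NaNs in df from the zero-filled numeric subset. Column `A` has no counterpart in that subset, so its NaN survives. update overwrites the numeric columns of df itself. The names `combined` and the sample data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 'string', 'string', 'more string'],
                   'B': [np.nan, np.nan, 3, 4],
                   'C': [4, np.nan, 5, 6]})

# combine_first: returns a new frame; NaNs in df are taken from the
# zero-filled numeric subset, while 'A' (absent from the subset) keeps its NaN.
combined = df.combine_first(df.select_dtypes(include=[np.number]).fillna(0))

# update: modifies df in place, copying the non-NaN values of the
# zero-filled subset into the matching numeric columns.
df.update(df.select_dtypes(include=[np.number]).fillna(0))

print(combined)
print(df)
```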

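fillna fails because DF.select_dtypes returns a brand-new DataFrame. Although it is built from a subset of the original DF, it is not actually part of it; it is a separate entity in its own right. So any modification made to it does not affect the DF it was derived from.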
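A small sketch of that behaviour on a made-up two-row frame. An in-place fillna applied to the selected subset changes the subset but leaves the original untouched. The subset is bound to a variable (`subset`, a hypothetical name) so the in-place call acts on a named object rather than a throwaway temporary. Depending on the pandas version, a SettingWithCopyWarning may still be printed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 'x'], 'B': [np.nan, 2.0]})

# select_dtypes hands back a separate DataFrame, not a window onto df.
subset = df.select_dtypes(include=[np.number])
subset.fillna(0, inplace=True)

print(subset['B'].tolist())  # [0.0, 2.0]  (the subset was filled)
print(df['B'].tolist())      # [nan, 2.0]  (the original still has NaN)
```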

Note that np.number selects all numeric types.
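To illustrate what `include=[np.number]` picks up, here is a frame with hypothetical column names, one per dtype. Integer and floating columns of any width are selected. Boolean and object (string) columns are not, because `np.bool_` is not a subclass of `np.number` in NumPy's type hierarchy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'small_int': np.array([1, 2], dtype=np.int8),
    'big_int': np.array([1, 2], dtype=np.int64),
    'single': np.array([1.5, np.nan], dtype=np.float32),
    'double': np.array([1.5, np.nan], dtype=np.float64),
    'flag': [True, False],   # bool: not selected
    'text': ['a', None],     # object: not selected
})

# Column names of every numeric column, in their original order.
numeric = df.select_dtypes(include=[np.number]).columns.tolist()
print(numeric)  # ['small_int', 'big_int', 'single', 'double']
```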

Answer 3

Consider a DataFrame like this:

    col1    col2    col3    id
0   1       1       1       a
1   0       NaN     1       a
2   NaN     1       1       NaN
3   1       0       1       b

You can select the numeric columns and call fillna on just those:

import numpy as np

num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols].fillna(0)


    col1    col2    col3    id
0   1       1       1       a
1   0       0       1       a
2   0       1       1       NaN
3   1       0       1       b
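A self-contained version of this answer. It assumes the example frame is built as below, with `np.nan` standing in for the missing entries. Columns that contained NaN come out as float64. The question notes that these columns really hold integers, so the sketch also shows an optional cast back to int64 with `astype`. That cast is an addition here and only makes sense if every filled value is a whole number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, np.nan, 1],
                   'col2': [1, np.nan, 1, 0],
                   'col3': [1, 1, 1, 1],
                   'id': ['a', 'a', np.nan, 'b']})

# Zero-fill only the numeric columns; NaN in 'id' is left alone.
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols].fillna(0)

# Optional (assumption): restore integer dtype now that no NaN remains.
df = df.astype({col: 'int64' for col in num_cols})
print(df)
```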
