DataFrame column won't replace NaNs with empty strings

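I am reading a very large CSV file with more than 200 columns. Some of the columns are completely empty. When I read the file into a DataFrame, pandas forces those columns to float64.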
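A minimal sketch of the behavior described above, using a tiny in-memory CSV as a stand-in for the real file. The column names and the `csv_text` sample are made up for illustration. `dtype=str` and `keep_default_na=False` are existing `pd.read_csv` parameters, but whether reading everything as text suits a file of this size is an assumption.

```python
import io

import pandas as pd

# Hypothetical sample: "empty_col" has no values in any row.
csv_text = "id,empty_col,name\n1,,alpha\n2,,beta\n"

# Default parsing: empty fields become NaN, so the all-empty column is float64.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["empty_col"].dtype)

# Reading every column as text and disabling the default NaN markers
# keeps empty fields as empty strings instead.
text_df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(text_df["empty_col"].tolist())
```

With this approach the numeric columns also arrive as text, so it only fits if strings are wanted everywhere.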

I force the column to be a string:

if df['OtherValidationAuthority5ValidationAuthorityEntityID'].dtype == 'float64':
    df['OtherValidationAuthority5ValidationAuthorityEntityID'] = (
        df['OtherValidationAuthority5ValidationAuthorityEntityID'].astype(str)
    )

The problem is that when I print that column, all the values are nan. They need to be empty strings, so I use

df['OtherValidationAuthority5ValidationAuthorityEntityID'] = (
    df['OtherValidationAuthority5ValidationAuthorityEntityID'].replace(np.nan, '', regex=True)
)

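Then I print the column and the values are still nan! I have seen other examples on Stack Overflow, and this seems to be what they recommend. Did something change between versions? (I am using 3.7.)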
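A small sketch of what may be going on, assuming a pandas version in which `astype(str)` turns a missing float into the three-character text 'nan' (other versions may keep it as a missing value). The `col` series is a hypothetical stand-in for one of the empty columns. Once the values are text, replacing `np.nan` has nothing left to match, so the order of the two steps matters.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one completely empty column.
col = pd.Series([np.nan, np.nan], dtype="float64")

# Order used above: convert to text first, then try to replace np.nan.
# If astype(str) produced the text 'nan', there are no missing values left to replace.
converted_first = col.astype(str).replace(np.nan, "")

# Alternative order: fill while the values are still real NaN, then convert.
filled_first = col.fillna("").astype(str)

print(converted_first.tolist())
print(filled_first.tolist())
```

Filling before converting does not depend on how a given version stringifies NaN.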

What am I missing?

Addendum. I am using this code to change the columns. It works for some columns but not for others.

for colname in df.columns:
    if df[colname].dtype == 'float64' or df[colname].dtype == 'int64':
        df[colname] = df[colname].astype(str)
    df[colname] = df[colname].replace({'nan': ''})

When I print the dtypes they are all 'object', as I expected, but when I print the values they are

('001GPB6A9XPE8XJICC14', 'FIDELITY ADVISOR SERIES I - Fidelity Advisor Leveraged Company Stock Fund', nan, nan, nan, nan, nan, '', nan, nan, '', nan, nan, '', nan, '', '' , '', nan, nan, nan, '', '', '', '', '', '', '', '', '', '', '', '', nan, '245夏街', '', nan, '', nan, nan, nan, 'BOSTON', 'US-MA', 'US', '02110', nan, '245 Summer Street', '' , nan, '', nan, nan, nan, 'Boston', 'US-MA', 'US', '02210', nan, nan, nan, '', '', '', nan , '', '', nan, nan, nan, nan, '', '', '', '', '', '', '', '', '', '', '', '' , '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' , '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ' ', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '' , '', '', '', '', '', 'RA000665', 南, 'S000005113', 'US-MA', 'FUND', '8888', 'OTHER', nan, nan, '', '', 'ACTIVE', nan, nan, nan, nan, '', '2012-11-29T16:33:0 0.000Z', '2020-06-03T14:33:00.000Z', 'ISSUED', '2021-05-29T07:50:00.000Z', 'EVK05KS7XY1DEII3R011', 'FULLY_CORROBORATED', 'RA000665', nan, 'S000005113', '', '', '', '', '', '', '', '', '', '', '', '', '' , '', '')

Changing the replace to

df['xxxx'] = df['xxxx'].replace({'nan': ''})
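One possible explanation, offered as an assumption rather than a confirmed diagnosis: columns that already have dtype object never go through `astype(str)` in the loop, so their missing values are still real NaN, and `replace({'nan': ''})` only matches the text 'nan'. A minimal sketch of filling the whole frame before converting, with a made-up three-column `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an all-empty float column, a text column with a gap,
# and an integer column.
df = pd.DataFrame(
    {
        "empty_col": [np.nan, np.nan],
        "city": ["BOSTON", np.nan],
        "code": [8888, 42],
    }
)

# Fill every missing value while it is still a real NaN,
# then convert every column to text.
cleaned = df.fillna("").astype(str)

print(list(cleaned.itertuples(index=False, name=None)))
```

This handles the float, int, and object columns in one pass, so no per-column dtype check is needed.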