How do I handle this situation: 'n/a' shows up as 'nan' in a pandas DataFrame, but I cannot string-match and replace it
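I have a CSV file in which a few rows contain n/a. When I load it as a pandas DataFrame, those values show up as nan.
This causes problems when I call functions such as split and lower on those rows.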
data_df['column'][104]
>>> nan
data_df['column'][104].split()
>>>
AttributeError Traceback (most recent call last)
<ipython-input-38-6efe06f0a4ec> in <module>()
----> 1 data_df['column'][104].split()
AttributeError: 'float' object has no attribute 'split'
data_df['column'][104].lower()
>>>
AttributeError Traceback (most recent call last)
<ipython-input-41-c80cc9ae0712> in <module>()
----> 1 data_df['column'][104].lower()
AttributeError: 'float' object has no attribute 'lower'
When I try to replace nan with an empty string using the fillna method (an empty string would not cause these errors), nothing happens:
data_df.fillna('')
data_df['column'][104]
>>> nan
So I tried matching it as a string and replacing it:
for i in range(len(data_df)):
    if data_df['column'][i]=='nan':
        data_df['column'][i]=''
data_df['column'][104]
>>> nan
for i in range(len(data_df)):
    if data_df['column'][i]=='n/a':
        data_df['column'][i]=''
data_df['column'][104]
>>> nan
The following prints nothing:
for i in range(len(data_df)):
    if (data_df['column'][i]=='nan' or data_df['column'][i]=='n/a'):
        print(data_df['column'][i])
Why can't I catch and replace nan or n/a? How can I fix this?
I think we can fix this at the very start, when the file is read:
df=pd.read_csv('Yourfile.csv',na_values=['n/a']).fillna('')
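A minimal sketch of the read-time approach, assuming an in-memory CSV in place of Yourfile.csv (the column name and its contents are made up for illustration). pandas already treats n/a as a missing value by default, so it is the trailing fillna('') that turns it into an empty string. Passing keep_default_na=False is an alternative that keeps the literal text n/a instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'Yourfile.csv'
csv_text = "column\nHello World\nn/a\nFoo Bar\n"

# Option 1: parse 'n/a' as missing, then fill missing cells with ''
df = pd.read_csv(io.StringIO(csv_text), na_values=['n/a']).fillna('')

# Option 2: turn off default NA parsing so 'n/a' stays a plain string
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# String methods now work on every row of df
lowered = df['column'].str.lower().tolist()
```

With option 1 every cell is a string, so split and lower no longer hit a float. With option 2 a comparison such as raw['column'][1] == 'n/a' matches, because the value was never converted.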
data_df.fillna('') returns a new DataFrame and leaves the original unchanged. To change the original DataFrame, call data_df.fillna('', inplace=True).
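The string comparisons in the question never match because the cell holds a floating-point NaN, not the text 'nan' or 'n/a'. A short sketch of the difference, using a small hypothetical frame in place of data_df and pd.isna as the missing-value check:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for data_df
data_df = pd.DataFrame({'column': ['Hello World', np.nan, 'Foo Bar']})

value = data_df['column'][1]
is_string_nan = (value == 'nan')   # False: the cell is a float NaN, not text
is_missing = pd.isna(value)        # True: the reliable way to detect NaN

data_df.fillna('')                 # result is discarded; data_df is unchanged
still_missing = pd.isna(data_df['column'][1])

data_df = data_df.fillna('')       # assign the result back to keep the change
words = data_df['column'].str.split().tolist()
```

Assigning the result back is equivalent in effect to the inplace=True call mentioned above. Either way the column then contains only strings.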