Convert a column of data type Int64 with <NA> values to object with nan values
A tutorial has this DataFrame, sequels, shown below:
title sequel
id
19995 Avatar nan
862 Toy Story 863
863 Toy Story 2 10193
597 Titanic nan
24428 The Avengers nan
<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
title 4803 non-null object
sequel 4803 non-null object
dtypes: object(2)
memory usage: 272.6+ KB
The tutorial provides a file, sequels.p. However, when I read the file in, my DataFrame differs from the one in the tutorial:
my_sequels = pd.read_pickle('data/pandas/sequels.p')
my_sequels.set_index('id', inplace=True)
my_sequels.head()
title sequel
id
19995 Avatar <NA>
862 Toy Story 863
863 Toy Story 2 10193
597 Titanic <NA>
24428 The Avengers <NA>
my_sequels.info()
<class 'pandas.core.frame.DataFrame'>
Index: 4803 entries, 19995 to 185567
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 title 4803 non-null object
1 sequel 90 non-null Int64
dtypes: Int64(1), object(1)
memory usage: 117.3+ KB
My question: is there a way to manipulate my_sequels so that it resembles sequels, that is, so that my_sequels['sequel'] is an object column with 4803 non-null values in which <NA> becomes nan?
Edit: The reason I want my_sequels to match sequels is to avoid an error in a later step:
sequels_fin = my_sequels.merge(financials, on='id', how='left')
orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel',
right_on='id', right_index=True,
suffixes=('_org','_seq'))
ValueError Traceback (most recent call last)
<ipython-input-5-7215de303684> in <module>
3 orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel',
4 right_on='id', right_index=True,
----> 5 suffixes=('_org','_seq'))
ValueError: cannot convert to 'int64'-dtype NumPy array with missing values. Specify an appropriate 'na_value' for this dtype.
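I don't think you want that. You are seeing this because the tutorial is based on an older version of pandas than the one you are using. The Int64 column uses the nullable integer data type, which is described here: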
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
You can detect missing values and operate on them just as you would expect:
arr = pd.array([1, 2, None], dtype=pd.Int64Dtype())
arr.isna()
array([False, False, True])
arr.fillna(0)
<IntegerArray>
[1, 2, 0]
Length: 3, dtype: Int64
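If you do still want nan in place of <NA>, the following is a minimal sketch on toy data rather than on the real sequels.p file. The Series below is a hypothetical stand-in for my_sequels['sequel'], and the two options shown are only two of several possible conversions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_sequels['sequel']: a nullable Int64 column.
sequel = pd.Series([pd.NA, 863, 10193, pd.NA], dtype="Int64", name="sequel")

# Option 1: cast to float64; <NA> becomes nan and the integers become floats.
as_float = sequel.astype("float64")

# Option 2: cast to object, then replace the <NA> entries with np.nan.
as_object = sequel.astype(object).where(sequel.notna(), np.nan)

print(as_float.dtype, as_float.isna().sum())
print(as_object.dtype, as_object.isna().sum())
```

Option 1 is usually the simpler choice when the column only needs to hold numbers; option 2 is closer to the object dtype shown in the tutorial's output.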
First, set 'id' as the index:
sequels_fin = sequels_fin.set_index('id')
Then:
orig_seq = sequels_fin.merge(sequels_fin, how='inner', left_on='sequel',
right_on='id', right_index=True,
suffixes=('_org','_seq'))
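As a conservative variant of the self-merge, here is a sketch on toy data. The frame is a hypothetical stand-in for sequels_fin (the financial columns are left out, and the Toy Story 3 row is added only so that both joins find a match). It assumes that dropping the rows without a sequel is acceptable, which holds for an inner join because a missing key never matches, and it passes only right_index=True for the right-hand side.

```python
import pandas as pd

# Hypothetical stand-in for sequels_fin, indexed by 'id'.
sequels_fin = pd.DataFrame(
    {
        "id": [19995, 862, 863, 597, 10193],
        "title": ["Avatar", "Toy Story", "Toy Story 2", "Titanic", "Toy Story 3"],
        "sequel": pd.array([None, 863, 10193, None, None], dtype="Int64"),
    }
).set_index("id")

# Rows with a missing sequel cannot match in an inner join, so drop them
# and cast the key to plain int64 before merging.
has_sequel = sequels_fin.dropna(subset=["sequel"]).astype({"sequel": "int64"})

# Join each film's 'sequel' value against the 'id' index of the full frame.
orig_seq = has_sequel.merge(
    sequels_fin,
    how="inner",
    left_on="sequel",
    right_index=True,
    suffixes=("_org", "_seq"),
)
print(orig_seq[["title_org", "title_seq"]])
```

Casting the key to int64 after dropping the missing rows avoids asking pandas to convert <NA> to a NumPy integer, which is what the ValueError above complains about.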