Calculation between pandas DataFrames returns NaN
I have a pandas DataFrame called df_mod. One variable of interest in this DataFrame is called Evap_mod. When I run print(df_mod['Evap_mod']), it returns:
2003-12-20 00:30:00 1.930664
2003-12-21 00:30:00 1.789290
2003-12-22 00:30:00 2.318347
2003-12-23 00:30:00 1.741943
2003-12-24 00:30:00 1.686124
2003-12-25 00:30:00 1.852876
2003-12-26 00:30:00 1.759650
2003-12-27 00:30:00 1.566521
2003-12-28 00:30:00 1.496039
2003-12-29 00:30:00 1.540751
2003-12-30 00:30:00 2.006475
2003-12-31 00:30:00 1.920912
Name: Evap_mod, Length: 729, dtype: float32
I have another pandas DataFrame called ddf. One variable of interest in this DataFrame is called PET_PT. When I run print(ddf['PET_PT']), it returns:
2003-12-20 4.810697
2003-12-21 4.739378
2003-12-22 4.994467
2003-12-23 5.138086
2003-12-24 5.024226
2003-12-25 4.937206
2003-12-26 4.551416
2003-12-27 NaN
2003-12-28 NaN
2003-12-29 NaN
2003-12-30 NaN
2003-12-31 NaN
Freq: D, Name: PET_PT, Length: 729, dtype: float64
I want to run a simple calculation between these two variables:
df_mod['ER_mod']=(df_mod['Evap_mod']+np.mean(ddf['PET_PT']))/(ddf['PET_PT']+np.mean(ddf['PET_PT']))
Unfortunately, this calculation returns only NaN:
2003-12-20 00:30:00 NaN
2003-12-21 00:30:00 NaN
2003-12-22 00:30:00 NaN
2003-12-23 00:30:00 NaN
2003-12-24 00:30:00 NaN
2003-12-25 00:30:00 NaN
2003-12-26 00:30:00 NaN
2003-12-27 00:30:00 NaN
2003-12-28 00:30:00 NaN
2003-12-29 00:30:00 NaN
2003-12-30 00:30:00 NaN
2003-12-31 00:30:00 NaN
Name: ER_mod, Length: 729, dtype: float64
Does anyone know why it returns NaN and how to fix this?
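The reason is that the index values differ: pandas aligns on the index when dividing, the labels do not match, and NaNs are created.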
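To make the alignment issue concrete, here is a minimal sketch with made-up three-row Series (the names evap, pet and df are hypothetical stand-ins for the columns in the question). It assumes one index carries a 00:30 time component and the other does not, as in the printed output above.

```python
import pandas as pd

# Hypothetical toy data: same days, but one index has a 00:30 time component.
idx_mod = pd.date_range("2003-12-20 00:30", periods=3, freq="D")
idx_pet = pd.date_range("2003-12-20", periods=3, freq="D")
evap = pd.Series([1.930664, 1.789290, 2.318347], index=idx_mod)
pet = pd.Series([4.810697, 4.739378, 4.994467], index=idx_pet)

# Arithmetic aligns on index labels. No timestamp appears in both indexes,
# so the result is indexed by the union of the labels and is all NaN.
ratio = evap / pet
print(len(ratio))          # number of labels in the union of both indexes
print(ratio.isna().all())  # whether every value is NaN

# Assigning the result to a column reindexes it to the frame's index,
# which is why the column in the question has the 00:30 timestamps.
df = evap.to_frame("Evap_mod")
df["ER_mod"] = ratio
print(df)
```

Neither input contains NaN here, so every missing value in the result comes from the label mismatch alone.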
The solution is to map the Series ddf['PET_PT'] by a helper column date, created with DatetimeIndex.normalize to remove the times, and to use the pandas mean function:
# same index values as df_mod
new = df_mod.assign(date = df_mod.index.normalize())['date'].map(ddf['PET_PT'])
print (new)
2003-12-20 00:30:00 4.810697
2003-12-21 00:30:00 4.739378
2003-12-22 00:30:00 4.994467
2003-12-23 00:30:00 5.138086
2003-12-24 00:30:00 5.024226
2003-12-25 00:30:00 4.937206
2003-12-26 00:30:00 4.551416
2003-12-27 00:30:00 NaN
2003-12-28 00:30:00 NaN
2003-12-29 00:30:00 NaN
2003-12-30 00:30:00 NaN
2003-12-31 00:30:00 NaN
Name: date, dtype: float64
df_mod['ER_mod'] = (df_mod['Evap_mod'] + ddf['PET_PT'].mean()) / (new + ddf['PET_PT'].mean())
print (df_mod)
Evap_mod ER_mod
2003-12-20 00:30:00 1.930664 0.702960
2003-12-21 00:30:00 1.789290 0.693480
2003-12-22 00:30:00 2.318347 0.729125
2003-12-23 00:30:00 1.741943 0.661170
2003-12-24 00:30:00 1.686124 0.663134
2003-12-25 00:30:00 1.852876 0.685986
2003-12-26 00:30:00 1.759650 0.704152
2003-12-27 00:30:00 1.566521 NaN
2003-12-28 00:30:00 1.496039 NaN
2003-12-29 00:30:00 1.540751 NaN
2003-12-30 00:30:00 2.006475 NaN
2003-12-31 00:30:00 1.920912 NaN
If both DataFrames have the same length and the index values differ only by the time component, you can assign one index to the other:
ddf.index = df_mod.index
df_mod['ER_mod'] = (df_mod['Evap_mod'] + ddf['PET_PT'].mean())/\
(ddf['PET_PT'] + ddf['PET_PT'].mean())
print (df_mod)
Evap_mod ER_mod
2003-12-20 00:30:00 1.930664 0.702960
2003-12-21 00:30:00 1.789290 0.693480
2003-12-22 00:30:00 2.318347 0.729125
2003-12-23 00:30:00 1.741943 0.661170
2003-12-24 00:30:00 1.686124 0.663134
2003-12-25 00:30:00 1.852876 0.685986
2003-12-26 00:30:00 1.759650 0.704152
2003-12-27 00:30:00 1.566521 NaN
2003-12-28 00:30:00 1.496039 NaN
2003-12-29 00:30:00 1.540751 NaN
2003-12-30 00:30:00 2.006475 NaN
2003-12-31 00:30:00 1.920912 NaN
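Overwriting an index silently pairs rows by position, so it may be worth checking first that the two indexes really cover the same days. The following is only a sketch on hypothetical three-row frames; the names same_days and ddf_aligned are introduced here for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical toy frames with the same days but different time components.
idx_mod = pd.date_range("2003-12-20 00:30", periods=3, freq="D")
idx_pet = pd.date_range("2003-12-20", periods=3, freq="D")
df_mod = pd.DataFrame({"Evap_mod": [1.930664, 1.789290, 2.318347]}, index=idx_mod)
ddf = pd.DataFrame({"PET_PT": [4.810697, 4.739378, 4.994467]}, index=idx_pet)

# Strip the times from df_mod's index and compare with ddf's index.
# Index.equals also returns False when the lengths differ.
same_days = df_mod.index.normalize().equals(ddf.index)

if same_days:
    # Work on a copy so the original ddf keeps its daily index.
    ddf_aligned = ddf.copy()
    ddf_aligned.index = df_mod.index

    pet_mean = ddf_aligned["PET_PT"].mean()
    df_mod["ER_mod"] = (df_mod["Evap_mod"] + pet_mean) / (ddf_aligned["PET_PT"] + pet_mean)
    print(df_mod)
else:
    print("Indexes do not cover the same days; reassigning would mispair rows.")
```

If the check fails, the map approach shown earlier is the safer route, because it matches rows by date rather than by position.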
Your column contains missing data, so you should impute the values with a method that suits your objective (mean, zero, median, random, etc.).
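As a rough illustration of the imputation options mentioned, the sketch below fills gaps in a hypothetical PET_PT-like Series in several ways. The data are invented, and linear interpolation is an extra option not named above. Which method is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps, standing in for ddf['PET_PT'].
pet = pd.Series([4.810697, 4.739378, np.nan, 5.138086, np.nan], name="PET_PT")

# Series.mean() and Series.median() skip NaN by default.
filled_mean = pet.fillna(pet.mean())      # replace gaps with the mean
filled_median = pet.fillna(pet.median())  # replace gaps with the median
filled_zero = pet.fillna(0.0)             # replace gaps with zero

# Linear interpolation between neighbours; a trailing gap is filled
# with the last valid value under the default settings.
filled_interp = pet.interpolate()

print(pd.DataFrame({
    "mean": filled_mean,
    "median": filled_median,
    "zero": filled_zero,
    "interp": filled_interp,
}))
```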
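There is a difference between pandas and numpy behavior. Whenever you compute np.mean(x) and x contains NaN, you get NaN as the result, whereas with pandas the NaN values are ignored. The following should work: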
df_mod['ER_mod'] = (df_mod['Evap_mod'] + ddf['PET_PT'].mean())/\
(ddf['PET_PT'] + ddf['PET_PT'].mean())
Otherwise, you can use np.nanmean instead of np.mean.
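A small check of the difference, on invented values. The Series is converted with to_numpy() so that the comparison measures plain NumPy array behavior. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing entry.
pet = pd.Series([4.810697, 4.739378, np.nan])
arr = pet.to_numpy()  # plain NumPy array, NaN preserved

np_mean = np.mean(arr)        # NaN propagates through the plain array mean
np_nanmean = np.nanmean(arr)  # NaN entries are skipped
pd_mean = pet.mean()          # pandas skips NaN by default (skipna=True)

print(np_mean, np_nanmean, pd_mean)
```

Here np.nanmean and Series.mean agree, while np.mean on the array returns NaN.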