pandas NaN after resample
I have this seemingly innocent data:
datetime,power
2022-02-14 15:09:58.163,114.07
2022-02-14 15:09:58.657,113.63
2022-02-14 15:10:32.237,114.28
2022-02-14 15:10:32.730,113.75
2022-02-14 15:10:33.195,113.76
2022-02-14 15:10:33.680,113.83
2022-02-14 15:10:34.195,114.44
2022-02-14 15:10:34.679,115.09
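The following code produces NaN:
import pandas as pd

_df = pd.read_csv('measurements/nan_df.csv')
# Parse the timestamp strings and make them the index, since resample needs a datetime-like index
_df['datetime'] = pd.to_datetime(_df['datetime'])
_df.set_index('datetime', inplace=True)
_df.resample('s').mean()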
datetime,power
2022-02-14 15:09:58,113.85
2022-02-14 15:09:59,
2022-02-14 15:10:00,
2022-02-14 15:10:01,
2022-02-14 15:10:02,
2022-02-14 15:10:03,
2022-02-14 15:10:04,
2022-02-14 15:10:05,
2022-02-14 15:10:06,
2022-02-14 15:10:07,
2022-02-14 15:10:08,
2022-02-14 15:10:09,
Any idea why?
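This is expected: by default, DataFrame.resample creates a continuous DatetimeIndex, so every one-second bin that contains no rows is still added to the result, with NaN as its value. Your data has no rows between 15:09:58 and 15:10:32, so all the bins in that gap come out as NaNs.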
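To see where the NaNs come from, it can help to count how many raw rows land in each one-second bin. The following is a minimal sketch, assuming the eight rows from the question are inlined as a string (the names csv_text, counts, n_bins and n_filled are made up for illustration):

```python
import io
import pandas as pd

# Rows copied from the question, inlined so the sketch runs without the CSV file
csv_text = """datetime,power
2022-02-14 15:09:58.163,114.07
2022-02-14 15:09:58.657,113.63
2022-02-14 15:10:32.237,114.28
2022-02-14 15:10:32.730,113.75
2022-02-14 15:10:33.195,113.76
2022-02-14 15:10:33.680,113.83
2022-02-14 15:10:34.195,114.44
2022-02-14 15:10:34.679,115.09
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], index_col="datetime")

resampled = df.resample("s").mean()

# One bin per second from 15:09:58 through 15:10:34, inclusive
n_bins = len(resampled)
# Bins that received at least one measurement
n_filled = int(resampled["power"].notna().sum())
# Raw rows per bin; a count of zero marks a bin whose mean is NaN
counts = df.resample("s").size()
n_empty = int((counts == 0).sum())

print(n_bins, n_filled, n_empty)  # prints: 37 4 33
```

Only 4 of the 37 bins contain data, so the other 33 have nothing to average and show up as NaN.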
If you need to remove the rows with missing values, use:
df = _df.resample('s').mean().dropna()
print(df)
                       power
datetime                    
2022-02-14 15:09:58  113.850
2022-02-14 15:10:32  114.015
2022-02-14 15:10:33  113.795
2022-02-14 15:10:34  114.765
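If the goal is only to average per second, another option (not from the answer above, just a possible alternative) is to group on the timestamps floored to the second, which never creates the empty bins in the first place. A minimal sketch, again assuming the question's rows are inlined as a string (csv_text and per_second are illustrative names):

```python
import io
import pandas as pd

# Rows copied from the question, inlined so the sketch runs without the CSV file
csv_text = """datetime,power
2022-02-14 15:09:58.163,114.07
2022-02-14 15:09:58.657,113.63
2022-02-14 15:10:32.237,114.28
2022-02-14 15:10:32.730,113.75
2022-02-14 15:10:33.195,113.76
2022-02-14 15:10:33.680,113.83
2022-02-14 15:10:34.195,114.44
2022-02-14 15:10:34.679,115.09
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], index_col="datetime")

# Truncate each timestamp to the whole second and group on that;
# only seconds that actually occur in the data become groups
per_second = df.groupby(df.index.floor("s")).mean()
print(per_second)
```

For this data the result should hold the same four rows as resample('s').mean().dropna(). The difference is that a group whose values are themselves all NaN would survive here, whereas dropna() would remove it.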