Pandas: group by date, then return the first valid value with its matching datetime
I have:
import numpy as np
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range('2022-05-01 10:00:00', periods=10, freq='10h'), 'value': [np.nan, np.nan, np.nan, -0.61, np.nan, 0.55, 0.63, np.nan, 0.15, np.nan]})
df
datetime value
0 2022-05-01 10:00:00 NaN
1 2022-05-01 20:00:00 NaN
2 2022-05-02 06:00:00 NaN
3 2022-05-02 16:00:00 -0.61
4 2022-05-03 02:00:00 NaN
5 2022-05-03 12:00:00 0.55
6 2022-05-03 22:00:00 0.63
7 2022-05-04 08:00:00 NaN
8 2022-05-04 18:00:00 0.15
9 2022-05-05 04:00:00 NaN
How do I group by date and get the first valid value of each day together with its corresponding datetime, like this:
date datetime value
2022-05-02 2022-05-02 16:00:00 -0.61
2022-05-03 2022-05-03 12:00:00 0.55
2022-05-04 2022-05-04 18:00:00 0.15
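I tried df.groupby([df['datetime'].dt.date]).first(), but it gives me the following DataFrame, where datetime is the first timestamp of that day rather than the datetime corresponding to the value I need: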
datetime value
datetime
2022-05-01 2022-05-01 10:00:00 NaN
2022-05-02 2022-05-02 06:00:00 -0.61
2022-05-03 2022-05-03 02:00:00 0.55
2022-05-04 2022-05-04 08:00:00 0.15
2022-05-05 2022-05-05 04:00:00 NaN
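Why this happens: GroupBy.first() takes the first non-null entry of each column separately, so within one group the datetime and the value it returns can come from different rows. A minimal sketch on a hypothetical two-row frame (the names demo and out, and the key column, are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: both rows belong to the same group 'a'.
demo = pd.DataFrame({
    'key': ['a', 'a'],
    'datetime': pd.to_datetime(['2022-05-02 06:00', '2022-05-02 16:00']),
    'value': [np.nan, -0.61],
})

# first() skips NaN column by column: 'datetime' has no NaN, so it comes
# from row 0 (06:00), while 'value' comes from row 1 (-0.61).
out = demo.groupby('key').first()
print(out)
```

This is exactly the mismatch in the question: the 06:00 timestamp is paired with a value recorded at 16:00.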
Remove the rows with a missing value before grouping, so first() only sees valid rows:
df1 = df.dropna(subset=['value']).groupby(df['datetime'].dt.date).first()
print(df1)
datetime value
datetime
2022-05-02 2022-05-02 16:00:00 -0.61
2022-05-03 2022-05-03 12:00:00 0.55
2022-05-04 2022-05-04 18:00:00 0.15
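This works because the grouping key df['datetime'].dt.date is built from the full frame, but pandas aligns a Series key to the index of the frame being grouped, so the rows removed by dropna simply take no part in the grouping. A self-contained sketch of the same answer, using the question's data (the names key, valid and df1 are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.date_range('2022-05-01 10:00:00', periods=10, freq='10h'),
    'value': [np.nan, np.nan, np.nan, -0.61, np.nan, 0.55, 0.63, np.nan, 0.15, np.nan],
})

key = df['datetime'].dt.date          # one entry per row of the full frame
valid = df.dropna(subset=['value'])   # keeps only rows 3, 5, 6 and 8

# `key` is aligned to valid.index, so only the surviving rows are grouped.
# Every remaining row has a value, so first() returns whole, consistent rows.
df1 = valid.groupby(key).first()
print(df1)
```

Dates whose values are all NaN (2022-05-01 and 2022-05-05) disappear from the result, since none of their rows survive dropna.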
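If you also need the dates that have no valid value:
d = df['datetime'].dt.date
# backfill within each date, then keep the first row of each date
df = df.groupby(d).bfill().set_index(d).loc[lambda x: ~x.index.duplicated()]
print(df)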
datetime value
datetime
2022-05-01 2022-05-01 10:00:00 NaN
2022-05-02 2022-05-02 06:00:00 -0.61
2022-05-03 2022-05-03 02:00:00 0.55
2022-05-04 2022-05-04 08:00:00 0.15
2022-05-05 2022-05-05 04:00:00 NaN
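The bfill version keeps every date, but as its output shows, the datetime column still holds the day's first timestamp (2022-05-02 06:00:00 next to -0.61) rather than the timestamp of the value. A minimal sketch that keeps all dates and the matching timestamp, assuming the frame is already sorted by datetime: per date, idxmax on a 0/1 "has a value" flag returns the label of the first valid row, or the day's first row when the day has no valid value. The names has_value, first_rows and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.date_range('2022-05-01 10:00:00', periods=10, freq='10h'),
    'value': [np.nan, np.nan, np.nan, -0.61, np.nan, 0.55, 0.63, np.nan, 0.15, np.nan],
})

d = df['datetime'].dt.date

# 1 where value is present, 0 where it is NaN.
has_value = df['value'].notna().astype(int)

# idxmax returns the first index label holding the group maximum:
# the first valid row if the day has one, otherwise the day's first row.
first_rows = has_value.groupby(d).idxmax()

# Pick those rows and index them by date, like the other answers.
out = df.loc[first_rows.to_numpy()].set_index(first_rows.index)
print(out)
```

Because idxmax works on row order, an unsorted frame should be sorted with df.sort_values('datetime') first.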
Another way: drop the NaN rows with a boolean mask, then group by the date extracted from datetime:
df[df['value'].notna()].groupby(df['datetime'].dt.date).first()
datetime value
datetime
2022-05-02 2022-05-02 16:00:00 -0.61
2022-05-03 2022-05-03 12:00:00 0.55
2022-05-04 2022-05-04 18:00:00 0.15