pandas: fill missing time intervals as given in a DataFrame

Question

My DataFrame looks like this:

gap_id	species	time_start	time_stop
1	wheat	2021-11-22 00:01:00	2021-11-22 00:03:00
2	fescue	2021-12-18 05:52:00	2021-12-18 05:53:00

I want to expand the DataFrame so that, for each gap_id, I get as many rows as there are minutes between time_start and time_stop (both endpoints included):

gap_id	species	time
1	wheat	2021-11-22 00:01:00
1	wheat	2021-11-22 00:02:00
1	wheat	2021-11-22 00:03:00
2	fescue	2021-12-18 05:52:00
2	fescue	2021-12-18 05:53:00

I tried the pd.date_range method, but I don't know how to combine it with a groupby on gap_id.

Thanks in advance.

Answer 1

If the DataFrame is small and performance is not important, generate a date_range for each row and then use DataFrame.explode:

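import pandas as pd

# One DatetimeIndex per row at minute frequency ('min'; the older 'T' alias is deprecated in pandas 2.2+)
df['time'] = df.apply(lambda x: pd.date_range(x['time_start'], x['time_stop'], freq='min'), axis=1)
# Drop the interval columns and turn each element of 'time' into its own row
df = df.drop(['time_start','time_stop'], axis=1).explode('time')

print(df)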
   gap_id species                time
0       1   wheat 2021-11-22 00:01:00
0       1   wheat 2021-11-22 00:02:00
0       1   wheat 2021-11-22 00:03:00
1       2  fescue 2021-12-18 05:52:00
1       2  fescue 2021-12-18 05:53:00
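
As a self-contained sketch of the explode approach, here it is applied to the sample data from the question. The helper name expand_by_explode is made up for illustration. Depending on the pandas version, explode may leave the time column with object dtype, so the sketch converts it back with pd.to_datetime and also resets the repeated index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "gap_id": [1, 2],
    "species": ["wheat", "fescue"],
    "time_start": pd.to_datetime(["2021-11-22 00:01:00", "2021-12-18 05:52:00"]),
    "time_stop": pd.to_datetime(["2021-11-22 00:03:00", "2021-12-18 05:53:00"]),
})


def expand_by_explode(frame):
    """Hypothetical helper: one row per minute from time_start to time_stop, inclusive."""
    out = frame.copy()
    # Build one minute-frequency DatetimeIndex per row.
    out["time"] = out.apply(
        lambda row: pd.date_range(row["time_start"], row["time_stop"], freq="min"),
        axis=1,
    )
    # Drop the interval columns, then give every timestamp its own row.
    out = out.drop(columns=["time_start", "time_stop"]).explode("time")
    # Assumption: the exploded column may be object dtype, so convert it explicitly.
    out["time"] = pd.to_datetime(out["time"])
    return out.reset_index(drop=True)


expanded = expand_by_explode(df)
print(expanded)
```

Because apply with axis=1 calls Python code once per row, this is only reasonable for small inputs, as noted above.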

For a large DataFrame, first repeat the index by the difference between the stop and start columns in minutes (plus one), then add a counter from GroupBy.cumcount, converted to timedeltas with to_timedelta:

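# Make sure both columns are real datetimes before doing arithmetic on them
df['time_start'] = pd.to_datetime(df['time_start'])
df['time_stop'] = pd.to_datetime(df['time_stop'])

# Rows needed per gap: whole minutes between start and stop, plus one for the start itself
n_rows = (df['time_stop'].sub(df['time_start']).dt.total_seconds() // 60 + 1).astype(int)

# Repeat each row n_rows times; time_start becomes the base of the new time column
df = (df.loc[df.index.repeat(n_rows)]
        .drop('time_stop', axis=1)
        .rename(columns={'time_start':'time'}))

# Counter 0, 1, 2, ... within each original row, converted to minute offsets
td = pd.to_timedelta(df.groupby(level=0).cumcount(), unit='min')

df['time'] += td
df = df.reset_index(drop=True)
print(df)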
   gap_id species                time
0       1   wheat 2021-11-22 00:01:00
1       1   wheat 2021-11-22 00:02:00
2       1   wheat 2021-11-22 00:03:00
3       2  fescue 2021-12-18 05:52:00
4       2  fescue 2021-12-18 05:53:00
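
To make the repeat-and-count mechanism easier to follow, the sketch below prints the intermediate values for the sample data from the question. The variable names (n_rows, repeated, offsets, times) are chosen here for illustration, and the frame is assumed to have a unique index such as the default RangeIndex.

```python
import pandas as pd

# Sample data copied from the question (species omitted for brevity).
df = pd.DataFrame({
    "gap_id": [1, 2],
    "time_start": pd.to_datetime(["2021-11-22 00:01:00", "2021-12-18 05:52:00"]),
    "time_stop": pd.to_datetime(["2021-11-22 00:03:00", "2021-12-18 05:53:00"]),
})

# Rows needed per gap: whole minutes between the endpoints, plus one for the start.
n_rows = (df["time_stop"].sub(df["time_start"]).dt.total_seconds() // 60 + 1).astype(int)
print(n_rows.tolist())      # [3, 2]

# Each original index label, repeated once per output row.
repeated = df.index.repeat(n_rows)
print(repeated.tolist())    # [0, 0, 0, 1, 1]

# Position of each repeated row within its original row, i.e. the minute offset.
offsets = df.loc[repeated].groupby(level=0).cumcount()
print(offsets.tolist())     # [0, 1, 2, 0, 1]

# Start time of the gap plus the minute offset gives the final timestamps.
times = df.loc[repeated, "time_start"] + pd.to_timedelta(offsets, unit="min")
print(times.reset_index(drop=True))
```

One caveat, stated as an assumption about the input: a row whose time_stop is more than a minute earlier than its time_start would yield a negative repeat count, which repeat is expected to reject, so such rows would need to be filtered out first.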
