How to extract missing datetime intervals in Python

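I have a DataFrame with a date column whose timestamps are 15 minutes apart. I want to find the missing datetime intervals. For each missing row, the id should be copied from the previous row, but the value should be NaN.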

   date                 value     id
   2021-12-02 07:00:00  12456677  693214
   2021-01-02 07:30:00  12456677  693214
   2021-01-02 07:45:00  12456677  693214
   2021-01-02 08:00:00  12456677  693214
   2021-01-02 08:15:00  12456665  693215
   2021-01-02 08:45:00  12456665  693215
   2021-01-03 09:00:00  12456666  693217
   2021-01-03 10:30:00  12456666  693217

The expected output is:

   date                 value  id
   2021-01-02 08:30:00  NaN    693215
   2021-01-02 09:15:00  NaN    693217
   2021-01-03 09:30:00  NaN    693217
   2021-01-03 09:45:00  NaN    693217
   2021-01-03 10:00:00  NaN    693217

I am trying:

df['Datetime'] = pd.to_datetime(df['date'])
df[ df['Datetime'].diff() > pd.Timedelta('15min') ]

But this only returns the rows that come right after a gap, not the missing datetimes themselves. It gives me this output:

   date                 value     id
   2021-01-02 08:15:00  12456665  693215
   2021-01-03 09:00:00  12456666  693217
   2021-01-03 10:30:00  12456666  693217

Can someone guide me on how to extract the missing dates and times? Thanks in advance.

Use Series.asfreq per group to get the missing intervals:

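import pandas as pd

# Convert the date column to datetime and use it as a DatetimeIndex
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Within each (day, id) group, reindex 'value' to a 15-minute frequency;
# slots that were absent in the original data become NaN
df1 = (df.groupby([pd.Grouper(freq='D'), 'id'])['value']
         .apply(lambda x: x.asfreq('15min'))
         .reset_index(level=0, drop=True)
         .reset_index())
print(df1)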
        id                date       value
0   693214 2021-01-02 07:30:00  12456677.0
1   693214 2021-01-02 07:45:00  12456677.0
2   693214 2021-01-02 08:00:00  12456677.0
3   693215 2021-01-02 08:15:00  12456665.0
4   693215 2021-01-02 08:30:00         NaN
5   693215 2021-01-02 08:45:00  12456665.0
6   693217 2021-01-03 09:00:00  12456666.0
7   693217 2021-01-03 09:15:00         NaN
8   693217 2021-01-03 09:30:00         NaN
9   693217 2021-01-03 09:45:00         NaN
10  693217 2021-01-03 10:00:00         NaN
11  693217 2021-01-03 10:15:00         NaN
12  693217 2021-01-03 10:30:00  12456666.0
13  693214 2021-12-02 07:00:00  12456677.0
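
To see what asfreq does inside each group, here is a minimal sketch on a single Series. The two sample values and the variable names (s, filled, missing) are made up for illustration and are not part of the question's data:

```python
import pandas as pd

# Hypothetical single-group Series: the 08:30 slot is absent.
s = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2021-01-02 08:15:00", "2021-01-02 08:45:00"]),
)

# asfreq reindexes from the first to the last timestamp at the given
# frequency; slots that did not exist before are filled with NaN.
filled = s.asfreq("15min")

# The timestamps of the newly created NaN rows are the missing intervals.
missing = filled[filled.isna()].index
print(missing)
```

Note that asfreq only fills between the first and last timestamp of each group, so a gap before the first row or after the last row of a group is not reported.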

Then keep only the rows with missing values by boolean indexing:

# Keep only the rows that asfreq added (their value is NaN)
df2 = df1[df1['value'].isna()]
print(df2)
        id                date  value
4   693215 2021-01-02 08:30:00    NaN
7   693217 2021-01-03 09:15:00    NaN
8   693217 2021-01-03 09:30:00    NaN
9   693217 2021-01-03 09:45:00    NaN
10  693217 2021-01-03 10:00:00    NaN
11  693217 2021-01-03 10:15:00    NaN
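
If you prefer not to go through groupby.apply, a possible alternative is to build the expected 15-minute range per (day, id) group with pd.date_range and take the difference against the timestamps that exist. This is only a sketch under the same assumptions as above (gaps are searched per day and per id, between the first and last row of each group). The function name find_missing_intervals is hypothetical, and the sample frame is rebuilt from the question's data:

```python
import pandas as pd

def find_missing_intervals(df, freq="15min"):
    """Return one row per missing timestamp, with value set to NaN.

    Hypothetical helper: gaps are looked up per (calendar day, id) group,
    between the first and last timestamp of that group.
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    rows = []
    # Group by calendar day (date truncated to midnight) and by id.
    for (_, id_), grp in df.groupby([df["date"].dt.normalize(), "id"]):
        expected = pd.date_range(grp["date"].min(), grp["date"].max(), freq=freq)
        # Timestamps that should exist but are not in the data.
        for ts in expected.difference(pd.DatetimeIndex(grp["date"])):
            rows.append({"date": ts, "value": float("nan"), "id": id_})
    return pd.DataFrame(rows, columns=["date", "value", "id"])

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-12-02 07:00:00", "2021-01-02 07:30:00",
             "2021-01-02 07:45:00", "2021-01-02 08:00:00",
             "2021-01-02 08:15:00", "2021-01-02 08:45:00",
             "2021-01-03 09:00:00", "2021-01-03 10:30:00"],
    "value": [12456677, 12456677, 12456677, 12456677,
              12456665, 12456665, 12456666, 12456666],
    "id": [693214, 693214, 693214, 693214,
           693215, 693215, 693217, 693217],
})

result = find_missing_intervals(df)
print(result)
```

Unlike the isna() filter, this compares timestamps directly, so a row that already had NaN in value in the original data would not be mistaken for a missing interval.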