How to extract missing datetime intervals in Python
I have a DataFrame of dates at 15-minute intervals. I want to find the missing datetime intervals. The id should be copied from the previous row, but the value should be NaN.
date value id
2021-12-02 07:00:00 12456677 693214
2021-01-02 07:30:00 12456677 693214
2021-01-02 07:45:00 12456677 693214
2021-01-02 08:00:00 12456677 693214
2021-01-02 08:15:00 12456665 693215
2021-01-02 08:45:00 12456665 693215
2021-01-03 09:00:00 12456666 693217
2021-01-03 10:30:00 12456666 693217
The expected output is:
date value id
2021-01-02 08:30:00 NAN 693215
2021-01-02 09:15:00 NAN 693217
2021-01-03 09:30:00 NAN 693217
2021-01-03 09:45:00 NAN 693217
2021-01-03 10:00:00 NAN 693217
I am trying:
df['Datetime'] = pd.to_datetime(df['date'])
df[ df['Datetime'].diff() > pd.Timedelta('15min') ]
But it only gives the rows that come right after a gap, not the missing datetimes themselves. It shows me this output:
date value id
2021-01-02 08:15:00 12456665 693215
2021-01-03 09:00:00 12456666 693217
2021-01-03 10:30:00 12456666 693217
Can someone guide me on how to extract the missing dates and times?
Thanks in advance.
Use Series.asfreq per group to get the missing intervals:
import pandas as pd

# create a DatetimeIndex
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
# reindex each (day, id) group to a 15-minute frequency
df1 = (df.groupby([pd.Grouper(freq='D'), 'id'])['value']
         .apply(lambda x: x.asfreq('15min'))
         .reset_index(level=0, drop=True)
         .reset_index())
print(df1)
id date value
0 693214 2021-01-02 07:30:00 12456677.0
1 693214 2021-01-02 07:45:00 12456677.0
2 693214 2021-01-02 08:00:00 12456677.0
3 693215 2021-01-02 08:15:00 12456665.0
4 693215 2021-01-02 08:30:00 NaN
5 693215 2021-01-02 08:45:00 12456665.0
6 693217 2021-01-03 09:00:00 12456666.0
7 693217 2021-01-03 09:15:00 NaN
8 693217 2021-01-03 09:30:00 NaN
9 693217 2021-01-03 09:45:00 NaN
10 693217 2021-01-03 10:00:00 NaN
11 693217 2021-01-03 10:15:00 NaN
12 693217 2021-01-03 10:30:00 12456666.0
13 693214 2021-12-02 07:00:00 12456677.0
Filter the missing values with boolean indexing:
df2 = df1[df1['value'].isna()]
print (df2)
id date value
4 693215 2021-01-02 08:30:00 NaN
7 693217 2021-01-03 09:15:00 NaN
8 693217 2021-01-03 09:30:00 NaN
9 693217 2021-01-03 09:45:00 NaN
10 693217 2021-01-03 10:00:00 NaN
11 693217 2021-01-03 10:15:00 NaN
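If you would rather build on the diff() attempt from the question, here is a rough sketch of one way to do it. It is not the asfreq approach above, just an alternative under a few assumptions: the sample frame is rebuilt by hand from the question, gaps are only looked for within the same calendar day and id (as in the grouping above), and missing_intervals is a hypothetical helper name, not a pandas function. diff() marks the row after each gap, so the sketch fills in the timestamps between that row and the previous one with pd.date_range.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: these are all the rows).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-12-02 07:00:00", "2021-01-02 07:30:00", "2021-01-02 07:45:00",
        "2021-01-02 08:00:00", "2021-01-02 08:15:00", "2021-01-02 08:45:00",
        "2021-01-03 09:00:00", "2021-01-03 10:30:00",
    ]),
    "value": [12456677, 12456677, 12456677, 12456677,
              12456665, 12456665, 12456666, 12456666],
    "id": [693214, 693214, 693214, 693214, 693215, 693215, 693217, 693217],
})


def missing_intervals(frame, freq="15min"):
    """Hypothetical helper: one row per missing timestamp in each (day, id) group."""
    step = pd.Timedelta(freq)
    ordered = frame.sort_values("date")
    day = ordered["date"].dt.normalize()  # calendar day of each row
    rows = []
    for (_, id_), group in ordered.groupby([day, "id"]):
        prev = group["date"].shift()          # timestamp of the previous row
        gaps = group["date"].diff() > step    # True on the row right after a gap
        for start, end in zip(prev[gaps], group["date"][gaps]):
            # Timestamps strictly between the two observed rows.
            for ts in pd.date_range(start + step, end - step, freq=freq):
                rows.append({"date": ts, "value": float("nan"), "id": id_})
    return pd.DataFrame(rows, columns=["date", "value", "id"])


missing = missing_intervals(df)
print(missing)
```

On the sample data this should give the same six timestamps as df2 above. Unlike asfreq, it never materializes the full 15-minute grid, which may matter if the frame is large and gaps are rare.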