Apply a DatetimeIndex as a filter on a DataFrame with datetime values

Question

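OK, I am just learning to work with DatetimeIndex and DataFrame objects. I have run into a new problem that I cannot see a direct solution to, and I hope someone can find an elegant one using pandas functions I may not know about yet.

The situation is as follows. On the one hand, I have a very large DataFrame with many rows and a few columns, including a column named starttime whose values are timestamps. Two or more rows may share the same starttime value.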

                  starttime                 endtime  ...          y          x
id                                                   ...                      
0    2015-10-11 00:00:55+00  2015-10-11 00:00:55+00  ...          1      other
1    2015-10-11 15:10:42+00  2015-10-11 15:10:42+00  ...          1      other
2    2014-10-21 10:25:44+00  2014-10-21 10:25:44+00  ...          1      other
3    2014-10-21 10:27:28+00  2014-10-21 10:27:28+00  ...          1      other
4    2014-10-21 10:30:27+00  2014-10-21 10:30:27+00  ...          1      other
..                      ...                     ...  ...        ...        ...

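On the other hand, I have a DatetimeIndex object containing many distinct dates. It is important to know that these dates do not form a complete range between some dates A and B, so there are definitely "holes" in between, and I cannot simply apply date_range.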

DatetimeIndex(['2014-12-12', '2014-12-15', '2014-12-16', '2014-12-17',
               '2014-12-18', '2014-12-19', '2014-12-20', '2014-12-21',
               '2015-03-02', '2015-03-03',
               ...],
              dtype='datetime64[ns]', length=xyz, freq=None)

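Here is the problem: what I need now is to drop every row of the DataFrame whose starttime value is not represented by a date in the DatetimeIndex. The time of day (h:m:s) does not matter, so if I have the date "2014-12-12" and two rows "2014-12-12 00:00:55+00" and "2014-12-12 15:10:42+00", both should be included. The resulting trimmed DataFrame should also keep all the columns it had before.

My first, iterative approach would be to take the dates of the DatetimeIndex one after another and loop over all rows of the DataFrame, copying a row into a new frame if it falls on the same day. I think there must be a better way, though, because I would obviously run into serious performance problems if the DataFrame has too many rows.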

Answer 1

Remove the times with Series.dt.floor, compare with Series.isin, and filter by boolean indexing:

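# some sample dates to match against
idx = pd.DatetimeIndex(['2015-03-02', '2015-10-11'])

# parse the strings into timezone-aware timestamps
df['starttime'] = pd.to_datetime(df['starttime'])

# floor to midnight, test membership in idx, and keep the matching rows
df1 = df[df['starttime'].dt.floor('D').isin(idx)]
print(df1)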
   id                 starttime                 endtime  y      x
0   0 2015-10-11 00:00:55+00:00  2015-10-11 00:00:55+00  1  other
1   1 2015-10-11 15:10:42+00:00  2015-10-11 15:10:42+00  1  other
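
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is hypothetical data shaped like the question's (only three columns, offsets written as +00:00), not the asker's real data. One assumption worth flagging: the parsed starttime column is timezone-aware while idx is timezone-naive, and depending on the pandas version isin may not match anything across that mismatch, so the sketch drops the timezone after flooring so that both sides are naive.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame(
    {
        "starttime": [
            "2015-10-11 00:00:55+00:00",
            "2015-10-11 15:10:42+00:00",
            "2014-10-21 10:25:44+00:00",
            "2014-10-21 10:27:28+00:00",
            "2014-10-21 10:30:27+00:00",
        ],
        "y": 1,
        "x": "other",
    }
)
# Parse to timezone-aware (UTC) timestamps.
df["starttime"] = pd.to_datetime(df["starttime"], utc=True)

# Dates to keep; timezone-naive, with gaps between them.
idx = pd.DatetimeIndex(["2015-03-02", "2015-10-11"])

# Floor to midnight (UTC), then drop the timezone so both sides are naive.
days = df["starttime"].dt.floor("D").dt.tz_localize(None)

# Boolean mask: True where the row's day appears in idx.
mask = days.isin(idx)

# Boolean indexing keeps the matching rows with all their columns.
df1 = df[mask]
print(df1)
```

If the dates in idx are meant in a timezone other than UTC, converting starttime with dt.tz_convert before flooring would be the place to account for that; this sketch assumes UTC days.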

Details:

print (df['starttime'].dt.floor('D'))
0   2015-10-11 00:00:00+00:00
1   2015-10-11 00:00:00+00:00
2   2014-10-21 00:00:00+00:00
3   2014-10-21 00:00:00+00:00
4   2014-10-21 00:00:00+00:00
Name: starttime, dtype: datetime64[ns, UTC]

print (df['starttime'].dt.floor('D').isin(idx))
0     True
1     True
2    False
3    False
4    False
Name: starttime, dtype: bool
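
As a small variation on the steps above, Series.dt.normalize also resets the time of day to midnight, and negating the mask with ~ gives the rows that would be dropped, which can be handy for checking the filter. The values below are made up for illustration and reuse the dates from the question's example; as before, the timezone is removed on the assumption that the dates in idx are UTC days.

```python
import pandas as pd

# Hypothetical timestamps: two on 2014-12-12, one on a day not in idx.
starttime = pd.to_datetime(
    pd.Series(
        [
            "2014-12-12 00:00:55+00:00",
            "2014-12-12 15:10:42+00:00",
            "2014-12-13 08:00:00+00:00",
        ]
    ),
    utc=True,
)
idx = pd.DatetimeIndex(["2014-12-12", "2014-12-15"])

# normalize() sets the time to midnight; tz_localize(None) makes it naive.
days = starttime.dt.normalize().dt.tz_localize(None)

mask = days.isin(idx)
kept = starttime[mask]      # rows whose day is in idx
dropped = starttime[~mask]  # rows whose day is not in idx

print(kept)
print(dropped)
```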
