How do I remove weekends and holidays from time series data

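Thanks for looking at my question. I'm trying to remove weekends and holidays from FX market time series data. I've already used pd.bdate_range, but I'm not sure how to apply it to PrimaryBook. Let me know if you need more information.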

Thanks for the help.

import pandas as pd
from datetime import datetime, timedelta

# Current time, truncated to the minute
today = datetime.today().replace(second=0, microsecond=0)
st = today - timedelta(days=14)
et = today
# Remove weekends
br = pd.bdate_range(st, et)
# Remove holidays

# Only keep times between 7am and 5pm

# _get_tsdb_primary_prices is not shown in the question
PrimaryBook = _get_tsdb_primary_prices("audusd", st, et).ffill()
# Drop columns that are entirely empty, then any remaining incomplete rows
PrimaryBook = PrimaryBook.dropna(axis=1, how='all')
PrimaryBook = PrimaryBook.dropna()
# Split the book into bid-side and ask-side columns
PrimaryBookB = PrimaryBook.filter(regex=r'(BID|BSIZ)')
PrimaryBookA = PrimaryBook.filter(regex=r'(ASK|ASIZ)')
PrimaryBookA

This is the result, but I want to remove weekends and holidays and restrict the time range to 7am to 5pm:

                                     BEST_ASK1  BEST_ASIZ1  BEST_ASK2  BEST_ASIZ2  BEST_ASK3  BEST_ASIZ3  BEST_ASK4  BEST_ASIZ4  BEST_ASK5  BEST_ASIZ5
Time
2021-07-22 08:41:36.625573856+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.630647614+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.635475238+00:00    0.73725   1000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.640455282+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   4000000.0
2021-07-22 08:41:36.660516225+00:00    0.73725   2000000.0    0.73730   6000000.0    0.73735   4000000.0    0.73740   5000000.0    0.73745   5000000.0
...                                        ...         ...        ...         ...        ...         ...        ...         ...        ...         ...
2021-08-05 08:41:29.025629378+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   9000000.0
2021-08-05 08:41:29.450549198+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:30.346124376+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:31.586653810+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0
2021-08-05 08:41:31.840526198+00:00    0.73990   6000000.0    0.73995   4000000.0    0.74000   5000000.0    0.74005   5000000.0    0.74010   7000000.0

Thanks a lot for your help.

Since holidays differ by country and by year, you'll need a package for this.

I'd recommend the holidays package:

import holidays

for day in holidays.UnitedStates(years=2021).items():
  print(day)

This prints every holiday in that year as a (datetime.date, name) tuple:

(datetime.date(2021, 1, 1), "New Year's Day")
(datetime.date(2021, 12, 31), "New Year's Day (Observed)")
(datetime.date(2021, 1, 18), 'Martin Luther King Jr. Day')
(datetime.date(2021, 2, 15), "Washington's Birthday")
...
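Since the question already builds br with pd.bdate_range, the holiday dates can also be passed straight to it. A minimal sketch, assuming hard-coded holiday dates (taken from the output above) in place of the holidays package, and a small made-up tick frame standing in for PrimaryBook: with freq="C" (custom business day), pd.bdate_range accepts a holidays list and returns only weekdays that are not holidays, and a tick is kept if its calendar date falls in that range. The names trading_days and ticks are illustrative only.

```python
import pandas as pd

# Holiday dates from the output above, hard-coded so the sketch runs on its own;
# with the holidays package this could be list(holidays.UnitedStates(years=2021)).
holiday_dates = ["2021-01-01", "2021-01-18", "2021-02-15", "2021-12-31"]

# freq="C" (custom business day) is required when passing holidays; the result
# skips Saturdays, Sundays and the listed dates.
trading_days = pd.bdate_range("2021-12-27", "2022-01-07", freq="C", holidays=holiday_dates)

# Made-up ticks with a UTC DatetimeIndex, standing in for PrimaryBook
idx = pd.to_datetime([
    "2021-12-30 08:41:36",  # Thursday
    "2021-12-31 08:41:36",  # observed New Year's Day
    "2022-01-01 08:41:36",  # Saturday
    "2022-01-03 08:41:36",  # Monday
], utc=True)
ticks = pd.DataFrame({"BEST_ASK1": [0.73725, 0.73730, 0.73735, 0.73740]}, index=idx)

# bdate_range returns naive midnight timestamps, so strip the time of day and
# the timezone from the tick index before comparing calendar dates.
tick_days = ticks.index.normalize().tz_localize(None)
filtered = ticks[tick_days.isin(trading_days)]
print(filtered)
```

The comparison is done on calendar dates in UTC here; if the trading day should follow a different timezone, the index would need a tz_convert before normalize().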

Next, convert your timestamps to pandas datetimes so that their dates can be compared with the holiday dates:

import pandas as pd

df = pd.DataFrame([{"id":1, "day":"2021-07-22 08:41:36.625573856+00:00"}, {"id":1, "day":"2021-12-31 08:41:36.625573856+00:00"}])

df.day = pd.to_datetime(df.day)

After that, it's easy to check whether each day is in the holiday list:

us_holidays = holidays.UnitedStates(years=2021)
df["isholiday"] = df["day"].dt.date.apply(lambda d: d in us_holidays)

df
    id  day                                 isholiday
0   1   2021-07-22 08:41:36.625573856+00:00 False
1   1   2021-12-31 08:41:36.625573856+00:00 True

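The same works for weekends: check whether the dt.dayofweek attribute is in [5, 6] (days are zero-indexed starting from Monday, so 5 and 6 are Saturday and Sunday).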
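Putting the two checks together on the example frame: a minimal sketch that keeps only rows that fall neither on a weekend nor on a holiday. The third row (a Saturday) and the small set standing in for holidays.UnitedStates(years=2021) are assumptions added for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame([
    {"id": 1, "day": "2021-07-22 08:41:36.625573856+00:00"},  # Thursday
    {"id": 1, "day": "2021-12-31 08:41:36.625573856+00:00"},  # observed holiday (Friday)
    {"id": 1, "day": "2021-07-24 08:41:36.625573856+00:00"},  # Saturday, added for illustration
])
df["day"] = pd.to_datetime(df["day"])

# Stand-in for holidays.UnitedStates(years=2021); both support `date in ...`
us_holidays = {datetime.date(2021, 1, 1), datetime.date(2021, 12, 31)}

is_weekend = df["day"].dt.dayofweek.isin([5, 6])  # Monday=0 ... Sunday=6
is_holiday = df["day"].dt.date.apply(lambda d: d in us_holidays)

# Keep rows that are neither weekends nor holidays
business_df = df[~is_weekend & ~is_holiday]
print(business_df)
```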

I reset the index and then removed the weekends with dt.dayofweek < 5.
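The same filters can also be applied to the DatetimeIndex directly, without resetting it, and DataFrame.between_time covers the 7am to 5pm requirement from the question. This is a sketch under assumptions: filter_trading_hours is a hypothetical helper, the made-up book stands in for PrimaryBook, and because the question does not say which timezone "7am to 5pm" refers to, the optional tz argument is an assumption (None keeps the UTC index as it is).

```python
import datetime

import numpy as np
import pandas as pd


def filter_trading_hours(book, holiday_dates, start="07:00", end="17:00", tz=None):
    """Hypothetical helper: keep weekday, non-holiday rows between start and end.

    book: DataFrame with a tz-aware DatetimeIndex (like PrimaryBook).
    holiday_dates: iterable of datetime.date, e.g. holidays.UnitedStates(years=2021).
    tz: optional timezone to convert to first (an assumption; None keeps UTC).
    """
    if tz is not None:
        book = book.tz_convert(tz)
    holiday_set = set(holiday_dates)
    idx = book.index
    # Monday=0 ... Friday=4, so < 5 drops Saturdays and Sundays
    is_weekday = np.asarray(idx.dayofweek < 5)
    # idx.date gives each timestamp's calendar date in the index's timezone
    is_holiday = np.array([d in holiday_set for d in idx.date], dtype=bool)
    book = book[is_weekday & ~is_holiday]
    # Keep only ticks whose time of day lies within [start, end]
    return book.between_time(start, end)


# Made-up ticks standing in for PrimaryBook
idx = pd.to_datetime([
    "2021-07-22 06:30:00",  # Thursday, before 7am
    "2021-07-22 08:41:36",  # Thursday, kept
    "2021-07-22 17:30:00",  # Thursday, after 5pm
    "2021-07-24 09:00:00",  # Saturday
    "2021-12-30 16:59:00",  # Thursday, kept
    "2021-12-31 09:00:00",  # observed New Year's Day
], utc=True)
book = pd.DataFrame({"BEST_ASK1": [0.73720, 0.73725, 0.73730, 0.73735, 0.73740, 0.73745]}, index=idx)
us_holidays = [datetime.date(2021, 1, 1), datetime.date(2021, 12, 31)]

result = filter_trading_hours(book, us_holidays)
print(result)
```

between_time includes both endpoints by default, so a tick stamped exactly 17:00 would be kept.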