pandas resampling with user defined time intervals

I have an OHLC DataFrame at 30-minute intervals.

2017-04-30 11:00:00-04:00  239.06  239.39  239.04  239.33      28
2017-04-30 11:30:00-04:00  239.01  239.22  238.91  239.03      28
2017-04-30 12:00:00-04:00  239.02  239.28  238.99  239.03      29
2017-04-30 12:30:00-04:00  238.94  239.08  238.84  239.03      28
2017-04-30 13:00:00-04:00  239.01  239.11  238.93  238.94      27
2017-04-30 13:30:00-04:00  238.94  239.08  238.86  239.03      12

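I want to resample the data into hourly bars, but is there a way to define the hourly bars so that they end on the half hour, for example 9:30-10:30 instead of 9:00-10:00?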

import pandas as pd
df = {'A': {'2017-04-30 15:00:00': 239.06,
  '2017-04-30 15:30:00': 239.00999999999999,
  '2017-04-30 16:00:00': 239.02000000000001,
  '2017-04-30 16:30:00': 238.94,
  '2017-04-30 17:00:00': 239.00999999999999,
  '2017-04-30 17:30:00': 238.94},
 'B': {'2017-04-30 15:00:00': 239.38999999999999,
  '2017-04-30 15:30:00': 239.22,
  '2017-04-30 16:00:00': 239.28,
  '2017-04-30 16:30:00': 239.08000000000001,
  '2017-04-30 17:00:00': 239.11000000000001,
  '2017-04-30 17:30:00': 239.08000000000001},
 'C': {'2017-04-30 15:00:00': 239.03999999999999,
  '2017-04-30 15:30:00': 238.91,
  '2017-04-30 16:00:00': 238.99000000000001,
  '2017-04-30 16:30:00': 238.84,
  '2017-04-30 17:00:00': 238.93000000000001,
  '2017-04-30 17:30:00': 238.86000000000001},
 'D': {'2017-04-30 15:00:00': 239.33000000000001,
  '2017-04-30 15:30:00': 239.03,
  '2017-04-30 16:00:00': 239.03,
  '2017-04-30 16:30:00': 239.03,
  '2017-04-30 17:00:00': 238.94,
  '2017-04-30 17:30:00': 239.03},
 'E': {'2017-04-30 15:00:00': 28,
  '2017-04-30 15:30:00': 28,
  '2017-04-30 16:00:00': 29,
  '2017-04-30 16:30:00': 28,
  '2017-04-30 17:00:00': 27,
  '2017-04-30 17:30:00': 12}}

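df = pd.DataFrame(df)  # the literal above is a plain dict, so build the DataFrame first
df.index = pd.to_datetime(df.index)  # string keys -> DatetimeIndex, required by resample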
                          A       B       C       D   E
2017-04-30 15:00:00  239.06  239.39  239.04  239.33  28
2017-04-30 15:30:00  239.01  239.22  238.91  239.03  28
2017-04-30 16:00:00  239.02  239.28  238.99  239.03  29
2017-04-30 16:30:00  238.94  239.08  238.84  239.03  28
2017-04-30 17:00:00  239.01  239.11  238.93  238.94  27
2017-04-30 17:30:00  238.94  239.08  238.86  239.03  12

# If your data is strictly half-hourly, you can get the hourly data ending every 30 mins as below:
df.resample('1H').last()
                          A       B       C       D   E
2017-04-30 15:00:00  239.01  239.22  238.91  239.03  28
2017-04-30 16:00:00  238.94  239.08  238.84  239.03  28
2017-04-30 17:00:00  238.94  239.08  238.86  239.03  12

df = pd.DataFrame([['2017-04-30 11:00:00-04:00', '239.06', '239.39', '239.04', '239.33', '28'],
                   ['2017-04-30 11:30:00-04:00', '239.01', '239.22', '238.91', '239.03', '28'],
                   ['2017-04-30 12:00:00-04:00', '239.02', '239.28', '238.99', '239.03', '29'],
                   ['2017-04-30 12:30:00-04:00', '238.94', '239.08', '238.84', '239.03', '28'],
                   ['2017-04-30 13:00:00-04:00', '239.01', '239.11', '238.93', '238.94', '27'],
                   ['2017-04-30 13:30:00-04:00', '238.94', '239.08', '238.86', '239.03', '12']], 
                  columns=['Time', 'O', 'H', 'L', 'C', 'V'])

df.Time = pd.to_datetime(df.Time)

df.loc[df.Time.dt.minute==30] # choose minutes equal to 30
df.loc[df.Time.dt.minute==0] # choose minutes equal to 0

df.set_index('Time', inplace=True)

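# Note: `base` was deprecated in pandas 1.1 and removed in 2.0; newer versions take `offset` (e.g. offset='30min') instead.
df.resample('1H', base=0).last() # base=0: hourly bins start on the hour
df.resample('1H', base=0.5).last() # base=0.5: hourly bins start on the half hour (30 mins)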

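To resample at an offset into the sampling period, use the base parameter of resample. (base was deprecated in pandas 1.1 and removed in 2.0; the offset parameter replaces it.)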

base : int, default 0

For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals. For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0

Code:

df = df.resample('1H', base=0.5).last()
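
On pandas 1.1 or later, the same half-hour shift can be written with the offset parameter. The following is a minimal sketch rather than part of the original answer: the single column C and the timezone-free index are assumptions made to keep it short, and the frequency is spelled '60min' only to sidestep the differing hour aliases across pandas versions.

```python
import pandas as pd

# Six half-hourly closes mirroring the question's sample (timezone dropped for brevity).
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
df = pd.DataFrame({"C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03]}, index=idx)

# offset="30min" moves the hourly bin edges from hh:00 to hh:30.
hourly = df.resample("60min", offset="30min").last()
print(hourly)
```

Each bin is labelled by its left edge, so the row stamped 11:30 covers 11:30 up to, but not including, 12:30.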

Test Code:

from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO(u"""
    Date                      O       H       L       C
    2017-04-30T11:00:00-0400  239.06  239.39  239.04  239.33
    2017-04-30T11:30:00-0400  239.01  239.22  238.91  239.03
    2017-04-30T12:00:00-0400  239.02  239.28  238.99  239.03
    2017-04-30T12:30:00-0400  238.94  239.08  238.84  239.03
    2017-04-30T13:00:00-0400  239.01  239.11  238.93  238.94
    2017-04-30T13:30:00-0400  238.94  239.08  238.86  239.03"""
), header=1)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

df = df.resample('1H', base=0.5).last()
print(df)

Results:

                          O       H       L       C
Date                                               
2017-04-30 14:30:00  239.06  239.39  239.04  239.33
2017-04-30 15:30:00  239.02  239.28  238.99  239.03
2017-04-30 16:30:00  239.01  239.11  238.93  238.94
2017-04-30 17:30:00  238.94  239.08  238.86  239.03
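
Note that .last() keeps only the final half-hour row of each bin, so the O, H and L values above are not the open, high and low of the full hour. If true hourly OHLC bars are wanted, each column probably needs its own aggregation rule. A minimal sketch, assuming columns named O, H, L, C, V as in the question and pandas 1.1 or later for the offset parameter (timezone dropped for brevity):

```python
import pandas as pd

# Half-hourly OHLCV rows copied from the question.
idx = pd.date_range("2017-04-30 11:00", periods=6, freq="30min")
bars = pd.DataFrame(
    {
        "O": [239.06, 239.01, 239.02, 238.94, 239.01, 238.94],
        "H": [239.39, 239.22, 239.28, 239.08, 239.11, 239.08],
        "L": [239.04, 238.91, 238.99, 238.84, 238.93, 238.86],
        "C": [239.33, 239.03, 239.03, 239.03, 238.94, 239.03],
        "V": [28, 28, 29, 28, 27, 12],
    },
    index=idx,
)

# One rule per column: first open, highest high, lowest low, last close, summed volume.
rules = {"O": "first", "H": "max", "L": "min", "C": "last", "V": "sum"}
hourly = bars.resample("60min", offset="30min").agg(rules)
print(hourly)
```

The first and last bins each contain a single half-hour row here, because the sample starts at 11:00 and ends at 13:30 while the bins run from half hour to half hour.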