Python Pandas Time Series Manipulation

Question

I have a pandas DataFrame with the following structure:

                     Date     Open     High      Low    Close  Volume
0     2003-10-01 00:00:00  1.16500  1.16700  1.16400  1.16690    1125
1     2003-10-01 01:00:00  1.16680  1.16790  1.16600  1.16720     933
............

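The time values are continuous because this is EUR/USD data. I want to resample it into a daily DataFrame in which Open is the Open value at XXXX-XX-XX 09:00:00 and Close is the Close value at XXXX-XX-XX 16:00:00. High and Low should be the highest High and the lowest Low between XXXX-XX-XX 09:00:00 and XXXX-XX-XX 16:00:00, and Volume should be the sum of the volumes over that same window. Is there a simple way to do this in pandas, and how?

Thanks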

Answer 1

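This is a two-step process. First, drop the rows that fall outside the daily hour range you care about; then resample what remains to a daily frequency.

Suppose this is our time series: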

import pandas as pd
import numpy as np
ts = pd.Series(np.random.random(72), index=pd.date_range('1/1/2011', periods=72, freq='H'))

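To filter by hour, we can build a boolean array that says whether each timestamp in the data falls within the hours of interest, then use it to index the time series: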

# keep the bars stamped 09:00 through 16:00, the window asked for in the question
ts_filtered = ts[ts.index.map(lambda time: 9 <= time.hour <= 16)]
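The same hour filter can also be written without a Python-level lambda by comparing the index's `hour` attribute directly. This is only a minimal sketch on made-up data: the series values and the 09:00 to 16:00 bounds are assumptions taken from the question, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Three days of hourly timestamps (synthetic data, for illustration only).
idx = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(72, dtype=float), index=idx)

# DatetimeIndex.hour gives the hour of every timestamp at once,
# so the boolean mask is built with vectorized comparisons.
hours = ts.index.hour
mask = (hours >= 9) & (hours <= 16)

ts_filtered = ts[mask]
```

Each day keeps eight bars (09:00 through 16:00 inclusive), so `ts_filtered` holds 24 of the original 72 rows.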

Then, to resample, simply use resample:

daily_stats = ts_filtered.resample('D').mean()

This gives us:

2011-01-01    0.507943
2011-01-02    0.416317
2011-01-03    0.573760
Freq: D, dtype: float64

Answer 2

Thanks, I also found this solution:

ohlc_dict = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum',
}

# The how= argument was removed from resample() in later pandas releases; chain .agg() instead.
df_filtered_daily = df_filtered.resample('D', closed='left', label='left').agg(ohlc_dict)

Answer 3

Only the times between 09:00:00 and 16:00:00.

between_time is a nice, simple way to get the times you need:

ts = ts.between_time('9:00','16:00')
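As a quick check of what `between_time` keeps, the sketch below runs it on one day of synthetic hourly data (the series itself is an assumption for illustration). By default both endpoints are included, so the 16:00 bar that the question needs for Close survives the filter.

```python
import numpy as np
import pandas as pd

# One day of hourly timestamps (synthetic data, for illustration only).
idx = pd.date_range("2011-01-01", periods=24, freq=pd.Timedelta(hours=1))
ts = pd.Series(np.arange(24), index=idx)

# between_time requires a DatetimeIndex and, by default,
# keeps rows at both the start and the end time.
window = ts.between_time("9:00", "16:00")

print(list(window.index.hour))
```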

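Resampling with the recommended syntax:

To avoid the FutureWarning that older pandas versions raise for the how= argument in your resample call, try this instead: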

ohlc_dict = {
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum',
}

# ts here must be the OHLCV DataFrame indexed by timestamp, not a single Series
dailyData = ts.resample('D').agg(ohlc_dict)
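Putting the pieces together for a frame shaped like the one in the question might look like the sketch below. The data is synthetic, and two steps are assumptions rather than part of the answers above: parsing the `Date` column into a `DatetimeIndex` first, and dropping days that have no bars at all (weekends, for instance), which `resample('D')` would otherwise emit as rows of NaN.

```python
import numpy as np
import pandas as pd

# Synthetic hourly bars for 2003-10-01 and 2003-10-03 (2003-10-02 is
# deliberately missing), with Date stored as strings like in the question.
stamps = pd.date_range("2003-10-01", periods=24, freq=pd.Timedelta(hours=1)).append(
    pd.date_range("2003-10-03", periods=24, freq=pd.Timedelta(hours=1))
)
n = np.arange(48, dtype=float)
df = pd.DataFrame({
    "Date": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "Open": n,
    "High": n + 0.5,
    "Low": n - 0.5,
    "Close": n + 0.25,
    "Volume": np.full(48, 10),
})

ohlc_dict = {
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Volume": "sum",
}

# Step 1: make the timestamps the index, then keep 09:00-16:00 only.
df["Date"] = pd.to_datetime(df["Date"])
session = df.set_index("Date").between_time("09:00", "16:00")

# Step 2: one row per calendar day; drop days with no bars in the window.
daily = session.resample("D").agg(ohlc_dict).dropna(subset=["Open"])

print(daily)
```

Without the final `dropna`, the missing day would appear with NaN prices and a Volume of 0, because the sum of an empty bin is zero.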
