使用 python 将大型数据集的数据分组为每周、每月和每年？

Question

我有 datasets 以 Dataframe 格式记录了 20 年的 'X' 值。 X记录了3小时平均值的数据，数据样本如下。

   Time_stamp             X
 1992-01-01 03:00:00    10.2
 1992-01-01 06:00:00    10.4
 1992-01-01 09:00:00    11.8
 1992-01-01 12:00:00    12.0
 1992-01-01 15:00:00    10.4
 1992-01-01 18:00:00    9.4
 1992-01-01 21:00:00    10.4
 1992-01-02 00:00:00    13.6
 1992-01-02 03:00:00    13.2
 1992-01-02 06:00:00    11.8
 1992-01-02 09:00:00    12.0
 1992-01-02 12:00:00    12.8
 1992-01-02 15:00:00    12.6
 1992-01-02 18:00:00    11.0
 1992-01-02 21:00:00    12.2
 1992-01-03 00:00:00    13.8
 1992-01-03 03:00:00    14.0
 1992-01-03 06:00:00    13.4
 1992-01-03 09:00:00    14.2
 1992-01-03 12:00:00    16.2
 1992-01-03 15:00:00    13.2
 1992-01-03 18:00:00    13.4
 1992-01-03 21:00:00    13.8
 1992-01-04 00:00:00    14.8
 1992-01-04 03:00:00    13.8
 1992-01-04 06:00:00    7.6
 1992-01-04 09:00:00    5.8
 1992-01-04 12:00:00    4.4
 1992-01-04 15:00:00    5.6
 1992-01-04 18:00:00    6.0
 1992-01-04 21:00:00    7.0
 1992-01-05 00:00:00    6.8
 1992-01-05 03:00:00    3.4
 1992-01-05 06:00:00    5.8
 1992-01-05 09:00:00    10.6
 1992-01-05 12:00:00    9.2
 1992-01-05 15:00:00    10.6
 1992-01-05 18:00:00    9.8
 1992-01-05 21:00:00    11.2
 1992-01-06 00:00:00    12.0
 1992-01-06 03:00:00    10.2
 1992-01-06 06:00:00    9.0
 1992-01-06 09:00:00    9.0
 1992-01-06 12:00:00    8.6
 1992-01-06 15:00:00    8.4
 1992-01-06 18:00:00    8.2
 1992-01-06 21:00:00    8.8
 1992-01-07 00:00:00    10.0
 1992-01-07 03:00:00    9.6
 1992-01-07 06:00:00    8.0
 1992-01-07 09:00:00    9.6
 1992-01-07 12:00:00    10.8
 1992-01-07 15:00:00    10.2
 1992-01-07 18:00:00    9.8
 1992-01-07 21:00:00    10.2
 1992-01-08 00:00:00    9.4
 1992-01-08 03:00:00    11.4
 1992-01-08 06:00:00    12.6
 1992-01-08 09:00:00    12.8
 1992-01-08 12:00:00    10.4
 1992-01-08 15:00:00    11.2
 1992-01-08 18:00:00    9.0
 1992-01-08 21:00:00    10.2
 1992-01-09 00:00:00    8.2

我想创建单独的数据框来计算和记录给定数据集的年均值、周均值和日均值。我是 python 的新手，刚开始处理时间序列数据。我在 Whosebug 上发现了一些与此相关的问题，但没有找到与此相关的合适答案，也不知道如何开始。有什么帮助吗？到目前为止我写了这段代码，

import pandas as pd
import numpy as np

datasets['date_minus_time'] = df["Time_stamp"].apply( lambda df : 
datetime.datetime(year=datasets.year, month=datasets.month, 
day=datasets.day))  
datasets.set_index(df["date_minus_time"],inplace=True)

df['count'].resample('D', how='sum')
df['count'].resample('W', how='sum')
df['count'].resample('M', how='sum')

但不知道如何每 3 小时包含一次该数据记录。以及下一步应该怎么做才能达到我想要的结果。

Answer 1

您可以使用：

df['Time_stamp'] = pd.to_datetime(df['Time_stamp'], format='%Y-%m-%d %H:%M:%S')
df.set_index('Time_stamp',inplace=True)
df_monthly = df.resample('M').mean()

df_monthly 输出：

                    X
Time_stamp           
1992-01-31  10.403125

每日平均使用：df_daily = df.resample('D').mean() 输出：

                    X
Time_stamp           
1992-01-01  10.657143
1992-01-02  12.400000
1992-01-03  14.000000
1992-01-04   8.125000
1992-01-05   8.425000
1992-01-06   9.275000
1992-01-07   9.775000
1992-01-08  10.875000
1992-01-09   8.200000

Answer 2

使用带有参数 on 的 to_datetime for datetimes in column for improve performance and then DataFrame.resample 来指定日期时间列：

df['Time_stamp'] = pd.to_datetime(df['Time_stamp'])

df_daily = df.resample('D', on='Time_stamp').mean()
df_monthly = df.resample('M', on='Time_stamp').mean()
df_weekly = df.resample('W', on='Time_stamp').mean()

使用 python 将大型数据集的数据分组为每周、每月和每年？

Data grouping into weekyly, monthly and yearly for large datasets using python?

python

time-series

python-datetime

pandas

pandas-groupby