Pandas Resample does not work with mean() method

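I am resampling a time series that has a 12-day frequency. I want to resample it to a monthly frequency by grouping the values by month. Resampling works fine with sum and count, but not with mean.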

This is the code I am using:

import pandas as pd

date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
'01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
'24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']

values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
-0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
-0.057302,-0.052964,-0.076545]

df = pd.DataFrame([date,values]).T    # If not transposed it's not well organized
df.columns = ['Date','Values']
df.Date = df.Date.map(lambda x: pd.to_datetime(x,dayfirst=True)) 
df.reset_index()
df = df.set_index(['Date'])
df.resample('M').mean()

The time data is in datetime format and the time series values are floats.

Nevertheless, this is the error that keeps appearing:

df.resample('M').mean()

File "C:\WPy64-3760\python-3.7.6.amd64\lib\site-packages\pandas\core\groupby\generic.py", line 188, in _cython_agg_blocks
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

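Importantly, not every month in the time series contains more than one value. What is more, some months may have no data at all. I don't think this should cause trouble. By the way, I am using pandas version 0.25.3.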

I don't know what is going on.

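  • When the dataframe is created with pd.DataFrame([date,values]).T, every column is recognized as object. The Values dtype is never set to float, so mean() finds no numeric column to aggregate.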
import pandas as pd

# data
date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
        '01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
        '24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']

values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
          -0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
          -0.057302,-0.052964,-0.076545]

# create dataframe
# Values is properly recognized as a float 
df = pd.DataFrame({'Date': date, 'Values': values})

# Convert Date to a datetime and set as the index
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index(['Date'])

# resample
df.resample('M').mean()
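  • Resampling only changes the result for months that contain multiple values; for a month with a single value, the monthly mean is just that value.
    • Months with one or no values do not cause an error (an empty month gives NaN).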
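
Both points above can be checked directly. The following is a minimal sketch on a shortened copy of the question's data; the names `raw`, `fixed` and `monthly` are illustrative and not from the original post. It inspects the dtypes produced by the transposed construction, converts the column with `astype(float)` as one possible alternative to rebuilding the dataframe from a dict, and then resamples. It uses the `'MS'` (month start) alias instead of `'M'` only so that it should run unchanged on recent pandas releases, where `'M'` is deprecated in favor of `'ME'`; the bins are the same calendar months, just labeled by their first day.

```python
import pandas as pd

# Shortened copy of the question's data (March, April x2, August, October)
date = ['09/03/2015', '02/04/2015', '26/04/2015', '24/08/2015', '23/10/2015']
values = [4.2e-05, -0.003414, 0.016886, 0.092198, 0.067306]

# Transposed construction from the question: both columns end up as object
raw = pd.DataFrame([date, values]).T
raw.columns = ['Date', 'Values']
print(raw.dtypes)

# Alternative fix (an assumption, not the answer's approach):
# keep the construction but convert the dtypes explicitly
fixed = raw.copy()
fixed['Date'] = pd.to_datetime(fixed['Date'], dayfirst=True)
fixed['Values'] = fixed['Values'].astype(float)
fixed = fixed.set_index('Date')
print(fixed.dtypes)

# 'MS' labels each month by its first day; months without data give NaN
monthly = fixed.resample('MS').mean()
print(monthly)
```

In this sketch March keeps its single value, April becomes the average of its two values, and the months with no observations appear as NaN rows rather than raising an error.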

Working example of monthly resampling

import pandas as pd
import numpy as np
from datetime import datetime

# data
np.random.seed(365)
data = {'a': [np.random.randint(10) for _ in range(40)],
        'b': [np.random.randint(10) for _ in range(40)],
        'c': [np.random.randint(10) for _ in range(40)],
        'd': [np.random.randint(10) for _ in range(40)],
        'e': [np.random.randint(10) for _ in range(40)],
        'date': pd.bdate_range(datetime.today(), freq='W', periods=40).tolist()}

# dataframe
df = pd.DataFrame(data)

# set index
df.set_index('date', inplace=True)

print(df.head())

            a  b  c  d  e
date                     
2020-05-17  2  1  6  8  6
2020-05-24  4  4  5  9  1
2020-05-31  1  0  7  9  5
2020-06-07  5  9  7  7  7
2020-06-14  2  6  9  5  6

# resample
df.resample('M').mean()

                   a         b     c         d     e
date                                                
2020-05-31  2.333333  1.666667  6.00  8.666667  4.00
2020-06-30  4.500000  6.500000  6.25  3.500000  6.00
2020-07-31  3.750000  4.750000  2.25  3.500000  5.75
2020-08-31  4.800000  6.000000  2.00  3.800000  4.00
2020-09-30  4.250000  4.500000  3.00  4.750000  6.75
2020-10-31  5.500000  3.500000  5.00  6.750000  7.25
2020-11-30  5.400000  6.600000  5.60  5.200000  4.20
2020-12-31  6.250000  6.750000  5.75  4.500000  3.25
2021-01-31  7.200000  3.200000  3.20  5.200000  4.20
2021-02-28  4.500000  3.500000  2.50  5.500000  3.50
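
Because the example above starts from `datetime.today()`, its dates depend on the day it is run. A reproducible variant might look like the sketch below, which assumes a fixed start date of 2020-05-17 (chosen to match the printed index above) and uses NumPy's `default_rng`, so the random values will not match the tables shown. It also reports the number of observations per month next to the mean. The `'MS'` alias is used here only so the sketch runs across pandas versions; it labels each month by its first day rather than its last.

```python
import numpy as np
import pandas as pd

# Fixed weekly dates (Sundays), an assumption made for reproducibility
dates = pd.date_range('2020-05-17', freq='W', periods=40)

# Random integers 0-9 in five columns; values differ from the output above
rng = np.random.default_rng(365)
df = pd.DataFrame(rng.integers(0, 10, size=(40, 5)),
                  columns=list('abcde'), index=dates)
df.index.name = 'date'

# Monthly mean and number of weekly observations for column 'a'
summary = df['a'].resample('MS').agg(['mean', 'count'])
print(summary)
```

The `count` column shows how many weekly rows were averaged for each month.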