Pandas Resample does not work with mean() method
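I am resampling a time series that has a 12-day frequency. I want to resample it to a monthly frequency by grouping the values by month. It works fine when I resample with sum and count, but not with mean.
This is the code I am using: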
import pandas as pd

date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
        '01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
        '24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']
values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
          -0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
          -0.057302,-0.052964,-0.076545]
df = pd.DataFrame([date,values]).T # If not transposed it's not well organized
df.columns = ['Date','Values']
df.Date = df.Date.map(lambda x: pd.to_datetime(x,dayfirst=True))
df.reset_index()  # result is not assigned, so this line has no effect
df = df.set_index(['Date'])
df.resample('M').mean()  # this is the call that raises the error
The time data is in datetime format and the time series values are floats.
Nevertheless, this is the error that keeps coming up:
df.resample('M').mean()
File "C:\WPy64-3760\python-3.7.6.amd64\lib\site-packages\pandas\core\groupby\generic.py", line 188, in _cython_agg_blocks
raise DataError("No numeric types to aggregate")
DataError: No numeric types to aggregate
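Note that not every month in the time series contains more than one value. What's more, some months may have no data at all. I don't think this should cause trouble. By the way, I am using Pandas version 0.25.3.
I don't know what is going on.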
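- When the dataframe is created with pd.DataFrame([date,values]).T, both columns are recognized as object dtype. The Values column is never set to float, so mean() finds no numeric data to aggregate.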
import pandas as pd
# data
date = ['09/03/2015','02/04/2015','26/04/2015','08/05/2015','20/05/2015',
'01/06/2015','13/06/2015','25/06/2015','07/07/2015','31/07/2015','12/08/2015',
'24/08/2015','23/10/2015','04/11/2015','16/11/2015','28/11/2015','22/12/2015']
values = [4.2e-05,-0.003414,0.016886,0.010597,-0.015756,-0.011592,
-0.018709,-0.031948,-0.000361,0.033206,0.122711,0.092198,0.067306,0.000668,
-0.057302,-0.052964,-0.076545]
# create dataframe
# Values is properly recognized as a float
df = pd.DataFrame({'Date': date, 'Values': values})
# Convert Date to a datetime and set as the index
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index(['Date'])
# resample
df.resample('M').mean()
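If the frame has already been built with the transpose approach, the diagnosis can be checked with dtypes and the column converted in place. The following is a minimal sketch, not part of the original answer: it uses only the first three rows of the data, the names df_obj, df_fixed and monthly are made up for illustration, and it passes pd.offsets.MonthEnd() as the resample rule on the assumption that this is equivalent to the 'M' alias used above (spelled 'ME' in pandas 2.2 and later).

```python
import pandas as pd

# First three rows of the question's data, enough to show the effect
date = ['09/03/2015', '02/04/2015', '26/04/2015']
values = [4.2e-05, -0.003414, 0.016886]

# Building row-wise and transposing mixes strings and floats,
# so both columns end up with object dtype
df_obj = pd.DataFrame([date, values]).T
df_obj.columns = ['Date', 'Values']
print(df_obj.dtypes)

# Repair the existing frame: parse the dates and convert Values to a numeric dtype
df_fixed = df_obj.copy()
df_fixed['Date'] = pd.to_datetime(df_fixed['Date'], dayfirst=True)
df_fixed['Values'] = pd.to_numeric(df_fixed['Values'])
df_fixed = df_fixed.set_index('Date')

# Month-end resampling now has a float column to average
monthly = df_fixed.resample(pd.offsets.MonthEnd()).mean()
print(monthly)
```

Building the frame from a dict, as in the answer's code, avoids the problem in the first place; the conversion is only useful when the object-typed frame comes from somewhere you cannot change.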
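- For a month that has only one value, the monthly mean is just that value, so resampling changes nothing there; a month needs more than one value for the mean to differ from the raw data.
- There is no error if a month has only one value or none.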
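The point about sparse months can be checked on a small series. This is a sketch on hypothetical toy data (the values and the names s, monthly_mean and monthly_count are invented for illustration), again passing pd.offsets.MonthEnd() as the rule on the assumption that it matches the month-end alias used in this thread.

```python
import pandas as pd

# Hypothetical toy data: two values in March, one in April, none in May, one in June
s = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.to_datetime(['2015-03-09', '2015-03-21', '2015-04-02', '2015-06-01']),
    name='Values',
)

rule = pd.offsets.MonthEnd()
monthly_mean = s.resample(rule).mean()    # empty months come back as NaN
monthly_count = s.resample(rule).count()  # empty months come back as 0

print(monthly_mean)
print(monthly_count)
```

The single-value month returns its value unchanged and the empty month returns NaN, with no exception in either case, which is consistent with the dtype being the actual cause of the DataError.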
Working example of monthly resampling
import pandas as pd
import numpy as np
# data (seeded so the random values are reproducible)
np.random.seed(365)
data = {'a': [np.random.randint(10) for _ in range(40)],
        'b': [np.random.randint(10) for _ in range(40)],
        'c': [np.random.randint(10) for _ in range(40)],
        'd': [np.random.randint(10) for _ in range(40)],
        'e': [np.random.randint(10) for _ in range(40)],
        # 40 weekly (Sunday) dates; a fixed start date keeps the index identical to the output shown
        'date': pd.date_range('2020-05-17', freq='W', periods=40)}
# dataframe
df = pd.DataFrame(data)
# set index
df.set_index('date', inplace=True)
print(df.head())
            a  b  c  d  e
date
2020-05-17  2  1  6  8  6
2020-05-24  4  4  5  9  1
2020-05-31  1  0  7  9  5
2020-06-07  5  9  7  7  7
2020-06-14  2  6  9  5  6
# resample
df.resample('M').mean()
                   a         b     c         d     e
date
2020-05-31  2.333333  1.666667  6.00  8.666667  4.00
2020-06-30  4.500000  6.500000  6.25  3.500000  6.00
2020-07-31  3.750000  4.750000  2.25  3.500000  5.75
2020-08-31  4.800000  6.000000  2.00  3.800000  4.00
2020-09-30  4.250000  4.500000  3.00  4.750000  6.75
2020-10-31  5.500000  3.500000  5.00  6.750000  7.25
2020-11-30  5.400000  6.600000  5.60  5.200000  4.20
2020-12-31  6.250000  6.750000  5.75  4.500000  3.25
2021-01-31  7.200000  3.200000  3.20  5.200000  4.20
2021-02-28  4.500000  3.500000  2.50  5.500000  3.50