Going from a monthly average DataFrame to an interpolated daily time series
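I want to take a set of monthly averages and assign each one to the 15th day of its month within a daily time series.
I start with the following (these are the monthly averages I was given):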
m_avg = pd.DataFrame({'Month': ['1.527013956', '1.899169054', '1.669356146','1.44920871', '1.188557788', '1.017035727', '0.950243755', '1.022453993', '1.203913739', '1.369545041','1.441827406','1.48621651']})
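Edit: I added one value to the DataFrame, so it now has 12 values.
Next, I want to place each monthly value on the 15th day of its month across the following period: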
ts = pd.date_range(start='1/1/1950', end='12/31/1999', freq='D')
I know how to extract the rows that fall on the 15th from a daily time series that already exists:
df = df.loc[df.index.day == 15]  # where df is any daily time series
Finally, once the monthly averages sit on the 15th of each month, I know how to interpolate between them:
df.loc[:, ['Col1']] = df.loc[:, ['Col1']].interpolate(method='linear', limit_direction='both', limit=100)
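How do I get from the monthly DataFrame to an interpolated daily DataFrame, interpolating linearly between the 15th of each month, where the 15th by construction holds the monthly value from my original DataFrame?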
Edit:
Your suggestion to use np.tile() was good, but I ended up needing to do this for multiple columns. Instead of np.tile, I used:
index = pd.date_range(start='1/1/1950', end='12/31/1999', freq='MS')
# assuming month is a 12-row DataFrame, 50 copies give the 600 rows this index needs
m_avg = pd.concat([month]*50,axis=0).set_index(index)
There may be a better solution, but so far this works for my needs.
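For reference, the multi-column variant described in this edit could look roughly like the sketch below. It is only an illustration under assumptions: the column names `A` and `B` and their values are made up, and `month` is assumed to be a 12-row DataFrame with one row per calendar month.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-row frame of monthly averages, one column per variable.
month = pd.DataFrame({
    "A": np.linspace(1.0, 2.1, 12),
    "B": np.linspace(10.0, 21.0, 12),
})

n_years = 50  # 1950 through 1999
index = pd.date_range(start="1950-01-01", end="1999-12-01", freq="MS")

# Repeat the 12 rows once per year, then attach the month-start index.
m_avg = pd.concat([month] * n_years, axis=0, ignore_index=True)
m_avg.index = index

# Move each value to the 15th, expand to daily rows, and
# interpolate every column linearly in one call.
daily = m_avg.shift(14, freq="D").asfreq("D").interpolate()
```

Because `shift`, `asfreq` and `interpolate` all operate on the whole frame, nothing in the last line needs to change as more columns are added.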
Here is one way to do it:
import pandas as pd
import numpy as np
# monthly averages, note these should be cast to float
month = np.array(['1.527013956', '1.899169054', '1.669356146',
'1.44920871', '1.188557788', '1.017035727',
'0.950243755', '1.022453993', '1.203913739',
'1.369545041', '1.441827406', '1.48621651'], dtype='float')
# expand this to 51 years, with the same monthly averages repeating each year
# (obviously not very efficient, probably there are better ways to attack the problem,
# but this was the question)
month = np.tile(month, 51)
# create DataFrame with these values
m_avg = pd.DataFrame({'Month': month})
# set the date index to the desired time period
m_avg.index = pd.date_range(start='1/1/1950', end='12/1/2000', freq='MS')
# shift the index by 14 days to get the 15th of each month
m_avg = m_avg.shift(14, freq='D')  # tshift() was removed in pandas 2.0; shift(freq=...) does the same
# expand the index to daily frequency
daily = m_avg.asfreq(freq='D')
# interpolate (linearly) the missing values
daily = daily.interpolate()
# show result (display() is available in Jupyter/IPython; use print(daily) elsewhere)
display(daily)
Output:
Month
1950-01-15 1.527014
1950-01-16 1.539019
1950-01-17 1.551024
1950-01-18 1.563029
1950-01-19 1.575034
... ...
2000-12-11 1.480298
2000-12-12 1.481778
2000-12-13 1.483257
2000-12-14 1.484737
2000-12-15 1.486217
18598 rows × 1 columns
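The result above starts on 1950-01-15 and runs through 2000, whereas the question asked for every day from 1950-01-01 to 1999-12-31. One possible variation, offered as a sketch rather than the only way to do it, pads the mid-month anchor points by one month on each side and then trims to the requested range. It assumes the 12 averages repeat every year, so December 1949 and January 2000 can reuse the same values; the names `starts` and `anchors` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# The 12 monthly averages from the question, January through December.
monthly = np.array([1.527013956, 1.899169054, 1.669356146,
                    1.44920871, 1.188557788, 1.017035727,
                    0.950243755, 1.022453993, 1.203913739,
                    1.369545041, 1.441827406, 1.48621651])

# Target daily range from the question.
ts = pd.date_range(start="1950-01-01", end="1999-12-31", freq="D")

# Month starts padded by one month on each side, so the first and last
# half-months are interpolated instead of being left empty.
starts = pd.date_range(start="1949-12-01", end="2000-01-01", freq="MS")

# Look up each month's average by calendar month (no tiling needed)
# and place it on the 15th.
anchors = pd.Series(monthly[starts.month.to_numpy() - 1],
                    index=starts + pd.Timedelta(days=14))

# Expand to daily rows, interpolate linearly, then trim to the target range.
daily = anchors.asfreq("D").interpolate().reindex(ts)
```

Indexing `monthly` by calendar month replaces the `np.tile` step, so the same code works for any start and end year.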