Weighted data problem: the mean is fine, but covar and std look wrong. How do I adjust?
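I'm trying to apply a weighting filter to the data, instead of using the raw data, before calculating the statistics (mu, std and covar). But the results clearly need adjusting.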
import numpy as np
from pandas import DataFrame

# generate some data and a filter
f_n = 100
np.random.seed(seed=101)
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo).add(1).pct_change()
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter**(f_filter/f_n))
# normalise the filter ... This could be where I'm going wrong?
f_filter = f_filter * (f_n / f_filter.sum())
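Why the mean survives: after the normalisation step the weights average exactly 1 (they sum to f_n), so the plain mean of the scaled data, sum(w_i*x_i)/n, is the same number as the weighted average sum(w_i*x_i)/sum(w_i). A minimal check, assuming NumPy; the data here are fresh uniform draws from `np.random.default_rng(101)` rather than the question's returns, purely for illustration. In the question itself the identity only holds approximately, because `pct_change()` leaves a NaN in the first row that `.mean()` skips while that row's weight still counts in the normalisation.

```python
import numpy as np

# Rebuild the question's filter (f_n rows, weights rising towards the last row)
f_n = 100
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
# Normalise so the weights sum to f_n, i.e. they average exactly 1
f_filter = f_filter * (f_n / f_filter.sum())

rng = np.random.default_rng(101)  # illustrative data only, not the question's
x = rng.random((f_n, 3))

# Plain mean of the weight-scaled data ...
scaled_mean = (x * f_filter[:, None]).mean(axis=0)
# ... equals the weighted average sum(w * x) / sum(w)
weighted_mean = np.average(x, axis=0, weights=f_filter)

print(np.allclose(scaled_mean, weighted_mean))  # True
```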
Now we can look at some results:
print(foo.mul(f_filter, axis=0).mean())
print(foo.mean())
0 0.039147
1 0.039013
2 0.037598
dtype: float64
0 0.035006
1 0.042244
2 0.041956
dtype: float64
The means all look in line, but when we look at covar and std they differ significantly in both scale and direction:
print(foo.mul(f_filter, axis=0).cov())
print(foo.cov())
0 1 2
0 0.124766 -0.038954 0.027256
1 -0.038954 0.204269 0.056185
2 0.027256 0.056185 0.203934
0 1 2
0 0.070063 -0.014926 0.010434
1 -0.014926 0.099249 0.015573
2 0.010434 0.015573 0.087060
print(foo.mul(f_filter, axis=0).std())
print(foo.std())
0 0.353223
1 0.451961
2 0.451590
dtype: float64
0 0.264694
1 0.315037
2 0.295060
dtype: float64
Any idea why? How can we adjust the filter, or the covariance matrix, to make the results more comparable?
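The question's last line asks how to adjust the covariance itself. One standard option, not taken from this thread, is to stop scaling the raw data and instead weight the deviations from the weighted mean, i.e. a weighted covariance sum(w_i*(x_i - mu_w)(x_i - mu_w)^T) / (sum(w) - sum(w^2)/sum(w)). A sketch assuming NumPy's `np.cov` with its `aweights` argument, which uses that normalisation when `ddof` is 1 (the default); the data are regenerated with `default_rng(101)`, so the numbers will not match the printouts above.

```python
import numpy as np
import pandas as pd

# Regenerate data shaped like the question's (returns built from uniform draws)
f_n = 100
rng = np.random.default_rng(101)
foo = pd.DataFrame(rng.random((f_n, 3))).add(1).pct_change()

# The question's filter, normalised to average 1
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
f_filter = f_filter * (f_n / f_filter.sum())

# pct_change leaves a NaN in the first row; drop that row together with its weight
mask = foo.notna().all(axis=1).to_numpy()
x = foo.to_numpy()[mask]
w = f_filter[mask]

# Weighted covariance: columns are the variables, hence rowvar=False
weighted_cov = np.cov(x, rowvar=False, aweights=w)
# Weighted std is the square root of the diagonal
weighted_std = np.sqrt(np.diag(weighted_cov))

# What the question computed: the ordinary covariance of the weight-scaled data
scaled_cov = np.cov(x * w[:, None], rowvar=False)

print(pd.DataFrame(weighted_cov))
print(pd.Series(weighted_std))
print(pd.DataFrame(scaled_cov))
```

Here the filter only changes how much each row counts; the data keep their original units, so `weighted_cov` and `weighted_std` can be compared directly with `foo.cov()` and `foo.std()`.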
The problem is your weighting function. (Do you want Gaussian random numbers or uniform random variables?) Look at this plot:
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

f_n = 100
np.random.seed(seed=101)
# ??? you want a uniform random variable? Or is this just a typo and you want a normal random variable?
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo)
f_filter = np.arange(f_n, 0.0, -1)
# here is the problem: uneven weights create an artificial trend, making the data non-stationary.
# Covariance is only meaningful for stationary data.
# =============================================
f_filter = 1.0 / (f_filter**(f_filter/f_n))
fig, ax = plt.subplots()
ax.plot(f_filter)
plt.show()
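The uneven weights create an artificial trend (your random numbers are all positive and uniform), which makes the data non-stationary. Covariance is only meaningful for stationary data. Look at the final weighted data.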
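To make "look at the final weighted data" concrete without the plot: if each row x_i is a draw with common mean mu and covariance Sigma, the scaled row y_i = w_i*x_i has mean w_i*mu, which follows the shape of the filter, so the scaled data acquire a trend. Pooling such rows, the sample covariance comes out at roughly mean(w^2)*Sigma + var(w)*mu*mu^T, which is larger than Sigma whenever the weights are unequal (with mean(w) = 1, mean(w^2) = 1 + var(w)). A small sketch, assuming uniform draws as in the code above and NumPy's `default_rng` for the data:

```python
import numpy as np

f_n = 100
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
f_filter = f_filter * (f_n / f_filter.sum())  # average weight = 1

rng = np.random.default_rng(101)
x = rng.random((f_n, 3))      # uniform on [0, 1): all positive, mean 0.5
y = x * f_filter[:, None]     # the "weighted" data the question feeds to .cov()

# The mean of y drifts with the filter: compare the first and last quarter of rows
first_q = y[: f_n // 4].mean(axis=0)
last_q = y[-(f_n // 4):].mean(axis=0)
print("first quarter mean:", first_q)
print("last quarter mean: ", last_q)

# Covariance inflation: mean(w**2) exceeds 1 whenever the weights are not all equal
print("mean(w**2):", np.mean(f_filter ** 2))
print("var(x):    ", x.var(axis=0, ddof=1))
print("var(y):    ", y.var(axis=0, ddof=1))
```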