Weighted data problems: the mean is fine, but Covar and std look wrong. How do I adjust?

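I am trying to apply a weighting filter to the data, rather than using the raw data, before computing the statistics mu, std, and covar. But the results clearly need adjusting.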

# generate some data and a filter
import numpy as np
from pandas import DataFrame

f_n = 100  # integer, so it can be used as an array shape
np.random.seed(seed=101)
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo).add(1).pct_change()
f_filter = np.arange(f_n, 0, -1, dtype=float)
f_filter = 1.0 / (f_filter**(f_filter/f_n))
# normalise the filter so the weights sum to f_n ... This could be where I'm going wrong?
f_filter = f_filter * (f_n / f_filter.sum())
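
To check the normalisation step in isolation, here is a minimal sketch. It rebuilds the same filter under hypothetical names (`n`, `k`, `w`) and uses a fresh uniform sample `x` rather than the question's `foo`, so the numbers will not match the output in this post. It suggests that the normalisation is sound for the mean: the weights average to 1, so the plain mean of the scaled data coincides with a weighted average.

```python
import numpy as np

n = 100                                   # hypothetical sample size, mirroring f_n
k = np.arange(n, 0, -1, dtype=float)      # 100, 99, ..., 1
w = 1.0 / (k ** (k / n))                  # raw weights, rising from 0.01 to 1.0
w = w * (n / w.sum())                     # rescale so the weights sum to n

print(w.sum(), w.mean())                  # sum is n and mean is 1, up to rounding
print(bool(np.all(np.diff(w) > 0)))       # weights grow towards the latest row

# With mean-1 weights, the plain mean of the scaled data is the weighted average.
x = np.random.default_rng(0).random(n)    # illustrative data, not the question's foo
scaled_mean = (x * w).mean()
weighted_mean = np.average(x, weights=w)
print(np.isclose(scaled_mean, weighted_mean))
```

The same identity does not carry over to second moments: the variance of `x * w` also picks up the variation in `w` itself.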

Now we can look at some results:

print(foo.mul(f_filter, axis=0).mean())
print(foo.mean())

0    0.039147
1    0.039013
2    0.037598
dtype: float64
0    0.035006
1    0.042244
2    0.041956
dtype: float64

The means look broadly in line, but when we look at the covar and std, they differ significantly in both scale and direction:

print(foo.mul(f_filter, axis=0).cov())
print(foo.cov())

          0         1         2
0  0.124766 -0.038954  0.027256
1 -0.038954  0.204269  0.056185
2  0.027256  0.056185  0.203934

          0         1         2
0  0.070063 -0.014926  0.010434
1 -0.014926  0.099249  0.015573
2  0.010434  0.015573  0.087060

print(foo.mul(f_filter, axis=0).std())
print(foo.std())

0    0.353223
1    0.451961
2    0.451590
dtype: float64
0    0.264694
1    0.315037
2    0.295060
dtype: float64

Any idea why? How can we adjust the filter, or adjust the covariance matrix, to make them more comparable?

The problem is in your weighting function. (Do you want Gaussian random numbers or a uniform r.v.?) Look at this plot:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

f_n = 100  # integer, so it can be used as an array shape
np.random.seed(seed=101)
# ??? Do you want a uniform random variable? Or is this a typo and you want a normal random variable?
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo)
f_filter = np.arange(f_n, 0, -1, dtype=float)

# Here is the problem: the uneven weights create an artificial trend, which makes the data non-stationary. Covariance only works for stationary data.
# =============================================
f_filter = 1.0 / (f_filter**(f_filter/f_n))

fig, ax = plt.subplots()
ax.plot(f_filter)
plt.show()

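The uneven weights create an artificial trend (your random numbers are all positive and uniform), which makes the series non-stationary. Covariance only works for stationary data. Look at the final weighted data.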
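
A minimal sketch of this effect, and of one possible way around it. It is an illustration under assumptions, not the asker's exact setup: the names `w`, `const`, and `x` are hypothetical, and the data are a constant series plus a fresh uniform sample. The first part shows that multiplying a constant series by uneven weights already produces a non-zero standard deviation. The second part treats the filter as observation weights through `np.average` and `np.cov` with `aweights`, instead of scaling the rows.

```python
import numpy as np

n = 100
k = np.arange(n, 0, -1, dtype=float)
w = 1.0 / (k ** (k / n))
w = w * (n / w.sum())                      # mean-1 weights, as in the question

# 1) A constant series has zero spread, but the scaled series does not:
#    all of its variation comes from the weights themselves.
const = np.full(n, 0.5)
print(const.std(ddof=1))
print((const * w).std(ddof=1))             # equals 0.5 * w.std(ddof=1)

# 2) Treat w as observation weights instead of scaling the data.
rng = np.random.default_rng(101)           # assumption: any seed would do here
x = rng.random((n, 3))                     # rows are observations

w_mean = np.average(x, axis=0, weights=w)      # weighted column means
w_cov = np.cov(x, rowvar=False, aweights=w)    # weighted covariance matrix
w_std = np.sqrt(np.diag(w_cov))

naive_cov = np.cov(x * w[:, None], rowvar=False)  # what scaling the rows computes
print(w_std)
print(np.sqrt(np.diag(naive_cov)))
```

In a sketch like this one would expect the `aweights` version to stay on the scale of the unweighted spread, while the row-scaled version is inflated by the variation in the weights. Whether observation weights are the right model for a given filter is a separate judgment call.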