Dataframe Correlation with rotating values (Loops?)

I have a DataFrame in the format below, and I am trying to create df['New'], a column of rotating values that I will use to calculate the correlation between Alpha and New.

Date       Alpha Bravo Charlie   New                         Correlation
2018-01-03    1     3      2       3 (from bravo column)          NaN
2018-01-04    2     6      4       6 (from bravo column)          NaN
2018-01-05    3     9      6       9 (from bravo column)          NaN
2018-01-06    4    12      8      12 (from bravo column)          NaN
2018-01-07    5    15     10      10 (from Charlie column)         X

On the next date:

Date       Alpha Bravo Charlie   New                         Correlation
2018-01-03    1     3      2       3 (from bravo column)          NaN
2018-01-04    2     6      4       6 (from bravo column)          NaN
2018-01-05    3     9      6       9 (from bravo column)          NaN
2018-01-06    4    12      8      12 (from bravo column)          NaN
2018-01-07    5    15     10      15 (from bravo column)           X  
2018-01-08    6    18     12      12 (from Charlie column)         Y

df['Correlation'] = df['Alpha'].rolling(window=5).corr(other=df['New'])

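Any suggestions on how to create this new column with rotating values? (That way my previous correlation stays the same, as X. My end objective is the Correlation column; the New column is only needed to calculate it.)

In other words, each time the Correlation column is calculated, it uses Charlie for the latest value and Bravo for all the earlier values in the window.

Another way to explain it: always use the Charlie column for the last date and the Bravo column for the previous 4 days when calculating the correlation with Alpha.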
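The rule described above can be sketched as a plain loop, which is slow on large frames but easy to check by hand. This is only a minimal sketch: the helper name `rotating_corr` is hypothetical, and the sample values are the ones from the tables in this post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in this post.
df = pd.DataFrame({
    "Date": pd.date_range("2018-01-03", periods=7),
    "Alpha": [1, 2, 3, 4, 5, 6, 7],
    "Bravo": [3, 6, 9, 12, 15, 18, 21],
    "Charlie": [2, 4, 6, 8, 10, 12, 15],
})

def rotating_corr(frame, window=5):
    """Hypothetical helper: correlate Alpha with (window-1 Bravo values + 1 Charlie value)."""
    alpha = frame["Alpha"].to_numpy(dtype=float)
    bravo = frame["Bravo"].to_numpy(dtype=float)
    charlie = frame["Charlie"].to_numpy(dtype=float)
    out = np.full(len(frame), np.nan)  # rows without a full window stay NaN
    for end in range(window - 1, len(frame)):
        start = end - window + 1
        # Earlier rows of the window come from Bravo, the latest row from Charlie.
        new = np.append(bravo[start:end], charlie[end])
        out[end] = np.corrcoef(alpha[start:end + 1], new)[0, 1]
    return pd.Series(out, index=frame.index)

df["Correlation"] = rotating_corr(df)
print(df)
```

Because each window is rebuilt from scratch, a correlation computed for an earlier date never changes when a new row arrives.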

I believe you first need to prepend NaNs, then build rolling windows with the strides-based solution, and then compute the row-wise correlation:

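import numpy as np
import pandas as pd

def rolling_window(a, window):
    # Return a 2-D strided view in which each row is one length-`window` slice of `a`.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)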

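N = 5
# Prepend N-1 NaNs so that every original row gets its own window.
a = np.concatenate([[np.nan] * (N-1), df['Bravo'].values])
b = np.concatenate([[np.nan] * (N-1), df['Alpha'].values])
# a1: windows over Bravo, a2: windows over Alpha
a1 = rolling_window(a, N)
a2 = rolling_window(b, N)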
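As an alternative to the hand-written strides helper, NumPy 1.20 and later ship `sliding_window_view`, which builds the same kind of windowed view and returns it read-only. The sketch below is not part of the original answer; the variable names `bravo`, `padded` and `windows` are illustrative, and the values are the Bravo column from this post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

N = 5
bravo = np.array([3, 6, 9, 12, 15, 18, 21], dtype=float)

# Prepend N-1 NaNs so that the number of windows equals the number of rows.
padded = np.concatenate([np.full(N - 1, np.nan), bravo])

# Each row of `windows` is one length-N slice of `padded`.
windows = sliding_window_view(padded, N)
print(windows.shape)
print(windows[-1])
```

The result is a view, so no data is copied until the windows are modified or combined with other arrays.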

Remove the last column of a1 and append the values of the Charlie column instead:

c = np.c_[a1[:, :-1], df['Charlie'].values[:, None]]
print(c)
[[nan nan nan nan  2.]
 [nan nan nan  3.  4.]
 [nan nan  3.  6.  6.]
 [nan  3.  6.  9.  8.]
 [ 3.  6.  9. 12. 10.]
 [ 6.  9. 12. 15. 12.]
 [ 9. 12. 15. 18. 15.]]

Create DataFrames and remove the leading NaN rows with iloc:

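# Keep only the rows that have a complete window, then correlate row by row.
a = pd.DataFrame(a2, index=df.index).iloc[N-1:]
b = pd.DataFrame(c, index=df.index).iloc[N-1:]
df['Correlation1'] = a.corrwith(b, axis=1)
# To improve performance, use a vectorized row-wise correlation helper
# (corr2_coeff_rowwise, defined separately).
df['Correlation2'] = corr2_coeff_rowwise(a2, c)

print(df)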
        Date  Alpha  Bravo  Charlie  Correlation1  Correlation2
0 2018-01-03      1      3        2           NaN           NaN
1 2018-01-04      2      6        4           NaN           NaN
2 2018-01-05      3      9        6           NaN           NaN
3 2018-01-06      4     12        8           NaN           NaN
4 2018-01-07      5     15       10      0.894427      0.894427
5 2018-01-08      6     18       12      0.832050      0.832050
6 2018-01-09      7     21       15      0.832050      0.832050
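The code above calls `corr2_coeff_rowwise`, but its definition is not included in this post. A minimal sketch of what such a helper could look like, assuming it computes the Pearson correlation between matching rows of two equally shaped 2-D arrays:

```python
import numpy as np

def corr2_coeff_rowwise(A, B):
    # Assumed behavior: Pearson correlation between row i of A and row i of B.
    # Center each row on its own mean.
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    # Row-wise sums of squares.
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    # Row-wise dot products divided by the product of the row norms.
    return np.einsum('ij,ij->i', A_mA, B_mB) / np.sqrt(ssA * ssB)

# The two complete windows for 2018-01-07 and 2018-01-08 from this post.
A = np.array([[1., 2., 3., 4., 5.], [2., 3., 4., 5., 6.]])
B = np.array([[3., 6., 9., 12., 10.], [6., 9., 12., 15., 12.]])
result = corr2_coeff_rowwise(A, B)
print(result)
```

Rows that contain NaN produce NaN in the result, which is why the first N-1 rows of Correlation2 are NaN when the padded arrays are passed in directly.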